# Computing the Parity of a Word

The parity of a binary [word](https://www.youtube.com/watch?v=Weyv-V8xz0c) 
is 1 if the number of 1s in the word is odd; otherwise, it is 0. For example, 
the parity of 1011 is 1, and the parity of 10001000 is 0. Parity checks are 
used to detect single bit errors in data storage and communication. It is fairly
straightforward to write code that computes the parity of a single 64-bit word.

**How would you compute the parity of a very large number of 64-bit words?**

## Brute Force Algorithm

We can iteratively test the value of each bit while tracking the number of 1s seen
so far.  Since we only care if the number of 1s is even or odd, we can store the 
number mod 2.

In [38]:
import random
import uuid
import os
import secrets

random_64_bit_word = random.getrandbits(64)
uuid_64_bit_word = uuid.uuid4().int & (1<<64)-1
urandom_64_bit_word = int.from_bytes(os.urandom(8), byteorder='big')
secrets_64_bit_work = secrets.randbelow(2**64)

def parity(x):
    result = 0
    while x:
        result ^= x & 1
        x >>= 1
    return result

example_inputs = []

example_inputs.append(random_64_bit_word)
example_inputs.append(uuid_64_bit_word)
example_inputs.append(urandom_64_bit_word)
example_inputs.append(secrets_64_bit_work)

def print_outputs(inputs):
    result = []
    for i in inputs:
        result.append("input integer: %i parity: %i" % (e, parity(e)))
    return result
        
print_outputs(example_inputs)

['input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1']

The time complexity is *O(n)*, where *n* is the word size.

## Improving Performance

The first improvement is based on erasing the lowest set bit in a word in a single
operation, thereby improving performance in the best- and average-cases.

### Bit fiddling Trick

`x &(x - 1)` equals `x` with its lowest set bit erased.

You can find the lowest set bit using `x & (~x + 1):

```
    x: 01101100
 ~x+1: 10010100
       --------
       00000100
```

Dropping the lowest set bit will then become `x & ~(x & (~x + 1))`:

```
          x: 01101100
~(x&(~x+1)): 11111011
             --------
             01101000
```

This is equivalent to `x &(x - 1)` which is clearer and easier to use.

In [34]:
def parity(x):
    result = 0
    while x:
        result ^= 1
        x &= x - 1   # Drips the lowest set bit of x.
    return result

print_outputs(example_inputs)

['input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1']

And when comparing the times of each version of the `parity()` function:

In [66]:
import timeit

brute_force_parity = """
import uuid

uuid_64_bit_word = uuid.uuid4().int & (1<<64)-1

def parity(x):
    result = 0
    while x:
        result ^= x & 1
        x >>= 1
    return result
    
parity(uuid_64_bit_word)
"""

drop_lowest_set_bit_parity = """
import uuid

uuid_64_bit_word = uuid.uuid4().int & (1<<64)-1

def parity(x):
    result = 0
    while x:
        result ^= 1
        x &= x - 1   # Drops the lowest set bit of x.
    return result

parity(uuid_64_bit_word)
"""

brute_force_result = timeit.timeit(brute_force_parity, number=10)
drop_lowest_set_bit_result = timeit.timeit(drop_lowest_set_bit_parity, number=10)
print("Brute force parity function time: {0} seconds".format(brute_force_result))
print("Drop lowest set bit parity function time: {0} seconds".format(drop_lowest_set_bit_result))
percent = drop_lowest_set_bit_result / brute_force_result
difference = (1 - percent) * 100
print("Drop lowest set bit is {:.2f}% faster than brute force.".format(difference))

Brute force parity function time: 0.00363140000263229 seconds
Drop lowest set bit parity function time: 0.00022650000755675137 seconds
Drop lowest set bit is 93.76% faster than brute force.


As you can see, the performance improvement can be quite dramatic.

## Lookup Table Solution

The problem statement refers to computing the parity for a very large number of 
words.  When you have to perform a large number of parity computations, two keys 
to performance are processing multiple bits at a time and caching results in an
array-based lookup table.

First, we demonstrate caching.  When computing the parity of a collection of
bits, it does not matter how we group those bits, ie the computation is
associative.  Therfore, we can compute the parity of a 64-bit integer by grouping
its bits into four nonoverlapping 16-bit subwords, computing the parity of each
subwords, and then computing the parity of these four subresults.  We choose 16
since 2<sup>16</sup> = 65536 is relatively small, which makes it feasible to cache
the parity of all 16-bit words using an array.  Furthermore, since 16 eventually
divides 64, the code is simpler than if we were, for example, to use 10 bit
subwords.

### Lookup table for 2-bit Words

The cache is (0,1,1,0).  These are the parities of (00), (01), (10), (11),
respectively.  To compute the parity of (11001010) we would compute the parities
of (11), (00), (10), (10).  By table lookup we see these are 0,0,1,1, respectively.
So, the finally result is the parity of 0,0,1,1 which is 0.

To look up the parity of the first two bits in (11101010), we right shift by 6 to
get (00000011), and use this as an index into the cache.  To lookup the parity of
the next two bits, ie (10), we right shift by 4 to get (10) in the two
least-significant bit places.  The right shift does not remove the leading (11).
It results in (00001110).  We cannot index the cache with this; it leads to an 
out-of-bounds access.  To get the last two bits after the right shift by 4, we
bitwise-AND(00001110) with (00000011).  This is the "mask" used to extract the
last two bits.  The result is (00000010).  Similar masking is needed for the two
other 2-bit lookups.

```python
def parity(x):
    MASK_SIZE = 16
    BIT_MASK = 0xFFFF
    return (PRECOMPUTED_PARITY[x >> (3 * MASK_SIZE)] ^
            PRECOMPUTED_PARITY[(x >> (2 * MASK_SIZE)) & BIT_MASK] ^
            PRECOMPUTED_PARITY[(x >> MASK_SIZE) & BIT_MASK] ^
            PRECOMPUTED_PARITY[x & BIT_MASK])
```

The time complexity is a function of the size of the keys used to index the lookup
table.  Let *L* be the width of the words for which we cache the results, and *n* 
the word size.  Since there are *n|L* terms, the time complexity is *O{n|L)*,
assuming word-level operations, such as shifting, taking *O(1)* time.

We can improve on the worst case time complexity of the previous algorithms by 
exploiting some simple properties of parity.  Specifically, the XOR of two bits 
is defined to be 0 if both bits are 0 or both bits are 1; otherwise it is 1.
XOR has the property of being associative , ie it does not matter how we group
bits; it is also commutative, ie the order in which we perform the XORs does
not change the result.  The XOR of a group of bits is its parity.  We can exploit
this fact to use the CPU's word-level XOR instruction to process multiple bits
at a time.

In [67]:
def parity(x):
    x ^= x >> 32
    x ^= x >> 16
    x ^= x >> 8
    x ^= x >> 4
    x ^= x >> 2
    x ^= x >> 1
    return x & 0x1

print_outputs(example_inputs)



['input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1',
 'input integer: 9874610276674263814 parity: 1']

The time complexity is O(log n), where n is the word size.  The actual runtime
depends on the input data, eg the refinement of the brute force algorithm is
very fast on sparse inputs.  However, for random inputs, the refinement of the
brute force is roughly 20% faster than the brute force algorithm.  The table based
approach is four times faster still, and using associativity reduces run time by
another factor of two.

[References](../reference/4.1.md)
