# Algorithms

## Hamming Distance

Write a function that computes the Hamming distance between two iterables.
The Hamming distance between two sequences of equal length is the number of positions at which the corresponding symbols are different.
https://en.wikipedia.org/wiki/Hamming_distance
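As a formula (notation introduced here for illustration), for sequences $x$ and $y$ of length $n$:

$$d_H(x, y) = \sum_{i=1}^{n} [x_i \neq y_i]$$

where $[\cdot]$ is the Iverson bracket: 1 when the condition holds, 0 otherwise. This is the same "count the mismatches" sum that the solution below computes.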

* "karolin" and "kathrin" is 3.
* "karolin" and "kerstin" is 3.
* 1011101 and 1001001 is 2.
* 2173896 and 2233796 is 3.
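To make the definition concrete, here is a small sketch that lists the positions where two equal-length sequences differ; the Hamming distance is simply how many positions there are. `mismatch_positions` is a hypothetical helper, not part of the solution below.

```python
def mismatch_positions(a, b):
    """Return the 0-based indices at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Pair up symbols position by position and keep the indices that differ.
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = mismatch_positions("karolin", "kathrin")
print(positions)       # [2, 3, 4]
print(len(positions))  # 3, the Hamming distance
```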

#### Environment
* Developed with Python 3.5.5, installed via Anaconda

### Thought process
* I haven't used `typing` before, but have seen it mentioned in some Python articles. Now is a good time to try it out!
* Looks like an XOR operation might be easiest for a computer to do, since it yields 1 exactly where two bits differ
* Converting from an iterable to a long binary representation is probably not intended
    * It might also get awkward if the iterable is very long, e.g. a bit pattern wider than the machine's word size
* I like the idea of running XOR on word-sized chunks of the binary data, with the chunk size depending on the computer architecture
* Going to start writing some level of solution now and see where it goes
* Made a basic implementation that compares elements pairwise. Comprehensions make it more concise, but less readable and sometimes harder to modify.
    * Reduced the code a little using a comprehension.
    * Checked whether `True` can reliably be treated as `1`: https://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-is-it-guarante
* Looked up XOR of bytestrings (https://stackoverflow.com/a/15106386), but that approach would end up running more operations on the data. If bigger data were being tested, maybe use numpy for faster comparisons, or write C code to manipulate the binary representations of the data directly. That being said, streaming might be worth checking out.
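The XOR idea from these notes can be sketched as follows, assuming the data is already binary: XOR leaves a 1 wherever two bits differ, so counting the 1 bits gives the distance. The function names are hypothetical, and this measures *bit*-level distance, which only matches the per-symbol definition when each symbol is a single bit.

```python
def bit_hamming_distance(a: int, b: int) -> int:
    """Number of bit positions at which two non-negative integers differ."""
    # XOR sets a bit to 1 exactly where a and b differ; count those 1 bits.
    # (int.bit_count() does the same on Python 3.10+, but not on 3.5.)
    return bin(a ^ b).count("1")


def bytes_bit_hamming_distance(a: bytes, b: bytes) -> int:
    """Bit-level Hamming distance between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("byte strings must have equal length")
    # Pack each byte string into one arbitrary-precision int, XOR once,
    # then count the differing bits.
    return bit_hamming_distance(int.from_bytes(a, "big"), int.from_bytes(b, "big"))


print(bit_hamming_distance(0b1011101, 0b1001001))          # 2
print(bytes_bit_hamming_distance(b"karolin", b"kathrin"))  # 9 (bits, not characters)
```

The second call shows why converting text to one long binary representation doesn't fit this problem: the three differing characters differ in 9 bits in total.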

#### Prerequisite imports and declarations

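```python
import typing

try:
    from typecheck_magic import typecheck
except ImportError as err:
    # Catch only import failures (a bare except would also swallow e.g. KeyboardInterrupt)
    raise ImportError("Can't import the typecheck magic for Jupyter") from err
```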

#### The solution

```python
def hamming_distance(vector1: typing.Sequence, vector2: typing.Sequence) -> int:
    # len() is required here, so the inputs must be sequences, not arbitrary iterables
    if len(vector1) != len(vector2):
        raise ValueError("Input lengths are not equal - can't determine Hamming distance")
    # Each mismatch contributes int(True) == 1, each match int(False) == 0
    return sum(int(x != y) for x, y in zip(vector1, vector2))

# Earlier loop-based version, kept for comparison:
# def hamming_distance(vector1: typing.Sequence, vector2: typing.Sequence) -> int:
#     if len(vector1) != len(vector2):
#         raise ValueError("Input lengths are not equal - can't determine Hamming distance")
#     hamming_dist = 0
#     for x, y in zip(vector1, vector2):
#         if x != y:  # count each position where the symbols differ
#             hamming_dist += 1
#     return hamming_dist
```
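Because the solution calls `len()`, it accepts sequences but not arbitrary iterables such as generators. A minimal streaming sketch, assuming the inputs may be consumed only once, detects unequal lengths with `itertools.zip_longest` and a sentinel instead; `hamming_distance_iter` and `_MISSING` are hypothetical names.

```python
import itertools
import typing

_MISSING = object()  # sentinel marking where the shorter input ran out


def hamming_distance_iter(vector1: typing.Iterable, vector2: typing.Iterable) -> int:
    """Hamming distance for any two iterables, consumed lazily (no len() needed)."""
    distance = 0
    for x, y in itertools.zip_longest(vector1, vector2, fillvalue=_MISSING):
        if x is _MISSING or y is _MISSING:
            raise ValueError("Input lengths are not equal - can't determine Hamming distance")
        distance += x != y  # True counts as 1, False as 0
    return distance


# Works with generators, which have no len():
print(hamming_distance_iter((c for c in "karolin"), iter("kerstin")))  # 3
```

Using a private sentinel rather than `None` as the fill value means inputs that legitimately contain `None` are still compared correctly.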

#### Test/Time the solution
I timed the solution to get a performance baseline for a batched XOR implementation. In the end I didn't go further, since the extra conversions that approach needs would probably make it slower, at least in pure Python.

```python
%%timeit
assert hamming_distance([], []) == 0
assert hamming_distance([0, 1], [0, 1]) == 0
assert hamming_distance("00", "01") == 1
assert hamming_distance("karolin", "kathrin") == 3
assert hamming_distance("karolin", "kerstin") == 3
assert hamming_distance((1, 0, 1, 1, 1, 0, 1), (1, 0, 0, 1, 0, 0, 1)) == 2
assert hamming_distance("2173896", "2233796") == 3
```

9.76 µs ± 278 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
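`%%timeit` is an IPython cell magic, so it only works inside Jupyter or IPython. Outside a notebook, a rough equivalent can be sketched with the standard-library `timeit.repeat`. `run_checks` and `LOOPS` are hypothetical names, the checks are only a subset of the cell above, and the timings will vary by machine.

```python
import statistics
import timeit
import typing


# Copy of the notebook's solution so this block runs on its own.
def hamming_distance(vector1: typing.Sequence, vector2: typing.Sequence) -> int:
    if len(vector1) != len(vector2):
        raise ValueError("Input lengths are not equal - can't determine Hamming distance")
    return sum(int(x != y) for x, y in zip(vector1, vector2))


def run_checks() -> None:
    # A subset of the assertions from the %%timeit cell.
    assert hamming_distance([], []) == 0
    assert hamming_distance("karolin", "kathrin") == 3
    assert hamming_distance((1, 0, 1, 1, 1, 0, 1), (1, 0, 0, 1, 0, 0, 1)) == 2


LOOPS = 10000  # calls of run_checks per run
# 7 runs, matching the number of runs %%timeit reported.
runs = timeit.repeat(run_checks, repeat=7, number=LOOPS)
per_loop = [total / LOOPS for total in runs]  # seconds per call of run_checks
print("{:.2f} µs ± {:.0f} ns per loop (mean ± std. dev. of {} runs, {} loops each)".format(
    statistics.mean(per_loop) * 1e6,
    statistics.stdev(per_loop) * 1e9,
    len(runs),
    LOOPS,
))
```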
