<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>François Rechenmann &amp; Thierry Parmentelat</span>
<span><img src="media/inria-25-alpha.png" /></span>
</div>

# Computing Hamming's distance

We are going to see how to compute the Hamming distance, that is, the number of positions at which two sequences carry different nucleotides. This is of course a very simple algorithm. The only subtlety here is that the function accepts an **optional** argument `length`. If the caller provides this argument, we compute the Hamming distance over the first `length` positions only. Otherwise, by convention, we use the smaller of the two lengths.

In [None]:
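# Hamming's distance
def hamming_distance(dna1, dna2, length=None):
    # by convention, default to the length of the shorter sequence
    if length is None:
        length = min(len(dna1), len(dna2))
    # count the positions where the two sequences differ
    distance = 0
    for i in range(length):
        if dna1[i] != dna2[i]:
            distance += 1
    return distance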
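To illustrate the optional `length` argument, the sketch below compares two sequences of different lengths, first with the default convention and then with an explicit `length`. The sequences `s1` and `s2` are made up for this example, and the definition is repeated so that the cell runs on its own.

```python
def hamming_distance(dna1, dna2, length=None):
    # by default, compare only over the shorter of the two sequences
    if length is None:
        length = min(len(dna1), len(dna2))
    distance = 0
    for i in range(length):
        if dna1[i] != dna2[i]:
            distance += 1
    return distance

# two made-up sequences of different lengths (8 and 6 nucleotides)
s1 = "ACGTACGT"
s2 = "ACCTAG"

# default: compare the first min(8, 6) = 6 positions
print(hamming_distance(s1, s2))            # 2
# explicit length: compare only the first 3 positions
print(hamming_distance(s1, s2, length=3))  # 1
```

Note that with this version, a `length` larger than the shorter sequence raises an `IndexError`, so it is up to the caller to pass a sensible value.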

Let us try it, first with the samples from the video:

In [None]:
a1 = "ACCTCTGTATCTATTCGGCATCATCAT"
a2 = "ACCTCTGAATCTATTCGGGATCATCAT"
#            ^          ^

hamming_distance(a1, a2)

And with the second sample:

In [None]:
b1 = "ACCTCTGTATCTATTCGGGATCATCAT"
b2 = "ACCTCTGAATCTATCCGGGATCATGAT"
#            ^      ^         ^

hamming_distance(b1, b2)
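The carets in the comments above mark the positions where the sequences differ. As a quick way to double-check them, here is a hypothetical helper `mismatch_positions` (not part of the original notebook) that returns the indices of the differing positions, using Python's 0-based indexing.

```python
def mismatch_positions(dna1, dna2):
    # hypothetical helper: 0-based indices where the two sequences differ,
    # over the length of the shorter sequence (zip stops at the shortest)
    return [i for i, (n1, n2) in enumerate(zip(dna1, dna2)) if n1 != n2]

a1 = "ACCTCTGTATCTATTCGGCATCATCAT"
a2 = "ACCTCTGAATCTATTCGGGATCATCAT"
b1 = "ACCTCTGTATCTATTCGGGATCATCAT"
b2 = "ACCTCTGAATCTATCCGGGATCATGAT"

print(mismatch_positions(a1, a2))  # [7, 18]
print(mismatch_positions(b1, b2))  # [7, 14, 24]
```

The length of each list is the Hamming distance: 2 for the first sample and 3 for the second.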

### Comment (optional)

For those of you who are more familiar with Python, note that a more *pythonic* version would read:

In [None]:
def hamming_distance_2(dna1, dna2, length=None):
    if length is None:
        length = min(len(dna1), len(dna2))
    return sum(n1 != n2 for n1, n2 in zip(dna1[:length], dna2[:length]))

In this version we use the following features:

 * the `zip` function, which scans two sequences (here, strings) in parallel, see [the online documentation](https://docs.python.org/2/library/functions.html#zip),
 * the fact that `True` and `False` behave as the integers `1` and `0`, respectively (in Python, `bool` is a subclass of `int`),
 * and finally `sum`, which computes the sum of a sequence of values, [as explained here](https://docs.python.org/2/library/functions.html#sum).
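The three ingredients listed above can be tried out separately. The short sequences `s1` and `s2` below are made up purely to illustrate the built-ins:

```python
s1 = "ACGT"
s2 = "AGGA"

# zip pairs up the characters found at the same position
pairs = list(zip(s1, s2))
print(pairs)  # [('A', 'A'), ('C', 'G'), ('G', 'G'), ('T', 'A')]

# comparisons produce booleans, and bool is a subclass of int
print(isinstance(True, int), True + True, False + 0)  # True 2 0

# sum adds up the booleans, i.e. counts the True values
print(sum(n1 != n2 for n1, n2 in pairs))  # 2
```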

In [None]:
# Checking
hamming_distance_2(a1, a2) == hamming_distance(a1, a2)

In [None]:
hamming_distance_2(b1, b2) == hamming_distance(b1, b2)
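The two checks above only cover the two samples. A slightly broader sanity check, sketched below, compares both versions on many randomly generated pairs of sequences of possibly different lengths, relying on the default `length`. The helper `random_dna` is hypothetical, and both definitions are repeated so that the cell runs on its own.

```python
import random

def hamming_distance(dna1, dna2, length=None):
    if length is None:
        length = min(len(dna1), len(dna2))
    distance = 0
    for i in range(length):
        if dna1[i] != dna2[i]:
            distance += 1
    return distance

def hamming_distance_2(dna1, dna2, length=None):
    if length is None:
        length = min(len(dna1), len(dna2))
    return sum(n1 != n2 for n1, n2 in zip(dna1[:length], dna2[:length]))

def random_dna(size):
    # hypothetical helper: random sequence over the four nucleotides
    return "".join(random.choice("ACGT") for _ in range(size))

random.seed(0)  # fixed seed so that the check is reproducible
for _ in range(1000):
    d1 = random_dna(random.randint(0, 30))
    d2 = random_dna(random.randint(0, 30))
    assert hamming_distance(d1, d2) == hamming_distance_2(d1, d2), (d1, d2)
print("all checks passed")
```

This check deliberately sticks to the default `length`: if the caller passes a `length` larger than the shorter sequence, the first version raises an `IndexError`, whereas the second one silently stops at the end of the shorter sequence, because slicing past the end of a string and `zip` both truncate.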