In [2]:
from IPython.display import Image

---------------------
#### Hamming distance
--------------------------

- Compute the `Hamming distance` between two 1-D arrays.

- For 1-D arrays u and v, SciPy reports the `Hamming distance` as the proportion of components at which u and v disagree.

- In information theory, the `Hamming distance` between two strings of equal length is the number of positions at which the corresponding symbols differ.

    - In other words, it is the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.

For instance, compare these strings:

- GATTACA
- GACTATA

They differ at two positions (the third and the sixth):

|G|A|T|T|A|C|A|
|-|-|-|-|-|-|-|
|G|A|C|T|A|T|A|


So the Hamming distance is 2.

In [17]:
# Return the Hamming distance between string1 and string2.
# string1 and string2 must be the same length.
def hamming_distance(string1, string2):

    # The distance is only defined for strings of equal length,
    # so return an error message instead of a count if they differ
    if len(string1) != len(string2):
        return 'Given strings must have the same length'

    # Start with a distance of zero, and count up
    distance = 0

    # Loop over the indices of the string
    for i in range(len(string1)):
        # Add 1 to the distance if these two characters are not equal
        if string1[i] != string2[i]:
            distance += 1

    # Return the final count of differences
    return distance

In [18]:
hamming_distance("GATTACA", "GACTATA")

2

In [19]:
hamming_distance("rja", "jfjfff")

'Given strings must have the same length'
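
Returning a message string for mismatched lengths means the caller has to check the type of the result. As a minimal sketch of one alternative, the variant below raises a `ValueError` instead and counts mismatches with `zip`. The name `hamming_distance_strict` is hypothetical and not part of the code above.

```python
def hamming_distance_strict(string1, string2):
    """Count the positions at which two equal-length sequences differ."""
    # Hypothetical variant: signal unequal lengths with an exception
    # rather than returning an error message string
    if len(string1) != len(string2):
        raise ValueError("Given strings must have the same length")

    # zip pairs up the characters position by position;
    # each mismatch contributes True (counted as 1) to the sum
    return sum(c1 != c2 for c1, c2 in zip(string1, string2))


print(hamming_distance_strict("GATTACA", "GACTATA"))
```

With this version the result is always an integer, and a length mismatch stops the computation where it happens instead of passing a string along as if it were a distance.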

#### Using SciPy



In [5]:
from scipy.spatial import distance

In [8]:
import pandas as pd

In [9]:
data_dict = {'num_doors':   pd.Series(data= [2, 4, 2, 2]),
             'num_cyl':     pd.Series(data= [2, 3, 4, 8]),
             'cruise_ctrl': pd.Series(data= [0, 0, 1, 1]),
             'price_cat':   pd.Series(data= [1, 2, 2, 4])
            }

In [10]:
df = pd.DataFrame(data_dict)
df

| |num_doors|num_cyl|cruise_ctrl|price_cat|
|-|-|-|-|-|
|0|2|2|0|1|
|1|4|3|0|2|
|2|2|4|1|2|
|3|2|8|1|4|


In [11]:
a = df.iloc[1].values
b = df.iloc[2].values

In [12]:
a

array([4, 3, 0, 2], dtype=int64)

In [13]:
distance.hamming(a, b)

0.75
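
The value 0.75 is a proportion, not a count: rows 1 and 2 disagree in three of their four columns. The sketch below reproduces that arithmetic with NumPy alone, so the link between the count and the proportion is explicit. The arrays are typed in by hand from rows 1 and 2 of the table above, and the variable names are illustrative.

```python
import numpy as np

# Rows 1 and 2 of the DataFrame, copied by hand
a = np.array([4, 3, 0, 2])
b = np.array([2, 4, 1, 2])

# Boolean array marking the positions where the rows disagree
mismatches = a != b

# Number of disagreeing positions (the information-theory definition)
count = int(np.count_nonzero(mismatches))

# Proportion of disagreeing positions (the quantity SciPy reports)
proportion = count / a.size

print(count, proportion)
```

Multiplying the SciPy result by the array length recovers the count, here 0.75 × 4 = 3.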