# Overlaps and Edit Distance - An Analysis



> In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.

*[Edit Distance - Wikipedia](https://en.wikipedia.org/wiki/Edit_distance)*
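The definition quoted above can be turned into a small dynamic-programming table. The following is a minimal sketch, assuming the Levenshtein variant in which insertions, deletions and substitutions each cost 1; the name `edit_distance_matrix` is hypothetical and is not the `editDistance` function imported later from the local `Py` package.

```python
def edit_distance_matrix(x, y):
    """Return the (len(x)+1) x (len(y)+1) Levenshtein table for x and y.

    Entry D[i][j] is the edit distance between x[:i] and y[:j].
    Hypothetical helper for illustration; unit cost per edit is assumed.
    """
    rows, cols = len(x) + 1, len(y) + 1
    D = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string costs one deletion per character.
    for i in range(rows):
        D[i][0] = i
    for j in range(cols):
        D[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + sub,  # match or substitution
                D[i - 1][j] + 1,        # skip a character of x
                D[i][j - 1] + 1,        # skip a character of y
            )
    return D


# The bottom-right entry is the distance between the full strings.
print(edit_distance_matrix("kitten", "sitting")[-1][-1])  # prints 3
```

Keeping the whole table, rather than only the final number, makes it possible to inspect intermediate prefixes, which is what the matrix printed further down shows.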



In [1]:
from Py.geneReader import geneReader

filename = 'SeqFiles/chr1.GRCh38.excerpt.fasta'

# geneReader is called with the path itself, so the unused file handle is dropped
reads = geneReader(filename)

In [6]:
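# %%time

from Py.editDistance import editDistance

import numpy as np

p = "GCGTATGC"  # pattern: one matrix column per character, plus the empty prefix

t = "TATTGGCTATACGGTT"  # text: one matrix row per character, plus the empty prefix

# editDistance returns the full dynamic-programming matrix.
# A plain ndarray replaces the discouraged np.matrix; the no-op .view() is dropped.
edm = np.array(editDistance(t, p))

print(edm)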

[[ 0  1  2  3  4  5  6  7  8]
 [ 1  1  2  3  3  4  5  6  7]
 [ 2  2  2  3  4  3  4  5  6]
 [ 3  3  3  3  3  4  3  4  5]
 [ 4  4  4  4  3  4  4  4  5]
 [ 5  4  5  4  4  4  5  4  5]
 [ 6  5  5  5  5  5  5  5  5]
 [ 7  6  5  6  6  6  6  6  5]
 [ 8  7  6  6  6  7  6  7  6]
 [ 9  8  7  7  7  6  7  7  7]
 [10  9  8  8  7  7  6  7  8]
 [11 10  9  9  8  7  7  7  8]
 [12 11 10 10  9  8  8  8  7]
 [13 12 11 10 10  9  9  8  8]
 [14 13 12 11 11 10 10  9  9]
 [15 14 13 12 11 11 10 10 10]
 [16 15 14 13 12 12 11 11 11]]
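In the matrix above, rows follow the characters of `t` and columns those of `p`, so the bottom-right entry (11) is the edit distance between the two full strings. When only that final number is needed, the table does not have to be kept in memory. The sketch below is an illustrative alternative that stores just two rows; the name `edit_distance` is hypothetical, unit edit costs are assumed, and it is not the `editDistance` function from the local `Py` package.

```python
def edit_distance(x, y):
    """Return the Levenshtein distance between x and y using two rows.

    Hypothetical helper for illustration; unit cost per edit is assumed.
    """
    # Row 0 of the table: distance from the empty prefix of x to each prefix of y.
    prev = list(range(len(y) + 1))

    for i, cx in enumerate(x, start=1):
        curr = [i]  # first column: i deletions turn x[:i] into the empty string
        for j, cy in enumerate(y, start=1):
            sub = 0 if cx == cy else 1
            curr.append(min(
                prev[j - 1] + sub,  # match or substitution
                prev[j] + 1,        # skip a character of x
                curr[j - 1] + 1,    # skip a character of y
            ))
        prev = curr

    return prev[-1]


print(edit_distance("flaw", "lawn"))  # prints 2
```

The trade-off is that the intermediate rows are discarded, so the sequence of edits can no longer be traced back from the result.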


In [3]:
# %%time

# Uses editDistance as imported in the cell above
a = 'shake spea'

b = 'Shakespear'

print(editDistance(a, b))

In [None]:
from Py.approximate_match import approximate_match

p = "GCGTATGC"

t = "TATTGGCTATACGGTT"

n = 0  # presumably the number of differences allowed; 0 would mean exact matches only

print(approximate_match(p, t, n))
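The cell above calls `approximate_match` from a local module whose implementation is not shown here. One way to connect approximate matching to edit distance, which is not necessarily what `Py.approximate_match` does, is to let the pattern start anywhere in the text for free by initialising the first row of the table to zeros. The minimum of the last row is then the smallest number of edits with which `p` matches some substring of `t`. The function name `best_approximate_match` is hypothetical and unit edit costs are assumed.

```python
def best_approximate_match(p, t):
    """Return (edits, end) for the closest occurrence of p inside t.

    `edits` is the fewest insertions, deletions and substitutions needed
    to match p against some substring of t; `end` is the first offset in t
    at which such a best match ends (exclusive).
    Hypothetical helper for illustration only.
    """
    # Row 0 is all zeros: an empty pattern matches at any offset at no cost.
    prev = [0] * (len(t) + 1)

    for i, cp in enumerate(p, start=1):
        curr = [i]  # matching p[:i] against an empty text prefix costs i edits
        for j, ct in enumerate(t, start=1):
            sub = 0 if cp == ct else 1
            curr.append(min(
                prev[j - 1] + sub,  # match or substitution
                prev[j] + 1,        # skip a character of p
                curr[j - 1] + 1,    # skip a character of t
            ))
        prev = curr

    best = min(prev)
    return best, prev.index(best)


print(best_approximate_match("abc", "xxabcxx"))  # prints (0, 5)
```

A result with `edits` no larger than the allowed threshold would count as an approximate match under this formulation; recovering the start offset would additionally require a traceback through the full table.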