Damerau–Levenshtein distance #62

Closed
vmstarchenko opened this issue Mar 21, 2024 · 1 comment
Labels
question Further information is requested

Comments

@vmstarchenko

Hello.

Is it possible to calculate a distance that also counts swaps of adjacent symbols (transpositions) using this library?
https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance

If this is not possible now, would it be difficult to add such a feature?

Here is one implementation of this functionality:

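def damerau_levenshtein_distance(s1, s2):
    # Note: the transposition rule used here is the restricted variant,
    # i.e. the optimal string alignment (OSA) distance from the linked article.
    d = {}  # d[(i, j)] = distance between s1[:i+1] and s2[:j+1]
    lenstr1 = len(s1)
    lenstr2 = len(s2)
    # Index -1 stands for the empty prefix.
    for i in range(-1, lenstr1 + 1):
        d[(i, -1)] = i + 1
    for j in range(-1, lenstr2 + 1):
        d[(-1, j)] = j + 1

    for i in range(lenstr1):
        for j in range(lenstr2):
            cost = 0 if s1[i] == s2[j] else 1
            d[(i, j)] = min(
                d[(i - 1, j)] + 1,         # deletion
                d[(i, j - 1)] + 1,         # insertion
                d[(i - 1, j - 1)] + cost,  # substitution
            )
            # Two adjacent symbols swapped between s1 and s2.
            if i and j and s1[i] == s2[j - 1] and s1[i - 1] == s2[j]:
                d[(i, j)] = min(d[(i, j)], d[(i - 2, j - 2)] + 1)  # transposition

    return d[(lenstr1 - 1, lenstr2 - 1)]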
@maxbachmann added the "question" label (Further information is requested) on Mar 24, 2024
@maxbachmann
Member

maxbachmann commented Mar 24, 2024

You can directly use the implementation in RapidFuzz, which is the basis for this library. The Wikipedia article describes both the Damerau–Levenshtein distance and the optimal string alignment (OSA) distance. Both are available:

from rapidfuzz.distance import DamerauLevenshtein, OSA

OSA.distance(s1, s2)
DamerauLevenshtein.distance(s1, s2)
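
To show how the two distances differ, here is a minimal pure-Python sketch of both, following the pseudocode in the linked Wikipedia article. The function names `osa_distance` and `damerau_levenshtein_distance` are hypothetical helpers for illustration only; they are not part of RapidFuzz and are not how RapidFuzz implements these metrics.

```python
def osa_distance(s1, s2):
    """Optimal string alignment: no substring may be edited more than once."""
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent symbols only.
            if i > 1 and j > 1 and s1[i - 1] == s2[j - 2] and s1[i - 2] == s2[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def damerau_levenshtein_distance(s1, s2):
    """Unrestricted variant: symbols may be inserted between transposed ones."""
    max_dist = len(s1) + len(s2)
    last_row = {}  # last row of s1 in which each symbol was seen
    # The table is shifted by one: d[i + 1][j + 1] is the distance between
    # s1[:i] and s2[:j]; row 0 and column 0 hold a sentinel value.
    d = [[max_dist] * (len(s2) + 2) for _ in range(len(s1) + 2)]
    for i in range(len(s1) + 1):
        d[i + 1][1] = i
    for j in range(len(s2) + 1):
        d[1][j + 1] = j
    for i in range(1, len(s1) + 1):
        last_col = 0  # last column in this row where the symbols matched
        for j in range(1, len(s2) + 1):
            k = last_row.get(s2[j - 1], 0)
            l = last_col
            if s1[i - 1] == s2[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,      # substitution
                d[i + 1][j] + 1,     # insertion
                d[i][j + 1] + 1,     # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1),  # transposition
            )
        last_row[s1[i - 1]] = i
    return d[len(s1) + 1][len(s2) + 1]


print(osa_distance("CA", "ABC"))                  # 3
print(damerau_levenshtein_distance("CA", "ABC"))  # 2
```

For "CA" and "ABC", the unrestricted distance can swap "CA" to "AC" and then insert "B" between the two symbols (2 edits), while OSA cannot edit the transposed pair again and needs 3 edits. For a plain adjacent swap such as "ab" and "ba", both give 1.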
