# Fuzzy matching demonstration
This notebook briefly demonstrates a few ways to perform approximate (fuzzy) string matching.

We'll use rapidfuzz, a modern, MIT-licensed and well-maintained library for fuzzy matching.

In [1]:
import rapidfuzz

### Define a sample list of candidate values to query

In [2]:
candidates = [
    "Abraham Lincoln",
    "John Smith Doe",
    "John Sullivan Doe",
    "Jack Sparrow",
    "Jack Spartan Doe",
    "Alberto Einstein",
    "Alberto Zweistein",
    "Zweistein Alberto",
    "wr0ng n4m3",
    "some_string"
]

### Find matches for the inexact input "John S Do"

In [11]:
# Find the top 3 matches.
# Each result is a (value, match score, index in candidates) tuple
rapidfuzz.process.extract("John S Do", candidates, limit=3)

[('John Smith Doe', 85.5, 1),
 ('John Sullivan Doe', 85.5, 2),
 ('Jack Spartan Doe', 51.42857142857142, 4)]

In [7]:
# Find matches with a score of at least 75 (out of 100); at most 5 are returned by default
rapidfuzz.process.extract("John S Do", candidates, score_cutoff=75)

[('John Smith Doe', 85.5, 1), ('John Sullivan Doe', 85.5, 2)]

### Find matches for the inexact input "Albert Zweistein"

In [12]:
# Find matches with a score of at least 90 (out of 100)
rapidfuzz.process.extract("Albert Zweistein", candidates, score_cutoff=90)

[('Alberto Zweistein', 96.96969696969697, 6),
 ('Zweistein Alberto', 92.12121212121211, 7)]
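
The two scores above can be reproduced by hand. The sketch below assumes that the default scorer combines a similarity ratio based on the longest common subsequence (LCS) with a token-sorted variant of it, and that the token-sorted score is weighted by 0.95. The weight is an assumption inferred from the printed values, and the helper names `lcs_length`, `indel_ratio` and `token_sort` are illustrative, not rapidfuzz functions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def indel_ratio(a: str, b: str) -> float:
    """Similarity in [0, 100]: twice the LCS length over the combined length."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def token_sort(text: str) -> str:
    """Sort whitespace-separated tokens so that word order no longer matters."""
    return " ".join(sorted(text.split()))


query = "Albert Zweistein"

# Direct comparison: 16 of the 16 + 17 characters are shared in order.
print(round(indel_ratio(query, "Alberto Zweistein"), 2))  # 96.97

# Word order differs, so compare the token-sorted strings and apply
# the assumed 0.95 weight for token-based scores.
sorted_score = indel_ratio(token_sort(query), token_sort("Zweistein Alberto"))
print(round(0.95 * sorted_score, 2))  # 92.12
```

This is why "Zweistein Alberto" still clears the cutoff of 90 even though its words are in the opposite order.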

### Find matches for the inexact input "Albert Zweistein" with the Levenshtein metric
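Any other scorer, including a custom function, can be passed through the `scorer` argument instead. The scorer used here returns a normalized distance, so lower values mean closer matches.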

In [23]:
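# The Levenshtein scorers live in rapidfuzz.distance; default_process lowercases the strings before comparison
rapidfuzz.process.extract("Albert Zweistein", candidates, limit=2, scorer=rapidfuzz.distance.Levenshtein.normalized_distance, processor=rapidfuzz.utils.default_process)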

[('Alberto Zweistein', 0.058823529411764705, 6), ('Alberto Einstein', 0.25, 5)]
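
The values in this output are consistent with the unit-cost Levenshtein edit distance divided by the length of the longer string, computed on lowercased strings. Below is a minimal pure-Python sketch of that calculation. The helper names `levenshtein` and `normalized_distance` are illustrative and not part of rapidfuzz, and the lowercasing step is an assumption made to match the printed scores.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


query = "Albert Zweistein".lower()

# One inserted "o" over 17 characters.
print(normalized_distance(query, "Alberto Zweistein".lower()))  # 0.058823529411764705

# Four edits over 16 characters.
print(normalized_distance(query, "Alberto Einstein".lower()))   # 0.25
```

Because 0 means identical strings here, the best matches are the ones with the smallest values, the reverse of the similarity scores used in the earlier cells.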