# TheFuzz Library Demo

This notebook serves as a lab environment for testing and exploring the functionalities of the **[TheFuzz](https://github.com/seatgeek/thefuzz)** Python library (formerly known as *fuzzywuzzy*).  
TheFuzz provides powerful tools for **fuzzy string matching**, enabling comparison of strings based on similarity ratios rather than exact matches.

In this lab, I will experiment with different use cases, including:
- Basic string similarity comparisons
- Partial and token-based matching
- Extracting best matches from a list of choices
- Custom scoring functions and threshold tuning

---

📌 This notebook is intended for learning, experimentation, and validating how TheFuzz behaves with different types of string data.

---

## Packages installation and import

In [1]:
# !pip install thefuzz faker

In [2]:
from faker import Faker
from thefuzz import fuzz, process

## TheFuzz Matching

In [3]:
phrase1 = "Adobe Systems, Inc"
phrase2 = "Adobe Systems, Corporation"

In [4]:
scores: dict = {}

In [5]:
# Score the same pair of phrases with each ratio function.
scores["ratio"]                    = fuzz.ratio(phrase1, phrase2)
scores["partial ratio"]            = fuzz.partial_ratio(phrase1, phrase2)
scores["token sort ratio"]         = fuzz.token_sort_ratio(phrase1, phrase2)
scores["token set ratio"]          = fuzz.token_set_ratio(phrase1, phrase2)
scores["partial token sort ratio"] = fuzz.partial_token_sort_ratio(phrase1, phrase2)
scores["partial token set ratio"]  = fuzz.partial_token_set_ratio(phrase1, phrase2)

In [6]:
scores

{'ratio': 73,
 'partial ratio': 91,
 'token sort ratio': 76,
 'token set ratio': 87,
 'partial token sort ratio': 74,
 'partial token set ratio': 100}
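
To build intuition for why the token-based scores differ from the plain ratio, here is a rough sketch of the token-sort idea using only the standard library. It is an approximation for illustration, not TheFuzz's implementation: `simple_ratio`, `sorted_tokens`, and `token_sort_sketch` are hypothetical helpers, and `difflib` scores will not match TheFuzz's numbers exactly.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity of two strings on a 0-100 scale (difflib-based approximation)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sorted_tokens(text: str) -> str:
    """Lowercase, split on whitespace, and rejoin the tokens in sorted order."""
    return " ".join(sorted(text.lower().split()))


def token_sort_sketch(a: str, b: str) -> int:
    """Compare the strings after sorting their tokens, so word order is ignored."""
    return simple_ratio(sorted_tokens(a), sorted_tokens(b))


a = "Systems Adobe Inc"
b = "Adobe Inc Systems"

print(simple_ratio(a, b))       # below 100, because the word order differs
print(token_sort_sketch(a, b))  # 100, because the sorted tokens are identical
```

The point of the sketch is that sorting tokens before comparing removes the penalty for word order, which is the behaviour the token sort scorers are designed around.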

## TheFuzz Process

### Create sample data

In [7]:
sample_size: int = 10

In [8]:
fkr = Faker()

In [9]:
# Build the sample list of fake full names.
names: list = [fkr.name() for _ in range(sample_size)]

In [10]:
for name in names:
    print(name)

Ronald Moore
Ryan Holloway
Daniel Morales
Christina Eaton
Amy Leach
Michelle Collier
Gene Davis
Patrick Thompson
Daniel Mueller
Bryan Owens


### Fuzzy Matching

#### Extract

In [22]:
name_to_extract = "Patrik"

In [23]:
process.extract(name_to_extract, names)

[('Patrick Thompson', 82),
 ('Daniel Morales', 36),
 ('Christina Eaton', 36),
 ('Gene Davis', 36),
 ('Daniel Mueller', 36)]

In [24]:
process.extract(name_to_extract, names, scorer=fuzz.token_sort_ratio)

[('Patrick Thompson', 55),
 ('Gene Davis', 25),
 ('Ryan Holloway', 21),
 ('Daniel Morales', 20),
 ('Daniel Mueller', 20)]

In [25]:
process.extract(name_to_extract, names, scorer=fuzz.token_set_ratio)

[('Patrick Thompson', 55),
 ('Gene Davis', 25),
 ('Ryan Holloway', 21),
 ('Daniel Morales', 20),
 ('Daniel Mueller', 20)]

In [26]:
process.extract(name_to_extract, names, scorer=fuzz.partial_token_sort_ratio)

[('Patrick Thompson', 91),
 ('Daniel Morales', 40),
 ('Christina Eaton', 40),
 ('Gene Davis', 40),
 ('Daniel Mueller', 40)]

In [27]:
process.extract(name_to_extract, names, scorer=fuzz.partial_token_set_ratio)

[('Patrick Thompson', 91),
 ('Daniel Morales', 40),
 ('Christina Eaton', 40),
 ('Gene Davis', 40),
 ('Daniel Mueller', 40)]

#### ExtractOne

In [28]:
process.extractOne(name_to_extract, names)

('Patrick Thompson', 82)

In [29]:
process.extractOne(name_to_extract, names, scorer=fuzz.token_sort_ratio)

('Patrick Thompson', 55)

In [30]:
process.extractOne(name_to_extract, names, scorer=fuzz.token_set_ratio)

('Patrick Thompson', 55)

In [31]:
process.extractOne(name_to_extract, names, scorer=fuzz.partial_token_sort_ratio)

('Patrick Thompson', 91)

In [32]:
process.extractOne(name_to_extract, names, scorer=fuzz.partial_token_set_ratio)

('Patrick Thompson', 91)