Fuzzy string matching compares strings by how similar they are rather than requiring an exact match. Python has several libraries for this, such as fuzzywuzzy and pyxDamerauLevenshtein. I'll demonstrate how to use the fuzzywuzzy library for fuzzy string matching in Python.

## fuzzywuzzy library

### ratio()

The **ratio()** method calculates the simple ratio of similarity between two strings based on the Levenshtein distance algorithm.
It returns an integer value between 0 and 100, where a higher value indicates a greater similarity.
This method compares the entire strings character by character, so it is sensitive to word order and gives no extra credit for partial matches.
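
To make the scoring concrete, here is a minimal sketch that uses only the standard library. It assumes the score is `2 * M / T` scaled to 0-100, where `M` is the number of matched characters and `T` is the combined length of both strings, which is what `difflib.SequenceMatcher.ratio()` computes. The name `simple_ratio` is hypothetical, and fuzzywuzzy's own scores may differ slightly when python-Levenshtein is installed.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Illustrative 0-100 similarity score (not fuzzywuzzy's own code)."""
    # SequenceMatcher.ratio() returns 2 * M / T as a float in [0, 1].
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio("MANISH CHAWLA", "MANISH CHAWLA"))  # 100, identical strings
print(simple_ratio("AMIT SINGH", "AMIT KUMAR"))        # 50, only "AMIT " matches
print(simple_ratio("RAJESH KUMAR", "KUMAR RAJESH"))    # below 100: word order matters
```
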

### partial_ratio()

The **partial_ratio()** method calculates the ratio of similarity by considering the best partial match between two strings.
It looks for the best match of a shorter string within a longer string.
It returns an integer value between 0 and 100, where a higher value indicates a better partial match.
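
The idea of matching a shorter string inside a longer one can be sketched with a sliding window. This is a brute-force illustration under my own assumptions, not the library's code: `simple_partial_ratio` is a hypothetical name, it tries every offset, and fuzzywuzzy chooses its candidate windows differently, so scores may not always agree.

```python
from difflib import SequenceMatcher


def simple_partial_ratio(a: str, b: str) -> int:
    """Best score of the shorter string against any same-length slice of the longer."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # assumption: an empty string matches nothing
    best = 0.0
    # Slide a window the size of the shorter string across the longer one.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


# "ANJALI REDDY" appears in full inside "MANJALI REDDY".
print(simple_partial_ratio("ANJALI REDDY", "MANJALI REDDY"))  # 100
```
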

### token_sort_ratio()

The **token_sort_ratio()** method calculates the ratio of similarity by tokenizing the strings and sorting the tokens alphabetically.
It compares the sorted token lists to find the similarity.
This method is useful when the order of words doesn't matter.
It returns an integer value between 0 and 100, where a higher value indicates a higher similarity.
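
The tokenize-sort-compare steps can be sketched as follows. This is a simplified illustration: `simple_token_sort_ratio` is a hypothetical name, tokens are split on whitespace only, and the case and punctuation normalization that the library applies by default is left out.

```python
from difflib import SequenceMatcher


def simple_token_sort_ratio(a: str, b: str) -> int:
    """Compare two strings after sorting their whitespace-separated tokens."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return round(100 * SequenceMatcher(None, sorted_a, sorted_b).ratio())


# Swapping the words no longer lowers the score.
print(simple_token_sort_ratio("RAJESH KUMAR", "KUMAR RAJESH"))  # 100
```
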

### token_set_ratio()

The **token_set_ratio()** method calculates the ratio of similarity by tokenizing the strings and splitting the tokens into those the two strings share and those unique to each string.
It compares the sorted shared tokens with the shared tokens plus each string's remaining tokens, and keeps the best of these scores, so repeated or extra words are penalized less.
This method handles cases where the order of words may differ and allows for partial matches.
It returns an integer value between 0 and 100, where a higher value indicates a higher similarity.
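
One way to picture the set-based comparison is the sketch below. It assumes three strings are built (the sorted shared tokens, and the shared tokens followed by each string's leftover tokens) and that the best pairwise score wins. The names `simple_token_set_ratio` and `_ratio` are hypothetical, and the library's default text normalization is omitted.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    # Plain 0-100 similarity between two strings.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_token_set_ratio(a: str, b: str) -> int:
    """Score based on shared tokens versus each string's leftover tokens."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    common = " ".join(sorted(tokens_a & tokens_b))
    only_a = " ".join(sorted(tokens_a - tokens_b))
    only_b = " ".join(sorted(tokens_b - tokens_a))
    combined_a = f"{common} {only_a}".strip()
    combined_b = f"{common} {only_b}".strip()
    # Keep the best of the three pairwise comparisons.
    return max(
        _ratio(common, combined_a),
        _ratio(common, combined_b),
        _ratio(combined_a, combined_b),
    )


print(simple_token_set_ratio("AMIT SINGH", "AMIT KUMAR"))            # 57
print(simple_token_set_ratio("VIKRAM PATEL", "PATEL VIKRAM PATEL"))  # 100
```

The shared token "AMIT" compared with "AMIT SINGH" scores higher than the two full names compared directly, which is why this method can exceed the plain ratio.
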

Which method to use depends on the string data: ratio() for a strict whole-string comparison, partial_ratio() when one string may be contained in the other, token_sort_ratio() when word order varies, and token_set_ratio() when the strings may also differ by repeated or extra words.

It's worth noting that these methods are part of the fuzzywuzzy library, which is no longer actively maintained. Consider using alternatives like the rapidfuzz library, which offers similar functionality with improved performance and active maintenance.
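
As a rough sketch of what switching could look like, the snippet below imports `fuzz` from rapidfuzz when it is available and otherwise falls back to a standard-library approximation so that it still runs. The wrapper name `similarity` is hypothetical. rapidfuzz returns float scores, so the result is rounded here to keep integer values.

```python
from difflib import SequenceMatcher

try:
    # rapidfuzz provides a fuzz module with the same function names.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> int:
        # Round the float score to match fuzzywuzzy's integer scores.
        return round(fuzz.ratio(a, b))
except ImportError:
    # Fallback so the sketch still runs without rapidfuzz installed.
    def similarity(a: str, b: str) -> int:
        return round(100 * SequenceMatcher(None, a, b).ratio())


print(similarity("MANISH CHAWLA", "MANISH CHAWLA"))  # 100
```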

## Some randomly generated names of two or three words

In [3]:
import pandas as pd
df = pd.read_csv('customer_data.csv')
df

Unnamed: 0,ORIGINAL_CUST_NAME,SIMILAR_CUST_NAME
0,RAJESH KUMAR,RAJISH KAMAR
1,PRIYA GUPTA,PRIA GUPTA
2,AMIT SINGH,AMIT KUMAR
3,DEEPAK SHARMA,DIPAK SHARMA
4,SNEHA VERMA,SNEHA VARMA
5,MANISH CHAWLA,MANISH CHAWLA
6,ANJALI REDDY,MANJALI REDDY
7,VIKRAM PATEL,VIKRAM PATIL
8,NEHA MALHOTRA,NIHA MALHOTRA
9,RAVI KHANNA,RAVI KHANA


In [8]:
# Libraries to install
# !pip install fuzzywuzzy
# !pip install python-Levenshtein

In [18]:
# fuzz provides the four scoring functions used below
from fuzzywuzzy import fuzz

# Define a function to calculate similarity ratio
def calculate_similarity(row):
    return fuzz.ratio(row['ORIGINAL_CUST_NAME'], row['SIMILAR_CUST_NAME'])

# Define a function to calculate partial ratio
def calculate_partial_ratio(row):
    return fuzz.partial_ratio(row['ORIGINAL_CUST_NAME'], row['SIMILAR_CUST_NAME'])

# Calculate similarity using fuzzywuzzy's token sort ratio function
def calculate_token_sort_ratio(row):
    return fuzz.token_sort_ratio(row['ORIGINAL_CUST_NAME'], row['SIMILAR_CUST_NAME'])

# Calculate similarity using fuzzywuzzy's token set ratio function
def calculate_token_set_ratio(row):
    return fuzz.token_set_ratio(row['ORIGINAL_CUST_NAME'], row['SIMILAR_CUST_NAME'])

In [19]:
# Apply each of the four scoring functions row by row and store the results as new columns
df['similarity_ratio'] = df.apply(calculate_similarity, axis=1)
df['partial_ratio'] = df.apply(calculate_partial_ratio, axis=1)
df['token_sort_ratio'] = df.apply(calculate_token_sort_ratio, axis=1)
df['token_set_ratio'] = df.apply(calculate_token_set_ratio, axis=1)

In [20]:
df

Unnamed: 0,ORIGINAL_CUST_NAME,SIMILAR_CUST_NAME,similarity_ratio,partial_ratio,token_sort_ratio,token_set_ratio
0,RAJESH KUMAR,RAJISH KAMAR,83,83,83,83
1,PRIYA GUPTA,PRIA GUPTA,95,90,95,95
2,AMIT SINGH,AMIT KUMAR,50,50,50,57
3,DEEPAK SHARMA,DIPAK SHARMA,88,83,88,88
4,SNEHA VERMA,SNEHA VARMA,91,91,91,91
5,MANISH CHAWLA,MANISH CHAWLA,100,100,100,100
6,ANJALI REDDY,MANJALI REDDY,96,100,96,96
7,VIKRAM PATEL,VIKRAM PATIL,92,92,92,92
8,NEHA MALHOTRA,NIHA MALHOTRA,92,92,92,92
9,RAVI KHANNA,RAVI KHANA,95,90,95,95
