<a href="https://colab.research.google.com/github/lazarinastoy/quick-fire-automations/blob/main/fuzzymatching_match_keywords_to_a_list_of_seed_terms_or_main_terms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 1: Install Required Libraries
The first part of the code installs three libraries:

- fuzzywuzzy: A library for fuzzy string matching, which scores how similar two strings are.
- pandas: A library for data manipulation, used here to read, write, and work with CSV files.
- python-Levenshtein: An optional C extension that speeds up fuzzywuzzy's string comparisons.

## Step 2: Import Libraries
This section imports the libraries used throughout the script:

- fuzz from fuzzywuzzy calculates the fuzzy match score between two strings.
- pandas loads the CSV files and handles the data as DataFrames.
- files from google.colab uploads files into the Colab environment and downloads the result.
- BytesIO wraps the uploaded CSV bytes in an in-memory file-like object that pandas can read.

## Step 3: Upload CSV Files
The code prompts the user to upload two CSV files, one per upload prompt, with the seed file first. Both files must contain a column named Keywords, because the code reads the keywords from that column.

- Seed Keywords CSV (seed.csv): This file contains the 100 seed keywords you want to match against.
- Match Keywords CSV (match.csv): This file contains the keywords you want to assign to a seed keyword. The goal is to find the closest matching seed keyword for each keyword in this file.

## Step 4: Read the Seed Keywords CSV File
After the first upload prompt, the code takes the first uploaded file, treats it as seed.csv, and reads it with pandas into a DataFrame, which is a table-like structure. The first few rows of the DataFrame are printed to confirm that the file loaded correctly.

## Step 5: Read the Match Keywords CSV File
Similarly, the code reads the match.csv file and loads the match keywords into a DataFrame. Again, the first few rows are displayed to confirm that the file has been loaded correctly.


## Step 6: Extract Keywords into Lists
After loading the CSV files, the code extracts the keyword data from the Keywords column of each DataFrame:

- The seed keywords are stored as a list in the variable seed_keywords.
- The match keywords are stored as a list in the variable match_keywords.
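
The path from uploaded bytes to a keyword list can be sketched without Colab or pandas. This is a minimal illustration using only the standard library, not the notebook's own code: the helper name `keywords_from_csv_bytes` and the sample bytes are assumptions made for the example, and it additionally skips blank cells, which the notebook does not do.

```python
import csv
import io

def keywords_from_csv_bytes(raw, column="Keywords"):
    """Return the non-empty values of `column` from CSV content given as bytes."""
    # utf-8-sig also strips a byte-order mark if the file starts with one.
    text = io.StringIO(raw.decode("utf-8-sig"), newline="")
    reader = csv.DictReader(text)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"expected a '{column}' column, found {reader.fieldnames}")
    # Keep only cells that still contain text after trimming whitespace.
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]

# Hypothetical stand-in for files.upload(), which returns {filename: file_bytes}.
uploaded = {"seed.csv": b"Keywords\nsme loans\nbusiness financing\n\nstartup loans\n"}
seed_keywords = keywords_from_csv_bytes(next(iter(uploaded.values())))
print(seed_keywords)
```

Checking for the Keywords column up front gives a clearer error than a failed column lookup later in the script.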

## Step 7: Define the Matching Function
The code defines a function called find_best_match, which compares a single match keyword to every seed keyword and returns the closest one. Each pair is lowercased and scored with fuzzywuzzy's fuzz.partial_ratio, which returns an integer from 0 to 100 and rewards a shorter string that closely matches part of a longer one. The function returns the highest-scoring seed keyword together with its score; on a tie it keeps the seed keyword that appears first in the list.
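
To see why partial matching behaves differently from whole-string matching, the idea can be approximated with the standard library's difflib. This is a rough sketch under stated assumptions, not fuzzywuzzy's implementation: `simple_ratio` and `partial_score` are hypothetical names, and fuzzywuzzy chooses candidate alignments differently, so its scores can differ from the ones computed here.

```python
from difflib import SequenceMatcher

def simple_ratio(a, b):
    """Whole-string similarity on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def partial_score(a, b):
    """Best similarity between the shorter string and any same-length
    window of the longer string (an approximation of partial matching)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    window = len(shorter)
    best = 0
    # Slide the shorter string across the longer one and keep the best score.
    for start in range(len(longer) - window + 1):
        best = max(best, simple_ratio(shorter, longer[start:start + window]))
    return best

print(simple_ratio("loans", "sme loans"))   # penalized for the extra "sme "
print(partial_score("loans", "sme loans"))  # "loans" appears verbatim in the longer string
```

A practical consequence is that a short seed keyword contained in a longer match keyword gets a perfect partial score, even when the longer keyword is about something more specific.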


## Step 8: Apply Fuzzy Matching for Each Match Keyword
For each keyword in the match_keywords list, the code uses the find_best_match function to find the best matching seed keyword from the seed_keywords list. The results, including the match keyword, the best seed keyword, and the similarity score, are stored in a list called matches.
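
The loop described above can be tried on small in-memory lists. The sketch below is illustrative only: `score` is a stand-in scorer built on difflib rather than fuzzywuzzy, `best_seed_for` is a hypothetical helper, and the keyword lists are made up for the example. It assumes the seed list is not empty.

```python
from difflib import SequenceMatcher

def score(a, b):
    # Stand-in scorer (assumption): case-insensitive difflib similarity, 0-100.
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

def best_seed_for(keyword, seeds):
    # max() returns the first seed with the top score, so ties go to the
    # seed that appears earliest in the list.
    best = max(seeds, key=lambda seed: score(keyword, seed))
    return best, score(keyword, best)

seed_keywords = ["sme loans", "term loans", "trade credit financing"]
match_keywords = ["trade credit insurance", "short term loans"]

# One (match keyword, best seed keyword, score) tuple per match keyword.
matches = [(kw, *best_seed_for(kw, seed_keywords)) for kw in match_keywords]
for row in matches:
    print(row)
```

Every match keyword is compared against every seed keyword, so the work grows with the product of the two list lengths.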


## Step 9: Create a DataFrame with Results
The list of matches is converted into a new pandas DataFrame. This DataFrame contains three columns:

- Match Keyword: The original keyword from match.csv.
- Best Seed Keyword: The closest matching seed keyword from seed.csv.
- Match Score: The similarity score between the match keyword and the best seed keyword.

## Step 10: Save the Results to a New CSV File
The results are saved as a new CSV file named fuzzy_matching_result.csv. This file contains the matched keywords and their scores.
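
The layout of the saved file can be illustrated with the standard library's csv module as a pandas-free stand-in. The two rows are taken from the sample output of this notebook. The notebook itself writes the file with pandas, and saving without the index column should give the same shape: one header row followed by one row per match.

```python
import csv
import io

# Two result rows copied from the notebook's sample output.
matches = [
    ("export trade financing", "export-import financing", 73),
    ("trade credit insurance", "trade credit financing", 82),
]
columns = ["Match Keyword", "Best Seed Keyword", "Match Score"]

# Write to an in-memory buffer instead of disk so the result is easy to inspect.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(columns)    # header row
writer.writerows(matches)   # one line per matched keyword
csv_text = buffer.getvalue()
print(csv_text)
```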


## Step 11: Provide the Result File for Download
Finally, the code makes the resulting CSV file available for download, allowing the user to download the file containing the matched keywords and their similarity scores.

This process matches each keyword in match.csv to the most similar keyword in seed.csv using fuzzy string matching, and saves the results for further analysis or use.


# Install required libraries
!pip install fuzzywuzzy
!pip install pandas
!pip install python-Levenshtein

# Import necessary libraries
from fuzzywuzzy import fuzz
import pandas as pd
from google.colab import files
from io import BytesIO

# First upload prompt: choose 'seed.csv' (seed keywords) only.
# 'match.csv' is uploaded in a second prompt further down.
uploaded = files.upload()

# Read the uploaded seed keywords CSV file (seed.csv)
seed_file = next(iter(uploaded.values()))
seed_keywords_df = pd.read_csv(BytesIO(seed_file))
print("Seed Keywords CSV:")
print(seed_keywords_df.head())  # Show first few rows of seed keywords

# Extract the seed keywords into a list
seed_keywords = seed_keywords_df['Keywords'].tolist()

# Upload the match keywords CSV file (match.csv)
uploaded_files = files.upload()
match_file = next(iter(uploaded_files.values()))
match_keywords_df = pd.read_csv(BytesIO(match_file))
print("Match Keywords CSV:")
print(match_keywords_df.head())  # Show first few rows of match keywords

# Extract the match keywords into a list
match_keywords = match_keywords_df['Keywords'].tolist()

# Function to find the best fuzzy match for each keyword
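def find_best_match(keyword, candidates):
    """Return the candidate most similar to keyword, plus its 0-100 score."""
    best_match = None
    highest_score = 0
    for candidate in candidates:
        # Case-insensitive partial match score between the two strings
        score = fuzz.partial_ratio(keyword.lower(), candidate.lower())
        if score > highest_score:  # strict '>' keeps the first candidate on ties
            highest_score = score
            best_match = candidate
    return best_match, highest_score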

# For each match keyword, find the best match from seed keywords
matches = []
for match in match_keywords:
    matched_seed_keyword, score = find_best_match(match, seed_keywords)  # Find best match from seed
    matches.append((match, matched_seed_keyword, score))

# Create a DataFrame with the results
result_df = pd.DataFrame(matches, columns=['Match Keyword', 'Best Seed Keyword', 'Match Score'])

# Display the resulting matches
print(result_df.head())  # Show first few rows of results

# Save the result to a new CSV file
result_df.to_csv('fuzzy_matching_result.csv', index=False)

# Provide the file for download
files.download('fuzzy_matching_result.csv')




Saving seed.csv to seed (1).csv
Seed Keywords CSV:
             Keywords
0           sme loans
1  business financing
2       startup loans
3          term loans
4         micro loans


Saving match.csv to match.csv
Match Keywords CSV:
                    Keywords
0     export trade financing
1  types of export factoring
2      export credit finance
3       export trade finance
4     trade credit insurance
               Match Keyword        Best Seed Keyword  Match Score
0     export trade financing  export-import financing           73
1  types of export factoring         export financing           81
2      export credit finance  export-import financing           71
3       export trade finance  export-import financing           70
4     trade credit insurance   trade credit financing           82

