<a href="https://colab.research.google.com/github/simodepth/Keyword-Research/blob/main/%F0%9F%A7%B9Query_Preprocessing_and_Spelling_Check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Query Preprocessing and Spelling Check


---


For each keyword in the list, we are going to run an Exploratory Data Analysis (EDA) aimed at detoxing large datasets of low-value terms.

In particular, the code in this notebook derives three normalized versions of each keyword:

- **Singulars**: create singular inflections of all words in the keyword
- **Isomers**: order the words in the phrase alphabetically
- **Spellcheck**: correct the spelling of the words in the keyword
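
To make the isomer idea concrete, here is a minimal standard-library sketch. It sorts the words of each phrase alphabetically and then groups phrases that end up with the same sorted form. The helper name `isomer_key` and the sample queries are illustrative assumptions, and grouping by the sorted form is just one possible way to use the result, not something the notebook itself does.

```python
from collections import defaultdict


def isomer_key(keyword: str) -> str:
    """Return the words of a keyword sorted alphabetically (case-insensitive)."""
    return ' '.join(sorted(keyword.split(), key=str.lower))


# hypothetical sample queries, not taken from the scraped dataset
sample_keywords = [
    "cheap coffee tables",
    "coffee tables cheap",
    "coffee near me",
    "near me coffee",
    "coffee bean",
]

# group queries that contain the same words in a different order
groups = defaultdict(list)
for kw in sample_keywords:
    groups[isomer_key(kw)].append(kw)

for key, variants in groups.items():
    print(f"{key!r}: {variants}")
```

In this sample, `cheap coffee tables` and `coffee tables cheap` share one key, as do `coffee near me` and `near me coffee`, which is what makes reordered duplicates easy to spot.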

## Requirements


---
- A file containing keywords, which can be either:
  - A fresh list of queries from Google Autosuggest, which you can scrape with EcommerceTools
  - A CSV file with **only** keywords listed in the first column
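
For the second option, the sketch below shows roughly what a keywords-only file looks like and how its first column can be read. It uses the standard-library `csv` module rather than pandas so that it runs on its own. The function name `read_keywords`, the `has_header` flag and the in-memory sample contents are all assumptions made for illustration.

```python
import csv
import io


def read_keywords(handle, has_header=True):
    """Return the first-column values of a CSV, skipping blank rows."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # drop the header row if there is one
    keywords = []
    for row in reader:
        # blank lines come back as empty lists, so guard before indexing
        if row and row[0].strip():
            keywords.append(row[0].strip())
    return keywords


# in-memory stand-in for a keywords-only file (hypothetical contents)
sample_file = io.StringIO(
    "keyword\ncoffee near me\ncheap coffee tables\n\ncoffee bean\n"
)
keywords = read_keywords(sample_file)
print(keywords)
```

With a real file, the same function could be called on `open(path, newline='')`; the resulting list can then be placed in a DataFrame column named `keyword`.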



In [None]:
# install additional python packages

!pip install inflect
!pip install fuzzywuzzy
!pip install pyspellchecker
!pip install python-Levenshtein
!pip install ecommercetools


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import pandas as pd
import numpy as np
from ecommercetools import seo
import inflect
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from spellchecker import SpellChecker


In [None]:
#@title Scrape Google Autosuggest
suggestions = seo.google_autocomplete('coffee', include_expanded=True)
suggestions.to_csv('coffee.csv', index=False)
suggestions.head()

| | term | relevance |
|---|---|---|
| 0 | coffee near me | 1250 |
| 1 | cheap coffee tables | 1250 |
| 2 | when coffee meets bagel | 1250 |
| 3 | why coffee makes you poop | 1250 |
| 4 | coffee bean | 1250 |


In [None]:
#@title Upload the File and Create a Pandas DataFrame
raw = pd.read_csv("/content/coffee.csv")            # load the scraped suggestions
df = raw.reindex(columns=['term', 'relevance'])     # keep only the columns used below
df = df.rename(columns={'term': 'keyword'})         # the rest of the notebook expects 'keyword'
df.head()


| | keyword | relevance |
|---|---|---|
| 0 | coffee near me | 1250 |
| 1 | cheap coffee tables | 1250 |
| 2 | when coffee meets bagel | 1250 |
| 3 | why coffee makes you poop | 1250 |
| 4 | coffee bean | 1250 |


In [None]:
p = inflect.engine()
spell = SpellChecker()

# functions to run on keywords
def singularise(keyword):
    s = ''
    keyword = str(keyword).split()
    for i in keyword:
        if p.singular_noun(i):
            s += f' {p.singular_noun(i)}'
        else:
            s += ' '+ i
    s = s.strip()
    return s

def isomerise(keyword):
    isomer = ' '.join(sorted(keyword.split(), key=str.lower))
    return isomer

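def spellcheck(keyword):
    s = ''
    keyword = str(keyword).split()
    for i in keyword:
        # newer pyspellchecker releases return None when no correction is found,
        # so fall back to the original word instead of writing "None"
        s += f" {spell.correction(i) or i}"
    s = s.strip()
    return s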

# skip keywords that have already been processed
# (in case of an error on a previous run-through)
try:
    incomplete = df[df["singular"].isna()]
except KeyError:
    incomplete = df.copy()

# iterate through the remaining rows to get singulars, isomers and spellcheck
for i in incomplete.index:
    keyword = str(df["keyword"][i]).strip()
    print(f"{i+1} of {len(df)} -", keyword)

    df.at[i, "keyword"] = keyword
    df.at[i, "singular"] = singularise(keyword)
    df.at[i, "isomer"] = isomerise(keyword)
    df.at[i, "spellcheck"] = spellcheck(keyword)
    if df["keyword"][i] == df["spellcheck"][i]:
        df.at[i,"spellcheck match"] = "yes"
    else:
        df.at[i,"spellcheck match"] = "no"

# save file
df.to_csv('keyword_normalization.csv', index=False)

print('DONE!')

1 of 336 - coffee near me
2 of 336 - cheap coffee tables
3 of 336 - when coffee meets bagel
4 of 336 - why coffee makes you poop
5 of 336 - coffee bean
6 of 336 - coffee jelly
7 of 336 - coffee near me
8 of 336 - coffee quotes
9 of 336 - coffee shops near me
10 of 336 - coffee x change
11 of 336 - coffee zone
12 of 336 - when coffee and kale compete
13 of 336 - how coffee is made
14 of 336 - coffee house
15 of 336 - coffee table
16 of 336 - coffee places near me
17 of 336 - coffee urn
18 of 336 - worst coffee creamer
19 of 336 - coffee underground
20 of 336 - why coffee is bad for you
21 of 336 - coffee open near me
22 of 336 - coffee zone menu
23 of 336 - coffee klatch
24 of 336 - coffee jokes
25 of 336 - coffee liqueur
26 of 336 - coffee emoji
27 of 336 - cheap coffee maker
28 of 336 - cheap coffee near me
29 of 336 - coffee house near me
30 of 336 - worst coffee in the world
31 of 336 - is coffee good for you
32 of 336 - coffee jelly recipe
33 of 336 - coffee vs espresso
34 of 336 -