<a href="https://colab.research.google.com/github/simodepth/Keyword-Research/blob/main/Keyword_Research_Autosuggest_%2B_Clustering_%F0%9F%86%8E.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Keyword Research in 5 Minutes Using Google Suggest**

---


Unlike the [Google Autosuggest Python scraper](https://colab.research.google.com/drive/12vqI1SlcbNjGywVgzvQFPekVhHtxN_ar#scrollTo=BsqHNPuHHcZ1), this notebook goes a step further: it clusters the scraped Google Autosuggest results.

After generating keyword ideas from Google Suggest, the script uses the `nltk` Python package, a widely used NLP library, to tokenize the **unstructured data** in the suggested queries and group the suggestions into human-readable clusters.

## Step 1: Change Settings

Set the `lang_code` field to your language code (e.g. `en`, `fr`, `es`, `nl`) and enter up to 5 seed keywords.



In [None]:
#language code and keywords
lang_code="en"#@param {type:"string"}
keyword1="retirement properties for sale" #@param {type:"string"}
keyword2="retirement properties for rent" #@param {type:"string"}
keyword3="care home" #@param {type:"string"}
keyword4="retirement villages" #@param {type:"string"}
keyword5="" #@param {type:"string"}
keyword5="bolton at home council bungalows to rent" 
keyword5="council bungalows for rent near me" 
keyword5="council bungalows to rent in bridlington" 
keyword5="council bungalows to rent in scarborough" 
keyword5="council bungalows to rent in sunderland" 
keyword5="council bungalows to rent near me" 
keyword5="extra care housing" 
keyword5="housing association bungalows to rent in hull" 
keyword5="independent living" 
keyword5="new build retirement bungalows" 
keyword5="over 55 bungalows for sale in doncaster" 
keyword5="over 55 housing" 
keyword5="over 55 retirement bungalows" 
keyword5="over 55 retirement bungalows for sale" 
keyword5="over 55 retirement bungalows to buy near me" 
keyword5="retirement bungalows" 
keyword5="retirement properties" 
keyword5="retirement bungalows for sale near me" 
keyword5="retirement flats for sale" 
keyword5="retirement homes" 
keyword5="retirement homes for sale" 
keyword5="retirement property for sale" 
keyword5="retirement villages" 
keyword5="sheltered accommodation" 
keyword5="sheltered accommodation liverpool" 
keyword5="sheltered housing" 
keyword5="sheltered housing for the elderly to rent" 
# Each assignment above overwrites the previous one; reset keyword5 so only keyword1-4 are used
keyword5=""

## Step 2: Run The Code

In [None]:
#@title Generate keyword list
keywords=[keyword1,keyword2,keyword3,keyword4,keyword5]
keywordlist = list(filter(None, keywords))
keywordlist

['retirement properties for sale',
 'retirement properties for rent',
 'care home',
 'retirement villages']

In [None]:
#@title Import modules
import pandas as pd
import requests
import json
import time
import string
import nltk
nltk.download('punkt')
!pip install stop_words
from stop_words import get_stop_words
from google.colab import files
%load_ext google.colab.data_table
from collections import Counter
from json import loads

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting stop_words
  Downloading stop-words-2018.7.23.tar.gz (31 kB)
Building wheels for collected packages: stop-words
  Building wheel for stop-words (setup.py) ... done
  Created wheel for stop-words: filename=stop_words-2018.7.23-py3-none-any.whl size=32911 sha256=296c4949bb4239e58cc14fd86434b60488e6c006a7ef9849f77e85cdfc6e1559
  Stored in directory: /root/.cache/pip/wheels/fb/86/b2/277b10b1ce9f73ce15059bf6975d4547cc4ec3feeb651978e9
Successfully built stop-words
Installing collected packages: stop-words
Successfully installed stop-words-2018.7.23


In [None]:
#@title Make a list of letters to use for Google Suggest
letters="a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z" #@param {type:"string"}
# Split the comma-separated form value into individual letters
letterlist=[letter.strip() for letter in letters.split(",") if letter.strip()]
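
To make the next step concrete, here is a minimal sketch of how seed keywords and letters combine into the queries sent to Google Suggest. The helper name `expand_queries` is hypothetical and not part of the notebook.

```python
import string
from itertools import product

def expand_queries(seeds, letters=string.ascii_lowercase):
    """Pair every seed keyword with every letter, e.g. 'care home a'."""
    return [f"{seed} {letter}" for seed, letter in product(seeds, letters)]

queries = expand_queries(["care home", "retirement villages"])
print(len(queries))   # 2 seeds x 26 letters = 52 queries
print(queries[:3])    # ['care home a', 'care home b', 'care home c']
```

With 4 seeds and 26 letters this gives 104 requests, so at one request per second the scraping cell takes a little under two minutes.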

In [None]:
#@title Prompt Google Suggest to merge keywords with letters
keywordsuggestions=[]
for keyword in keywordlist: 
  for letter in letterlist :
    # Let requests URL-encode the query (spaces, accents, "&", etc.)
    URL="http://suggestqueries.google.com/complete/search"
    params={"client":"firefox","hl":lang_code,"q":keyword+" "+letter}
    headers = {'User-agent':'Mozilla/5.0'} 
    response = requests.get(URL, params=params, headers=headers) 
    result = json.loads(response.content.decode('utf-8'))
    keywordsuggest=[keyword,letter] 
    # result[1] holds the list of suggestions; skip the seed keyword itself
    for word in result[1]:
      if(word!=keyword):
        keywordsuggest.append(word)
    time.sleep(1) # pause between requests to avoid hammering Google
    keywordsuggestions.append(keywordsuggest)
#create a dataframe from this list
keywordsuggestions_df = pd.DataFrame(keywordsuggestions)
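
The cell above relies on the response being a JSON array whose second element (`result[1]`) is the list of suggestions. The sketch below shows that parsing step offline, plus one way to percent-encode the query. The helpers `build_suggest_url` and `parse_suggestions` are hypothetical names, and the sample payload is made up for illustration rather than captured from Google.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://suggestqueries.google.com/complete/search"

def build_suggest_url(query, lang_code="en"):
    """Return the request URL with the query safely percent-encoded."""
    params = {"client": "firefox", "hl": lang_code, "q": query}
    return BASE_URL + "?" + urlencode(params)

def parse_suggestions(payload, seed):
    """Extract suggestions from a payload shaped like [query, [suggestion, ...]]."""
    result = json.loads(payload)
    return [s for s in result[1] if s != seed]

# Illustrative payload only, not real Google output.
sample = '["care home a", ["care home", "care home assistant jobs", "care home aberdeen"]]'
url = build_suggest_url("care home a")
suggestions = parse_suggestions(sample, "care home")
print(url)
print(suggestions)  # ['care home assistant jobs', 'care home aberdeen']
```

Encoding the query matters as soon as a seed keyword contains characters such as `&` or accented letters, which plain string concatenation would pass through unescaped.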

In [None]:
#@title Rename columns of dataframe
columnnames=["Keyword","Letter"]
for i in range(1,len(keywordsuggestions_df.columns)-1):
  columnnames.append("Suggestion"+str(i))
keywordsuggestions_df.columns=columnnames

In [None]:
#@title Make a list of all suggestions
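allkeywords = list(keywordlist) # copy, so the seed list itself is not modified
for i in range(1,len(keywordsuggestions_df.columns)-1):
  suggestlist = keywordsuggestions_df["Suggestion"+str(i)].values.tolist()
  for suggestion in suggestlist:
    if pd.notna(suggestion): # rows with fewer suggestions are padded with missing values
      allkeywords.append(suggestion)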

In [None]:
#@title Exclude stopwords and seed keywords from this list
stop_words=get_stop_words(lang_code)
wordlist=[]
seed_words=[]
for keyword in keywords:
  for seed_word in nltk.word_tokenize(str(keyword).lower()):
    if(len(seed_word)>0):
      seed_words.append(seed_word)
for keyword in allkeywords:
  # split each suggestion into lowercase word tokens
  words = nltk.word_tokenize(str(keyword).lower())
  for word in words:
    if(word not in stop_words and word not in seed_words and len(word)>1):
      wordlist.append(word)
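
As a rough illustration of this filtering step, the sketch below keeps only words that are neither stopwords nor part of a seed keyword. To stay runnable without downloading NLTK data it uses a simple regex tokenizer as a stand-in for `nltk.word_tokenize`, so token boundaries may differ from the notebook's. `tokenize` and `candidate_words` are hypothetical helpers, and the stopword list is a tiny made-up sample.

```python
import re

def tokenize(text):
    """Simplified stand-in for nltk.word_tokenize: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def candidate_words(suggestions, seeds, stop_words):
    """Return suggestion words that are not stopwords, seed words, or single characters."""
    seed_words = {word for seed in seeds for word in tokenize(seed)}
    excluded = seed_words | set(stop_words)
    return [word for suggestion in suggestions for word in tokenize(suggestion)
            if word not in excluded and len(word) > 1]

words = candidate_words(
    ["care home jobs near me", "care home fees in scotland"],
    seeds=["care home"],
    stop_words=["near", "me", "in"],  # tiny illustrative list
)
print(words)  # ['jobs', 'fees', 'scotland']
```

Using sets for the excluded words keeps each membership test constant-time, which helps when the suggestion list grows to thousands of entries.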

In [None]:
#@title Find the most common words in the suggestions
# keep the 200 most frequent words as cluster labels
most_common_words= [word for word, word_count in Counter(wordlist).most_common(200)]
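
A small worked example of the counting step, using a made-up word list: `Counter.most_common(n)` returns `(word, count)` pairs sorted by descending count, and the list comprehension keeps only the words.

```python
from collections import Counter

# Made-up word list for illustration.
sample_words = ["jobs", "fees", "jobs", "scotland", "fees", "jobs"]
counts = Counter(sample_words)

top_pairs = counts.most_common(2)
top_words = [word for word, _ in top_pairs]
print(top_pairs)  # [('jobs', 3), ('fees', 2)]
print(top_words)  # ['jobs', 'fees']
```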

In [None]:
#@title Assign each suggestion to a common keyword
clusters=[]
for common_word in most_common_words:
  for keyword in allkeywords:
    # substring test: "home" also matches "homes"
    if(common_word in str(keyword)):
      clusters.append([keyword,common_word])
clusterdf = pd.DataFrame(clusters,columns=['Keyword', 'Cluster'])
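
Because the cell above uses a substring test, a cluster word such as `home` also captures keywords containing `homes`. If that is not wanted, one possible variant (an assumption on my part, not the notebook's method) is to match whole tokens only. `cluster_by_token` is a hypothetical helper that splits on whitespace.

```python
def cluster_by_token(keywords, common_words):
    """Assign a keyword to a cluster only when the cluster word is a whole token."""
    clusters = []
    for common_word in common_words:
        for keyword in keywords:
            if common_word in keyword.lower().split():
                clusters.append([keyword, common_word])
    return clusters

sample_keywords = ["retirement homes for sale", "care home fees"]
substring_hits = [k for k in sample_keywords if "home" in k]
token_hits = cluster_by_token(sample_keywords, ["home"])
print(substring_hits)  # both keywords match the substring test
print(token_hits)      # [['care home fees', 'home']]
```

Which behaviour is preferable depends on the goal: substring matching groups singular and plural forms together, while token matching keeps clusters stricter.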

## Step 3: End Result

In [None]:
#@title Get clustered keywords and download the dataframe 
clusterdf.to_csv("keywords_clustered.csv")
files.download("keywords_clustered.csv") 
clusterdf

| | Keyword | Cluster |
|---|---|---|
| 0 | retirement homes for sale queensland | homes |
| 1 | retirement homes for sale ipswich qld | homes |
| 2 | retirement homes for sale victoria | homes |
| 3 | retirement homes for rent perth | homes |
| 4 | retirement homes for rent scotland | homes |
| ... | ... | ... |
| 1761 | retirement properties for sale lincolnshire | lincolnshire |
| 1762 | retirement properties for sale tunbridge wells | tunbridge |
| 1763 | retirement properties for sale tunbridge wells | tunbridge |
| 1764 | retirement properties for sale tunbridge wells | wells |
