<a href="https://colab.research.google.com/github/simodepth/Keyword-Research/blob/main/Google_Autosuggest_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##This framework is the property of [Simone De Palma](https://seodepths.com/about/).


## Make a copy to use it. If you find it helpful, you can [buy me a coffee](https://www.buymeacoffee.com/depasimo96) ☕️





#🚀 Run Google Autosuggest to find keyword ideas in bulk


---

The following framework uses the `urllib` and `requests_html` Python libraries to query Google Autosuggest and retrieve keyword combinations in bulk, generated from the term suffixes and content angles of your choice.




#Requirements & Assumptions

- The framework does not target a specific location, so the output does not reflect local results.
- The output is cluttered with unwanted **punctuation**. You can copy and paste it into a txt file and then into a spreadsheet, where filters help you remove the punctuation.

**If you find a way to clean up the output, feel free to fork the last part of the script!**
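As one possible way to handle the punctuation issue mentioned above, here is a minimal sketch that strips unwanted characters from a list of keywords. The helper names `clean_keyword` and `clean_keywords` are hypothetical and not part of the original notebook, and the choice to keep apostrophes and hyphens inside words is an assumption you may want to change.

```python
import string

# Strip all ASCII punctuation except the apostrophe and hyphen, which can be
# meaningful inside a keyword (an assumption; adjust to your needs).
_PUNCT_TO_STRIP = "".join(ch for ch in string.punctuation if ch not in "'-")
_STRIP_TABLE = str.maketrans(_PUNCT_TO_STRIP, " " * len(_PUNCT_TO_STRIP))


def clean_keyword(keyword):
    """Replace unwanted punctuation with spaces, collapse whitespace,
    and trim apostrophes or hyphens left at either end."""
    collapsed = " ".join(keyword.translate(_STRIP_TABLE).split())
    return collapsed.strip("'-")


def clean_keywords(keywords):
    """Clean every keyword, drop empty results, and remove duplicates
    while keeping the first-seen order."""
    seen = set()
    cleaned = []
    for keyword in keywords:
        value = clean_keyword(keyword)
        if value and value not in seen:
            seen.add(value)
            cleaned.append(value)
    return cleaned
```

Calling `clean_keywords` on the keyword column before exporting would be one place to apply it.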

In [1]:
!pip install requests_html



In [16]:
#@title Run Import Modules
import requests
import urllib
import json
import operator
import pandas as pd
from requests_html import HTML
from requests_html import HTMLSession
from urllib.parse import (parse_qsl, urlsplit)
import time
import string
import nltk
nltk.download('punkt')
!pip install stop_words
from stop_words import get_stop_words
from google.colab import files
%load_ext google.colab.data_table
from collections import Counter
from json import loads
import numpy as np




#Set up connections with Google SERPs

In [3]:
def get_source(url):

    try:
        session = HTMLSession()
        response = session.get(url)
        return response
    except requests.exceptions.RequestException as e:
        print(e)

In [4]:
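def get_results(query):
    query = urllib.parse.quote_plus(query)
    response = get_source("https://suggestqueries.google.com/complete/search?output=chrome&hl=en&q=" + query)
    # get_source returns None when the request fails, so fall back to an empty result
    if response is None:
        return [query, [], [], [], {'google:suggestrelevance': []}]
    results = json.loads(response.text)
    return results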

In [27]:
#@title Type your Keyword
lang_code = "en-gb"  #@param {type:"string"}
search_term = "retirement villages" #@param {type:"string"}
results = get_results(search_term)
results

['retirement villages',
 ['retirement villages california',
  'retirement villages north carolina',
  'retirement villages near me',
  'retirement villages in florida',
  'retirement villages in arizona',
  'retirement villages in south carolina',
  'retirement villages in tennessee',
  'retirement villages in texas'],
 ['', '', '', '', '', '', '', ''],
 [],
 {'google:clientdata': {'bpc': False, 'phi': 0, 'tlw': False},
  'google:suggestrelevance': [601, 600, 555, 554, 553, 552, 551, 550],
  'google:suggestsubtypes': [[402],
   [402],
   [512],
   [512],
   [512],
   [512],
   [512],
   [512]],
  'google:suggesttype': ['QUERY',
   'QUERY',
   'QUERY',
   'QUERY',
   'QUERY',
   'QUERY',
   'QUERY',
   'QUERY'],
  'google:verbatimrelevance': 1300}]
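Note that the form cell defines `lang_code`, but `get_results` always sends `hl=en`, so the value typed into the form has no effect. A minimal sketch of how the request URL could be parameterized follows. The function name `build_suggest_url` is hypothetical, and both the assumption that the endpoint honors other `hl` values and the optional `gl` country parameter are untested here.

```python
from urllib.parse import urlencode

SUGGEST_ENDPOINT = "https://suggestqueries.google.com/complete/search"


def build_suggest_url(query, lang_code="en", country_code=None):
    """Build the Autosuggest URL with a configurable interface language.

    country_code is sent as `gl`; whether the endpoint uses it to localize
    suggestions is an assumption, not something the notebook confirms.
    """
    params = {"output": "chrome", "hl": lang_code}
    if country_code:
        params["gl"] = country_code
    params["q"] = query
    # urlencode applies quote_plus to each value, like the original code does
    return SUGGEST_ENDPOINT + "?" + urlencode(params)
```

With the defaults, the function produces the same URL that `get_results` builds by string concatenation.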

#Let's format the results

In [6]:
def format_results(results):
    suggestions = []
    for index, value in enumerate(results[1]):
        suggestion = {'term': value, 'relevance': results[4]['google:suggestrelevance'][index]}
        suggestions.append(suggestion)
    return suggestions

In [7]:
formatted_results = format_results(results)
formatted_results


[{'relevance': 601, 'term': 'retirement villages california'},
 {'relevance': 600, 'term': 'retirement villages north carolina'},
 {'relevance': 555, 'term': 'retirement villages near me'},
 {'relevance': 554, 'term': 'retirement villages in florida'},
 {'relevance': 553, 'term': 'retirement villages in arizona'},
 {'relevance': 552, 'term': 'retirement villages in south carolina'},
 {'relevance': 551, 'term': 'retirement villages in tennessee'},
 {'relevance': 550, 'term': 'retirement villages in texas'}]

🔗 **"Relevance"** is the `google:suggestrelevance` score that Google returns with each suggestion. It is an automated estimate, apparently based on how frequently the term is searched, and higher scores rank first.
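To make the pairing of terms and scores explicit, here is a small offline sketch that works on a trimmed copy of the response printed earlier. The function name `pair_terms_with_relevance` is hypothetical. Unlike `format_results`, it uses `zip`, so a missing or short score list yields fewer pairs instead of raising an error.

```python
# Trimmed copy of the response shown earlier in this notebook
sample_response = [
    "retirement villages",
    [
        "retirement villages california",
        "retirement villages north carolina",
        "retirement villages near me",
    ],
    ["", "", ""],
    [],
    {"google:suggestrelevance": [601, 600, 555], "google:verbatimrelevance": 1300},
]


def pair_terms_with_relevance(response):
    """Pair each suggested term (index 1) with its score from the metadata dict (index 4)."""
    terms = response[1]
    metadata = response[4] if len(response) > 4 else {}
    scores = metadata.get("google:suggestrelevance", [])
    # zip stops at the shorter list, so terms without a score are skipped
    return [{"term": term, "relevance": score} for term, score in zip(terms, scores)]


pairs = pair_terms_with_relevance(sample_response)
```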

#Spice up the research by adding some term suffixes 🌶

In [8]:
def get_expanded_term_suffixes():
    # One suffix per lowercase letter, 'a' through 'z'
    expanded_term_suffixes = list(string.ascii_lowercase)
    return expanded_term_suffixes

#Define your Content Angle according to the target funnel stage

In [9]:
def get_expanded_term_prefixes():
    expanded_term_prefixes = ['what *', 'where *', 'how to *', 'why *', 'buy*', 'how much*',
                              'best *', 'worse *', 'rent*', 'sale*', 'offer*','vs*','or*'
                             ]
    return expanded_term_prefixes

Default content angle definition:


```
# ['what *', 'where *', 'how to *', 'why *', 'vs*', 'or*', 'buy*', 'how much*',
#  'best *', 'worse *', 'tutorial *', 'tips *', 'ideas *', 'review *', 'guide *']
```



#Make sure to expand the search 

In [10]:
def get_expanded_terms(query):

    expanded_term_prefixes = get_expanded_term_prefixes()
    expanded_term_suffixes = get_expanded_term_suffixes()   

    terms = []
    terms.append(query)

    for term in expanded_term_prefixes:
        terms.append(term + ' ' + query)

    for term in expanded_term_suffixes:
        terms.append(query + ' ' + term)

    return terms


In [11]:
get_expanded_terms(search_term)


['retirement villages',
 'what * retirement villages',
 'where * retirement villages',
 'how to * retirement villages',
 'why * retirement villages',
 'buy* retirement villages',
 'how much* retirement villages',
 'best * retirement villages',
 'worse * retirement villages',
 'rent* retirement villages',
 'sale* retirement villages',
 'offer* retirement villages',
 'vs* retirement villages',
 'or* retirement villages',
 'retirement villages a',
 'retirement villages b',
 'retirement villages c',
 'retirement villages d',
 'retirement villages e',
 'retirement villages f',
 'retirement villages g',
 'retirement villages h',
 'retirement villages i',
 'retirement villages j',
 'retirement villages k',
 'retirement villages l',
 'retirement villages m',
 'retirement villages n',
 'retirement villages o',
 'retirement villages p',
 'retirement villages q',
 'retirement villages r',
 'retirement villages s',
 'retirement villages t',
 'retirement villages u',
 'retirement villages v',
 'ret

#Look for further suggestions




In [12]:
def get_expanded_suggestions(query):

    all_results = []

    expanded_terms = get_expanded_terms(query)
    for term in expanded_terms:
        results = get_results(term)
        all_results.extend(format_results(results))

    # Sort once after collecting everything, highest relevance first
    all_results = sorted(all_results, key=lambda k: k['relevance'], reverse=True)
    return all_results
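With the default prefixes and suffixes, the loop above sends 40 requests back to back (the seed term, 13 prefixes, and 26 suffixes). The sketch below shows one way to pause between requests. The function name `collect_with_delay` and the one-second default are assumptions, and the notebook does not state whether the endpoint enforces any rate limit.

```python
import time


def collect_with_delay(terms, fetch, delay_seconds=1.0):
    """Call fetch(term) for each term, pausing between calls, and merge the results.

    fetch is any callable returning a list of {'term': ..., 'relevance': ...} dicts,
    for example: lambda term: format_results(get_results(term)).
    """
    all_results = []
    for position, term in enumerate(terms):
        if position > 0:
            time.sleep(delay_seconds)  # no pause before the first request
        all_results.extend(fetch(term))
    # Sort once at the end, highest relevance first
    return sorted(all_results, key=lambda item: item["relevance"], reverse=True)
```

Passing the fetch step as a callable keeps the helper independent of the network, which also makes it easy to try with a fake function first.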


In [17]:
#@title Obtain the Output and Save the Data frame
expanded_results = get_expanded_suggestions(search_term)
expanded_results_df = pd.DataFrame(expanded_results)
expanded_results_df.columns = ['Keywords', 'Relevance']
expanded_results_df.to_csv('keywords.csv')
expanded_results_df

Unnamed: 0,Keywords,Relevance
0,retirement villages victoria,1252
1,retirement villages arizona,1251
2,retirement villages colorado,1251
3,retirement villages ocala florida,1251
4,retirement villages queensland,1251
...,...,...
272,retirement villages vermont,550
273,retirement villages wollongong,550
274,are retirement villages worth it,550
275,retirement villages yeppoon qld,550
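Because several expanded queries can return the same suggestion, the combined list may contain repeated keywords. Here is a minimal sketch for keeping only the highest-scoring entry per term. The function name `dedupe_by_term` is hypothetical, and it is meant to run on the list returned by `get_expanded_suggestions` before that list is turned into a data frame.

```python
def dedupe_by_term(suggestions):
    """Keep one entry per term, preferring the highest relevance score."""
    best = {}
    for suggestion in suggestions:
        term = suggestion["term"]
        if term not in best or suggestion["relevance"] > best[term]["relevance"]:
            best[term] = suggestion
    # Return the survivors sorted by relevance, highest first
    return sorted(best.values(), key=lambda item: item["relevance"], reverse=True)
```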


#Optional - Style the Keyword Autosuggest Output with Pandas Styling

In [23]:
#@title Style Pandas tables with CSS
expanded_results_df = pd.read_csv('/content/keywords.csv')  # paste the file path of the data frame saved in the previous step
selection = ['Keywords','Relevance']
df = expanded_results_df[selection]
df.head(20).style.set_table_styles(
[{'selector': 'th',
  'props': [('background', '#7CAE00'), 
            ('color', 'white'),
            ('font-family', 'verdana')]},
 
 {'selector': 'td',
  'props': [('font-family', 'verdana')]},

 {'selector': 'tr:nth-of-type(odd)',
  'props': [('background', '#DCDCDC')]}, 
 
 {'selector': 'tr:nth-of-type(even)',
  'props': [('background', 'white')]},
 
]
).hide(axis="index")  # Styler.hide_index() was removed in pandas 2.0

Keywords,Relevance
retirement villages victoria,1252
retirement villages arizona,1251
retirement villages colorado,1251
retirement villages ocala florida,1251
retirement villages queensland,1251
retirement villages south carolina,1251
retirement villages uk,1251
retirement villages victoria point,1251
retirement villages near me,1250
retirement villages arkansas,1250


In [24]:
#@title Save the Output w/o Style Adjustments
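# Adjust the path to your own environment; this Windows-style path points to a local iCloud Drive folder
df.to_csv(r'iCloud Drive\Scrivania\cluster.csv', index = False, header=True)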