# Google Search Console API (With Python)
Author:    Jean-Christophe Chouinard

Role:      Sr. SEO Specialist at SEEK.com.au

Website:   jcchouinard.com

LinkedIn:  linkedin.com/in/jeanchristophechouinard/ 

Twitter:   twitter.com/@ChouinardJC

## Why Use The Google Search Console API?

Google limits the amount of data it reports in the Search Console interface.

In the search performance report, you can only see **1000 rows** and **16 months** of data.

With the GSC UI, it is also not possible to get **keywords per page**, because queries and pages are reported in separate tabs.

The Google Search Console API lets you extract far more than 1000 rows of data: up to 25,000 rows per request, and more by requesting several batches.

## Get Started
### Clone GitHub Repository
`$ git clone https://github.com/jcchouinard/GoogleSearchConsole-Tutorial.git`

### Install Requirements
`pip install -r requirements.txt`

### Learn Python for SEO
[jcchouinard.com/python-for-seo](https://www.jcchouinard.com/python-for-seo)

### Get API Keys
[jcchouinard.com/how-to-get-google-search-console-api-keys/](https://www.jcchouinard.com/how-to-get-google-search-console-api-keys/)

### How to format your request
[jcchouinard.com/what-is-google-search-console-api/](https://www.jcchouinard.com/what-is-google-search-console-api/)
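
As a rough sketch of what such a request looks like, the Search Analytics endpoint takes a JSON body with a date range, a list of dimensions, and paging fields. The helper name `build_request` below is hypothetical and not part of the repository; the field names (`startDate`, `endDate`, `dimensions`, `rowLimit`, `startRow`) are those of the Search Analytics query endpoint.

```python
def build_request(start_date, end_date,
                  dimensions=("date", "page", "query"),
                  row_limit=25000, start_row=0):
    """Return a request body for a Search Analytics query (hypothetical helper)."""
    # The API accepts between 1 and 25,000 rows per request
    if not 1 <= row_limit <= 25000:
        raise ValueError("row_limit must be between 1 and 25000")
    return {
        "startDate": start_date,        # YYYY-MM-DD
        "endDate": end_date,            # YYYY-MM-DD
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
        "startRow": start_row,          # offset used to page through results
    }


request = build_request("2020-07-15", "2020-07-25")
```

With an authorized service object, a body like this would typically be sent with `webmasters_service.searchanalytics().query(siteUrl=site, body=request).execute()`.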

In [1]:
site = 'https://www.jcchouinard.com'
creds = 'client_secrets.json'
output = 'gsc_data.csv'
start_date = '2020-07-15' 
end_date = '2020-07-25' # Default 3 days before today

## Authorize Your Credentials

In [2]:
from oauth import authorize_creds

webmasters_service = authorize_creds(creds) 

Authorizing Creds
Auth Successful
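
The `authorize_creds` helper comes from the repository's `oauth.py`, whose internals are not shown here. One possible way to obtain a comparable service object is sketched below, assuming the `google-auth-oauthlib` and `google-api-python-client` packages are installed. `build_service` is a hypothetical name, and this is not necessarily how the repository does it.

```python
# Read-only access to Search Console data
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]


def build_service(client_secrets_path, scopes=SCOPES):
    """Run the installed-app OAuth flow and return a Search Console service."""
    # Imported lazily so this module loads even without the Google packages
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    flow = InstalledAppFlow.from_client_secrets_file(client_secrets_path, scopes=scopes)
    credentials = flow.run_local_server(port=0)  # opens a browser window for consent
    return build("searchconsole", "v1", credentials=credentials)
```

Calling `build_service('client_secrets.json')` would prompt for consent in the browser and return an object exposing `searchanalytics().query(...)`.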


## Extract GSC Data by URL

In [3]:
from gsc_by_url import gsc_by_url

list_of_urls = [
    '/chrome-devtools-commands-for-seo/',
    '/learn-selenium-python-seo-automation/'
    ]

list_of_urls = [site + x for x in list_of_urls]
args = webmasters_service,site,list_of_urls,creds,start_date,end_date

gsc_by_url(*args)

| Unnamed: 0 | page | clicks | impressions |
|---|---|---|---|
| 0 | https://www.jcchouinard.com/chrome-devtools-co... | 4 | 2762 |
| 1 | https://www.jcchouinard.com/learn-selenium-pyt... | 135 | 12836 |
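
For context, the API returns each row as a dictionary with a `keys` list (one entry per requested dimension) plus `clicks`, `impressions`, `ctr` and `position`. A minimal sketch of flattening such a response into records follows; `rows_to_records` is a hypothetical helper, and the sample response is made up to mirror the shape of a real one.

```python
def rows_to_records(response, dimensions):
    """Flatten a Search Analytics response into a list of flat dictionaries."""
    records = []
    # A response with no data has no "rows" key at all
    for row in response.get("rows", []):
        record = dict(zip(dimensions, row["keys"]))  # e.g. {"page": "https://..."}
        record["clicks"] = row["clicks"]
        record["impressions"] = row["impressions"]
        records.append(record)
    return records


# Made-up response, for illustration only
sample_response = {
    "rows": [
        {"keys": ["https://example.com/a/"], "clicks": 3, "impressions": 40,
         "ctr": 0.075, "position": 8.2},
    ]
}
records = rows_to_records(sample_response, ["page"])
```

A list like `records` can be passed directly to `pd.DataFrame(records)` to get a table with `page`, `clicks` and `impressions` columns.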


## Extract Filtered Data from Google Search Console

Possible combinations:

Dimension: query, page.

Operator: contains, equals, notEquals, notContains


In [None]:
from gsc_with_filters import gsc_with_filters

# Filters
dimension = 'query' 
operator = 'contains'
expression = 'python'
args = webmasters_service,site,creds,dimension,operator,expression,start_date,end_date

gsc_with_filters(*args)
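
Under the hood, a filter is expressed in the request body as a `dimensionFilterGroups` entry. The sketch below builds one and validates it against the dimensions and operators listed above. `build_filter_group` is a hypothetical helper, not the repository's implementation.

```python
ALLOWED_DIMENSIONS = {"query", "page"}
ALLOWED_OPERATORS = {"contains", "equals", "notEquals", "notContains"}


def build_filter_group(dimension, operator, expression):
    """Return a dimensionFilterGroups value holding a single filter."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"Unsupported dimension: {dimension}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator}")
    return [{
        "filters": [{
            "dimension": dimension,
            "operator": operator,
            "expression": expression,
        }]
    }]


# Request body for queries containing "python"
body = {
    "startDate": "2020-07-15",
    "endDate": "2020-07-25",
    "dimensions": ["query"],
    "dimensionFilterGroups": build_filter_group("query", "contains", "python"),
}
```

Validating the combination before sending the request gives a clear local error instead of an API error response.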

## Extract 100% of the data from Google Search Console

What does the script do?

1. Creates an output folder named after the site, if it does not already exist.

2. Checks the output folder for dates that are already extracted.

3. Skips dates that are already extracted.

4. Day by day, requests rows in batches of 25,000.

5. Iterates until all rows are extracted for that day.

6. Appends new dates to the existing CSV.
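
The date-skipping and batching logic in the steps above can be sketched as follows. This is a simplified illustration, not the code in `gsc_to_csv_by_month.py`: the function names are hypothetical, and `fake_fetch` stands in for the real API call so the loop can run offline.

```python
from datetime import date, timedelta

BATCH_SIZE = 25000  # rows requested per API call


def dates_to_extract(start, end, already_extracted):
    """Yield each day from start to end (inclusive) that is not saved yet."""
    day = start
    while day <= end:
        if day not in already_extracted:
            yield day
        day += timedelta(days=1)


def fetch_all_rows(fetch_batch, day, batch_size=BATCH_SIZE):
    """Request batches for one day until a batch comes back short."""
    rows, start_row = [], 0
    while True:
        batch = fetch_batch(day, start_row, batch_size)
        rows.extend(batch)
        if len(batch) < batch_size:  # last page reached
            return rows
        start_row += batch_size


def fake_fetch(day, start_row, batch_size, total=60000):
    """Stand-in for the API: pretends the day has `total` rows."""
    return list(range(start_row, min(start_row + batch_size, total)))


# 2020-07-02 is treated as already extracted, so it is skipped
pending = list(dates_to_extract(date(2020, 7, 1), date(2020, 7, 3), {date(2020, 7, 2)}))
rows = fetch_all_rows(fake_fetch, pending[0])
```

In a real run, `fetch_batch` would send a request with `startRow=start_row` and `rowLimit=batch_size` and return the `rows` list of the response.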

In [3]:
from gsc_to_csv_by_month import gsc_to_csv

args = webmasters_service,site,output,creds,start_date,end_date
gsc_to_csv(*args)

Create project: www_jcchouinard_com
Checking existing dates in www_jcchouinard_com/
Start date at beginning: 2020-07-01 00:00:00
date = 2020-07-01
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 4126
Start date at beginning: 2020-07-02 00:00:00
date = 2020-07-02
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 4275
Start date at beginning: 2020-07-03 00:00:00
date = 2020-07-03
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 4500
Start date at beginning: 2020-07-04 00:00:00
date = 2020-07-04
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 3985
Start date at beginning: 2020-07-05 00:00:00
date = 2020-07-05
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 3825
Start date at beginning: 2020-07-06 00:00:00
date = 2020-07-06
successful at 0
Numrows at the start of loop: 0
Numrows at the end of loop: 4493
Start date at beginning: 2020-07-07 00:00:00
date = 

KeyboardInterrupt: 

In [None]:
from gsc_to_csv_by_month import gsc_to_csv

end_date = '2020-08-05'
# Pass end_date explicitly; when omitted, it defaults to 3 days before today
args = webmasters_service,site,output,creds,start_date,end_date
gsc_to_csv(*args)

## Group and Plot Keywords

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

from file_manip import date_to_index, return_df

In [None]:
site = 'https://www.jcchouinard.com'
filename = 'gsc_data.csv'

df = return_df(site,filename) # Reads all Saved CSVs
df

In [None]:
r = r'.*python.*'
is_python = df['query'].str.contains(r,regex=True) # True where the query mentions "python"
# Assign with .loc: chained indexing (df[col][mask] = ...) may not write back to df
df['query_type'] = 'Not-Python'
df.loc[is_python,'query_type'] = 'Python'
df['query_type'].head(5)

In [None]:
df = df.groupby(['date','query_type'])['clicks'].sum().reset_index()
df.head(5)

In [None]:
df = df.set_index(['date','query_type'])['clicks'].unstack()
df.head(5)

In [None]:
df = df.reset_index().rename_axis(None, axis=1)
df.head(5)

In [None]:
df = date_to_index(df,'date')
df.head(5)
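
To make the reshaping steps easier to follow, here is a compact sketch on made-up toy data. It assumes that `date_to_index` (from the repository's `file_manip` module, not shown here) converts the `date` column into a `DatetimeIndex`; the last line below stands in for it.

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "date": ["2020-07-01", "2020-07-01", "2020-07-01", "2020-07-02", "2020-07-02"],
    "query_type": ["Python", "Python", "Not-Python", "Python", "Not-Python"],
    "clicks": [5, 3, 2, 4, 1],
})

wide = (
    toy.groupby(["date", "query_type"])["clicks"].sum()  # total clicks per day and type
       .unstack()                                        # one column per query_type
       .rename_axis(None, axis=1)                        # drop the "query_type" column label
)
wide.index = pd.to_datetime(wide.index)  # assumed equivalent of date_to_index
```

The result has one row per date and one column each for `Not-Python` and `Python`, which is the shape `DataFrame.plot` needs to draw one line per column.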

In [None]:
df.plot(subplots=True,
        sharex=True,
        figsize=(6,6),
        title='Python vs. Not-Python Keywords') # with subplots, a string title is placed above the whole figure
plt.xlabel('Date')
plt.ylabel('Clicks')
plt.show()