
# What SERP features does my competitor have that I don't have?

We will learn:

1. How to pull data programmatically from SEMrush
2. How to analyze the data
3. How to export the results to a CSV for further analysis





### Extract data from SEMrush


The SEMrush API reference is at https://www.semrush.com/api-analytics/.

You can find your API key at https://www.semrush.com/api-use/.

Colab forms let you feed input values into the notebook; see https://colab.research.google.com/notebooks/forms.ipynb.
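Each API call in this notebook is a single GET request to https://api.semrush.com/ whose query string carries the report type, your key, and the requested columns. The sketch below only composes that URL, without sending anything, so you can see what the request looks like. The helper name build_domain_organic_url is hypothetical; its parameters mirror the ones get_serp_features sends below, and display_limit is set low here purely for illustration.

```python
from urllib.parse import urlencode


def build_domain_organic_url(domain, api_key, database="us",
                             export_columns="Fl", display_limit=10):
    """Hypothetical helper: compose a SEMrush domain_organic request URL.

    The parameter names are the ones this notebook sends in get_serp_features.
    """
    params = {
        "type": "domain_organic",          # report: organic keywords of a domain
        "key": api_key,                    # your API key
        "display_limit": display_limit,    # maximum number of rows returned
        "export_columns": export_columns,  # "Fl" is the SERP features column
        "domain": domain,
        "database": database,              # regional database, e.g. "us"
    }
    # urlencode percent-encodes each value and joins the pairs with "&"
    return "https://api.semrush.com/?" + urlencode(params)


url = build_domain_organic_url("advanceautoparts.com", "YOUR_API_KEY")
print(url)
# https://api.semrush.com/?type=domain_organic&key=YOUR_API_KEY&display_limit=10&export_columns=Fl&domain=advanceautoparts.com&database=us
```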

In [0]:
#@title Provide input values


domain = "advanceautoparts.com" #@param {type:"string"}

csv_file = "advanceautoparts.csv" #@param {type:"string"}

key = "TYPE SEMRush API key here" #@param {type:"string"}


### Leverage existing code

We will reuse my SEMrush code from this article, adapted to request only the SERP features column: https://searchengineland.com/brands-can-better-understand-users-on-third-party-sites-by-using-a-keyword-overlap-analysis-316157

In [0]:
import requests
from urllib.parse import urlencode, urlparse, urlunparse
import pandas as pd

# The SERP features column is "Fl" (see https://www.semrush.com/api-analytics/#columns)
def get_serp_features(domain, database="us", export_columns="Fl", display_limit=10000):
  """Return a DataFrame with the SERP features of the domain's organic keywords, or None on failure.

  Uses the global `key` set in the input form above.
  """
  url_params = {"type": "domain_organic",        # organic keywords report for a domain
                "key": key,                      # API key from the input form
                "display_limit": display_limit,  # maximum number of rows to return
                "export_columns": export_columns,
                "domain": domain,
                "database": database}            # regional database, e.g. "us"

  api_url = "https://api.semrush.com/"

  # Build the full request URL with the parameters as its query string
  qs = urlencode(url_params)
  u = urlparse(api_url)
  api_request = urlunparse((u.scheme, u.netloc, u.path, u.params, qs, u.fragment))

  r = requests.get(api_request)

  if r.status_code == 200:
    # The response is CSV-like: rows separated by "\r\n", columns by ";"
    results = r.text.split("\r\n")
    headers = results[0].split(";")              # first row holds the column headers
    table = [x.split(";") for x in results[1:]]  # remaining rows as a list of lists

    # dropna() removes rows that pandas padded with None because they had too few fields.
    # Note: with a single column, the empty line after the final "\r\n" is kept as an empty-string row.
    df = pd.DataFrame(table, columns=headers).dropna()

    return df

  else:
    print("API call failed with code {code}".format(code=r.status_code))

    return None
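With a single "Fl" column, a keyword without SERP features arrives as a blank line, so blank lines cannot simply be dropped; only the empty string produced by the final line break is spurious (the Length: 10001 output further down, for display_limit=10000, suggests the body ends with a line break). The sketch below is a hypothetical variant of the parsing step, parse_semrush_response, that drops only that final empty string. It assumes CRLF row separators and a trailing CRLF, and it runs on a made-up sample instead of a live API response.

```python
import pandas as pd


def parse_semrush_response(text):
    """Hypothetical parser for a semicolon-separated SEMrush response body.

    Assumes rows are separated by CRLF and the body ends with one trailing
    CRLF. Blank rows in the middle are kept: with a single "Fl" column, a
    blank row is a keyword with no SERP features.
    """
    lines = text.split("\r\n")
    # Drop only the empty string created by the final line break
    if lines and lines[-1] == "":
        lines = lines[:-1]
    headers = lines[0].split(";")
    table = [line.split(";") for line in lines[1:]]
    return pd.DataFrame(table, columns=headers)


# Hypothetical sample body: three keywords, the third with no SERP features
sample = "SERP Features\r\n1,3,5,6\r\n3,6\r\n\r\n"
parsed = parse_semrush_response(sample)
print(len(parsed))                        # 3
print(parsed["SERP Features"].tolist())   # ['1,3,5,6', '3,6', '']
```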
  

### Test that the function works

In [0]:
df = get_serp_features(domain)
df.head()

   SERP Features
0        1,3,5,6
1        1,3,5,6
2            3,6
3      1,3,4,5,6
4    0,1,3,4,5,6


### Scrape the SERP features reference
To map the numeric IDs to feature names, we need the SERP features reference at https://www.semrush.com/api-analytics/#serp_features.


I extracted the names from that page with the JavaScript code at https://gist.github.com/hamletbatista/b6424dac3801befbef56604a291cf2e3

In [0]:
# Source: https://www.semrush.com/api-analytics/#serp_features

# CSS selector for the first feature name in the reference table:
# #serp_features > div.api-body-table > table > tbody > tr:nth-child(1) > td:nth-child(2)

# Dropping :nth-child(1) from tr selects the name in every row:
# #serp_features > div.api-body-table > table > tbody > tr > td:nth-child(2)

# Feature names in reference-table order; the code below uses each name's position as its feature ID
serp_index=["Instant answer", "Knowledge panel", "Carousel", "Local pack", "Top stories", "Image pack", "Site links", "Reviews", "Tweet", "Video", "Featured video", "Featured Snippet", "AMP", "Image", "AdWords top", "AdWords bottom", "Shopping ads", "Hotels Pack", "Jobs search", "Featured images", "Video Carousel", "People also ask"]


In [0]:
example= df["SERP Features"].iloc[0]
example

'1,3,5,6'

### Map feature IDs to their names
Next, we split this string into a list of IDs and map each ID to its name.

In [0]:
example_list = example.split(",")
example_list

['1', '3', '5', '6']

In [0]:
for feature in example_list:
  print(feature)

1
3
5
6


In [0]:
# This fails with a TypeError: the IDs are strings, and list indices must be integers
#for feature in example_list:
#  print(serp_index[feature])

In [0]:
for feature in example_list:
  print(serp_index[int(feature)])

Knowledge panel
Local pack
Image pack
Site links


### Generalize the solution
Now we wrap these steps in a reusable function.

In [0]:
def get_feature_names(indices):
  """Map a comma-separated string of SERP feature IDs (e.g. "1,3,5,6") to a list of names.

  Uses the serp_index list defined above.
  """
  index_list = indices.split(",")

  feature_names = list()

  for i in index_list:
    # Skip empty strings, which come from keywords without SERP features
    if len(i) > 0:
      feature_names.append(serp_index[int(i)])

  return feature_names
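serp_index has 22 names, so if the API ever returns an ID without an entry in the list (for example, if SEMrush has added features since the list was scraped), serp_index[int(i)] raises IndexError, and a NaN value would fail on .split. The sketch below is a hypothetical, more defensive variant (get_feature_names_safe and SERP_NAMES are illustrative names, not part of the original notebook) that labels unknown IDs instead of failing.

```python
# Copy of the notebook's serp_index list; a name's position is its feature ID
SERP_NAMES = ["Instant answer", "Knowledge panel", "Carousel", "Local pack",
              "Top stories", "Image pack", "Site links", "Reviews", "Tweet",
              "Video", "Featured video", "Featured Snippet", "AMP", "Image",
              "AdWords top", "AdWords bottom", "Shopping ads", "Hotels Pack",
              "Jobs search", "Featured images", "Video Carousel",
              "People also ask"]


def get_feature_names_safe(indices, names=SERP_NAMES):
    """Hypothetical variant of get_feature_names that tolerates bad input.

    - non-string values (e.g. NaN) return an empty list
    - empty IDs (keywords without SERP features) are skipped
    - IDs with no entry in `names` become "Unknown (<id>)" instead of raising IndexError
    """
    if not isinstance(indices, str):
        return []
    feature_names = []
    for token in indices.split(","):
        token = token.strip()
        if not token:
            continue
        i = int(token)
        feature_names.append(names[i] if 0 <= i < len(names) else f"Unknown ({i})")
    return feature_names


print(get_feature_names_safe("1,3,5,6"))  # ['Knowledge panel', 'Local pack', 'Image pack', 'Site links']
print(get_feature_names_safe("6,25"))     # ['Site links', 'Unknown (25)']
print(get_feature_names_safe(""))         # []
```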

In [0]:
example_names = get_feature_names(example)
example_names

['Knowledge panel', 'Local pack', 'Image pack', 'Site links']

We can join this list back into a comma-separated string:

In [0]:
",".join(example_names)

'Knowledge panel,Local pack,Image pack,Site links'

### Update the dataframe with feature names
Now we apply get_feature_names to every row of the SERP Features column. First we preview the result, then we store it in a new column.

In [0]:
df["SERP Features"].apply(lambda x: ",".join(get_feature_names(x)) )

0         Knowledge panel,Local pack,Image pack,Site links
1         Knowledge panel,Local pack,Image pack,Site links
2                                    Local pack,Site links
3        Knowledge panel,Local pack,Top stories,Image p...
4        Instant answer,Knowledge panel,Local pack,Top ...
                               ...                        
9996                                            Image pack
9997                                                      
9998                            Knowledge panel,Image pack
9999                                            Image pack
10000                                                     
Name: SERP Features, Length: 10001, dtype: object

In [0]:
df["SERP Feature Names"] = df["SERP Features"].apply(lambda x: ",".join(get_feature_names(x)) )

In [0]:
df.head()

   SERP Features                                SERP Feature Names
0        1,3,5,6  Knowledge panel,Local pack,Image pack,Site links
1        1,3,5,6  Knowledge panel,Local pack,Image pack,Site links
2            3,6                             Local pack,Site links
3      1,3,4,5,6  Knowledge panel,Local pack,Top stories,Image p...
4    0,1,3,4,5,6  Instant answer,Knowledge panel,Local pack,Top ...
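The SERP Feature Names column holds whole combinations of features. To count how many keywords trigger each individual feature, one option is to split the names on commas and explode them into one row per (keyword, feature) pair. A minimal sketch on a small, made-up stand-in for df; df_demo is hypothetical, and on the real data you would start from df["SERP Feature Names"].

```python
import pandas as pd

# Hypothetical stand-in for df["SERP Feature Names"]
df_demo = pd.DataFrame({"SERP Feature Names": [
    "Knowledge panel,Local pack,Image pack,Site links",
    "Local pack,Site links",
    "Image pack",
    "",  # keyword with no SERP features
]})

feature_counts = (
    df_demo["SERP Feature Names"]
    .str.split(",")          # "a,b" -> ["a", "b"]; "" -> [""]
    .explode()               # one row per (keyword, feature) pair
    .loc[lambda s: s != ""]  # drop the placeholder for keywords without features
    .value_counts()          # keywords per individual feature
)
print(feature_counts)
```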


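### Simplify to get only the SERP feature names
Grouping by the names column counts how many rows share each combination of SERP features. The first group, with an empty name, holds the rows that have no SERP features.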

In [0]:
df_consolidated = df.groupby("SERP Feature Names").count()
df_consolidated

                                      SERP Features
SERP Feature Names
                                               1399
Carousel                                          4
Carousel,Image pack                              11
Carousel,Image pack,Featured Snippet              1
Carousel,Top stories,Image pack                   1
Featured Snippet                                 56
Image pack                                     3366
Image pack,Featured Snippet                      54
Image pack,Reviews                                8
Image pack,Site links                            89
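The groupby result is ordered alphabetically by the group name. To see the most common combinations first, you could count rows with size(), which gives the same counts as count() here because the column has no missing values, and sort in descending order. A sketch on hypothetical demo data; the column name Keywords is an illustrative choice.

```python
import pandas as pd

# Hypothetical stand-in for df after adding "SERP Feature Names"
df_demo = pd.DataFrame({
    "SERP Features": ["1,3,5,6", "1,3,5,6", "3,6", "5", ""],
    "SERP Feature Names": [
        "Knowledge panel,Local pack,Image pack,Site links",
        "Knowledge panel,Local pack,Image pack,Site links",
        "Local pack,Site links",
        "Image pack",
        "",
    ],
})

top_combinations = (
    df_demo.groupby("SERP Feature Names")
    .size()                        # rows (keywords) per feature combination
    .sort_values(ascending=False)  # most common combinations first
    .rename("Keywords")
)
print(top_combinations)
```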


### Download analysis to a CSV


In [0]:
df_consolidated.to_csv(csv_file)

In [0]:
!ls

advanceautoparts.csv  autozone.csv  sample_data


In [0]:
from google.colab import files

files.download(csv_file)
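The title question needs one more step: comparing two domains. The directory listing above shows CSV files for two domains, and to_csv writes the group names as the first column, labeled SERP Feature Names. A minimal sketch of the comparison, assuming you load each domain's SERP Feature Names values (for example with pd.read_csv); the helper features_in and the two Series below are hypothetical stand-ins.

```python
import pandas as pd


def features_in(names_series):
    """Hypothetical helper: the set of individual SERP features in a names column."""
    exploded = names_series.dropna().str.split(",").explode()
    return set(exploded[exploded != ""])  # ignore rows without features


# Hypothetical "SERP Feature Names" values for two domains
mine = pd.Series(["Image pack", "Local pack,Site links", ""])
competitor = pd.Series(["Image pack,Featured Snippet", "Carousel,Image pack", "Local pack"])

# Features the competitor ranks with that my domain does not
missing = features_in(competitor) - features_in(mine)
print(sorted(missing))  # ['Carousel', 'Featured Snippet']
```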