<a href="https://colab.research.google.com/github/drohe/Twittorials/blob/master/interestingFindsMobileSerpFeatureAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Understanding Interesting Finds Features in Mobile SERPs

First, let's install Selenium and load up the Chrome webdriver!

In [1]:
%%capture
!pip install selenium
!apt-get update # to update ubuntu to correctly run apt install
!apt install chromium-chromedriver
!ln -s /usr/lib/chromium-browser/chromedriver /usr/bin
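
Because `%%capture` hides the installer output, it can be worth confirming that the driver binary actually ended up on the PATH before moving on. The following is a minimal sketch, assuming the binary is named `chromedriver`; the helper name `find_driver` is hypothetical and not part of the original notebook.

```python
import shutil

def find_driver(name="chromedriver"):
    """Return the full path to the driver binary, or None if it is not on PATH."""
    return shutil.which(name)

driver_path = find_driver()
if driver_path is None:
    print("chromedriver not found on PATH; re-run the install cell")
else:
    print("chromedriver found at", driver_path)
```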

## Mount Your Google Drive
In Google Drive, open your Colab Notebooks folder and create a new folder called "Interesting Finds".

Within that folder, upload a .csv file containing the queries you want to run, one query per row. Save the file as "queries.csv" so that it sits in your "Colab Notebooks > Interesting Finds" Drive folder.
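
If you would rather generate the file than build it in a spreadsheet, a single-column CSV can be written with Python's `csv` module. This is only a sketch: the helper name `write_queries` and the two example queries are placeholders, and the file is written to the current working directory so you can upload it to Drive afterwards.

```python
import csv

def write_queries(path, queries):
    """Write one query per row to a single-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for query in queries:
            writer.writerow([query])

# Hypothetical example queries; replace them with your own list.
write_queries("queries.csv", ["best hiking boots", "how to brew coffee"])
```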

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


Once the drive is mounted, select the folder icon in the left menu of the notebook and open the "gdrive" folder. Expand "My Drive" and then "Colab Notebooks" to confirm that your queries.csv file is saved at '/content/gdrive/My Drive/Colab Notebooks/Interesting Finds/queries.csv'. If it is somewhere else, right-click the file to copy its path and use that path in the "with open" line below.
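
The same check can be done in code rather than through the file browser. A minimal sketch, assuming the default path given above; the helper name `check_queries_file` is hypothetical.

```python
from pathlib import Path

# Path taken from the text above; adjust it if your file lives elsewhere.
QUERIES_PATH = "/content/gdrive/My Drive/Colab Notebooks/Interesting Finds/queries.csv"

def check_queries_file(path=QUERIES_PATH):
    """Return True if the queries file exists, printing a hint if it does not."""
    found = Path(path).is_file()
    if not found:
        print("queries.csv not found at:", path)
    return found

check_queries_file()
```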

## Import the libraries and Selenium

In [4]:
import csv
import os
import requests
import urllib
import sys
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from tqdm import tqdm



## Set Up Chromedriver
This section passes the appropriate options to Chromedriver to enable mobile emulation. Changing the browser size alone isn't enough to mimic the mobile environment for this feature to appear.

In [5]:
mobile_emulation = {'deviceName': 'Pixel 2'}
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("mobileEmulation", mobile_emulation)
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
# chromedriver was symlinked into /usr/bin above, so Selenium can find it on the PATH
browser = webdriver.Chrome(options=chrome_options)
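
Mobile emulation can also be described with explicit device metrics and a user agent instead of a named device. The sketch below only builds the options dictionary; the helper name `custom_mobile_emulation` is hypothetical, the user agent string is an illustrative placeholder, and whether these particular values are enough to trigger the Interesting Finds feature has not been verified here.

```python
def custom_mobile_emulation(width=360, height=640, pixel_ratio=3.0, user_agent=None):
    """Build a mobileEmulation dict from explicit metrics instead of a device name."""
    emulation = {
        "deviceMetrics": {"width": width, "height": height, "pixelRatio": pixel_ratio}
    }
    if user_agent:
        emulation["userAgent"] = user_agent
    return emulation

# Illustrative user agent string (an assumption); substitute one for the device you want to mimic.
EXAMPLE_UA = (
    "Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.91 Mobile Safari/537.36"
)
custom_emulation = custom_mobile_emulation(user_agent=EXAMPLE_UA)

# The dict would be passed the same way as the named-device version:
# chrome_options.add_experimental_option("mobileEmulation", custom_emulation)
print(custom_emulation["deviceMetrics"])
```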

## Create The Output Directory and csv
This section creates a new directory titled "Interesting Finds" for all of our outputs: the screenshots and the .csv file that records which queries included the feature and, when it was present, which links it contained.

In [None]:
outputdir = '/content/Interesting Finds/'
os.makedirs(outputdir, exist_ok=True)  # does not fail if this cell is re-run
f = open(outputdir + 'output.csv', "w+", newline="", encoding="utf-8-sig")
fw = csv.writer(f)
fw.writerow(["Query", "Interesting Finds?", "Finds Links"])  # header row

## Get The Data From queries.csv
Open and read the queries.csv file that was uploaded to your Google Drive > Colab Notebooks > Interesting Finds folder.

In [7]:
with open('/content/gdrive/My Drive/Colab Notebooks/Interesting Finds/queries.csv',newline='') as r:
    reader = csv.reader(r)
    queries = list(reader)
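
Note that `csv.reader` returns each row as a list, so `queries` is a list of one-element lists, and a blank line in the file becomes an empty list. If your file may contain blank rows or stray whitespace, a cleaning step along these lines could help. It is a sketch, and the helper name `clean_queries` is hypothetical.

```python
import csv
import io

def clean_queries(rows):
    """Flatten csv.reader rows to stripped query strings, skipping blank rows."""
    cleaned = []
    for row in rows:
        if not row:
            continue  # blank line in the file
        text = row[0].strip()
        if text:
            cleaned.append(text)
    return cleaned

# In-memory stand-in for queries.csv, including a blank line and extra spaces.
sample = io.StringIO("best hiking boots\n\n  how to brew coffee \n")
print(clean_queries(csv.reader(sample)))  # ['best hiking boots', 'how to brew coffee']
```

If you adopt this, each item is already a string, so the loop would use `query` directly instead of `query[0]`.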

## Open Google.com via Chrome Driver (named browser)
In addition to fetching google.com, we will set our browser window size so that we can use that size for the math used to capture the full page in screenshots later.

In [9]:
browser.get('https://www.google.com')
browser.set_window_size(360,640)
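
To make the screenshot math concrete: after one screenshot at the top of the page, the page is scrolled in fixed steps and captured again at each position. The sketch below computes those scroll positions for a given page height. The helper name `scroll_offsets` is hypothetical, the 500-pixel step matches the one used in the screenshot loop later in this notebook, and it rounds up with `math.ceil` (the notebook's loop uses `round`) so that a short final section is not skipped.

```python
import math

def scroll_offsets(page_height, step=500):
    """Return the vertical scroll positions needed after the first screenshot."""
    remaining = max(page_height - step, 0)
    clips = math.ceil(remaining / step)
    return [step * (i + 1) for i in range(clips)]

print(scroll_offsets(2300))  # [500, 1000, 1500, 2000]
```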

## Loop Through Each Query
With the browser open to google.com, we will input each query from our "queries.csv", look on the SERP for classes that signify an Interesting Finds element is present, and pull the links from within it if so.

**NOTE** Scraping data from Google search results violates their Terms of Service. If you'd rather not test that, you can remove everything between the "Interesting Finds" and "Start Screenshots" comments below to ONLY pull screenshots of the search results page (screenshots are okay as long as you don't alter them in any way before sharing them publicly). You'll then have to review all of your image files to determine whether the feature is present, rather than filtering the data in output.csv.

In [None]:
### BEGIN THE LOOP THROUGH EACH QUERY OF YOUR QUERIES ARRAY
for query in tqdm(queries):
  query = str(query[0])
  print('running ' + query)
  e = browser.find_element(By.NAME, "q") #Find the input field to have Selenium type in the query
  e.clear()
  e.send_keys(query) #Selenium types the query into the search box
  e = browser.find_element(By.NAME, "q") #Re-find the field in case the page re-rendered it
  e.send_keys(Keys.RETURN)
  #Wait up to 10 seconds for the results page to load
  WebDriverWait(browser, 10).until(
      EC.presence_of_element_located((By.ID, "botstuff"))
  )
  body = browser.find_elements(By.TAG_NAME, 'body')[0] #This is used for screenshots further down

#Interesting Finds
  findsCode = browser.find_elements(By.CSS_SELECTOR, 'g-card.I7zR5')
  if findsCode:
    interestingFinds = True
    findsLinks = []
    for card in findsCode:
        links = card.find_elements(By.CSS_SELECTOR, 'div.mnr-c div div div a')
        for link in links:
            href = link.get_attribute('href')
            classes = link.get_attribute('class') or ''
            amp = 'amp_r' in classes #AMP results carry the amp_r class
            findsLinks.append({'link': href, 'amp': amp})
  else:
    interestingFinds = False
    findsLinks = ""

  fw.writerow([query,interestingFinds,findsLinks]) #create row in CSV


### START SCREENSHOTS
  imageName = query.replace(' ','-')
  browser.save_screenshot(outputdir + imageName + '.png') #First screenshot: top of the page
  totalLength = body.size['height']
  lastClip = totalLength - 500
  clipsNeeded = round(lastClip/500) #Number of additional 500px scrolls needed to cover the page
  i = 0
  clipHeight = 500
  while i < clipsNeeded:
    browser.execute_script('window.scrollTo(0,arguments[0])',clipHeight) #Scroll vertically only
    clipNum = str(i)
    browser.save_screenshot(outputdir + imageName + clipNum + '.png')
    clipHeight = clipHeight + 500
    i += 1
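
The loop above writes the list of link records straight into a CSV cell, where it is stored as its Python string representation. If you plan to parse the links later, serializing them as JSON may be easier to work with. The sketch below shows the idea with plain values rather than Selenium elements; `describe_link` and `links_to_cell` are hypothetical helpers, the URLs are placeholders, and the AMP check matches `amp_r` as a whole class token, which is slightly stricter than the substring test in the loop.

```python
import json

def describe_link(href, class_attr):
    """Build a link record: the URL plus whether 'amp_r' is one of its classes."""
    classes = (class_attr or "").split()
    return {"link": href, "amp": "amp_r" in classes}

def links_to_cell(links):
    """Serialize link records as JSON so the CSV cell can be parsed back later."""
    return json.dumps(links)

# Placeholder values standing in for link.get_attribute('href') / ('class').
records = [
    describe_link("https://example.com/a", "amp_r other"),
    describe_link("https://example.com/b", None),
]
cell = links_to_cell(records)
print(cell)
print(json.loads(cell) == records)  # True: the cell round-trips
```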

## Zip Up and Download The Screenshots and output.csv Files

In [None]:
f.close()
print('ready to zip')
!zip -r /content/interestingFinds.zip '/content/Interesting Finds/'
from google.colab import files
files.download('/content/interestingFinds.zip')
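
Once the run is finished, output.csv can be filtered to list only the queries that showed the feature. A minimal sketch, assuming the column order written by this notebook (query first, the True/False flag second); the helper name `queries_with_feature` is hypothetical, and the sample data below is made up for illustration.

```python
import csv
import io

def queries_with_feature(handle):
    """Return queries whose second column (the feature flag) is the string 'True'."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if len(row) > 1 and row[1] == "True"]

# Made-up sample rows in the same column order as output.csv.
sample = io.StringIO(
    "Query,Interesting Finds?,Finds Links\n"
    "hiking boots,True,[]\n"
    "brew coffee,False,\n"
)
print(queries_with_feature(sample))  # ['hiking boots']

# With the real file, something like this should work:
# with open('/content/Interesting Finds/output.csv', newline='', encoding='utf-8-sig') as handle:
#     print(queries_with_feature(handle))
```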