### Good Guides ###
- https://stackoverflow.com/questions/46361494/how-to-get-the-localstorage-with-python-and-selenium-webdriver
- https://www.w3schools.com/python/python_datetime.asp
- https://stackoverflow.com/questions/4196971/how-to-get-the-html-tag-html-with-javascript-jquery
- https://stackoverflow.com/questions/34562095/scrollintoview-vs-movetoelement
- https://stackoverflow.com/questions/41744368/scrolling-to-element-using-webdriver
- https://selenium-python.readthedocs.io/locating-elements.html
- https://www.crummy.com/software/BeautifulSoup/bs4/doc/

### Inspiration For Future Features 

- https://github.com/socialbotspy/LinkedinPy
- https://github.com/ZiaUrR3hman/LinkedSocialToolkit
- https://github.com/manjurulhoque/python-linkedin-bot/blob/master/main.py      

In [6]:
# Module dependencies: standard library
import datetime
import json
import os
import pickle
import re
import sys
import time
import urllib
from collections import defaultdict
from random import randint, uniform
from time import sleep

# Third-party libraries
import pandas as pd
import selenium
from bs4 import BeautifulSoup

# Selenium dependencies
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# 1. Let's get the data set that we need to scrape

In [8]:
df = pd.read_csv('website_data.csv')

In [9]:
df.head(15)

| Unnamed: 0 | Site | Domain Rating | Total Backlinks | Total Keywords | Total Traffic |
|---|---|---|---|---|---|
| 0 | ferryads.com | 1.0 | 44 | 0 | 0 |
| 1 | gotlice.nyc | 0.0 | 2 | 0 | 0 |
| 2 | gradoassociates.com | 0.0 | 5 | 0 | 0 |
| 3 | kemenycpa.com | 0.0 | 5 | 0 | 0 |
| 4 | rainbowcct.com | 0.0 | 10 | 0 | 0 |
| 5 | selectpop.com | 0.0 | 12 | 0 | 0 |
| 6 | framboisecatering.com#home |  | 0 | 0 | 0 |
| 7 | lionsprideleadership.com | 8.0 | 55 | 1 | 0 |
| 8 | prcision.com | 7.0 | 170 | 1 | 0 |
| 9 | postbot.us | 6.0 | 113 | 1 | 0 |


------------------------------------------------------------------------------------------------------------------------

# 2. Let's study a Google Search Query so that we can understand how to structure our target URL:

In [16]:
single_keyword_url = 'https://www.google.com/search?q=vidioh&oq=vidioh+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'

In [13]:
single_word_query = 'https://www.google.com/search?q={0}&oq={1}+&aqs=chrome..69i57j69i60j69i61l2j69i60l2.3654j0j1&sourceid=chrome&ie=UTF-8'.format('vidioh', 'vidioh')

------------------------------------------------------------------------------------------------------------------------

In [17]:
multiple_keyword_url = 'https://www.google.com/search?q=video+brochures&oq=video+brochures&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'

In [19]:
multiple_word_query = 'https://www.google.com/search?q={0}&oq={1}&aqs=chrome..69i57j69i60j69i61.1545j0j1&sourceid=chrome&ie=UTF-8'.format('video+brochures', 'video+brochures')
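Most of the parameters in the URLs above (`oq`, `aqs`, `sourceid`, `ie`) appear to describe the Chrome session that produced them; the search terms themselves live in `q`, where spaces become `+`. A minimal sketch, assuming Google answers a URL that carries only the `q` parameter, is to let `urllib.parse.urlencode` do the encoding instead of formatting the string by hand. `build_search_url` is a hypothetical helper, not part of the notebook.

```python
from urllib.parse import urlencode


def build_search_url(query, domain="www.google.com"):
    """Build a Google search URL carrying only the `q` parameter (hypothetical helper)."""
    # urlencode applies quote_plus: spaces become '+', and '&', '#', etc. are escaped
    return "https://{}/search?{}".format(domain, urlencode({"q": query}))


print(build_search_url("vidioh"))           # https://www.google.com/search?q=vidioh
print(build_search_url("video brochures"))  # https://www.google.com/search?q=video+brochures
```

The `domain` argument covers the `www.google.co.uk` domain that the scraping loop below uses.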

------------------------------------------------------------------------------------------------------------------------

# 3. Identify a GMB knowledge panel div 

If the results page contains a div with the class `knowledge-panel`, we can assume that the brand query triggers a Google My Business (GMB) listing. Brands whose query does not trigger one are useful leads for local digital marketing prospecting, because those businesses have yet to invest in a Google My Business page.
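The rule above can also be checked against raw HTML rather than only in the live browser. The sketch below uses the standard library's `html.parser` (the notebook imports BeautifulSoup, but this check needs no extra dependency) and assumes, as the notebook does, that Google marks the panel with a `knowledge-panel` class. Google changes its markup often, so treat that class name as an assumption to re-verify. `KnowledgePanelFinder` and `has_knowledge_panel` are hypothetical names.

```python
from html.parser import HTMLParser


class KnowledgePanelFinder(HTMLParser):
    """Records whether any <div> carries the target CSS class (hypothetical helper)."""

    def __init__(self, target_class="knowledge-panel"):
        super().__init__()
        self.target_class = target_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "div" or self.found:
            return
        for name, value in attrs:
            # A class attribute can hold several space-separated class names;
            # match whole tokens only, so 'knowledge-panel-lite' does not count
            if name == "class" and value and self.target_class in value.split():
                self.found = True


def has_knowledge_panel(html):
    finder = KnowledgePanelFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Passing `driver.page_source` to `has_knowledge_panel` after `driver.get(...)` would give a True/False without `WebDriverWait`, though it only sees elements that exist at the moment the source is read.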

------------------------------------------------------------------------------------------------------------------------

# 4. Extract specific queries for every brand

Next, we strip the website extension (such as .org, .co.uk or .com) by keeping only the text before the first `.` character. This leaves a brand name for each site that we can use in a custom Google search with Selenium.

In [47]:
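# Keep the text before the first '.'; split() returns the whole string when there is no dot,
# whereas slicing to x.find('.') would drop the last character (find returns -1)
df['Queries'] = df['Site'].apply(lambda x: x.split('.')[0])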
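A slightly more defensive version of the same extraction can be written as a named function. This is a minimal sketch: it also strips a leading `www.`, which is an assumption since none of the rows shown have one. `brand_from_site` is a hypothetical name.

```python
def brand_from_site(site):
    """Return the part of a domain before its first '.', ignoring a leading 'www.'."""
    site = site.strip()
    # Assumption: a 'www.' prefix is not part of the brand name
    if site.startswith("www."):
        site = site[len("www."):]
    # split('.') keeps the whole string intact when there is no dot
    return site.split(".")[0]


print(brand_from_site("framboisecatering.com#home"))  # framboisecatering
```

`df['Queries'] = df['Site'].apply(brand_from_site)` would drop it into the notebook.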

# 5. Scrape the Google search engine results page with a big timer between searches

In [70]:
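# Selenium 4 passes the chromedriver path through a Service object
# (executable_path was deprecated in Selenium 4 and removed in later 4.x releases)
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service('chromedrivers/chromedriver'))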

In [68]:
# urls = ['ferryads', 'matthewfuneralhome']  # I tested the method on two search queries before running it on the full list of 300+ sites.

In [71]:
knowledge_panel_results = []

# Google URL copied from a browser session; both {} placeholders take the brand query
query_string = 'https://www.google.co.uk/search?source=hp&ei=_aegXcHTIsOVsAfNpLf4Bg&q={}&oq={}&gs_l=psy-ab.3..0i131j0l3j0i131j0l3j0i131j0.302585.302825..302898...0.0..0.46.172.4......0....1..gws-wiz.....0.GfS7vSMN0Qs&ved=0ahUKEwiBxtPaypTlAhXDCuwKHU3SDW8Q4dUDCAg&uact=5'

for brand in df['Queries']:
    driver.get(query_string.format(brand, brand))

    try:
        # Wait up to 10 seconds for a div with the knowledge-panel class to appear
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "knowledge-panel"))
        )
        knowledge_panel_results.append(True)
    except TimeoutException:
        # No knowledge panel appeared within the timeout
        knowledge_panel_results.append(False)

    # Random 10-29 second pause between searches to space out requests to Google
    sleep(randint(10, 29))
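The loop above mixes three concerns: fetching a page, checking for the panel, and pausing. A sketch that separates them, assuming you pass in your own `check` callable (for example one wrapping `driver.get` and `WebDriverWait` that returns True or False), makes the 10 to 29 second timer testable without a browser and keeps exactly one result per brand, in order, so the list lines up with `df` row for row. `scan_brands` and its parameters are hypothetical.

```python
import random
import time


def scan_brands(brands, check, min_delay=10, max_delay=29, sleep=time.sleep, rng=None):
    """Run check(brand) for each brand, pausing a random whole number of seconds in between.

    Returns a list of booleans in the same order as `brands`.
    """
    if rng is None:
        rng = random.Random()
    brands = list(brands)
    results = []
    for i, brand in enumerate(brands):
        # `check` is expected to return True/False and handle its own timeouts
        results.append(bool(check(brand)))
        # randint is inclusive on both ends; skip the wait after the final search
        if i < len(brands) - 1:
            sleep(rng.randint(min_delay, max_delay))
    return results


# Usage with a fake check and a sleep that only records the delays
delays = []
print(scan_brands(["ferryads", "gotlice"], check=lambda b: b == "gotlice", sleep=delays.append))
# [False, True]
```

Injecting `sleep` and `rng` is the design choice that lets the timer be checked in a unit test while the real run still waits.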

In [72]:
len(knowledge_panel_results)

362

In [73]:
df.shape

(362, 6)

In [74]:
df['Knowledge_Panel_Results'] = knowledge_panel_results

In [77]:
df.head(12)

| Unnamed: 0 | Site | Domain Rating | Total Backlinks | Total Keywords | Total Traffic | Queries | Knowledge_Panel_Results |
|---|---|---|---|---|---|---|---|
| 0 | ferryads.com | 1.0 | 44 | 0 | 0 | ferryads | False |
| 1 | gotlice.nyc | 0.0 | 2 | 0 | 0 | gotlice | True |
| 2 | gradoassociates.com | 0.0 | 5 | 0 | 0 | gradoassociates | False |
| 3 | kemenycpa.com | 0.0 | 5 | 0 | 0 | kemenycpa | False |
| 4 | rainbowcct.com | 0.0 | 10 | 0 | 0 | rainbowcct | False |
| 5 | selectpop.com | 0.0 | 12 | 0 | 0 | selectpop | False |
| 6 | framboisecatering.com#home |  | 0 | 0 | 0 | framboisecatering | True |
| 7 | lionsprideleadership.com | 8.0 | 55 | 1 | 0 | lionsprideleadership | False |
| 8 | prcision.com | 7.0 | 170 | 1 | 0 | prcision | True |
| 9 | postbot.us | 6.0 | 113 | 1 | 0 | postbot | False |


In [76]:
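# index=False avoids writing the row index as an extra unnamed column (like 'Unnamed: 0' above)
df.to_csv('results.csv', index=False)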
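Section 3 framed a missing knowledge panel as the prospecting signal, so the useful output of `results.csv` is the list of sites where `Knowledge_Panel_Results` is `False`. A minimal sketch using only the standard library's `csv` module, assuming the column names written by `df.to_csv` above (pandas writes booleans as the text `True`/`False`), could read them back like this. `find_prospects` is a hypothetical helper.

```python
import csv


def find_prospects(lines):
    """Return Site values whose Knowledge_Panel_Results column reads 'False'.

    `lines` is any iterable of CSV text lines, such as an open file.
    """
    reader = csv.DictReader(lines)
    return [row["Site"] for row in reader
            if row["Knowledge_Panel_Results"].strip() == "False"]


# Usage, assuming results.csv was written by the cell above:
# with open("results.csv", newline="") as f:
#     print(find_prospects(f))
```

Because rows are looked up by column name, the function works whether or not the CSV was saved with its index column.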