# Data Retrieval Direct from Source

## Installation
Open a conda terminal and install db.py:

pip install db.py

The db.py docs list a Python 2-only module as the MySQL driver. Use the following package instead:

pip install mysqlclient

There are a few other options, but this is the only one I have seen working.

In [18]:
from db import DB
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Establishing connection with the database
Store your database credentials in a local profile under our shared profile name, mariadb_ro. That way, when we share notebooks, each machine automatically loads its owner's credentials.

In [19]:
db = DB(username="imama_ro", password="NYCnL5jJ4xSqcWU/6JUJog==", hostname="db-dw.lvh.systems",dbtype="mysql")
db.save_credentials(profile="mariadb_ro")

You only need to run this once. After that, execute the following code to load your profile:

In [20]:
db = DB(profile="mariadb_ro")
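The cell that saves the profile contains a plain-text password, which is easy to leak when notebooks are shared. One possible alternative, sketched here under assumptions, is to read the values from environment variables before saving the profile. The function name `credentials_from_env` and the `MARIADB_RO_*` variable names are hypothetical and not part of db.py.

```python
import os


def credentials_from_env(prefix="MARIADB_RO"):
    """Collect connection settings from environment variables.

    The variable names (e.g. MARIADB_RO_USERNAME) are an assumption made
    for this sketch; pick whatever naming your team prefers.
    """
    keys = ("username", "password", "hostname")
    missing = [f"{prefix}_{k.upper()}" for k in keys
               if f"{prefix}_{k.upper()}" not in os.environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")

    creds = {k: os.environ[f"{prefix}_{k.upper()}"] for k in keys}
    creds["dbtype"] = "mysql"  # same dbtype as in the cell above
    return creds
```

The returned dictionary uses the same keyword names as the `DB(...)` call above, so it could be passed as `DB(**credentials_from_env())` before calling `save_credentials(profile="mariadb_ro")`.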

# Finding Functions

In [21]:
#db.find_table("A*")

# Querying the Database

Query results are returned as a pandas DataFrame.
You can run a query either from a file or from a string.

In [22]:
#from file
#df_from_file = db.query_from_file("myscript.sql")

# Downloading the Search Console data first

Note: The notebook is easier to read when the query string is stored separately from the call that runs it. Wrapping the query in triple quotes reduces errors, because the SQL can span several lines and contain single quotes without escaping.
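A minimal sketch of the idea in the note: keeping the SQL in a triple-quoted template also makes it straightforward to vary the date window without editing the query text. The names `QUERY_TEMPLATE` and `build_query` are hypothetical, and the shortened query here is only an illustration, not the full query used in this notebook.

```python
# Triple quotes allow a multi-line query that contains single quotes.
QUERY_TEMPLATE = """
select page, query, sum(clicks) as 'clicks'
from dw.google_search_performance
where _date >= date_sub(curdate(), interval {days} day)
group by 1, 2
"""


def build_query(days):
    """Return the query for the last `days` days.

    Only a positive integer is accepted, so arbitrary text can never be
    formatted into the SQL string.
    """
    if isinstance(days, bool) or not isinstance(days, int) or days <= 0:
        raise ValueError("days must be a positive integer")
    return QUERY_TEMPLATE.format(days=days)
```

The resulting string can then be passed to `db.query(...)` in the same way as a hand-written query.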

In [8]:
q1 = """
select google_search_performance.page as 'page',
       google_search_performance.query as 'query',
       sum(clicks) as 'clicks',
       sum(impressions) as 'impressions',
       max(impressions) as 'max_impressions',
       sum(clicks) / sum(impressions) as 'ctr',
       avg(google_search_performance.position) as 'position',
       sum(impressions * position) / sum(impressions) as 'weighted_position'
from dw.google_search_performance
where _date >= date_sub(curdate(), interval 30 day)
  and query not like '%love%'
group by 1, 2
order by impressions desc;
"""

In [12]:
df_query = db.query(q1)

In [23]:
df_query

Unnamed: 0,page,query,clicks,impressions,max_impressions,ctr,position,weighted_position
0,https://www.loveholidays.com/holidays/beach-ho...,on the beach,1407.0,456300.0,15650.0,0.00308,7.352495,3.930509
1,https://www.loveholidays.com/holidays/thomson-...,thomson,525.0,131082.0,3588.0,0.00401,9.536455,6.402396
2,https://www.loveholidays.com/holidays/cheap-ho...,cheap holidays,3475.0,125233.0,4311.0,0.02775,7.674504,5.438362
3,https://www.loveholidays.com/holidays/,holidays,1621.0,108718.0,3567.0,0.01491,10.462910,8.079742
4,https://www.loveholidays.com/holidays/first-ch...,first choice,77.0,97986.0,5409.0,0.00079,9.497444,9.054521
...,...,...,...,...,...,...,...,...
143148,https://www.loveholidays.com/holidays/st-lucia...,cheap all inclusive holidays to st lucia,1.0,1.0,1.0,1.00000,9.000000,9.000000
143149,https://www.loveholidays.com/holidays/gran-can...,parque mar gran canaria,1.0,1.0,1.0,1.00000,16.000000,16.000000
143150,https://www.loveholidays.com/holidays/corfu/an...,angela beach resort corfu,1.0,1.0,1.0,1.00000,11.000000,11.000000
143151,https://www.loveholidays.com/holidays/tenerife...,city break tenerife 2020,1.0,1.0,1.0,1.00000,4.000000,4.000000
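The query reports both a plain `avg(position)` and an impression-weighted position, and the two can differ a lot (see row 0 in the output). A small worked example shows why; the numbers below are made up for illustration and are not taken from the database.

```python
# Two hypothetical daily rows for the same page/query pair.
rows = [
    {"impressions": 900, "position": 2.0},   # busy day, ranked high
    {"impressions": 100, "position": 12.0},  # quiet day, ranked low
]

# Plain average: every row counts equally, like avg(position).
simple_avg = sum(r["position"] for r in rows) / len(rows)

# Weighted average: each row counts in proportion to its impressions,
# like sum(impressions * position) / sum(impressions).
total_impressions = sum(r["impressions"] for r in rows)
weighted_position = (
    sum(r["impressions"] * r["position"] for r in rows) / total_impressions
)

print(simple_avg, weighted_position)
```

Because most impressions happened while the page ranked at position 2, the weighted value sits much closer to 2 than the plain average does.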


# Screaming Frog

In [25]:
# to_csv returns None when given a file path, so there is nothing to assign
df_query.to_csv("query_output1.csv")

In [26]:
df_meta = pd.read_excel("list_mode_export_details.xlsx")

In [27]:
df_meta

Unnamed: 0,page,Status Code,Title 1,Title 1 Length,Title 1 Pixel Width,Meta Description 1,Meta Description 1 Length,Meta Description 1 Pixel Width
0,https://www.loveholidays.com/holidays/beach-ho...,200,Beach Holidays 2019 / 2020 | Holidays from £81...,67,613,Picking the right beach holiday is an incredib...,159,1005
1,https://www.loveholidays.com/holidays/thomson-...,200,Thomson Holidays 2019 / 2020 | loveholidays.com,47,439,Looking for the best deals on Thomson Holidays...,113,750
2,https://www.loveholidays.com/holidays/cheap-ho...,200,Cheap Holidays 2019 / 2020 | Holidays from £68...,67,615,Finding your cheap holiday is easy. Save money...,149,959
3,https://www.loveholidays.com/holidays/,200,Holidays 2019 | Holiday Search | loveholidays.com,49,441,,0,0
4,https://www.loveholidays.com/holidays/first-ch...,200,First Choice Holidays 2019 / 2020 | loveholida...,52,464,Looking for the best deals on First Choice Hol...,118,767
...,...,...,...,...,...,...,...,...
12303,https://www.loveholidays.com/holidays/menorca/...,404,,0,0,,0,0
12304,https://www.loveholidays.com/holidays/turkey/a...,200,"Armas Gul Beach - All Inclusive in Kemer, Turk...",87,770,Book your holiday at the Armas Gul Beach - All...,161,1050
12305,https://www.loveholidays.com/holidays/tunisia/...,200,"One Resort Aqua Park & Spa, Skanes | loveholidays",49,461,Book your holiday at the One Resort Aqua Park ...,157,1045
12306,https://www.loveholidays.com/holidays/egypt/re...,200,"Coral Sea Sensatori in Sharm el Sheikh, Egypt ...",84,748,Book your holiday at the Coral Sea Sensatori i...,159,1043


In [28]:
df_combined = df_query.merge(df_meta,on='page',how='left')
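A left merge keeps every query row, including rows whose page has no match in the crawl data. The sketch below, on small made-up frames, shows one way to check for such rows with the `indicator` and `validate` options of `DataFrame.merge`. It also shows that an integer column such as `Status Code` becomes float once missing values appear, which would be consistent with the `200.0` values in the merged output.

```python
import pandas as pd

# Hypothetical stand-ins for df_query and df_meta.
queries = pd.DataFrame({
    "page": ["/a", "/a", "/b"],
    "query": ["x", "y", "z"],
    "clicks": [3, 1, 2],
})
meta = pd.DataFrame({
    "page": ["/a"],
    "Status Code": [200],
})

combined = queries.merge(
    meta,
    on="page",
    how="left",
    validate="many_to_one",  # raises if a page appears twice in meta
    indicator=True,          # adds a _merge column: both / left_only
)

# Rows that found no crawl data for their page.
unmatched = combined[combined["_merge"] == "left_only"]
print(unmatched[["page", "query"]])
print(combined["Status Code"].dtype)
```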

In [29]:
df_combined

Unnamed: 0,page,query,clicks,impressions,max_impressions,ctr,position,weighted_position,Status Code,Title 1,Title 1 Length,Title 1 Pixel Width,Meta Description 1,Meta Description 1 Length,Meta Description 1 Pixel Width
0,https://www.loveholidays.com/holidays/beach-ho...,on the beach,1407.0,456300.0,15650.0,0.00308,7.352495,3.930509,200.0,Beach Holidays 2019 / 2020 | Holidays from £81...,67.0,613.0,Picking the right beach holiday is an incredib...,159.0,1005.0
1,https://www.loveholidays.com/holidays/thomson-...,thomson,525.0,131082.0,3588.0,0.00401,9.536455,6.402396,200.0,Thomson Holidays 2019 / 2020 | loveholidays.com,47.0,439.0,Looking for the best deals on Thomson Holidays...,113.0,750.0
2,https://www.loveholidays.com/holidays/cheap-ho...,cheap holidays,3475.0,125233.0,4311.0,0.02775,7.674504,5.438362,200.0,Cheap Holidays 2019 / 2020 | Holidays from £68...,67.0,615.0,Finding your cheap holiday is easy. Save money...,149.0,959.0
3,https://www.loveholidays.com/holidays/,holidays,1621.0,108718.0,3567.0,0.01491,10.462910,8.079742,200.0,Holidays 2019 | Holiday Search | loveholidays.com,49.0,441.0,,0.0,0.0
4,https://www.loveholidays.com/holidays/first-ch...,first choice,77.0,97986.0,5409.0,0.00079,9.497444,9.054521,200.0,First Choice Holidays 2019 / 2020 | loveholida...,52.0,464.0,Looking for the best deals on First Choice Hol...,118.0,767.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143148,https://www.loveholidays.com/holidays/st-lucia...,cheap all inclusive holidays to st lucia,1.0,1.0,1.0,1.00000,9.000000,9.000000,200.0,St Lucia Holidays 2019 / 2020 | Holidays from ...,71.0,640.0,St Lucia holidays offer a paradise of rainfore...,152.0,978.0
143149,https://www.loveholidays.com/holidays/gran-can...,parque mar gran canaria,1.0,1.0,1.0,1.00000,16.000000,16.000000,200.0,"Parquemar in Gran Canaria, Playa del Ingles | ...",82.0,732.0,Book your holiday at the Parquemar in Playa de...,150.0,982.0
143150,https://www.loveholidays.com/holidays/corfu/an...,angela beach resort corfu,1.0,1.0,1.0,1.00000,11.000000,11.000000,200.0,"Angela Beach in Corfu, Mediterranean | Holiday...",74.0,668.0,Book your holiday at the Angela Beach in Corfu...,142.0,936.0
143151,https://www.loveholidays.com/holidays/tenerife...,city break tenerife 2020,1.0,1.0,1.0,1.00000,4.000000,4.000000,200.0,Tenerife Holidays 2019 / 2020 | Holidays from ...,71.0,641.0,"Tenerife holidays offer year round sunshine, m...",152.0,960.0


In [33]:
# drop() returns a new DataFrame; df_combined itself is left unchanged
df_combined.drop(['Title 1 Pixel Width', 'Meta Description 1 Pixel Width'], axis=1)

Unnamed: 0,page,query,clicks,impressions,max_impressions,ctr,position,weighted_position,Status Code,Title 1,Title 1 Length,Meta Description 1,Meta Description 1 Length
0,https://www.loveholidays.com/holidays/beach-ho...,on the beach,1407.0,456300.0,15650.0,0.00308,7.352495,3.930509,200.0,Beach Holidays 2019 / 2020 | Holidays from £81...,67.0,Picking the right beach holiday is an incredib...,159.0
1,https://www.loveholidays.com/holidays/thomson-...,thomson,525.0,131082.0,3588.0,0.00401,9.536455,6.402396,200.0,Thomson Holidays 2019 / 2020 | loveholidays.com,47.0,Looking for the best deals on Thomson Holidays...,113.0
2,https://www.loveholidays.com/holidays/cheap-ho...,cheap holidays,3475.0,125233.0,4311.0,0.02775,7.674504,5.438362,200.0,Cheap Holidays 2019 / 2020 | Holidays from £68...,67.0,Finding your cheap holiday is easy. Save money...,149.0
3,https://www.loveholidays.com/holidays/,holidays,1621.0,108718.0,3567.0,0.01491,10.462910,8.079742,200.0,Holidays 2019 | Holiday Search | loveholidays.com,49.0,,0.0
4,https://www.loveholidays.com/holidays/first-ch...,first choice,77.0,97986.0,5409.0,0.00079,9.497444,9.054521,200.0,First Choice Holidays 2019 / 2020 | loveholida...,52.0,Looking for the best deals on First Choice Hol...,118.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
143148,https://www.loveholidays.com/holidays/st-lucia...,cheap all inclusive holidays to st lucia,1.0,1.0,1.0,1.00000,9.000000,9.000000,200.0,St Lucia Holidays 2019 / 2020 | Holidays from ...,71.0,St Lucia holidays offer a paradise of rainfore...,152.0
143149,https://www.loveholidays.com/holidays/gran-can...,parque mar gran canaria,1.0,1.0,1.0,1.00000,16.000000,16.000000,200.0,"Parquemar in Gran Canaria, Playa del Ingles | ...",82.0,Book your holiday at the Parquemar in Playa de...,150.0
143150,https://www.loveholidays.com/holidays/corfu/an...,angela beach resort corfu,1.0,1.0,1.0,1.00000,11.000000,11.000000,200.0,"Angela Beach in Corfu, Mediterranean | Holiday...",74.0,Book your holiday at the Angela Beach in Corfu...,142.0
143151,https://www.loveholidays.com/holidays/tenerife...,city break tenerife 2020,1.0,1.0,1.0,1.00000,4.000000,4.000000,200.0,Tenerife Holidays 2019 / 2020 | Holidays from ...,71.0,"Tenerife holidays offer year round sunshine, m...",152.0


In [None]:
df_combined.to_csv("title_clicks_data.csv")

# Installing headless Chrome

In [None]:
# Using headless Chrome, we'll extract the page title and meta description of each URL
# Installing the relevant drivers



In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
import time

# Installing the chrome driver

In [None]:
# chrome_options = Options()
# chrome_options.add_argument("--headless")
# driver = webdriver.Chrome(r'C:\Users\Asad\seowork\chromedriver.exe',options=chrome_options)

In [None]:
# page_title = []
# desc = []
# for u in page_list:
#     driver.get(u)
#     # Getting the page title
#     pt = driver.title
#     page_title.append(pt)
#
#     # Getting the page description
#     # (named meta_tag rather than pd, so the pandas alias is not overwritten)
#     meta_tag = driver.find_elements(By.XPATH, '/html/head/meta[9]')[0]
#     desc.append(meta_tag.get_attribute("content"))
#
#     time.sleep(1)
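The positional XPath `/html/head/meta[9]` only works while the description happens to be the ninth meta tag, so it may break on pages with a different head layout. As a sketch of a more robust lookup, the example below selects the tag by its `name` attribute. It swaps in the standard-library `html.parser` on a made-up HTML string rather than a live browser, and the names `HeadMetaParser` and `extract_title_and_description` are hypothetical.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collects the <title> text and the meta description content."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title_and_description(html):
    parser = HeadMetaParser()
    parser.feed(html)
    return parser.title.strip(), parser.description


# Made-up page used only to exercise the parser.
sample_html = """<html><head>
<meta charset="utf-8">
<title>Example Holidays</title>
<meta name="description" content="A sample description.">
</head><body></body></html>"""

title, description = extract_title_and_description(sample_html)
print(title, description)
```

With Selenium, the same function could be fed `driver.page_source` after `driver.get(u)`. Alternatively, a CSS selector such as `meta[name="description"]` passed to `driver.find_elements(By.CSS_SELECTOR, ...)` avoids the dependence on tag order.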
