# Selenium

A tool that lets you automate a web browser: clicking, typing, and moving between pages.

It's great for scraping websites that are protected by a username and password, because you can use Selenium to get past the login screen before you start scraping.

Command reference: https://selenium-python.readthedocs.io/navigating.html (handy if you forget a command or need to find a new one)

In [0]:
import selenium.webdriver
import pandas as pd

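To install Selenium, `pip install selenium`

To install ChromeDriver on macOS, `brew install --cask chromedriver` (the older `brew cask install` syntax no longer works in current Homebrew)

or download it from https://chromedriver.storage.googleapis.com/index.html?path=2.42/ (that link is for version 2.42; the driver must support the version of Chrome you have installed). Selenium 4.6 and later can also download a matching driver automatically.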

In [0]:
driver = selenium.webdriver.Chrome()  # opens a new Chrome window controlled by Selenium

Let's get it up and running!

In [0]:
driver.get("http://www.google.com")

## Meal Plan Usage Data

The demo we'll be showing today scrapes __meal plan__ usage from the CBORD website. As we all know, the interface is ugly and there's no way to download the data, so Selenium is perfect for this.

In [0]:
driver.get('https://csg-web1.eservices.virginia.edu/student/svc_history.php')

### Login with NetBadge
(You could log in manually in the browser window, but we'll automate it here for the sake of the example.)

In [0]:
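# Selenium 4 removed find_element_by_xpath; use find_element with the "xpath" strategy (the value of By.XPATH)
element = driver.find_element("xpath", '//*[@id="divContent"]/table[2]/tbody/tr/td[2]/div/a')
element.click()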

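XPath is a query language for selecting elements in an HTML or XML document. In Chrome you can get an element's XPath by right-clicking it in DevTools and choosing Copy > Copy XPath. An XPath is not guaranteed to be unique, though: `//*[@id="user"]` relies on ids being unique, while positional paths like `table[2]/tbody/tr/td[2]` can match several elements (`find_element` returns the first match) and break when the page layout changes.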
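
To see how an XPath selects elements without opening a browser, the sketch below runs two expressions against a small, made-up HTML fragment using Python's built-in `xml.etree.ElementTree`. ElementTree supports only a subset of XPath (paths start with `.//` instead of `//`) and needs well-formed markup, so treat this as an illustration of the idea, not of what Chrome does internally. Browsers also insert a `<tbody>` into tables, which is why XPaths copied from DevTools contain `tbody` even when the page source doesn't.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page fragment (ElementTree needs valid XML, unlike a browser)
page = """
<html><body>
  <div id="divContent">
    <table>
      <tr><td>Date</td><td>Location</td></tr>
      <tr><td>08/01</td><td>Dining Hall A</td></tr>
    </table>
    <input id="user" type="text"/>
  </div>
</body></html>
""".strip()
root = ET.fromstring(page)

# Attribute-based XPath: ids are supposed to be unique, so this finds one element
user_box = root.find(".//*[@id='user']")
print(user_box.tag)  # input

# Positional XPath: td[2] means "second <td> child", so it matches one cell per row
cells = root.findall(".//tr/td[2]")
print([cell.text for cell in cells])  # ['Location', 'Dining Hall A']
```

The second query is why positional XPaths deserve care: the same expression matched two cells, and Selenium's `find_element` would silently pick the first.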

In [0]:
username = driver.find_element("xpath", '//*[@id="user"]')
username.send_keys('ag5ym')  # type the computing ID into the username box

In [0]:
import getpass
password = driver.find_element("xpath", '//*[@id="pass"]')
pw = getpass.getpass()  # prompts for the password without echoing it or saving it in the notebook

In [0]:
password.send_keys(pw)

In [0]:
# Click the login button
driver.find_element("xpath", '//*[@id="loginBoxes"]/fieldset[2]/span[2]/form/p[3]/input[1]').click()

### Modify Buttons and Dropdown Menus using Selenium

Change month in the dropdown menu

In [0]:
element = driver.find_element("xpath", '//*[@id="svcHistForm"]/table/tbody/tr[1]/td[2]/nobr/select[1]')
all_options = element.find_elements("tag name", "option")  # "tag name" is the value of By.TAG_NAME
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))  # it's good to see what the option values actually are
    if option.get_attribute("value") == "08":  # August
        option.click()

Change year in the dropdown menu

In [0]:
element = driver.find_element("xpath", '//*[@id="svcHistForm"]/table/tbody/tr[1]/td[2]/nobr/select[3]')
all_options = element.find_elements("tag name", "option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    if option.get_attribute("value") == "2017":
        option.click()

Change meal plan option in the dropdown menu

In [0]:
element = driver.find_element("xpath", '//*[@id="mnuPlan"]')
all_options = element.find_elements("tag name", "option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    if option.get_attribute("value") == "M_All_":
        option.click()
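
The month, year, and plan cells repeat the same loop. A hypothetical helper like `choose_option` below factors it out; it only calls `find_elements`, `get_attribute`, and `click`, so it should work on a Selenium `WebElement` for a `<select>`. Unlike the loops above, it raises an error when no option matches instead of silently doing nothing. Selenium also ships a `Select` class (in `selenium.webdriver.support.ui`) whose `select_by_value` method does the same job; the loop version keeps the printout of available values, which helps when you don't know them yet.

```python
def choose_option(select_element, wanted_value, verbose=True):
    """Click the <option> of a <select> whose value attribute equals wanted_value.

    select_element is assumed to be a Selenium WebElement; only
    find_elements, get_attribute and click are called on it.
    Returns the option values seen. Raises ValueError if none matched.
    """
    seen = []
    matched = False
    # "tag name" is the string value of Selenium's By.TAG_NAME
    for option in select_element.find_elements("tag name", "option"):
        value = option.get_attribute("value")
        seen.append(value)
        if verbose:
            print("Value is: %s" % value)  # handy for discovering valid values
        if value == wanted_value and not matched:
            option.click()
            matched = True
    if not matched:
        raise ValueError(f"no option with value {wanted_value!r}; saw {seen}")
    return seen
```

Usage, mirroring the plan cell: `choose_option(driver.find_element("xpath", '//*[@id="mnuPlan"]'), "M_All_")`.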

Click the "view history" button

In [0]:
driver.find_element("xpath", '//*[@id="svcHistForm"]/input[1]').click()
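
Clicking a button starts a page load, and a `find_element` call that runs before the new page is ready raises `NoSuchElementException`. When you run cells one at a time you naturally wait, but "Run All" does not. Selenium's own answer is `WebDriverWait(driver, 10).until(lambda d: d.find_element("xpath", ...))` from `selenium.webdriver.support.ui`; the sketch below shows the same polling idea in plain Python with a hypothetical `wait_until` helper, so the logic is visible.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns something truthy, then return that value.

    Exceptions raised by condition() (for example Selenium's
    NoSuchElementException while the page is still loading) are treated as
    "not ready yet". Raises TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # not ready yet; remember why for the error message
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s") from last_error
        time.sleep(poll)
```

For example, `wait_until(lambda: driver.find_element("xpath", '//*[@id="divHist"]/table'))` returns the history table once it exists.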

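Find the table by its XPath and read its `outerHTML` attribute: that is the element's full HTML, including the `<table>` tag itself (`innerHTML` would give only what's inside the tag). `pd.read_html` looks for `<table>` tags, so we need the outer version.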

In [0]:
datatable1 = driver.find_element("xpath", '//*[@id="divHist"]/table').get_attribute('outerHTML')

In [0]:
datatable1 # we got the raw html

In [0]:
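import io
# read_html returns a list of DataFrames, one per <table>; passing literal HTML is deprecated since pandas 2.1, so wrap it in StringIO
meal_swipes = pd.read_html(io.StringIO(datatable1), header=0)
meal_swipes = meal_swipes[0]  # there was only one table, at index 0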

In [0]:
meal_swipes.head(10)

## Data Analysis and Visualization

In [0]:
# Number of transactions at each location, busiest first
meal_swipes.groupby('Location').count().sort_values('Amount', ascending=False)[['Amount']]

Now get the Plus Dollars usage by switching the plan dropdown to `S31`

In [0]:
element = driver.find_element("xpath", '//*[@id="mnuPlan"]')
all_options = element.find_elements("tag name", "option")
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))
    if option.get_attribute("value") == "S31":
        option.click()

In [0]:
driver.find_element("xpath", '//*[@id="svcHistForm"]/input[1]').click()

In [0]:
datatable2 = driver.find_element("xpath", '//*[@id="divHist"]/table[2]').get_attribute('outerHTML')

In [0]:
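import io
# read_html returns a list of DataFrames; wrap the HTML string in StringIO (literal HTML is deprecated since pandas 2.1)
plus_dollars = pd.read_html(io.StringIO(datatable2), header=0)
plus_dollars = plus_dollars[0]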

In [0]:
plus_dollars.head()

In [0]:
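# Amount is still text with a leading currency symbol here, so this sorts alphabetically, not numerically
plus_dollars.sort_values('Amount', ascending=True).head()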

In [0]:
# Drop the leading currency symbol and convert the text to a number
plus_dollars['Amount'] = plus_dollars['Amount'].apply(lambda x: float(x[1:]))
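
`float(x[1:])` works only if every amount is a single symbol followed by a plain number. It fails on thousands separators or a minus sign placed before the symbol, and it breaks if the cell is re-run on an already-converted column. The hypothetical `parse_money` helper below is a more forgiving sketch; the formats it accepts (`$12.34`, `$-12.34`, `-$12.34`, `($12.34)`, `$1,234.50`) are assumptions, not confirmed against the CBORD page.

```python
def parse_money(text):
    """Convert a currency string such as '$12.34' to a float.

    Assumed formats: '$12.34', '$-12.34', '-$12.34', '($12.34)', '$1,234.50'.
    Non-string input (e.g. an already-converted float) goes through str() first,
    so re-running the conversion is harmless.
    """
    s = str(text).strip()
    # Accounting style: parentheses mean a negative amount
    negative = s.startswith("(") and s.endswith(")")
    s = s.strip("()")
    # A minus sign may come before or after the currency symbol
    if "-" in s:
        negative = True
        s = s.replace("-", "")
    s = s.replace("$", "").replace(",", "").strip()
    value = float(s)
    return -value if negative else value
```

It drops in where the lambda was: `plus_dollars['Amount'] = plus_dollars['Amount'].apply(parse_money)`.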

In [0]:
plus_dollars[plus_dollars['Amount'] >= 0].sort_values('Amount', ascending=False).head()

In [0]:
import matplotlib.pyplot as plt
import seaborn as sns

In [0]:
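# Stack the two tables row-wise (columns are matched by name); ignore_index gives a fresh 0..n-1 index
merged = pd.concat([plus_dollars, meal_swipes], ignore_index=True)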

In [0]:
# Post Date looks like "<date> <time>", so take the hour from the time part
merged['Hour'] = merged['Post Date'].apply(lambda x: int(x.split(' ')[1].split(':')[0]))
merged.head()
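
The split-based lambda assumes the time is in 24-hour form, like `13:05`. If the site instead shows 12-hour times with AM/PM, `"08/01/2017 1:15 PM"` would give hour 1, putting 1 AM and 1 PM in the same bucket. The sketch below (a hypothetical `post_hour` helper; the candidate formats are guesses about the "Post Date" column) parses the timestamp with `datetime.strptime` so AM/PM is taken into account. In pandas, `pd.to_datetime(merged['Post Date']).dt.hour` is a vectorized alternative.

```python
from datetime import datetime

# Hypothetical candidate formats for "Post Date"; not confirmed against the CBORD page.
# 12-hour formats with AM/PM are tried first, then 24-hour ones.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %I:%M %p",
    "%m/%d/%Y %I:%M:%S %p",
    "%m/%d/%Y %H:%M",
    "%m/%d/%Y %H:%M:%S",
)

def post_hour(post_date):
    """Return the hour of day (0-23) from a 'Post Date' string."""
    text = post_date.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).hour
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognized Post Date format: {post_date!r}")
```

Usage: `merged['Hour'] = merged['Post Date'].apply(post_hour)`.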

In [0]:
# Number of transactions in each hour of the day
df = merged.groupby('Hour').count()[['Post Date']].sort_index()

In [0]:
df

In [0]:
# seaborn 0.12+ requires x and y as keyword arguments
sns.barplot(x=df.index, y=df['Post Date'])

In [0]:
driver.quit()  # closes all browser windows and ends the WebDriver session (close() only closes the current window)