Check [this link](https://colab.research.google.com/notebooks/snippets/importing_libraries.ipynb) to see how to import different libraries into Google Colab.

NOTE: The Selenium web-scraping package requires Python 3.7 or later; newer Selenium releases may raise this minimum.



**Drivers**

Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which must be installed before the examples below can run. Make sure it is in your PATH, e.g., by placing it in /usr/bin or /usr/local/bin.

If you skip this step, you will get an error:

*selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.*
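
A quick way to confirm that a driver is reachable before starting Selenium is to look it up on PATH. The sketch below uses only the standard library; `find_driver` is a hypothetical helper name, not part of Selenium.

```python
import shutil
from typing import Optional


def find_driver(name: str = "chromedriver") -> Optional[str]:
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver("chromedriver")
if path is None:
    print("chromedriver not found on PATH; Selenium may raise WebDriverException")
else:
    print(f"chromedriver found at {path}")
```

The same check works for Firefox by passing `"geckodriver"` as the name.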

Links to drivers for other browsers can be found on the Selenium page on PyPI:

https://pypi.org/project/selenium/

We will use the Google Chrome driver (ChromeDriver). You can download it from this site: https://chromedriver.chromium.org/downloads

In [None]:
# install chromium, its driver, and selenium
!apt update
!apt install chromium-chromedriver
!pip install selenium
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
from selenium import webdriver


In Google Colab you **must** run Selenium in headless mode, because Colab cannot display a browser window. To run without headless mode, use your own machine and create the webdriver without the headless option.

Headless mode is also required if you plan to run the code in a server-side environment.
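
One way to keep a single notebook usable both in Colab and on a local machine is to collect the Chrome flags in a small helper and toggle headless mode with a parameter. This is only a sketch: `chrome_args` is a hypothetical helper, and the flags are the same three that the options cell in this notebook sets.

```python
from typing import List


def chrome_args(headless: bool = True) -> List[str]:
    """Return Chrome command-line flags; include --headless only when requested."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.insert(0, "--headless")
    return args


print(chrome_args(headless=True))
print(chrome_args(headless=False))

# With Selenium installed, each flag would be passed to the options object:
# for arg in chrome_args(headless=True):
#     options.add_argument(arg)
```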



In [None]:
# set options to be headless
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')


In [None]:
# Create a webdriver instance, ready to use
wd = webdriver.Chrome(options=options)
wd.get("https://www.google.com")

Even in headless mode, you can still inspect the page, for example by printing its URL:

In [None]:
# Display webdriver's current url 
# This is useful when there are redirections on the website and you need the final URL.
print(wd.current_url)

https://www.google.com/


In [None]:

# Display the title of the page your webdriver is visiting
print(wd.title)


Google


In [None]:
# Display the full HTML of the page your webdriver is visiting
print(wd.page_source)

In [None]:
# We also need the time and pandas packages
import time
import pandas as pd

Locating data on a website is one of the main use cases for Selenium, either for a test suite (making sure that a specific element is present/absent on the page) or to extract data and save it for further analysis (web scraping).

There are many methods available in the Selenium API to select elements on the page. You can use:

*   Tag name
*   Class name
*   IDs
*   XPath
*   CSS selectors
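
To make the difference between these strategies concrete, the sketch below runs tag, class, and ID lookups against a small hand-written table. It uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset, rather than Selenium, so it runs without a browser. The sample markup and names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site)
SAMPLE = """
<table>
  <tr><td class="name">Player</td><td class="salary">Salary</td></tr>
  <tr><td class="name">Alice Example</td><td class="salary">$100</td></tr>
  <tr><td class="name" id="second">Bob Example</td><td class="salary">$200</td></tr>
</table>
"""

root = ET.fromstring(SAMPLE.strip())

# By tag name: every <td> cell in document order
by_tag = [td.text for td in root.iter("td")]

# By class, written as an XPath attribute predicate
by_class = [td.text for td in root.findall(".//td[@class='name']")]

# By ID, also written as an XPath attribute predicate
by_id = root.find(".//*[@id='second']").text

print(by_tag)
print(by_class)
print(by_id)
```

With Selenium, the analogous class lookup would be `wd.find_elements(By.XPATH, '//td[@class="name"]')`. Note that the class lookup also returns the header cell, which is why scraped lists often need their first entry skipped.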


The easiest way to locate an element is to open Chrome DevTools and inspect the element that you need.

Instead of right-clicking and choosing Inspect, you can press **Ctrl + Shift + C** (**Cmd + Shift + C** on a Mac) to turn on the element picker, then click the element you want.



**EXAMPLE 1:**

*   Open a new Chrome browser
*   Load the Yahoo homepage
*   Search for “seleniumhq”
*   Close the browser




In [None]:
#Example 1

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys


wd.get('http://www.yahoo.com')
assert 'Yahoo' in wd.title

elem = wd.find_element(By.NAME, 'p')  # Find the search box
elem.send_keys('seleniumhq' + Keys.RETURN)

wd.quit()

**EXAMPLE 2:** Self-practice at home

In [None]:
driver = webdriver.Chrome(options=options)
driver.get('https://hoopshype.com/salaries/players/')


In [None]:
# Requires Selenium 4+ and reuses the headless `driver` created in the previous cell
from selenium.webdriver.common.by import By

frames = []  # one DataFrame per season

for yr in range(1990, 2019):
    # Season pages follow the pattern .../players/1990-1991/
    url = f'https://hoopshype.com/salaries/players/{yr}-{yr + 1}/'
    driver.get(url)

    # find_elements_by_xpath was removed in Selenium 4; use find_elements(By.XPATH, ...)
    players = driver.find_elements(By.XPATH, '//td[@class="name"]')
    salaries = driver.find_elements(By.XPATH, '//td[@class="hh-salaries-sorted"]')

    players_list = [p.text for p in players]
    salaries_list = [s.text for s in salaries]

    # Pair each player's name with their salary, skipping the first entry (presumably the header row)
    data_tuples = list(zip(players_list[1:], salaries_list[1:]))
    temp_df = pd.DataFrame(data_tuples, columns=['Player', 'Salary'])
    temp_df['Year'] = yr  # season's starting year
    frames.append(temp_df)

# DataFrame.append was removed in pandas 2.0; concatenate all seasons once instead
df = pd.concat(frames, ignore_index=True)  # master dataframe: Player, Salary, Year

driver.quit()  # end the browser session
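
After scraping, the `Salary` column holds text rather than numbers. The sketch below shows one way to build a season URL and to convert a salary string to an integer. Both helper names are hypothetical, and `parse_salary` assumes salaries look like `$1,234,567`; adjust it if the site formats them differently.

```python
BASE_URL = "https://hoopshype.com/salaries/players/"


def season_url(start_year: int) -> str:
    """Build the URL for the season that begins in start_year."""
    return f"{BASE_URL}{start_year}-{start_year + 1}/"


def parse_salary(text: str) -> int:
    """Convert a string such as '$1,234,567' to the integer 1234567 (assumed format)."""
    return int(text.replace("$", "").replace(",", "").strip())


print(season_url(1990))            # https://hoopshype.com/salaries/players/1990-1991/
print(parse_salary("$1,234,567"))  # 1234567
```

With pandas, `df['Salary'].map(parse_salary)` would apply the conversion to the whole column.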