This is a code breakdown of the land_scrape.py script located in this repository. For further context, please see my LinkedIn article (link here) and the accompanying YouTube video (link here).

The goal of this code breakdown is to be descriptive and accessible to any reader. If you are a tech nerd and you see a glaring issue in the way I have architected this solution, please make a pull request or reach out and let me know (email address).

In [16]:
''' There are a whole bunch of handy features from Selenium needed to drive our browser and control the pace of execution. '''
from selenium import webdriver as wd
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

For the sake of readability in this breakdown, I will import the remaining libraries as they are used. Let's open up a browser and move it to the side.

In [17]:
''' open a new chrome browser using the webdriver service '''
chromedriver_service = Service('./chromedriver.exe')
browser = wd.Chrome(service=chromedriver_service)
browser.set_window_rect(x=930, y=0, width=1200, height=1125) # this is specific to my monitor

{'height': 1100, 'width': 1200, 'x': 930, 'y': 0}

Now that we have a browser, we can go to the events page on the Higher Landing website. For the sake of security, I have stored all credentials and sensitive information in a JSON file. Loading this JSON into a Python dictionary object allows us to use key/value pairs to access private data.

In [18]:
import json
import os
''' get secret test information from a json file '''
base_path = r"H:\2021-11-03-HIGHER-LANDING"
test_info_file_name = 'secret_test_info.json'
# os.path.join inserts the path separator, so base_path needs no trailing backslashes
with open(os.path.join(base_path, test_info_file_name), 'r') as t_f:
    test_info_dict = json.load(t_f)
print(test_info_dict.keys())

dict_keys(['base_path', 'user_name', 'user_password', 'required_sessions_file', 'events_url', 'locators'])
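
For readers who want to reproduce this step, here is a minimal sketch of what such a file could look like and how it loads. Only the key names come from this breakdown; every value below is a placeholder invented for illustration, since the real file is kept private.

```python
import json
import os
import tempfile

# Placeholder values only; the key names match the ones used in this breakdown.
sample_info = {
    "base_path": "C:/path/to/project",
    "user_name": "someone@example.com",
    "user_password": "not-a-real-password",
    "required_sessions_file": "required_sessions.csv",
    "events_url": "https://example.com/events",
    "locators": {
        "user_name_field": "hypothetical-user-id",
        "user_pass_field": "hypothetical-pass-id",
        "login_submit_button": "hypothetical-submit-id",
        "upcoming_events_link": "hypothetical-events-id",
    },
}

# Write the sample to a temporary folder, then read it back into a dictionary.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "secret_test_info.json")
    with open(sample_path, "w") as out_file:
        json.dump(sample_info, out_file, indent=2)
    with open(sample_path, "r") as in_file:
        loaded_info = json.load(in_file)

print(list(loaded_info.keys()))
```

Keeping a file like this out of version control (for example by listing it in .gitignore) is what keeps the credentials private.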


The keys listed above will be used for data access and field location. Below, we use the dictionary's get method to read the events URL from our test info dictionary, and then use the browser's get method to load that URL.

In [19]:
''' open the browser to the events page of the course '''
url = test_info_dict.get('events_url')
browser.get(url)

This will take us to the login page, but it is good practice to make the browser wait until everything is loaded. Selenium's WebDriverWait class is used to achieve this. Let's wait until the Log In button is ready to click.

In [20]:
''' wait until we can click on the log in button'''
waiter = WebDriverWait(browser, 10)
ready_for_input = waiter.until(EC.element_to_be_clickable((
    By.ID, test_info_dict.get('locators').get('login_submit_button'))))
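
Conceptually, waiter.until keeps re-checking a condition until it passes or the timeout (10 seconds above) runs out. The sketch below is my own simplified stand-in for that idea in plain Python, not Selenium's actual code. The names wait_until and button_is_ready are hypothetical, and the example runs without a browser.

```python
import time

def wait_until(condition, timeout=10, poll_every=0.5):
    """Call condition() repeatedly until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_every)

# Stand-in for a page whose button becomes ready on the third check.
checks = {"count": 0}

def button_is_ready():
    checks["count"] += 1
    return "login button" if checks["count"] >= 3 else False

found = wait_until(button_is_ready, timeout=2, poll_every=0.01)
print(found, checks["count"])
```

The useful property is that the wait ends as soon as the condition passes, so a fast page costs almost nothing while a slow page still gets the full timeout.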

The By.ID selector is used multiple times here in conjunction with the locators from our test info dictionary. Selenium's find_element method returns an HTML element (a WebElement object) that can be interacted with.

In [21]:
''' enter the user name and password by using locators, all data is stored in test_info_dict '''
user_name_field = browser.find_element(By.ID, test_info_dict.get('locators').get('user_name_field'))
user_name_field.send_keys(test_info_dict.get('user_name'))
user_password_field = browser.find_element(By.ID, test_info_dict.get('locators').get('user_pass_field'))
user_password_field.send_keys(test_info_dict.get('user_password'))

''' locate the login button and click it '''
login_button = browser.find_element(By.ID, test_info_dict.get('locators').get('login_submit_button'))
login_button.click()
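
One thing to be aware of with the chained .get calls: a misspelled locator name quietly returns None, so the mistake only shows up later, inside the Selenium call. As an optional sketch (get_locator is a hypothetical helper, not part of land_scrape.py), the lookup could fail early with a readable message instead.

```python
def get_locator(info, name):
    """Return the locator stored under info['locators'][name], or raise a clear error."""
    locators = info.get("locators")
    if locators is None:
        raise KeyError("the test info has no 'locators' section")
    if name not in locators:
        raise KeyError(f"no locator named {name!r}; available: {sorted(locators)}")
    return locators[name]

# Placeholder dictionary standing in for test_info_dict.
demo_info = {"locators": {"user_name_field": "hypothetical-user-id"}}
print(get_locator(demo_info, "user_name_field"))
```

With a helper like this, a call such as browser.find_element(By.ID, get_locator(test_info_dict, 'user_name_field')) would read a little shorter as well.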

Since we loaded the events page URL directly, we will wait for the Upcoming Events link to be ready before clicking it.

In [22]:
''' wait until the Upcoming Events link is ready to be clicked '''
upcoming_events_locator = test_info_dict.get('locators').get('upcoming_events_link')
calendar_ready = waiter.until(EC.element_to_be_clickable((By.ID, upcoming_events_locator)))
upcoming_events_link = browser.find_element(By.ID, upcoming_events_locator)
upcoming_events_link.click()