# Selenium Snippets

You can drop one of your try-except statements by selecting only the tr tags that hold data, which excludes the header rows. The page does need Selenium to load it, because some JavaScript modifies the DOM after the initial request: the "data-row" attribute that marks the data rows is only present once that script has run. So the requests library alone won't work here, as the comparison below shows.
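As a minimal sketch of that selection, the snippet below filters on the "data-row" attribute so the header rows never enter the loop. The sample HTML is made up for illustration (the team names, the values, and the repeated header row are assumptions, not copied from the real page).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rendered table: two data rows plus a
# repeated header row in the middle of the body.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr data-row="0"><td>Team A</td><td>20</td></tr>
    <tr class="thead"><th>School</th><th>W</th></tr>
    <tr data-row="2"><td>Team B</td><td>18</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# attrs={'data-row': True} matches any tr that has the attribute,
# whatever its value, so header rows are skipped up front.
data_rows = soup.find_all("tr", attrs={"data-row": True})

# Every remaining row has td cells, so no try-except is needed here.
schools = [row.find("td").get_text() for row in data_rows]
print(schools)
```

Because the header rows have no td cells, filtering them out first is what removes the need to catch the error they would otherwise raise.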

In [1]:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# Demonstrate utility of Selenium

## Make request

In [2]:
url = 'https://www.sports-reference.com/cbb/seasons/2019-school-stats.html'
response = requests.get(url)

## Make soup

In [3]:
html = response.content
soup = BeautifulSoup(html,'lxml')

## Get and count tr tags

In [4]:
all_trs = soup.find_all('tr')
print('there are %d total tr elements' % len(all_trs))

there are 389 total tr elements


## Count how many have "data-row" attribute

In [5]:
data_trs = [tr for tr in all_trs if tr.has_attr('data-row')]
print('there are %d tr elements with a "data-row" attribute' % len(data_trs))

there are 0 tr elements with a "data-row" attribute


# Using Selenium

## Open chrome and navigate to page

In [6]:
driver = webdriver.Chrome()
driver.get(url)

## Get page source and make soup

In [7]:
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')  # name the parser explicitly, as in the requests example

## Get all tr elements

In [8]:
all_trs = soup.find_all('tr')
print('there are %d total tr elements' % len(all_trs))

there are 395 total tr elements


## Get all tr elements with "data-row" attribute

In [9]:
data_trs = [tr for tr in all_trs if tr.has_attr('data-row')]
print('there are %d tr elements with a "data-row" attribute' % len(data_trs))

there are 387 tr elements with a "data-row" attribute


## Close driver

In [10]:
driver.quit()  # quit() ends the whole browser session; close() only closes the current window
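If the scraping code raises partway through, the browser can be left running. One way to guard against that is a small context manager that always shuts the driver down. This is a sketch: `managed_driver` is a hypothetical helper, not part of Selenium, and it takes any zero-argument callable that returns a driver.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver and guarantee it is shut down afterwards.

    `make_driver` is any zero-argument callable returning an object
    with a quit() method, for example webdriver.Chrome.
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the body of the with-block raises.
        driver.quit()


# Usage:
#     with managed_driver(webdriver.Chrome) as driver:
#         driver.get(url)
#         page_source = driver.page_source
```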