## WSA Web Scraping with BeautifulSoup Demo

### Imports
Install the required libraries by running the following in your terminal:
* `pip install requests`
* `pip install beautifulsoup4`
* `pip install mysql-connector-python`

In [None]:
import requests
from bs4 import BeautifulSoup

### Example: Michigan Football All-Time Passing Leaders
**Website**: https://www.sports-reference.com/cfb/schools/michigan/passing.html

This example shows the basics of web scraping using **Sports Reference**, a website that hosts detailed statistics for a wide variety of sports and could be a valuable source of data for your projects.

Important steps include:
* Accessing the website using `requests.get`
* Creating a `soup` object
* Using the `find` and `find_all` methods to get the specific data from the website
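
The three steps above can be sketched end to end without touching the network. The HTML string, the `div_demo` id, and the player rows below are made up for illustration; only `BeautifulSoup`, `find`, and `find_all` come from this walkthrough.

```python
from bs4 import BeautifulSoup

# Stand-in for the text of a response from requests.get (hypothetical markup)
demo_html = """
<div id="div_demo">
  <table>
    <tbody>
      <tr><th>1</th><td><a href="#">Player A</a></td><td>2500</td></tr>
      <tr><th>2</th><td><a href="#">Player B</a></td><td>1800</td></tr>
    </tbody>
  </table>
</div>
"""

# Create the soup object, then walk down from the div to the tbody
demo_soup = BeautifulSoup(demo_html, 'html.parser')
demo_tbody = demo_soup.find('div', attrs={'id': 'div_demo'}).find('table').find('tbody')

# Collect the name and yardage from each row
players = []
for row in demo_tbody.find_all('tr'):
    cells = row.find_all('td')
    players.append([cells[0].find('a').text, int(cells[1].text)])

print(players)
```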

#### Step 1: Make a request to the specific URL

In [None]:
url = requests.get('https://www.sports-reference.com/cfb/schools/michigan/passing.html')
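
A request can fail or come back with an error status, and parsing an error page usually produces confusing results later on. The following is a minimal sketch of a more defensive request; the `fetch_html` name, the 10 second timeout, and the `User-Agent` string are assumptions chosen for illustration, not part of the original demo.

```python
import requests

def fetch_html(url, timeout=10):
    """Return the page's HTML text, raising an exception if the request fails."""
    # Placeholder User-Agent (an assumption); identify your own project here
    headers = {'User-Agent': 'wsa-scraping-demo (educational use)'}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return response.text
```

The returned string could then be passed to `BeautifulSoup` in place of `url.text`.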

#### Step 2: Create the soup object

In [None]:
# Parse the HTML text returned by the request above into a searchable soup object

soup = BeautifulSoup(url.text, 'html.parser')
#print(soup.prettify())

#### Step 3: Find table rows and columns
Use `.find` and `.find_all` to get the table, then its rows, then the columns from the HTML content.

In [None]:
# The following line of code will find the first div tag with an id of div_passing
# It then descends into the nested table and tbody tags
table = soup.find('div', attrs = {'id' : 'div_passing'}).find('table').find('tbody')
#print(table)
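
Note that `find` returns `None` when nothing matches, so a chained call like the one above fails with an `AttributeError` if the page layout differs from what is expected. Below is a hedged sketch of a step-by-step lookup; the `find_tbody` helper and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only to exercise the helper
demo_html = '<div id="div_demo"><table><tbody><tr><td>1</td></tr></tbody></table></div>'
demo_soup = BeautifulSoup(demo_html, 'html.parser')

def find_tbody(doc, div_id):
    """Return the tbody nested in the div with this id, or None if a step fails."""
    div = doc.find('div', attrs={'id': div_id})
    if div is None:
        return None
    table = div.find('table')
    if table is None:
        return None
    return table.find('tbody')

found = find_tbody(demo_soup, 'div_demo')
missing = find_tbody(demo_soup, 'div_not_here')
print(found is not None, missing is None)
```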

In [None]:
# The following line of code will find all tr tags that are nested within the tbody tag
rows = table.find_all('tr')

# Example to display column elements more clearly
ex_row = rows[0]
for ind, col in enumerate(ex_row):
    # Left-align the index and the text so the columns line up
    print(f'index {ind:<3} | {col.text:<12} | {col}')

# column   |  data         |  HTML 
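
One thing to watch for: looping over a row visits every child tag, while `find_all('td')` returns only the `td` cells. If a row begins with a `th` cell (an assumption about the page's markup, not something confirmed here), the two sets of indices differ by one. A small sketch with made-up HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical row with a leading th cell
demo_html = '<table><tbody><tr><th>1</th><td>Player A</td><td>2500</td></tr></tbody></table>'
demo_row = BeautifulSoup(demo_html, 'html.parser').find('tr')

# Iterating over the row visits every child, including the th cell
child_texts = [child.text for child in demo_row]

# find_all('td') skips the th cell, so its indices start one position later
td_texts = [td.text for td in demo_row.find_all('td')]

print(child_texts)  # ['1', 'Player A', '2500']
print(td_texts)     # ['Player A', '2500']
```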

#### Step 4: Index into table to save specific data

In [None]:
# This may require other functions as well
# For each value, we use .text to get the cell's contents without the HTML tags;
# call .strip() if stray separators such as newline characters remain
# Remember to consider default values for the case that a table entry is null!

michigan_qb_data = []

for row in rows:
    columns = row.find_all('td')
    if len(columns) > 0:
        name = columns[0].find('a').text
        
        start_year = columns[1].text
        end_year = columns[2].text
        years_played = start_year + '-' + end_year
        
        pass_pct = float(columns[5].text)
        pass_yrds = int(columns[6].text)
        if columns[9].text == '':
            pass_td = 0
        else:
            pass_td = int(columns[9].text)
        ints = int(columns[10].text)

        if columns[11].text == '':
            qbr = 0.0
        else:
            qbr = float(columns[11].text)

        # Append inside the if block so rows without td cells are skipped
        michigan_qb_data.append([name, years_played, pass_pct, pass_yrds, pass_td, ints, qbr])
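
The repeated empty-string checks above could be folded into small helpers. This is one possible sketch; `to_int` and `to_float` are hypothetical names, and stripping commas is an assumption about how some tables format large numbers.

```python
def to_int(text, default=0):
    """Convert a cell's text to int, using default when it is empty or malformed."""
    cleaned = text.strip().replace(',', '')
    try:
        return int(cleaned)
    except ValueError:
        return default

def to_float(text, default=0.0):
    """Convert a cell's text to float, using default when it is empty or malformed."""
    cleaned = text.strip().replace(',', '')
    try:
        return float(cleaned)
    except ValueError:
        return default

print(to_int('4,321'), to_int(''), to_float('61.5'), to_float(' '))
```

With helpers like these, a line such as `pass_td = to_int(columns[9].text)` would cover both the empty and the populated case.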

Now, let's view all of the scraped data!

In [None]:
for qb in michigan_qb_data:
    print(qb)

### Exercise: NFL Quarterback Data
**Website:** https://www.nfl.com/stats/player-stats/

Scrape the NFL website to gather the following data for the NFL's top quarterbacks so far this season:
* Name 
* Passing Yards
* Completion Percentage
* \# of Touchdowns 
* QB Rating

You will need to cast some string table entries into numerical values!

In [None]:
nfl_qbs = []

url = requests.get('https://www.nfl.com/stats/player-stats/')
soup = BeautifulSoup(url.text, 'html.parser')
#print(soup.prettify())

In [None]:
# What attribute and tags can we use to locate table data?

qb_table = soup.find('div', attrs={'REPLACE' : 'REPLACE'}).find('table')
qb_rows = qb_table.find('REPLACE').find_all('REPLACE')
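
As a hint for the placeholders above, tables are often located by a `class` or `id` attribute on a wrapping tag. The sketch below uses invented class names that are not the NFL site's actual markup; inspect the real page in your browser to find the right ones.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are invented for illustration
demo_html = """
<div class="stats-wrapper">
  <table class="stats-table">
    <thead><tr><th>Player</th><th>Yds</th></tr></thead>
    <tbody>
      <tr><td>Player A</td><td>3000</td></tr>
      <tr><td>Player B</td><td>2750</td></tr>
    </tbody>
  </table>
</div>
"""
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# Match on an attribute dictionary, as in the Michigan example
by_attrs = demo_soup.find('div', attrs={'class': 'stats-wrapper'}).find('table')

# The same lookup written as a CSS selector
by_css = demo_soup.select_one('div.stats-wrapper table')

# Only the rows inside tbody hold player data; thead holds the column names
body_rows = by_attrs.find('tbody').find_all('tr')
print(len(body_rows))
```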

In [None]:
# Example to display column elements more clearly
ex_row = qb_rows[0]
for ind, col in enumerate(ex_row):
    # Left-align the index so the columns line up
    print(f'index {ind:<3} | {col.text.strip()}')

# column   |  data 

In [None]:
# Index into each row and save data to variables here!
for row in qb_rows:
    cols = row.find_all('REPLACE')
    
    # name = 
    # pass_yards = 
    # comp_per = 
    # td = 
    # qbr = 

    nfl_qbs.append([name, pass_yards, comp_per, td, qbr])

In [None]:
for qb in nfl_qbs:
    print(qb)