## WSA - Web Scraping with BeautifulSoup Demo Answers

### Imports
To make the necessary libraries available, run the following in your terminal:
* `pip install requests`
* `pip install beautifulsoup4`
* `pip install mysql-connector-python`

```python
import requests
from bs4 import BeautifulSoup
```

### Example: Michigan Football All-Time Passing Leaders
**Website:** https://www.sports-reference.com/cfb/schools/michigan/passing.html

This example shows the basics of web scraping using **Sports Reference**, a website with detailed statistics for a wide variety of sports that could be a valuable source of data for your projects.

Important steps include:
* Accessing the website using `requests.get`
* Creating a `soup` object
* Using the `find` and `find_all` methods to extract specific data from the page
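The steps above can be tried without any network access. In this sketch a small, hypothetical HTML string (shaped like the Sports Reference table, with made-up player names) stands in for the `requests.get` response, so only the soup creation and the `find`/`find_all` steps are exercised:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page (Step 1)
demo_html = """
<div id="div_passing">
  <table>
    <tbody>
      <tr><th>1</th><td><a href="#">Player A</a></td><td>100</td></tr>
      <tr><th>2</th><td><a href="#">Player B</a></td><td>90</td></tr>
    </tbody>
  </table>
</div>
"""

# Step 2: build the soup object from the HTML text
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# Step 3: find the table body, then its rows, then each row's td cells
tbody = demo_soup.find('div', attrs={'id': 'div_passing'}).find('table').find('tbody')
parsed = []
for tr in tbody.find_all('tr'):
    cells = tr.find_all('td')
    parsed.append([cell.text for cell in cells])

print(parsed)  # [['Player A', '100'], ['Player B', '90']]
```

Note that `find_all('td')` skips the `th` cell in each row, which is why the rank column does not appear in `parsed`.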

#### Step 1: Make a request to the specific URL

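```python
# requests.get returns a Response object; despite its name, `url` holds the
# response (with the page's HTML in url.text), not the URL string
url = requests.get('https://www.sports-reference.com/cfb/schools/michigan/passing.html')
```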
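Step 1 calls `requests.get` with only a URL, which means the request can wait indefinitely if the server never answers, and an error page (for example a 404 or 429) would be parsed as if it were the real table. A slightly more defensive sketch wraps the call in a hypothetical `fetch_html` helper; the 10-second timeout is an arbitrary assumption:

```python
import requests


def fetch_html(page_url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper).

    timeout: seconds to wait for the server before giving up (assumed value).
    raise_for_status() raises requests.HTTPError on 4xx/5xx responses
    instead of letting an error page be parsed silently.
    """
    response = requests.get(page_url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

For example, `BeautifulSoup(fetch_html('https://www.sports-reference.com/cfb/schools/michigan/passing.html'), 'html.parser')` would cover Steps 1 and 2 in one line.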

#### Step 2: Create the soup object

```python
# Parse the HTML returned by the request above into a searchable soup object
soup = BeautifulSoup(url.text, 'html.parser')
# print(soup.prettify())  # uncomment to inspect the full HTML
```

#### Step 3: Find table rows and columns
Use `.find` and `.find_all` to locate the table, then its rows, then the columns within each row.

```python
# Find the first div tag with an id of div_passing, then chain further
# .find calls to reach the nested table and its tbody
table = soup.find('div', attrs={'id': 'div_passing'}).find('table').find('tbody')
# print(table)
```
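The chained `.find` calls in Step 3 can also be written as a single CSS selector with BeautifulSoup's `select_one` method (which relies on the `soupsieve` package installed alongside recent versions of beautifulsoup4). This sketch uses hypothetical markup with the same nesting and shows one practical difference: when nothing matches, `select_one` returns `None`, whereas a chained `.find` raises `AttributeError` on the `None` returned by an earlier step.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the div > table > tbody nesting from Step 3
demo_html = '<div id="div_passing"><table><tbody><tr><td>x</td></tr></tbody></table></div>'
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# Chained .find calls, as in the notebook
chained = demo_soup.find('div', attrs={'id': 'div_passing'}).find('table').find('tbody')

# The same lookup expressed as one CSS selector
selected = demo_soup.select_one('div#div_passing table tbody')
print(chained is selected)  # True

# A selector that matches nothing simply returns None
missing = demo_soup.select_one('div#div_rushing table tbody')
print(missing)  # None
```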

```python
# Find all tr tags nested within the tbody tag
rows = table.find_all('tr')

# Each row's data cells are td tags; Step 4 builds on this pattern
for row in rows:
    columns = row.find_all('td')
```

```python
# Display the column elements of the first row more clearly
# Output columns: column index | cell text | cell HTML
ex_row = rows[0]
for ind, col in enumerate(ex_row):
    print('index', ind, ' ' * (2 - len(str(ind))), ' | ', col.text, ' ' * (12 - len(col.text)), ' | ', col)
```

```
index 0    |  1              |  <th class="right" csk="1" data-stat="ranker" scope="row">1</th>
index 1    |  Chad Henne*    |  <td class="left" csk="Henne,Chad" data-append-csv="chad-henne-1" data-stat="player"><a href="/cfb/players/chad-henne-1.html">Chad Henne</a>*</td>
index 2    |  2004           |  <td class="center" data-stat="year_min">2004</td>
index 3    |  2007           |  <td class="center" data-stat="year_max">2007</td>
index 4    |  828            |  <td class="right" data-stat="pass_cmp">828</td>
index 5    |  1387           |  <td class="right" data-stat="pass_att">1387</td>
index 6    |  59.7           |  <td class="right" data-stat="pass_cmp_pct">59.7</td>
index 7    |  9715           |  <td class="right" data-stat="pass_yds">9715</td>
index 8    |  7.0            |  <td class="right" data-stat="pass_yds_per_att">7.0</td>
index 9    |  7.1            |  <td class="right" data-stat="adj_pass_yds_per_att">7.1</td>
index 10   |  87             |  <td class="right" data-
...
```
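Each cell printed above carries a `data-stat` attribute (`ranker`, `player`, `year_min`, `pass_yds`, and so on). As an alternative to counting positions, a row can be turned into a dictionary keyed by that attribute. This sketch reuses three cells copied from the output above, wrapped in `<table>` and `<tr>` tags so the parser sees a complete row; it assumes every cell has a `data-stat` attribute (use `cell.get('data-stat')` if some might not).

```python
from bs4 import BeautifulSoup

# Cells copied from the printed row above, wrapped in a minimal table
demo_html = (
    '<table><tr>'
    '<th class="right" csk="1" data-stat="ranker" scope="row">1</th>'
    '<td class="left" csk="Henne,Chad" data-append-csv="chad-henne-1" data-stat="player">'
    '<a href="/cfb/players/chad-henne-1.html">Chad Henne</a>*</td>'
    '<td class="right" data-stat="pass_yds">9715</td>'
    '</tr></table>'
)
demo_row = BeautifulSoup(demo_html, 'html.parser').find('tr')

# Map each cell's data-stat attribute to its text, so values can be
# looked up by name instead of by position
stats = {cell['data-stat']: cell.text for cell in demo_row.find_all(['th', 'td'])}

print(stats['player'])    # Chad Henne*
print(stats['pass_yds'])  # 9715
```

Lookups by name keep working if the site inserts or reorders columns, which would silently shift hard-coded indexes.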

#### Step 4: Index into table to save specific data

```python
# .text returns only the text inside a tag, without its HTML markup
# int() and float() convert that text into numbers
# Remember to provide default values in case a table entry is empty!

michigan_qb_data = []

for row in rows:
    columns = row.find_all('td')
    # Skip rows without td cells (e.g. repeated header rows)
    if len(columns) > 0:
        # Use the link text so the '*' that follows some names is dropped
        name = columns[0].find('a').text

        start_year = columns[1].text
        end_year = columns[2].text
        years_played = start_year + '-' + end_year

        pass_pct = float(columns[5].text)
        pass_yrds = int(columns[6].text)
        if columns[9].text == '':
            pass_td = 0
        else:
            pass_td = int(columns[9].text)
        ints = int(columns[10].text)

        if columns[11].text == '':
            qbr = 0.0
        else:
            qbr = float(columns[11].text)

        # Append inside the if block so rows without td cells add nothing
        michigan_qb_data.append([name, years_played, pass_pct, pass_yrds, pass_td, ints, qbr])
```
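Step 4 repeats the same "empty string means use a default" check for several columns. One way to avoid that repetition is a small helper; `to_number` below is a hypothetical name, not part of BeautifulSoup, and treating whitespace-only cells as empty is an assumption.

```python
def to_number(text, cast=float, default=0):
    """Convert a table cell's text to a number (hypothetical helper).

    Returns `default` when the cell is empty or contains only whitespace;
    otherwise applies `cast` (int or float) to the stripped text.
    """
    text = text.strip()
    if text == '':
        return default
    return cast(text)


print(to_number('87', int))        # 87
print(to_number('', int))          # 0
print(to_number('133.9'))          # 133.9
print(to_number('', default=0.0))  # 0.0
```

Inside the loop, `pass_td = to_number(columns[9].text, int)` would replace the four-line `if`/`else` block.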

Now, let's view all of the scraped data!

```python
for qb in michigan_qb_data:
    print(qb)
```

```
['Chad Henne', '2004-2007', 59.7, 9715, 87, 37, 133.9]
['John Navarre', '2000-2003', 56.1, 9014, 70, 30, 126.0]
['Devin Gardner', '2010-2014', 60.4, 6336, 44, 32, 138.3]
['Denard Robinson', '2009-2012', 57.2, 6250, 49, 39, 138.6]
['J.J. McCarthy', '2021-2023', 67.6, 6226, 49, 11, 160.5]
['Elvis Grbac', '1989-1992', 63.1, 5859, 64, 29, 148.7]
['Shea Patterson', '2018-2019', 60.1, 5661, 45, 15, 144.2]
['Todd Collins', '1991-1994', 65.0, 5504, 34, 17, 146.5]
['Jim Harbaugh', '1983-1986', 63.2, 5214, 31, 19, 149.5]
['Tom Brady', '1996-1999', 61.9, 4773, 30, 17, 134.9]
['Steve Smith', '1980-1983', 50.1, 4529, 41, 30, 126.2]
['Rick Leach', '1975-1978', 47.6, 3799, 45, 29, 136.3]
['Brian Griese', '1995-1997', 59.5, 3663, 27, 15, 130.6]
['Wilton Speight', '2015-2017', 58.8, 3192, 22, 10, 132.2]
['Cade McNamara', '2020-2022', 63.1, 3181, 21, 7, 139.4]
['Jake Rudock', '2015-2015', 64.0, 3017, 20, 9, 141.5]
['Scott Dreisbach', '1995-1998', 54.7, 2920, 15, 12, 126.0]
['Tate Forcier', '2009-2010',
...
```

### Exercise: NFL Quarterback Data
**Website:** https://www.nfl.com/stats/player-stats/

Scrape the NFL website to gather the following data for the NFL's top quarterbacks so far this season:
* Name 
* Passing Yards
* Completion Percentage
* Number of Touchdowns
* QB Rating

You will need to cast some string table entries into numerical values!
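Python's `int()` and `float()` already ignore surrounding whitespace, but they reject thousands separators such as `'4,183'`. If scraped numbers come in that form (an assumption; check what the page actually shows), a small pair of hypothetical helpers can clean the text before casting:

```python
def clean_int(text):
    """Parse text such as ' 4,183\\n' into an int (hypothetical helper)."""
    return int(text.strip().replace(',', ''))


def clean_float(text):
    """Parse text such as ' 68.8 ' into a float (hypothetical helper)."""
    return float(text.strip().replace(',', ''))


print(clean_int(' 4,183\n'))  # 4183
print(clean_float(' 68.8 '))  # 68.8
```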

```python
nfl_qbs = []

# As in the demo, `url` holds the Response object returned by requests.get
url = requests.get('https://www.nfl.com/stats/player-stats/')
soup = BeautifulSoup(url.text, 'html.parser')
```

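```python
# Find the first div with the horizontal-scroll table class, then its table
qb_table = soup.find('div', attrs={'class': 'd3-o-table--horizontal-scroll'}).find('table')
# Collect every row in the table body
qb_rows = qb_table.find('tbody').find_all('tr')
```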
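BeautifulSoup treats `class` as a multi-valued attribute: it is stored as a list, and matching on a single class name succeeds even when the tag has several classes. The `class_` keyword argument is a shorthand for the same search. This sketch uses hypothetical markup in which the wrapper div carries an extra class:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the wrapper div has two classes
demo_html = (
    '<div class="d3-o-table--horizontal-scroll extra-class">'
    '<table><tbody><tr><td>QB</td></tr></tbody></table>'
    '</div>'
)
demo_soup = BeautifulSoup(demo_html, 'html.parser')

# Matching one of the classes is enough, with either spelling
by_attrs = demo_soup.find('div', attrs={'class': 'd3-o-table--horizontal-scroll'})
by_keyword = demo_soup.find('div', class_='d3-o-table--horizontal-scroll')

print(by_attrs is by_keyword)  # True
print(by_attrs['class'])       # ['d3-o-table--horizontal-scroll', 'extra-class']
```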

```python
# Index into each row and save the data to variables
for row in qb_rows:
    cols = row.find_all('td')

    # strip() removes the whitespace and newlines around the player's name
    name = cols[0].text.strip()
    pass_yards = int(cols[1].text)   # passing yards
    comp_per = float(cols[5].text)   # completion percentage
    td = int(cols[6].text)           # touchdowns
    qbr = float(cols[8].text)        # passer rating

    nfl_qbs.append([name, pass_yards, comp_per, td, qbr])
```
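The loop above relies on hard-coded positions (`cols[1]`, `cols[5]`, `cols[6]`, `cols[8]`). A sketch of a more robust variant reads the header row first and looks up each column by its label. The table below is hypothetical, with only five columns and labels chosen for illustration; check the real page's `thead` text before relying on specific labels.

```python
from bs4 import BeautifulSoup

# Hypothetical table; labels and values are illustrative only
demo_html = """
<table>
  <thead><tr><th>Player</th><th>Pass Yds</th><th>Cmp %</th><th>TD</th><th>Rate</th></tr></thead>
  <tbody><tr><td> Player X </td><td>1000</td><td>65.0</td><td>7</td><td>99.9</td></tr></tbody>
</table>
"""
demo_table = BeautifulSoup(demo_html, 'html.parser').find('table')

# Map each header label to its column position
headers = [th.text.strip() for th in demo_table.find('thead').find_all('th')]
col = {label: i for i, label in enumerate(headers)}

parsed_rows = []
for tr in demo_table.find('tbody').find_all('tr'):
    cells = [td.text.strip() for td in tr.find_all('td')]
    parsed_rows.append([
        cells[col['Player']],
        int(cells[col['Pass Yds']]),
        float(cells[col['Cmp %']]),
        int(cells[col['TD']]),
        float(cells[col['Rate']]),
    ])

print(parsed_rows)  # [['Player X', 1000, 65.0, 7, 99.9]]
```

If the site renames a column, the lookup fails with a `KeyError` naming the missing label, which is easier to diagnose than a value silently read from the wrong column.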

```python
for qb in nfl_qbs:
    print(qb)
```

```
['Brock Purdy', 1130, 68.8, 5, 104.9]
['Dak Prescott', 1072, 64.4, 6, 93.6]
['C.J. Stroud', 1054, 67.6, 6, 98.4]
['Baker Mayfield', 984, 70.5, 8, 106.9]
['Joe Burrow', 978, 70.9, 7, 105.9]
['Matthew Stafford', 978, 68.5, 2, 89.2]
['Sam Darnold', 932, 68.9, 11, 118.9]
['Jalen Hurts', 930, 68.2, 4, 85.7]
['Patrick Mahomes', 904, 68.6, 6, 89.7]
['Jayden Daniels', 897, 82.1, 3, 107.4]
['Daniel Jones', 881, 63.2, 4, 80.8]
['Gardner Minshew', 877, 70.7, 3, 88.7]
['Kirk Cousins', 864, 64.7, 4, 83.5]
['Lamar Jackson', 858, 66.7, 5, 102.3]
['Aaron Rodgers', 849, 64.1, 5, 92.9]
['Justin Fields', 830, 70.6, 3, 98.0]
['Derek Carr', 824, 72.0, 6, 103.9]
['Josh Allen', 814, 69.3, 7, 116.5]
['Geno Smith', 787, 74.8, 3, 93.8]
['Caleb Williams', 787, 61.7, 3, 72.0]
['Kyler Murray', 777, 69.4, 6, 104.6]
['Trevor Lawrence', 729, 53.3, 4, 78.9]
['Deshaun Watson', 727, 61.5, 4, 74.4]
['Jared Goff', 723, 66.0, 3, 79.2]
['Bo Nix', 660, 60.1, 1, 62.5]
```
