<h2> Part A: Web Scraping Table From www.fantasypros.com </h2>

<h> Gianna Capriotti, Blanche Chung, Filip Perkowski, Jillianna Richcrick </h>


The code below scrapes NFL wide receiver statistics for the 2021-2022 fantasy football season from fantasypros.com.
It first downloads the page's HTML, then extracts every row of the wide receiver table and writes the data to a file titled 'wideRec2021.csv' for analysis in Part B.

In [1]:
# requests etc. are already installed, so there is no need to install them from the terminal (pip install requests etc.)
# They are not part of the Python standard library, so we still need to import them in our Jupyter notebook/Python file

# Web-scraping wide receiver data
# Imports the pandas, requests, and bs4 packages and creates a soup object from the URL request's response

import pandas as pd  # Importing our modules; pd is the conventional alias for pandas
import requests
from bs4 import BeautifulSoup
 
URL = "https://www.fantasypros.com/nfl/stats/wr.php"  # The URL we want to extract data from
r = requests.get(URL)  # Create our response object r with the URL as the argument
 
# r.content gives us the HTML content of the response in byte form
# We create our Beautiful Soup object soup using html5lib as the parser
soup = BeautifulSoup(r.content, 'html5lib')  # If this line causes an error, run 'pip install html5lib'

# prettify() formats the parsed HTML as a single, indented Unicode string
print(soup.prettify())





<!DOCTYPE html>
<html lang="en">
 <head>
  <title>
   2021 NFL WR Statistics | Fantasy Football | FantasyPros
  </title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <meta content="View wide receiver stats for the 2021 NFL season. Find out who the leaders are in standard scoring formats and see which players are available in your fantasy football league." name="description"/>
  <link href="https://www.fantasypros.com/nfl/stats/wr.php" rel="canonical"/>
  <link href="/apple-touch-icon.png" rel="icon" sizes="192x192"/>
  <link href="/favicon.ico" rel="icon" sizes="32x32"/>
  <meta content="184352014941166" property="fb:pages"/>
  <meta content="width=device-width, initial-scale=1.0" name="viewport"/>
  <script type="text/javascript">
   window.FP=window.FP||{pageData:{width:window.innerWidth},readCookie:function(e){e=e.replace(/([.*+?^=!:${}()|[\]/\\])/g,"\\$1");var t=new RegExp("(?:^|;)\\s?"+e+"=(.*?)(?:;|$)","i"),a=document.cookie.match(t);return a&&unescape
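
The request in the cell above assumes the download succeeds. As a minimal sketch (not part of the original notebook), the fetch can be wrapped so that HTTP errors surface immediately instead of producing a soup built from an error page. The helper names `fetch_html` and `make_soup`, the 10-second timeout, and the use of Python's built-in `html.parser` in place of html5lib are our assumptions. The demonstration runs on a small hand-written page, so it works offline.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.content


def make_soup(html):
    """Parse HTML with the parser bundled with Python (no extra install)."""
    return BeautifulSoup(html, "html.parser")


# Offline demonstration on a hand-written page (not FantasyPros data)
sample = b"<html><head><title>Demo page</title></head><body></body></html>"
soup_demo = make_soup(sample)
print(soup_demo.title.string)  # Demo page
```

On the real page this would be called as `make_soup(fetch_html(URL))`.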

In [8]:
# Locates the table of interest and prints the portion of HTML containing the table

# find() returns the first <div> tag whose class attribute matches the value below (a single tag, not a list)
table = soup.find('div', attrs = {'class':'mobile-table double-header'}) 

for row in table:  # Iterating over a tag yields its children; we print each one
    print(row)


                    
<table border="0" cellpadding="0" cellspacing="0" class="table table-bordered table-striped table-hover" id="data">
                        <thead>
                        	                        	<!-- omit from excel download with tier-row -->
                        	<tr class="tier-row">
                        		<td> </td>
                                <td> </td>
                                <td colspan="7" style="text-align:center;"><small><b>RECEIVING</b></small></td><td colspan="3" style="text-align:center;"><small><b>RUSHING</b></small></td><td colspan="5" style="text-align:center;"><small><b>MISC</b></small></td>                            </tr>
                                                        <tr>
                            <th class="rank">Rank</th><th class="player-label">Player</th><th><small>REC</small></th>
<th><small>TGT</small></th>
<th><small>YDS</small></th>
<th><small>Y/R</small></th>
<th><small>LG</small></th>
<th><small>20+</sma
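
The lookup above passes the full string 'mobile-table double-header' as the class value. Beautiful Soup treats that as an exact match on the attribute string, so it would stop matching if the page listed the same classes in a different order. The sketch below, run on a hand-written snippet that only mimics the page structure (the row contents are made up), compares that lookup with two alternatives we could consider: a CSS selector and the table's own `id="data"`.

```python
from bs4 import BeautifulSoup

# Hand-written snippet; the class order is deliberately reversed
# relative to the string used in the notebook cell.
html = """
<div class="double-header mobile-table">
  <table id="data"><tbody><tr><td>1</td><td>Player A</td></tr></tbody></table>
</div>
"""
demo = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: fails when the order differs
by_string = demo.find("div", attrs={"class": "mobile-table double-header"})
print(by_string)  # None

# CSS selector: matches an element carrying both classes, in any order
by_selector = demo.select_one("div.mobile-table.double-header")
print(by_selector.table["id"])  # data

# The table's id attribute is another possible anchor
by_id = demo.find("table", id="data")
print(by_id["id"])  # data
```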

In [9]:
column_names = ['Rank', 'Player Name', 'Receptions', 'Targets', 'Yards', 'Yards per Reception', 'Longest', 
                'Receiving Touchdowns', 'Rushing Attempts', 'Rushing Yards', 'Rushing Touchdowns', 'Fumbles Lost', 
                'Games Played', 'FPTS', 'FPTS/Game', 'Rostered']

# Collecting data: each table row becomes one dictionary in this list
rows = []
for row in table.tbody.find_all('tr'):    
    # Find all data for each column
    
    columns = row.find_all('td')
    
    if columns:  # Skip rows that contain no <td> cells
        rank = columns[0].text.strip()
        player_name = columns[1].text.strip()
        rec = columns[2].text.strip()
        target = columns[3].text.strip()
        yards = columns[4].text.strip()
        ypr = columns[5].text.strip()
        lg = columns[6].text.strip()
        # columns[7] is the table's "20+" column, which is not stored
        touchdowns = columns[8].text.strip()
        attempts = columns[9].text.strip()
        rYards = columns[10].text.strip()
        rTDs = columns[11].text.strip()
        fl = columns[12].text.strip()
        games = columns[13].text.strip()
        fpts = columns[14].text.strip()
        fptsPG = columns[15].text.strip()
        rostered = columns[16].text.strip()
        
        rows.append({'Rank': rank, 'Player Name': player_name, 'Receptions': rec, 'Targets': target, 'Yards': yards, 
                     'Yards per Reception': ypr, 'Longest': lg, 'Receiving Touchdowns': touchdowns, 'Rushing Attempts': attempts, 
                     'Rushing Yards': rYards, 'Rushing Touchdowns': rTDs, 'Fumbles Lost': fl, 'Games Played': games, 'FPTS': fpts,
                     'FPTS/Game': fptsPG, 'Rostered': rostered})

# DataFrame.append was removed in pandas 2.0, so we build the DataFrame from the list of rows in one step
df = pd.DataFrame(rows, columns=column_names)

df.head()


Unnamed: 0,Rank,Player Name,Receptions,Targets,Yards,Yards per Reception,Longest,Receiving Touchdowns,Rushing Attempts,Rushing Yards,Rushing Touchdowns,Fumbles Lost,Games Played,FPTS,FPTS/Game,Rostered
0,1,Cooper Kupp (LAR),145,191,1947,13.4,59,16,4,18,0,0,17,294.5,17.3,99.9%
1,2,Deebo Samuel (SF),77,121,1405,18.2,83,6,59,365,8,2,16,262.0,16.4,99.9%
2,3,Ja'Marr Chase (CIN),81,128,1455,18.0,82,13,7,21,0,1,17,223.6,13.2,99.9%
3,4,Justin Jefferson (MIN),108,167,1616,15.0,56,10,6,14,0,1,17,222.4,13.1,99.9%
4,5,Davante Adams (LV),123,169,1553,12.6,59,11,0,0,0,0,16,221.3,13.8,99.9%
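
Every value collected by the scraper is text, because it comes from `.text.strip()`. Before any arithmetic in Part B, the columns would need converting to numbers. The following is a minimal sketch of one way to do that, using two rows copied from the output above. The new column names `Name` and `Team` and the regular expression are our assumptions, not part of the original notebook.

```python
import pandas as pd

# Two rows copied from the output above, stored as strings
# exactly as the scraper produces them.
raw = pd.DataFrame({
    "Player Name": ["Cooper Kupp (LAR)", "Deebo Samuel (SF)"],
    "Receptions": ["145", "77"],
    "FPTS": ["294.5", "262.0"],
    "Rostered": ["99.9%", "99.9%"],
})

clean = raw.copy()

# Convert count and score columns from text to numbers
for col in ["Receptions", "FPTS"]:
    clean[col] = pd.to_numeric(clean[col])

# Drop the percent sign before converting the roster share
clean["Rostered"] = pd.to_numeric(clean["Rostered"].str.rstrip("%"))

# Split "Name (TEAM)" into two new (hypothetical) columns;
# rows that do not fit the pattern get NaN in both.
parts = clean["Player Name"].str.extract(r"^(.*?)\s*\(([A-Z]+)\)$")
clean["Name"] = parts[0]
clean["Team"] = parts[1]

print(clean.dtypes)
```

The original 'Player Name' column is left untouched, so a row that fails the pattern can still be identified.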


In [10]:
# Creates a new CSV file with the data

import os

# The existence check keeps a previously saved file from being overwritten;
# it can be deleted if the file should be rewritten on every run
if not os.path.exists('wideRec2021.csv'):
    df.to_csv('wideRec2021.csv', encoding='utf-8', index=False)
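
The `index=False` argument matters when the file is read back in Part B. Without it, pandas writes the row index as an extra first column with no header, and `read_csv` labels that column 'Unnamed: 0'. The sketch below shows the difference on a two-row frame written to a temporary directory. The file names are placeholders we chose for the illustration.

```python
import os
import tempfile

import pandas as pd

demo = pd.DataFrame({
    "Rank": [1, 2],
    "Player Name": ["Cooper Kupp (LAR)", "Deebo Samuel (SF)"],
})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")        # placeholder name
    without_index = os.path.join(tmp, "without_index.csv")  # placeholder name

    # Default behaviour: the row index is written as an unnamed first column
    demo.to_csv(with_index, encoding="utf-8")
    # As in the cell above: the index is left out
    demo.to_csv(without_index, encoding="utf-8", index=False)

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'Rank', 'Player Name']
print(cols_without)  # ['Rank', 'Player Name']
```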