# Pro Football Reference Lookup Tool

Pro Football Reference (PFR) does not use a consistent scheme for the player identifier in its URLs, which makes automating player queries a headache. Luckily, PFR lists every player by last name in its "Players" section. From there, the names can be collected into a table, the URL root can be attached to each name, and further columns can be built to distinguish players for lookup.
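
To make the inconsistency concrete, the sketch below splits two URL roots that appear in the player table built in this notebook. Fay Abbott's identifier (`AbboFa20`) and Vince Abbott's (`abbotvin01`) appear to follow different shapes, so the identifier cannot be reliably guessed from the name alone. The helper name `split_url_root` and the regular expression are assumptions made for illustration, not something PFR documents.

```python
import re

# Assumed shape of a PFR URL root: /players/<letter>/<player id>.htm
URL_ROOT_PATTERN = re.compile(r'^/players/([A-Z])/([^/]+)\.htm$')

def split_url_root(url_root):
    """Return (letter, player_id) for a URL root, or None if it does not match."""
    match = URL_ROOT_PATTERN.match(url_root)
    if match is None:
        return None
    return match.group(1), match.group(2)

# Two URL roots taken from the player table in this notebook
print(split_url_root('/players/A/AbboFa20.htm'))    # ('A', 'AbboFa20')
print(split_url_root('/players/A/abbotvin01.htm'))  # ('A', 'abbotvin01')
```

Both players have a last name starting with "Abbo", yet one identifier is mixed-case and the other is lowercase and longer, which is why the rest of this notebook scrapes the identifiers instead of constructing them.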

In [1]:
import pandas as pd
import requests
import httplib2
from bs4 import BeautifulSoup, SoupStrainer
from string import ascii_uppercase
import re

from IPython.display import display, HTML  # importing these from IPython.core.display is deprecated
display(HTML("<style>.container { width:80% !important; }</style>"))  # widen the notebook cells

## Step One: Create a Table with Name and URL Root

[Pro Football Reference's "Players" page](https://www.pro-football-reference.com/players/) is used to get the name of every player in the site's database. The landing page lists only some players, but selecting a letter shows every player whose last name starts with that letter (available for viewing and scraping!). Iterating through the alphabet therefore gives access to every name, so a for loop is used. The results need to outlive the loop, so they are stored in lists created before it.

In [2]:
names = [] # Empty list that will hold the plain text name for each player
links = [] # Empty list that will hold the url root for each player 


# Loop through every letter and collect each player's name and URL root
for c in ascii_uppercase:
    url = f'https://www.pro-football-reference.com/players/{c}/'
    page = requests.get(url)
    page.raise_for_status() # stop early if the request failed instead of parsing an error page
    soup = BeautifulSoup(page.content, 'html.parser')
    player_div = soup.find(id='div_players') # the element with id "div_players" holds the link to each player's page

    # find all links and store them in the lists
    for link in player_div.find_all('a'):
        names.append(link.get_text()) # add the plain-text name to the names list
        links.append(link.get('href')) # add the URL root to the links list

At this point, two corresponding lists are populated: the value at index n of one list matches the value at index n of the other. Both lists are appended to in the same loop iteration, so they always have the same length; a matching length therefore does not by itself rule out a missing value, since `link.get('href')` returns `None` for an anchor without an `href`. From here, the lists can be combined into a table that can be built up with other identifiers.
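
A quick sanity check on the two lists can be sketched as follows. The function name `check_parallel_lists` and the rule that every link should start with `/players/` are assumptions for illustration; the sample values are the first two rows of the table shown in this notebook.

```python
def check_parallel_lists(names, links):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if len(names) != len(links):
        problems.append(f'length mismatch: {len(names)} names, {len(links)} links')
    for i, (name, link) in enumerate(zip(names, links)):
        # A usable row needs a non-blank name ...
        if not name or not str(name).strip():
            problems.append(f'row {i}: empty name')
        # ... and a link that looks like a player URL root (assumed prefix)
        if not link or not str(link).startswith('/players/'):
            problems.append(f'row {i}: unexpected link {link!r}')
    return problems

sample_names = ['Isaako Aaitui', 'Joe Abbey']
sample_links = ['/players/A/AaitIs00.htm', '/players/A/AbbeJo20.htm']
print(check_parallel_lists(sample_names, sample_links))  # []
```

Running the same function on the full `names` and `links` lists before building the table would surface any blank names or missing links.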

In [6]:
reference_df = pd.DataFrame(zip(names, links), columns=['name','url_root'])

In [7]:
reference_df

,name,url_root
0,Isaako Aaitui,/players/A/AaitIs00.htm
1,Joe Abbey,/players/A/AbbeJo20.htm
2,Fay Abbott,/players/A/AbboFa20.htm
3,Vince Abbott,/players/A/abbotvin01.htm
4,Jared Abbrederis,/players/A/AbbrJa00.htm
...,...,...
26475,Jeremy Zuttah,/players/Z/ZuttJe20.htm
26476,Merle Zuver,/players/Z/ZuveMe20.htm
26477,Tony Zuzzio,/players/Z/ZuzzTo20.htm
26478,Brandon Zylstra,/players/Z/ZylsBr00.htm


The result of the code up to this point is a 26480 by 2 table in which every name has a corresponding URL root. The URL root is valuable, but the full URL is even more useful for later operations. It can be produced by adding a column that prepends the website address to the URL root.
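
For a single value outside of pandas, the same combination can be sketched with the standard library's `urllib.parse.urljoin`. The names `BASE_URL` and `to_full_url` are hypothetical and used only for this illustration; the notebook's own cell, which follows, does the equivalent for the whole column with string concatenation.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.pro-football-reference.com'

def to_full_url(url_root, base=BASE_URL):
    """Join the site address and a URL root into an absolute URL."""
    return urljoin(base, url_root)

print(to_full_url('/players/A/AaitIs00.htm'))
# https://www.pro-football-reference.com/players/A/AaitIs00.htm
```

Plain concatenation relies on every URL root starting with a slash, which holds for the rows shown here; `urljoin` also tolerates a root without one.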

In [8]:
reference_df['full_url'] = 'https://www.pro-football-reference.com' + reference_df['url_root']

In [9]:
reference_df

,name,url_root,full_url
0,Isaako Aaitui,/players/A/AaitIs00.htm,https://www.pro-football-reference.com/players...
1,Joe Abbey,/players/A/AbbeJo20.htm,https://www.pro-football-reference.com/players...
2,Fay Abbott,/players/A/AbboFa20.htm,https://www.pro-football-reference.com/players...
3,Vince Abbott,/players/A/abbotvin01.htm,https://www.pro-football-reference.com/players...
4,Jared Abbrederis,/players/A/AbbrJa00.htm,https://www.pro-football-reference.com/players...
...,...,...,...
26475,Jeremy Zuttah,/players/Z/ZuttJe20.htm,https://www.pro-football-reference.com/players...
26476,Merle Zuver,/players/Z/ZuveMe20.htm,https://www.pro-football-reference.com/players...
26477,Tony Zuzzio,/players/Z/ZuzzTo20.htm,https://www.pro-football-reference.com/players...
26478,Brandon Zylstra,/players/Z/ZylsBr00.htm,https://www.pro-football-reference.com/players...


At this point, a player's name is enough to retrieve a link to their PFR page. That is enough for basic lookup, but it does not handle players who share a name: the user has no way to specify which player of that name they mean. Asking the user for the player's name, team, and year should be sufficient for all but the rarest edge cases.
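
The shared-name problem can be sketched in plain Python by grouping URL roots under each name. The rows below are made up for illustration (they are not real PFR entries), and `build_name_index` is a hypothetical helper name.

```python
from collections import defaultdict

def build_name_index(rows):
    """Map each player name to the list of URL roots that share it."""
    index = defaultdict(list)
    for name, url_root in rows:
        index[name].append(url_root)
    return index

# Hypothetical rows for illustration only: these names and URL roots are made up
rows = [
    ('Sample Player', '/players/P/PlaySa00.htm'),
    ('Sample Player', '/players/P/PlaySa01.htm'),
    ('Other Person', '/players/P/PersOt00.htm'),
]
index = build_name_index(rows)

# Names that map to more than one URL root cannot be resolved by name alone
ambiguous = {name: urls for name, urls in index.items() if len(urls) > 1}
print(ambiguous)
# {'Sample Player': ['/players/P/PlaySa00.htm', '/players/P/PlaySa01.htm']}
```

On the DataFrame itself, `reference_df['name'].duplicated(keep=False)` would flag the same kind of rows; these are the cases where team and year are needed to pick the right player.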