# Basketball-Reference Scraper Overview
The following code will walk you through how to scrape NBA player and game data from www.basketball-reference.com and input it into a Pandas database. In other scripts within this directory, we will be using the data captured here to run analyses that will help the average user ask both broad and specific questions related to the NBA. We will probe on what statistics and criteria are important for an NBA team to win an NBA championship, how the league has evolved year-over-year, touch on the GOAT debate, and ultimately, build algorithms that can (hopefully) help us all beat Vegas lines consistently so that we can all retire from our day jobs and gamble on the NBA for the rest of our careers. 

None of this could have been done without the tireless and comprehensive effort of those who work at [Basketball Reference](http://www.basketball-reference.com) providing an open-source, API-friendly database containing millions of datapoints from which the entirety of this codebase is built. 

For any questions/concerns, feel free to reach out to me directly at rahim.hashim@columbia.edu. And in the case that this is useful to anyone for future projects, please give credit where credit is due, both to [Basketball Reference](http://www.basketball-reference.com) and myself. Enjoy!

***
## The Basics
__Jupyter Notebook__: All of the following code is hosted in a Python 3 Jupyter Notebook. It is recommended to use Anaconda to access the Notebook in order to have synchronously have access to all Python Libraries used in the rest of the code. 

In order to execute and compile code in the notebook, go to the desired code box and press _Shift_ + _Enter_ at the same time. All code below is recommended to be executed from top to bottom in order.

__Python Libraries__: Python is a beautiful language for a number of reasons, one of which is it's immense
amount of pre-built libraries that do much of the heavy lifting in any web-scraping /
data analysis project. When getting familiar with Python and starting a new project, be
sure to look through the internet for a Python library that may help. A comprehensive list
that I often refer to before starting a project is here: [https://github.com/vinta/awesome-python](https://github.com/vinta/awesome-python)

__Installing Libraries__: In case you receive an error upon trying to execute the following box, such as _ModuleNotFoundError: No module named 'numpy'_, go back to your terminal and open a new tab, and install the library using pip: _pip install numpy_

In [1]:
%reload_ext autoreload
import re
import os
import sys
import requests
import datetime
import time
import threading
import importlib
import numpy as np
import pandas as pd
import pickle
from bs4 import BeautifulSoup
from pprint import pprint
from timeit import timeit
import matplotlib.pyplot as plt
from collections import Counter, OrderedDict, defaultdict
from string import ascii_lowercase

ROOT = '/content/drive/MyDrive/Projects/nba-prediction-algorithm/NBA-Prediction-Algorithms/'

def add_helpers():
  '''
  add_helper mounts google drive and adds
  helper functions to the sys.path
  '''

  # if running on juypter/google colab, mount to google drive
  if 'google.colab' in str(get_ipython()): 
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

  helper_dir_path = ROOT + 'helper/'
  print('\nHelpers:')
  pprint(sorted(os.listdir(helper_dir_path)))
  sys.path.append(helper_dir_path) # set to path of notebook

add_helpers()

Mounted at /content/drive

Helpers:
['PlayerObject.py',
 'Regions.py',
 'TeamNames.py',
 '__pycache__',
 'bettingLinesScraper.py',
 'gameLogScraper.py',
 'meta_info_scraper.py',
 'player_info_scraper.py',
 'player_meta_scraper.py',
 'player_table_scraper.py',
 'seasonScraper.py',
 'teamsScraper.py']


***
## Class Instantiation
Another reason why Python is awesome is it's easy-to-use object-oriented programming. 
In case you aren't familiar with object oriented programming - _Classes_ and 
_Objects_ are the two main aspects of object oriented programming. A class creates a 
new unique and malleable type (e.g. int, string, list) with user-designated attributes. Objects are simply instances of the class. 

Here, the __Player__ class is initiated (from playerStatObjects), with defined attributes (e.g. name, draftYear...).
Once we scrape www.basketball-reference.com, we will create type-specific objects that will each have the following attributes. 

In [2]:
from PlayerObject import Player

***
## Creating Databases
Pandas databases are a powerful tool to query large amounts of data, as we will be doing here. For that reason, we are going to insert all of the data scraped above into a Pandas database. The below code will take player overview data from playerHash and insert it into player_df<br>
>For documentation on pandas: https://pypi.org/project/pandas/

***
## Scraping Player Data
### Biometrics and season + career statistics

playerScraper and metaDataScraper will be doing most of the work to scrape data on each player's background and physical attributes.<br>
> Example Overview Source (last name starting with a): https://www.basketball-reference.com/players/a/<br>
> Example meta-data (Karim Abdul-Jabbar): https://www.basketball-reference.com/players/a/abdulka01.html<br>
> For documentation on requests(): https://realpython.com/python-requests/<br>
> For documentation on BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

___Time Estimates:___ This is the most computationally-intensive function in the program, requiring many url requests in order to complete.<br>
>*Without threading:* ~1hr<br>
>*With threading:* ~15min<br>

In [9]:
from meta_info_scraper import meta_info_scraper
from tqdm.notebook import tqdm
import pickle as pickle

# Thread flag decides whether you want to use parallel processing or standard
THREAD_FLAG = True
WEBSITE_URL = 'https://www.basketball-reference.com/'
PLAYERS_ROOT_URL = 'https://www.basketball-reference.com/players/'
PLAYER_META_PICKLE = 'players_df_meta.pkl'
PLAYER_DATA_PICKLE = 'players_df_data.pkl'
SAVE_PATH = ROOT + PLAYER_DATA_PICKLE

def sizeof_fmt(num):
  for unit in [' ','KB','MB','GB','TB','PB','EB','ZB']:
    if abs(num) < 1024.0:
      return '%3.1f%s' % (num, unit)
    num /= 1024.0
  return '%.1f%s' % (num, 'Yi')

def scrape_player_data():
  
  # Read pickle
  if PLAYER_META_PICKLE in os.listdir(ROOT) and PLAYER_DATA_PICKLE in os.listdir(ROOT): 
    print('{} and {} already exists'.format(PLAYER_META_PICKLE, PLAYER_DATA_PICKLE))
    print('  Uploading...')
    df_players_meta = pd.read_pickle(ROOT+PLAYER_META_PICKLE)
    df_players_data = pd.read_pickle(ROOT+PLAYER_DATA_PICKLE)
  
  # Scrape all basketball-reference player data and pickle
  else:
    list_players_meta = []
    list_players_data = []
    urls_players = []
    for letter in ascii_lowercase:
        url = PLAYERS_ROOT_URL + letter
        urls_players.append(url)

    start_datetime = datetime.datetime.now()
    start_time = time.time()
    print ('Running meta_info_scraper.py')
    print ('  Start Time:', str(start_datetime.time())[:11])

    # Sequential-Processing
    if THREAD_FLAG == False:
      print('  Threading inactivated...')
      for url in urls_players:
        list_players_meta, list_players_data = meta_info_scraper(url, list_players_meta, list_players_data)

    # Parallel-Processing
    else:
      print('  Threading activated...')
      threads = []
      for url in urls_players:
        thread = threading.Thread(target=meta_info_scraper, args=(url,list_players_meta,list_players_data,))
        threads += [thread]
        thread.start()
      for thread in threads:
        thread.join() # makes sure that the main program waits until all threads have terminated
    end_time = time.time()
    print ('  Run Time: {} min'.format(str((end_time - start_time)/60)[:6]))
    
    # Concatenate all meta info and player data into two DataFrames
    print ('  Concatenating DataFrames')
    df_players_meta = None
    df_players_data = None
    for (df_meta, df_data) in tqdm(list(zip(list_players_meta, list_players_data))):
      df_players_meta = pd.concat([df_players_meta,df_meta])
      df_players_data = pd.concat([df_players_data,df_data])
    print ('  Concatenating complete')

    print('Saving {}'.format(PLAYER_META_PICKLE))
    print('  Path: {}'.format(ROOT+PLAYER_META_PICKLE))
    df_players_meta.to_pickle(ROOT+PLAYER_META_PICKLE)
    print('Saving {}'.format(PLAYER_DATA_PICKLE))
    print('  Path: {}'.format(ROOT+PLAYER_DATA_PICKLE))
    df_players_data.to_pickle(ROOT+PLAYER_DATA_PICKLE)

  print('  Size (meta info): {}'.format(sizeof_fmt(sys.getsizeof(df_players_meta))))
  print('  Size (player data): {}'.format(sizeof_fmt(sys.getsizeof(df_players_data))))
  print('Complete.')

  # Return Players DataFrame   
  return df_players_meta, df_players_data

df_players_meta, df_players_data = scrape_player_data()

Running meta_info_scraper.py
  Start Time: 17:47:52.25
  Threading activated...
	  x' Players Captured:  0
	  q' Players Captured:  6
	  u' Players Captured:  11
	  z' Players Captured:  20
	  y' Players Captured:  19
	  i' Players Captured:  26
	  v' Players Captured:  59
	  o' Players Captured:  95
	  e' Players Captured:  106
	  n' Players Captured:  105
	  f' Players Captured:  148
	  k' Players Captured:  170
	  a' Players Captured:  172
	  t' Players Captured:  193
	  l' Players Captured:  195
	  p' Players Captured:  217
	  d' Players Captured:  242
	  g' Players Captured:  246
	  r' Players Captured:  253
	  j' Players Captured:  238
	  c' Players Captured:  301
	  h' Players Captured:  351
	  w' Players Captured:  373
	  s' Players Captured:  420
	  b' Players Captured:  468
	  m' Players Captured:  463
  Run Time: 44.255 min
  Concatenating DataFrames


HBox(children=(FloatProgress(value=0.0, max=4897.0), HTML(value='')))


  Concatenating complete
Saving players_df_data.pkl
  Path: /content/drive/MyDrive/Projects/nba-prediction-algorithm/NBA-Prediction-Algorithms/players_df_data.pkl
  Size (meta info): 1.7MB
  Size (player data): 451.1MB
Complete.


In [7]:
from bs4 import BeautifulSoup
from player_table_scraper import player_table_scraper

def single_player_search():
  player_name = input()
  last_name_letter = player_name.split()[0]
  letter_url = PLAYERS_ROOT_URL + last_name_letter
  letter_response = requests.get(letter_url)
  playerTableAll = letter_response.find_all('tr')
  for index, row in enumerate(playerTable):
   row_player_name = re.findall('.html">(.*?)</a>', str(row))[0]
   if player_name == row_player_name:
    playerURL = re.findall('a href="(.*?)">', str(row))
    playerURL = WEBSITE_URL + playerURL[0]
    player_meta_info, df_player = player_info_scraper(player_name, playerURL)
    print('{} found. DataFrames generated'.format(player_name))
    return player_meta_info, df_player
  else:
    print('{} not found'.format(player_name))
    return None, None

single_player_info, single_player_df = single_player_search()

0    Michael Bytzura
1    Michael Bytzura
2    Michael Bytzura
3    Michael Bytzura
4    Michael Bytzura
5    Michael Bytzura
Name: player_name, dtype: object
  data_type season_playoffs      player_name  ...   dws    ws ws_per_48
0  per_game          season  Michael Bytzura  ...   NaN   NaN       NaN
1  per_game          season  Michael Bytzura  ...   NaN   NaN       NaN
2    totals          season  Michael Bytzura  ...   NaN   NaN       NaN
3    totals          season  Michael Bytzura  ...   NaN   NaN       NaN
4  advanced          season  Michael Bytzura  ...  -0.4  -1.4          
5  advanced          season  Michael Bytzura  ...  -0.4  -1.4          

[6 rows x 42 columns]


***
## Example Queries (Simple)

The following are example queries we can make across all of the generated tables. As can be seen below, the structure of the DataFrame allows for immense flexibility and speed gains as compared to looking at the website itself. We will utilize this structure for more specific trend-, team-, and era- related investigations.

In [10]:
# Player Meta Query
df_career = df_players_data.loc[(df_players_data['season']=='Career') & 
                   (df_players_data['season_playoffs']=='season') &
                   (df_players_data['data_type']=='advanced')]

nan_value = float("NaN")
df_career.replace("", nan_value, inplace=True)
df_career.dropna(how='all', axis='columns')
df_career.head(10)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  method=method,


Unnamed: 0,data_type,season_playoffs,player_name,season,age,team_id,lg_id,pos,g,gs,mp_per_g,fg_per_g,fga_per_g,fg_pct,ft_per_g,fta_per_g,ft_pct,trb_per_g,ast_per_g,pf_per_g,pts_per_g,mp,fg,fga,ft,fta,trb,ast,pf,pts,per,ts_pct,fta_per_fga_pct,orb_pct,drb_pct,trb_pct,ast_pct,DUMMY,ows,dws,ws,ws_per_48,fg3_per_g,fg3a_per_g,fg3_pct,fg2_per_g,fg2a_per_g,fg2_pct,efg_pct,orb_per_g,drb_per_g,stl_per_g,blk_per_g,tov_per_g,fg3,fg3a,fg2,fg2a,orb,drb,stl,blk,tov,fg3a_per_fga_pct,stl_pct,blk_pct,tov_pct,usg_pct,obpm,dbpm,bpm,vorp,trp_dbl
5,advanced,season,Joe Fabel,Career,,,BAA,,30,,,,,,,,,,,,,,,,,,,,,,,0.293,0.271,,,,,,-0.1,-0.1,-0.2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
5,advanced,season,Edwin Ubiles,Career,,,NBA,,4,,,,,,,,,,,,,52.0,,,,,,,,,7.3,0.362,0.167,4.3,17.5,10.9,2.9,,-0.1,0.0,0.0,-0.026,,,,,,,,,,,,,,,,,,,,,,0.278,1.0,1.5,4.9,17.5,-2.4,-1.5,-3.9,0.0,
5,advanced,season,Chris Babb,Career,,,NBA,,14,,,,,,,,,,,,,132.0,,,,,,,,,3.7,0.367,0.0,3.3,11.5,7.3,3.3,,-0.1,0.1,0.0,-0.008,,,,,,,,,,,,,,,,,,,,,,0.9,2.3,0.0,9.1,11.1,-6.0,-0.1,-6.1,-0.1,
11,advanced,season,Hamady N'Diaye,Career,,,NBA,,33,,,,,,,,,,,,,157.0,,,,,,,,,3.9,0.483,0.867,5.0,13.2,9.0,2.6,,-0.1,0.1,0.0,0.013,,,,,,,,,,,,,,,,,,,,,,0.0,0.7,4.5,22.5,7.5,-7.6,0.3,-7.3,-0.2,
12,advanced,season,Zhou Qi,Career,,,NBA,,19,,,,,,,,,,,,,125.0,,,,,,,,,2.5,0.313,0.364,5.5,14.3,9.9,2.1,,-0.5,0.2,-0.2,-0.093,,,,,,,,,,,,,,,,,,,,,,0.576,0.8,9.5,20.7,17.0,-9.9,0.3,-9.6,-0.2,
8,advanced,season,Ivan Rabb,Career,,,NBA,,85,,,,,,,,,,,,,1237.0,,,,,,,,,16.3,0.591,0.288,11.1,22.6,16.7,11.3,,1.9,1.4,3.2,0.125,,,,,,,,,,,,,,,,,,,,,,0.041,1.2,2.1,14.6,17.4,-1.0,-0.7,-1.7,0.1,
8,advanced,season,Dean Wade,Career,,,NBA,,75,,,,,,,,,,,,,1283.0,,,,,,,,,11.5,0.58,0.123,3.5,17.2,10.2,8.6,,1.2,1.0,2.2,0.082,,,,,,,,,,,,,,,,,,,,,,0.67,1.4,1.7,8.8,12.7,-1.1,-0.2,-1.2,0.3,
14,advanced,season,Mfiondu Kabengele,Career,,,NBA,,51,,,,,,,,,,,,,344.0,,,,,,,,,11.2,0.519,0.207,4.1,18.7,11.4,8.0,,0.0,0.4,0.4,0.054,,,,,,,,,,,,,,,,,,,,,,0.579,1.4,3.9,11.4,18.8,-3.2,0.0,-3.3,-0.1,
12,advanced,season,Charles O'Bannon,Career,,,NBA,,48,,,,,,,,,,,,,399.0,,,,,,,,,11.7,0.444,0.184,9.6,10.6,10.1,12.6,,0.2,0.4,0.6,0.07,,,,,,,,,,,,,,,,,,,,,,0.032,1.5,0.8,11.2,17.9,-1.6,-0.5,-2.0,0.0,
12,advanced,season,Joe Pace,Career,,,NBA,,79,,,,,,,,,,,,,557.0,,,,,,,,,14.3,0.513,0.626,11.9,18.3,15.0,6.6,,0.3,0.8,1.1,0.092,,,,,,,,,,,,,,,,,,,,,,,1.1,3.4,19.6,19.3,-2.7,0.6,-2.1,0.0,


In [11]:
# Player Data Query
df_large = df_players_meta.loc[(df_players_meta['height']>80) & 
                   (df_players_meta['weight']>30)]

df_large.replace("", nan_value, inplace=True)
df_large.dropna(how='all', axis='columns')
df_large.head(10)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  method=method,


Unnamed: 0,player_name,draft_year,retire_year,height,weight,birth_date,college
0,Hamady N'Diaye,2011,2014,84,235,"January 12, 1987",Rutgers University
0,Zhou Qi,2018,2019,85,210,"January 16, 1996",
0,Ivan Rabb,2018,2019,82,220,"February 4, 1997",California
0,Dean Wade,2020,2021,81,228,"November 20, 1996",Kansas State
0,Mfiondu Kabengele,2020,2021,81,250,"August 14, 1997",Florida State
0,Joe Pace,1977,1978,82,220,"December 18, 1953",Coppin State University
0,Žarko Čabarkapa,2004,2006,83,235,"May 21, 1981",
0,Alaa Abdelnaby,1991,1995,82,240,"June 24, 1968",Duke
0,Rudy Hackett,1976,1977,81,210,"May 10, 1953",Syracuse
0,Arvydas Sabonis,1996,2003,87,279,"December 19, 1964",


***
## Scraping Game Data
### Game-logs and team statistics

In [None]:
import seasonScraper
from teamsScraper import teamsScraper

seasonsHash = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

YEAR_START = 1947
YEAR_CURRENT = 2021
LEAGUES = ['NBA', 'ABA']

urls_seasons = []
for year in range(YEAR_START, YEAR_CURRENT):
    # Easiest solution for exception years in which both the NBA and ABA existed (i.e. 1967-1976)
    stem = 'https://www.basketball-reference.com/leagues/'
    for league in LEAGUES:
        # Example url = https://www.basketball-reference.com/leagues/NBA_2020.html
        url = stem + league + '_'+ str(year) + '.html'
        urls_seasons.append(url)

start_datetime = datetime.datetime.now()
start_time = time.time()
print ('seasonScraper')
print ('   Start Time:', str(start_datetime.time())[:11])

'''
Thread flag decides whether you want to
use parallel processing or standard
'''
thread_flag = False

'''Dictionary of all NBA teams'''
teamsHash = teamsScraper() 

# Sequential-Processing
if thread_flag == False:
    print('    Threading inactivated')
    for url in urls_seasons:
        league = url[-13:-10]
        year = url[-9:-5]
        seasonsHash[league][year] = seasonScraper.seasonInfoScraper(url, seasonsHash)
        print(f'      Scraping NBA Season: {year}\r', end="")
    print()
# Parallel-Processing
else:
    print('    Threading activated')
    threads = []
    for url in urls_seasons:
        thread = threading.Thread(target=seasonInfoScraper, args=(url,seasonsHash,))
        threads += [thread]
        thread.start()
    for thread in threads:
        thread.join() # makes sure that the main program waits until all threads have terminated
end_time = time.time()
print ('   Run Time:', str((end_time - start_time)/60)[:6], 'min')

ModuleNotFoundError: ignored

***
## Data Organization
To help us understand how all the data is organized, here's a visual:

In [None]:
df_career = df_players_all.loc[(df_players_all['season']=='Career') & 
                   (df_players_all['season_playoffs']=='season') &
                   (df_players_all['data_type']=='advanced')]

df_career.dropna(axis='columns')

***
## Meta-Data Analysis
Now that we've scraped all the meta-info on each player, we can start running analyses.

Below, a few simple analyses are included to help you get started. The first set of graphs examine height distribution (left), weight distribution (middle), and shooting handedness (right).

In [None]:
from Regions import stateDict #stateDict is a Dictionary to help with geography-based analyses
def metaPlot():
    height_list = []; weight_list = []
    rightCount = 0; leftCount = 0; noHandCount = 0
    for player in playersHash.keys():
        try:
            height_list.append(int(playersHash[player]['meta_info'].height))
        except:
            pass
        try:
            weight_list.append(int(playersHash[player]['meta_info'].weight))
        except:
            pass
        if playersHash[player]['meta_info'].shootingHand == 'Right':
            rightCount+=1
        elif playersHash[player]['meta_info'].shootingHand == 'Left':
            leftCount+=1
        else:
            noHandCount+=1

    #Plot Height Distribution (1, Left)
    f, ax = plt.subplots(1,3)
    #Sets default plot size
    plt.rcParams['figure.figsize'] = (10,8)
    n1, bins1, patches1 = ax[0].hist(height_list, bins=20, density=True, histtype='bar', ec='black')
    #Converting y-axis labels from decimals to percents
    y_vals = ax[0].get_yticks(); ax[0].set_yticklabels(['{:3.1f}%'.format(y*100) for y in y_vals])
    #Converting x-axis labels from inches back to feet
    xticks1 = ['5-0', '5-6', '6-0', '6-6', '7-0', '7-6', '8-0']
    ax[0].set_xticks([60, 66, 72, 78, 84, 90, 96])
    ax[0].set_xticklabels(xticks1)
    ax[0].set_xlim([56,100])
    ax[0].set_xlabel('Height', fontweight='bold', labelpad=10)
    ax[0].set_ylabel('Percent of Players', fontweight='bold', labelpad=10)

    #Plot Weight Distribution (1, Middle)
    ax[1].hist(weight_list, bins='auto', density=True, histtype='bar', ec='black')
    y_vals = ax[1].get_yticks()
    ax[1].set_yticklabels(['{:3.1f}%'.format(y*100) for y in y_vals])
    xticks2 = ['150', '180', '210', '240', '270', '300', '330']
    ax[1].set_xticks([150, 180, 210, 240, 270, 300, 330])
    ax[1].set_xticklabels(xticks2)
    ax[1].set_xlim([120,360])
    ax[1].set_xlabel('Weight', fontweight='bold', labelpad=10)
    ax[1].set_ylabel('Percent of Players', fontweight='bold', labelpad=10)

    #Plot Shooting Handedness (1, Right)
    ax[2].bar([1,2,3], [rightCount,leftCount,noHandCount], ec='black')
    ax[2].set_xticks([1,2,3]); ax[2].set_xticklabels(['Right','Left', 'N/A'])
    ax[2].set_xlabel('Shooting Handedness', fontweight='bold', labelpad=10)
    ax[2].set_ylabel('Number of Players', fontweight='bold', labelpad=10)
    
    plt.tight_layout(pad=0.05, w_pad=4, h_pad=1.0)
    f.set_size_inches(18.5, 10.5, forward=True)
    plt.show()
        
metaPlot()

In [None]:
def geographyPlot():
    stateList = []; countryList = []
    for player in playersHash.keys():
        stateList.append(playersHash[player]['meta_info'].birthState)
        countryList.append(playersHash[player]['meta_info'].birthCountry)
    #stateList contains all players born in the US
    stateList = filter(lambda x: x != '', stateList)
    stateHash = dict(Counter(stateList))
    stateHash = OrderedDict(sorted(stateHash.items(), reverse=True, key=lambda t: t[1]))
    #countryList contains all players born in ex-US
    countryList = filter(lambda x: x != 'United States of America', countryList)
    countryList = filter(lambda x: x != '', countryList)
    countryHash = dict(Counter(countryList))
    countryHash = OrderedDict(sorted(countryHash.items(), reverse=True, key=lambda t: t[1]))


    #Plot Birth State of US-Born Players (2)
    f, ax = plt.subplots(1)
    stateList = stateHash.keys(); stateVals = stateHash.values()
    ax.bar(np.arange(len(stateList)), stateVals, ec='black')
    ax.set_xticks(np.arange(len(stateList)))
    ax.set_xticklabels(stateList, rotation=90, ha='right', fontsize=7)
    ax.set_xlabel('US State of Birth', fontweight='bold', labelpad=10)
    ax.set_ylabel('Number of Players', fontweight='bold', labelpad=10)
    plt.show();

    #Plot Birth Countries of non-US-Born Players (3)
    f, ax = plt.subplots(1)
    countryList = countryHash.keys(); countryVals = countryHash.values()
    ax.bar(np.arange(len(countryList)), countryVals, ec='black')
    ax.set_xticks(np.arange(len(countryList)))
    ax.set_xticklabels(countryList, rotation=90, ha='right', fontsize=7)
    ax.set_xlabel('Country of Birth', fontweight='bold', labelpad=10)
    ax.set_ylabel('Number of Players', fontweight='bold', labelpad=10)
    
    f.set_size_inches(18.5, 10.5, forward=True)
    plt.show()
    
geographyPlot()