# Merging All-NBA data with advanced and per-game data

- In this notebook we simply add a column to our advanced and per-game datasets that records whether a player was selected to an All-NBA team. The value is 1 for a first-team selection, 2 for a second-team selection, 3 for a third-team selection, and 0 by default (no selection).

In [1]:
# Load imports
import numpy as np
import pandas as pd

- The All-NBA data represents each season as a string, e.g. '2018-19'. We create a simple dictionary that maps the last two characters of the season to the numerical year in which the season ends (e.g. '19' to 2019). This makes lookups into our stats datasets easier, since those use numerical years.
- We do the same for the team: a simple dictionary maps the team string ('1st', '2nd', '3rd') to its number.

In [2]:
# Year dictionary
YEAR = {'19':2019, '18':2018, '17':2017, '16':2016, '15':2015, '14':2014,
        '13':2013, '12':2012, '11':2011, '10':2010, '09':2009, '08':2008,
        '07':2007, '06':2006, '05':2005, '04':2004, '03':2003, '02':2002,
        '01':2001, '00':2000, '99':1999, '98':1998, '97':1997, '96':1996,
        '95':1995, '94':1994, '93':1993, '92':1992, '91':1991, '90':1990}

# Team dictionary
TEAM = {'1st':1, '2nd':2, '3rd':3}
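The hard-coded YEAR dictionary can also be generated with a comprehension, or skipped entirely by reading the four-digit starting year from the front of the season string. This is a minimal sketch that assumes every season follows the 'YYYY-YY' format shown in the All-NBA data; `season_to_year` and `YEAR_FROM_RANGE` are hypothetical names, not part of the notebook.

```python
def season_to_year(season: str) -> int:
    """Map a season string such as '2018-19' to the year it ends in (2019)."""
    # Assumes 'YYYY-YY': the season ends one calendar year after it starts
    return int(season[:4]) + 1

# Rebuilds the notebook's hard-coded YEAR dictionary ('90' -> 1990 ... '19' -> 2019)
YEAR_FROM_RANGE = {f"{y % 100:02d}": y for y in range(1990, 2020)}

print(season_to_year('1999-00'))   # 2000
print(YEAR_FROM_RANGE['09'])       # 2009
```

Reading the full starting year sidesteps the century ambiguity of the two-digit suffix ('99' vs '00'), which the dictionary can only resolve by listing every year explicitly.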

- Read in our data. Our statistical data only starts with the 1989-90 season (year 1990), so we drop All-NBA data from earlier seasons:

In [3]:
# Read in data
all_nba = pd.read_csv("../data/all_nba_teams.csv")
adv = pd.read_csv("../data/advanced.csv")
pgm = pd.read_csv("../data/per_game.csv")

# Drop seasons before 1989-90 (they are the last 18 rows of the file)
all_nba.drop(all_nba.tail(18).index, inplace=True)

# Double check (the last three rows should be the 1989-90 teams)
all_nba.tail()

| | Season | Lg | Tm | Unnamed: 3 | Unnamed: 4 | Unnamed: 5 | Unnamed: 6 | Unnamed: 7 |
|---|---|---|---|---|---|---|---|---|
| 85 | 1990-91 | NBA | 2nd | Patrick Ewing | Chris Mullin | Dominique Wilkins | Clyde Drexler | Kevin Johnson |
| 86 | 1990-91 | NBA | 3rd | Hakeem Olajuwon | Bernard King | James Worthy | Joe Dumars | John Stockton |
| 87 | 1989-90 | NBA | 1st | Patrick Ewing | Charles Barkley | Karl Malone | Magic Johnson | Michael Jordan |
| 88 | 1989-90 | NBA | 2nd | Hakeem Olajuwon | Larry Bird | Tom Chambers | Kevin Johnson | John Stockton |
| 89 | 1989-90 | NBA | 3rd | David Robinson | Chris Mullin | James Worthy | Clyde Drexler | Joe Dumars |
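Dropping a fixed number of rows only works while the file keeps exactly this ordering and length. A sturdier option is to filter on the season itself. The sketch below is an alternative, not the notebook's method; it assumes every Season value follows the 'YYYY-YY' format, and `toy_all_nba` is a hypothetical stand-in for all_nba_teams.csv (named so it does not overwrite the real all_nba).

```python
import pandas as pd

# Hypothetical toy stand-in for all_nba_teams.csv (newest seasons first)
toy_all_nba = pd.DataFrame({
    'Season': ['1990-91', '1989-90', '1988-89', '1987-88'],
    'Lg': ['NBA', 'NBA', 'NBA', 'NBA'],
    'Tm': ['1st', '1st', '1st', '1st'],
})

# Ending year of each season, e.g. '1989-90' -> 1990
end_year = toy_all_nba['Season'].str[:4].astype(int) + 1

# Keep only the seasons covered by our stats (ending in 1990 or later)
toy_all_nba = toy_all_nba[end_year >= 1990]
print(toy_all_nba['Season'].tolist())   # ['1990-91', '1989-90']
```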


- Add an All_NBA column to both stat datasets to record each player's All-NBA team, set to 0 (no selection) by default

In [4]:
# Add all_nba column to advanced and per games
adv['All_NBA'] = 0
pgm['All_NBA'] = 0

### Iterate through our all_nba data and update the advanced and per_game datasets


In [5]:
# Helper function: given a season (numerical year), a list of players, and an All-NBA
# team string ('1st', '2nd', '3rd'), set All_NBA for those players' rows in both datasets
def update(year, players, team):
    for player in players:
        adv.loc[(adv['Season'] == year) & (adv['Player'] == player), 'All_NBA'] = TEAM[team]
        pgm.loc[(pgm['Season'] == year) & (pgm['Player'] == player), 'All_NBA'] = TEAM[team]
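The helper runs one boolean `.loc` lookup per player per dataset. A vectorized alternative reshapes the All-NBA table to one row per selected player with `DataFrame.melt` and left-joins it onto a stats table with `DataFrame.merge`; it would replace both the zero-initialized column and the loop. This is a sketch on hypothetical toy frames (`toy_all_nba`, `toy_adv`, named so they do not overwrite the real data), assuming, as the loop does, that the player names sit in columns 3 through 7.

```python
import pandas as pd

TEAM = {'1st': 1, '2nd': 2, '3rd': 3}

# Hypothetical toy inputs shaped like the notebook's data (two player columns only)
toy_all_nba = pd.DataFrame({
    'Season': ['1989-90', '1989-90'],
    'Lg': ['NBA', 'NBA'],
    'Tm': ['1st', '2nd'],
    'Unnamed: 3': ['Patrick Ewing', 'Hakeem Olajuwon'],
    'Unnamed: 4': ['Charles Barkley', 'Larry Bird'],
})
toy_adv = pd.DataFrame({
    'Season': [1990, 1990, 1990],
    'Player': ['Patrick Ewing', 'Larry Bird', 'Mark Price'],
})

# Wide -> long: one row per (season, team, player)
player_cols = list(toy_all_nba.columns[3:8])
selections = toy_all_nba.melt(id_vars=['Season', 'Tm'], value_vars=player_cols,
                              value_name='Player')
selections['Season'] = selections['Season'].str[:4].astype(int) + 1  # '1989-90' -> 1990
selections['All_NBA'] = selections['Tm'].map(TEAM)

# Left join keeps every stat row; players without a selection get NaN, then 0
toy_adv = toy_adv.merge(selections[['Season', 'Player', 'All_NBA']],
                        on=['Season', 'Player'], how='left')
toy_adv['All_NBA'] = toy_adv['All_NBA'].fillna(0).astype(int)
print(toy_adv)
```

The join is on exact (Season, Player) pairs, so it has the same sensitivity to name spelling as the loop.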

In [6]:
# Iterate through all rows in all-nba data and call update to update advanced and per_game datasets
for _, row in all_nba.iterrows():
    season = YEAR[row['Season'][-2:]]   # e.g. '1989-90' -> 1990
    players = row.iloc[3:8].tolist()    # the five player-name columns, by position
    team = row['Tm']                    # '1st', '2nd' or '3rd'
    update(season, players, team)

In [7]:
# Save data
adv.to_csv("../data/advanced.csv", index=False)
pgm.to_csv("../data/per_game.csv", index=False)
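Because the update matches rows on exact player names, a name spelled differently in the two sources is skipped silently. A quick sanity check counts flagged rows per season: with three teams of five players, every season from 1989-90 on should show 15. Fewer points to an unmatched name; more could mean a player appears on several rows in one season (for example, after a mid-season trade), if the stats data is laid out that way. `selections_per_season` is a hypothetical helper and the toy frame is made up.

```python
import pandas as pd

def selections_per_season(stats: pd.DataFrame) -> pd.Series:
    """Number of rows flagged with an All-NBA team (1, 2 or 3) in each season."""
    return stats['All_NBA'].gt(0).groupby(stats['Season']).sum()

# Hypothetical toy frame: 1990 is complete (15 selections), 1991 is not
toy = pd.DataFrame({
    'Season': [1990] * 16 + [1991] * 3,
    'All_NBA': [1] * 5 + [2] * 5 + [3] * 5 + [0] + [1, 0, 0],
})
counts = selections_per_season(toy)
print(counts[counts != 15])   # seasons worth inspecting for unmatched names

# On the notebook's data: selections_per_season(adv), selections_per_season(pgm)
```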