**Theory**
Stephen Curry and co. challenged the narrative that defense wins championships. What if offense always won championships, and a good defense merely helped? What if the phrase was always backwards, a myth? To check this, I'd like to explore the data and see whether it holds up, looking at offensive and defensive features among other characteristics of a team and its players and coaches.

In [1]:
import pandas as pd
import numpy as np

### Player and Coach Data
Below we extract player and coach data for the last ten seasons (2010-2019) from the Sports Reference site (basketball-reference.com). This includes:
- Player per-game averages for each season (the game counts top out at 82, so these are regular-season figures)
- Coach records for the current season, with the franchise, and over the career, for experience tracking (includes postseason)
- The last ten NBA champions and runners-up, along with the top performers in each year's postseason

In [2]:
df_players = pd.DataFrame()
df_coaches = pd.DataFrame()

#get data - interested in last ten full seasons; remember the 2011-12 season (NBA_2012) was shortened by the lockout
for i in range(2010, 2020):
    players = 'https://www.basketball-reference.com/leagues/NBA_'+str(i)+'_per_game.html#per_game_stats::none'
    coaches = 'https://www.basketball-reference.com/leagues/NBA_'+str(i)+'_coaches.html#NBA_coaches::none'
    df_player = pd.read_html(players, header=0)[0]
    df_coach = pd.read_html(coaches, header=0)[0]
    df_player['Year'] = i
    df_coach['Year'] = i
    df_players = pd.concat([df_players, df_player], ignore_index=True) #append this season's players
    df_coaches = pd.concat([df_coaches, df_coach], ignore_index=True) #append this season's coaches

print(df_players.shape)
print(df_coaches.shape)

(6371, 31)
(352, 27)
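The loop above calls `pd.concat` on every iteration. A minimal sketch of an alternative collects each season's frame in a list and concatenates once at the end. The helper names `season_url` and `load_seasons` and the `reader` parameter are assumptions made for illustration, not part of the original notebook; `reader` defaults to `pd.read_html` so the sketch can be tried with a stand-in function and no network access.

```python
import pandas as pd

# Hypothetical helpers for illustration; the URL pattern is the one used above.
BASE = 'https://www.basketball-reference.com/leagues/NBA_{year}_{page}.html'

def season_url(year, page):
    """Build a season page URL; page is 'per_game' or 'coaches'."""
    return BASE.format(year=year, page=page)

def load_seasons(page, years, reader=pd.read_html):
    """Read the first table of each season page and stack them once."""
    frames = []
    for year in years:
        df = reader(season_url(year, page), header=0)[0]  # first table on the page
        df['Year'] = year
        frames.append(df)
    # one concat at the end instead of one per iteration
    return pd.concat(frames, ignore_index=True)
```

With network access, `load_seasons('per_game', range(2010, 2020))` would stand in for the players half of the loop.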


In [3]:
#get data for the last ten champions (first table, first ten rows)
champions = pd.read_html('https://www.basketball-reference.com/playoffs/', header=1)[0][1:11]
del champions['Unnamed: 5']
print(champions.shape)
champions.head()

(10, 9)


Unnamed: 0,Year,Lg,Champion,Runner-Up,Finals MVP,Points,Rebounds,Assists,Win Shares
1,2019.0,NBA,Toronto Raptors,Golden State Warriors,K. Leonard,K. Leonard (732),D. Green (223),D. Green (187),K. Leonard (4.9)
2,2018.0,NBA,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (748),D. Green (222),L. James (198),L. James (5.2)
3,2017.0,NBA,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (591),K. Love (191),L. James (141),L. James (4.3)
4,2016.0,NBA,Cleveland Cavaliers,Golden State Warriors,L. James,K. Thompson (582),D. Green (228),R. Westbrook (198),L. James (4.7)
5,2015.0,NBA,Golden State Warriors,Cleveland Cavaliers,A. Iguodala,L. James (601),D. Howard (238),L. James (169),S. Curry (3.9)


### Players
The row count is high (6,371 rows over ten seasons) because players who switch teams mid-season appear once per team, plus a **TOT** row for their combined numbers. The coaches table is the bigger concern. After reviewing the structure of the players table, the **Rk** column is removed: it is only a row counter, and players can be counted per season with the newly added **Year** column, which is more informative.
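As a rough illustration of counting players per season with the year column, the sketch below uses a made-up frame. The names `Player A` and `Player B` are placeholders, not scraped data, and the sketch assumes a traded player has one row per team plus a `TOT` row. Counting by name would also merge two different players who share a name, so treat the result as approximate.

```python
import pandas as pd

# Placeholder rows, not scraped data: Player B was traded mid-season,
# so there is one row per team plus a TOT row.
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B', 'Player B', 'Player B', 'Player A'],
    'Tm':     ['DEN', 'TOT', 'CHA', 'CHI', 'DEN'],
    'Year':   [2010, 2010, 2010, 2010, 2011],
})

rows_per_year = toy.groupby('Year').size()                   # raw row count per season
players_per_year = toy.groupby('Year')['Player'].nunique()   # distinct names per season
```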

### Coaches
The count is higher than expected: 352 rows over ten seasons is about 35 coaches per season in a 30-team league. Mid-season coaching changes explain part of the gap, but some rows are probably header labels carried over from the HTML table rather than coaches.
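One way to test the suspicion about header rows is to look for rows whose value in a numeric column does not parse as a number. The sketch below is an assumption-laden illustration: `non_numeric_rows` is a hypothetical helper, and the toy frame only mimics the first rows of the raw coaches table.

```python
import pandas as pd

def non_numeric_rows(df, col):
    """Return the rows of df whose value in col cannot be parsed as a number."""
    parsed = pd.to_numeric(df[col], errors='coerce')  # unparseable values become NaN
    return df[parsed.isna()]

# Toy frame shaped like the top of the raw coaches table
toy = pd.DataFrame({
    'Unnamed: 0': [None, 'Coach', 'Mike Woodson', 'Doc Rivers'],
    'Regular Season': ['Current Season', 'G', '82', '82'],
})
suspects = non_numeric_rows(toy, 'Regular Season')
```

Here `suspects` holds the two label rows, which is the pattern to look for in the scraped table.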

### Champions
Apart from the year, every column holds text. The points, rebounds, assists, and win shares columns list each postseason's leader with the value in parentheses, so numeric features can be extracted from them if needed. This is done below, along with converting the year column from floats to integers. The league column can also be removed because all data here is NBA data.

In [4]:
#Players table
del df_players['Rk']
df_players.head()

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Year
0,Arron Afflalo,SG,24,DEN,82,75,27.1,3.3,7.1,0.465,...,0.7,2.4,3.1,1.7,0.6,0.4,0.9,2.7,8.8,2010
1,Alexis Ajinça,C,21,CHA,6,0,5.0,0.8,1.7,0.5,...,0.2,0.5,0.7,0.0,0.2,0.2,0.3,0.8,1.7,2010
2,LaMarcus Aldridge,PF,24,POR,78,78,37.5,7.4,15.0,0.495,...,2.5,5.6,8.0,2.1,0.9,0.6,1.3,3.0,17.9,2010
3,Joe Alexander,SF,23,CHI,8,0,3.6,0.1,0.8,0.167,...,0.3,0.4,0.6,0.3,0.1,0.1,0.0,1.1,0.5,2010
4,Malik Allen,PF,31,DEN,51,3,8.9,0.9,2.3,0.397,...,0.7,0.9,1.6,0.3,0.2,0.1,0.4,1.3,2.1,2010


In [5]:
#Coaches table
df_coaches.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Seasons,Seasons.1,Unnamed: 5,Regular Season,Regular Season.1,Regular Season.2,Regular Season.3,...,Playoffs,Playoffs.1,Playoffs.2,Playoffs.3,Playoffs.4,Playoffs.5,Playoffs.6,Playoffs.7,Playoffs.8,Year
0,,,,w/ Franch,Overall,,Current Season,Current Season,Current Season,w/ Franchise,...,Current Season,Current Season,Current Season,w/ Franchise,w/ Franchise,w/ Franchise,Career,Career,Career,2010
1,Coach,Tm,,#,#,,G,W,L,G,...,G,W,L,G,W,L,G,W,L,2010
2,Mike Woodson,ATL,,6,6,,82,53,29,492,...,11,4,7,29,11,18,29,11,18,2010
3,Doc Rivers,BOS,,6,11,,82,50,32,492,...,24,15,9,71,41,30,86,46,40,2010
4,Larry Brown,CHA,,2,29,,82,44,38,164,...,4,0,4,4,0,4,235,120,115,2010


In [6]:
#check how many leading rows hold header labels rather than values before removing them
print(df_coaches.iloc[0:2].values)

[[nan nan nan 'w/ Franch' 'Overall' nan 'Current Season' 'Current Season'
  'Current Season' 'w/ Franchise' 'w/ Franchise' 'w/ Franchise' 'Career'
  'Career' 'Career' 'Career' nan 'Current Season' 'Current Season'
  'Current Season' 'w/ Franchise' 'w/ Franchise' 'w/ Franchise' 'Career'
  'Career' 'Career' 2010]
 ['Coach' 'Tm' nan '#' '#' nan 'G' 'W' 'L' 'G' 'W' 'L' 'G' 'W' 'L' 'W%'
  nan 'G' 'W' 'L' 'G' 'W' 'L' 'G' 'W' 'L' 2010]]


### Labels
The coaching table needs to be transformed. Each record column is described by two distinctions, which together form a column-name prefix:
- Scope: current season (C), with the franchise (F), or career (Car)
- Season type: regular season (R) or playoffs (P)

For example, current-season, regular-season wins become **CR-W**.

Key:
- Current season, regular season = **CR**  
- Current season, playoffs = **CP**  
- Franchise, regular season = **FR**  
- Franchise, playoff = **FP**  
- Career, regular season (experience) = **Car**  
- Career, playoffs = **Car.P**

The column names below follow the header rows deciphered above.

In [7]:
#renaming remaining useful columns after review and key definitions
cols = ['Coach', 'Team', 'F-Seasons', 'Car-Seasons',
        'CR-G', 'CR-W', 'CR-L', 'FR-G', 'FR-W', 'FR-L', 'Car-G', 'Car-W', 'Car-L', 'Car-W%',
        'CP-G', 'CP-W', 'CP-L', 'FP-G', 'FP-W', 'FP-L', 'Car.P-G', 'Car.P-W', 'Car.P-L']
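Because the names follow a regular pattern (prefix from the key, then G, W, or L), the same list could also be generated rather than typed. This is only a sketch; the variable names `generated`, `regular`, `playoffs`, and `stats` are introduced here for illustration.

```python
# Build the coach column names from the key: prefix + '-' + stat
stats = ['G', 'W', 'L']
regular = ['CR', 'FR', 'Car']       # current, franchise, career (regular season)
playoffs = ['CP', 'FP', 'Car.P']    # current, franchise, career (playoffs)

generated = ['Coach', 'Team', 'F-Seasons', 'Car-Seasons']
for prefix in regular:
    generated += [prefix + '-' + s for s in stats]
generated.append('Car-W%')          # career win percentage sits after the career record
for prefix in playoffs:
    generated += [prefix + '-' + s for s in stats]
```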

In [8]:
#get coaching data and transform each season's table before appending
df_coaches = pd.DataFrame()
for i in range(2010, 2020):
    coaches = 'https://www.basketball-reference.com/leagues/NBA_'+str(i)+'_coaches.html#NBA_coaches::none'
    df_coach = pd.read_html(coaches, header=0)[0]
    #remove empty spacer columns carried over from the HTML table
    df_coach = df_coach.drop(columns=['Unnamed: 2', 'Unnamed: 5', 'Unnamed: 16'])
    #rename columns, then drop the two leftover header rows before appending to the larger dataframe
    df_coach.columns = cols
    df_coach = df_coach[2:].copy() #copy so the new column is not written to a slice
    df_coach['Year'] = i
    df_coaches = pd.concat([df_coaches, df_coach], ignore_index=True)

df_coaches.shape

(332, 24)

In [9]:
df_coaches.head()

Unnamed: 0,Coach,Team,F-Seasons,Car-Seasons,CR-G,CR-W,CR-L,FR-G,FR-W,FR-L,...,CP-G,CP-W,CP-L,FP-G,FP-W,FP-L,Car.P-G,Car.P-W,Car.P-L,Year
0,Mike Woodson,ATL,6,6,82,53,29,492,206,286,...,11,4,7,29,11,18,29,11,18,2010
1,Doc Rivers,BOS,6,11,82,50,32,492,280,212,...,24,15,9,71,41,30,86,46,40,2010
2,Larry Brown,CHA,2,29,82,44,38,164,79,85,...,4,0,4,4,0,4,235,120,115,2010
3,Vinny Del Negro,CHI,2,2,82,41,41,164,82,82,...,5,1,4,12,4,8,12,4,8,2010
4,Mike Brown,CLE,5,5,82,61,21,410,272,138,...,11,6,5,71,42,29,71,42,29,2010


In [10]:
#Champions table
champions['Year'] = champions['Year'].astype(int) #years into integers
del champions['Lg'] #delete league name col
champions.head()

Unnamed: 0,Year,Champion,Runner-Up,Finals MVP,Points,Rebounds,Assists,Win Shares
1,2019,Toronto Raptors,Golden State Warriors,K. Leonard,K. Leonard (732),D. Green (223),D. Green (187),K. Leonard (4.9)
2,2018,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (748),D. Green (222),L. James (198),L. James (5.2)
3,2017,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (591),K. Love (191),L. James (141),L. James (4.3)
4,2016,Cleveland Cavaliers,Golden State Warriors,L. James,K. Thompson (582),D. Green (228),R. Westbrook (198),L. James (4.7)
5,2015,Golden State Warriors,Cleveland Cavaliers,A. Iguodala,L. James (601),D. Howard (238),L. James (169),S. Curry (3.9)


In [11]:
#extract the value in parentheses from the points, rebounds, assists, and win shares columns
def extract_values(text):
    value = text.split('(')[1].split(')')[0]
    #totals are integers; win shares are decimals
    try:
        return int(value)
    except ValueError:
        return float(value)
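The same parsing can be sketched with a regular expression that returns the player name as well as the value. `split_leader` is a hypothetical helper, and the pattern assumes every cell looks like `Name (number)`, as in the rows shown above.

```python
import re

# Assumed cell format: 'K. Leonard (732)' or 'L. James (5.2)'
_LEADER = re.compile(r'^(?P<name>.+?)\s*\((?P<value>[\d.]+)\)$')

def split_leader(text):
    """Split 'Name (value)' into (name, numeric value)."""
    match = _LEADER.match(text.strip())
    if match is None:
        raise ValueError('unexpected leader format: ' + repr(text))
    raw = match.group('value')
    # keep totals as int, decimals (win shares) as float
    value = float(raw) if '.' in raw else int(raw)
    return match.group('name'), value
```

Raising on an unexpected format makes a changed page layout visible instead of silently producing a wrong number.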

In [12]:
#rename columns
champions = champions.rename(columns={'Points': 'Top Scorer', 'Rebounds': 'Top Rebr',
                                      'Assists': 'Top Asst', 'Win Shares': 'WS Lead'})
#extract top performer values
champions['Points'] = champions['Top Scorer'].apply(extract_values)
champions['Rebounds'] = champions['Top Rebr'].apply(extract_values)
champions['Assists'] = champions['Top Asst'].apply(extract_values)
champions['Win Shares'] = champions['WS Lead'].apply(extract_values)

In [13]:
champions.head()

Unnamed: 0,Year,Champion,Runner-Up,Finals MVP,Top Scorer,Top Rebr,Top Asst,WS Lead,Points,Rebounds,Assists,Win Shares
1,2019,Toronto Raptors,Golden State Warriors,K. Leonard,K. Leonard (732),D. Green (223),D. Green (187),K. Leonard (4.9),732,223,187,4.9
2,2018,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (748),D. Green (222),L. James (198),L. James (5.2),748,222,198,5.2
3,2017,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (591),K. Love (191),L. James (141),L. James (4.3),591,191,141,4.3
4,2016,Cleveland Cavaliers,Golden State Warriors,L. James,K. Thompson (582),D. Green (228),R. Westbrook (198),L. James (4.7),582,228,198,4.7
5,2015,Golden State Warriors,Cleveland Cavaliers,A. Iguodala,L. James (601),D. Howard (238),L. James (169),S. Curry (3.9),601,238,169,3.9


In [14]:
#find team abbreviations that are missing from the name lookup
#need to create a dictionary for team names (some franchise names and abbreviations changed)
team_abbr = sorted(df_players.Tm.unique())
team_abbr.remove('Tm') #'Tm' comes from header rows repeated inside the scraped table
len(team_abbr)

34

In [15]:
#team abbreviation keys and values
team_names = pd.read_html('https://en.wikipedia.org/wiki/wikipedia:WikiProject_National_Basketball_Association/National_Basketball_Association_team_abbreviations', header=0)[0]
team_names = team_names.rename(columns={'Abbreviation/Acronym': 'Key'})
team_names.head()

Unnamed: 0,Key,Franchise
0,ATL,Atlanta Hawks
1,BKN,Brooklyn Nets
2,BOS,Boston Celtics
3,CHA,Charlotte Hornets
4,CHI,Chicago Bulls


In [16]:
print(len(team_names))

30


In [17]:
#From the above, the players table has four more abbreviations than the lookup (34 vs. 30), so the differences need to be accounted for
missing = [team for team in team_abbr if team not in list(team_names.Key.values)]

print(missing)
#some franchises are already in the lookup under a different abbreviation, some are new
#TOT - Total (for players who played for multiple teams in a season)
#CHO - Charlotte Hornets
#BRK - Brooklyn Nets
#NJN - New Jersey Nets
#PHO - Phoenix Suns
#NOH - New Orleans Hornets

#add the other names to teams df to work as a dictionary
#keyed by abbreviation so each name stays with its key regardless of list order
missing_names = {'BRK': 'Brooklyn Nets', 'CHO': 'Charlotte Hornets', 'NJN': 'New Jersey Nets',
                 'NOH': 'New Orleans Hornets', 'PHO': 'Phoenix Suns', 'TOT': 'Total Averages'}

for key in missing:
    team_names.loc[len(team_names)] = [key, missing_names[key]]

['BRK', 'CHO', 'NJN', 'NOH', 'PHO', 'TOT']


In [18]:
team_names.tail() #appears all have been added in correctly

Unnamed: 0,Key,Franchise
31,CHO,Charlotte Hornets
32,NJN,New Jersey Nets
33,NOH,New Orleans Hornets
34,PHO,Phoenix Suns
35,TOT,Total Averages


Combine the dataframes into one denormalized table:
- Add indicators of each team's end-of-season result (champion, runner-up, or no Finals appearance) to every row of the players and coaches tables
- To do this, the champions table first needs team abbreviations, which come from joining it to the team names table

In [19]:
#merge the team abbreviations for the champion and runner-up onto the champions table by team name
champ_data = champions.copy()
team_merge = team_names.copy()
team_merge['Champion'] = team_merge['Franchise']
team_merge['Runner-Up'] = team_merge['Franchise']

champ_data = pd.merge(champ_data, team_merge[['Key', 'Champion']], on=['Champion'])
champ_data = pd.merge(champ_data, team_merge[['Key', 'Runner-Up']], on=['Runner-Up'])
champ_data['Champ-Abbr'] = champ_data['Key_x']
champ_data['Runner-Up-Abbr'] = champ_data['Key_y']
del champ_data['Key_x'], champ_data['Key_y']
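The two merges above amount to a name-to-abbreviation lookup, which can also be sketched with `Series.map`. The toy frames below reuse a few values from the tables above. Keeping the first abbreviation when a franchise has more than one (for example BKN and BRK) is an assumption of this sketch, not something the notebook decides.

```python
import pandas as pd

# Small stand-ins for team_names and champions
team_names_toy = pd.DataFrame({
    'Key': ['TOR', 'GSW', 'CLE'],
    'Franchise': ['Toronto Raptors', 'Golden State Warriors', 'Cleveland Cavaliers'],
})
champions_toy = pd.DataFrame({
    'Year': [2019, 2018],
    'Champion': ['Toronto Raptors', 'Golden State Warriors'],
    'Runner-Up': ['Golden State Warriors', 'Cleveland Cavaliers'],
})

# Assumption: keep the first abbreviation if a franchise name appears twice
name_to_key = team_names_toy.drop_duplicates('Franchise').set_index('Franchise')['Key']

champions_toy['Champ-Abbr'] = champions_toy['Champion'].map(name_to_key)
champions_toy['Runner-Up-Abbr'] = champions_toy['Runner-Up'].map(name_to_key)
```

Unlike an inner merge, `map` keeps the row order and leaves a missing value where a name has no match, so unmatched teams stay visible.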

In [20]:
champions = champ_data.sort_values(by=['Year'], ascending=False).reset_index(drop=True)
champions.head()

Unnamed: 0,Year,Champion,Runner-Up,Finals MVP,Top Scorer,Top Rebr,Top Asst,WS Lead,Points,Rebounds,Assists,Win Shares,Champ-Abbr,Runner-Up-Abbr
0,2019,Toronto Raptors,Golden State Warriors,K. Leonard,K. Leonard (732),D. Green (223),D. Green (187),K. Leonard (4.9),732,223,187,4.9,TOR,GSW
1,2018,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (748),D. Green (222),L. James (198),L. James (5.2),748,222,198,5.2,GSW,CLE
2,2017,Golden State Warriors,Cleveland Cavaliers,K. Durant,L. James (591),K. Love (191),L. James (141),L. James (4.3),591,191,141,4.3,GSW,CLE
3,2016,Cleveland Cavaliers,Golden State Warriors,L. James,K. Thompson (582),D. Green (228),R. Westbrook (198),L. James (4.7),582,228,198,4.7,CLE,GSW
4,2015,Golden State Warriors,Cleveland Cavaliers,A. Iguodala,L. James (601),D. Howard (238),L. James (169),S. Curry (3.9),601,238,169,3.9,GSW,CLE


Now add indicators for the end-of-season result to both the players and coaches tables: (Champ, Runner-Up) is (1, 0) for the champion, (0, 1) for the runner-up, and (0, 0) otherwise.

In [21]:
#flag the end of season result for each player and coach over the last ten seasons
df_players['Champ'] = 0
df_players['Runner-Up'] = 0
df_coaches['Champ'] = 0
df_coaches['Runner-Up'] = 0
champs = champions.copy()

for year, champ, ru in zip(champs.Year.values, champs['Champ-Abbr'].values, champs['Runner-Up-Abbr'].values):
    #.loc assigns in place; chained indexing such as df['Champ'][mask] = 1 may write to a copy
    df_players.loc[(df_players.Tm == champ) & (df_players.Year == year), 'Champ'] = 1
    df_players.loc[(df_players.Tm == ru) & (df_players.Year == year), 'Runner-Up'] = 1
    df_coaches.loc[(df_coaches.Team == champ) & (df_coaches.Year == year), 'Champ'] = 1
    df_coaches.loc[(df_coaches.Team == ru) & (df_coaches.Year == year), 'Runner-Up'] = 1
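The flagging can also be sketched without a loop over seasons, by checking each row's (year, team) pair against the set of finalists. `flag_finalists` is a hypothetical helper, and the toy frames below only borrow a few values from the tables above.

```python
import pandas as pd

def flag_finalists(df, finals, team_col):
    """Return a copy of df with 0/1 'Champ' and 'Runner-Up' columns."""
    champ_pairs = set(zip(finals['Year'], finals['Champ-Abbr']))
    ru_pairs = set(zip(finals['Year'], finals['Runner-Up-Abbr']))
    pairs = list(zip(df['Year'], df[team_col]))
    out = df.copy()  # leave the input frame untouched
    out['Champ'] = [int(p in champ_pairs) for p in pairs]
    out['Runner-Up'] = [int(p in ru_pairs) for p in pairs]
    return out

finals_toy = pd.DataFrame({
    'Year': [2019, 2018],
    'Champ-Abbr': ['TOR', 'GSW'],
    'Runner-Up-Abbr': ['GSW', 'CLE'],
})
rows_toy = pd.DataFrame({
    'Tm': ['TOR', 'GSW', 'GSW', 'CLE', 'OKC'],
    'Year': [2019, 2019, 2018, 2018, 2019],
})
flagged = flag_finalists(rows_toy, finals_toy, 'Tm')
```

The same helper would serve the coaches table by passing `'Team'` as `team_col`.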

In [22]:
#test to see if values were added correctly: the Thunder lost in the 2012 Finals
df_players[(df_players.Tm == 'OKC') & (df_players.Year ==2012)]

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,TRB,AST,STL,BLK,TOV,PF,PTS,Year,Champ,Runner-Up
1251,Cole Aldrich,C,23,OKC,26,0,6.7,0.8,1.6,0.524,...,1.8,0.1,0.3,0.6,0.3,0.8,2.2,2012,0,1
1349,Nick Collison,PF,31,OKC,63,0,20.7,1.9,3.2,0.597,...,4.3,1.3,0.5,0.4,1.0,2.4,4.5,2012,0,1
1354,Daequan Cook,SG,24,OKC,57,22,17.4,1.9,5.2,0.368,...,2.1,0.3,0.4,0.2,0.3,1.2,5.5,2012,0,1
1390,Kevin Durant,SF,23,OKC,66,66,38.6,9.7,19.7,0.496,...,8.0,3.5,1.3,1.2,3.8,2.0,28.0,2012,0,1
1417,Derek Fisher,PG,37,OKC,20,0,20.4,1.9,5.4,0.343,...,1.5,1.4,0.6,0.1,0.8,1.6,4.9,2012,0,1
1467,James Harden,SG,22,OKC,62,2,31.4,5.0,10.1,0.491,...,4.1,3.7,1.0,0.2,2.2,2.4,16.8,2012,0,1
1479,Lazar Hayward,SF,25,OKC,26,0,5.4,0.5,1.5,0.342,...,0.6,0.2,0.1,0.0,0.3,0.7,1.4,2012,0,1
1515,Serge Ibaka,PF,22,OKC,66,66,27.2,4.0,7.4,0.535,...,7.5,0.4,0.5,3.7,1.2,2.7,9.1,2012,0,1
1519,Royal Ivey,SG,30,OKC,34,0,10.4,0.8,2.1,0.356,...,0.7,0.3,0.4,0.0,0.3,1.1,2.1,2012,0,1
1521,Reggie Jackson,PG,21,OKC,45,0,11.1,1.1,3.5,0.321,...,1.2,1.6,0.6,0.0,0.8,0.7,3.1,2012,0,1


In [23]:
#coaches -> see Steve Kerr's finals record
df_coaches[df_coaches.Coach == 'Steve Kerr']

Unnamed: 0,Coach,Team,F-Seasons,Car-Seasons,CR-G,CR-W,CR-L,FR-G,FR-W,FR-L,...,CP-L,FP-G,FP-W,FP-L,Car.P-G,Car.P-W,Car.P-L,Year,Champ,Runner-Up
177,Steve Kerr,GSW,1,1,82,67,15,82,67,15,...,5,21,16,5,21,16,5,2015,1,0
212,Steve Kerr,GSW,2,2,82,73,9,164,140,24,...,9,45,31,14,45,31,14,2016,0,1
245,Steve Kerr,GSW,3,3,82,67,15,246,207,39,...,1,62,47,15,62,47,15,2017,1,0
275,Steve Kerr,GSW,4,4,82,58,24,328,265,63,...,5,83,63,20,83,63,20,2018,1,0
310,Steve Kerr,GSW,5,5,82,57,25,410,322,88,...,8,105,77,28,105,77,28,2019,0,1


In [24]:
#check type -> need to be accurate for later computations
print('coaches table', type(df_coaches['F-Seasons'].iloc[0]))
print('players table', type(df_players['Age'].iloc[0]))

coaches table <class 'str'>
players table <class 'str'>


In [25]:
#see what columns need updating, and corresponding data type
df_coaches.dtypes
#should be 1 float(win pct), 2 str(coach, team), else all int

Coach          object
Team           object
F-Seasons      object
Car-Seasons    object
CR-G           object
CR-W           object
CR-L           object
FR-G           object
FR-W           object
FR-L           object
Car-G          object
Car-W          object
Car-L          object
Car-W%         object
CP-G           object
CP-W           object
CP-L           object
FP-G           object
FP-W           object
FP-L           object
Car.P-G        object
Car.P-W        object
Car.P-L        object
Year            int64
Champ           int64
Runner-Up       int64
dtype: object

In [26]:
#convert every column except the two text columns to numeric; values that cannot be parsed become NaN
others = [c for c in df_coaches.columns if c not in ['Team', 'Coach']]
for c in others:
    df_coaches[c] = pd.to_numeric(df_coaches[c], errors='coerce')
df_coaches.dtypes

Coach           object
Team            object
F-Seasons        int64
Car-Seasons      int64
CR-G             int64
CR-W             int64
CR-L             int64
FR-G             int64
FR-W             int64
FR-L             int64
Car-G            int64
Car-W            int64
Car-L            int64
Car-W%         float64
CP-G           float64
CP-W           float64
CP-L           float64
FP-G           float64
FP-W           float64
FP-L           float64
Car.P-G        float64
Car.P-W        float64
Car.P-L        float64
Year             int64
Champ            int64
Runner-Up        int64
dtype: object
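The playoff columns come out as float64 above, presumably because some cells are empty (a coach with no playoff games) and `errors='coerce'` turns them into NaN, which a plain int64 column cannot hold. A small sketch of that behavior, using a made-up series, with pandas' nullable `Int64` dtype as one possible way to keep such a column integral:

```python
import pandas as pd

# Made-up values: two numbers, one empty cell, one stray header label
raw = pd.Series(['11', '4', None, 'G'])

coerced = pd.to_numeric(raw, errors='coerce')  # float64; empty and 'G' become NaN
nullable = coerced.astype('Int64')             # nullable integers; NaN becomes <NA>
```

Note that coercion also hides stray labels such as `'G'`, so it is worth counting the missing values afterwards.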

In [27]:
df_coaches.head()

Unnamed: 0,Coach,Team,F-Seasons,Car-Seasons,CR-G,CR-W,CR-L,FR-G,FR-W,FR-L,...,CP-L,FP-G,FP-W,FP-L,Car.P-G,Car.P-W,Car.P-L,Year,Champ,Runner-Up
0,Mike Woodson,ATL,6,6,82,53,29,492,206,286,...,7.0,29.0,11.0,18.0,29.0,11.0,18.0,2010,0,0
1,Doc Rivers,BOS,6,11,82,50,32,492,280,212,...,9.0,71.0,41.0,30.0,86.0,46.0,40.0,2010,0,1
2,Larry Brown,CHA,2,29,82,44,38,164,79,85,...,4.0,4.0,0.0,4.0,235.0,120.0,115.0,2010,0,0
3,Vinny Del Negro,CHI,2,2,82,41,41,164,82,82,...,4.0,12.0,4.0,8.0,12.0,4.0,8.0,2010,0,0
4,Mike Brown,CLE,5,5,82,61,21,410,272,138,...,5.0,71.0,42.0,29.0,71.0,42.0,29.0,2010,0,0


In [28]:
df_players.dtypes
#should be -> ints = Age, G, GS, Year, Champ, Runner-Up
#should be -> str = Player, Pos, Tm, else floats

Player       object
Pos          object
Age          object
Tm           object
G            object
GS           object
MP           object
FG           object
FGA          object
FG%          object
3P           object
3PA          object
3P%          object
2P           object
2PA          object
2P%          object
eFG%         object
FT           object
FTA          object
FT%          object
ORB          object
DRB          object
TRB          object
AST          object
STL          object
BLK          object
TOV          object
PF           object
PTS          object
Year          int64
Champ         int64
Runner-Up     int64
dtype: object

In [29]:
cols = list(df_players.columns.values)
ints = ['Age', 'G', 'GS', 'Year', 'Champ', 'Runner-Up']
strs = ['Player', 'Pos', 'Tm']
others = [c for c in cols if c not in strs and c not in ints]
for s in strs:
    df_players[s] = df_players[s].astype(str)
#nullable Int64 keeps these columns integral even when coercion produces missing values
for i in ints:
    df_players[i] = pd.to_numeric(df_players[i], errors='coerce').astype('Int64')
for f in others:
    df_players[f] = pd.to_numeric(df_players[f], errors='coerce')
df_players.dtypes

Player        object
Pos           object
Age            Int64
Tm            object
G              Int64
GS             Int64
MP           float64
FG           float64
FGA          float64
FG%          float64
3P           float64
3PA          float64
3P%          float64
2P           float64
2PA          float64
2P%          float64
eFG%         float64
FT           float64
FTA          float64
FT%          float64
ORB          float64
DRB          float64
TRB          float64
AST          float64
STL          float64
BLK          float64
TOV          float64
PF           float64
PTS          float64
Year           Int64
Champ          Int64
Runner-Up      Int64
dtype: object

In [30]:
df_players.head()

Unnamed: 0,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,...,TRB,AST,STL,BLK,TOV,PF,PTS,Year,Champ,Runner-Up
0,Arron Afflalo,SG,24,DEN,82,75,27.1,3.3,7.1,0.465,...,3.1,1.7,0.6,0.4,0.9,2.7,8.8,2010,0,0
1,Alexis Ajinça,C,21,CHA,6,0,5.0,0.8,1.7,0.5,...,0.7,0.0,0.2,0.2,0.3,0.8,1.7,2010,0,0
2,LaMarcus Aldridge,PF,24,POR,78,78,37.5,7.4,15.0,0.495,...,8.0,2.1,0.9,0.6,1.3,3.0,17.9,2010,0,0
3,Joe Alexander,SF,23,CHI,8,0,3.6,0.1,0.8,0.167,...,0.6,0.3,0.1,0.1,0.0,1.1,0.5,2010,0,0
4,Malik Allen,PF,31,DEN,51,3,8.9,0.9,2.3,0.397,...,1.6,0.3,0.2,0.1,0.4,1.3,2.1,2010,0,0


In [31]:
#save to files
df_players.to_csv('players.csv')
df_coaches.to_csv('coaches.csv')
champions.to_csv('champions.csv')
team_names.to_csv('team_names.csv')
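By default `to_csv` also writes the row index as an unnamed first column. The sketch below, using an in-memory buffer and a one-row stand-in frame, shows what that looks like when the file is read back and how `index_col=0` restores the original columns; passing `index=False` when saving is another option.

```python
import io
import pandas as pd

# One-row stand-in for a saved table
df = pd.DataFrame({'Coach': ['Steve Kerr'], 'Team': ['GSW'], 'Year': [2015]})

buf = io.StringIO()
df.to_csv(buf)                           # index is written as an unnamed first column

buf.seek(0)
plain = pd.read_csv(buf)                 # index comes back as 'Unnamed: 0'

buf.seek(0)
indexed = pd.read_csv(buf, index_col=0)  # first column is used as the index again
```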