# Introduction

The goal of this project is to visualise how the impact of players born outside the USA has changed over the years.

This project uses a Kaggle dataset: https://www.kaggle.com/drgilermo/nba-players-stats. It contains three .csv files. Seasons_Stats.csv contains the stats of every NBA player for each season since 1949/1950. The two other files contain basic information about each player, such as name, birthdate and height.
The list of international players comes from Wikipedia: https://en.wikipedia.org/wiki/List_of_foreign_NBA_players

Important note: A lot of data is missing, because not every stat has been available since the first season. More details and an explanation of each stat: https://www.basketball-reference.com/about/glossary.html

### Imports
Import libraries.

In [1]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

# Visualizations
import plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

# Project specific imports
from unidecode import unidecode

# Analysis/Modeling
### Import statistics

In [2]:
stat = pd.read_csv('Seasons_Stats.csv')

### Basic info

In [3]:
stat.shape
stat.dtypes

(24691, 53)

Unnamed: 0      int64
Year          float64
Player         object
Pos            object
Age           float64
Tm             object
G             float64
GS            float64
MP            float64
PER           float64
TS%           float64
3PAr          float64
FTr           float64
ORB%          float64
DRB%          float64
               ...   
2PA           float64
2P%           float64
eFG%          float64
FT            float64
FTA           float64
FT%           float64
ORB           float64
DRB           float64
TRB           float64
AST           float64
STL           float64
BLK           float64
TOV           float64
PF            float64
PTS           float64
Length: 53, dtype: object

### Cleaning

In [4]:
stat = stat.rename(columns={'Unnamed: 0': 'Rk'})
stat.isna().sum()

Rk           0
Year        67
Player      67
Pos         67
Age         75
Tm          67
G           67
GS        6458
MP         553
PER        590
TS%        153
3PAr      5852
FTr        166
ORB%      3899
DRB%      3899
          ... 
2PA         67
2P%        195
eFG%       166
FT          67
FTA         67
FT%        925
ORB       3894
DRB       3894
TRB        379
AST         67
STL       3894
BLK       3894
TOV       5046
PF          67
PTS         67
Length: 53, dtype: int64

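Drop rows where Player is NaN. In the counts above, the same number of missing values (67) shows up for Year, Player, Pos, Tm, G and most counting stats, which suggests these are empty rows rather than real player seasons.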

TODO: Do I have to explain it?

In [5]:
stat = stat.dropna(subset=['Player'])
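One way to back up this step is to check that the rows without a Player are empty everywhere else too. The sketch below uses a small made-up frame (`toy`, not the real Seasons_Stats.csv) and assumes the only populated column in such rows is the `Rk` index column.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the stats frame; values are illustrative only
toy = pd.DataFrame({
    'Rk': [0, 1, 2, 3],
    'Year': [1950.0, np.nan, 1951.0, np.nan],
    'Player': ['Player A', np.nan, 'Player B', np.nan],
    'PTS': [10.0, np.nan, 20.0, np.nan],
})

# Rows that have no player name
no_player = toy[toy['Player'].isna()]

# True only if every other column (apart from Rk) is NaN in those rows
rows_are_empty = bool(no_player.drop(columns='Rk').isna().all().all())

# Same drop as in the cell above
cleaned = toy.dropna(subset=['Player'])
```

If `rows_are_empty` is `True` on the real data, dropping those rows loses no information.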

Drop the two columns that contain no values

<span style="color:red">Question: Do I have to explain how I know that, or should I prove it?</span>

In [6]:
stat = stat.drop(['blanl', 'blank2'], axis=1)

# Delete asterisks from Hall of Famers (regex=False treats '*' as a literal character)
stat['Player'] = stat['Player'].str.replace('*', '', regex=False)
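Regarding the question above: a short check can show which columns are entirely empty before they are dropped. This is a minimal sketch on a made-up frame; the column names `blanl` and `blank2` are taken from the cell above, everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Made-up frame with two all-NaN columns, mirroring the situation above
toy = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    'blanl': [np.nan, np.nan],
    'PTS': [1.0, 2.0],
    'blank2': [np.nan, np.nan],
})

# Columns in which every value is missing
empty_cols = toy.columns[toy.isna().all()].tolist()

# Drop every column that is completely empty, without naming it
toy = toy.dropna(axis=1, how='all')
```

Printing `empty_cols` in the notebook would serve as the proof, and `dropna(axis=1, how='all')` avoids hard-coding the column names.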

### Import stats from 2018

Source:


stats_2018: https://www.basketball-reference.com/leagues/NBA_2018_totals.html

adv_2018: https://www.basketball-reference.com/leagues/NBA_2018_advanced.html

In [7]:
stats_2018 = pd.read_csv('ignore/stats_2018.csv')
adv_2018 = pd.read_csv('ignore/adv_2018.csv')

Note (10.07): I spent a lot of time here trying to highlight all the non-stat rows, to show why I delete them. They are copies of the header row that the source table repeats between blocks of players.

In [8]:
stats_2018.loc[stats_2018.Player == 'Player'].style.applymap(lambda x: 'background-color: red')

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
20,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
47,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
73,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
98,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
127,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
152,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
175,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
202,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
225,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
252,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS


### Find rows without stats and delete them

In [9]:
stats_rows_to_delete = stats_2018.loc[stats_2018['Player'] == 'Player']
adv_rows_to_delete = adv_2018.loc[adv_2018['Player'] == 'Player']

stats_2018 = stats_2018.drop(stats_rows_to_delete.index)
adv_2018 = adv_2018.drop(adv_rows_to_delete.index)

Delete columns containing 'Unnamed'

In [10]:
adv_2018 = adv_2018.loc[:, ~adv_2018.columns.str.contains('Unnamed')]

In [11]:
cols = adv_2018.columns.difference(stats_2018.columns)
cols

Index(['3PAr', 'AST%', 'BLK%', 'BPM', 'DBPM', 'DRB%', 'DWS', 'FTr', 'OBPM',
       'ORB%', 'OWS', 'PER', 'STL%', 'TOV%', 'TRB%', 'TS%', 'USG%', 'VORP',
       'WS', 'WS/48'],
      dtype='object')

In [12]:
stats_2018 = stats_2018.join(adv_2018[cols])
stats_2018.shape
stats_2018.dtypes

(664, 50)

Rk        object
Player    object
Pos       object
Age       object
Tm        object
G         object
GS        object
MP        object
FG        object
FGA       object
FG%       object
3P        object
3PA       object
3P%       object
2P        object
           ...  
DRB%      object
DWS       object
FTr       object
OBPM      object
ORB%      object
OWS       object
PER       object
STL%      object
TOV%      object
TRB%      object
TS%       object
USG%      object
VORP      object
WS        object
WS/48     object
Length: 50, dtype: object

In [13]:
cols = stats_2018.columns.values[5:]
stats_2018[cols] = stats_2018[cols].apply(pd.to_numeric, errors='coerce')
stats_2018.dtypes

Rk         object
Player     object
Pos        object
Age        object
Tm         object
G           int64
GS          int64
MP          int64
FG          int64
FGA         int64
FG%       float64
3P          int64
3PA         int64
3P%       float64
2P          int64
           ...   
DRB%      float64
DWS       float64
FTr       float64
OBPM      float64
ORB%      float64
OWS       float64
PER       float64
STL%      float64
TOV%      float64
TRB%      float64
TS%       float64
USG%      float64
VORP      float64
WS        float64
WS/48     float64
Length: 50, dtype: object

#### A very handy function to find differences in column names

In [14]:
stat.columns.difference(stats_2018.columns)

Index(['Year'], dtype='object')
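Note that `Index.difference` is one-directional: it returns labels that are in the left index but not in the right one. A small sketch with made-up column names shows the difference between the two directions and `symmetric_difference`, which covers both at once.

```python
import pandas as pd

# Made-up column sets for illustration
a = pd.Index(['Year', 'Player', 'PTS'])
b = pd.Index(['Player', 'PTS', 'WS'])

only_in_a = a.difference(b).tolist()            # labels missing from b
only_in_b = b.difference(a).tolist()            # labels missing from a
in_one_only = a.symmetric_difference(b).tolist()  # labels missing from either side
```

Checking both directions (or the symmetric difference) before concatenating guards against columns that exist only in the new season's frame.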

In [15]:
stats_2018['Year'] = 2018
stats_2018.tail()

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,...,BLK,TOV,PF,PTS,3PAr,AST%,BLK%,BPM,DBPM,DRB%,DWS,FTr,OBPM,ORB%,OWS,PER,STL%,TOV%,TRB%,TS%,USG%,VORP,WS,WS/48,Year
685,537,Tyler Zeller,C,28,BRK,42,33,703,125,229,0.546,10,26,0.385,115,203,0.567,0.568,40,60,0.667,63,131,194,28,...,21,35,78,300,0.114,6.5,2.2,-2.5,-0.6,20.0,0.6,0.262,-1.9,9.4,1.0,15.3,0.6,12.1,14.7,0.587,17.9,-0.1,1.5,0.105,2018
686,537,Tyler Zeller,C,28,MIL,24,1,406,62,105,0.59,0,2,0.0,62,103,0.602,0.59,17,19,0.895,47,64,111,19,...,14,12,48,141,0.019,7.0,2.9,-0.1,-0.4,18.4,0.3,0.181,0.3,13.6,1.1,17.1,0.9,9.6,16.0,0.622,13.9,0.2,1.4,0.163,2018
687,538,Paul Zipser,SF,23,CHI,54,12,824,81,234,0.346,37,110,0.336,44,124,0.355,0.425,19,25,0.76,13,118,131,46,...,15,43,86,218,0.47,8.0,1.6,-5.9,-0.3,16.0,0.6,0.107,-5.5,1.6,-1.1,5.2,1.2,14.9,8.5,0.445,15.2,-0.8,-0.6,-0.034,2018
688,539,Ante Žižić,C,21,CLE,32,2,214,49,67,0.731,0,0,,49,67,0.731,0.731,21,29,0.724,24,36,60,5,...,13,11,30,119,0.0,3.8,5.2,0.1,-1.2,18.6,0.2,0.433,1.3,12.8,0.9,24.2,0.5,12.1,15.7,0.746,18.8,0.1,1.0,0.231,2018
689,540,Ivica Zubac,C,20,LAL,43,0,410,61,122,0.5,0,1,0.0,61,121,0.504,0.5,39,51,0.765,45,78,123,25,...,15,26,47,161,0.008,8.8,3.0,-2.2,0.5,20.1,0.5,0.418,-2.7,11.8,0.5,15.3,0.9,15.3,16.0,0.557,17.6,0.0,1.0,0.118,2018


### Import stats from 2019

This is actually a copy/paste of the 2018 steps

Source:

stats_2019: https://www.basketball-reference.com/leagues/NBA_2019_totals.html

adv_2019: https://www.basketball-reference.com/leagues/NBA_2019_advanced.html

In [16]:
stats_2019 = pd.read_csv('ignore/stats_2019.csv')
adv_2019 = pd.read_csv('ignore/adv_2019.csv')

### Find rows without stats and delete them

In [17]:
stats_rows_to_delete = stats_2019.loc[stats_2019['Player'] == 'Player']
adv_rows_to_delete = adv_2019.loc[adv_2019['Player'] == 'Player']

stats_2019 = stats_2019.drop(stats_rows_to_delete.index)
adv_2019 = adv_2019.drop(adv_rows_to_delete.index)

Delete columns containing 'Unnamed'

In [18]:
adv_2019 = adv_2019.loc[:, ~adv_2019.columns.str.contains('Unnamed')]

In [19]:
cols = adv_2019.columns.difference(stats_2019.columns)
cols

Index(['3PAr', 'AST%', 'BLK%', 'BPM', 'DBPM', 'DRB%', 'DWS', 'FTr', 'OBPM',
       'ORB%', 'OWS', 'PER', 'STL%', 'TOV%', 'TRB%', 'TS%', 'USG%', 'VORP',
       'WS', 'WS/48'],
      dtype='object')

In [20]:
stats_2019 = stats_2019.join(adv_2019[cols])
stats_2019.shape

(708, 50)

In [21]:
cols = stats_2019.columns.values[5:]
stats_2019[cols] = stats_2019[cols].apply(pd.to_numeric, errors='coerce')
stats_2019.dtypes

Rk         object
Player     object
Pos        object
Age        object
Tm         object
G           int64
GS          int64
MP          int64
FG          int64
FGA         int64
FG%       float64
3P          int64
3PA         int64
3P%       float64
2P          int64
           ...   
DRB%      float64
DWS       float64
FTr       float64
OBPM      float64
ORB%      float64
OWS       float64
PER       float64
STL%      float64
TOV%      float64
TRB%      float64
TS%       float64
USG%      float64
VORP      float64
WS        float64
WS/48     float64
Length: 50, dtype: object

#### A very handy function to find differences in column names

In [22]:
stat.columns.difference(stats_2019.columns)

Index(['Year'], dtype='object')

In [23]:
stats_2019['Year'] = 2019
stats_2019.sample(5)

Unnamed: 0,Rk,Player,Pos,Age,Tm,G,GS,MP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,...,BLK,TOV,PF,PTS,3PAr,AST%,BLK%,BPM,DBPM,DRB%,DWS,FTr,OBPM,ORB%,OWS,PER,STL%,TOV%,TRB%,TS%,USG%,VORP,WS,WS/48,Year
202,153,Henry Ellenson,PF,22,DET,2,0,25,4,10,0.4,2,4,0.5,2,6,0.333,0.5,2,2,1.0,0,9,9,1,...,0,0,2,12,0.4,6.2,0.0,-6.6,-2.9,40.9,0.0,0.2,-3.7,0.0,0.0,14.8,0.0,0.0,19.7,0.551,18.8,0.0,0.1,0.149,2019
541,392,Kelly Oubre,SF,23,PHO,40,12,1180,243,537,0.453,67,206,0.325,176,331,0.532,0.515,121,159,0.761,47,149,196,64,...,39,73,110,674,0.384,8.7,2.8,-1.2,-0.8,14.2,0.8,0.296,-0.4,4.3,0.5,16.4,2.3,10.7,9.2,0.555,24.7,0.2,1.3,0.054,2019
467,340,Ben McLemore,SG,25,SAC,19,0,158,25,64,0.391,17,41,0.415,8,23,0.348,0.523,8,12,0.667,3,14,17,4,...,3,5,22,75,0.641,3.4,1.6,-3.7,-2.6,9.4,0.1,0.188,-1.1,1.9,0.1,10.5,1.8,6.7,5.5,0.541,19.4,-0.1,0.2,0.055,2019
557,403,Theo Pinson,SG,23,BRK,18,0,211,25,73,0.342,12,46,0.261,13,27,0.481,0.425,19,22,0.864,4,32,36,21,...,0,18,15,81,0.63,14.1,0.0,-4.8,-1.0,15.9,0.2,0.301,-3.7,2.0,-0.2,8.1,1.4,17.9,9.0,0.49,20.0,-0.1,0.0,0.004,2019
515,371,Abdel Nader,SF,25,OKC,61,1,694,91,215,0.423,32,100,0.32,59,115,0.513,0.498,27,36,0.75,14,102,116,20,...,12,26,68,241,0.465,3.8,1.5,-5.1,-0.9,15.7,0.9,0.167,-4.2,2.0,0.0,8.8,1.3,10.1,8.6,0.522,15.1,-0.5,0.9,0.062,2019


In [24]:
stat.columns.difference(stats_2019.columns)

Index([], dtype='object')
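Since the 2018 and 2019 cells repeat the same steps, they could be folded into one function. The sketch below is one possible shape for it; `clean_season` is a hypothetical helper name, and the toy frames stand in for the downloaded totals and advanced tables.

```python
import pandas as pd

def clean_season(totals, advanced, year):
    """Hypothetical helper bundling the per-season steps shown above."""
    # Drop the repeated header rows
    totals = totals.loc[totals['Player'] != 'Player']
    advanced = advanced.loc[advanced['Player'] != 'Player']
    # Drop the blank 'Unnamed' columns from the advanced table
    advanced = advanced.loc[:, ~advanced.columns.str.contains('Unnamed')]
    # Add the columns that exist only in the advanced table (aligned on the index)
    extra = advanced.columns.difference(totals.columns)
    merged = totals.join(advanced[extra])
    # Convert everything after the first five columns to numbers
    num_cols = merged.columns[5:]
    merged[num_cols] = merged[num_cols].apply(pd.to_numeric, errors='coerce')
    merged['Year'] = year
    return merged

# Toy input: row 1 is a repeated header row, all values are strings
base = {
    'Rk': ['1', 'Rk', '2'], 'Player': ['Player A', 'Player', 'Player B'],
    'Pos': ['C', 'Pos', 'G'], 'Age': ['25', 'Age', '30'],
    'Tm': ['AAA', 'Tm', 'BBB'], 'G': ['10', 'G', '20'],
}
totals = pd.DataFrame({**base, 'PTS': ['100', 'PTS', '200']})
advanced = pd.DataFrame({**base, 'Unnamed: 19': [None, None, None],
                         'PER': ['15.1', 'PER', '20.2']})

season = clean_season(totals, advanced, 2018)
```

With real data, the two calls would look like `clean_season(stats_2018, adv_2018, 2018)` and `clean_season(stats_2019, adv_2019, 2019)`, assuming the frames are passed in as read from the CSV files.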

### Add stats from last two years

In [25]:
stat = pd.concat([stat, stats_2018], sort=False)
stat = stat.reset_index(drop=True)

stat = pd.concat([stat, stats_2019], sort=False)
stat = stat.reset_index(drop=True)

In [26]:
stat.shape
stat.sample(5)

(25996, 51)

Unnamed: 0,Rk,Year,Player,Pos,Age,Tm,G,GS,MP,PER,TS%,3PAr,FTr,ORB%,DRB%,TRB%,AST%,STL%,BLK%,TOV%,USG%,OWS,DWS,WS,WS/48,...,DBPM,BPM,VORP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
4019,4043,1974.0,Mark Sibley,SG,23,POR,28.0,,124.0,9.3,0.389,,0.125,7.3,13.4,10.3,14.0,1.4,0.4,,,-0.2,0.0,-0.2,-0.065,...,-3.1,-7.5,-0.2,20.0,56.0,0.357,,,,20.0,56.0,0.357,0.357,6.0,7.0,0.857,9.0,16.0,25.0,13.0,4.0,1.0,,23.0,46.0
17404,17459,2005.0,Moochie Norris,PG,31,HOU,6.0,0.0,39.0,0.6,0.237,0.0,0.125,3.1,8.7,6.0,20.0,1.4,0.0,10.6,22.8,-0.2,0.0,-0.1,-0.168,...,-1.4,-11.9,-0.1,3.0,16.0,0.188,0.0,0.0,,3.0,16.0,0.188,0.188,2.0,2.0,1.0,1.0,3.0,4.0,5.0,1.0,0.0,2.0,2.0,8.0
15272,15323,2001.0,Tyrone Nesby,SF,25,WAS,48.0,22.0,1223.0,9.0,0.455,0.329,0.204,3.3,9.4,6.3,8.9,1.7,0.9,11.0,18.2,-0.1,0.3,0.2,0.009,...,-1.6,-3.0,-0.3,149.0,407.0,0.366,39.0,134.0,0.291,110.0,273.0,0.403,0.414,67.0,83.0,0.807,35.0,96.0,131.0,65.0,41.0,16.0,55.0,127.0,404.0
11795,11840,1995.0,Alaa Abdelnaby,PF,26,TOT,54.0,0.0,506.0,12.6,0.519,0.009,0.152,8.7,17.4,13.1,5.0,1.5,1.8,15.4,25.6,-0.4,0.7,0.3,0.027,...,-1.7,-6.3,-0.6,118.0,231.0,0.511,0.0,2.0,0.0,118.0,229.0,0.515,0.511,20.0,35.0,0.571,37.0,77.0,114.0,13.0,15.0,12.0,45.0,104.0,256.0
6792,6825,1983.0,Mark Aguirre,SF,23,DAL,81.0,75.0,2784.0,20.5,0.535,0.048,0.371,7.3,12.5,9.8,18.2,1.3,0.5,12.4,30.0,5.9,1.0,6.9,0.118,...,-2.0,1.7,2.6,767.0,1589.0,0.483,16.0,76.0,0.211,751.0,1513.0,0.496,0.488,429.0,589.0,0.728,191.0,317.0,508.0,332.0,80.0,26.0,261.0,247.0,1979.0


### Replace international letters

In [27]:
stat['Player'] = stat['Player'].apply(lambda x: unidecode(x))
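To illustrate what this transliteration step does, here is a rough standard-library approximation. It is not how `unidecode` works internally; `strip_accents` is a hypothetical helper that only removes combining accents after Unicode decomposition.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical stdlib stand-in: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

names = ['Ante Žižić', 'Igor Rakočević', 'Jan Veselý']
plain = [strip_accents(name) for name in names]

# Letters without a Unicode decomposition, such as 'đ', pass through unchanged
unchanged = strip_accents('Dino Rađa')
```

The last line shows the limit of this approach: `unidecode` also maps such letters to ASCII, which is why the notebook relies on it.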

## Import the list of all time international players

Source: https://en.wikipedia.org/wiki/List_of_foreign_NBA_players

In [28]:
intp = pd.read_csv('international.csv')

In [29]:
intp.shape
intp.dtypes

(698, 8)

Nationality[A]      object
Birthplace[B]       object
Player              object
Pos.                object
Career[C]           object
Yrs                float64
Notes               object
Ref.                object
dtype: object

In [30]:
intp.sample(10)

Unnamed: 0,Nationality[A],Birthplace[B],Player,Pos.,Career[C],Yrs,Notes,Ref.
520,,(now Serbia),,,,,,
573,,(now Slovenia),,,2009–2010,,,
386,Lithuania,Soviet Union,Mindaugas Kuzminskas,F,2016–2017,2.0,"Born in the Soviet Union,[F] represents Lithua...",[397]
455,Philippines,United States,Andray Blatche,C/F,2005–2014,9.0,"Born in the United States, became a naturalize...",[462]
402,,,,,2002,,,
582,,(now Slovenia),,,,,,
27,Australia,—,Andrew Gaze,G,1994; 1999,2.0,—,[38]
320,,(now Georgia),,,,,,
196,Dominican Republic,—,Luis Flores,G,2004–2005,1.0,—,[204]
314,Greece,United States,Kosta Koufos*,C,2008–present,11.0,"Born in the United States to Greek parents, re...",[324]


### Cleaning

Rename the columns, drop two of them, and drop all rows where Player or Nationality is NaN

In [31]:
intp.columns = ['Nationality', 'Birthplace', 'Player', 'Pos', 'Career', 'Years', 'Notes', 'Ref']
intp = intp.drop(['Notes', 'Ref'], axis=1)
intp = intp.dropna(subset=['Player', 'Nationality'])

Drop players with US nationality or born in the USA


In [32]:
intp = intp.loc[intp.Birthplace != ' United States ']
intp = intp.loc[intp.Nationality != ' United States ']
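The comparison above depends on the exact spaces around `' United States '`. A slightly more forgiving variant strips whitespace first and applies both conditions in one mask. This is a sketch on a made-up frame, assuming the raw values carry surrounding spaces as the cell above implies; the last row is a hypothetical entry.

```python
import pandas as pd

toy = pd.DataFrame({
    'Nationality': [' Greece ', ' Spain ', ' United States ', ' Czech Republic '],
    'Birthplace': [' United States ', ' SFR Yugoslavia ', ' Canada ', ' Czechoslovakia '],
    'Player': ['Kosta Koufos', 'Nikola Mirotic', 'Hypothetical Player', 'Jan Vesely'],
})

place_cols = ['Nationality', 'Birthplace']

# Strip surrounding whitespace so the comparison does not depend on it
stripped = toy[place_cols].apply(lambda col: col.str.strip())

# True where either column says United States
is_us = stripped.eq('United States').any(axis=1)

non_us = toy.loc[~is_us].reset_index(drop=True)
```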

In [33]:
intp.shape
intp.index = pd.RangeIndex(len(intp))
intp.sample(10)

(420, 6)

Unnamed: 0,Nationality,Birthplace,Player,Pos,Career,Years
62,Brazil,—,Marcus Vinicius,F,2006–2008,2.0
181,France,—,Guerschon Yabusele*,F,2017–present,2.0
351,Serbia,SFR Yugoslavia,Igor Rakočević,G,2002–2003,1.0
63,Bulgaria,—,Georgi Glouchkov,F,1985–1986,1.0
135,Czech Republic,Czechoslovakia,Jan Veselý,F,2011–2014,3.0
365,Slovenia,West Germany,Anthony Randolph,F,2008–2014,6.0
279,Netherlands,—,Swen Nater,C,1976–1984,8.0
386,Spain,—,Ricky Rubio*,G,2011–present,8.0
291,Nigeria,—,Michael Olowokandi,C,1998–2007,9.0
383,Spain,SFR Yugoslavia,Nikola Mirotić*,F,2014–present,5.0


### Check which names match those in the stats data frame

In [34]:
intp.loc[intp.Player.isin(stat.Player)]

Unnamed: 0,Nationality,Birthplace,Player,Pos,Career,Years
156,France,—,Tariq Abdul-Wahad,F,1997–2003,6.0


Only one player matches. Apparently the names in the Wikipedia list have a trailing space and some additional markers (`*`, `^`). Let's get rid of them

In [35]:
# regex=True is required for the character class in newer pandas versions
intp['Player'] = intp['Player'].str.replace('[*^]', '', regex=True)
# intp['Player'] = intp['Player'].str.replace('^', '')
intp['Player'] = np.where(intp.Player.str.endswith(' '), intp.Player.str[:-1], intp.Player)
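The same cleanup can be written as one chained expression. The sketch below uses a made-up series; the entry with `^` is hypothetical, and `str.strip()` removes whitespace on both sides, which is a slightly broader rule than cutting a single trailing space.

```python
import pandas as pd

# Made-up names carrying the markers and trailing spaces described above
names = pd.Series(['Ricky Rubio* ', 'Kosta Koufos*', 'Some Player^ ', 'Swen Nater '])

# Remove the '*' and '^' markers, then trim surrounding whitespace
cleaned = names.str.replace(r'[*^]', '', regex=True).str.strip()
```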

### Decode international letters

In [36]:
intp['Player'] = intp['Player'].apply(lambda x: unidecode(x))

In [37]:
intp.Player.isin(stat.Player).value_counts()


True     395
False     25
Name: Player, dtype: int64

Check which players still don't match those in the stats data frame and played more than two seasons

In [38]:
# For myself: Never forget parentheses when using multiple conditions
intp.loc[(~intp.Player.isin(stat.Player)) & (intp.Years > 2.0)]

Unnamed: 0,Nationality,Birthplace,Player,Pos,Career,Years
41,Belgium,Zaire,Didier Mbenga,C,2004–2011,7.0
56,Brazil,—,Nene,F/C,2002–present,17.0
123,Croatia,SFR Yugoslavia,Dino Rada,F/C,1993–1997,4.0
201,Germany,West Germany,Christian Welp,C,1987–1990,3.0
301,Puerto Rico,—,Jose Juan Barea,G,2006–present,13.0
313,Russia,Soviet Union,Victor Khryapa,F,2004–2008,4.0
364,Slovenia,SFR Yugoslavia,Radoslav Nesterovic,C,1998–2010,12.0
409,Ukraine,Soviet Union,Slava Medvedenko,F,2000–2007,7.0
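The parentheses matter because `&` binds more tightly than comparison operators in Python, so `a & b > 2.0` is read as `(a & b) > 2.0`. A minimal sketch on a made-up frame, building the mask step by step:

```python
import pandas as pd

toy = pd.DataFrame({
    'Player': ['Nene', 'Ricky Rubio', 'Luis Flores'],
    'Years': [17.0, 8.0, 1.0],
})
known = ['Ricky Rubio']  # stand-in for the names found in the stats frame

# Each condition gets its own parentheses before combining with &
not_matched = ~toy['Player'].isin(known)
long_career = toy['Years'] > 2.0
mask = (not_matched) & (long_career)

unmatched = toy.loc[mask, 'Player'].tolist()
```

Naming the intermediate conditions, as done here, is another way to avoid the precedence trap.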


Fixing those names manually

<span style="color:red">Question: Can I do something like this or is it just bad practice? </span>

In [39]:
intp.loc[intp.Player == 'Nene', 'Player'] = 'Nene Hilario'
intp.loc[intp.Player == 'Radoslav Nesterovic', 'Player'] = 'Rasho Nesterovic'
intp.loc[intp.Player == 'Jose Juan Barea', 'Player'] = 'J.J. Barea'
intp.loc[intp.Player == 'Didier Mbenga', 'Player'] = 'Didier Ilunga-Mbenga'
intp.loc[intp.Player == 'Slava Medvedenko', 'Player'] = 'Stanislav Medvedenko'
intp.loc[intp.Player == 'Jakob Poltl', 'Player'] = 'Jakob Poeltl'
intp.loc[intp.Player == 'Victor Khryapa', 'Player'] = 'Viktor Khryapa'
intp.loc[intp.Player == 'Dino Rada', 'Player'] = 'Dino Radja'
intp.loc[intp.Player == 'Luc Mbah a Moute', 'Player'] = 'Luc Mbah'
intp.loc[intp.Player == 'Christian Welp', 'Player'] = 'Chris Welp'

intp.Player.isin(stat.Player).value_counts()
# intp.loc[~intp.Player.isin(stat.Player)].sample(20)

True     403
False     17
Name: Player, dtype: int64
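On the question above: manual fixes like these are common when two sources spell names differently. One way to keep them reviewable is a single mapping applied with `Series.replace`. The sketch below uses three of the pairs from the cell above on a made-up frame; `name_fixes` is a hypothetical variable name.

```python
import pandas as pd

# Wikipedia spelling -> spelling used in the stats frame
name_fixes = {
    'Nene': 'Nene Hilario',
    'Jose Juan Barea': 'J.J. Barea',
    'Dino Rada': 'Dino Radja',
}

toy = pd.DataFrame({'Player': ['Nene', 'Dino Rada', 'Ricky Rubio']})

# Exact-value replacement; names not in the mapping are left untouched
toy['Player'] = toy['Player'].replace(name_fixes)
```

Keeping all corrections in one dictionary makes it easy to see at a glance which names were changed and why.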

# Results
Show graphs and stats here

In [40]:
stati = stat.loc[stat.Player.isin(intp.Player)]
stati.shape
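# numeric_only=True keeps text columns (Player, Pos, Tm) out of the sums
stat_by_year = stat.groupby(stat.Year).sum(numeric_only=True)
stati_by_year = stati.groupby(stati.Year).sum(numeric_only=True)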

(2287, 51)
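One caveat for the yearly sums: for players who changed teams mid-season, the data has one row per team plus a combined row with `Tm == 'TOT'` (see the Alaa Abdelnaby sample row above), so summing every row would count those players twice. The sketch below shows one possible way to keep only the combined row. The frame is made up: the per-team points come from the 2018 table above, while the `TOT` row is constructed here as their sum.

```python
import pandas as pd

toy = pd.DataFrame({
    'Year': [2018, 2018, 2018, 2018],
    'Player': ['Tyler Zeller', 'Tyler Zeller', 'Tyler Zeller', 'Paul Zipser'],
    'Tm': ['TOT', 'BRK', 'MIL', 'CHI'],
    'PTS': [441, 300, 141, 218],  # TOT row constructed as 300 + 141
})

is_tot = toy['Tm'].eq('TOT')

# True for every row of a player-season that has a TOT row
has_tot = is_tot.groupby([toy['Year'], toy['Player']]).transform('any')

# Keep the TOT row where one exists, otherwise keep the single team row
deduped = toy[is_tot | ~has_tot]

naive_total = toy['PTS'].sum()
deduped_total = deduped['PTS'].sum()
```

Whether this filter is needed depends on how the totals are meant to be read, so treat it as an option to evaluate rather than a required step.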

In [41]:
stati_by_year['PTS'].iplot(kind='line', xTitle='Year', yTitle='Points total')

In [42]:
trace1 = go.Scatter(
    x=stat_by_year.index,
    y=stat_by_year.PTS,
    name='Totals'
)
trace2 = go.Scatter(
    x=stati_by_year.index,
    y=stati_by_year.PTS,
    name='Non-US'
)

data = [trace1, trace2]
layout = go.Layout(
    title=go.layout.Title(
        text='Total points per year',
        xref='paper',
    ),
    xaxis=go.layout.XAxis(
        title=go.layout.xaxis.Title(
            text='Year',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#5f5f5f'
            )
        )
    ),
    yaxis=go.layout.YAxis(
        title=go.layout.yaxis.Title(
            text='Total Points',
            font=dict(
                family='Courier New, monospace',
                size=18,
                color='#5f5f5f'
            )
        )
    )
)
fig = go.Figure(data=data, layout=layout)
py.offline.iplot(fig)

In [43]:
stati_by_year

Year,G,GS,MP,PER,TS%,3PAr,FTr,ORB%,DRB%,TRB%,AST%,STL%,BLK%,TOV%,USG%,OWS,DWS,WS,WS/48,OBPM,DBPM,BPM,VORP,FG,FGA,FG%,3P,3PA,3P%,2P,2PA,2P%,eFG%,FT,FTA,FT%,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS
1950.0,42.0,0.0,0.0,0.0,0.466,0.000,0.359,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.9,3.9,0.000,0.0,0.0,0.0,0.0,164.0,390.0,0.421,0.0,0.0,0.000,164.0,390.0,0.421,0.421,93.0,140.0,0.664,0.0,0.0,0.0,78.0,0.0,0.0,0.0,126.0,421.0
1951.0,44.0,0.0,0.0,0.0,0.446,0.000,0.289,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.5,0.6,2.1,0.000,0.0,0.0,0.0,0.0,135.0,336.0,0.402,0.0,0.0,0.000,135.0,336.0,0.402,0.402,68.0,97.0,0.701,0.0,0.0,195.0,121.0,0.0,0.0,0.0,144.0,338.0
1952.0,57.0,0.0,1507.0,16.6,0.497,0.000,0.350,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.7,1.2,4.8,0.154,0.0,0.0,0.0,0.0,200.0,457.0,0.438,0.0,0.0,0.000,200.0,457.0,0.438,0.438,124.0,160.0,0.775,0.0,0.0,264.0,164.0,0.0,0.0,0.0,188.0,524.0
1953.0,61.0,0.0,1745.0,18.8,0.499,0.000,0.390,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.7,2.3,7.1,0.194,0.0,0.0,0.0,0.0,272.0,625.0,0.435,0.0,0.0,0.000,272.0,625.0,0.435,0.435,187.0,244.0,0.766,0.0,0.0,342.0,144.0,0.0,0.0,0.0,242.0,731.0
1954.0,155.0,0.0,4211.0,55.1,1.740,0.000,1.786,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.2,1.5,5.8,0.256,0.0,0.0,0.0,0.0,455.0,1227.0,1.417,0.0,0.0,0.000,455.0,1227.0,1.417,1.417,405.0,563.0,2.992,0.0,0.0,770.0,275.0,0.0,0.0,0.0,456.0,1315.0
1955.0,64.0,0.0,1326.0,0.0,0.456,0.000,0.472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,148.0,386.0,0.383,0.0,0.0,0.000,148.0,386.0,0.383,0.383,129.0,182.0,0.709,0.0,0.0,297.0,86.0,0.0,0.0,0.0,180.0,425.0
1956.0,157.0,0.0,2452.0,40.5,1.766,0.000,1.636,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.2,2.9,5.2,-0.126,0.0,0.0,0.0,0.0,331.0,904.0,1.498,0.0,0.0,0.000,331.0,904.0,1.498,1.498,339.0,475.0,3.543,0.0,0.0,515.0,259.0,0.0,0.0,0.0,248.0,1001.0
1957.0,60.0,0.0,1592.0,17.6,0.489,0.000,0.400,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.3,1.2,4.5,0.134,0.0,0.0,0.0,0.0,253.0,585.0,0.432,0.0,0.0,0.000,253.0,585.0,0.432,0.432,167.0,234.0,0.714,0.0,0.0,401.0,113.0,0.0,0.0,0.0,118.0,673.0
1958.0,17.0,0.0,302.0,11.5,0.410,0.000,0.314,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.031,0.0,0.0,0.0,0.0,49.0,137.0,0.358,0.0,0.0,0.000,49.0,137.0,0.358,0.358,30.0,43.0,0.698,0.0,0.0,65.0,19.0,0.0,0.0,0.0,36.0,128.0
1974.0,59.0,0.0,693.0,10.2,0.521,0.000,0.427,3.7,5.6,4.7,9.6,1.9,0.7,0.0,0.0,0.3,0.3,0.6,0.045,-2.6,-1.4,-3.9,-0.3,88.0,185.0,0.476,0.0,0.0,0.000,88.0,185.0,0.476,0.476,53.0,79.0,0.671,25.0,40.0,65.0,54.0,31.0,10.0,0.0,91.0,229.0
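Raw totals grow with the number of teams and games, so a share of the league total may describe the impact more directly than absolute points. A minimal sketch on made-up numbers, assuming a list of international player names is available (`intl_players` is hypothetical here):

```python
import pandas as pd

# Made-up season rows for illustration
stat_toy = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001],
    'Player': ['Player A', 'Player B', 'Player A', 'Player C'],
    'PTS': [100, 300, 200, 200],
})
intl_players = ['Player B', 'Player C']

# League-wide points per year
total = stat_toy.groupby('Year')['PTS'].sum()

# Points per year scored by international players only
intl = stat_toy[stat_toy['Player'].isin(intl_players)].groupby('Year')['PTS'].sum()

# Share of all points; years without international players become 0
share = (intl / total).fillna(0)
```

In the notebook, the equivalent would be `stati_by_year.PTS / stat_by_year.PTS`, which could then be plotted the same way as the totals.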


# Conclusions and Next Steps
Summarize findings here