# Introduction

The goal of this project is to visualise how the impact of players born outside the USA has changed over the years.

This project uses a Kaggle dataset: https://www.kaggle.com/drgilermo/nba-players-stats. It contains three .csv files. Seasons_Stats.csv contains per-season stats for every NBA player since the 1949/1950 season. The other two files contain basic information about each player, such as name, birthdate, and height.
The list of international players comes from Wikipedia: https://en.wikipedia.org/wiki/List_of_foreign_NBA_players

Important note: A lot of data is missing, because not every stat has been recorded since the first season. More details and an explanation of each stat: https://www.basketball-reference.com/about/glossary.html

### Imports
Import libraries.

In [1]:
# Data manipulation
import pandas as pd
import numpy as np

# Options for pandas
pd.options.display.max_columns = 50
pd.options.display.max_rows = 30

# Display all cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

from IPython import get_ipython
ipython = get_ipython()

# autoreload extension
if 'autoreload' not in ipython.extension_manager.loaded:
    %load_ext autoreload

%autoreload 2

# Visualizations
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

import cufflinks as cf
cf.go_offline(connected=True)
cf.set_config_file(theme='white')

# Analysis/Modeling
### Import statistics

In [2]:
stat = pd.read_csv('Seasons_Stats.csv')

### Basic info

In [3]:
stat.shape
stat.dtypes

(24691, 53)

Unnamed: 0      int64
Year          float64
Player         object
Pos            object
Age           float64
Tm             object
G             float64
GS            float64
MP            float64
PER           float64
TS%           float64
3PAr          float64
FTr           float64
ORB%          float64
DRB%          float64
               ...   
2PA           float64
2P%           float64
eFG%          float64
FT            float64
FTA           float64
FT%           float64
ORB           float64
DRB           float64
TRB           float64
AST           float64
STL           float64
BLK           float64
TOV           float64
PF            float64
PTS           float64
Length: 53, dtype: object

### Cleaning

In [4]:
stat.isna().sum()

Unnamed: 0       0
Year            67
Player          67
Pos             67
Age             75
Tm              67
G               67
GS            6458
MP             553
PER            590
TS%            153
3PAr          5852
FTr            166
ORB%          3899
DRB%          3899
              ... 
2PA             67
2P%            195
eFG%           166
FT              67
FTA             67
FT%            925
ORB           3894
DRB           3894
TRB            379
AST             67
STL           3894
BLK           3894
TOV           5046
PF              67
PTS             67
Length: 53, dtype: int64
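
To see from which season each stat starts being recorded, one option is to look up the earliest `Year` with a non-missing value per column. The sketch below uses a small made-up frame (`toy`) in place of `stat`; its values are illustrative only and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `stat` (values are made up for illustration)
toy = pd.DataFrame({
    'Year': [1950.0, 1950.0, 1974.0, 1980.0],
    'PTS':  [100.0, 200.0, 300.0, 400.0],
    'STL':  [np.nan, np.nan, 50.0, 60.0],
    '3PAr': [np.nan, np.nan, np.nan, 0.1],
})

# Earliest season with at least one recorded value, per column.
# A column that is empty everywhere would yield NaN here.
first_season = pd.Series({
    col: toy.loc[toy[col].notna(), 'Year'].min()
    for col in toy.columns if col != 'Year'
})
first_season
```

Running the same dictionary comprehension on `stat` should give a per-stat starting season, which can help decide which stats are usable for comparisons across eras.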

Drop rows where Player is NaN.

TODO: Do I have to explain it?

In [5]:
stat = stat.dropna(subset=['Player'])
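
One possible way to justify this drop is to check, before calling `dropna`, whether the rows without a player name carry any other data. The sketch below does this on a made-up frame (`toy`); the assumption that `Unnamed: 0` is just a saved row index comes from the `dtypes` output above and is not verified here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `stat` before the dropna call (values are made up)
toy = pd.DataFrame({
    'Unnamed: 0': [0, 1, 2],
    'Year': [1950.0, np.nan, 1951.0],
    'Player': ['A Player', np.nan, 'B Player'],
    'PTS': [10.0, np.nan, 20.0],
})

# Rows without a player name
empty_rows = toy[toy['Player'].isna()]

# True if every column except the index column is NaN in those rows
rows_are_blank = empty_rows.drop(columns='Unnamed: 0').isna().all().all()
rows_are_blank
```

If the same check returns `True` on `stat`, the dropped rows are blank lines and no player data is lost.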

Drop the two columns that contain no values.

TODO: Do I have to explain how I know that, or should I prove it?

In [6]:
stat = stat.drop(['blanl', 'blank2'], axis=1)
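
A short check can show which columns are entirely empty instead of relying on their names. The sketch below uses a made-up frame (`toy`) that reuses the two column names from the cell above; the other columns and all values are illustrative.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `stat` (values are made up for illustration)
toy = pd.DataFrame({
    'Player': ['A Player', 'B Player'],
    'blanl': [np.nan, np.nan],
    'PTS': [10.0, 20.0],
    'blank2': [np.nan, np.nan],
})

# Names of the columns in which every value is NaN
empty_cols = toy.columns[toy.isna().all()].tolist()

# Alternative to dropping by name: drop every all-NaN column
cleaned = toy.dropna(axis=1, how='all')
empty_cols
```

Printing `empty_cols` for `stat` before the drop would document why exactly these two columns are removed.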

## Import the list of all-time international players

Source: https://en.wikipedia.org/wiki/List_of_foreign_NBA_players

In [7]:
intp = pd.read_csv('international.csv')

In [8]:
intp.shape
intp.dtypes

(698, 8)

Nationality[A]      object
Birthplace[B]       object
Player              object
Pos.                object
Career[C]           object
Yrs                float64
Notes               object
Ref.                object
dtype: object

In [9]:
intp.sample(10)

Unnamed: 0,Nationality[A],Birthplace[B],Player,Pos.,Career[C],Yrs,Notes,Ref.
680,United States,Sweden,Miles Simon,G,1998–1999,1.0,Born in Sweden to an American father and a Nor...,[664]
239,France,—,Pape Sy,F/G,2010–2011,1.0,—,[249]
563,,(now Croatia),,,,,,
313,Greece,—,Andreas Glyniadakis,C,2006–2007,1.0,—,[322]
506,,,,,2013–2014,,,
537,Serbia,SFR Yugoslavia,Nenad Krstić,C,2004–2011,7.0,"Born in SFR Yugoslavia,[D] has represented FR ...",[534]
677,United States,Nigeria,Josh Okogie*,F,2018–present,1.0,"Born in Nigeria, became a naturalized U.S citizen",[658]
454,Panama,United States,Stuart Gray,C/F,1984–1991,7.0,Born in the Panama Canal Zone (which was contr...,[461]
428,Nigeria,—,Solomon Alabi,C,2010–2012,2.0,—,[430]
375,Lebanon,United States,Matt Freije,F,2004–2005; 2006,2.0,"Born in the United States, became a naturalize...",[390]


### Cleaning

Rename the columns, drop two of them (Notes and Ref), and drop all rows where Player or Nationality is NaN.

In [10]:
intp.columns = ['Nationality', 'Birthplace', 'Player', 'Pos', 'Career', 'Years', 'Notes', 'Ref']
intp = intp.drop(['Notes', 'Ref'], axis=1)
intp = intp.dropna(subset=['Player', 'Nationality'])

Drop players born in the USA.

TODO: Also delete those with US nationality?


In [11]:
intp = intp.loc[intp.Birthplace != ' United States ']
# intp = intp.loc[intp.Nationality != ' United States ']
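
The filter above matches only the exact string `' United States '`, including the surrounding spaces. A whitespace-insensitive variant could look like the sketch below; the frame `toy` and its values are made up, and whether every `Birthplace` value in `intp` is padded the same way is not verified here.

```python
import pandas as pd

# Toy stand-in for `intp` (values are made up for illustration)
toy = pd.DataFrame({
    'Player': ['A', 'B', 'C'],
    'Birthplace': [' United States ', 'United States', ' Nigeria '],
})

# Compare on stripped values so that padding differences do not matter
born_abroad = toy.loc[toy['Birthplace'].str.strip() != 'United States']
born_abroad
```

Rows with a missing `Birthplace` are kept by this comparison, because NaN never equals the string.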

In [12]:
intp.shape
intp.index = pd.RangeIndex(len(intp))
intp.sample(10)

(449, 6)

Unnamed: 0,Nationality,Birthplace,Player,Pos,Career,Years
388,Sweden,—,Jonas Jerebko*,F,2009–present,10.0
332,Serbia,FR Yugoslavia,Bogdan Bogdanović*,G,2017–present,2.0
121,Croatia,SFR Yugoslavia,Dražen Petrović^,G,1989–1993,4.0
349,Serbia,SFR Yugoslavia,Vladimir Radmanović,F,2001–2013,12.0
184,Georgia,—,Tornike Shengelia,F,2012–2014,2.0
214,Greece,—,Giannis Antetokounmpo*,F,2013–present,6.0
274,Montenegro,SFR Yugoslavia,Slavko Vraneš,C,2004,1.0
215,Greece,—,Kostas Antetokounmpo*,F,2018–present,1.0
381,Spain,—,Raül López,G,2002–2005,2.0
435,United States,Nigeria,Hakeem Olajuwon^,C,1984–2002,18.0


Check which names match those in the stats data frame.

In [14]:
intp.loc[intp.Player.isin(stat.Player)]

Unnamed: 0,Nationality,Birthplace,Player,Pos,Career,Years
156,France,—,Tariq Abdul-Wahad,F,1997–2003,6.0
433,United States,Manchukuo,Tom Meschery,F,1961–1971,10.0


Only two names match, so the player names need cleaning first. Let's begin by removing trailing whitespace.

In [17]:
#TODO: Have to use my data cleaning tutorial
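
A minimal sketch of what this cleaning step might look like: strip surrounding whitespace and remove the trailing `*` and `^` markers that appear in the `intp` samples above. The helper `normalise` is hypothetical, the two series are made-up stand-ins for `intp.Player` and `stat.Player`, and the idea that `stat.Player` may carry similar markers is an assumption.

```python
import pandas as pd

# Toy stand-ins for intp.Player and stat.Player (formatting is made up)
intl_names = pd.Series([' Hakeem Olajuwon^ ', 'Jonas Jerebko* ', 'Pape Sy'])
stat_names = pd.Series(['Hakeem Olajuwon*', 'Jonas Jerebko', 'Michael Jordan*'])

def normalise(names: pd.Series) -> pd.Series:
    """Strip surrounding whitespace and trailing '*' or '^' markers."""
    stripped = names.str.strip()
    return stripped.str.replace(r'[*^]+$', '', regex=True).str.strip()

# Names from the international list that also occur in the stats names
clean_intl = normalise(intl_names)
matches = clean_intl[clean_intl.isin(normalise(stat_names))]
matches
```

Applied to the real frames, the cleaned names could be stored in a new column so the original spelling stays available for comparison.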

# Results
Show graphs and stats here

# Conclusions and Next Steps
Summarize findings here