# Data Science Workshop
# NBA Free Throws Prediction

![title](img/free_throw_img.jpg)

## Data preparation and cleaning

## Import Python Libraries

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

## Reading the Original Dataset


In [2]:
free_throws_db = pd.read_csv('free_throws.csv')
# drop_duplicates() returns a new DataFrame, so assign the result back
free_throws_db = free_throws_db.drop_duplicates()
free_throws_db.head(10)

Unnamed: 0,end_result,game,game_id,period,play,player,playoffs,score,season,shot_made,time
0,106 - 114,PHX - LAL,261031013,1,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,0 - 1,2006 - 2007,1,11:45
1,106 - 114,PHX - LAL,261031013,1,Andrew Bynum makes free throw 2 of 2,Andrew Bynum,regular,0 - 2,2006 - 2007,1,11:45
2,106 - 114,PHX - LAL,261031013,1,Andrew Bynum makes free throw 1 of 2,Andrew Bynum,regular,18-Dec,2006 - 2007,1,7:26
3,106 - 114,PHX - LAL,261031013,1,Andrew Bynum misses free throw 2 of 2,Andrew Bynum,regular,18-Dec,2006 - 2007,0,7:26
4,106 - 114,PHX - LAL,261031013,1,Shawn Marion makes free throw 1 of 1,Shawn Marion,regular,21-Dec,2006 - 2007,1,7:18
5,106 - 114,PHX - LAL,261031013,1,Amare Stoudemire makes free throw 1 of 2,Amare Stoudemire,regular,33 - 20,2006 - 2007,1,3:15
6,106 - 114,PHX - LAL,261031013,1,Amare Stoudemire makes free throw 2 of 2,Amare Stoudemire,regular,34 - 20,2006 - 2007,1,3:15
7,106 - 114,PHX - LAL,261031013,2,Leandro Barbosa misses free throw 1 of 2,Leandro Barbosa,regular,43 - 29,2006 - 2007,0,10:52
8,106 - 114,PHX - LAL,261031013,2,Leandro Barbosa makes free throw 2 of 2,Leandro Barbosa,regular,44 - 29,2006 - 2007,1,10:52
9,106 - 114,PHX - LAL,261031013,2,Lamar Odom makes free throw 1 of 2,Lamar Odom,regular,44 - 30,2006 - 2007,1,10:37


Description of the dataset:
- end_result: final score of the game (host total score - guest total score)
- game: the two teams (host team vs. guest team)
- game_id: unique ID of the game
- period: period (quarter) in which the free throw was taken
- play: text description of the play: who took the free throw and whether it was made or missed
- player: player name
- playoffs: whether the game was a playoff game or a regular-season game
- score: host team score - guest team score at that moment
- season: NBA season
- shot_made: whether the free throw was made (1) or missed (0)
- time: time left in the period
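The `score` and `time` columns are stored as text, so they usually need to be converted to numbers before they can serve as features. The following is a minimal sketch of one way to do that, not necessarily the approach used later in this workshop. The column names `score_first`, `score_second`, and `seconds_left` are illustrative assumptions, and the small `sample` frame stands in for `free_throws_db`. Some `score` values in the preview above (for example `18-Dec`) do not follow the `N - M` pattern; the sketch leaves those as missing values rather than guessing them.

```python
import pandas as pd

# Small stand-in for free_throws_db, using values in the formats shown above.
sample = pd.DataFrame({
    "score": ["0 - 1", "18-Dec", "33 - 20"],
    "time": ["11:45", "7:26", "3:15"],
})

# Extract the two numbers from "N - M"; rows that do not match become NaN.
parts = sample["score"].str.extract(r"^(\d+)\s*-\s*(\d+)$")
sample["score_first"] = pd.to_numeric(parts[0], errors="coerce")    # hypothetical column name
sample["score_second"] = pd.to_numeric(parts[1], errors="coerce")   # hypothetical column name

# Convert "MM:SS" into the number of seconds left in the period.
clock = sample["time"].str.split(":", expand=True).astype(int)
sample["seconds_left"] = clock[0] * 60 + clock[1]                   # hypothetical column name

print(sample)
```

Keeping unparseable scores as `NaN` makes it easy to count them afterwards with `sample["score_first"].isna().sum()` and decide how to handle them.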

In [3]:
print("Number of free throws in database: %d"%(free_throws_db.shape[0]))
print("Number of games in database: {}".format(free_throws_db.game_id.unique().size))
print("Free throws by game type:")
free_throws_db['playoffs'].value_counts()

Number of free throws in database: 618019
Number of games in database: 12874
Free throws by game type:


regular     575893
playoffs     42126
Name: playoffs, dtype: int64
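Note that `value_counts()` on `playoffs` counts rows, that is, free throws: the two figures above sum to 618019, not to the 12874 games. To count games per type instead, one option is to count distinct `game_id` values within each group. The sketch below uses a small made-up `sample` frame in place of `free_throws_db`; its values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for free_throws_db: six free throws spread over three games.
sample = pd.DataFrame({
    "game_id": [1, 1, 1, 2, 2, 3],
    "playoffs": ["regular", "regular", "regular", "regular", "regular", "playoffs"],
})

# Rows (free throws) per game type, as in the cell above.
throws_per_type = sample["playoffs"].value_counts()

# Distinct games per game type.
games_per_type = sample.groupby("playoffs")["game_id"].nunique()

print(throws_per_type)
print(games_per_type)
```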

## Collecting more data from internet

To expand our dataset, we used the open-source Python library PandasBasketball together with a web scraper to collect additional player statistics from https://www.basketball-reference.com.

In [10]:
from tools import get_player_stats
import warnings
# BeautifulSoup is imported as the `bs4` package, so filter on that module name
warnings.filterwarnings("ignore", category=UserWarning, module='bs4')
dataFrame = get_player_stats("Lebron James")
print(dataFrame.columns)
dataFrame.head(20)


Index(['Season', 'Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%',
       '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%',
       'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS', 'Height',
       'Weight', 'ShootingHand', 'draftRank'],
      dtype='object')


Unnamed: 0,Season,Age,Tm,Lg,Pos,G,GS,MP,FG,FGA,...,AST,STL,BLK,TOV,PF,PTS,Height,Weight,ShootingHand,draftRank
0,2003-04,19.0,CLE,NBA,SG,79,79,3122,622,1492,...,465,130,58,273,149,1654,203,113,Right,1
1,2004-05,20.0,CLE,NBA,SF,80,80,3388,795,1684,...,577,177,52,262,146,2175,203,113,Right,1
2,2005-06,21.0,CLE,NBA,SF,79,79,3361,875,1823,...,521,123,66,260,181,2478,203,113,Right,1
3,2006-07,22.0,CLE,NBA,SF,78,78,3190,772,1621,...,470,125,55,250,171,2132,203,113,Right,1
4,2007-08,23.0,CLE,NBA,SF,75,74,3027,794,1642,...,539,138,81,255,165,2250,203,113,Right,1
5,2008-09,24.0,CLE,NBA,SF,81,81,3054,789,1613,...,587,137,93,241,139,2304,203,113,Right,1
6,2009-10,25.0,CLE,NBA,SF,76,76,2966,768,1528,...,651,125,77,261,119,2258,203,113,Right,1
7,2010-11,26.0,MIA,NBA,SF,79,79,3063,758,1485,...,554,124,50,284,163,2111,203,113,Right,1
8,2011-12,27.0,MIA,NBA,SF,62,62,2326,621,1169,...,387,115,50,213,96,1683,203,113,Right,1
9,2012-13,28.0,MIA,NBA,PF,76,76,2877,765,1354,...,551,129,67,226,110,2036,203,113,Right,1


Columns used from this data for each player:
- Position (`Pos`): the player's most common position across his seasons.
- FG%
- 3P%
- FT%
- Height
- Weight
- ShootingHand
- draftRank
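Since the scraped table has one row per season, it has to be reduced to a single row per player before it can be joined to the free-throw data. The sketch below shows one possible reduction, under stated assumptions: the helper `summarize_player` is hypothetical (it is not part of `tools` or PandasBasketball), the position is taken as the mode of `Pos`, and the career `FG%` is computed as total makes over total attempts, which may differ from how the percentages were actually aggregated here. The `seasons` frame reuses the first three seasons from the output above.

```python
import pandas as pd

# First three seasons from the table above (only the columns needed here).
seasons = pd.DataFrame({
    "Pos": ["SG", "SF", "SF"],
    "FG": [622, 795, 875],
    "FGA": [1492, 1684, 1823],
    "Height": [203, 203, 203],
    "Weight": [113, 113, 113],
    "ShootingHand": ["Right", "Right", "Right"],
    "draftRank": [1, 1, 1],
})

def summarize_player(seasons: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: collapse per-season rows into one row per player."""
    return pd.Series({
        # Most common position; on a tie, mode() sorts and the first value is kept.
        "Position": seasons["Pos"].mode().iloc[0],
        # Assumption: attempts-weighted career percentage (makes / attempts).
        "FG%": seasons["FG"].sum() / seasons["FGA"].sum(),
        # These attributes are constant across seasons in the scraped table.
        "Height": seasons["Height"].iloc[0],
        "Weight": seasons["Weight"].iloc[0],
        "ShootingHand": seasons["ShootingHand"].iloc[0],
        "draftRank": seasons["draftRank"].iloc[0],
    })

summary = summarize_player(seasons)
print(summary)
```

The same pattern would apply to `3P%` and `FT%` using their makes and attempts columns (`3P`/`3PA` and `FT`/`FTA`).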