## How to empower coaches to make future decisions based on present data
* Do players really "get hot"?
	* Does making one shot make it more or less likely that they will make the next?
		* Is the above true for *all* players?
		* Are some players inherently/apparently "streaky"?
	* Which metrics should we look at to answer this?
		* 3 PT
        
## And 1 and clutchness


Does making a shot make it more or less likely that a player makes the next one? Or does it not matter?

Everything averages out in the end, right?
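
One way to frame the question above is to compare a player's make rate right after a make with the make rate right after a miss. The snippet below is a minimal sketch of that comparison, not a finished analysis: the function name `conditional_make_rates` and the shot sequence are hypothetical, and it ignores game boundaries and shot difficulty.

```python
import pandas as pd

def conditional_make_rates(shots):
    """Return the overall make rate plus the rate after a make and after a miss.

    `shots` is an ordered sequence of 1 (make) and 0 (miss) for one player.
    """
    s = pd.Series(list(shots), dtype="int64")
    prev = s.shift(1)  # outcome of the previous attempt (NaN for the first shot)
    return {
        "overall": s.mean(),
        "after_make": s[prev == 1].mean(),
        "after_miss": s[prev == 0].mean(),
    }

# hypothetical shot sequence, not real data
example = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
rates = conditional_make_rates(example)
print(rates)
```

If "getting hot" were real for this player, `after_make` would sit noticeably above `after_miss` over a large sample; with only a handful of shots, as here, the gap is mostly noise.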

Getting play-by-play data
GitHub: https://github.com/dblackrun/pbpstats

Documentation: https://pbpstats.readthedocs.io/en/latest/quickstart.html



In [None]:
#if you want to use the API to gather this data, install the required module
#pip install pbpstats

I decided to download the data instead

Made shot vs not made



How to measure a player's streakiness
* Do streaky players have large slumps?
* Do non-streaky players make at least one shot a game?
* How far off are they from the mean (in standard deviations)?
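
The three questions above could each be turned into a small measurement. The sketch below is one possible reading, under loud assumptions: the helper names, the column names `3P` and `3PA`, and the game log are all hypothetical, and games with zero attempts are not handled.

```python
import pandas as pd

def longest_miss_streak(shots):
    """Length of the longest run of consecutive misses (0s) in an ordered shot list."""
    longest = current = 0
    for made in shots:
        current = 0 if made else current + 1
        longest = max(longest, current)
    return longest

def game_z_scores(games):
    """Z-score of each game's 3P% relative to the player's own game-level mean."""
    pct = games["3P"] / games["3PA"]
    return (pct - pct.mean()) / pct.std(ddof=0)

# hypothetical game log, not real data
games = pd.DataFrame({"3P": [2, 0, 4, 2], "3PA": [5, 5, 5, 5]})

slump = longest_miss_streak([1, 0, 0, 1, 0, 0, 0, 1])  # slump length in shots
made_in_every_game = bool((games["3P"] > 0).all())     # at least one make per game?
z = game_z_scores(games)                               # distance from the mean in SDs
print(slump, made_in_every_game)
print(z)
```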

Let's just look at average 3pt percentage in any given season.

In [2]:
#import libraries
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
#scrape https://www.basketball-reference.com/leagues/NBA_stats_per_game.html
dfs = pd.read_html("https://www.basketball-reference.com/leagues/NBA_stats_per_game.html")
dfs

In [None]:
df = dfs[0]

In [None]:
print("First five rows of dataframe:\n", df.head(5))
print("\nDataframe columns (multi-index check):\n", df.columns)
print("\nDataframe object types:\n", df.dtypes)

### Clean up the dataframe

In [None]:
#drop the unnecessary multilevel columns first
df.columns=df.columns.droplevel()
df.head(5)

In [None]:
#can now get rid of unnecessary columns (rank and league)
df = df.drop(['Rk', 'Lg'], axis='columns')

#focusing on the last 20 years (the first 20 rows of the table)
df = df.head(20)

#sort the values by the now top-level column "Season"
df = df.sort_values("Season", ascending=True)

#convert columns to numeric dtypes where possible; non-numeric columns are left unchanged
df = df.apply(pd.to_numeric, errors="ignore")
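
As far as I know, `errors="ignore"` is deprecated in recent pandas releases (2.2 and later). A column-by-column conversion that should behave the same way can be sketched as follows. The helper name `to_numeric_where_possible` and the sample values are hypothetical.

```python
import pandas as pd

def to_numeric_where_possible(frame):
    """Convert each column to a numeric dtype if every value parses; otherwise keep it."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns (e.g. "Season") as they are
    return out

# hypothetical sample shaped like a few of the scraped columns, not real data
sample = pd.DataFrame({
    "Season": ["2021-22", "2022-23"],
    "3PA": ["30.5", "31.5"],
    "3P%": [".350", ".360"],
})
converted = to_numeric_where_possible(sample)
print(converted.dtypes)
```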

In [None]:
#check if changes took
print(df.head(5))
df.dtypes

In [None]:
df[["Season", "3PA", "3P", "3P%"]]

In [None]:
fig, ax1 = plt.subplots(figsize=(15, 5))

# start of first plot (bar-3PA-blue)
color = 'tab:blue'
ax1.set_xlabel('Season')
ax1.set_ylabel('3 point attempts', color=color)
ax1.bar(df['Season'], df["3PA"], color=color, width=0.8, alpha=0.6)
ax1.tick_params(axis='y', labelcolor=color)

# instantiate a second axes that shares the same x-axis
ax2 = ax1.twinx()  

# start of second plot (line-3P%-red)
color = 'tab:red'
# we already handled the x-label with ax1
# ax1.set_xlabel('Season')
ax2.set_ylabel('3 point percentage', color=color)  
ax2.plot(df['Season'], df['3P%'], color=color)
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.show()

Now that we've established a relative baseline of what to expect at the conclusion of a basketball season, let's see if we can better understand what shooting looks like throughout the course of an entire season. One way we could do that is getting the daily averages of a single NBA season.

Regardless of the source, we ultimately need to reference the game ID if we are looking to get the necessary box score information.

Thankfully for us, there's an API for that:
* https://github.com/swar/nba_api
    * https://github.com/swar/nba_api/blob/master/src/nba_api/live/nba/endpoints/boxscore.py
    * documentation: https://github.com/swar/nba_api/blob/master/docs/nba_api/live/endpoints/boxscore.md

It would be nice to tap an online resource that delivers those daily NBA averages, but there does not appear to be one. We will instead build our own.
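
Once box scores are collected, the daily average itself is a simple group-by. The sketch below assumes one row per team per game with hypothetical column names (`date`, `3P`, `3PA`) and made-up numbers. It sums makes and attempts per day before dividing, so each day is weighted by attempts rather than by game.

```python
import pandas as pd

# hypothetical per-team box score rows, not real data; column names are assumptions
box = pd.DataFrame({
    "date": ["2022-10-18", "2022-10-18", "2022-10-19", "2022-10-19"],
    "3P": [10, 14, 12, 8],
    "3PA": [30, 30, 40, 40],
})
box["date"] = pd.to_datetime(box["date"])

# total makes and attempts per calendar day, then the league-wide 3P% for that day
daily = box.groupby("date")[["3P", "3PA"]].sum()
daily["3P%"] = daily["3P"] / daily["3PA"]
print(daily)
```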

Some points worth remembering:
* The 2022–23 NBA season is the 77th season of the NBA.
* The regular season began on October 18, 2022 and ended on April 9, 2023.
* The 2023 NBA playoffs began on April 15, 2023.
* The playoffs will end with the NBA Finals in June 2023.
* 30 teams in the NBA
* Each team plays 82 games
* The NBA's Game ID is a 10-digit code, XXXYYGGGGG, where:
    * XXX refers to a season prefix
    * YY is the season year (e.g. 14 for 2014-15)
    * GGGGG refers to the game number (1-1230 for a full 30-team regular season)

In our case, if we want to look into the 2022-23 season, our NBA game IDs will span from 0022200001 to 0022201230.
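
The ID range above follows directly from the XXXYYGGGGG pattern, and the 1230 comes from 30 teams × 82 games / 2, since every game involves two teams. A minimal sketch that generates the full list is shown here. The function name is hypothetical, and the `002` prefix is taken from the example IDs in this notebook.

```python
def regular_season_game_ids(season_start_year, n_games=1230, prefix="002"):
    """Build 10-digit game IDs of the form XXXYYGGGGG."""
    yy = f"{season_start_year % 100:02d}"  # e.g. 2022 -> "22"
    return [f"{prefix}{yy}{game:05d}" for game in range(1, n_games + 1)]

game_ids = regular_season_game_ids(2022)
print(game_ids[0], game_ids[-1], len(game_ids))
```

Keeping the IDs as strings matters: converting them to integers would drop the leading zeros.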

A CSV with all the games played

In [60]:
schedule = pd.read_csv("Spreadsheets/2022-23 nba schedule.csv")

In [52]:
schedule

Unnamed: 0,Date,Start (ET),Visitor/Neutral,PTS,Home/Neutral,PTS.1,Unnamed: 6,Unnamed: 7,Attend.,Arena,Notes
0,Tue Oct 18 2022,7:30p,Philadelphia 76ers,117,Boston Celtics,126,Box Score,,19156.0,TD Garden,
1,Tue Oct 18 2022,10:00p,Los Angeles Lakers,109,Golden State Warriors,123,Box Score,,18064.0,Chase Center,
2,Wed Oct 19 2022,7:00p,Orlando Magic,109,Detroit Pistons,113,Box Score,,20190.0,Little Caesars Arena,
3,Wed Oct 19 2022,7:00p,Washington Wizards,114,Indiana Pacers,107,Box Score,,15027.0,Gainbridge Fieldhouse,
4,Wed Oct 19 2022,7:30p,Houston Rockets,107,Atlanta Hawks,117,Box Score,,17878.0,State Farm Arena,
...,...,...,...,...,...,...,...,...,...,...,...
1225,Sun Apr 9 2023,3:30p,Utah Jazz,117,Los Angeles Lakers,128,Box Score,,18997.0,Crypto.com Arena,
1226,Sun Apr 9 2023,3:30p,New Orleans Pelicans,108,Minnesota Timberwolves,113,Box Score,,18978.0,Target Center,
1227,Sun Apr 9 2023,3:30p,Memphis Grizzlies,100,Oklahoma City Thunder,115,Box Score,,16601.0,Paycom Center,
1228,Sun Apr 9 2023,3:30p,Los Angeles Clippers,119,Phoenix Suns,114,Box Score,,17071.0,Footprint Center,


In [53]:
schedule = schedule.drop(['Notes', 'Unnamed: 6', 'Unnamed: 7'], axis=1)

In [54]:
schedule.dtypes

Date                object
Start (ET)          object
Visitor/Neutral     object
PTS                  int64
Home/Neutral        object
PTS.1                int64
Attend.            float64
Arena               object
dtype: object

In [61]:
dates = schedule["Date"]

In [62]:
print(dates)
print(type(dates))
print(dates.unique())
print(len(dates.unique()))

0       Tue Oct 18 2022
1       Tue Oct 18 2022
2       Wed Oct 19 2022
3       Wed Oct 19 2022
4       Wed Oct 19 2022
             ...       
1225     Sun Apr 9 2023
1226     Sun Apr 9 2023
1227     Sun Apr 9 2023
1228     Sun Apr 9 2023
1229     Sun Apr 9 2023
Name: Date, Length: 1230, dtype: object
<class 'pandas.core.series.Series'>
['Tue Oct 18 2022' 'Wed Oct 19 2022' 'Thu Oct 20 2022' 'Fri Oct 21 2022'
 'Sat Oct 22 2022' 'Sun Oct 23 2022' 'Mon Oct 24 2022' 'Tue Oct 25 2022'
 'Wed Oct 26 2022' 'Thu Oct 27 2022' 'Fri Oct 28 2022' 'Sat Oct 29 2022'
 'Sun Oct 30 2022' 'Mon Oct 31 2022' 'Tue Nov 1 2022' 'Wed Nov 2 2022'
 'Thu Nov 3 2022' 'Fri Nov 4 2022' 'Sat Nov 5 2022' 'Sun Nov 6 2022'
 'Mon Nov 7 2022' 'Wed Nov 9 2022' 'Thu Nov 10 2022' 'Fri Nov 11 2022'
 'Sat Nov 12 2022' 'Sun Nov 13 2022' 'Mon Nov 14 2022' 'Tue Nov 15 2022'
 'Wed Nov 16 2022' 'Thu Nov 17 2022' 'Fri Nov 18 2022' 'Sat Nov 19 2022'
 'Sun Nov 20 2022' 'Mon Nov 21 2022' 'Tue Nov 22 2022' 'Wed Nov 23 2022'
 'Fri Nov 2

In [68]:
#parse the date strings in one vectorized call and drop the time component
test = pd.to_datetime(dates).dt.normalize()

In [69]:
test

0      2022-10-18
1      2022-10-18
2      2022-10-19
3      2022-10-19
4      2022-10-19
          ...    
1225   2023-04-09
1226   2023-04-09
1227   2023-04-09
1228   2023-04-09
1229   2023-04-09
Name: Date, Length: 1230, dtype: datetime64[ns]
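
With the dates parsed, counting games per day is a natural next step toward daily averages. The sketch below uses a tiny sample in the same text format as the schedule's `Date` column. The explicit `format` string is an assumption about that column, and the sample is illustrative only.

```python
import pandas as pd

# small illustrative sample in the schedule's date format, not the full schedule
sample_dates = pd.Series([
    "Tue Oct 18 2022",
    "Tue Oct 18 2022",
    "Wed Oct 19 2022",
    "Sun Apr 9 2023",
])

# an explicit format avoids relying on format inference
parsed = pd.to_datetime(sample_dates, format="%a %b %d %Y").dt.normalize()

# number of games on each calendar day, in chronological order
games_per_day = parsed.value_counts().sort_index()
print(games_per_day)
```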

In [None]:
#content