Like most games, the goal is to score more points than your opponent. In this case that means making your own shots and forcing your opponent to miss theirs. Ideally your opponent would not get to take a shot at all, but we do not have statistics on whether or not shots were attempted.

We're going to go through each statistic that a player can actively control, one by one. This notebook will not, on the other hand, look at the very important joint probabilities.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('../input/shot_logs.csv')

In [None]:
print('Data Shape:', data.shape)
print('Data Columns:\n', data.columns)

First we need the base stat: the chance that a shot is made, no matter the circumstances.

In [None]:
data['SHOT_RESULT'].value_counts(normalize=True)

About 45% of all shots are made. That is the number to beat.
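The baseline can also be kept in a variable rather than typed in by hand for the later plots. Here is a minimal sketch, using a small made-up frame in place of `shot_logs.csv`. The names `toy` and `baseline` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Made-up stand-in for the real shot log (assumption: same 'made'/'missed' labels).
toy = pd.DataFrame({'SHOT_RESULT': ['made', 'missed', 'missed', 'made', 'missed']})

# Fraction of shots made, no matter the circumstances.
baseline = (toy['SHOT_RESULT'] == 'made').mean()
print(f'Baseline: {baseline:.2f}')
```

On the real data the same expression should give roughly the 45% quoted above.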

## Dribbles

Let's start with something a little random: dribbles.

In [None]:
data['DRIBBLES'].value_counts()

In [None]:
dribbles_results = data[data['DRIBBLES'] < 13].groupby('DRIBBLES')['SHOT_RESULT'].value_counts(normalize=True)

In [None]:
dribbles_results.loc[:, 'made'].plot()
plt.ylabel('Chance of Success')
plt.show()

After 12 dribbles the data starts to thin out, so it's better to pool the remaining shots and look at the overall number, which is below average.

In [None]:
data[data['DRIBBLES'] >= 13]['SHOT_RESULT'].value_counts(normalize=True)

*Conclusion:* It's better to have few or no dribbles.
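The two steps above (one rate per dribble count below 13, then a single rate for everything from 13 up) can also be folded into one table by capping the dribble count. This is only a sketch on made-up data. The frame `toy` and the names `capped` and `pooled` are hypothetical, and the cut-off of 13 simply mirrors the one used above.

```python
import pandas as pd

# Made-up stand-in for the real shot log.
toy = pd.DataFrame({
    'DRIBBLES': [0, 0, 1, 1, 2, 13, 15, 20, 22],
    'SHOT_RESULT': ['made', 'missed', 'made', 'missed', 'made',
                    'missed', 'missed', 'made', 'missed'],
})

# Cap the dribble count at 13 so every shot with 13 or more dribbles lands in one bucket.
capped = toy['DRIBBLES'].clip(upper=13)

# Share of made shots per (capped) dribble count.
pooled = pd.crosstab(capped, toy['SHOT_RESULT'], normalize='index')['made']
print(pooled)
```

The last row, labelled 13, then stands for "13 or more dribbles".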

## Shot Clock

Should you attempt a shot as quickly as possible once you get hold of the ball, or should you wait for the right moment?

In [None]:
# Calculate shot clock percentages.
time_data = pd.crosstab(data['SHOT_CLOCK'], data['SHOT_RESULT'], normalize='index').loc[:, 'made']
time_data.plot()
# Create a base line.
plt.plot([max(time_data.index), min(time_data.index)], [.45, .45])
# Label and show the plot.
plt.ylabel('Chance of Making the Shot')
plt.xlabel('Shot Clock')
plt.show()

As you can see, the best chance of making a shot is with a little more than twenty seconds on the clock.

My first thought was that this had to do with breakaways, but if you look at the plot below, the average number of dribbles is much lower with twenty to twenty-five seconds on the clock than it is the rest of the time.

In [None]:
# Calculate the average number of dribbles by shot clock time.
data.groupby('SHOT_CLOCK')['DRIBBLES'].mean().plot()
# Label and show the plot.
plt.xlabel('Shot Clock')
plt.ylabel('Average Number of Dribbles')
plt.show()
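If the shot clock takes many distinct values, each point on the curves above rests on only a handful of shots. One way to smooth that out is to group the clock into wider windows before computing the rate. This is a sketch on made-up data. The 6-second window width and the `CLOCK_BIN` column are arbitrary choices for illustration, not something the dataset provides.

```python
import pandas as pd

# Made-up stand-in for the real shot log.
toy = pd.DataFrame({
    'SHOT_CLOCK': [1.2, 3.8, 6.5, 9.9, 12.4, 14.0, 17.3, 19.6, 21.1, 23.7],
    'SHOT_RESULT': ['missed', 'missed', 'made', 'missed', 'made',
                    'missed', 'made', 'missed', 'made', 'made'],
})

# Group the clock into 6-second windows: 0-6, 6-12, 12-18, 18-24.
# include_lowest keeps a reading of exactly 0 in the first window.
bins = [0, 6, 12, 18, 24]
toy['CLOCK_BIN'] = pd.cut(toy['SHOT_CLOCK'], bins=bins, include_lowest=True)

# Share of made shots per window.
binned = pd.crosstab(toy['CLOCK_BIN'], toy['SHOT_RESULT'], normalize='index')['made']
print(binned)
```

Wider windows give steadier estimates at the cost of resolution, so the width is worth experimenting with.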

## Distance from the Nearest Defender

This should be really easy. But it isn't.

In [None]:
import seaborn as sns

sns.distplot(data['CLOSE_DEF_DIST'])
plt.show()

In [None]:
# Plot the density of shots made and shots missed in comparison with the closest defender.
sns.distplot(data[data['SHOT_RESULT'] == 'made']['CLOSE_DEF_DIST'], label='made')
sns.distplot(data[data['SHOT_RESULT'] == 'missed']['CLOSE_DEF_DIST'], label='missed')
# Label and show the plot.
plt.xlabel('Distance to the Closest Defender')
plt.ylabel('Density')
plt.legend()
plt.show()

In [None]:
closest_defender_chance = pd.crosstab(data['CLOSE_DEF_DIST'], data['SHOT_RESULT'], normalize='index').loc[:,'made']
closest_defender_chance.plot()
plt.plot([max(closest_defender_chance.index), min(closest_defender_chance.index)], [.45, .45])
plt.xlabel('Distance to the Closest Defender')
plt.ylabel('Chance of Making a Shot')
plt.show()

The wild swings at the end of that graph come from the fact that a player is very rarely more than fifteen feet from a defender, so each point out there rests on only a few shots. Beyond that, the distance only really starts to matter at extremes that are very rare.
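One way to tame the tail of the graph is to drop distances that have too few shots behind them before plotting. This is a minimal sketch on made-up data. `MIN_ATTEMPTS` is a hypothetical threshold, set unrealistically low here so the toy frame stays small.

```python
import pandas as pd

# Made-up stand-in for the real shot log.
toy = pd.DataFrame({
    'CLOSE_DEF_DIST': [2.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0, 25.0],
    'SHOT_RESULT': ['made', 'missed', 'missed', 'made',
                    'made', 'missed', 'missed', 'made'],
})

MIN_ATTEMPTS = 3  # hypothetical cut-off; pick something larger for real data

# Attempts and share of made shots per defender distance.
made = toy['SHOT_RESULT'].eq('made')
by_dist = made.groupby(toy['CLOSE_DEF_DIST']).agg(attempts='count', rate='mean')

# Keep only distances with enough shots to trust the rate.
reliable = by_dist[by_dist['attempts'] >= MIN_ATTEMPTS]
print(reliable)
```

In the toy data the single shot at 25 feet would show a 100% success rate, which is the kind of spike the filter removes.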

## Shot Distance

This one should also be pretty easy, but I was quite surprised the last time.

In [None]:
shot_dist_chance = pd.crosstab(data['SHOT_DIST'], data['SHOT_RESULT'], normalize='index').loc[:, 'made']
shot_dist_chance.plot()
plt.plot([max(shot_dist_chance.index), min(shot_dist_chance.index)], [.45, .45])
plt.xlabel('Shot Distance')
plt.ylabel('Chance of Making the Shot')
plt.show()

As expected, the chance is much better closer to the net. Beyond about seven to eight feet you are going to have a below-average chance.

## Period

Should you push to make shots earlier in the game or save your energy until later in the game?

In [None]:
data['PERIOD'].value_counts()

We want to make sure that we are only including regulation and not overtime.

In [None]:
regulation_data = data[data['PERIOD'] <= 4]
regulation_data.shape

In [None]:
pd.crosstab(regulation_data['PERIOD'], regulation_data['SHOT_RESULT'], normalize='index').loc[:, 'made']

Chances are greater in the first and third periods. Both of them follow a long rest, of course.

What about the minutes within each period?

In [None]:
data['MINUTE'] = data['GAME_CLOCK'].str.split(':').str.get(0).astype(int)
for period in range(1, 5):
    period_data = data[data['PERIOD'] == period]
    pd.crosstab(period_data['MINUTE'], period_data['SHOT_RESULT'], normalize='index').loc[:, 'made'].plot(label=period)
    
plt.xlabel('Minutes Left in the Period')
plt.ylabel('Chance of Making the Shot')
plt.legend()
plt.show()


Not much of a change from minute to minute. Just don't wait until the last one.
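The minute-level grouping above throws away the seconds. If a finer view of the end of a period is wanted, the clock string can be turned into total seconds. This sketch assumes `GAME_CLOCK` is a `minutes:seconds` string, which the `split(':')` above implies. `SECONDS_LEFT` is a made-up column name and the values below are illustrative.

```python
import pandas as pd

# Made-up clock readings in the assumed 'minutes:seconds' format.
toy = pd.DataFrame({'GAME_CLOCK': ['11:45', '0:07', '5:30']})

# Split into two integer columns: minutes (0) and seconds (1).
parts = toy['GAME_CLOCK'].str.split(':', expand=True).astype(int)

# Total seconds left in the period.
toy['SECONDS_LEFT'] = parts[0] * 60 + parts[1]
print(toy)
```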

# Conclusion

Beyond those stats, the data contains no others that a player can actively control. You can, on the other hand, combine the remaining stats with the ones under a player's control to see if there are any major differences, but that is not what this analysis was interested in.