Content from https://github.com/StatsWithR
and https://www.coursera.org/specializations/statistics
data from https://github.com/StatsWithR/statsr

Hot Hands
Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which refutes the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief and showed that successive shots are independent events. This paper started a great controversy that continues to this day, as you can see by Googling hot hand basketball.

We do not expect to resolve this controversy today. However, in this lab we’ll apply one approach to answering questions like this. The goals for this lab are to (1) think about the effects of independent and dependent events, (2) learn how to simulate shooting streaks in Python, and (3) compare a simulation to actual data in order to determine whether the hot hand phenomenon appears to be real.



In [1]:
# Load packages
# The __future__ import must be the first statement in the cell; it makes /
# perform true division under Python 2 (it has no effect in Python 3)
from __future__ import division
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# Load plotting packages
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
# Load data
kobe_basket = pd.read_csv("../../data-files/kobe_basket.csv")

In [3]:
kobe_basket.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 133 entries, 0 to 132
Data columns (total 7 columns):
Unnamed: 0     133 non-null int64
vs             133 non-null object
game           133 non-null int64
quarter        133 non-null object
time           133 non-null object
description    133 non-null object
shot           133 non-null object
dtypes: int64(2), object(5)
memory usage: 7.3+ KB


In [4]:
del kobe_basket['Unnamed: 0']

In [5]:
kobe_basket[:10]

   vs   game  quarter  time  description                                        shot
0  ORL  1     1        9:47  Kobe Bryant makes 4-foot two point shot            H
1  ORL  1     1        9:07  Kobe Bryant misses jumper                          M
2  ORL  1     1        8:11  Kobe Bryant misses 7-foot jumper                   M
3  ORL  1     1        7:41  Kobe Bryant makes 16-foot jumper (Derek Fisher...  H
4  ORL  1     1        7:03  Kobe Bryant makes driving layup                    H
5  ORL  1     1        6:01  Kobe Bryant misses jumper                          M
6  ORL  1     1        4:07  Kobe Bryant misses 12-foot jumper                  M
7  ORL  1     1        0:52  Kobe Bryant misses 19-foot jumper                  M
8  ORL  1     1        0:00  Kobe Bryant misses layup                           M
9  ORL  1     2        6:35  Kobe Bryant makes jumper                           H


In [6]:
# f prints "Hit" or "Miss" for each value in a column, e.g. f(kobe_basket['shot'])
def f(column):
    for v in column:
        if v == 'H':
            print("Hit")
        else:
            print("Miss")
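Looping and printing works, but pandas can also translate every value of a column in one vectorized step. A minimal sketch using `Series.map` with a lookup dictionary; the small Series `shots` is hypothetical sample data, not the Kobe dataset:

```python
import pandas as pd

# Hypothetical sample of shot outcomes ('H' = hit, 'M' = miss)
shots = pd.Series(['H', 'M', 'M', 'H'])

# Map each code to a label in one vectorized step instead of a Python loop
labels = shots.map({'H': 'Hit', 'M': 'Miss'})
print(labels.tolist())  # ['Hit', 'Miss', 'Miss', 'Hit']
```

One design difference: `map` turns any value missing from the dictionary into NaN, whereas `f` prints "Miss" for anything that is not 'H'.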
            


Next, let us define a function that finds each shooting streak and counts how many streaks of each length occur.
For this lab, we define the length of a shooting streak to be the number of consecutive baskets made until a miss occurs.

For example, in Game 1 Kobe had the following sequence of hits and misses from his nine shot attempts in the first quarter:

H M | M | H H M | M | M | M

You can verify this by looking at the first nine rows of the data, for example with kobe_basket[:9].
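The streak definition can also be checked in code. `streak_lengths` below is a hypothetical helper (not part of the lab) that lists streak lengths in order; like `streak()` in this lab, it counts a final run of hits with no closing miss as a streak:

```python
def streak_lengths(shots):
    """List each streak length in order (hypothetical helper).

    A streak is the run of consecutive hits ended by a miss; a trailing
    run of hits with no closing miss is also counted.
    """
    lengths = []
    run = 0
    for shot in shots:
        if shot == 'H':
            run += 1
        else:              # a miss closes the current streak
            lengths.append(run)
            run = 0
    if run > 0:            # the sequence ended on a hit
        lengths.append(run)
    return lengths

# Game 1, first quarter: H M | M | H H M | M | M | M
q1 = ['H', 'M', 'M', 'H', 'H', 'M', 'M', 'M', 'M']
print(streak_lengths(q1))  # [1, 0, 2, 0, 0, 0]
```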

Did Kobe have a hot hand?

In [7]:
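def streak(column):
    """Return a dict mapping streak length -> number of streaks of that length.

    A streak is a run of consecutive hits ('H') ended by a miss ('M');
    a miss that does not follow a hit counts as a streak of length 0.
    """
    shots = list(column)  # positional access for Series (any index), arrays, lists
    streak_dict = {0: 0}
    start = 0
    while start < len(shots):
        if shots[start] == 'M':
            # A miss not preceded by a hit: streak of length 0
            streak_dict[0] += 1
            start += 1
        else:
            # Count the consecutive hits starting at this position
            ind = start + 1
            count_streak = 1
            while ind < len(shots) and shots[ind] == 'H':
                count_streak += 1
                ind += 1
            streak_dict[count_streak] = streak_dict.get(count_streak, 0) + 1
            # Skip the miss that ended this streak
            start = ind + 1
    return streak_dict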


In [8]:
streak_dict = streak(kobe_basket['shot'])

In [9]:
streak_dict

{0: 39, 1: 24, 2: 6, 3: 6, 4: 1}

Next we will simulate an independent shooter and compare his streaks with Kobe's to see whether Kobe had a hot hand. First, let us compute Kobe's shooting percentage.

In [10]:
kobe_basket['hit'] = (kobe_basket['shot'] == 'H').astype(int)

In [11]:
kobe_shooting = kobe_basket['hit'].sum()/len(kobe_basket)

In [12]:
kobe_shooting

0.43609022556390975
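As a sanity check, the streak counts already determine this shooting percentage: a streak of length k contains exactly k hits, so summing length × count gives the number of hits. A short sketch using the values printed by `streak()` above:

```python
# Streak counts printed above for Kobe: {streak length: number of streaks}
kobe_streaks = {0: 39, 1: 24, 2: 6, 3: 6, 4: 1}
n_shots = 133  # rows in kobe_basket, per kobe_basket.info()

# A streak of length k contains exactly k hits, so sum length * count
n_hits = sum(length * count for length, count in kobe_streaks.items())
shooting_pct = float(n_hits) / n_shots  # float() keeps this true division on Python 2

print(n_hits)                  # 58
print(round(shooting_pct, 4))  # 0.4361
```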

We’ve shown that Kobe had some long shooting streaks, but are they long enough to support the belief that he had hot hands? What can we compare them to?

To answer these questions, let’s return to the idea of independence. Two processes are independent if the outcome of one process doesn’t affect the outcome of the second. If each shot that a player takes is an independent process, having made or missed your first shot will not affect the probability that you will make or miss your second shot.

A shooter with a hot hand will have shots that are not independent of one another. Specifically, if the shooter makes his first shot, the hot hand model says he will have a higher probability of making his second shot.

Let’s suppose for a moment that the hot hand model is valid for Kobe. In these data, the percentage of time Kobe makes a basket (i.e. his shooting percentage) is about 44%, or in probability notation,

P(shot 1 = H)=0.44

If he makes the first shot and has a hot hand (not independent shots), then the probability that he makes his second shot would go up to, let’s say, 60%,

P(shot 2 = H|shot 1 = H)=0.60
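To make the hot hand model concrete, here is a minimal sketch of a hypothetical hot-hand shooter, assuming the numbers above: a shot is a hit with probability 0.44, except right after a hit, when the probability rises to 0.60. `simulate_hot_hand` and `hit_rate_after_hit` are illustrative helpers, not part of the lab:

```python
import numpy as np

def simulate_hot_hand(n_shots, p_base=0.44, p_after_hit=0.60, seed=None):
    """Simulate a hypothetical hot-hand shooter.

    The first shot, and any shot after a miss, is a hit with probability
    p_base; a shot right after a hit is a hit with probability p_after_hit.
    """
    rng = np.random.RandomState(seed)
    shots = []
    prev_hit = False
    for _ in range(n_shots):
        p = p_after_hit if prev_hit else p_base
        prev_hit = rng.rand() < p
        shots.append('H' if prev_hit else 'M')
    return shots

def hit_rate_after_hit(shots):
    """Fraction of hits among the shots that immediately follow a hit."""
    after_hit = [cur for prev, cur in zip(shots[:-1], shots[1:]) if prev == 'H']
    return float(after_hit.count('H')) / len(after_hit)

sim = simulate_hot_hand(100000, seed=1)
print(hit_rate_after_hit(sim))  # close to 0.60 by construction
```

Note that such a shooter also ends up with an overall hit rate above 0.44, because every hit raises the chance of the next one.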

As a result of these increased probabilities, you’d expect Kobe to have longer streaks. Compare this to the skeptical perspective, in which Kobe does not have a hot hand and each shot is independent of the next. If he hits his first shot, the probability that he makes the second is still 0.44.

P(shot 2 = H|shot 1 = H)=0.44

In other words, making the first shot did nothing to affect the probability that he’d make his second shot. If Kobe’s shots are independent, then he’d have the same probability of hitting every shot regardless of his past shots: 44%.
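Independence can be checked empirically: in a long sequence of independent shots, the hit rate among shots that follow a hit should match the overall hit rate. A sketch that simulates independent shots with `np.random.choice`, assuming p = 0.44 and an arbitrary seed:

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed so the run is reproducible
shots = rng.choice(['H', 'M'], 100000, p=[0.44, 0.56])

hit = shots == 'H'  # boolean array: True where the shot went in
p_hit = hit.mean()  # estimate of P(shot = H)
# hit[1:][hit[:-1]] keeps shot i+1 only when shot i was a hit
p_hit_after_hit = hit[1:][hit[:-1]].mean()  # estimate of P(shot i+1 = H | shot i = H)

print(round(p_hit, 3), round(p_hit_after_hit, 3))  # both near 0.44
```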

Now that we’ve phrased the situation in terms of independent shots, let’s return to the question: how do we tell if Kobe’s shooting streaks are long enough to indicate that he has hot hands? We can compare his streak lengths to someone without hot hands: an independent shooter.
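Independence also tells us what streak lengths to expect. If every shot is a hit with probability p, a streak has length exactly k when k hits are followed by a miss, so P(length = k) = p^k (1 - p), a geometric distribution. A rough sketch, assuming p = 0.44 and ignoring that the final streak in a sequence may end without a closing miss, comparing expected counts with Kobe's observed streak counts from above:

```python
# Streak counts printed above for Kobe: {streak length: number of streaks}
kobe_streaks = {0: 39, 1: 24, 2: 6, 3: 6, 4: 1}
p = 0.44  # assumed hit probability for an independent shooter

def geometric_streak_probs(p, max_len):
    """P(streak length = k) = p**k * (1 - p) for k = 0..max_len."""
    return [p ** k * (1 - p) for k in range(max_len + 1)]

n_streaks = sum(kobe_streaks.values())  # 76 streaks in total
probs = geometric_streak_probs(p, max(kobe_streaks))
for k, pk in enumerate(probs):
    # streak length, expected count for an independent shooter, Kobe's count
    print(k, round(n_streaks * pk, 1), kobe_streaks[k])
```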



In [13]:
# First, let's simulate 10 tosses of a fair coin

In [14]:
np.random.choice(["Head", "Tail"], 10, p=[0.5,0.5] )

array(['Tail', 'Head', 'Tail', 'Tail', 'Tail', 'Head', 'Tail', 'Head',
       'Tail', 'Head'], 
      dtype='|S4')
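The `p` argument of `np.random.choice` sets the probability of each outcome, in the same order as the list of outcomes, so the same call can simulate a biased coin. A sketch, with the 20/80 split and the seed chosen purely for illustration:

```python
import numpy as np

rng = np.random.RandomState(0)  # seeded so the example is reproducible
# Unfair coin: heads 20% of the time, tails 80% (p must sum to 1)
tosses = rng.choice(["Head", "Tail"], 100000, p=[0.2, 0.8])
print((tosses == "Head").mean())  # close to 0.2
```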

In [15]:
# Simulate an independent shooter with the same shooting percentage as Kobe (about 44%)
ind_shooter = np.random.choice(["H", "M"], len(kobe_basket), p=[0.44, 0.56])

In [16]:
# Calculate the simulated shooter's streak lengths
ind_shooter_streak = streak(ind_shooter)

In [17]:
ind_shooter_streak

{0: 44, 1: 13, 2: 11, 3: 5, 4: 1, 5: 1}

In [18]:
# The streak distribution looks similar to Kobe's, though this is a single random simulation
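A single simulated shooter is one random draw, so the resemblance is only a rough impression. A minimal sketch of a more systematic comparison, assuming we summarize each sequence by its longest streak: simulate many independent shooters with Kobe's 133 attempts at p = 0.44 and see how often their longest streak reaches Kobe's longest streak of 4. `longest_streak` and `simulate_longest_streaks` are hypothetical helpers, and the longest streak is just one possible summary statistic:

```python
import numpy as np

def longest_streak(shots):
    """Length of the longest run of consecutive 'H' values."""
    best = run = 0
    for shot in shots:
        run = run + 1 if shot == 'H' else 0
        best = max(best, run)
    return best

def simulate_longest_streaks(n_sims, n_shots=133, p=0.44, seed=None):
    """Longest streak for each of n_sims simulated independent shooters."""
    rng = np.random.RandomState(seed)
    return [longest_streak(rng.choice(['H', 'M'], n_shots, p=[p, 1 - p]))
            for _ in range(n_sims)]

longest = np.array(simulate_longest_streaks(1000, seed=2))
kobe_longest = 4  # Kobe's longest streak in the streak() output above
# Share of simulated independent shooters whose longest streak is at least Kobe's
print((longest >= kobe_longest).mean())
```

If that share is large, streaks as long as Kobe's are common even without a hot hand.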