# Which jersey number is the "dunk-iest"? 

### In other words, players wearing what jersey number are responsible for a plurality of dunks?

## Step 1

### Box scores don't capture stats on dunks (at least, most don't). But we do have this information in our play-by-play data. Let's find [this dunk](https://www.youtube.com/watch?v=QqWmpUF6HHg) to get an idea of the data we'll need.
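The query in this step isolates the dunk by filtering on `elapsed_time_sec`. As a rough aid for choosing such a window, the sketch below converts elapsed seconds into a half and a game clock. It assumes (this is not confirmed here by the dataset's documentation) that the column counts seconds since tip-off and that regulation is two 20-minute halves; `game_clock` is a hypothetical helper, not part of the notebook.

```python
# Assumption: elapsed_time_sec counts seconds since tip-off and
# regulation consists of two 20-minute halves (overtime is ignored).
HALF_SECONDS = 20 * 60


def game_clock(elapsed_time_sec):
    """Return (half, 'M:SS' remaining in that half) for a regulation play."""
    if not 0 <= elapsed_time_sec <= 2 * HALF_SECONDS:
        raise ValueError("outside regulation time")
    half = 1 if elapsed_time_sec <= HALF_SECONDS else 2
    remaining = half * HALF_SECONDS - elapsed_time_sec
    minutes, seconds = divmod(remaining, 60)
    return half, f"{minutes}:{seconds:02d}"


# The bounds of the window used in the query: 2270 and 2300 seconds.
print(game_clock(2270))
print(game_clock(2300))
```

Under these assumptions, the window covers a short stretch late in the second half.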



In [0]:
from pandas.io import gbq
project_id = '[YOUR_PROJECT_ID]'

In [13]:
FGCU_vs_Georgetown_q = """
SELECT 
  scheduled_date, 
  away_market, 
  home_market, 
  elapsed_time_sec, 
  team_market, 
  player_full_name,
  event_type,
  shot_type
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE season = 2012
AND away_market = "Florida Gulf Coast"
AND home_market = "Georgetown"
AND elapsed_time_sec > 2270
AND elapsed_time_sec < 2300
-- Grouping by every selected column de-duplicates rows (same effect as SELECT DISTINCT)
GROUP BY scheduled_date, away_market, home_market, elapsed_time_sec, team_market, player_full_name, event_type, shot_type
ORDER BY elapsed_time_sec ASC
"""

FGCU_vs_Georgetown = gbq.read_gbq(query=FGCU_vs_Georgetown_q, dialect='standard', project_id=project_id)
FGCU_vs_Georgetown

Requesting query... ok.
Job ID: job_Jjb1aSwvZaP9l1ohtCmFOYTHDaXP
Query running...
Query done.
Cache hit.

Retrieving results...
Got 12 rows.

Total time taken 0.97 s.
Finished at 2018-03-07 07:52:30.


| | scheduled_date | away_market | home_market | elapsed_time_sec | team_market | player_full_name | event_type | shot_type |
|---|---|---|---|---|---|---|---|---|
| 0 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2271 | Georgetown | TRAWICK,JABRIL | MISS | 3PTR |
| 1 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2271 | Georgetown | PORTER JR.,OTTO | REBOUND | |
| 2 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2276 | Georgetown | PORTER JR.,OTTO | MISS | JUMPER |
| 3 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2276 | Georgetown | BOWEN,AARON | REBOUND | |
| 4 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2279 | Georgetown | BOWEN,AARON | GOOD | TIPIN |
| 5 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2286 | Florida Gulf Coast | FIELER,CHASE | GOOD | DUNK |
| 6 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2286 | Florida Gulf Coast | COMER,BRETT | ASSIST | |
| 7 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2292 | Florida Gulf Coast | GRAF,DAJUAN | FOUL | |
| 8 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2292 | Georgetown | BOWEN,AARON | GOOD | FT |
| 9 | 2013-03-22T00:00:00 | Florida Gulf Coast | Georgetown | 2292 | Florida Gulf Coast | MURRAY,EDDIE | SUB | |


## Step 2

### To find dunks, it looks like we'll need plays where event_type = "GOOD" and shot_type = "DUNK". So, which jersey number gets the most dunks?
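To make the filter concrete, here is a minimal sketch in plain Python that applies the same condition to a few rows copied from the Step 1 output. The helper name `is_made_dunk` is hypothetical and used only for illustration; the actual counting in this notebook is done in SQL.

```python
# A few (player_full_name, event_type, shot_type) rows copied from the Step 1 output.
plays = [
    ("BOWEN,AARON", "GOOD", "TIPIN"),
    ("FIELER,CHASE", "GOOD", "DUNK"),
    ("COMER,BRETT", "ASSIST", None),
    ("BOWEN,AARON", "GOOD", "FT"),
]


def is_made_dunk(event_type, shot_type):
    """Mirror of the SQL condition: event_type = "GOOD" AND shot_type = "DUNK"."""
    return event_type == "GOOD" and shot_type == "DUNK"


made_dunks = [player for player, event, shot in plays if is_made_dunk(event, shot)]
print(made_dunks)  # ['FIELER,CHASE']
```

Both conditions are needed: `event_type = "GOOD"` alone would also match tip-ins and free throws, and `shot_type = "DUNK"` alone would also match missed dunks.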


In [14]:
dunks_made_q = """
SELECT 
  SAFE_CAST(jersey_num AS INT64) as jersey,
  COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") AS dunks_made
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE home_division_alias = "D1"
AND away_division_alias = "D1"
AND SAFE_CAST(jersey_num AS INT64) IS NOT NULL
GROUP BY jersey
ORDER BY dunks_made DESC
"""

dunks_made = gbq.read_gbq(query=dunks_made_q, dialect='standard', project_id=project_id)
dunks_made

Requesting query... ok.
Job ID: job_bNOq-XdRmiNxu6qjkOqbodpCZDgd
Query running...
Query done.
Cache hit.

Retrieving results...
Got 39 rows.

Total time taken 1.18 s.
Finished at 2018-03-07 07:52:32.


| | jersey | dunks_made |
|---|---|---|
| 0 | 1 | 8731 |
| 1 | 5 | 8553 |
| 2 | 23 | 8486 |
| 3 | 21 | 7929 |
| 4 | 24 | 7023 |
| 5 | 0 | 6757 |
| 6 | 2 | 6598 |
| 7 | 15 | 6509 |
| 8 | 32 | 6424 |
| 9 | 4 | 6192 |


## Step 3

### Looks like number 1. Neat. But which jersey number makes the most field goals overall?

In [15]:
shots_made_q = """
SELECT 
  SAFE_CAST(jersey_num AS INT64) as jersey,
  COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") AS dunks_made,
  COUNTIF(event_type = "MISS" AND shot_type = "DUNK") AS dunks_missed,
  COUNTIF(event_type = "GOOD" AND shot_type != "FT") AS shots_made,
  COUNTIF(event_type = "MISS" AND shot_type != "FT") AS shots_missed
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE home_division_alias = "D1"
AND away_division_alias = "D1"
AND SAFE_CAST(jersey_num AS INT64) IS NOT NULL
GROUP BY jersey
ORDER BY shots_made DESC
"""

shots_made = gbq.read_gbq(query=shots_made_q, dialect='standard', project_id=project_id)
shots_made

Requesting query... ok.
Job ID: job_m7u4rl-iMQeqU-S9gUjo13P3_TPN
Query running...
Query done.
Cache hit.

Retrieving results...
Got 39 rows.

Total time taken 1.07 s.
Finished at 2018-03-07 07:52:34.


| | jersey | dunks_made | dunks_missed | shots_made | shots_missed |
|---|---|---|---|---|---|
| 0 | 1 | 8731 | 1094 | 172137 | 235518 |
| 1 | 3 | 5978 | 769 | 155691 | 218025 |
| 2 | 5 | 8553 | 996 | 147099 | 199888 |
| 3 | 2 | 6598 | 868 | 135895 | 187463 |
| 4 | 0 | 6757 | 836 | 118237 | 155991 |
| 5 | 11 | 5578 | 651 | 112392 | 154914 |
| 6 | 23 | 8486 | 1005 | 109146 | 139735 |
| 7 | 10 | 4765 | 561 | 105666 | 144776 |
| 8 | 4 | 6192 | 786 | 105635 | 139981 |
| 9 | 21 | 7929 | 964 | 98574 | 120075 |


## Step 4

### Also number 1! But these are just totals. Which jersey number converts the highest share of its dunk attempts?
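The percentage in question is made dunks divided by attempted dunks, with a guard so that a jersey number with no attempts yields 0 rather than a division error. The query in this step expresses that guard with `IF(COUNTIF(...) > 0, ..., 0)`; a minimal Python sketch of the same idea follows. `safe_pct` is a hypothetical helper, and the numbers passed to it are made up for illustration.

```python
def safe_pct(made, attempted):
    """Return made / attempted, or 0 when there were no attempts."""
    return made / attempted if attempted > 0 else 0


print(round(safe_pct(90, 100), 3))  # 0.9
print(safe_pct(0, 0))               # 0
```

The guard belongs on the denominator, since that is the value that can make the division fail.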

In [21]:
shots_pct_q = """
SELECT 
  SAFE_CAST(jersey_num AS INT64) as jersey,
  
  COUNTIF(shot_type = "DUNK") AS dunks_att,
  COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") AS dunks_made,
  IF(COUNTIF(shot_type = "DUNK")>0,
    COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") /
    COUNTIF(shot_type = "DUNK"), 0) AS dunks_made_pct,
  
  COUNTIF(shot_type != "FT") AS shots_att,
  COUNTIF(event_type = "GOOD" AND shot_type != "FT") AS shots_made,
  IF(COUNTIF(shot_type != "FT")>0,
    COUNTIF(event_type = "GOOD" AND shot_type != "FT") /
    COUNTIF(shot_type != "FT"), 0) AS shots_made_pct
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE home_division_alias = "D1"
AND away_division_alias = "D1"
AND SAFE_CAST(jersey_num AS INT64) IS NOT NULL
GROUP BY jersey
ORDER BY dunks_made_pct DESC
"""

shots_pct = gbq.read_gbq(query=shots_pct_q, dialect='standard', project_id=project_id)
shots_pct

Requesting query... ok.
Job ID: job_WquTjjwz79111AWL1UG3uYlJ5UNn
Query running...
Query done.
Cache hit.

Retrieving results...
Got 39 rows.

Total time taken 1.27 s.
Finished at 2018-03-07 08:40:24.


| | jersey | dunks_att | dunks_made | dunks_made_pct | shots_att | shots_made | shots_made_pct |
|---|---|---|---|---|---|---|---|
| 0 | 53 | 278 | 259 | 0.931655 | 6281 | 3039 | 0.48384 |
| 1 | 51 | 205 | 189 | 0.921951 | 4755 | 2194 | 0.461409 |
| 2 | 52 | 763 | 693 | 0.908257 | 15782 | 7440 | 0.471423 |
| 3 | 40 | 1925 | 1748 | 0.908052 | 32537 | 15493 | 0.476166 |
| 4 | 31 | 2976 | 2698 | 0.906586 | 81911 | 37070 | 0.452564 |
| 5 | 42 | 2508 | 2272 | 0.905901 | 54533 | 26778 | 0.491042 |
| 6 | 24 | 7774 | 7023 | 0.903396 | 205346 | 90512 | 0.440778 |
| 7 | 35 | 4668 | 4213 | 0.902528 | 84494 | 39622 | 0.468933 |
| 8 | 32 | 7128 | 6424 | 0.901235 | 177550 | 80865 | 0.455449 |
| 9 | 55 | 1739 | 1566 | 0.900518 | 37809 | 17617 | 0.465947 |


##### Note: Some jersey numbers appear to have been recorded incorrectly: "6", "7", and "99" are not valid numbers in NCAA basketball. Across 50,000+ games, a few incorrect values are bound to slip in.
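A check for such values can be sketched as follows. It assumes the NCAA numbering rule in force when this data was recorded, namely that a jersey number has one or two digits and each digit is between 0 and 5; verify against the current rulebook before relying on it. `is_valid_jersey` is a hypothetical helper, not part of the notebook.

```python
def is_valid_jersey(number):
    """True if the number has one or two digits, each between 0 and 5.

    Assumption: this reflects the NCAA numbering rule at the time the
    data was recorded.
    """
    text = str(number)
    return len(text) in (1, 2) and all(ch in "012345" for ch in text)


for jersey in (1, 24, 53, 6, 7, 99):
    print(jersey, is_valid_jersey(jersey))
```

Under this assumption, 36 distinct integers pass the check, which is consistent with the 39 rows returned by the filtered queries once "6", "7", and "99" are added.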

### Number "1" isn't even near the top!

## Step 5

### So which jersey number attempts and makes the most dunks relative to its total field goals (FGs) attempted and made?

In [17]:
dunks_shots_q = """
SELECT 
  SAFE_CAST(jersey_num AS INT64) as jersey,
  
  COUNTIF(shot_type = "DUNK") AS dunks_att,
  COUNTIF(shot_type != "FT") AS FGs_att,
  COUNTIF(shot_type = "DUNK") / COUNTIF(shot_type != "FT") AS dunk_att_pct,
    
  COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") AS dunks,
  COUNTIF(event_type = "GOOD" AND shot_type != "FT") AS FGs,
  IF(COUNTIF(event_type = "GOOD" AND shot_type != "FT")>0,
    COUNTIF(event_type = "GOOD" AND shot_type = "DUNK") /
    COUNTIF(event_type = "GOOD" AND shot_type != "FT"),0) AS dunk_pct
    
FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_ncaa`
WHERE home_division_alias = "D1"
AND away_division_alias = "D1"
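-- Unlike the earlier queries, there is no IS NOT NULL filter on the cast jersey here,
-- so rows whose jersey_num cannot be cast form one extra NULL group (40 rows, not 39)
GROUP BY jersey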
ORDER BY dunk_pct DESC
"""

dunks_shots = gbq.read_gbq(query=dunks_shots_q, dialect='standard', project_id=project_id)
dunks_shots

Requesting query... ok.
Job ID: job_cwtedBRQEctKsOQ03NjNceABD07y
Query running...
Query done.
Cache hit.

Retrieving results...
Got 40 rows.

Total time taken 1.36 s.
Finished at 2018-03-07 07:52:40.


| | jersey | dunks_att | FGs_att | dunk_att_pct | dunks | FGs | dunk_pct |
|---|---|---|---|---|---|---|---|
| 0 | 54.0 | 610 | 9210 | 0.066232 | 543 | 4756 | 0.114172 |
| 1 | 40.0 | 1925 | 32537 | 0.059163 | 1748 | 15493 | 0.112825 |
| 2 | 35.0 | 4668 | 84494 | 0.055247 | 4213 | 39622 | 0.10633 |
| 3 | 44.0 | 4231 | 75270 | 0.056211 | 3793 | 36136 | 0.104965 |
| 4 | 43.0 | 1023 | 18970 | 0.053927 | 919 | 8999 | 0.102122 |
| 5 | 50.0 | 1340 | 25296 | 0.052973 | 1192 | 12291 | 0.096982 |
| 6 | 52.0 | 763 | 15782 | 0.048346 | 693 | 7440 | 0.093145 |
| 7 | 55.0 | 1739 | 37809 | 0.045994 | 1566 | 17617 | 0.088891 |
| 8 | 45.0 | 1688 | 35428 | 0.047646 | 1499 | 16914 | 0.088625 |
| 9 | 41.0 | 1272 | 27454 | 0.046332 | 1138 | 12878 | 0.088368 |


## So players wearing #1 get the most dunks.

### By our original question, "#1" is the answer, but there's more to it than that. You could argue that "#53" or "#54" is the dunk-iest number - it all depends on how you look at it.
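To see how the answer shifts with the metric, the sketch below compares just the three jersey numbers mentioned here, using counts copied from the result tables above. It is an illustration rather than a full ranking, and it rests on one assumption: for #1, dunk attempts are taken to be `dunks_made + dunks_missed` from Step 3, since #1 does not appear in the displayed Step 4 rows.

```python
# Counts copied from the result tables above. For jersey 1, dunk attempts are
# ASSUMED to equal dunks_made + dunks_missed from Step 3 (8731 + 1094).
# Tuple layout: (dunks_att, dunks_made, fgs_made)
stats = {
    1: (8731 + 1094, 8731, 172137),
    53: (278, 259, 3039),
    54: (610, 543, 4756),
}

metrics = {
    "total dunks": lambda att, made, fgs: made,
    "dunk make rate": lambda att, made, fgs: made / att,
    "dunk share of made FGs": lambda att, made, fgs: made / fgs,
}

winners = {}
for name, metric in metrics.items():
    # Pick the jersey with the highest value for this metric.
    winners[name] = max(stats, key=lambda jersey: metric(*stats[jersey]))
    print(f"{name}: #{winners[name]}")
```

Among these three, #1 leads on total dunks, #53 on make rate, and #54 on the share of made field goals that are dunks.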