# Analysis of NCAA Basketball Data
In this notebook, I analyze NCAA basketball data using a large public database hosted on Google BigQuery, which can be found at https://www.kaggle.com/ncaa/ncaa-basketball.


The database contains the following tables:
- mascots
- team_colors
- mbb_teams
- mbb_historical_teams_seasons - historical season data (1894/1895-present)
- mbb_historical_teams_games - final scores, one entry per team per game (1996/1997-present)
- mbb_historical_tournament_games - historical data about tournament games (1984/1985-present)
- mbb_games_sr - team level box scores (2013/2014-2017/2018)
- mbb_pbp_sr - play by play information about games (2013/2014-present)
- mbb_players_games_sr - player level box scores (2013/2014-2017/2018)
- mbb_teams_games_sr - team level box scores (2013/2014-2017/2018)

Because I attended Boston College and Wisconsin, and Wisconsin has the much better basketball team, I will mainly attempt to get a sense of Wisconsin's program and its performance over the years. Based on the database tables, it looks like I may be able to answer the following questions:
- What is Wisconsin's mascot?
- What is Wisconsin's team color?
- Where does Wisconsin play home games?
- Where has Wisconsin played the most games?
- Home vs. away performance
- Highest scoring games
- Tournament performance
- Chokes vs. Upsets

In [1]:
# Import packages
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
from google.cloud import bigquery
%matplotlib inline

# Accessing Kaggle data files
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [2]:
# Retrieve NCAA data
client = bigquery.Client()
dataset_ref = client.dataset("ncaa_basketball", project="bigquery-public-data")
dataset = client.get_dataset(dataset_ref)

Using Kaggle's public dataset BigQuery integration.


In [3]:
# Make sure we have the correct database
tables = list(client.list_tables(dataset))
for table in tables:  
    print(table.table_id)

mascots
mbb_games_sr
mbb_historical_teams_games
mbb_historical_teams_seasons
mbb_historical_tournament_games
mbb_pbp_sr
mbb_players_games_sr
mbb_teams
mbb_teams_games_sr
team_colors


# Wisconsin's Game Data
In this notebook, I'll be analyzing the following tables that relate to Wisconsin's game performances:
- mbb_pbp_sr - play by play information about games (2013/2014-present)
- mbb_teams_games_sr - team level box scores (2013/2014-2017/2018)
- mbb_players_games_sr - player level box scores (2013/2014-2017/2018)

Note: Wisconsin Team Code = 796

In [4]:
# Insert this code after defining a query to estimate how much data it will process
# dry_run_config = bigquery.QueryJobConfig(dry_run=True)
# dry_run_query_job = client.query(query, job_config=dry_run_config)
# print("This query will process {} bytes.".format(dry_run_query_job.total_bytes_processed))
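
The commented-out dry run above can be wrapped in a small helper so the estimate is easier to read. This is only a minimal sketch: `human_bytes` and `estimate_query_bytes` are hypothetical helper names (not part of the BigQuery client library), and `estimate_query_bytes` assumes it is given the `bigquery.Client` created earlier.

```python
def human_bytes(num_bytes):
    """Format a byte count with a binary unit suffix (hypothetical helper)."""
    size = float(num_bytes)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return "{0:.1f} {1}".format(size, unit)
        size /= 1024


def estimate_query_bytes(client, query):
    """Dry-run a query and return the number of bytes it would process."""
    # Imported lazily so human_bytes can be used without the BigQuery package
    from google.cloud import bigquery

    dry_run_config = bigquery.QueryJobConfig(dry_run=True)
    dry_run_job = client.query(query, job_config=dry_run_config)
    return dry_run_job.total_bytes_processed


print(human_bytes(1536))
```

With a live client, something like `print(human_bytes(estimate_query_bytes(client, query)))` should report the estimate before the query is actually run.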

In [5]:
# Select the plays table
table_ref = dataset_ref.table("mbb_pbp_sr")
table = client.get_table(table_ref)

In [6]:
# Check the layout of the plays table
print('The plays table has {0} columns'.format(len(table.schema)))
table.schema[:5]

The plays table has 71 columns


[SchemaField('game_id', 'STRING', 'NULLABLE', 'Unique identifier for the game', (), None),
 SchemaField('load_timestamp', 'TIMESTAMP', 'NULLABLE', 'Time at which the data was loaded into the table', (), None),
 SchemaField('season', 'INTEGER', 'NULLABLE', 'Season the game was played in', (), None),
 SchemaField('status', 'STRING', 'NULLABLE', '', (), None),
 SchemaField('scheduled_date', 'TIMESTAMP', 'NULLABLE', 'Date the game was played', (), None)]

In [7]:
# Some example records
client.list_rows(table, max_results=5).to_dataframe()

  


Unnamed: 0,game_id,load_timestamp,season,status,scheduled_date,venue_id,venue_name,venue_city,venue_state,venue_address,...,event_type,type,shot_made,shot_type,shot_subtype,three_point_shot,points_scored,turnover_type,rebound_type,timeout_duration
0,60f60dfe-4423-4ef2-9b6c-aee096a0b65c,2018-02-01 15:40:23.876394+00:00,2013,closed,2013-12-01 20:00:00+00:00,,,,,,...,endperiod,,False,,,False,,,,
1,60f60dfe-4423-4ef2-9b6c-aee096a0b65c,2018-02-01 15:40:23.876394+00:00,2013,closed,2013-12-01 20:00:00+00:00,,,,,,...,deadball,,False,,,False,,,,
2,5be9a30e-3582-465d-b19e-5388eddbacd5,2018-02-01 16:03:33.167831+00:00,2013,closed,2013-11-30 18:00:00+00:00,da1c1f6d-e0b0-498f-96d7-42b6bcfa81fd,Imperial Arena,Nassau,,Casino Drive,...,teamtimeout,,False,,,False,,,,60.0
3,5be9a30e-3582-465d-b19e-5388eddbacd5,2018-02-01 16:03:33.167831+00:00,2013,closed,2013-11-30 18:00:00+00:00,da1c1f6d-e0b0-498f-96d7-42b6bcfa81fd,Imperial Arena,Nassau,,Casino Drive,...,teamtimeout,,False,,,False,,,,60.0
4,5be9a30e-3582-465d-b19e-5388eddbacd5,2018-02-01 16:03:33.167831+00:00,2013,closed,2013-11-30 18:00:00+00:00,da1c1f6d-e0b0-498f-96d7-42b6bcfa81fd,Imperial Arena,Nassau,,Casino Drive,...,teamtimeout,,False,,,False,,,,60.0


In [8]:
# Query Wisconsin event statistics
query = """
        SELECT
            team_market,
            event_type,
            COUNT(event_type) as event_count
        FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_sr`
        WHERE team_market = 'Wisconsin'
        GROUP BY team_market, event_type
        ORDER BY event_count DESC
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc


Unnamed: 0,team_market,event_type,event_count
0,Wisconsin,rebound,6125
1,Wisconsin,twopointmade,3117
2,Wisconsin,twopointmiss,3023
3,Wisconsin,turnover,2661
4,Wisconsin,freethrowmade,2543
5,Wisconsin,threepointmiss,2297
6,Wisconsin,assist,2223
7,Wisconsin,personalfoul,1754
8,Wisconsin,threepointmade,1283
9,Wisconsin,shootingfoul,1182


In [9]:
# Query Wisconsin event statistics by game
query = """
        SELECT
            team_market,
            game_id,
            event_type,
            COUNT(event_type) as event_count
        FROM `bigquery-public-data.ncaa_basketball.mbb_pbp_sr`
        WHERE team_market = 'Wisconsin'
            AND event_type IN ("rebound", "twopointmade", "threepointmade")
        GROUP BY team_market, event_type, game_id
        ORDER BY event_count DESC
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc.head()


Unnamed: 0,team_market,game_id,event_type,event_count
0,Wisconsin,cea8f997-0a29-4d49-8f37-b455a2c12845,rebound,53
1,Wisconsin,9757f8e0-5020-4c8e-94ff-68da87bc7eb1,rebound,51
2,Wisconsin,f089ac70-c48a-4997-b504-5afa1e568cee,rebound,50
3,Wisconsin,aa659e6e-2ac6-43c4-b0cb-65e3882ed14e,rebound,50
4,Wisconsin,b8c9584c-a613-40be-b86a-56a5a4973836,rebound,47


In [10]:
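# How many games are in the sample?
# Each game contributes one row per event type (rebound, twopointmade, threepointmade), so divide the row count by 3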
print("Wisconsin played {0} games in this sample".format(wisc.shape[0]/3))

Wisconsin played 179.0 games in this sample
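
Dividing the row count by 3 only works if every game has exactly one row for each of the three event types. A more direct count is the number of distinct `game_id` values. The sketch below uses a small made-up DataFrame (the game ids and counts are invented for illustration) to show how the two approaches can disagree when a game is missing an event type.

```python
import pandas as pd

# Toy stand-in for the per-game event counts; the game ids are made up
events = pd.DataFrame({
    "game_id": ["g1", "g1", "g1", "g2", "g2"],
    "event_type": ["rebound", "twopointmade", "threepointmade",
                   "rebound", "twopointmade"],
    "event_count": [40, 18, 7, 35, 20],
})

rows_over_three = events.shape[0] / 3         # assumes three rows per game
distinct_games = events["game_id"].nunique()  # counts games directly

print(rows_over_three, distinct_games)
```

On the real query result, `wisc.game_id.nunique()` would be the equivalent call.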


In [11]:
# Which games had the most two pointers made?
wisc[wisc.event_type=="twopointmade"].sort_values(by="event_count", ascending=False).head()

Unnamed: 0,team_market,game_id,event_type,event_count
127,Wisconsin,a53541a5-4c5b-425e-b424-7f1baf847e48,twopointmade,31
160,Wisconsin,9a2f5d24-ba7a-465a-9911-3a6b79b3e98f,twopointmade,27
164,Wisconsin,67e7a92b-c85d-4505-9e2d-a8c9ae16d9d6,twopointmade,26
168,Wisconsin,28b8ccd6-de23-44d6-9580-c6445e8be7ad,twopointmade,25
170,Wisconsin,053606eb-8eb0-4366-8d53-aa792ee9065c,twopointmade,25


In [12]:
# Which games had the most three pointers made?
wisc[wisc.event_type=="threepointmade"].sort_values(by="event_count", ascending=False).head()

Unnamed: 0,team_market,game_id,event_type,event_count
327,Wisconsin,56e2bb90-49c5-4dc6-ba9c-d1fcbc88b2e0,threepointmade,13
330,Wisconsin,9a5a93d4-e74a-483e-a1e0-3426525e0fc4,threepointmade,13
336,Wisconsin,8e848ae3-48d9-4bf3-89a8-e2b24d7124c7,threepointmade,13
338,Wisconsin,01af9a2a-098d-4a9f-9e3c-3ca91f85096b,threepointmade,13
328,Wisconsin,abeefba4-5b14-477d-a996-704926e9ffab,threepointmade,13


In [13]:
# Average count of each event type per game
wisc.groupby(by=["team_market", "event_type"])[["event_count"]].mean()

team_market,event_type,event_count
Wisconsin,rebound,34.217877
Wisconsin,threepointmade,7.167598
Wisconsin,twopointmade,17.413408


In [14]:
# Select the team games table
table_ref = dataset_ref.table("mbb_teams_games_sr")
table = client.get_table(table_ref)

In [15]:
# Check the layout of the team games table
print('The team games table has {0} columns'.format(len(table.schema)))
table.schema[:5]

The team games table has 132 columns


[SchemaField('game_id', 'STRING', 'NULLABLE', '[Game data] Unique identifier for the game', (), None),
 SchemaField('season', 'INTEGER', 'NULLABLE', '[Game data] Season the game was played in', (), None),
 SchemaField('status', 'STRING', 'NULLABLE', "[Game data] Indicates the last state of Sportradar's game file", (), None),
 SchemaField('coverage', 'STRING', 'NULLABLE', '[Game data] Type of coverage provided by Sportradar', (), None),
 SchemaField('neutral_site', 'BOOLEAN', 'NULLABLE', '[Game data] Type of coverage provided by Sportradar', (), None)]

In [16]:
# Some example records
client.list_rows(table, max_results=5).to_dataframe()

  


Unnamed: 0,game_id,season,status,coverage,neutral_site,scheduled_date,gametime,conference_game,tournament,tournament_type,...,opp_fast_break_pts,opp_second_chance_pts,opp_team_turnovers,opp_points_off_turnovers,opp_team_rebounds,opp_flagrant_fouls,opp_player_tech_fouls,opp_team_tech_fouls,opp_coach_tech_fouls,created
0,4069f80e-04f0-4f69-a563-86014bbe95a0,2015,closed,full,,2015-12-03,2015-12-03 03:00:00+00:00,,,,...,0,17,0,31,7,0,0,0,0,2018-02-20 15:48:54+00:00
1,7160a0e0-bbc3-46ad-afc6-e4e6b5b90a51,2015,closed,full,,2016-01-17,2016-01-17 02:00:00+00:00,,,,...,21,16,1,30,3,0,0,0,0,2018-02-20 15:48:54+00:00
2,320ccf7a-8a32-4ce6-a561-10687985c6a6,2015,closed,full,,2015-12-22,2015-12-22 20:00:00+00:00,,,,...,26,16,0,36,4,0,0,0,0,2018-02-20 15:48:53+00:00
3,1a689aee-fec2-49df-822d-993e2826744b,2017,closed,full,False,2017-12-10,2017-12-10 00:00:00+00:00,False,,,...,8,21,0,16,0,0,0,0,0,2018-02-20 13:03:23+00:00
4,6314105c-8456-4b35-bfbf-1ec04749ff09,2017,closed,full,False,2017-12-09,2017-12-09 21:00:00+00:00,False,,,...,8,15,0,19,2,0,0,0,0,2018-02-20 13:03:24+00:00


In [17]:
# Query Wisconsin attendance statistics
query = """
        SELECT
            market,
            venue_name,
            AVG(attendance) as avg_attendance
        FROM `bigquery-public-data.ncaa_basketball.mbb_teams_games_sr`
        WHERE market = 'Wisconsin'
        GROUP BY market, venue_name
        ORDER BY avg_attendance DESC
        LIMIT 10
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc


Unnamed: 0,market,venue_name,avg_attendance
0,Wisconsin,AT&T Stadium,79444.0
1,Wisconsin,Lucas Oil Stadium,71693.5
2,Wisconsin,Carrier Dome,22360.0
3,Wisconsin,Wells Fargo Center,20686.0
4,Wisconsin,Staples Center,18967.0
5,Wisconsin,KeyBank Center,18440.0
6,Wisconsin,BMO Harris Bradley Center,18304.75
7,Wisconsin,Comcast Center,17950.0
8,Wisconsin,Honda Center,17793.5
9,Wisconsin,CenturyLink Center,17658.666667


In [18]:
# Which games had the greatest attendance?
query = """
        SELECT
            market,
            opp_market,
            venue_name,
            attendance
        FROM `bigquery-public-data.ncaa_basketball.mbb_teams_games_sr`
        WHERE market = 'Wisconsin'
        ORDER BY attendance DESC
        LIMIT 10
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc


Unnamed: 0,market,opp_market,venue_name,attendance
0,Wisconsin,Kentucky,AT&T Stadium,79444
1,Wisconsin,Kentucky,Lucas Oil Stadium,72238
2,Wisconsin,Duke,Lucas Oil Stadium,71149
3,Wisconsin,Syracuse,Carrier Dome,22360
4,Wisconsin,Notre Dame,Wells Fargo Center,20686
5,Wisconsin,Florida,Madison Square Garden,20047
6,Wisconsin,Villanova,KeyBank Center,19261
7,Wisconsin,Arizona,Staples Center,19125
8,Wisconsin,North Carolina,Staples Center,18809
9,Wisconsin,Marquette,BMO Harris Bradley Center,18691


In [19]:
# How were Wisconsin's offensive numbers?
query = """
        SELECT
            market,
            opp_market,
            points_game,
            two_points_att,
            two_points_made,
            two_points_pct,
            three_points_att,
            three_points_made,
            three_points_pct,
            opp_points,
            points_game + opp_points as total_points
        FROM `bigquery-public-data.ncaa_basketball.mbb_teams_games_sr`
        WHERE market = 'Wisconsin'
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc.head()


Unnamed: 0,market,opp_market,points_game,two_points_att,two_points_made,two_points_pct,three_points_att,three_points_made,three_points_pct,opp_points,total_points
0,Wisconsin,Georgetown,68,27,15,0.556,16,6,37.5,65,133
1,Wisconsin,Virginia,48,29,10,34.5,23,5,21.7,38,86
2,Wisconsin,Iowa,67,29,15,0.517,23,10,43.5,59,126
3,Wisconsin,Iowa,79,39,22,56.4,24,8,33.3,74,153
4,Wisconsin,Iowa,74,33,16,0.485,19,7,36.8,63,137
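
Note that `two_points_pct` mixes two scales in the output above: some rows are fractions (0.556) and others are percentages (34.5). One way around this is to recompute the percentage from the made and attempted counts. The sketch below does so for the five rows shown; the column name `two_points_pct_fixed` is my own.

```python
import pandas as pd

# First five rows of the offensive query output above (two-point columns only)
sample = pd.DataFrame({
    "two_points_att": [27, 29, 29, 39, 33],
    "two_points_made": [15, 10, 15, 22, 16],
    "two_points_pct": [0.556, 34.5, 0.517, 56.4, 0.485],
})

# Recompute the percentage from raw counts so every row uses the same scale
sample["two_points_pct_fixed"] = (
    100 * sample["two_points_made"] / sample["two_points_att"]
).round(1)

print(sample[["two_points_pct", "two_points_pct_fixed"]])
```

Recomputing from counts avoids having to guess which scale a stored value uses, which a simple threshold could get wrong for very low percentages.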


In [20]:
# What did Wisconsin's highest scoring game look like?
wisc[wisc.points_game == wisc.points_game.max()]

Unnamed: 0,market,opp_market,points_game,two_points_att,two_points_made,two_points_pct,three_points_att,three_points_made,three_points_pct,opp_points,total_points
96,Wisconsin,North Dakota,103,39,23,59.0,20,12,60.0,85,188


In [21]:
# When did Wisconsin give up the most points?
wisc[wisc.opp_points == wisc.opp_points.max()]

Unnamed: 0,market,opp_market,points_game,two_points_att,two_points_made,two_points_pct,three_points_att,three_points_made,three_points_pct,opp_points,total_points
21,Wisconsin,Purdue,80,31,13,0.419,29,13,44.8,91,171


In [22]:
# What was the highest scoring game in the sample?
wisc[wisc.total_points == wisc.total_points.max()]

Unnamed: 0,market,opp_market,points_game,two_points_att,two_points_made,two_points_pct,three_points_att,three_points_made,three_points_pct,opp_points,total_points
96,Wisconsin,North Dakota,103,39,23,59.0,20,12,60.0,85,188


In [23]:
# Two pointer information
w = wisc.two_points_made.mean()
x = wisc.two_points_made.max()
y = wisc.two_points_made.sum() / (wisc.two_points_att.sum())
z = 2*wisc.two_points_made.sum() / wisc.points_game.sum()
print('Wisconsin made {0:0.2f} two pointers per game'.format(w))
print('Wisconsin maxed at {0} two pointers in one game'.format(x))
print('Wisconsin had a shooting percentage of {0:0.2f} for two pointers'.format(100*y))
print('Two pointers made up {0:0.2f} percent of Wisconsin\'s offense'.format(100*z))

Wisconsin made 17.53 two pointers per game
Wisconsin maxed at 31 two pointers in one game
Wisconsin had a shooting percentage of 51.02 for two pointers
Two pointers made up 49.54 percent of Wisconsin's offense


In [24]:
# Three pointer information
i = wisc.three_points_made.mean()
j = wisc.three_points_made.max()
k = wisc.three_points_made.sum() / (wisc.three_points_att.sum())
l = 3*wisc.three_points_made.sum() / wisc.points_game.sum()
print('Wisconsin made {0:0.2f} three pointers per game'.format(i))
print('Wisconsin maxed at {0} three pointers in one game'.format(j))
print('Wisconsin had a shooting percentage of {0:0.2f} for three pointers'.format(100*k))
print('Three pointers made up {0:0.2f} percent of Wisconsin\'s offense'.format(100*l))

Wisconsin made 7.17 three pointers per game
Wisconsin maxed at 13 three pointers in one game
Wisconsin had a shooting percentage of 35.79 for three pointers
Three pointers made up 30.42 percent of Wisconsin's offense
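
The two-pointer and three-pointer cells above repeat the same four calculations. As a sketch, they could be folded into one helper; `shot_summary` is a hypothetical name, and the two sample rows are taken from the offensive query output shown earlier (the Georgetown and Virginia games).

```python
import pandas as pd


def shot_summary(games, prefix, value):
    """Summarize one shot type; `prefix` is e.g. 'two_points', `value` its point worth."""
    made = games[prefix + "_made"]
    att = games[prefix + "_att"]
    return {
        "made_per_game": made.mean(),
        "max_made": int(made.max()),
        "shooting_pct": 100 * made.sum() / att.sum(),
        "share_of_points": 100 * value * made.sum() / games["points_game"].sum(),
    }


# Two rows taken from the offensive query output above (Georgetown and Virginia)
games = pd.DataFrame({
    "points_game": [68, 48],
    "two_points_att": [27, 29], "two_points_made": [15, 10],
    "three_points_att": [16, 23], "three_points_made": [6, 5],
})

for prefix, value in [("two_points", 2), ("three_points", 3)]:
    print(prefix, shot_summary(games, prefix, value))
```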


In [25]:
# How do Wisconsin's numbers stack up against the average across all teams?
query = """
        SELECT
            AVG(two_points_made) as two_avg,
            SUM(two_points_made)/SUM(two_points_att) as two_pct,
            2*SUM(two_points_made)/SUM(points_game) as two_off_pct,
            AVG(three_points_made) as three_avg,
            SUM(three_points_made)/SUM(three_points_att) as three_pct,
            3*SUM(three_points_made)/SUM(points_game) as three_off_pct,
        FROM `bigquery-public-data.ncaa_basketball.mbb_teams_games_sr`
        """

client = bigquery.Client()
query_job = client.query(query)
all_teams = query_job.to_dataframe()  # averages across every team, not just Wisconsin
all_teams


Unnamed: 0,two_avg,two_pct,two_off_pct,three_avg,three_pct,three_off_pct
0,17.729158,0.490799,0.430262,6.977535,0.347524,0.254002
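
To make the comparison easier to read, the Wisconsin figures printed earlier can be placed next to the all-team averages. The sketch below copies the values from the outputs above (with Wisconsin's percentages converted to fractions to match); the `difference` column is my own addition.

```python
import pandas as pd

# Values copied from the printed outputs above; percentages are kept as fractions
comparison = pd.DataFrame(
    {
        "wisconsin": [17.53, 0.5102, 0.4954, 7.17, 0.3579, 0.3042],
        "all_teams": [17.729158, 0.490799, 0.430262, 6.977535, 0.347524, 0.254002],
    },
    index=["two_avg", "two_pct", "two_off_pct",
           "three_avg", "three_pct", "three_off_pct"],
)
comparison["difference"] = comparison["wisconsin"] - comparison["all_teams"]
print(comparison.round(3))
```

On these numbers, Wisconsin makes slightly fewer two pointers per game than the overall average but is above average on every other measure.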


In [26]:
# Select the player statistics table
table_ref = dataset_ref.table("mbb_players_games_sr")
table = client.get_table(table_ref)

In [27]:
# Check the layout of the player games table
print('The player games table has {0} columns'.format(len(table.schema)))
table.schema[:5]

The player games table has 66 columns


[SchemaField('game_id', 'STRING', 'NULLABLE', '[Game data] Unique identifier for the game', (), None),
 SchemaField('season', 'INTEGER', 'NULLABLE', '[Game data] Season the game was played in', (), None),
 SchemaField('neutral_site', 'BOOLEAN', 'NULLABLE', '[Game data] Indicator of whether the game was played on a neutral court', (), None),
 SchemaField('scheduled_date', 'DATE', 'NULLABLE', '[Game data] Date the game was played', (), None),
 SchemaField('gametime', 'TIMESTAMP', 'NULLABLE', '[Game data] Date and time the game was played', (), None)]

In [28]:
# Some example records
client.list_rows(table, max_results=5).to_dataframe()

  


Unnamed: 0,game_id,season,neutral_site,scheduled_date,gametime,tournament,tournament_type,tournament_round,tournament_game_no,player_id,...,assists,turnovers,steals,blocks,assists_turnover_ratio,personal_fouls,tech_fouls,flagrant_fouls,points,sp_created
0,14ab9c26-b586-4f68-8989-f433bb3a3e7f,2017,False,2017-11-22,2017-11-22 00:00:00+00:00,,,,,b8df0122-7f1e-4189-a47b-1e08050bf6c6,...,,,,,,,,,,2018-02-20 13:03:24+00:00
1,64667cdd-9379-4ecc-877a-3fb4d76fbff2,2017,False,2017-12-19,2017-12-19 01:00:00+00:00,,,,,2036297a-f0e1-4d65-8cc2-94d3d4de314f,...,,,,,,,,,,2018-02-20 13:03:27+00:00
2,fcc2decd-b14f-4fed-8a78-8856c6689c74,2017,False,2017-11-12,2017-11-12 18:00:00+00:00,,,,,c367fd91-183a-4d7d-9307-b8a289cb7bc9,...,,,,,,,,,,2018-02-20 13:03:23+00:00
3,7cda8dca-e87b-4b4e-9eed-68cfba948957,2017,False,2017-11-26,2017-11-26 21:30:00+00:00,,,,,0ca3800d-fb89-4151-90c0-a3e03835fd04,...,,,,,,,,,,2018-02-20 13:03:26+00:00
4,b3b15b00-c5a6-4239-9c99-713ebe01b8c7,2017,False,2018-01-24,2018-01-24 01:00:00+00:00,,,,,269dc916-2208-40e2-a5ee-062a38a3a00d,...,,,,,,,,,,2018-02-20 13:03:21+00:00


In [29]:
# Query Wisconsin player information
query = """
        SELECT
            player_id,
            full_name,
            SUM(minutes_int64) as playtime,
            SUM(points) as points,
            SUM(assists) as assists,
            SUM(two_points_att) as two_points_att,
            SUM(two_points_made) as two_points_made,
            SUM(three_points_att) as three_points_att,
            SUM(three_points_made) as three_points_made,
        FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr`
        WHERE team_market = 'Wisconsin'
        GROUP BY player_id, full_name
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc.head()


Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
0,df127f24-8a2e-4d09-b83f-527622c7b03e,Jordan Smith,44,4,0,7,1,5,0
1,2b192a6a-9d52-46f9-a159-301e32662921,Aleem Ford,805,192,30,40,15,110,45
2,6529b34a-c775-4fbf-98b3-b1d44893d6bd,Michael Ballard,1,0,0,0,0,0,0
3,daec2436-b721-4b8d-bb38-d86dc6bed85c,Andy Van Vliet,179,76,5,25,9,31,15
4,92e134ac-b31d-4f7d-bc1f-a1cef5189d8f,Aaron Moesch,273,41,13,19,9,3,2


In [30]:
# Who played the most?
wisc.sort_values('playtime', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
36,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,4446,1857,319,1054,504,304,101
35,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,4075,1459,296,515,228,695,270
13,1df2ebf8-f260-44e1-b41f-10739d1f02e5,Ethan Happ,3026,1541,270,1122,622,11,1
28,f2673e7c-d568-4840-b53f-66433707c7d3,Josh Gasser,2590,598,142,133,65,235,96
32,ea26f94f-987d-4d49-8005-861cc6b99799,Zak Showalter,2457,641,157,221,132,244,88


In [31]:
# Who had the most points?
wisc.sort_values('points', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
36,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,4446,1857,319,1054,504,304,101
13,1df2ebf8-f260-44e1-b41f-10739d1f02e5,Ethan Happ,3026,1541,270,1122,622,11,1
35,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,4075,1459,296,515,228,695,270
26,ed2f8d67-3593-4b26-a242-a92bc114d740,Frank Kaminsky,2343,1262,152,660,384,199,79
29,f75e79a3-b9f8-4541-b75e-fb5477c3600f,Sam Dekker,2373,1028,101,484,289,280,92


In [32]:
# Who had the most assists?
wisc.sort_values('assists', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
36,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,4446,1857,319,1054,504,304,101
35,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,4075,1459,296,515,228,695,270
13,1df2ebf8-f260-44e1-b41f-10739d1f02e5,Ethan Happ,3026,1541,270,1122,622,11,1
18,8195c7a4-6b7a-459d-83f5-6766a831b55e,Traevon Jackson,1676,578,205,310,137,128,45
32,ea26f94f-987d-4d49-8005-861cc6b99799,Zak Showalter,2457,641,157,221,132,244,88
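
Totals favor players who logged the most minutes, so a per-minute rate can give a different picture. The sketch below computes points per 40 minutes from the totals shown above, assuming `playtime` is measured in minutes; the column name `points_per_40` is my own.

```python
import numpy as np
import pandas as pd

# Totals copied from the player tables above
players = pd.DataFrame({
    "full_name": ["Nigel Hayes", "Bronson Koenig", "Ethan Happ",
                  "Frank Kaminsky", "Sam Dekker"],
    "playtime": [4446, 4075, 3026, 2343, 2373],
    "points": [1857, 1459, 1541, 1262, 1028],
}).set_index("full_name")

# Treat zero minutes as missing to avoid dividing by zero
minutes = players["playtime"].replace(0, np.nan)
players["points_per_40"] = 40 * players["points"] / minutes

print(players.sort_values("points_per_40", ascending=False))
```

The zero-minute guard matters for rows where a player is listed but did not play.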


In [33]:
# Query Wisconsin player tournament information
query = """
        SELECT
            player_id,
            full_name,
            SUM(minutes_int64) as playtime,
            SUM(points) as points,
            SUM(assists) as assists,
            SUM(two_points_att) as two_points_att,
            SUM(two_points_made) as two_points_made,
            SUM(three_points_att) as three_points_att,
            SUM(three_points_made) as three_points_made,
        FROM `bigquery-public-data.ncaa_basketball.mbb_players_games_sr`
        WHERE team_market = 'Wisconsin' AND tournament IS NOT NULL
        GROUP BY player_id, full_name
        """

client = bigquery.Client()
query_job = client.query(query)
wisc = query_job.to_dataframe()
wisc.head()


Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
0,4bb0010f-18dc-4f3d-b837-0459fa743589,Alex Illikainen,47.0,4,1,5,2,1,0
1,90e75b82-d286-4c2d-9cf4-d075daeae53b,T.J. Schlundt,2.0,0,0,0,0,0,0
2,df127f24-8a2e-4d09-b83f-527622c7b03e,Jordan Smith,2.0,0,0,0,0,0,0
3,991336aa-fd95-4e28-b38f-efda1779990e,Trevor Anderson,0.0,0,0,0,0,0,0
4,1df2ebf8-f260-44e1-b41f-10739d1f02e5,Ethan Happ,353.0,187,26,127,74,0,0


In [34]:
# Who played the most?
wisc.sort_values('playtime', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
34,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,784.0,286,61,95,36,139,57
35,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,772.0,308,46,187,82,66,21
24,f2673e7c-d568-4840-b53f-66433707c7d3,Josh Gasser,558.0,87,37,23,12,36,13
27,ed2f8d67-3593-4b26-a242-a92bc114d740,Frank Kaminsky,548.0,295,32,168,94,44,16
25,f75e79a3-b9f8-4541-b75e-fb5477c3600f,Sam Dekker,517.0,224,25,97,63,65,22


In [35]:
# Who had the most points?
wisc.sort_values('points', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
35,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,772.0,308,46,187,82,66,21
27,ed2f8d67-3593-4b26-a242-a92bc114d740,Frank Kaminsky,548.0,295,32,168,94,44,16
34,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,784.0,286,61,95,36,139,57
25,f75e79a3-b9f8-4541-b75e-fb5477c3600f,Sam Dekker,517.0,224,25,97,63,65,22
4,1df2ebf8-f260-44e1-b41f-10739d1f02e5,Ethan Happ,353.0,187,26,127,74,0,0


In [36]:
# Who had the most assists?
wisc.sort_values('assists', ascending=False).head()

Unnamed: 0,player_id,full_name,playtime,points,assists,two_points_att,two_points_made,three_points_att,three_points_made
34,6344815c-ac3a-4506-827f-de05ec5e37c9,Bronson Koenig,784.0,286,61,95,36,139,57
35,cfc15d5a-efc8-443a-9868-39853a28b849,Nigel Hayes,772.0,308,46,187,82,66,21
24,f2673e7c-d568-4840-b53f-66433707c7d3,Josh Gasser,558.0,87,37,23,12,36,13
13,8195c7a4-6b7a-459d-83f5-6766a831b55e,Traevon Jackson,250.0,85,35,43,17,20,7
27,ed2f8d67-3593-4b26-a242-a92bc114d740,Frank Kaminsky,548.0,295,32,168,94,44,16
