# Scraping & Building 2 Massive NBA Data Sets

## 1. Introduction

The NBA is rich with team and player data. After having challenges working with NBA API, we decided to scrape directly from Basketball Reference (https://www.basketball-reference.com/), a source of NBA data and analytics used widely in the industry.

First, we'll scrape team data, including team name, conference, win-loss percentage, and unadjusted and adjusted offensive, defensive, and net ratings from 2016 through 2020. Next, we'll scrape player data for total season, per game, per 36 minute, per 100 possession, and advance analytics from 2016 through 2020. To build up our data set, we'll define and bundle together functions that scrape data, build, format, and save dataframes, and loop over each year (and stat type for player data). In scraping, we'll make us of list comprehension, which is arguably one of the most beautiful techniques in Python. We'll merge together team data for each year and player data for all stat types for each year to create massive team and player csv files, respectively. 

If you'd like to explore the data, you can access and use the csv files from our Google Drive:
- NBA Team Stats 2016-2020: https://drive.google.com/file/d/130BewO9dBGYjHicr_1y-1xWZB-HFWfQ3/view?usp=sharing
- NBA Player Stats 2016-2020: https://drive.google.com/file/d/1t7hcd6-M7RxTn7WZsA1s4O-6njcwS4X0/view?usp=sharing 

## 2. Install & Import Packages

In [1]:
import pandas as pd 
import numpy as np
import io
import os

# Web scraping using BeautifulSoup and converting to pandas dataframe
import requests 
import urllib.request 
from urllib.request import urlopen
from bs4 import BeautifulSoup
!pip install lxml # Install lxml parser as it's faster than the built-in html parser

from time import sleep
from warnings import warn
from random import randint

pd.set_option('display.max_columns', None)



In [2]:
# Get current working directory for saving csv files. I couldn't change it using chdir, so I'll keep it as is
os.getcwd()

'/home/wsuser/work'

## 3. Team Stats 2016-2020

In [28]:
# Variables to direct scrape
directory = '/home/wsuser/work'
start_year = 2016
end_year = 2020

# Get website and create BeautifulSoup object
def get_html(year):
    url = ('https://www.basketball-reference.com/leagues/NBA_{}_ratings.html'.format(year))
    html = urlopen(url)
    soup = BeautifulSoup(html, 'lxml')
    return soup

# Get variable headers for the statistics from the page
def get_header(soup):
    header = [th.getText() for th in soup.findAll('tr')[1].findAll('th')] # list comprehension: get text for table header (th) for first table row ((tr)[1]) (not, it's the 2nd row)
    #header.append('Year')
    header = header[1:] # Don't need rank column 
    header = [item.replace('%', '_percent').replace('/', '_').lower() for item in header] # replace special characters and convert to lower case
    return header

# Get the team statistics
def get_stats(soup, headers):
    stats = []
    rows = soup.find_all('tr', class_=None)
    for i in range(len(rows)):
        stats.append([j.text for j in rows[i].find_all('td')]) # list comprehension: get the text of the table data (td) for each row
        #stats[i].append(year)
    stats = pd.DataFrame(stats, columns=headers)
    return stats

# Format dataframe
def format_dataframe(team_stats):
    team_stats = team_stats.copy()

    #Drop empty columns
    drop_cols = []
    for col in team_stats.columns:
        if (len(col) <= 1): # if length of column is <=1
            drop_cols.append(col)
    team_stats = team_stats.drop(columns=drop_cols)

    # Convert numerical columns to numeric
    cols = [i for i in team_stats.columns if i not in ['team', 'conf', 'div']] # list comprehension: columns that aren't non-numer columns player, pos, team
    for col in cols:
        team_stats[col] = pd.to_numeric(team_stats[col])

    # Fill blanks
    team_stats = team_stats.fillna(0)
    
    # Make sure year is integer type
    #team_stats['year'] = team_stats['year'].astype(int) 
    
    # Save to csv
    team_stats.to_csv('{}/nba_team_stats_{}.csv'.format(directory, year), index=False)
    
# Main loop to get team statistics for years of interest

# Loop through the years
for year in range(start_year, end_year + 1):
    team_stats = pd.DataFrame() # team_stats dataframe for each year (make sure that it's inside the for loop!)
    
    # Slow down the web scrape to avoid server getting overloaded
    sleep(randint(1, 4))

    # Get website
    html_soup = get_html(year)

    # Get header
    if year == start_year:
        headers = get_header(html_soup)

    # Get team stats
    team_stats = team_stats.append(get_stats(html_soup, headers), ignore_index=True)
    
    team_stats = team_stats.drop([0]) # drop the first row with blank data
    
    print('{} table completed.'.format(year))

    # Format the datatframe
    format_dataframe(team_stats)

2016 table completed.
2017 table completed.
2018 table completed.
2019 table completed.
2020 table completed.


In [29]:
# Read all stats into their own variables
NBA_Teams_2020 = pd.read_csv('{}/nba_team_stats_2020.csv'.format(directory))
NBA_Teams_2019 = pd.read_csv('{}/nba_team_stats_2019.csv'.format(directory))
NBA_Teams_2018 = pd.read_csv('{}/nba_team_stats_2018.csv'.format(directory))
NBA_Teams_2017 = pd.read_csv('{}/nba_team_stats_2017.csv'.format(directory))
NBA_Teams_2016 = pd.read_csv('{}/nba_team_stats_2016.csv'.format(directory))

In [30]:
# Let's check the first few rows
NBA_Teams_2020.head()

Unnamed: 0,team,conf,div,w_l_percent,mov,ortg,drtg,nrtg,mov_a,ortg_a,drtg_a,nrtg_a
0,Milwaukee Bucks,E,C,0.767,10.08,112.98,103.36,9.62,9.41,112.69,103.73,8.96
1,Los Angeles Clippers,W,P,0.681,6.44,114.66,108.25,6.41,6.66,114.56,107.94,6.62
2,Los Angeles Lakers,W,P,0.732,5.79,112.78,107.14,5.65,6.28,112.83,106.68,6.16
3,Toronto Raptors,E,A,0.736,6.24,112.02,105.85,6.18,5.97,111.99,106.08,5.9
4,Boston Celtics,E,A,0.667,6.31,114.14,107.79,6.35,5.83,114.01,108.15,5.86


In [31]:
# Merge all 5 tables
# Inner join to keep only rows in left dataframe that has matching keys in the right dataframe
# Replace suffixes (defaults are _x and _y) so they're appopriate to dataframe

# Merge 2019 with 2020
NBA_Teams_2016_2020 = NBA_Teams_2020.merge(NBA_Teams_2019, on=['team', 'conf', 'div'], how='inner')
NBA_Teams_2016_2020.columns = NBA_Teams_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_2019')

# Merge in 2018
NBA_Teams_2016_2020 = NBA_Teams_2016_2020.merge(NBA_Teams_2018, on=['team','conf','div'], how='inner') 
NBA_Teams_2016_2020.columns = NBA_Teams_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_2018')

# Merge in 2017
NBA_Teams_2016_2020 = NBA_Teams_2016_2020.merge(NBA_Teams_2017, on=['team','conf','div'], how='inner') 
NBA_Teams_2016_2020.columns = NBA_Teams_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_2017')

# Merge in 2016
NBA_Teams_2016_2020 = NBA_Teams_2016_2020.merge(NBA_Teams_2018, on=['team','conf','div'], how='inner') 
NBA_Teams_2016_2020.columns = NBA_Teams_2016_2020.columns.str.replace('_x', '_2020').str.replace('_y', '_2016')

# Save csv
NBA_Teams_2016_2020.to_csv('{}/all_nba_player_stats.csv'.format(directory), index=False)

In [32]:
NBA_Teams_2016_2020.head()

Unnamed: 0,team,conf,div,w_l_percent_2020,mov_2020,ortg_2020,drtg_2020,nrtg_2020,mov_a_2020,ortg_a_2020,drtg_a_2020,nrtg_a_2020,w_l_percent_2019,mov_2019,ortg_2019,drtg_2019,nrtg_2019,mov_a_2019,ortg_a_2019,drtg_a_2019,nrtg_a_2019,w_l_percent_2018,mov_2018,ortg_2018,drtg_2018,nrtg_2018,mov_a_2018,ortg_a_2018,drtg_a_2018,nrtg_a_2018,w_l_percent_2017,mov_2017,ortg_2017,drtg_2017,nrtg_2017,mov_a_2017,ortg_a_2017,drtg_a_2017,nrtg_a_2017,w_l_percent_2016,mov_2016,ortg_2016,drtg_2016,nrtg_2016,mov_a_2016,ortg_a_2016,drtg_a_2016,nrtg_a_2016
0,Milwaukee Bucks,E,C,0.767,10.08,112.98,103.36,9.62,9.41,112.69,103.73,8.96,0.732,8.87,114.23,105.76,8.47,8.05,113.89,106.23,7.66,0.537,-0.3,110.67,110.98,-0.31,-0.45,110.7,111.16,-0.46,0.512,-0.18,109.92,110.16,-0.23,-0.44,109.96,110.46,-0.5,0.537,-0.3,110.67,110.98,-0.31,-0.45,110.7,111.16,-0.46
1,Los Angeles Clippers,W,P,0.681,6.44,114.66,108.25,6.41,6.66,114.56,107.94,6.62,0.585,0.85,113.14,112.33,0.81,1.09,113.29,112.22,1.06,0.512,0.04,110.97,111.02,-0.05,0.15,111.06,111.0,0.07,0.622,4.29,113.38,108.88,4.5,4.42,113.46,108.82,4.64,0.512,0.04,110.97,111.02,-0.05,0.15,111.06,111.0,0.07
2,Los Angeles Lakers,W,P,0.732,5.79,112.78,107.14,5.65,6.28,112.83,106.68,6.16,0.451,-1.72,108.52,110.18,-1.66,-1.32,108.46,109.71,-1.26,0.427,-1.55,107.1,108.64,-1.54,-1.43,106.96,108.39,-1.43,0.317,-6.88,106.69,113.72,-7.02,-6.29,106.96,113.38,-6.42,0.427,-1.55,107.1,108.64,-1.54,-1.43,106.96,108.39,-1.43
3,Toronto Raptors,E,A,0.736,6.24,112.02,105.85,6.18,5.97,111.99,106.08,5.9,0.707,6.09,113.99,108.0,5.99,5.49,113.78,108.4,5.38,0.72,7.78,114.6,106.56,8.03,7.3,114.5,106.96,7.54,0.622,4.21,113.2,108.68,4.51,3.65,113.07,109.14,3.93,0.72,7.78,114.6,106.56,8.03,7.3,114.5,106.96,7.54
4,Boston Celtics,E,A,0.667,6.31,114.14,107.79,6.35,5.83,114.01,108.15,5.86,0.598,4.44,112.78,108.35,4.43,3.9,112.57,108.69,3.88,0.671,3.59,108.4,104.65,3.75,3.24,108.14,104.75,3.39,0.646,2.63,111.88,109.16,2.72,2.25,111.81,109.49,2.33,0.671,3.59,108.4,104.65,3.75,3.24,108.14,104.75,3.39


In [33]:
NBA_Teams_2016_2020.shape

(30, 48)

## 4. Player Stats 2016-2020

In [9]:
# Variables to direct scrape
directory = '/home/wsuser/work'
start_year = 2016
end_year = 2020
stat_types = ['_totals', '_per_game', '_per_minute', '_per_poss', '_advanced']

# Get website and create BeautifulSoup object
def get_html(year, stat_type):
    url = ('https://www.basketball-reference.com/leagues/NBA_{}{}.html'.format(year, stat_type))
    html = urlopen(url)
    soup = BeautifulSoup(html, 'lxml')
    return soup

# Get variable headers for the statistics from the page
def get_header(soup):
    #header = [i.text for i in soup.find_all('tr')[0].find_all('th')] # list comprehension: get text for table header (th) for first table row ((tr)[0])
    header = [th.getText() for th in soup.findAll('tr')[0].findAll('th')] # list comprehension: get text for table header (th) for first table row ((tr)[0])
    header.append('Year')
    header = header[1:] # Don't need rank column 
    header = [item.replace('%', '_percent').replace('/', '_').lower() for item in header] # replace special characters and convert to lower case
    return header

# Get the player statistics for the stat type
def get_stats(soup, headers):
    stats = []
    rows = soup.find_all('tr', class_=['full_table', 'italic_text partial_table']) # table rows are in class 'full_table' or 'italic_text partial_table'
    for i in range(len(rows)):
        stats.append([j.text for j in rows[i].find_all('td')]) # list comprehension: get the text of the table data (td) for each row
        stats[i].append(year)
    stats = pd.DataFrame(stats, columns=headers)
    return stats

# Format dataframe
def format_dataframe(player_stats, stat_type):
    player_stats = player_stats.copy()

    # Drop empty columns
    drop_cols = []
    for col in player_stats.columns:
        if (len(col) <= 1) and (col != 'g'): # if length of column is <=1 and doesn't correspond to games (tag'g')
            drop_cols.append(col)
    player_stats = player_stats.drop(columns=drop_cols)

    # Convert numerical columns to numeric
    cols = [i for i in player_stats.columns if i not in ['player', 'pos', 'tm']] # list comprehension: columns that aren't non-numer columns player, pos, team
    for col in cols:
        player_stats[col] = pd.to_numeric(player_stats[col])

    # Fill blanks
    player_stats = player_stats.fillna(0)
    
    # Save to csv
    player_stats.to_csv('{}/nba_player_stats{}.csv'.format(directory, stat_type), index=False)
    
# Main loop to get player statistics for all stat types for years of interest
for stat_type in stat_types:
    player_stats = pd.DataFrame()
    
    # Loop through the years
    for year in range(start_year, end_year + 1):
        
        # Slow down the web scrape to avoid server getting overloaded
        sleep(randint(1, 4))

        # Get website
        html_soup = get_html(year, stat_type)

        # Get header
        if year == start_year:
            headers = get_header(html_soup)

        # Get player stats
        player_stats = player_stats.append(get_stats(html_soup, headers), ignore_index=True)

        print('{} completed for {} table'.format(year, stat_type))

    # Format the datatframe
    format_dataframe(player_stats, stat_type)

2016 completed for _totals table
2017 completed for _totals table
2018 completed for _totals table
2019 completed for _totals table
2020 completed for _totals table
2016 completed for _per_game table
2017 completed for _per_game table
2018 completed for _per_game table
2019 completed for _per_game table
2020 completed for _per_game table
2016 completed for _per_minute table
2017 completed for _per_minute table
2018 completed for _per_minute table
2019 completed for _per_minute table
2020 completed for _per_minute table
2016 completed for _per_poss table
2017 completed for _per_poss table
2018 completed for _per_poss table
2019 completed for _per_poss table
2020 completed for _per_poss table
2016 completed for _advanced table
2017 completed for _advanced table
2018 completed for _advanced table
2019 completed for _advanced table
2020 completed for _advanced table


In [10]:
# Read all stats into their own variables
totals = pd.read_csv('{}/nba_player_stats_totals.csv'.format(directory))
per_game = pd.read_csv('{}/nba_player_stats_per_game.csv'.format(directory))
per_36_min = pd.read_csv('{}/nba_player_stats_per_minute.csv'.format(directory))
per_100_poss = pd.read_csv('{}/nba_player_stats_per_poss.csv'.format(directory))
advanced = pd.read_csv('{}/nba_player_stats_advanced.csv'.format(directory))

In [11]:
# Check totals (this is per full season)
totals.head(3)

Unnamed: 0,player,pos,age,tm,g,gs,mp,fg,fga,fg_percent,3p,3pa,3p_percent,2p,2pa,2p_percent,efg_percent,ft,fta,ft_percent,orb,drb,trb,ast,stl,blk,tov,pf,pts,year
0,Quincy Acy,PF,25,SAC,59,29,876,119,214,0.556,19,49,0.388,100,165,0.606,0.6,50,68,0.735,65,123,188,27,29,24,27,103,307,2016
1,Jordan Adams,SG,21,MEM,2,0,15,2,6,0.333,0,1,0.0,2,5,0.4,0.333,3,5,0.6,0,2,2,3,3,0,2,2,7,2016
2,Steven Adams,C,22,OKC,80,80,2014,261,426,0.613,0,0,0.0,261,426,0.613,0.613,114,196,0.582,219,314,533,62,42,89,84,223,636,2016


In [12]:
# Check per game
per_game.head(3)

Unnamed: 0,player,pos,age,tm,g,gs,mp,fg,fga,fg_percent,3p,3pa,3p_percent,2p,2pa,2p_percent,efg_percent,ft,fta,ft_percent,orb,drb,trb,ast,stl,blk,tov,pf,pts,year
0,Quincy Acy,PF,25,SAC,59,29,14.8,2.0,3.6,0.556,0.3,0.8,0.388,1.7,2.8,0.606,0.6,0.8,1.2,0.735,1.1,2.1,3.2,0.5,0.5,0.4,0.5,1.7,5.2,2016
1,Jordan Adams,SG,21,MEM,2,0,7.5,1.0,3.0,0.333,0.0,0.5,0.0,1.0,2.5,0.4,0.333,1.5,2.5,0.6,0.0,1.0,1.0,1.5,1.5,0.0,1.0,1.0,3.5,2016
2,Steven Adams,C,22,OKC,80,80,25.2,3.3,5.3,0.613,0.0,0.0,0.0,3.3,5.3,0.613,0.613,1.4,2.5,0.582,2.7,3.9,6.7,0.8,0.5,1.1,1.1,2.8,8.0,2016


In [13]:
# Check per 36 mins (pro forma calculated assuming player plays 36 minutes)
per_36_min.head(3)

Unnamed: 0,player,pos,age,tm,g,gs,mp,fg,fga,fg_percent,3p,3pa,3p_percent,2p,2pa,2p_percent,ft,fta,ft_percent,orb,drb,trb,ast,stl,blk,tov,pf,pts,year
0,Quincy Acy,PF,25,SAC,59,29,876,4.9,8.8,0.556,0.8,2.0,0.388,4.1,6.8,0.606,2.1,2.8,0.735,2.7,5.1,7.7,1.1,1.2,1.0,1.1,4.2,12.6,2016
1,Jordan Adams,SG,21,MEM,2,0,15,4.8,14.4,0.333,0.0,2.4,0.0,4.8,12.0,0.4,7.2,12.0,0.6,0.0,4.8,4.8,7.2,7.2,0.0,4.8,4.8,16.8,2016
2,Steven Adams,C,22,OKC,80,80,2014,4.7,7.6,0.613,0.0,0.0,0.0,4.7,7.6,0.613,2.0,3.5,0.582,3.9,5.6,9.5,1.1,0.8,1.6,1.5,4.0,11.4,2016


In [14]:
# Check per 100 possessions
per_100_poss.head(3)

Unnamed: 0,player,pos,age,tm,g,gs,mp,fg,fga,fg_percent,3p,3pa,3p_percent,2p,2pa,2p_percent,ft,fta,ft_percent,orb,drb,trb,ast,stl,blk,tov,pf,pts,ortg,drtg,year
0,Quincy Acy,PF,25,SAC,59,29,876,6.5,11.7,0.556,1.0,2.7,0.388,5.5,9.0,0.606,2.7,3.7,0.735,3.6,6.7,10.3,1.5,1.6,1.3,1.5,5.6,16.8,124.0,108,2016
1,Jordan Adams,SG,21,MEM,2,0,15,6.9,20.6,0.333,0.0,3.4,0.0,6.9,17.2,0.4,10.3,17.2,0.6,0.0,6.9,6.9,10.3,10.3,0.0,6.9,6.9,24.0,84.0,90,2016
2,Steven Adams,C,22,OKC,80,80,2014,6.4,10.5,0.613,0.0,0.0,0.0,6.4,10.5,0.613,2.8,4.8,0.582,5.4,7.7,13.1,1.5,1.0,2.2,2.1,5.5,15.7,123.0,105,2016


In [15]:
# Check advanced statistics 
advanced.head(3)

Unnamed: 0,player,pos,age,tm,g,mp,per,ts_percent,3par,ftr,orb_percent,drb_percent,trb_percent,ast_percent,stl_percent,blk_percent,tov_percent,usg_percent,ows,dws,ws,ws_48,obpm,dbpm,bpm,vorp,year
0,Quincy Acy,PF,25,SAC,59,876,14.7,0.629,0.229,0.318,8.1,15.1,11.6,4.4,1.6,2.2,10.0,13.1,1.8,0.7,2.5,0.137,-0.2,0.2,-0.1,0.4,2016
1,Jordan Adams,SG,21,MEM,2,15,17.3,0.427,0.167,0.833,0.0,15.9,7.6,31.9,10.3,0.0,19.6,30.5,0.0,0.0,0.0,0.015,-2.5,9.4,6.9,0.0,2016
2,Steven Adams,C,22,OKC,80,2014,15.5,0.621,0.0,0.46,12.5,16.1,14.4,4.3,1.0,3.3,14.1,12.6,4.2,2.3,6.5,0.155,0.0,0.2,0.2,1.1,2016


In [16]:
# Merge all 5 tables
# Inner join to keep only rows in left dataframe that has matching keys in the right dataframe
# Replace suffixes (defaults are _x and _y) so they're appopriate to dataframe

# Merge per_game stats into totals on variables we want to focus on
NBA_Players_2016_2020 = totals.merge(per_game, on=['player', 'pos', 'age', 'tm', 'g', 'gs','fg_percent', '3p_percent','2p_percent', 'efg_percent','ft_percent','year'], how='inner')
NBA_Players_2016_2020.columns = NBA_Players_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_pg')

# Merge in per_36_min
NBA_Players_2016_2020 = NBA_Players_2016_2020.merge(per_36_min, on=['player', 'pos', 'age', 'tm', 'g', 'gs','fg_percent','3p_percent','2p_percent', 'ft_percent', 'year'], how='inner') # per_36_min doesn't include efg percent
NBA_Players_2016_2020.columns = NBA_Players_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_p36m')

# Merge in per_100_poss
NBA_Players_2016_2020 = NBA_Players_2016_2020.merge(per_100_poss, on=['player', 'pos', 'age', 'tm', 'g', 'gs','fg_percent','3p_percent','2p_percent', 'ft_percent', 'year'], how='inner') # per_100_poss doesn't include efg percent
NBA_Players_2016_2020.columns = NBA_Players_2016_2020.columns.str.replace('_x', '').str.replace('_y', '_p100p')

# Merge in advanced
NBA_Players_2016_2020 = NBA_Players_2016_2020.merge(advanced, on=['player', 'pos', 'age', 'tm', 'g', 'mp', 'year'], how='inner')
NBA_Players_2016_2020.columns = NBA_Players_2016_2020.columns.str.replace('_x', '_tot').str.replace('_y', '_adv')

# Save csv
NBA_Players_2016_2020.to_csv('{}/NBA_Players_2016_2020.csv'.format(directory), index=False)

In [17]:
# We have 3196 players and 106 columns in the full dataframe
NBA_Players_2016_2020.shape

(3196, 106)

In [18]:
# Check first few rows
NBA_Players_2016_2020.head(3)

Unnamed: 0,player,pos,age,tm,g,gs,mp,fg,fga,fg_percent,3p,3pa,3p_percent,2p,2pa,2p_percent,efg_percent,ft,fta,ft_percent,orb,drb,trb,ast,stl,blk,tov,pf,pts,year,mp_pg,fg_pg,fga_pg,3p_pg,3pa_pg,2p_pg,2pa_pg,ft_pg,fta_pg,orb_pg,drb_pg,trb_pg,ast_pg,stl_pg,blk_pg,tov_pg,pf_pg,pts_pg,mp_p36m,fg_p36m,fga_p36m,3p_p36m,3pa_p36m,2p_p36m,2pa_p36m,ft_p36m,fta_p36m,orb_p36m,drb_p36m,trb_p36m,ast_p36m,stl_p36m,blk_p36m,tov_p36m,pf_p36m,pts_p36m,mp_p100p,fg_p100p,fga_p100p,3p_p100p,3pa_p100p,2p_p100p,2pa_p100p,ft_p100p,fta_p100p,orb_p100p,drb_p100p,trb_p100p,ast_p100p,stl_p100p,blk_p100p,tov_p100p,pf_p100p,pts_p100p,ortg,drtg,per,ts_percent,3par,ftr,orb_percent,drb_percent,trb_percent,ast_percent,stl_percent,blk_percent,tov_percent,usg_percent,ows,dws,ws,ws_48,obpm,dbpm,bpm,vorp
0,Quincy Acy,PF,25,SAC,59,29,876,119,214,0.556,19,49,0.388,100,165,0.606,0.6,50,68,0.735,65,123,188,27,29,24,27,103,307,2016,14.8,2.0,3.6,0.3,0.8,1.7,2.8,0.8,1.2,1.1,2.1,3.2,0.5,0.5,0.4,0.5,1.7,5.2,876,4.9,8.8,0.8,2.0,4.1,6.8,2.1,2.8,2.7,5.1,7.7,1.1,1.2,1.0,1.1,4.2,12.6,876,6.5,11.7,1.0,2.7,5.5,9.0,2.7,3.7,3.6,6.7,10.3,1.5,1.6,1.3,1.5,5.6,16.8,124.0,108,14.7,0.629,0.229,0.318,8.1,15.1,11.6,4.4,1.6,2.2,10.0,13.1,1.8,0.7,2.5,0.137,-0.2,0.2,-0.1,0.4
1,Jordan Adams,SG,21,MEM,2,0,15,2,6,0.333,0,1,0.0,2,5,0.4,0.333,3,5,0.6,0,2,2,3,3,0,2,2,7,2016,7.5,1.0,3.0,0.0,0.5,1.0,2.5,1.5,2.5,0.0,1.0,1.0,1.5,1.5,0.0,1.0,1.0,3.5,15,4.8,14.4,0.0,2.4,4.8,12.0,7.2,12.0,0.0,4.8,4.8,7.2,7.2,0.0,4.8,4.8,16.8,15,6.9,20.6,0.0,3.4,6.9,17.2,10.3,17.2,0.0,6.9,6.9,10.3,10.3,0.0,6.9,6.9,24.0,84.0,90,17.3,0.427,0.167,0.833,0.0,15.9,7.6,31.9,10.3,0.0,19.6,30.5,0.0,0.0,0.0,0.015,-2.5,9.4,6.9,0.0
2,Steven Adams,C,22,OKC,80,80,2014,261,426,0.613,0,0,0.0,261,426,0.613,0.613,114,196,0.582,219,314,533,62,42,89,84,223,636,2016,25.2,3.3,5.3,0.0,0.0,3.3,5.3,1.4,2.5,2.7,3.9,6.7,0.8,0.5,1.1,1.1,2.8,8.0,2014,4.7,7.6,0.0,0.0,4.7,7.6,2.0,3.5,3.9,5.6,9.5,1.1,0.8,1.6,1.5,4.0,11.4,2014,6.4,10.5,0.0,0.0,6.4,10.5,2.8,4.8,5.4,7.7,13.1,1.5,1.0,2.2,2.1,5.5,15.7,123.0,105,15.5,0.621,0.0,0.46,12.5,16.1,14.4,4.3,1.0,3.3,14.1,12.6,4.2,2.3,6.5,0.155,0.0,0.2,0.2,1.1


In [19]:
# The code was removed by Watson Studio for sharing.

In [34]:
# Save team file
project.save_data(data=NBA_Teams_2016_2020.to_csv(index=False),file_name='NBA_Teams_2016_2020.csv',overwrite=True)

{'file_name': 'NBA_Teams_2016_2020.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'basketballreference-donotdelete-pr-flcmkfqvsuzoez',
 'asset_id': 'f54a94c4-8053-486d-b042-8d16c78c5421'}

In [21]:
# Save player file
project.save_data(data=NBA_Players_2016_2020.to_csv(index=False),file_name='NBA_Players_2016_2020.csv',overwrite=True)

{'file_name': 'NBA_Players_2016_2020.csv',
 'message': 'File saved to project storage.',
 'bucket_name': 'basketballreference-donotdelete-pr-flcmkfqvsuzoez',
 'asset_id': '2ee733aa-9c30-40a5-ad71-8dd5b9d85dcd'}