## Extract NBA Player Data
This notebook extracts career stats for every player on a roster in a given year.  
The extract is repeated for each year in the loop, so a player's career stats are pulled again for every year he appears on a roster.  
For example: if you extract the stats for Al Horford using year = 2010, you will get his stats from the start of his career up to today (2022).  
It is not necessary to extract all of this data, but at this stage it is easier to pull the full data for all players each year than to do more complicated filtering.  
I will de-duplicate the data at a later stage.
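
A minimal sketch of why the repeated extracts can be cleaned up later, using made-up rows rather than real stats. The column names `player_id`, `season` and `points` and all values are assumptions for illustration; the point is only that identical rows pulled in different years collapse with `drop_duplicates`.

```python
import pandas as pd

# Toy stand-in for one yearly extract (values are made up for illustration).
extract_2013 = pd.DataFrame({
    'player_id': ['p1', 'p1'],
    'season': ['2011-12', '2012-13'],
    'points': [100, 120],
})
# The same career rows are pulled again when the loop runs for the next year.
extract_2014 = extract_2013.copy()

combined = pd.concat([extract_2013, extract_2014], axis=0)
deduped = combined.drop_duplicates().reset_index(drop=True)

print(len(combined), 'rows before,', len(deduped), 'rows after de-duplication')
```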

In [29]:
from bs4 import BeautifulSoup
import requests
import re
import pandas as pd
from lxml import etree, html
import os
from datetime import datetime
from sportsipy.nba.teams import Teams
from sportsipy.nba.roster import Roster, Player

**NBA Loop**

In [None]:
# list of years to loop through
years = [2013,2014,2015,2016,2017,2018,2019,2020]

# everything below runs once per year
for year in years:
    # get team abbreviation for each team
    teams = Teams(year = year)
    team_abbr = [team.abbreviation for team in teams]

    lst_players = [] # list of players
    player_team = [] # team the player plays for
    for team in team_abbr:
        print('team: ',team,'year: ',year,'Start at: ',datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
        for player in Roster(team, year=year).players:
            lst_players.append(player)  
            player_team.append(team)
    
    # collect one DataFrame per player, then concatenate once at the end
    print('populating stats df: ',datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
    frames = []
    # loop variable is `team` so the `team_abbr` list is not shadowed
    for player, team in zip(lst_players, player_team):
        df = player.dataframe.reset_index()  # the season index becomes the 'level_0' column
        df['name'] = player.name             # scalar values are repeated on every row
        df['id'] = player.player_id
        df['team'] = team
        frames.append(df)
    # stats for every player over the year
    stats = pd.concat(frames, axis = 0) if frames else pd.DataFrame()
    
    # create csv
    stats.to_csv('nba_'+str(year)+'.csv', index = False)
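
The loop writes each file to the working directory, while the cells further down read from `01_raw_csv_files`. One possible way to bridge the two, sketched here with a hypothetical helper `raw_csv_path` (not part of the original notebook), is to build the output path from the folder name:

```python
import os

RAW_DIR = '01_raw_csv_files'  # folder name taken from the cells that read the files

def raw_csv_path(year, folder=RAW_DIR):
    """Hypothetical helper: path of the yearly extract inside `folder`."""
    return os.path.join(folder, 'nba_' + str(year) + '.csv')

# Inside the loop, the write step could then look like:
#   os.makedirs(RAW_DIR, exist_ok=True)
#   stats.to_csv(raw_csv_path(year), index=False)
print(raw_csv_path(2013))
```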

## Create table from output

**Create nba_player**

In [25]:
files = os.listdir('01_raw_csv_files')
files_csv = [f for f in files if f[-3:] == 'csv' and f[:3]=='nba']

nba_player = pd.DataFrame()
for file in files_csv:
    df = pd.read_csv('01_raw_csv_files/'+file)
    nba_player = pd.concat([nba_player,df.loc[:,['player_id','name']].drop_duplicates().reset_index(drop=True)])
nba_player = nba_player.drop_duplicates().reset_index(drop= True)    

In [26]:
nba_player.to_csv('02_database/nba_player.csv',index=False)
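
The file filter used above slices the first and last three characters of each file name. A small sketch of the same check with `str.startswith` and `str.endswith`, wrapped in a hypothetical helper `is_raw_nba_csv`. Note that it matches on `.csv` including the dot, which is slightly stricter than the original slice, and the directory listing is made up for illustration.

```python
def is_raw_nba_csv(filename):
    """Hypothetical helper: True for names like 'nba_2013.csv'."""
    return filename.startswith('nba') and filename.endswith('.csv')

# Made-up directory listing for illustration.
files = ['nba_2014.csv', 'nba_2013.csv', 'notes.txt', 'mlb_2013.csv', '.ipynb_checkpoints']

# Sorting makes the read order independent of what os.listdir happens to return.
files_csv = sorted(f for f in files if is_raw_nba_csv(f))
print(files_csv)
```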

**Create nba_player_stats**

In [20]:
files = os.listdir('01_raw_csv_files')
files_csv = [f for f in files if f[-3:] == 'csv' and f[:3]=='nba']

nba_player_stats = pd.DataFrame()
for file in files_csv:
    df = pd.read_csv('01_raw_csv_files/'+file)
    nba_player_stats = pd.concat([nba_player_stats,df.loc[(df.level_0 != 'Career'),:].drop(columns = 'name')])
nba_player_stats = nba_player_stats.drop_duplicates().reset_index(drop= True)   
nba_player_stats.rename(columns = {'level_0':'season'},inplace = True)

In [21]:
nba_player_stats.to_csv('02_database/nba_player_stats.csv',index=False)
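
The split between season rows and career rows can be sketched on a toy frame. Only `level_0` and the `'Career'` label come from the cells in this notebook; `player_id`, `points` and the values are made up for illustration.

```python
import pandas as pd

# Toy extract: two season rows and one career summary row for one player.
df = pd.DataFrame({
    'level_0': ['2011-12', '2012-13', 'Career'],
    'player_id': ['p1', 'p1', 'p1'],
    'points': [100, 120, 220],
})

is_career = df['level_0'] == 'Career'  # boolean mask, one entry per row

# Season rows feed nba_player_stats; career rows feed nba_player_career_stats.
season_rows = df.loc[~is_career].rename(columns={'level_0': 'season'})
career_rows = df.loc[is_career]

print(season_rows)
print(career_rows)
```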

**Create nba_player_career_stats**  
This table has no season variable, so its rows go out of date and will need to be refreshed as players add seasons.  
For example, the career row of a first-year player (rookie) will need to be updated the following year. I am not sure yet whether this table will be useful.
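
One possible way to keep such a table current, sketched with made-up rows: tag each career row with the year of the extract it came from and keep only the most recent row per player. The `extract_year` column is an assumption (the raw files in this notebook do not contain it), as are `games_played` and the values.

```python
import pandas as pd

# Made-up career rows from two extracts; p1 appears in both.
career = pd.DataFrame({
    'player_id': ['p1', 'p1', 'p2'],
    'extract_year': [2013, 2014, 2014],   # assumed column, not in the raw files
    'games_played': [82, 160, 70],
})

# Sort so the newest extract comes last for each player, then keep that row.
latest = (
    career.sort_values(['player_id', 'extract_year'])
          .drop_duplicates(subset='player_id', keep='last')
          .reset_index(drop=True)
)
print(latest)
```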

In [22]:
files = os.listdir('01_raw_csv_files')
files_csv = [f for f in files if f[-3:] == 'csv' and f[:3]=='nba']

nba_player_career_stats = pd.DataFrame()
for file in files_csv:
    df = pd.read_csv('01_raw_csv_files/'+file)
    nba_player_career_stats = pd.concat([nba_player_career_stats,df.loc[(df.level_0 == 'Career'),:].drop(columns = 'name')])
nba_player_career_stats = nba_player_career_stats.drop_duplicates().reset_index(drop= True)   
nba_player_career_stats.rename(columns = {'level_0':'season'},inplace = True)

In [24]:
nba_player_career_stats.to_csv('02_database/nba_player_career_stats.csv',index=False)