### **We are creating a betting app that hopefully helps the user place smarter bets. How do we do this?**
###### let's think ##

### Bets are based on what? ###
- Team stats (+/- OU, ML)
- Player Stats (Pts, Reb, Ast)
- Game-based Stats (First to X, First-half)
- Stat Combinations (Pts+Reb, Double-Double)
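
To make the last two bullet types concrete, here is a minimal sketch of how a stat combination and a double-double could be computed from per-game numbers. The values are placeholders, not real box scores, and the double-double check only looks at three categories (points, rebounds, assists) to keep the example short.

```python
import pandas as pd

# Illustrative game log: placeholder numbers, not real box scores
games = pd.DataFrame({
    "PTS": [31, 18, 24, 9],
    "TRB": [10, 4, 11, 12],
    "AST": [8, 11, 3, 7],
})

# Stat combination: points + rebounds in each game
games["PTS+TRB"] = games["PTS"] + games["TRB"]

# Double-double: at least two of the listed stats reach 10 in the same game
games["DoubleDouble"] = (games[["PTS", "TRB", "AST"]] >= 10).sum(axis=1) >= 2

print(games)
```

Both results are ordinary columns, so they can be filtered or averaged like any other stat.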

We will need a way to:
1. Take in stats from sources (Web, CSV, SQL, etc.)
2. Store them in a relational database of players and their stats
3. Analyze them via functions
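
The three steps above can be sketched end to end with only the standard library. This is one possible shape, not the design used in the rest of this notebook: the table name `player_stats`, the helper `points_for`, and the sample rows are all made up for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for a stats file; the values are placeholders, and the ';'
# delimiter mirrors a semicolon-separated CSV
raw = io.StringIO("Player;Tm;PTS\nPlayer A;AAA;25.1\nPlayer B;BBB;12.4\n")

# 1. Intake: read the rows from the source
rows = list(csv.DictReader(raw, delimiter=";"))

# 2. Store: a relational table of players and their stats
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE player_stats (player TEXT PRIMARY KEY, team TEXT, pts REAL)")
conn.executemany("INSERT INTO player_stats VALUES (:Player, :Tm, :PTS)", rows)

# 3. Analyze: a function that answers one question from the table
def points_for(conn, player):
    row = conn.execute(
        "SELECT pts FROM player_stats WHERE player = ?", (player,)
    ).fetchone()
    return row[0] if row else None

print(points_for(conn, "Player A"))
```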

Keeping simplicity in mind, the easiest functionality we can code is accessing a single player's stats (we'll use points) for the current 2022-23 NBA season.

CSVs would be an easy choice, but a downloaded file goes out of date as the season progresses.
We could write a web scraper by hand, but that takes a long time, and time is valuable.
API calls that pull from a live source are a good compromise.

For now, let's deal with CSVs.

In [None]:
import pandas as pd # Module Importing
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('/Users/kareemtaha/Downloads/2022-2023 NBA Player Stats - Regular.csv', sep=";", index_col=0, encoding='latin') # the csv uses semicolons (;) instead of commas (,) to separate values
#df = df.drop(columns="Rk") # drop unnecessary Rk column. Useful if we used player names as our index, but that complicates functions later, so we just leave RK.
df

Great! We have our players this season, their teams, and their stats. Let's return one player.

In [None]:
df.query('Player=="Marcus Smart"') # via query
df[df['Player'] == 'Marcus Smart'] # via boolean mask (same result)

In [None]:
All_Stats = df.columns.values.tolist()
Qual_Stats = ["Pos", "Tm"]
Quant_Stats = ["Age", "G", "GS", "MP", "PTS", "FG", "FGA", "FG%", "3P", "3PA", "3P%", "2P","2PA","2P%","eFG%","FT","FTA","FT%","ORB","DRB","TRB","AST","STL","BLK","TOV","PF"]

In [None]:
def parse_player_df(d, players, stats = Quant_Stats):
    if stats is not None: # User specifies certain stats (pass None to keep every column)
        stats = list(stats) # copy, so the caller's list (and the default) is never modified
        if "Player" not in stats: stats.append("Player") # Makes sure that when we shrink the df to those stats, we don't remove the Player Name col (needed for compares)
        d = d[stats]
    return d[d['Player'].isin(players)]

pd.concat([parse_player_df(df, ["Damian Lillard", "Stephen Curry", "Kevin Durant"], All_Stats), 
            parse_player_df(df, ["LeBron James"], All_Stats), 
            parse_player_df(df, ["Josh Hart", "Nassir Little"], All_Stats)])

We can now compare players' current season stats. Neat.
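
For a side-by-side view, one option is to transpose the filtered rows so that each player becomes a column. The sketch below assumes a frame with a `Player` column like the one above; the `compare` helper and the numbers in it are placeholders for illustration.

```python
import pandas as pd

# Placeholder season averages, not real numbers
season = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "PTS": [28.3, 9.5, 17.0],
    "AST": [7.1, 3.9, 2.2],
})

def compare(d, players, stats):
    """Return one column per player and one row per stat."""
    subset = d[d["Player"].isin(players)]
    return subset.set_index("Player")[stats].T

table = compare(season, ["Player A", "Player B"], ["PTS", "AST"])
table["Diff"] = table["Player A"] - table["Player B"]  # gap between the two players
print(table)
```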

Next, let's see some visuals since I am tired of seeing words and text.

In [None]:
def Visualize_Stats(d, players, stats = Quant_Stats, to_list = False, stacked_bar = False):
    stats = list(stats) if stats else ["PTS"]  # Stats Setup - default to points if no stats are given {
    d = parse_player_df(d, players, stats = stats) # .. - shrink to stats (parse_player_df retains Player so we can search their stats later)
    return_stats = []

    num_players = len(players)                 # .. } Stats Setup
    if num_players > 1 and not stacked_bar: fig, ax = plt.subplots(num_players, figsize=(15,3.5 * num_players))
    else: plt.figure(figsize=(15,3.5))

    if stacked_bar:
        plt.title(f"{', '.join(players)} stat comparison:")
    bottoms = dict.fromkeys(stats, 0)          # running height per stat, so each player's bar stacks on the previous one

    for i, name in enumerate(players):
        player_df = d[d['Player'] == name]
        labels, player_stats = [], []

        for stat in stats:
            value = player_df[stat].iat[0]
            if isinstance(value, str): continue  # skip text columns (e.g. Pos, Tm): they can't be graphed
            if "%" in stat: value *= 100         # show percentages on a 0-100 scale
            labels.append(stat)
            player_stats.append(round(value, 2))

        if stacked_bar:                 # If graph is stacked
            b = plt.bar(x = labels, height = player_stats, bottom = [bottoms[s] for s in labels])
            plt.bar_label(b, player_stats, label_type = 'center') # Add player stats to the bars
            for s, v in zip(labels, player_stats): bottoms[s] += v

        else:
            match num_players: # Non-stacked graph: Check how many players (match needs Python 3.10+)
                case 1: # 1 player : Use one graph
                    plt.title(f"{name}'s stats:")
                    b = plt.bar(x = labels, height = player_stats)
                    plt.bar_label(b, player_stats) # wrap in `if not to_list:` if to_list should not also graph stats
                case _: # 2+ players : Use 2+ graphs
                    ax[i].set_title(f"{name}'s stats:")
                    b = ax[i].bar(x = labels, height = player_stats)
                    ax[i].bar_label(b, player_stats) # wrap in `if not to_list:` if to_list should not also graph stats

        return_stats.append([name] + player_stats) # Add player stats to all stats to return

    if stacked_bar: plt.legend(players) # one legend entry per stacked player; the separate graphs are already titled
    if to_list: return return_stats

#Visualize_Stats(df, players = ["Damian Lillard"])
#Visualize_Stats(df, players = ["Damian Lillard"], to_list=True)
Visualize_Stats(df, players = ["Damian Lillard", "Josh Hart"], stats=Quant_Stats, stacked_bar=False)
#f = Visualize_Stats(df, players = ["Damian Lillard", "Stephen Curry"], to_list=True)
#f[1]


We can now visualize a player's stats and compare them to someone else's.
## What do we do now?

###### Great question

First, let's create a function that lets us compare players interactively:

In [None]:
def Live_Compare(d, stacked_bar = False):
    selected = []
    players = d['Player'].values
    player = input("Which player's stats would you like to see? (' ' to terminate)")

    while player in players:
        selected.append(player)
        player = input("Which player's stats would you like to see? (' ' to terminate)")
    
    if len(selected) > 0: Visualize_Stats(d, selected, stacked_bar = stacked_bar)
    else: print("No players given!")

#Compare_Players(df)
Live_Compare(df, True)




Not bad, but season averages only go so far. If we want to do advanced computations, we need game-by-game data for each player (we'll use LeBron's game log from https://www.basketball-reference.com/players/j/jamesle01/gamelog/2023).

In [None]:
lebron_2023_html = pd.read_html("https://www.basketball-reference.com/players/j/jamesle01/gamelog/2023#pgl_basic", match="Regular Season")[0].copy() # read_html returns a list of tables; [0] takes the first one matching "Regular Season"
lebron_2023_html

lebron_2023 = lebron_2023_html.copy() # re-copy from the html table to not accidentally overwrite data
lebron_2023.rename(columns = {'Unnamed: 5':'Away', 'Unnamed: 7': 'Result'}, inplace = True) # Rename columns properly
lebron_2023.set_index('Rk', inplace = True) # Set the Rk number (team game) as the index
lebron_2023['Away'] = lebron_2023['Away'] == '@' # replace values in away column with bool ('@' marks an away game; a direct assignment avoids a chained in-place update)
lebron_2023 = lebron_2023.dropna(subset=['G']) # drop unplayed games
lebron_2023 = lebron_2023[lebron_2023['G'] != 'G']
lebron_2023

And the same for career stats. We need to get the years that LeBron has played, so we know which years to look up his per-game stats. We will extend this to every player.
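
Season labels on the career table look like `2003-04`, while the game-log URL above uses the ending year (`2023`). A minimal sketch of that conversion follows; `season_end_year` is a hypothetical helper name. Adding one to the starting year, rather than splicing the first two and last two digits together, also handles a label such as `1999-00`.

```python
def season_end_year(label):
    """Convert a season label such as '2003-04' to its ending year (2004)."""
    start_year = int(label[:4])
    return start_year + 1

print([season_end_year(s) for s in ["2003-04", "1999-00", "2022-23"]])
# [2004, 2000, 2023]
```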

In [None]:
def get_player_career_stats(player_tag): return pd.read_html(f"https://www.basketball-reference.com/players/{player_tag[0]}/{player_tag}.html#per_game")[0].copy()

lebron_tag = "jamesle01"
lebron_career_stats = get_player_career_stats(lebron_tag)
lebron_career_stats

In [None]:
lebron_career = lebron_career_stats.copy()                                             # Get the per-game stats for the player's career
lebron_career.dropna(subset=['Age'], inplace=True)                                    # drop unplayed seasons (filler rows)
lebron_seasons = lebron_career['Season'].apply(lambda x: int(x[:4]) + 1).values       # get the seasons he played, as ending years ('2003-04' -> 2004; also correct for '1999-00')
lebron_seasons

Given an HTML link, we've learned how to parse the table, clean the data into a readable format, and pull out the information that leads us to further data. Given a player's URL tag on Basketball Reference, we can find the seasons they played and the matching stats for each season, then clean that data into a DataFrame.
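
Once a game log is in DataFrame form, questions a bettor might ask become short expressions. The sketch below uses placeholder point totals and a hypothetical over/under line of 25.5; it illustrates the idea and is not a betting model.

```python
import pandas as pd

# Placeholder points from ten consecutive games, oldest first
log = pd.DataFrame({"PTS": [22, 31, 18, 27, 35, 24, 29, 16, 33, 28]})

line = 25.5  # hypothetical over/under line for points

# Share of games in which the player went over the line
hit_rate = (log["PTS"] > line).mean()

# Average over the five most recent games, to see recent form
recent_avg = log["PTS"].tail(5).mean()

print(f"Over {line} in {hit_rate:.0%} of games; last-5 average {recent_avg:.1f}")
```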

In [None]:
def get_player_seasons_list(player_tag):
    df = pd.read_html(f'https://www.basketball-reference.com/players/{player_tag[0]}/{player_tag}.html#per_game')[0].copy()
    df.dropna(subset=['Age'], inplace=True) # drop unplayed seasons, which are just filler rows
    return df['Season'].apply(lambda x: int(x[:4]) + 1).values # get the seasons played, as ending years ('2003-04' -> 2004; also correct for '1999-00')

def get_player_gamelogs(player_tag, season):
    return pd.read_html(f"https://www.basketball-reference.com/players/{player_tag[0]}/{player_tag}/gamelog/{season}#pgl_basic", match="Regular Season")[0].copy() # first table on the page matching "Regular Season"

lebron_seasons_list = get_player_seasons_list(lebron_tag)
lebron_season4 = get_player_gamelogs(lebron_tag, lebron_seasons_list[3])
lebron_season4

def clean_gamelogs(d, season_stats = False):
    d = d.dropna(subset=['Age']) # drop rows without an age (filler rows); this returns a new df, so the original is untouched
    d = d[d['Age'] != 'Age'].copy() # Remove extra rows (there are rows that repeat the original column headers, but do not contain values (because the table is 80+ rows))
    if not season_stats:
        d = d.rename(columns = {'Rk':'Game', 'G':'Played','Unnamed: 5':'Away', 'Unnamed: 7': 'Result'}) # rename column headers
        d = d.set_index('Game') # Set the game # as the index
        d['Away'] = d['Away'] == '@' # replace values in away column with bool ('@' marks an away game)
        d = d.dropna(subset=['Played']) # drop unplayed games
    return d

clean_gamelogs(lebron_season4)

We can now get a player's career stats AND season stats given the proper HTML links. But how do we get those links? My approach is to use HTML parsing to take them from each team's roster page.
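
The core of that approach is pulling the tag out of a player link. Here is a minimal sketch on plain strings, with no network access, assuming the links follow the same `/players/<initial>/<tag>.html` shape as the URLs used earlier. The names `TAG_PATTERN` and `tag_from_href` are illustrative only.

```python
import re

# Same shape as the player URLs above: /players/<initial>/<tag>.html
TAG_PATTERN = re.compile(r"/players/[a-z]/([a-z0-9]+)\.html")

def tag_from_href(href):
    """Return the player tag inside a player link, or None if it doesn't match."""
    found = TAG_PATTERN.search(href)
    return found.group(1) if found else None

print(tag_from_href("/players/j/jamesle01.html"))  # jamesle01
print(tag_from_href("/teams/POR/2023.html"))       # None
```

Returning `None` for a non-matching link lets the caller skip rows that hold something other than a player.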

In [None]:
from bs4 import BeautifulSoup
import urllib.request
import re
import time
def pull_player_tags(roster_htmls):

    player_tags = {}
    pattern = r'/[a-zA-Z]*/[a-zA-Z]/([a-zA-Z\d]*)' # captures the tag in links shaped like /players/j/jamesle01.html (raw string, so \d is not treated as a string escape)

    for roster in roster_htmls:
        html_page = urllib.request.urlopen(roster)
        soup = BeautifulSoup(html_page, "html.parser")

        for table in soup.find_all('table', attrs={'id': 'per_game'}):
            for tbody in table.find_all('tbody'):
                for tr in tbody.find_all('tr'):
                    for td in tr.find_all('td', attrs={'data-stat': ['player']}):
                        for a in td.find_all('a'):
                            tag = re.findall(pattern, a['href'])[0]
                            player_tags[a.text] = tag # store inside the loop, so rows without a player link are skipped
    return player_tags

POR_link = "https://www.basketball-reference.com/teams/POR/2023.html#roster"
ATL_link = "https://www.basketball-reference.com/teams/ATL/2023.html#all_roster"
POR_ATL_roster = pull_player_tags([POR_link, ATL_link])
POR_ATL_roster

In [None]:
def pull_league_tags(return_html = True):

    league_rosters = {}
    pattern = '/[a-zA-Z]*/([a-zA-Z]*)/' # captures the team abbreviation in links shaped like /teams/POR/2023.html

    html_page = urllib.request.urlopen('https://www.basketball-reference.com/leagues/NBA_2023_standings.html')
    soup = BeautifulSoup(html_page, "html.parser")

    for table in soup.find_all('table', attrs={'id': ['confs_standings_E', 'confs_standings_W']}):
        for tbody in table.find_all('tbody'):
            for tr in tbody.find_all('tr'):
                for th in tr.find_all('th', attrs={'data-stat': ['team_name']}):
                    for a in th.find_all('a'):
                        team_link = re.findall(pattern, a['href'])[0]
                        if return_html: team_link = f'https://www.basketball-reference.com/teams/{team_link}/2023.html#roster'
                        league_rosters[a.text] = team_link # store inside the loop, so rows without a team link are skipped
    return league_rosters

Rosters = pull_league_tags()

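league_tags = {}
for team, link in Rosters.items():
    league_tags.update(pull_player_tags([link])) # merge each team's player tags into one league-wide dict
    time.sleep(0.25) # pause between requests; increase this if the site starts rejecting requests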

In [None]:
def get_player_tags(players): return [league_tags.get(key) for key in players]
get_player_tags(['Damian Lillard','Josh Hart','Aaron Gordon'])

Everything is starting to come together. With every team's roster page scraped, we can look up any player's tag by name and use it to pull that player's pages.
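
Since pulling pages for many players means many requests, it can help to separate building the URLs from fetching them. The sketch below assumes the roster URL shape used earlier in this notebook. `roster_url`, `fetch_all`, and the 3-second default delay are illustrative choices, and the fetcher is a stand-in so the example runs offline.

```python
import time

def roster_url(team_abbr, season=2023):
    """Build a team's roster page URL from its abbreviation (e.g. 'POR')."""
    return f"https://www.basketball-reference.com/teams/{team_abbr}/{season}.html"

def fetch_all(urls, fetch, delay=3.0):
    """Call fetch(url) for each URL, pausing between requests."""
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # be polite to the site between requests
        results[url] = fetch(url)
    return results

# Usage with a stand-in fetcher, so the sketch runs without network access
urls = [roster_url(team) for team in ["POR", "ATL"]]
pages = fetch_all(urls, fetch=lambda url: f"<html for {url}>", delay=0)
print(list(pages))
```

Passing the fetcher in as an argument keeps the throttling logic testable; a real run would pass a function that downloads and parses the page.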

In [None]:
def produce_player_gamelogs(players, seasons):
    player_dfs = []
    multiple = len(players) > 1
    for player in players:
        player_tag = get_player_tags([player])[0]
        for season in seasons:
            season_games = get_player_gamelogs(player_tag, season)
            clean_df = clean_gamelogs(season_games)
            clean_df['Season'] = season
            if multiple: clean_df['Player'] = player
            player_dfs.append(clean_df)
    return pd.concat(player_dfs)
produce_player_gamelogs(['Damian Lillard', 'LeBron James'], ['2021'])

In [None]:
def produce_player_career_stats(players):
    player_dfs = []
    multiple = len(players) > 1
    for player in players:
        player_tag = get_player_tags([player])[0]
        career_stats = get_player_career_stats(player_tag)
        clean_df = clean_gamelogs(career_stats, True)
        if multiple: clean_df['Player'] = player
        player_dfs.append(clean_df)
    return pd.concat(player_dfs)
produce_player_career_stats(['Damian Lillard', 'LeBron James'])

In [None]:
def Super_Simulate(league_tags):
    NBA_players = league_tags.keys()

    selected_players = []
    player = input("Which player's stats would you like to see? (' ' to terminate)")

    while player in NBA_players:
        selected_players.append(player)
        player = input("Which other players' stats would you like to see? (' ' to terminate)")
    
    selected_stats = []
    stat = input("Which stat would you like to see? (' ' to terminate)")
    while stat != ' ':
        selected_stats.append(stat)
        stat = input("Which other stats would you like to see? (' ' to terminate)")

    selected_seasons = []
    season = input("Which season would you like to see? (' ' to terminate)")
    while season != ' ':
        selected_seasons.append(season)
        season = input("Which other seasons would you like to see? (' ' to terminate)")

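    if len(selected_players) < 1: print("No players given!")
    if len(selected_stats) < 1: print("No stats given!")
    if len(selected_seasons) < 1: print("No seasons given!")
    if not (selected_players and selected_stats and selected_seasons): return # nothing to look up or graph

    cs = produce_player_career_stats(selected_players)
    if 'Player' not in cs.columns: cs['Player'] = selected_players[0] # single-player results have no Player column, but Visualize_Stats needs one
    reduced = cs[cs['Season'].apply(lambda x: x[0:4]).isin(selected_seasons)] # match on the season's starting year ('2022' matches '2022-23')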
    Visualize_Stats(d = reduced, players = selected_players, stats = selected_stats, to_list = False, stacked_bar = False)
    print(reduced)

#Compare_Players(df)
Super_Simulate(league_tags)