In [50]:
import numpy as np
import pandas as pd
import requests
import time
from bs4 import BeautifulSoup

In [34]:
url = f"https://www.basketball-reference.com/leagues/NBA_2024_per_game.html#per_game_stats::pts_per_g"
player_df = pd.read_html(url)[0]
player_df

Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,MP,FG,FGA,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Awards
0,1.0,Joel Embiid,29.0,PHI,C,39.0,39.0,33.6,11.5,21.8,...,2.4,8.6,11.0,5.6,1.2,1.7,3.8,2.9,34.7,AS
1,2.0,Luka Dončić,24.0,DAL,PG,70.0,70.0,37.5,11.5,23.6,...,0.8,8.4,9.2,9.8,1.4,0.5,4.0,2.1,33.9,"MVP-3,CPOY-6,AS,NBA1"
2,3.0,Giannis Antetokounmpo,29.0,MIL,PF,73.0,73.0,35.2,11.5,18.8,...,2.7,8.8,11.5,6.5,1.2,1.1,3.4,2.9,30.4,"MVP-4,DPOY-9,CPOY-12,AS,NBA1"
3,4.0,Shai Gilgeous-Alexander,25.0,OKC,PG,75.0,75.0,34.0,10.6,19.8,...,0.9,4.7,5.5,6.2,2.0,0.9,2.2,2.5,30.1,"MVP-2,DPOY-7,CPOY-3,AS,NBA1"
4,5.0,Jalen Brunson,27.0,NYK,PG,77.0,77.0,35.4,10.3,21.4,...,0.6,3.1,3.6,6.7,0.9,0.2,2.4,1.9,28.7,"MVP-5,CPOY-5,AS,NBA2"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
731,569.0,Ron Harper Jr.,23.0,TOR,PF,1.0,0.0,4.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,
732,570.0,Justin Jackson,28.0,MIN,SF,2.0,0.0,0.5,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
733,571.0,Dmytro Skapintsev,25.0,NYK,C,2.0,0.0,1.0,0.0,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
734,572.0,Javonte Smart,24.0,PHI,PG,1.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


## What is Goal of this analysis
The goal is to see what players team most rely on their personal shooting throughut each game. This is sparked from the idea of consistency of NBA players. My next analysis will be on consistency directy, basically comparing the standard deviation of a players stats over a season. 

The players with the widest spread in their stats will be deemed inconsistent. For example, if a player shot 40% from 3 but only ever shot 0/5 or 5/5 from 3 every game that would be extremely inconsistent. A plyer who shot 40% bu shot 2/5 every game would deemed very consistent. 

## Why use 35% for this stat. 
This is basically the minimum average across the NBA for every year of the last 15 years. The lowest was 34.9%, so I'd say that anytime a player shoots above 35% for a game that is an average to above average game of shooting, while below that wuld be below average game. 

## Why use data from multiple years
The data is bound to be very volatile in the sense that it must be a small sample size. Getting data from across multiple years can determine if the teams win percentage was simply due to luck or if there truly is a strong correlation. 

Julius Randle will be a focal point of this analysis, and one thing we will notice is that last year was an outlier in his impact, the teams record by 33% better when he shot good. This could be due to luck, or a variety of factors outside of luck, for example the talent level of the team outside of Randle, as well as his consistency. If his lows were extra low this season, that would certainly impact winning more. 

## Another thing I want to do
I am also curious to see what players are most impacts by poor 3 point shooting nights. For now I have added the game score attribute to the table where we can see a players average game score when they shoot bad from 3 or good from 3. I expect to see interesting results that are likely more correlated with wins and losses than 3 point percentage, but I also expect to see some players who have a much stronger correlation with bad 3 pt percentage and bad game score than others. 

This would be an analysis of "being in a slump". Basically how negatively are you impacted when your shot isn't fallling, specifically three point shot. Randle was a huge culprit of this early on in his knicks years, showing a very bad attitude when his shot was not falling, possibly affecting the rest of his game, but obviously unclear until we do the analysis

In [32]:
players = player_df.loc[player_df['3PA'] > 4.5, 'Player'] # only players averaging over 4.5 3PA per game

# Remove duplicate player names
unique_players = players.drop_duplicates()

# Create a list of unique player names
players_list = unique_players.tolist()

# Initialize a dictionary with player names as keys and empty dictionaries as values
players_dict = {player: {} for player in players_list}

# Display the players list and the initialized dictionary
print(players_list)
print(players_dict)

['Luka Dončić', 'Jalen Brunson', 'Devin Booker', 'Kevin Durant', 'Jayson Tatum', "De'Aaron Fox", 'Donovan Mitchell', 'Stephen Curry', 'Anthony Edwards', 'Tyrese Maxey', 'LeBron James', 'Trae Young', 'Kyrie Irving', 'Ja Morant', 'Damian Lillard', 'Julius Randle', 'LaMelo Ball', 'Desmond Bane', 'Kawhi Leonard', 'Lauri Markkanen', 'Jaylen Brown', 'Cade Cunningham', 'Paul George', 'Anfernee Simons', 'Jaren Jackson Jr.', 'Dejounte Murray', 'Cam Thomas', 'Kyle Kuzma', 'Karl-Anthony Towns', 'Victor Wembanyama', 'Jamal Murray', 'Miles Bridges', 'Jerami Grant', 'Tyler Herro', 'RJ Barrett', 'Tyrese Haliburton', 'Kristaps Porziņģis', 'CJ McCollum', 'Scottie Barnes', 'Terry Rozier', 'Franz Wagner', 'Mikal Bridges', 'Jalen Green', 'Zach LaVine', 'Devin Vassell', 'Coby White', 'Darius Garland', "D'Angelo Russell", 'Klay Thompson', 'Jordan Poole', 'Fred VanVleet', 'Brandon Miller', 'Jordan Clarkson', 'Immanuel Quickley', 'Bogdan Bogdanović', 'Michael Porter Jr.', 'James Harden', 'Austin Reaves', 'Sha

In [58]:
import requests
from bs4 import BeautifulSoup

# Function to fetch the player's name from the page title
def get_player_name_from_url(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Ensure the request was successful
        soup = BeautifulSoup(response.text, 'html.parser')
        title = soup.find('title').text  # Get the page title
        # Extract the player's name from the title
        player_name_in_title = title.split(" Game Log")[0].strip()
        return player_name_in_title
    except Exception as e:
        print(f"Error fetching page title: {e}")
        return None

In [66]:
players_list = ['Luka Doncic', 'Jalen Brunson', 'Devin Booker', 'Kevin Durant', 'Jayson Tatum', "De'Aaron Fox", 'Donovan Mitchell', 'Stephen Curry', 'Anthony Edwards', 'Tyrese Maxey', 'LeBron James', 'Trae Young', 'Kyrie Irving', 'Ja Morant', 'Damian Lillard', 'Julius Randle', 'LaMelo Ball', 'Desmond Bane', 'Kawhi Leonard', 'Lauri Markkanen', 'Jaylen Brown', 'Cade Cunningham', 'Paul George', 'Anfernee Simons', 'Jaren Jackson Jr.', 'Dejounte Murray', 'Cam Thomas', 'Kyle Kuzma', 'Karl-Anthony Towns', 'Victor Wembanyama', 'Jamal Murray', 'Miles Bridges', 'Jerami Grant', 'Tyler Herro', 'RJ Barrett', 'Tyrese Haliburton', 'Kristaps Porziņģis', 'CJ McCollum', 'Scottie Barnes', 'Terry Rozier', 'Franz Wagner', 'Mikal Bridges', 'Jalen Green', 'Zach LaVine', 'Devin Vassell', 'Coby White', 'Darius Garland', "DAngelo Russell", 'Klay Thompson', 'Jordan Poole', 'Fred VanVleet', 'Brandon Miller', 'Jordan Clarkson', 'Immanuel Quickley', 'Bogdan Bogdanović', 'Michael Porter Jr.', 'James Harden', 'Austin Reaves', 'Shaedon Sharpe', 'Malcolm Brogdon', 'Keldon Johnson', "De'Andre Hunter", 'Donte DiVincenzo', 'Jaden Ivey', 'Malik Monk', 'Kelly Oubre Jr.', 'Bojan Bogdanović', 'Keegan Murray', 'Derrick White', 'Trey Murphy III', 'OG Anunoby', 'GG Jackson II', 'Marcus Smart', 'Tim Hardaway Jr.', 'Dennis Schroder', 'Caris LeVert', 'Norman Powell', 'Saddiq Bey', 'Jabari Smith Jr.', 'Gary Trent Jr.', 'Grayson Allen', 'Naz Reid', 'Cameron Johnson', 'Corey Kispert', 'Keyonte George', 'PJ Washington', 'Duncan Robinson', 'Dillon Brooks', 'Jalen Suggs', 'Jrue Holiday', 'Brook Lopez', 'Cam Whitmore', 'Harrison Barnes', 'Aaron Nesmith', 'Max Strus', 'Buddy Hield', 'Mike Conley', 'Malik Beasley', "De'Anthony Melton", 'Eric Gordon', 'Luke Kennard', 'Luguentz Dort', 'Santi Aldama', 'Simone Fontecchio', 'Spencer Dinwiddie', 'Alec Burks', 'Dalano Banton', 'Kevin Huerter', 'Alex Caruso', 'Lonnie Walker IV', 'Payton Pritchard', 'Georges Niang', 'Sam Hauser', 'Taurean Prince', 'Dorian Finney-Smith', 'Sam Merrill', "Royce O'Neale", 'Quentin Grimes', 'Evan Fournier', 'Davis Bertans']
players_dict = {player: {} for player in players_list}

for player in players_list:
    for i in range(2021, 2025):
        first_name, last_name = player.split()[:2]  # Only use first and last name
        last_part = last_name[:5].lower()
        first_part = first_name[:2].lower()
        
        # Try with 01, then 02 if necessary
        for suffix in ['01', '02', '03']:
            formatted_name = f"{last_part}{first_part}{suffix}"
            url = f"https://www.basketball-reference.com/players/r/{formatted_name}/gamelog/{i}"
            
            # Fetch the player name from the page title
            player_name_in_title = get_player_name_from_url(url)
            print(player_name_in_title)
            time.sleep(4)
            if player_name_in_title and player_name_in_title == player:
                print(f"Correct player found: {player_name_in_title}")
                try:
                    # If the name matches, fetch the game log data
                    df = pd.read_html(url)[7]  # Fetch the data
                    df['Win/Loss'] = df['Unnamed: 7'].apply(lambda x: str(x)[0])
                    df = df.drop(columns='Unnamed: 7')
                    df['3P%'] = pd.to_numeric(df['3P%'], errors='coerce')

                    over_35 = df.loc[df['3P%'] > .35, :]
                    under_35 = df.loc[df['3P%'] < .35, :]

                    players_dict[player][f'{i}'] = {}
                    players_dict[player][f'{i}']['3PA'] = df['3PA']
                    players_dict[player][f'{i}']['3P'] = df['3P']
                    players_dict[player][f'{i}']['3P%'] = df['3P%']
                    players_dict[player][f'{i}']['Win_Percent when over 35%']['Average Game_Score'] = over_35['GmSc'].sum() / over_35.shape[0]
                    players_dict[player][f'{i}']['Win_Percent when under 35%']['Average Game_Score'] = under_35['GmSc'].sum() / under_35.shape[0]

    
                    if over_35.shape[0] > 0:
                        players_dict[player][f'{i}']['Win_Percent when over 35%'] = 100 * (over_35.loc[over_35['Win/Loss'] == 'W'].shape[0] / over_35.shape[0])
                    else:
                        players_dict[player][f'{i}']['Win_Percent when over 35%'] = None  # or 0 if you prefer
        
                    if under_35.shape[0] > 0:
                        players_dict[player][f'{i}']['Win_Percent when under 35%'] = 100 * (under_35.loc[under_35['Win/Loss'] == 'W'].shape[0] / under_35.shape[0])
                    else:
                        players_dict[player][f'{i}']['Win_Percent when under 35%'] = None  # or 0 if you prefer
        
                    players_dict[player][f'{i}']['games_played over 35%'] = over_35.shape[0]
                    players_dict[player][f'{i}']['games_played under 35%'] = under_35.shape[0]
                
                except Exception as e:
                    print(f"Error processing {player} for {i}: {e}")
        
        time.sleep(4)
  

Error processing Luka Doncic for 2021: HTTP Error 429: Too Many Requests
Error processing Luka Doncic for 2022: HTTP Error 429: Too Many Requests


KeyboardInterrupt: 

In [52]:
records = []

# Loop through each player and year, and create a flattened dictionary for each year
for player, years_data in players_dict.items():
    record = {'Player': player}
    for year, stats in years_data.items():
        # For each stat in that year, add it to the record with year-specific key
        for stat, value in stats.items():
            record[f'{year}_{stat}'] = value
    records.append(record)

# Convert the list of records into a DataFrame
df = pd.DataFrame(records)
df['2024_difference'] = df['2024_Win_Percent when over 35%'] - df['2024_Win_Percent when under 35%']
df['2023_difference'] = df['2023_Win_Percent when over 35%'] - df['2023_Win_Percent when under 35%']
df['2022_difference'] = df['2022_Win_Percent when over 35%'] - df['2022_Win_Percent when under 35%']
df['2021_difference'] = df['2021_Win_Percent when over 35%'] - df['2021_Win_Percent when under 35%']

df

KeyError: '2024_Win_Percent when over 35%'

In [49]:




df

Unnamed: 0,Rk,G,Date,Age,Tm,Unnamed: 5,Opp,GS,MP,FG,...,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,+/-,Win/Loss
0,1,1,2023-10-25,28-330,NYK,,BOS,1,34:01,5,...,11,7,2,0,0,3,14,9.6,-13,L
1,2,2,2023-10-27,28-332,NYK,@,ATL,1,34:15,4,...,12,9,0,0,3,3,17,17.3,+7,W
2,3,3,2023-10-28,28-333,NYK,@,NOP,1,33:42,4,...,12,4,0,0,8,4,10,-2.1,-11,L
3,4,4,2023-10-31,28-336,NYK,@,CLE,1,31:23,5,...,10,2,1,0,1,1,19,15.2,+18,W
4,5,5,2023-11-01,28-337,NYK,,CLE,1,35:15,3,...,6,4,1,0,3,2,6,-1.5,-4,L
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
81,79,,2024-04-09,29-132,NYK,@,CHI,Inactive,Inactive,Inactive,...,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,W
82,80,,2024-04-11,29-134,NYK,@,BOS,Inactive,Inactive,Inactive,...,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,W
83,Rk,G,Date,Age,Tm,,Opp,GS,MP,FG,...,TRB,AST,STL,BLK,TOV,PF,PTS,GmSc,+/-,n
84,81,,2024-04-12,29-135,NYK,,BRK,Inactive,Inactive,Inactive,...,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,Inactive,W


In [57]:
df['3P%'] = pd.to_numeric(df['3P%'], errors = 'coerce')

In [81]:
Ju_over_35 = df.loc[df['3P%'] > .35, :].iloc[:, 14:]
Ju_under_35 = df.loc[df['3P%'] < .35, :].iloc[:, 14:]

In [89]:
print('win_percent', Ju_over_35.loc[Ju_over_35['Win/Loss'] == 'W', :].shape[0] / Ju_over_35.shape[0], 'games_played', Ju_over_35.shape[0])

win_percent 0.8333333333333334 games_played 18


In [91]:
print('win_percent', Ju_under_35.loc[Ju_under_35['Win/Loss'] == 'W', :].shape[0] / Ju_under_35.shape[0], 'games_played', Ju_under_35.shape[0])

win_percent 0.5 games_played 28
