## Phase II Project Proposal
### What makes an NBA team successful?

#### Names: 
#### Madhav Nair
#### Nicholas Umansky

1. (1\%) Expresses the central motivation of the project and explains the (at least) two key questions to be explored. Gives a summary of the data processing pipeline so a technical expert can easily follow along.
2. (2\%) Obtains, cleans, and merges all data sources involved in the project.
3. (2\%) Builds at least two visualizations (graphs/plots) from the data which help to understand or answer the questions of interest. These visualizations will be graded based on how much information they can effectively communicate to readers. Please make sure your visualizations are sufficiently distinct from each other.
   

### Context

The central motivation of this project is to determine whether player or team statistics can explain a team's performance. In other words, we want to know which parts of an NBA team make it successful, whether success means finishing with a winning record or winning the championship.

We explore two key questions:

1. Which specific performance metrics (such as points per game, rebounds, and assists) are most strongly associated with a winning record?
2. Can these statistics, alongside categorical factors like player position or team affiliation, be used to predict game outcomes or player performance trends?

Both questions can offer valuable insights to coaches, analysts, and fans. The first can tell team owners and managers which types of players (scorers, rebounders, or defensive players) are most likely to bring success to their team. The second can help project how a team will fare over a particular season.

For dataframes including player statistics, we used the following data processing pipeline:
1. Make an API call querying player statistics for a given team during a given season
2. Clean the data by keeping only each player's points, assists, and rebounds from the list of statistics
3. Calculate each player's average for each statistic
4. Store the averages in a dataframe along with the player's name and ID
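
The four steps above can be illustrated on a small, hand-made sample without calling the API. This is only a sketch: the `sample` dictionary is hypothetical data shaped like the nested `response` list that the cleaning code in this notebook expects, and the player names are placeholders.

```python
import pandas as pd

# Hypothetical sample shaped like the API's "response" list (one row per player per game)
sample = {
    "response": [
        {"player": {"id": 1, "firstname": "A", "lastname": "One"},
         "points": 10, "totReb": 4, "assists": 2},
        {"player": {"id": 1, "firstname": "A", "lastname": "One"},
         "points": 20, "totReb": 6, "assists": 4},
        {"player": {"id": 2, "firstname": "B", "lastname": "Two"},
         "points": "8", "totReb": None, "assists": 1},
    ]
}

# Step 2: flatten the nested JSON and keep only the three statistics of interest
df = pd.json_normalize(sample["response"])
stat_cols = ["points", "totReb", "assists"]
df[stat_cols] = df[stat_cols].apply(pd.to_numeric, errors="coerce")

# Steps 3 and 4: average per player and keep the name and ID alongside the averages
averages = (
    df.groupby(["player.id", "player.firstname", "player.lastname"])[stat_cols]
    .mean()
    .reset_index()
)
print(averages)
```

Missing values become `NaN` and are skipped by `mean()`, so a player with no recorded rebounds ends up with a `NaN` rebound average rather than a misleading zero.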

For dataframes including team statistics, we used the following data processing pipeline:
1. Make an API call: Query the team statistics endpoint for a specified season (and stage) to retrieve each team's performance metrics.

2. Clean the data: Extract only the key performance metrics (e.g., points per game, rebounds per game, assists per game, field goal percentage, plus-minus, turnovers, blocks, and steals) from the API response.

3. Validate and format the data: Make sure the data is consistent by handling missing or zero values and converting data types if necessary.

4. Add categorical information: Append additional team details, such as team name, team ID, and division to the cleaned performance metrics.

5. Load and store the final data: Compile the processed statistics into a DataFrame and export it as a CSV file for analysis.
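
Step 3 (validation and formatting) might look something like the sketch below. The rows in `raw` are hypothetical, and treating a value of `0` as "missing" is an assumption made here for illustration; whether that is appropriate depends on the metric.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows; values may arrive as strings, None, or a 0 placeholder
raw = pd.DataFrame([
    {"Team ID": 1, "Points per Game": "118.2", "Field Goal Percentage": "47.1", "Steals": 0},
    {"Team ID": 2, "Points per Game": None, "Field Goal Percentage": "46.0", "Steals": 7},
])

metric_cols = ["Points per Game", "Field Goal Percentage", "Steals"]

clean = raw.copy()
# Convert every metric to a numeric dtype; unparseable entries become NaN
clean[metric_cols] = clean[metric_cols].apply(pd.to_numeric, errors="coerce")
# Assumption: a 0 is a placeholder for a missing metric, so mark it as NaN
clean[metric_cols] = clean[metric_cols].replace(0, np.nan)

# Count missing values per column so gaps are visible before analysis
n_missing = clean[metric_cols].isna().sum()
print(n_missing)
```

Marking gaps as `NaN` instead of leaving zeros keeps them out of averages and correlations computed later.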

# Player Averages from the Top 5 Teams with the Best Records

In [3]:
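import os

import requests
import pandas as pd

url = "https://api-nba-v1.p.rapidapi.com/players/statistics"
headers = {
    # Read the API key from the environment instead of hardcoding it in the notebook.
    # RAPIDAPI_KEY is an assumed variable name; set it before running this cell.
    "x-rapidapi-key": os.environ["RAPIDAPI_KEY"],
    "x-rapidapi-host": "api-nba-v1.p.rapidapi.com"
}

def querry_player_stats(team_number, year):
    """Query player statistics for one team in one season and return the parsed JSON."""
    querystring = {"team": str(team_number), "season": str(year)}
    response = requests.get(url, headers=headers, params=querystring)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()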

In [22]:
def find_player_averages(data):
    # Flatten the nested JSON so player fields become columns such as 'player.id'
    df = pd.json_normalize(data['response'])
    stat_cols = ['points', 'totReb', 'assists']
    # Coerce non-numeric entries to NaN so mean() skips them
    for col in stat_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')
    id_cols = ['player.id', 'player.firstname', 'player.lastname']
    player_avg = df.groupby(id_cols)[stat_cols].mean().reset_index()
    player_avg = player_avg.rename(columns={
        'player.id': 'Player ID', 'player.firstname': 'First Name', 'player.lastname': 'Last Name',
        'points': 'Points', 'totReb': 'Rebounds', 'assists': 'Assists'})
    return player_avg

In [23]:
celtics_player_stats = querry_player_stats(2, 2024)
celtics_player_averages = find_player_averages(celtics_player_stats)
celtics_player_averages

| | Player ID | First Name | Last Name | Points | Rebounds | Assists |
|---|---|---|---|---|---|---|
| 0 | 75 | Jaylen | Brown | 22.433962 | 5.792453 | 4.566038 |
| 1 | 242 | Jrue | Holiday | 10.58 | 4.14 | 3.48 |
| 2 | 248 | Al | Horford | 7.808511 | 5.468085 | 1.829787 |
| 3 | 432 | Kristaps | Porzingis | 18.9375 | 6.78125 | 1.9375 |
| 4 | 765 | Torrey | Craig | 3.333333 | 1.666667 | 0.666667 |
| 5 | 819 | Luke | Kornet | 5.490909 | 5.127273 | 1.618182 |
| 6 | 882 | Jayson | Tatum | 26.016667 | 8.5 | 5.733333 |
| 7 | 897 | Derrick | White | 16.05 | 4.25 | 4.183333 |
| 8 | 1038 | Lonnie | Walker IV | 7.25 | 1.5 | 2.5 |
| 9 | 2635 | Payton | Pritchard | 14.222222 | 3.793651 | 3.68254 |


In [28]:
thunder_player_stats = querry_player_stats(25, 2024)
thunder_player_averages = find_player_averages(thunder_player_stats)
thunder_player_averages

| | Player ID | First Name | Last Name | Points | Rebounds | Assists |
|---|---|---|---|---|---|---|
| 0 | 631 | Alex | Caruso | 5.904762 | 2.738095 | 2.571429 |
| 1 | 972 | Shai | Gilgeous-Alexander | 31.229508 | 5.081967 | 5.967213 |
| 2 | 978 | Isaiah | Hartenstein | 10.785714 | 11.119048 | 3.857143 |
| 3 | 1044 | Kenrich | Williams | 5.765957 | 3.191489 | 1.12766 |
| 4 | 2040 | Luguentz | Dort | 9.241379 | 4.051724 | 1.62069 |
| 5 | 2604 | Isaiah | Joe | 9.448276 | 2.586207 | 1.413793 |
| 6 | 2863 | Aaron | Wiggins | 10.968254 | 3.698413 | 1.634921 |
| 7 | 3432 | Ousmane | Dieng | 4.771429 | 2.914286 | 1.314286 |
| 8 | 3448 | Chet | Holmgren | 14.75 | 7.8 | 1.7 |
| 9 | 3504 | Jalen | Williams | 20.516667 | 5.266667 | 4.9 |


In [29]:
nuggets_player_stats = querry_player_stats(9, 2024)
nuggets_player_averages = find_player_averages(nuggets_player_stats)
nuggets_player_averages

| | Player ID | First Name | Last Name | Points | Rebounds | Assists |
|---|---|---|---|---|---|---|
| 0 | 195 | Aaron | Gordon | 12.564103 | 4.794872 | 3.25641 |
| 1 | 279 | Nikola | Jokic | 28.142857 | 12.357143 | 10.017857 |
| 2 | 286 | DeAndre | Jordan | 3.422222 | 4.733333 | 0.844444 |
| 3 | 383 | Jamal | Murray | 20.345455 | 3.636364 | 5.8 |
| 4 | 468 | Dario | Saric | 4.684211 | 3.789474 | 1.684211 |
| 5 | 544 | Russell | Westbrook | 12.759259 | 4.814815 | 6.277778 |
| 6 | 1014 | Michael | Porter Jr. | 18.440678 | 6.59322 | 2.067797 |
| 7 | 1312 | Vlatko | Cancar | 3.222222 | 2.0 | 0.888889 |
| 8 | 2627 | Zeke | Nnaji | 3.357143 | 1.642857 | 0.285714 |
| 9 | 3420 | Christian | Braun | 14.33871 | 5.016129 | 2.177419 |


In [32]:
timberwolves_player_stats = querry_player_stats(22, 2024)
timberwolves_player_averages = find_player_averages(timberwolves_player_stats)
timberwolves_player_averages

| | Player ID | First Name | Last Name | Points | Rebounds | Assists |
|---|---|---|---|---|---|---|
| 0 | 114 | Mike | Conley | 7.962264 | 2.54717 | 4.377358 |
| 1 | 192 | Rudy | Gobert | 10.827586 | 10.37931 | 1.793103 |
| 2 | 258 | Joe | Ingles | 1.285714 | 1.0 | 1.571429 |
| 3 | 441 | Julius | Randle | 18.77551 | 7.142857 | 4.469388 |
| 4 | 770 | P.J. | Dozier | 1.538462 | 0.538462 | 0.692308 |
| 5 | 934 | Keita | Bates-Diop | 5.0 | 2.5 | 0.0 |
| 6 | 962 | Donte | DiVincenzo | 11.113636 | 3.659091 | 3.659091 |
| 7 | 1845 | Nickeil | Alexander-Walker | 9.142857 | 3.095238 | 2.460317 |
| 8 | 2146 | Naz | Reid | 14.770492 | 5.754098 | 2.065574 |
| 9 | 2584 | Anthony | Edwards | 27.15 | 5.85 | 4.5 |


In [35]:
clippers_player_stats = querry_player_stats(16, 2024)
clippers_player_averages = find_player_averages(clippers_player_stats)
clippers_player_averages

| | Player ID | First Name | Last Name | Points | Rebounds | Assists |
|---|---|---|---|---|---|---|
| 0 | 40 | Nicolas | Batum | 3.133333 | 2.8 | 1.216667 |
| 1 | 152 | Kris | Dunn | 5.781818 | 3.472727 | 2.745455 |
| 2 | 216 | James | Harden | 21.186441 | 5.644068 | 8.389831 |
| 3 | 283 | Derrick | Jones Jr. | 10.216667 | 3.366667 | 0.8 |
| 4 | 314 | Kawhi | Leonard | 16.882353 | 4.764706 | 2.823529 |
| 5 | 365 | Patty | Mills | 2.0 | 0.0 | 0.6 |
| 6 | 434 | Norman | Powell | 23.142857 | 3.408163 | 2.163265 |
| 7 | 481 | Ben | Simmons | 8.0 | 5.75 | 5.5 |
| 8 | 575 | Ivica | Zubac | 14.704918 | 11.983607 | 2.409836 |
| 9 | 743 | Bogdan | Bogdanovic | 11.166667 | 1.666667 | 3.5 |
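
The five per-team tables above could be merged into a single table for cross-team comparison. The following is a minimal sketch, not part of the original notebook: `combine_team_averages` is a hypothetical helper, and the two small frames are toy stand-ins for the real per-team DataFrames.

```python
import pandas as pd

def combine_team_averages(frames_by_team):
    """Stack per-team average tables into one DataFrame with a 'Team' column."""
    labeled = [frame.assign(Team=team) for team, frame in frames_by_team.items()]
    return pd.concat(labeled, ignore_index=True)

# Toy stand-ins for the per-team averages (values are illustrative only)
toy_celtics = pd.DataFrame({"Player ID": [882], "Points": [26.0]})
toy_thunder = pd.DataFrame({"Player ID": [972], "Points": [31.2]})

combined = combine_team_averages({"Celtics": toy_celtics, "Thunder": toy_thunder})
print(combined)
```

In the notebook itself, the same helper could be called with the real frames, for example `{"Celtics": celtics_player_averages, "Thunder": thunder_player_averages}`.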


# Team Performance Metrics from NBA Franchises (2024 Season)

In [5]:
import os
import time

import requests
import pandas as pd

base_url = "https://api-nba-v1.p.rapidapi.com"
headers = {
    # RAPIDAPI_KEY is an assumed environment variable name; the key should not be hardcoded.
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "api-nba-v1.p.rapidapi.com"
}

def get_teams():
    """
    Retrieves the first 40 teams returned by the API (which may include non-NBA teams) and their respective divisions.
    
    Returns:
        teams (dict): A dictionary mapping team IDs to team names.
        divisions (dict): A dictionary mapping team IDs to their division names.
    """
    url = f"{base_url}/teams"
    response = requests.get(url, headers=headers).json()
    
    teams = {}
    divisions = {}
    
    for team in response["response"][:40]:
        teams[team["id"]] = team["name"]
        divisions[team["id"]] = team.get("leagues", {}).get("standard", {}).get("division", "Unknown")
    
    return teams, divisions

def get_team_stats(team_id, season="2024", stage=1):
    """
    Retrieves statistical data for a specific NBA team in a given season.
    
    Args:
        team_id (int): The unique identifier of the team.
        season (str): The season year for which statistics are retrieved (default is "2024").
        stage (int): The stage of the season (default is 1).
    
    Returns:
        dict: A dictionary containing key team statistics such as points per game, rebounds, assists, etc.
    """
    url = f"{base_url}/teams/statistics"
    params = {"id": team_id, "season": season, "stage": stage}
    stats_response = requests.get(url, headers=headers, params=params).json()
    
    stats = stats_response["response"][0]
    return {
        "Team ID": team_id,
        "Points per Game": stats.get("points", 0),
        "Rebounds per Game": stats.get("totReb", 0),
        "Field Goal Percentage": stats.get("fgp", 0),
        "Assists per Game": stats.get("assists", 0),
        "Plus-Minus": stats.get("plusMinus", 0),
        "Turnovers": stats.get("turnovers", 0),
        "Blocks": stats.get("blocks", 0),
        "Steals": stats.get("steals", 0),
    }

team_names, team_divisions = get_teams()

team_stats_list = []
# Limit to 15 teams and prompt individually
for team_id in list(team_names.keys())[:15]:
    input(f"\nPress Enter to fetch stats for {team_names[team_id]} (ID: {team_id})...")
    print(f"Fetching stats for {team_names[team_id]} (ID: {team_id})...")
    stats = get_team_stats(team_id, season="2024")
    stats["Team Name"] = team_names[team_id]
    stats["Division"] = team_divisions[team_id]
    team_stats_list.append(stats)
    time.sleep(1)

df = pd.DataFrame(team_stats_list)

df.to_csv("nba_team_stats_2024.csv", index=False)

print(df.head())
print(df.tail())


Press Enter to fetch stats for Atlanta Hawks (ID: 1)...
Fetching stats for Atlanta Hawks (ID: 1)...

Press Enter to fetch stats for Boston Celtics (ID: 2)...
Fetching stats for Boston Celtics (ID: 2)...

Press Enter to fetch stats for Brisbane Bullets (ID: 3)...
Fetching stats for Brisbane Bullets (ID: 3)...

Press Enter to fetch stats for Brooklyn Nets (ID: 4)...
Fetching stats for Brooklyn Nets (ID: 4)...

Press Enter to fetch stats for Charlotte Hornets (ID: 5)...
Fetching stats for Charlotte Hornets (ID: 5)...

Press Enter to fetch stats for Chicago Bulls (ID: 6)...
Fetching stats for Chicago Bulls (ID: 6)...

Press Enter to fetch stats for Cleveland Cavaliers (ID: 7)...
Fetching stats for Cleveland Cavaliers (ID: 7)...

Press Enter to fetch stats for Dallas Mavericks (ID: 8)...
Fetching stats for Dallas Mavericks (ID: 8)...

Press Enter to fetch stats for Denver Nuggets (ID: 9)...
Fetching stats for Denver Nuggets (ID: 9)...

Press Enter to fetch stats for Detroit Pistons (ID: 10