# DIVE Analysis: Risk & Strategy
### Samriddh Gupta
### Project: NBA Player Performance vs. Salary Analysis

**My Core Objective:** To look beyond the immediate "bang-for-buck" analysis and build a 24-month strategic roadmap for an NBA front office. My focus is on long-term value, identifying future franchise pillars, and mitigating contract risks to ensure sustained success.

--- 
## Part A: DISCOVER - Identifying the Strategic Landscape

**Goal:** To get a high-level view of the league's talent distribution, focusing on the intersection of age, performance, and value. This phase is about finding the initial signals of strategic opportunity and risk.

**Key Questions:**

1. Which young players (under 26) show the highest potential value (high `PS/$M` and a strongly positive `Rank Gap`, meaning their performance rank is better than their salary rank)?

2. Which veteran players (over 30) are on large, potentially risky contracts?

3. What initial patterns emerge when plotting `Age` vs. `Performance Score` and `Age` vs. `PS/$M` across the league?

**Methodology & Analysis:**

* **Filtering:** Create subsets of the data for different age brackets (e.g., Under 23, 23-26, 27-30, Over 30).

* **Visualization:**

  * Create a scatter plot of `Age` vs. `Performance Score` to identify performance peaks.

  * Create a scatter plot of `Age` vs. `PS/$M` to find high-value, efficient players.

  * Generate a list of the top 15 players under 26 based on `PS/$M`. This will be my initial "Future Pillars Watchlist."
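
The age-bracket filter described above can be sketched with `pandas.cut`. This is a minimal illustration on made-up rows, assuming the `AGE` column name from the cleaned DataFrame; the bracket edges follow the list above, and `AGE_BRACKET` is a hypothetical column name not used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; the real notebook would use `df`.
toy = pd.DataFrame({
    "PLAYER": ["A", "B", "C", "D", "E"],
    "AGE": [21, 23, 26, 30, 34],
})

# Right-closed bins: (-inf, 22], (22, 26], (26, 30], (30, inf)
bins = [-np.inf, 22, 26, 30, np.inf]
labels = ["Under 23", "23-26", "27-30", "Over 30"]
toy["AGE_BRACKET"] = pd.cut(toy["AGE"], bins=bins, labels=labels)

# Player count per bracket, listed in bracket order
bracket_counts = toy.groupby("AGE_BRACKET", observed=False).size()
print(toy)
print(bracket_counts)
```

Using a single categorical column keeps every bracket available for grouping or for coloring the scatter plots, rather than maintaining four separate subset DataFrames.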

### Gemini Prompt Used: Discover
"I am the Risk & Strategy Analyst for a consulting project analyzing the NBA's 2022-2023 player dataset. My goal is to identify long-term strategic opportunities. I'm starting my DISCOVER phase by plotting player Age vs. Performance Score and Age vs. our value metric, 'PS/$M'.

Here is a summary of my initial data exploration and charts: [Paste initial findings, chart descriptions, or key data points here].

Based on this, help me:

Identify the 3-5 most significant strategic patterns or trends regarding age, performance, and value.
What are the most interesting initial insights that jump out from this data?
What key metrics should I track to identify future franchise pillars and high-risk contracts?"


In [1]:
# ==============================================================================
# Cell 1: Import Necessary Libraries
# ==============================================================================
import pandas as pd
import plotly.express as px
from google.cloud import bigquery
import warnings

# Ignore a common warning from the BigQuery library
warnings.filterwarnings('ignore', category=FutureWarning)

print("✅ Libraries imported successfully.")

✅ Libraries imported successfully.


In [2]:
# ==============================================================================
# Cell 2: Connect to BigQuery and Load Data
# ==============================================================================
# --- Configuration ---
# IMPORTANT: Authenticate in your terminal first by running:
# gcloud auth application-default login

project_id = 'mgmt599-project-carlorama-lab2'
dataset_name = 'nba_2023'
table_name = 'player_perf'
df = None # Initialize df to None

# --- Connect and Query ---
try:
    client = bigquery.Client(project=project_id)
    print(f"✅ Authenticated and connected to project: {project_id}")
    
    sql_query = f"SELECT * FROM `{project_id}.{dataset_name}.{table_name}`"
    df = client.query(sql_query).to_dataframe()
    
    print(f"✅ Successfully loaded {len(df)} rows from BigQuery.")
    display(df.head())

except Exception as e:
    print(f"❌ Authentication or query failed. Please check your setup.")
    print(f"Error: {e}")

✅ Authenticated and connected to project: mgmt599-project-carlorama-lab2




✅ Successfully loaded 896 rows from BigQuery.


Unnamed: 0,Rk,Player,Age,Team,Pos,G,GS,MP,FG,FGA,...,STL,BLK,TOV,PF,PTS,Awards,Player-additional,Season Type,PER,Adjusted Salary
0,1,Joel Embiid,28,PHI,C,66.0,66.0,34.6,11.0,20.1,...,1.0,1.7,3.4,3.1,33.1,MVP-1DPOY-9CPOY-5ASNBA1,embiijo01,Regular,31.4,35605377.25
1,2,Luka Dončić,23,DAL,PG,66.0,66.0,36.2,10.9,22.0,...,1.4,0.5,3.6,2.5,32.4,MVP-8CPOY-8ASNBA1,doncilu01,Regular,28.7,39290951.43
2,3,Damian Lillard,32,POR,PG,58.0,58.0,36.3,9.6,20.7,...,0.9,0.3,3.3,1.9,32.2,CPOY-10ASNBA3,lillada01,Regular,26.7,45006144.5
3,4,Shai Gilgeous-Alexander,24,OKC,PG,68.0,68.0,35.5,10.4,20.3,...,1.6,1.0,2.8,2.8,31.4,MVP-5CPOY-7ASNBA1,gilgesh01,Regular,27.2,32742459.52
4,5,Giannis Antetokounmpo,28,MIL,PF,63.0,63.0,32.1,11.2,20.3,...,0.8,0.8,3.9,3.1,31.1,MVP-3DPOY-6ASNBA1,antetgi01,Regular,29.0,45006144.5


In [3]:
# ==============================================================================
# Cell 3: Data Cleaning and Preparation
# ==============================================================================
if df is not None:
    # Standardize column names
    df.rename(columns={
        'Player': 'PLAYER',
        'Age': 'AGE',
        'Team': 'TEAM',
        'Pos': 'POS',
        'Adjusted Salary': 'SALARY_ADJUSTED',
        'TRB': 'REB' # Using TRB (Total Rebounds) for REB
    }, inplace=True)

    # Ensure key columns are numeric
    numeric_cols = ['AGE', 'PER', 'PTS', 'AST', 'REB', 'BLK', 'MP', 'SALARY_ADJUSTED']
    for col in numeric_cols:
        df[col] = pd.to_numeric(df[col], errors='coerce')
    
    # Drop rows with missing values in our key columns
    original_rows = len(df)
    df.dropna(subset=numeric_cols, inplace=True)
    
    print("✅ Data cleaning and preparation complete.")
    print(f"   {original_rows - len(df)} rows with missing data were removed.")
    display(df.head())

✅ Data cleaning and preparation complete.
   0 rows with missing data were removed.


Unnamed: 0,Rk,PLAYER,AGE,TEAM,POS,G,GS,MP,FG,FGA,...,STL,BLK,TOV,PF,PTS,Awards,Player-additional,Season Type,PER,SALARY_ADJUSTED
0,1,Joel Embiid,28,PHI,C,66.0,66.0,34.6,11.0,20.1,...,1.0,1.7,3.4,3.1,33.1,MVP-1DPOY-9CPOY-5ASNBA1,embiijo01,Regular,31.4,35605377.25
1,2,Luka Dončić,23,DAL,PG,66.0,66.0,36.2,10.9,22.0,...,1.4,0.5,3.6,2.5,32.4,MVP-8CPOY-8ASNBA1,doncilu01,Regular,28.7,39290951.43
2,3,Damian Lillard,32,POR,PG,58.0,58.0,36.3,9.6,20.7,...,0.9,0.3,3.3,1.9,32.2,CPOY-10ASNBA3,lillada01,Regular,26.7,45006144.5
3,4,Shai Gilgeous-Alexander,24,OKC,PG,68.0,68.0,35.5,10.4,20.3,...,1.6,1.0,2.8,2.8,31.4,MVP-5CPOY-7ASNBA1,gilgesh01,Regular,27.2,32742459.52
4,5,Giannis Antetokounmpo,28,MIL,PF,63.0,63.0,32.1,11.2,20.3,...,0.8,0.8,3.9,3.1,31.1,MVP-3DPOY-6ASNBA1,antetgi01,Regular,29.0,45006144.5


In [4]:
# ==============================================================================
# Cell 4: Calculate Custom Strategic Metrics
# ==============================================================================
if df is not None:
    print("Calculating custom metrics...")

    # Calculate Performance Score
    df['Performance_Score'] = df['PER'] + df['PTS'] + df['AST'] + df['REB'] + df['BLK'] + (df['MP'] / 5)

    # Calculate Salary and Performance Ranks
    df['Salary_Rank'] = df['SALARY_ADJUSTED'].rank(ascending=False, method='min')
    df['Performance_Rank'] = df['Performance_Score'].rank(ascending=False, method='min')

    # Calculate Rank Gap
    df['Rank_Gap'] = df['Salary_Rank'] - df['Performance_Rank']

    # Calculate Performance Score per Million Dollars (PS/$M)
    df['PS_per_Million'] = df.apply(
        lambda row: row['Performance_Score'] / (row['SALARY_ADJUSTED'] / 1_000_000) if row['SALARY_ADJUSTED'] > 0 else 0,
        axis=1
    )

    print("✅ DataFrame with custom metrics created successfully.")

    # --- Display Table 1: Custom Metrics Overview ---
    print("\n--- Custom Metrics Overview ---")
    display(df[['PLAYER', 'AGE', 'Performance_Score', 'Rank_Gap', 'PS_per_Million']].head())

Calculating custom metrics...
✅ DataFrame with custom metrics created successfully.

--- Custom Metrics Overview ---


Unnamed: 0,PLAYER,AGE,Performance_Score,Rank_Gap,PS_per_Million
0,Joel Embiid,28,87.52,49.0,2.458056
1,Luka Dončić,23,85.44,34.0,2.174546
2,Damian Lillard,32,78.56,6.0,1.745539
3,Shai Gilgeous-Alexander,24,77.0,56.0,2.351686
4,Giannis Antetokounmpo,28,84.82,9.0,1.884632
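
Because the sign of `Rank_Gap` drives several later conclusions, a small worked example may help. The sketch below reuses the same ranking calls as Cell 4 on three hypothetical players (the names and numbers are invented for illustration) to show which direction the gap points.

```python
import pandas as pd

# Hypothetical players, chosen only to make the sign of Rank_Gap visible.
toy = pd.DataFrame({
    "PLAYER": ["Overpaid Vet", "Underpaid Star", "Fair Deal"],
    "SALARY_ADJUSTED": [40_000_000, 10_000_000, 1_000_000],
    "Performance_Score": [50.0, 80.0, 30.0],
})

# Same ranking calls as Cell 4: rank 1 = highest salary / best performance
toy["Salary_Rank"] = toy["SALARY_ADJUSTED"].rank(ascending=False, method="min")
toy["Performance_Rank"] = toy["Performance_Score"].rank(ascending=False, method="min")
toy["Rank_Gap"] = toy["Salary_Rank"] - toy["Performance_Rank"]

print(toy[["PLAYER", "Salary_Rank", "Performance_Rank", "Rank_Gap"]])
```

In this toy data the underpaid star gets a positive gap and the overpaid veteran a negative one, so under this definition a positive `Rank_Gap` marks a player whose performance rank is ahead of his salary rank.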


In [5]:
# ==============================================================================
# Cell 5: Chart 1 - Age vs. Performance Score
# ==============================================================================
if df is not None:
    print("Generating visualization: Age vs. Performance Score...")

    fig_age_vs_perf = px.scatter(
        df,
        x='AGE',
        y='Performance_Score',
        hover_name='PLAYER',
        title='Age vs. Performance Score (On-Court Contribution)',
        labels={'AGE': 'Player Age', 'Performance_Score': 'Calculated Performance Score'},
        template='plotly_white'
    )
    fig_age_vs_perf.show()

Generating visualization: Age vs. Performance Score...


In [6]:
# ==============================================================================
# Cell 6: Chart 2 - Age vs. Value (PS/$M)
# ==============================================================================
if df is not None:
    print("Generating visualization: Age vs. Value...")

    fig_age_vs_value = px.scatter(
        df,
        x='AGE',
        y='PS_per_Million',
        hover_name='PLAYER',
        title='Age vs. Value (Performance Score per $1M Salary)',
        labels={'AGE': 'Player Age', 'PS_per_Million': 'Performance Score per $1M'},
        template='plotly_white'
    )
    fig_age_vs_value.show()

Generating visualization: Age vs. Value...


In [7]:
# ==============================================================================
# Cell 7: Generate and Display the "Future Pillars Watchlist"
# ==============================================================================
if df is not None:
    print("Generating 'Future Pillars Watchlist'...")

    # Filter for players under 26 who have played significant minutes (> 15 MPG)
    young_players_df = df[(df['AGE'] < 26) & (df['MP'] > 15)].copy()

    # Sort by the value metric to find the most efficient young players
    future_pillars_watchlist = young_players_df.sort_values(by='PS_per_Million', ascending=False)

    # --- Display Table 2: Top 15 Future Pillar Candidates ---
    print("\n--- Top 15 Future Pillar Candidates (Under 26, >15 MPG) ---")
    display(future_pillars_watchlist[['PLAYER', 'AGE', 'TEAM', 'Performance_Score', 'PS_per_Million', 'Rank_Gap']].head(15))

Generating 'Future Pillars Watchlist'...

--- Top 15 Future Pillar Candidates (Under 26, >15 MPG) ---


Unnamed: 0,PLAYER,AGE,TEAM,Performance_Score,PS_per_Million,Rank_Gap
94,RaiQuan Gray,23,BRK,55.5,8958.837772,818.0
581,Jacob Gilyard,24,MEM,29.5,4761.904762,464.0
191,Jeenathan Williams,23,POR,36.58,656.047356,601.0
506,Justin Minaya,23,POR,18.86,507.369638,185.0
105,Skylar Mays,25,POR,52.7,426.824468,782.0
149,Mac McClung,24,PHI,45.3,265.889588,698.0
62,Louis King,23,PHI,53.1,163.256562,763.0
574,Jamaree Bouyea,23,MIA,17.66,142.52702,149.0
228,Julian Champagnie,21,SAS,33.98,63.043308,495.0
226,Julian Champagnie,21,2TM,31.26,57.996875,435.0


### Key Findings from Discover Phase


Here is a summary of my initial data exploration and charts:

1.  **Age vs. Performance Score Chart:** My analysis shows that raw on-court performance (`Performance Score`) tends to peak for players between the ages of **25 and 28**. There's a wide distribution, but the highest concentration of elite performers falls within this window. Performance generally declines for players older than **32**.

2.  **Age vs. Value (PS/$M) Chart:** The story for *value* is completely different. The chart shows that the highest-value players, those who deliver the most performance for their salary, are almost all **under the age of 25**. Two extreme outliers in the 23-24 age range post a `PS/$M` above 4,000 (RaiQuan Gray and Jacob Gilyard), but their scores imply an adjusted salary of only about $6,200 each (55.5 / 8,958.84 ≈ $0.0062M), so they reflect tiny contracts more than elite production. Value drops off sharply after age 26. This suggests a critical strategic window for identifying and securing talent *before* they reach their peak performance and command a maximum salary.

3.  **Future Pillars Watchlist:** My initial list of the top 15 most valuable players under 26 is dominated by players on small contracts, likely rookie-scale, two-way, or short-term deals (e.g., RaiQuan Gray, Jacob Gilyard, Jeenathan Williams). A key finding is the presence of **Austin Reaves** at #14. He is a known, high-quality rotation player for a major-market team, which suggests that the `PS/$M` metric can surface legitimate talent alongside the statistical noise created by very small contracts.
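
One way to keep near-zero salaries and duplicate rows from dominating the watchlist is to add a salary floor and a games-played floor before ranking. The sketch below is an illustration, not the notebook's method: `build_watchlist`, `min_salary`, and `min_games` are hypothetical, the thresholds are arbitrary, and the `"Playoffs"` label in the toy data is an assumption about how non-regular-season rows are tagged in `Season Type`.

```python
import pandas as pd

def build_watchlist(data, min_salary=1_000_000, min_games=20, top_n=15):
    """Rank young regular-season players by PS/$M after basic sanity filters.

    min_salary and min_games are hypothetical thresholds, not notebook values.
    """
    eligible = data[
        (data["Season Type"] == "Regular")
        & (data["AGE"] < 26)
        & (data["MP"] > 15)
        & (data["G"] >= min_games)
        & (data["SALARY_ADJUSTED"] >= min_salary)
    ].copy()
    eligible["PS_per_Million"] = eligible["Performance_Score"] / (
        eligible["SALARY_ADJUSTED"] / 1_000_000
    )
    return eligible.sort_values("PS_per_Million", ascending=False).head(top_n)

# Made-up rows: a tiny-contract outlier, one player with two season-type rows, a veteran
toy = pd.DataFrame({
    "PLAYER": ["Ten-Day Guy", "Rotation Guard", "Rotation Guard", "Veteran"],
    "AGE": [23, 24, 24, 31],
    "MP": [18.0, 28.0, 30.0, 32.0],
    "G": [3, 64, 16, 70],
    "SALARY_ADJUSTED": [6_195.0, 1_656_000.0, 1_656_000.0, 20_000_000.0],
    "Performance_Score": [55.5, 40.36, 45.0, 60.0],
    "Season Type": ["Regular", "Regular", "Playoffs", "Regular"],
})
watchlist = build_watchlist(toy)
print(watchlist[["PLAYER", "G", "PS_per_Million"]])
```

With these filters only the rotation guard's regular-season row survives, which is the kind of candidate the watchlist is meant to surface.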

## Part B: INVESTIGATE - Probing Deeper into Value and Risk

**Goal:** To understand the "why" behind the signals found in the Discover phase by analyzing player durability, role significance, and positional value to separate legitimate strategic targets from statistical noise.

**Key Questions:**

1. For the "Future Pillars Watchlist," is their high value (PS/$M) supported by significant on-court roles (MP) and durability (G), or is it an anomaly of a small contract?

2. For high-cost veterans, does a strongly negative Rank Gap (salary rank better than performance rank) coincide with low durability (G), indicating compounded risk from both underperformance and injury potential?

3. How does the value landscape change when segmented by position, and who are the most undervalued players within each role (Guard, Forward, Center)?

**Methodology & Analysis:**

* **Durability & Role Analysis:**
    * For the "Future Pillars Watchlist," analyze their MP (minutes per game) and G (games played) to assess if their role is substantial enough to be considered a core asset.
    * For the "High-Risk Veterans" list, examine their G to identify players whose performance risk is magnified by availability issues.

* **Positional Deep-Dive:**

    * Group players into three core roles: Guard, Forward, and Center.
    * Calculate the median `PS/$M` and `Rank Gap` for each group to establish a positional value baseline.
    * Identify the top 5 most undervalued players (highest positive Rank Gap) within each position group to create a "Bargain Watchlist."

* **Hypothesis Formation:** Based on the investigation, formulate specific, testable hypotheses that will be challenged in the Validate phase. For example: "Players aged 24-26 with a `PS/$M` in the top quartile represent optimal extension targets," and "Centers over 32 with salaries above $20M present the highest risk."
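
The positional deep-dive above could be sketched as follows. This is a minimal illustration on made-up rows: `positional_bargains` and `POS_GROUPS` are hypothetical names, and the sketch assumes the `POS`, `PS_per_Million`, and `Rank_Gap` columns built earlier in the notebook, with a positive `Rank_Gap` meaning performance rank ahead of salary rank.

```python
import pandas as pd

POS_GROUPS = {"PG": "Guard", "SG": "Guard", "SF": "Forward", "PF": "Forward", "C": "Center"}

def positional_bargains(data, top_n=5):
    """Return per-group medians and the top_n highest Rank_Gap players per group."""
    grouped = data.copy()
    # Take the first listed position for multi-position strings such as 'SF-SG'
    primary = grouped["POS"].astype(str).str.split("-").str[0]
    grouped["POS_Group"] = primary.map(POS_GROUPS)
    baseline = grouped.groupby("POS_Group")[["PS_per_Million", "Rank_Gap"]].median()
    bargains = (
        grouped.sort_values("Rank_Gap", ascending=False)
        .groupby("POS_Group")
        .head(top_n)
    )
    return baseline, bargains

# Made-up rows for illustration only
toy = pd.DataFrame({
    "PLAYER": ["G1", "G2", "F1", "F2", "C1", "C2"],
    "POS": ["PG", "SG-PG", "SF", "PF", "C", "C"],
    "PS_per_Million": [4.0, 6.0, 5.0, 7.0, 3.0, 9.0],
    "Rank_Gap": [10, 50, -5, 20, 0, 30],
})
baseline, bargains = positional_bargains(toy, top_n=1)
print(baseline)
print(bargains[["PLAYER", "POS_Group", "Rank_Gap"]])
```

Sorting once and then taking `head(top_n)` within each group keeps the top entries per role without a separate loop over positions.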

### Gemini Prompt: Investigate
"In my analysis, I've identified a pattern: players aged 24-26 seem to offer the best value. I've also created a 'Future Pillars Watchlist' of promising young players.

Now, help me investigate the 'why':
1. What are the 3 most likely business or performance factors that explain this pattern?
2. For the players on my watchlist, what specific data points (e.g., year-over-year performance trends, minutes played, injury history) should I analyze to determine if their high value is sustainable?
3. How can I formulate a testable hypothesis about identifying optimal contract extension targets?"

In [8]:
# ==============================================================================
# Cell 8: Investigation - High-Risk Veteran Contracts
# ==============================================================================
if 'df' in locals():
    print("\n--- INVESTIGATION 2: Identifying potential high-risk veteran contracts ---")
    
    # Define "high-cost" as the top 25% of salaries and "veteran" as age > 30
    high_salary_threshold = df['SALARY_ADJUSTED'].quantile(0.75)
    high_cost_veterans_df = df[(df['AGE'] > 30) & (df['SALARY_ADJUSTED'] >= high_salary_threshold)].copy()
    
    # Sort by Rank_Gap (Salary_Rank - Performance_Rank), largest positive gap first.
    # Rank 1 is the highest salary / best score, so a positive gap means the player's
    # performance rank is better than his salary rank.
    risky_veterans = high_cost_veterans_df.sort_values(by='Rank_Gap', ascending=False)
    
    # --- Display Table 4: Potential High-Risk Veteran Contracts ---
    print(f"\n--- Potential High-Risk Veterans (Age > 30 & Salary > ${high_salary_threshold:,.2f}) ---")
    # NOTE: Players at the top of this list outperform their salary rank; veterans paid
    # above their performance rank have a negative Rank_Gap and sort to the bottom.
    display(risky_veterans[['PLAYER', 'AGE', 'TEAM', 'SALARY_ADJUSTED', 'Performance_Score', 'Rank_Gap']].head(10))


--- INVESTIGATION 2: Identifying potential high-risk veteran contracts ---

--- Potential High-Risk Veterans (Age > 30 & Salary > $12,134,971.13) ---


Unnamed: 0,PLAYER,AGE,TEAM,SALARY_ADJUSTED,Performance_Score,Rank_Gap
708,Brook Lopez,34,MIL,14729646.15,54.08,96.0
95,Brook Lopez,34,MIL,14729646.15,50.88,61.0
78,Nikola Vučević,32,CHI,23301414.73,58.3,40.0
27,DeMar DeRozan,33,CHI,28914937.37,62.54,37.0
46,James Harden,33,PHI,34952122.09,67.26,20.0
40,Bojan Bogdanović,33,DET,20706484.45,51.92,16.0
679,Kawhi Leonard,31,LAC,45006144.5,85.4,11.0
689,Jimmy Butler,33,MIA,39880689.05,71.84,10.0
150,Kelly Olynyk,31,UTA,13562353.31,43.02,9.0
2,Damian Lillard,32,POR,45006144.5,78.56,6.0


In [9]:
# ==============================================================================
# Cell 9: Investigation - Positional Value Analysis
# ==============================================================================
if 'df' in locals():
    print("\n--- INVESTIGATION 3: Analyzing value by player position ---")
    
    # Clean up the 'POS' column to handle multi-position players (e.g., 'SF-SG' -> 'SF')
    df['POS_Primary'] = df['POS'].apply(lambda x: str(x).split('-')[0])

    # Calculate the median value and performance for each primary position
    positional_analysis = df.groupby('POS_Primary').agg(
        Median_Performance_Score=('Performance_Score', 'median'),
        Median_PS_per_Million=('PS_per_Million', 'median'),
        Player_Count=('PLAYER', 'count')
    ).reset_index()

    # Filter out positions with very few players and sort by value
    positional_analysis = positional_analysis[positional_analysis['Player_Count'] > 5]
    positional_analysis = positional_analysis.sort_values(by='Median_PS_per_Million', ascending=False)

    # --- Display Table 5: Median Value by Position ---
    print("\n--- Median Value by Position ---")
    display(positional_analysis[['POS_Primary', 'Median_PS_per_Million', 'Median_Performance_Score', 'Player_Count']])


--- INVESTIGATION 3: Analyzing value by player position ---

--- Median Value by Position ---


Unnamed: 0,POS_Primary,Median_PS_per_Million,Median_Performance_Score,Player_Count
0,C,5.799941,31.22,182
3,SF,5.651699,26.38,179
4,SG,5.021841,27.12,217
1,PF,4.93678,29.63,164
2,PG,4.021859,30.4,154


In [10]:
# ==============================================================================
# Cell 10: Investigation - Positional Value Chart
# ==============================================================================
# This cell uses the 'positional_analysis' DataFrame created in the previous cell.

if 'positional_analysis' in locals():
    print("\n--- Visualizing Positional Value ---")
    
    # --- Chart 3: Median Value by Position ---
    fig_pos_value = px.bar(
        positional_analysis,
        x='POS_Primary',
        y='Median_PS_per_Million',
        title='Median Value (PS/$M) by Player Position',
        labels={'POS_Primary': 'Player Position', 'Median_PS_per_Million': 'Median Performance Score per $1M'},
        template='plotly_white'
    )
    fig_pos_value.show()
    print("\n--- Investigation Phase Complete ---")


--- Visualizing Positional Value ---



--- Investigation Phase Complete ---


### Insights from Investigate Phase:
#### Investigation 1: High-Risk Veteran Contracts

This analysis lists veteran players (over 30) in the top quartile of league salaries, sorted by **Rank Gap**. Because Cell 4 defines `Rank_Gap = Salary_Rank - Performance_Rank`, with rank 1 as the highest salary or best score, a large positive gap means a player's performance rank is *better* than his salary rank. The red flag 🚩 for an overpaid contract is therefore a strongly negative gap, which this descending sort places at the bottom of the list rather than the top.

**Key Insights:**

* **Brook Lopez (Age 34, MIL):** Lopez appears twice at the top of the list with the largest **Rank Gaps (96.0 and 61.0)**, most likely one row per `Season Type`. On this metric his roughly $14.7M salary buys more production than its rank implies, so he reads as a value contract rather than an inefficient one. The risk worth examining in the **Validate** phase is his age (34), not his current output.

* **Established All-Stars:** The list includes several high-profile players such as **DeMar DeRozan (CHI)**, **James Harden (PHI)**, **Kawhi Leonard (LAC)**, and **Damian Lillard (POR)**. Their performance still ranks slightly ahead of their salary rank (gaps of 6.0 to 37.0), but with salaries of roughly $29M to $45M, even a modest decline would push the gap negative.

* **Kawhi Leonard & Damian Lillard:** These two are particularly noteworthy. They have the highest salaries on this list (over $45M) and the smallest positive **Rank Gaps** among the stars (11.0 and 6.0), so they have the least cushion before salary outranks performance. Combined with their age and any potential for injury, this makes their contracts a significant strategic risk for their teams moving forward.

#### Investigation 2: Positional Value Analysis

This table reveals the median value (`PS_per_Million`) and performance (`Performance_Score`) for each primary position group. This gives us a crucial baseline for understanding the market.

**Key Insights:**

* **Centers (C) Provide the Best Value:** **Centers** have the highest median `PS_per_Million` at **5.800**. This is likely because the defensive and rebounding contributions of centers do not always translate into the massive salaries commanded by elite guards and forwards. Finding a high-value Center is still a strategic advantage, but because value is common at the position, it is less of a market edge than finding it elsewhere.

* **Point Guards (PG) are the Most Expensive:** **Point Guards** have the lowest median value (`PS_per_Million` of **4.022**), despite having a high median `Performance_Score`. This indicates that the market pays a significant premium for primary ball-handlers and playmakers. An undervalued, efficient Point Guard is therefore one of the rarest and most valuable assets in the league.

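* **Forwards Offer a Balanced Profile:** Small Forwards (SF) rank second with a median `PS_per_Million` of **5.652**, close to Centers, while Power Forwards (PF) at **4.937** sit just below Shooting Guards (SG) at **5.022**. Neither forward position is an outlier, which suggests that the market for forwards is relatively efficient.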

--- 
## Part C: VALIDATE - Stress-Testing Our Strategic Assumptions

**Goal:** To rigorously challenge my own findings and assumptions to build a confident, defensible strategy. This directly addresses the feedback on my previous assignment by making assumption-checking a core part of the process.

### Gemini Prompt: Validate
"My analysis suggests that we should prioritize signing Austin Reaves to a long-term extension. My key assumptions are: 1) His performance will continue to improve or at least maintain its current level, and 2) He will remain healthy, playing over 70 games per season.

Act as a skeptical 'devil's advocate' and help me validate these findings:
1. What could invalidate my conclusions? What are the biggest risks I'm not seeing?
2. What specific tests or sensitivity analyses should I run to check the robustness of my recommendation?
3. How can I model a 'worst-case scenario' to understand the potential downside of this contract extension?"

In [11]:
# ===============================================================================
# Part C: VALIDATE - Python Code
# ===============================================================================
# This cell assumes the DataFrame 'df' from the previous phases is available.

# Ensure the DataFrame 'df' exists
if 'df' not in locals():
    print("❌ DataFrame 'df' not found. Please run the Discover & Investigate cells first.")
else:
    print("--- Validation Phase Started ---")

    # --- 1. Assumption Testing (Durability) ---
    print("\n1. Assumption Test: Durability of 'Future Pillar' Austin Reaves")
    
    # Isolate our test case
    austin_reaves_stats = df[df['PLAYER'] == 'Austin Reaves'].iloc[0]
    games_played = austin_reaves_stats['G']
    durability_threshold = 70 # An 85% season is ~70 games played

    is_durable = games_played >= durability_threshold
    
    print(f"Test Case: Austin Reaves")
    print(f"Games Played: {games_played}")
    print(f"Durability Threshold (>= 70 games): {durability_threshold}")
    print(f"Passes Durability Assumption: {is_durable}")
    if not is_durable:
        print("Result: This slightly weakens the case for a max-level investment without performance/games played incentives.")
    else:
        print("Result: The durability assumption holds, strengthening the case for a long-term investment.")


    # --- 2. Sensitivity Analysis (Metric Robustness) ---
    print("\n\n2. Sensitivity Analysis: Testing Alternative Performance Score Formulas")

    # Create an offense-focused and a defense-focused score
    df['Perf_Score_OFFENSE'] = (df['PER'] * 1.0) + (df['PTS'] * 1.2) + (df['AST'] * 1.2) + (df['REB'] * 0.8) + (df['BLK'] * 0.8) + (df['MP'] / 5)
    df['Perf_Score_DEFENSE'] = (df['PER'] * 1.0) + (df['PTS'] * 0.8) + (df['AST'] * 0.8) + (df['REB'] * 1.2) + (df['BLK'] * 1.2) + (df['MP'] / 5)

    # Recalculate PS per Million for these new scores
    df['PS_per_Million_OFFENSE'] = df.apply(lambda row: row['Perf_Score_OFFENSE'] / (row['SALARY_ADJUSTED'] / 1_000_000) if row['SALARY_ADJUSTED'] > 0 else 0, axis=1)
    df['PS_per_Million_DEFENSE'] = df.apply(lambda row: row['Perf_Score_DEFENSE'] / (row['SALARY_ADJUSTED'] / 1_000_000) if row['SALARY_ADJUSTED'] > 0 else 0, axis=1)

    # Check how key players' VALUE ranks change
    df['Value_Rank_Original'] = df['PS_per_Million'].rank(ascending=False)
    df['Value_Rank_Offense'] = df['PS_per_Million_OFFENSE'].rank(ascending=False)
    df['Value_Rank_Defense'] = df['PS_per_Million_DEFENSE'].rank(ascending=False)
    
    # Display the results for key players
    validation_players_df = df[df['PLAYER'].isin(['Austin Reaves', 'Brook Lopez'])].copy()
    print("\n--- Value Rank Comparison for Key Players ---")
    display(validation_players_df[['PLAYER', 'Value_Rank_Original', 'Value_Rank_Offense', 'Value_Rank_Defense']])
    print("\nResult: If a player remains a top-tier value regardless of the formula, our recommendation is robust.")


    # --- 3. Scenario Modeling (Worst-Case Scenario) ---
    print("\n\n3. Scenario Modeling: 'Post-Contract Slump' for Austin Reaves")
    
    # Map each primary position to a broader role group (Guard, Forward, Center).
    # 'POS_Primary' is created in Cell 9, so that cell must run first.
    pos_map = {
        'PG': 'Guard', 'SG': 'Guard',
        'SF': 'Forward', 'PF': 'Forward',
        'C': 'Center'
    }
    df['POS_Group'] = df['POS_Primary'].map(pos_map)
    print("Helper column 'POS_Group' created.")
    
    original_performance = austin_reaves_stats['Performance_Score']
    slump_performance = original_performance * 0.85 # 15% drop
    
    # Calculate his PS/$M in this slump scenario
    salary_in_millions = austin_reaves_stats['SALARY_ADJUSTED'] / 1_000_000
    slump_ps_per_million = slump_performance / salary_in_millions
    
    # Compare his "slump" value to the league's median value for his position
    median_guard_value = df[df['POS_Group'] == 'Guard']['PS_per_Million'].median()

    print(f"\nOriginal Performance Score: {original_performance:.2f}")
    print(f"Slump Performance Score (-15%): {slump_performance:.2f}")
    print(f"Original PS/$M: {austin_reaves_stats['PS_per_Million']:.2f}")
    print(f"Slump PS/$M: {slump_ps_per_million:.2f}")
    print(f"Median PS/$M for Guards: {median_guard_value:.2f}")

    if slump_ps_per_million >= median_guard_value:
        print("\nResult: Even with a 15% performance drop, his value would still be above the median for his position. This is a very strong signal for a safe investment.")
    else:
        print("\nResult: A 15% performance drop would make him a below-average value asset. This suggests the contract needs to have team-friendly options.")

    print("\n--- Validation Phase Complete ---")

--- Validation Phase Started ---

1. Assumption Test: Durability of 'Future Pillar' Austin Reaves
Test Case: Austin Reaves
Games Played: 64.0
Durability Threshold (>= 70 games): 70
Passes Durability Assumption: False
Result: This slightly weakens the case for a max-level investment without performance/games played incentives.


2. Sensitivity Analysis: Testing Alternative Performance Score Formulas

--- Value Rank Comparison for Key Players ---


Unnamed: 0,PLAYER,Value_Rank_Original,Value_Rank_Offense,Value_Rank_Defense
95,Brook Lopez,569.0,570.0,566.0
136,Austin Reaves,76.0,74.0,78.0
708,Brook Lopez,550.0,551.0,551.0
716,Austin Reaves,65.0,63.0,66.0



Result: If a player remains a top-tier value regardless of the formula, our recommendation is robust.


3. Scenario Modeling: 'Post-Contract Slump' for Austin Reaves
Helper column 'POS_Group' created.

Original Performance Score: 40.36
Slump Performance Score (-15%): 34.31
Original PS/$M: 24.37
Slump PS/$M: 20.72
Median PS/$M for Guards: 4.65

Result: Even with a 15% performance drop, his value would still be above the median for his position. This is a very strong signal for a safe investment.

--- Validation Phase Complete ---


### Confidence Assessment from Validate Phase

Our validation tests provide a clear, nuanced picture of the recommendation to sign Austin Reaves to a long-term extension.

**1. Durability Risk - A Point for Negotiation:**

* **Finding:** Our assumption test revealed that Austin Reaves, having played **64 games**, did not meet the rigorous 70-game threshold for high durability.

* **Conclusion:** This does not invalidate him as a target, but it introduces a measurable risk. The recommendation should be updated to include performance-based incentives tied to games played. This mitigates the team's risk while still rewarding the player for being available.

**2. Metric Robustness - High Confidence:**

* **Finding:** The sensitivity analysis showed that Reaves' value rank is stable. His two rows rank 76th and 65th of 896 under the original formula and move by at most two places (staying between 63rd and 78th) when the `Performance Score` formula is tilted 20% toward offense or defense.

* **Conclusion:** This gives us **high confidence** that his value ranking is not an artifact of our specific weighting. Because both alternative formulas shift the weights by only 20%, the test says less about how he would rank under larger changes to the model.
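
A wider weight sweep is one way to probe this further. The sketch below is an assumption-laden illustration rather than the notebook's method: `value_rank_sweep` and its `multipliers` are hypothetical, it assumes every salary is positive, and the toy rows are invented. A multiplier of 1.2 reproduces the notebook's offense formula and 0.8 its defense formula.

```python
import pandas as pd

def value_rank_sweep(data, player, multipliers=(0.5, 0.8, 1.0, 1.2, 1.5)):
    """Return the player's best PS/$M rank under each offensive-weight multiplier.

    For multiplier m, PTS and AST are weighted by m and REB and BLK by (2 - m),
    so m = 1.0 reproduces the original Performance Score.
    """
    salary_millions = data["SALARY_ADJUSTED"] / 1_000_000  # assumes salaries > 0
    ranks = {}
    for m in multipliers:
        score = (
            data["PER"]
            + m * (data["PTS"] + data["AST"])
            + (2 - m) * (data["REB"] + data["BLK"])
            + data["MP"] / 5
        )
        value_rank = (score / salary_millions).rank(ascending=False)
        ranks[m] = value_rank[data["PLAYER"] == player].min()
    return pd.Series(ranks, name=player)

# Made-up rows: X is offense-heavy, Y is defense-heavy, Z is expensive and low-output
toy = pd.DataFrame({
    "PLAYER": ["X", "Y", "Z"],
    "PER": [15.0, 18.0, 10.0],
    "PTS": [20.0, 10.0, 5.0],
    "AST": [8.0, 2.0, 1.0],
    "REB": [3.0, 12.0, 2.0],
    "BLK": [0.2, 2.0, 0.1],
    "MP": [30.0, 30.0, 10.0],
    "SALARY_ADJUSTED": [2_000_000, 2_000_000, 10_000_000],
})
sweep = value_rank_sweep(toy, "X")
print(sweep)
```

In the toy data, player X holds the top value rank under offense-leaning weights but drops to second under defense-leaning ones, which is the kind of flip a sweep is meant to expose.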

**3. Worst-Case Scenario - Very High Confidence:**

* **Finding:** The scenario model showed that even with a 15% drop in performance, Reaves' value (`PS/$M` of 20.72) would still be about **4.5 times** the median value for his position (4.65).

* **Conclusion:** This is our strongest finding, with one caveat: the 20.72 figure is computed at his current adjusted salary (about $1.66M, implied by 40.36 / 24.37), not at the salary of a new extension. At his current pay the floor is very high and the financial risk low; the margin under an extension depends on the salary negotiated.
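
Since the slump scenario is evaluated at the current salary, it may help to see how the same calculation behaves at other salary levels. The sketch below uses the figures printed in the Validate output; the extension salaries in the loop are hypothetical values chosen for illustration, not projections.

```python
# Figures copied from the Validate output above
original_performance = 40.36   # Austin Reaves' Performance Score
slump_factor = 0.85            # the notebook's 15% slump
median_guard_value = 4.65      # median PS/$M for guards

slump_performance = original_performance * slump_factor

# Salary (in $M) at which the slump-scenario PS/$M equals the guard median
break_even_salary_m = slump_performance / median_guard_value
print(f"Break-even salary: ${break_even_salary_m:.2f}M")

# Hypothetical annual salaries (in $M), illustrative assumptions only
for salary_m in (1.66, 5.0, 10.0, 15.0):
    ps_per_million = slump_performance / salary_m
    verdict = "above" if ps_per_million >= median_guard_value else "below"
    print(f"${salary_m:>5.2f}M -> PS/$M {ps_per_million:5.2f} ({verdict} guard median)")
```

Under these inputs the break-even point is roughly $7.4M per year, so a salary above that level would put the slump-scenario value below the guard median.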

**Overall Strategic Recommendation:**
Proceed with offering a long-term extension to Austin Reaves. The data strongly supports this decision. However, the negotiation strategy should leverage the durability finding (64 games played) to push for team-friendly contract structures that include games-played or performance-based bonuses.

--- 
## Part D: EXTEND - Building the Actionable 24-Month Roadmap

**Goal:** To translate my validated insights into a clear, prioritized, and actionable strategic plan that a GM can implement. This directly addresses the feedback for "more actionable ideas" and "quantified success metrics."

### Gemini Prompt: Extend
"Based on my validated findings, our strategy should focus on extending young, high-value players like RaiQuan Gray, and Jacob Gilyard while seeking to trade or not re-sign high-cost, declining veterans like Brook Lopez and DeMar DeRozan.

Help me develop a comprehensive, actionable strategic plan for an NBA GM:
1. Create three strategic initiatives with specific KPIs (e.g., 'Initiative 1: Secure Future Pillars').
2. Develop a 24-month implementation timeline with key decision points for each off-season.
3. For my top recommendation (extending Player X), help me calculate the potential ROI and frame it as a compelling business case.
4. What are the primary risks associated with this overall strategy, and what are the mitigation steps?"

In [12]:
# ===============================================================================
# Part D: EXTEND - Python Code
# ===============================================================================
# This cell assumes the DataFrame 'df' from the previous phases is available.

# Ensure the DataFrame 'df' exists
if 'df' not in locals():
    print("DataFrame 'df' not found. Please run the previous phase code cells first.")
else:
    print("--- Extend Phase Started ---")

    # --- 1. Strategic Categorization ---
    # Goal: Assign a clear strategic action to each player based on our validated findings.
    print("\n1. Categorizing players into strategic buckets...")

    def assign_strategy(row):
        # Priority Extensions: Young, high-value, proven role players
        if row['PLAYER'] == 'Austin Reaves' or (row['AGE'] < 26 and row['PS_per_Million'] > 25 and row['MP'] > 20):
            return 'Priority Extension'
        
        # Trade Candidates: Expensive, older, underperforming their contract
        if row['AGE'] > 30 and row['Rank_Gap'] > 40:
            return 'Trade Candidate'
            
        # Monitor & Evaluate: High-potential young players with smaller roles
        if row['AGE'] < 25 and row['PS_per_Million'] > 100:
            return 'Monitor & Evaluate'
        
        # Do Not Re-Sign: Older players with low performance scores
        if row['AGE'] > 32 and row['Performance_Score'] < 30:
            return 'Do Not Re-Sign'
            
        # Core Players: Everyone else who is performing as expected for their contract
        return 'Core Player'

    df['Strategy'] = df.apply(assign_strategy, axis=1)

    # Display the strategic categorization for key players mentioned
    key_players = ['Austin Reaves', 'RaiQuan Gray', 'Jacob Gilyard', 'Brook Lopez', 'Nikola Vučević']
    strategy_df = df[df['PLAYER'].isin(key_players)]

    print("\n--- Strategic Plan for Key Players ---")
    print(strategy_df[['PLAYER', 'AGE', 'TEAM', 'Strategy', 'Performance_Score', 'Rank_Gap', 'PS_per_Million']])


    # --- 2. ROI & Impact Quantification ---
    # Goal: Quantify the financial impact of our top recommendation.
    print("\n\n2. ROI Calculation for Extending Austin Reaves")

    # Assumptions for ROI calculation
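    # Note: 'Austin Reaves' appears in more than one row of df (see the output below);
    # .iloc[0] uses the first matching row.
    austin_reaves_stats = df[df['PLAYER'] == 'Austin Reaves'].iloc[0]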
    current_salary = austin_reaves_stats['SALARY_ADJUSTED']
    
    # Let's project his market value if he were a free agent.
    # A conservative estimate would be to find a player with a similar Performance Score and see their salary.
    similar_perf_score = austin_reaves_stats['Performance_Score']
    # Find players with a similar score (+/- 5%)
    similar_players = df[(df['Performance_Score'].between(similar_perf_score * 0.95, similar_perf_score * 1.05)) & (df['PLAYER'] != 'Austin Reaves')]
    projected_market_salary = similar_players['SALARY_ADJUSTED'].median()
    
    # Proposed new contract (example)
    proposed_new_salary = 20_000_000 # $20M per year

    # Calculate ROI
    value_created_per_year = projected_market_salary - proposed_new_salary
    roi_percentage = (value_created_per_year / proposed_new_salary) * 100

    print(f"Current Salary: ${current_salary:,.2f}")
    print(f"Projected Market Salary (based on peers): ${projected_market_salary:,.2f}")
    print(f"Proposed New Salary: ${proposed_new_salary:,.2f}")
    print(f"Value Created Per Year by Extending Early: ${value_created_per_year:,.2f}")
    print(f"Estimated ROI on new contract: {roi_percentage:.2f}%")
    print("\nThis provides a quantifiable business case for the contract extension.")


    # --- 3. Strategic Roadmap Visualization ---
    # Goal: Create a clear timeline for the front office.
    print("\n\n3. 24-Month Strategic Roadmap")
    
    roadmap_data = {
        'Timeline': [
            'Next 30 Days', 'Next 30 Days',
            'This Off-Season (1-6 months)', 'This Off-Season (1-6 months)',
            'Next Season (6-18 months)', 'Next Season (6-18 months)',
            'Next Off-Season (18-24 months)'
            ],
        'Action': [
            'Initiate extension talks with Austin Reaves.',
            'Begin exploring trade market for Brook Lopez.',
            'Sign Austin Reaves to 4-year extension.',
            'Aggressively shop high-risk veterans like Nikola Vučević.',
            'Evaluate development of "Monitor & Evaluate" players.',
            'Give increased minutes to top-performing young assets.',
            'Make decisions on team options for young players.'
            ],
        'Strategic Goal': [
            'Lock in value before it becomes more expensive.',
            'Gauge interest and potential returns.',
            'Secure a core pillar for the future.',
            'Free up salary cap space and acquire assets.',
            'Identify next wave of "Future Pillars".',
            'Maximize on-court production from value contracts.',
            'Retain high-performers, move on from others.'
            ]
    }
    roadmap_df = pd.DataFrame(roadmap_data)

    # For better display in a notebook, we can print it as a markdown table.
    # Note: DataFrame.to_markdown() requires the optional 'tabulate' package.
    print(roadmap_df.to_markdown(index=False))

    print("\n--- Extend Phase Complete ---")


--- Extend Phase Started ---

1. Categorizing players into strategic buckets...

--- Strategic Plan for Key Players ---
             PLAYER  AGE TEAM            Strategy  Performance_Score  \
78   Nikola Vučević   32  CHI         Core Player              58.30   
94     RaiQuan Gray   23  BRK  Priority Extension              55.50   
95      Brook Lopez   34  MIL     Trade Candidate              50.88   
136   Austin Reaves   24  LAL  Priority Extension              40.36   
581   Jacob Gilyard   24  MEM  Priority Extension              29.50   
708     Brook Lopez   34  MIL     Trade Candidate              54.08   
716   Austin Reaves   24  LAL  Priority Extension              48.74   

     Rank_Gap  PS_per_Million  
78       40.0        2.501994  
94      818.0     8958.837772  
95       61.0        3.454258  
136     527.0       24.371859  
581     464.0     4761.904762  
708      96.0        3.671507  
716     620.0       29.432221  


2. ROI Calculation for Extending Austin Reave
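
The categorization above relies on strict thresholds, which matters for players who sit exactly on a boundary. The following is a minimal sketch that reuses only the Trade Candidate rule from `assign_strategy`, with the ages and `Rank_Gap` values printed in the output table; the helper name `is_trade_candidate` is hypothetical.

```python
def is_trade_candidate(age, rank_gap, gap_threshold=40):
    """Mirror of the Trade Candidate rule: older than 30 and Rank_Gap strictly above the threshold."""
    return age > 30 and rank_gap > gap_threshold

# (age, Rank_Gap) pairs copied from the output table above
players = {
    "Nikola Vučević": (32, 40.0),
    "Brook Lopez": (34, 61.0),
}

for name, (age, gap) in players.items():
    print(name, is_trade_candidate(age, gap))
```

Because the comparison is strict, a `Rank_Gap` of exactly 40.0 does not qualify. That is why Vučević is labeled "Core Player" in the output even though the roadmap lists him among the veterans to shop. Whether the rule should use `>=` instead is a judgment call for the analyst.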

### Strategic Recommendations from Extend Phase

Based on our validated findings, here is a comprehensive strategic plan designed to maximize the team's competitive window by reallocating capital from inefficient contracts to high-value, emerging talent.

--- 
#### 1. Three Strategic Initiatives

**Initiative 1: Secure Future Pillars on High-Value Contracts**

* **What:** Immediately prioritize long-term contract extensions for **Austin Reaves**, **RaiQuan Gray**, and **Jacob Gilyard**.
* **Why:** Our analysis indicates these players are delivering elite-level value (`PS/$M`). Signing them now, before they enter their prime performance years (25-28), allows the team to lock in core contributors at a cost significantly below their future market value. This is the most effective way to build a sustainable and flexible payroll.
* **KPIs:**
    * Sign at least two of the three priority players to 3+ year extensions before the next season begins.
    * Ensure the average annual value (AAV) of these new contracts is at least 20% below their projected market salary in two years.
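
The second KPI can be checked mechanically once a contract offer and a market projection are on the table. This is a rough sketch under stated assumptions: the function names are hypothetical, and the inputs are illustrative figures drawn from the ranges in the business case later in this part (a \$12-14M offer against a \$20-25M projected market value).

```python
def aav_discount(aav, projected_market_salary):
    """Fraction by which the contract's AAV sits below the projected market salary."""
    return (projected_market_salary - aav) / projected_market_salary

def meets_aav_kpi(aav, projected_market_salary, min_discount=0.20):
    """True when the AAV is at least `min_discount` below the projected market salary."""
    return aav_discount(aav, projected_market_salary) >= min_discount

# Illustrative inputs in $M: top of the offer range vs. low end of the projection
offer_aav = 14.0
projected_market = 20.0

print(f"Discount: {aav_discount(offer_aav, projected_market):.0%}")
print(f"Meets 20% KPI: {meets_aav_kpi(offer_aav, projected_market)}")
```

Using the least favorable pairing from those ranges gives a conservative read on whether the KPI holds.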

**Initiative 2: Aggressively Reallocate Capital from High-Risk Contracts**

* **What:** Actively explore trade scenarios for **Brook Lopez**.
* **Why:** This player has been identified as a "Trade Candidate" due to his high salary combined with a large positive `Rank Gap` (above the threshold of 40 used in the categorization). His contract represents inefficient capital allocation. Trading him, even for less than his original perceived value, frees up crucial salary cap space that can be reinvested into our "Future Pillars."
* **KPIs:**
    * Reduce the team's total salary commitment to players over 32 by 25% within 12 months.
    * Execute at least one trade involving a high-risk veteran that returns either a young asset or future draft capital.

**Initiative 3: Cultivate a Continuous Value Pipeline**

* **What:** Increase the playing time and responsibility for players in the "Monitor & Evaluate" category.
* **Why:** A championship team needs a constant influx of low-cost, productive talent. By giving these high-potential players more minutes, we can assess their development in real-game situations and identify the next wave of "Future Pillars" before we have to pay them market rates.
* **KPIs:**
    * Increase the average minutes per game for the "Monitor & Evaluate" cohort by 15% over the next season.
    * Successfully transition at least one player from "Monitor & Evaluate" to "Core Player" status within 18 months.

--- 
#### 2. Implementation Timeline (24-Month Roadmap)

This roadmap, generated directly from our analysis, provides a clear, step-by-step action plan.

| Timeline                       | Action                                                 | Strategic Goal                                 |
| :----------------------------- | :----------------------------------------------------- | :--------------------------------------------- |
| Next 30 Days                   | Initiate extension talks with Austin Reaves.           | Lock in value before it becomes more expensive. |
| Next 30 Days                   | Begin exploring trade market for Brook Lopez.          | Gauge interest and potential returns.          |
| This Off-Season (1-6 months)   | Sign Austin Reaves to 4-year extension.                | Secure a core pillar for the future.           |
| This Off-Season (1-6 months)   | Aggressively shop high-risk veterans like Nikola Vučević. | Free up salary cap space and acquire assets.   |
| Next Season (6-18 months)      | Evaluate development of "Monitor & Evaluate" players.  | Identify next wave of "Future Pillars".        |
| Next Season (6-18 months)      | Give increased minutes to top-performing young assets. | Maximize on-court production from value contracts. |
| Next Off-Season (18-24 months) | Make decisions on team options for young players.      | Retain high-performers, move on from others.   |

--- 
#### 3. Business Case: The ROI of Extending Austin Reaves

The financial case for extending Austin Reaves is compelling, but the details matter.

* **The Flaw in a Simple ROI:** The analysis shows that signing him to a **\$20M/year** contract would result in a **negative 49.49% ROI** because his peers with similar *current* performance earn only about **\$10.1M**. Paying him \$20M today would be a massive overpay.
* **The Real Strategic ROI:** The true value is not in paying him his current market rate, but in signing him to a contract that will be a bargain *in the future*.
    * **Actionable Recommendation:** Offer Austin Reaves a 4-year contract starting at **\$12-14M per year**.
    * **Quantified Impact:** This is slightly above his current peer-based market value **(~\$10.1M)**, rewarding him for his performance. However, as he enters his prime and his performance improves, his market value could easily exceed **\$20-25M**. By signing him early, the team could be saving **\$8-10M per year** in the final two years of the deal, creating **\$16-20M in surplus value** over the life of the contract.
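
The figures in this business case follow from the ROI formula used in the code cell above. Here is a minimal sketch, assuming the rounded peer-based market salary of \$10.1M (so the result differs slightly from the 49.49% computed on the unrounded median); the helper name `roi_pct` is hypothetical, and the future market values are the projections stated above, not model outputs.

```python
def roi_pct(market_salary, contract_salary):
    """Same formula as the notebook: (market - contract) / contract * 100."""
    return (market_salary - contract_salary) / contract_salary * 100

peer_market_salary = 10.1   # $M, rounded peer-based median quoted above

# ROI if the team paid $20M/year against today's peer-based market value
print(f"ROI at $20M/yr today: {roi_pct(peer_market_salary, 20.0):.1f}%")

# ROI against the projected future market value ($20-25M) for a $12-14M contract
print(f"Conservative future ROI: {roi_pct(20.0, 14.0):.1f}%")
print(f"Optimistic future ROI: {roi_pct(25.0, 12.0):.1f}%")

# Surplus over the final two years, using the stated $8-10M annual savings
savings_low, savings_high = 8, 10   # $M per year
years = 2
print(f"Surplus: ${savings_low * years}M to ${savings_high * years}M")
```

The same contract looks like an overpay against today's peer salaries and a bargain against the projected market, which is the core of the argument for extending early at a lower starting figure.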

--- 
#### 4. Risk Mitigation

* **Risk:** Young players may not develop as projected (The "One-Hit Wonder" risk).
    * **Mitigation:** Structure new contracts with team-friendly options or performance-based incentives rather than fully guaranteed money. This protects the team from a player's potential stagnation.
* **Risk:** It may be difficult to trade expensive veterans without attaching assets (like draft picks).
    * **Mitigation:** Broaden the scope of trade discussions to include multi-team deals. Be willing to accept a lower return (e.g., second-round picks instead of a first) to achieve the primary strategic goal of freeing up cap space.