# Part 2: Exploratory Data Analysis

This notebook explores the statistical properties of the constructed player–game
dataset and evaluates whether simple contextual signals appear informative
before any modeling is introduced.

Specifically, we investigate:

1. The distribution of player scoring relative to season averages
2. Whether recent performance streaks are associated with over/under outcomes
3. Whether opponent defensive context (points allowed) meaningfully shifts
   over probabilities

This analysis is descriptive: it is intended to validate assumptions and guide downstream experimentation, not to produce predictive models.

We start by loading the analysis dataset:

from sqlalchemy import create_engine
import pandas as pd

# Database connection
engine = create_engine(
    "postgresql+psycopg2://admin:admin@localhost:5433/nba_db"
)

# Read the dataset-construction query from the SQL file
with open("sql/dataset_construction.sql", "r") as f:
    dataset_query = f.read()

# Run the query and load the result into a DataFrame
analysis_dataset = pd.read_sql(dataset_query, engine)

# Preview one player's rows
jamal_murray = analysis_dataset[analysis_dataset["clean_name"] == "jamal murray"]

jamal_murray.head()


| | game_id | game_date | player_id | player_name | clean_name | team_id | pts | season_avg_pts | deviation | over_flag | team_abbreviation | opp_pts_allowed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6980 | 22400075 | 2024-10-24 | 1627750 | Jamal Murray | jamal murray | 1610612743 | 12 | | | 0 | OKC | |
| 6981 | 22400087 | 2024-10-26 | 1627750 | Jamal Murray | jamal murray | 1610612743 | 22 | 12.0 | 10.0 | 1 | LAC | 102.0 |
| 6982 | 22400107 | 2024-10-28 | 1627750 | Jamal Murray | jamal murray | 1610612743 | 17 | 17.0 | 0.0 | 0 | TOR | 105.5 |
| 6983 | 22400113 | 2024-10-29 | 1627750 | Jamal Murray | jamal murray | 1610612743 | 24 | 17.0 | 7.0 | 1 | BKN | 112.0 |
| 6984 | 22400139 | 2024-11-01 | 1627750 | Jamal Murray | jamal murray | 1610612743 | 6 | 18.75 | -12.75 | 0 | MIN | 118.75 |
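
The preview rows are consistent with `season_avg_pts` being the mean of the player's points in prior games only (hence the missing value in the first game), `deviation` being `pts - season_avg_pts`, and `over_flag` being 1 when the deviation is strictly positive. The actual definitions live in `sql/dataset_construction.sql`, which is not shown here, so the following is only a sketch that reproduces the five rows above under that assumption. The helper name `add_deviation_columns` is hypothetical.

```python
import pandas as pd

# The five preview rows for one player (dates and points copied from the output above).
games = pd.DataFrame({
    "game_date": pd.to_datetime(
        ["2024-10-24", "2024-10-26", "2024-10-28", "2024-10-29", "2024-11-01"]
    ),
    "pts": [12, 22, 17, 24, 6],
})


def add_deviation_columns(df):
    """Recompute the derived columns for a single player's games.

    Assumption (inferred from the preview, not from the SQL): the season
    average uses prior games only, so the current game never leaks in.
    """
    out = df.sort_values("game_date").copy()
    # Expanding mean of all games so far, shifted by one game to exclude the current one.
    out["season_avg_pts"] = out["pts"].expanding().mean().shift(1)
    out["deviation"] = out["pts"] - out["season_avg_pts"]
    # A missing deviation compares as False, so the first game gets over_flag = 0.
    out["over_flag"] = (out["deviation"] > 0).astype(int)
    return out


recomputed = add_deviation_columns(games)
print(recomputed)
```

For the full dataset, the same logic would need to be applied per player (and per season), for example inside a `groupby`.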


## Question 1
### How does player scoring evolve relative to season averages?

Before examining streaks or conditional effects, we first explore how a player's game-by-game scoring compares to their own season average over time.

This helps establish:
- Whether season averages stabilize as expected
- The scale and variability of deviations
- Whether extreme deviations are common or rare
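
One way to put numbers on the scale and rarity of deviations is a small summary of the `deviation` column. The sketch below assumes the column layout shown in the preview. The function name `summarize_deviation` and the 10-point cutoff for an "extreme" game are illustrative choices, not values taken from the analysis.

```python
import pandas as pd


def summarize_deviation(df, extreme_threshold=10.0):
    """Summarize the scale of deviations and how often they are extreme.

    extreme_threshold is a hypothetical cutoff in points.
    """
    # Drop games with no prior season average (e.g. a player's first game).
    dev = df["deviation"].dropna()
    return {
        "n_games": int(dev.size),
        "mean_deviation": float(dev.mean()),
        "std_deviation": float(dev.std()),
        # Share of games whose absolute deviation meets or exceeds the cutoff.
        "share_extreme": float((dev.abs() >= extreme_threshold).mean()),
    }


# Deviations from the five preview rows; the first game has no prior average.
sample = pd.DataFrame({"deviation": [float("nan"), 10.0, 0.0, 7.0, -12.75]})
summary = summarize_deviation(sample)
print(summary)
```

On the real data this would be called as `summarize_deviation(analysis_dataset)` or on a single player's rows such as `jamal_murray`.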