In [3]:
import pandas as pd

# 📌 Step 1: Load Dataset
file_path = "/users/tonyngo/Downloads/nba-forecasts/nba_elo.csv"  # Adjust path if needed
df = pd.read_csv(file_path)

# 📌 Step 2: Select Relevant Columns
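# .copy() gives an independent frame, so the column assignment in Step 3
# cannot trigger pandas' SettingWithCopyWarning
columns = ['date', 'season', 'team1', 'team2', 'elo1_pre', 'elo2_pre',
           'elo_prob1', 'elo_prob2', 'score1', 'score2']
df = df[columns].copy()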

# 📌 Step 3: Convert 'date' to Datetime Format
df['date'] = pd.to_datetime(df['date'])

# 📌 Step 4: Display Overview as Table
print("\n📊 Dataset Overview:")
print(df.head().to_string(index=False))  # Displays without index for better readability

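# 📌 Step 5: Basic Analysis - Average Scores Per Team
# Note: grouping on 'team1' averages only the games in which a team is listed first;
# points scored as 'team2' are not included in these averages.
avg_scores = df.groupby('team1')[['score1']].mean().rename(columns={'score1': 'avg_score'}).reset_index()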

# Display average scores as a table
print("\n📈 Average Scores Per Team:")
print(avg_scores.to_string(index=False))  # Displays without index for better readability


📊 Dataset Overview:
      date  season team1 team2  elo1_pre  elo2_pre  elo_prob1  elo_prob2  score1  score2
1946-11-01    1947   TRH   NYK    1300.0 1300.0000   0.640065   0.359935      66      68
1946-11-02    1947   PRO   BOS    1300.0 1300.0000   0.640065   0.359935      59      53
1946-11-02    1947   STB   PIT    1300.0 1300.0000   0.640065   0.359935      56      51
1946-11-02    1947   CHS   NYK    1300.0 1306.7233   0.631101   0.368899      63      47
1946-11-02    1947   DTF   WSC    1300.0 1300.0000   0.640065   0.359935      33      50

📈 Average Scores Per Team:
team1  avg_score
  ANA 112.000000
  AND  88.631579
  ATL 105.375686
  BAL 114.903292
  BLB  82.100719
  BOS 107.061782
  BRK 106.432836
  BUF 106.875723
  CAP 101.954545
  CAR 113.575221
  CHH 101.939551
  CHI 103.385292
  CHO 101.319790
  CHP 112.129630
  CHS  78.075000
  CHZ 111.920000
  CIN 115.408537
  CLE 103.015487
  CLR  73.935484
  DAL 105.158782
  DEN 110.393819
  DET 105.194669
  DLC 114.617778
  DNA 125

# NBA Elo Ratings Dataset Analysis

## 📌 Overview

This project analyzes the NBA Elo Ratings dataset, which contains historical game data, Elo ratings, and win probabilities for teams. The dataset is useful for evaluating team performance trends, predicting outcomes, and exploring advanced analytics in basketball.

## 📊 Dataset Summary

**Data Source:** FiveThirtyEight’s NBA Elo Ratings dataset

**Key Columns:**

- `date`: Game date
- `season`: NBA season, labeled by the year in which it ends (the 1946-11-01 game belongs to season 1947)
- `team1`, `team2`: Competing teams
- `elo1_pre`, `elo2_pre`: Pre-game Elo ratings for both teams
- `elo_prob1`, `elo_prob2`: Win probabilities based on Elo ratings
- `score1`, `score2`: Final game scores
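
As a quick sanity check on these columns, the sketch below loads only the key columns, parses `date` while reading, and confirms that the two win probabilities of each game sum to 1. It is a minimal example under stated assumptions: the inline sample holds just the five rows printed in the overview, and `load_games` and `KEY_COLUMNS` are hypothetical names introduced for illustration.

```python
import io

import pandas as pd

# Five rows copied from the overview table; in practice, pass the path to the real CSV.
SAMPLE_CSV = """date,season,team1,team2,elo1_pre,elo2_pre,elo_prob1,elo_prob2,score1,score2
1946-11-01,1947,TRH,NYK,1300.0,1300.0,0.640065,0.359935,66,68
1946-11-02,1947,PRO,BOS,1300.0,1300.0,0.640065,0.359935,59,53
1946-11-02,1947,STB,PIT,1300.0,1300.0,0.640065,0.359935,56,51
1946-11-02,1947,CHS,NYK,1300.0,1306.7233,0.631101,0.368899,63,47
1946-11-02,1947,DTF,WSC,1300.0,1300.0,0.640065,0.359935,33,50
"""

KEY_COLUMNS = ["date", "season", "team1", "team2", "elo1_pre", "elo2_pre",
               "elo_prob1", "elo_prob2", "score1", "score2"]


def load_games(source):
    """Read only the key columns and parse 'date' as a datetime while loading."""
    return pd.read_csv(source, usecols=KEY_COLUMNS, parse_dates=["date"])


games = load_games(io.StringIO(SAMPLE_CSV))

# Both probabilities describe the same game, so they should sum to 1.
prob_sum_ok = ((games["elo_prob1"] + games["elo_prob2"] - 1).abs() < 1e-6).all()

print(games.dtypes)
print("Probabilities sum to 1:", prob_sum_ok)
```

Reading with `usecols` and `parse_dates` does the column selection and date conversion in one step, which also avoids assigning into a sliced DataFrame afterwards.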
## 🔍 Key Insights

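### Elo Ratings as Performance Indicators

- Win probabilities follow the pre-game rating gap: the higher `elo1_pre` is relative to `elo2_pre`, the higher `elo_prob1`. The gap is not the only input, though. In the overview rows, two teams rated 1300 each still give `elo_prob1` = 0.640, which suggests a built-in edge for `team1` (likely home court).
- Because ratings are updated from game results, a team's rating rises during winning streaks and falls during losing streaks.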
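
The relationship between ratings and win probability can be sketched with the standard Elo expected-score formula. The function name `elo_win_prob` and the 100-point bonus for `team1` are assumptions: the bonus is inferred from the overview rows (equal ratings giving 0.640), not taken from the dataset's documentation.

```python
def elo_win_prob(elo1, elo2, home_advantage=100.0):
    """Standard Elo expected score for team1.

    home_advantage is an ASSUMED rating bonus for team1, inferred from the
    overview rows rather than from the dataset's documentation.
    """
    diff = elo1 - elo2 + home_advantage
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Ratings taken from the first and fourth rows of the overview table.
p_equal = elo_win_prob(1300.0, 1300.0)
p_lower_rated = elo_win_prob(1300.0, 1306.7233)

# Compare these with the elo_prob1 values printed for the same rows.
print(round(p_equal, 6), round(p_lower_rated, 6))
```

If the printed values match `elo_prob1` for those rows, the assumed bonus is consistent with the data shown; it does not prove how the published probabilities were actually computed.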
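### Score Trends

- The dataset allows calculation of average points per team from `score1` and `score2`. Grouping by `season` as well as team keeps eras with different scoring levels apart, which makes the averages a fairer indicator of offensive strength.
- The margin `score1 - score2` separates close games from blowouts.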
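
A minimal sketch of both ideas follows, using the five games from the overview as inline sample data. It averages each team's points across both the `team1` and `team2` columns and labels games by margin. The thresholds `CLOSE_MAX` and `BLOWOUT_MIN` and the `game_type` labels are illustrative assumptions, not definitions from the dataset.

```python
import io

import pandas as pd

# Teams and scores copied from the five overview rows.
SAMPLE_CSV = """team1,team2,score1,score2
TRH,NYK,66,68
PRO,BOS,59,53
STB,PIT,56,51
CHS,NYK,63,47
DTF,WSC,33,50
"""
games = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Stack both sides of every game so a team's points count in either column.
side1 = games[["team1", "score1"]].rename(columns={"team1": "team", "score1": "score"})
side2 = games[["team2", "score2"]].rename(columns={"team2": "team", "score2": "score"})
avg_scores = (
    pd.concat([side1, side2], ignore_index=True)
    .groupby("team")["score"]
    .mean()
    .rename("avg_score")
    .reset_index()
)

# Margin from team1's point of view; the thresholds below are assumptions.
CLOSE_MAX, BLOWOUT_MIN = 5, 15
games["margin"] = games["score1"] - games["score2"]
abs_margin = games["margin"].abs()
games["game_type"] = "regular"
games.loc[abs_margin <= CLOSE_MAX, "game_type"] = "close"
games.loc[abs_margin >= BLOWOUT_MIN, "game_type"] = "blowout"

print(avg_scores.to_string(index=False))
print(games.to_string(index=False))
```

On the full dataset, adding `season` to the stacked frame and to the `groupby` keys would give per-season averages instead of all-time ones.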
### Predictive Potential

- The Elo probabilities (`elo_prob1`, `elo_prob2`) can be checked directly against actual results (`score1` vs. `score2`). If they track outcomes well, they make a strong baseline for predictive modeling.
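
One way to check how well the probabilities match results is to compare `elo_prob1` with the actual winner. The sketch below computes pick accuracy and the Brier score (mean squared error of the probability) on the five overview rows. The column name `team1_won` is introduced for illustration, and five games are far too few to draw conclusions; the same code would need to run on the full dataset.

```python
import io

import pandas as pd

# Probabilities and scores copied from the five overview rows.
SAMPLE_CSV = """elo_prob1,score1,score2
0.640065,66,68
0.640065,59,53
0.640065,56,51
0.631101,63,47
0.640065,33,50
"""
games = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1 if team1 won, 0 otherwise.
games["team1_won"] = (games["score1"] > games["score2"]).astype(int)

# Accuracy: how often the team favored by Elo (probability above 0.5) actually won.
predicted_team1 = (games["elo_prob1"] > 0.5).astype(int)
accuracy = (predicted_team1 == games["team1_won"]).mean()

# Brier score: mean squared gap between probability and outcome (lower is better).
brier = ((games["elo_prob1"] - games["team1_won"]) ** 2).mean()

print(f"accuracy={accuracy:.3f}  brier={brier:.4f}")
```

A constant 50% forecast scores a Brier value of 0.25, so that is a useful reference point when judging the result on the full data.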
## 📈 Next Steps

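- **Advanced Metrics:** Add analysis of Win Shares, PER, and Offensive and Defensive Ratings. These rely on box-score statistics that are not among the columns used here, so an additional data source is needed.
- **Visualization:** Create interactive plots to track Elo trends over time.
- **Machine Learning:** Build a predictive model to estimate game outcomes.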