# Sports Betting Data Analysis

## Introduction
This notebook analyzes historical game statistics and betting odds to develop predictive models for sports betting. We clean both datasets, merge them on their shared `GAME_ID` column, and then engineer features from the combined data for analysis.

## Import Libraries

In [None]:
import pandas as pd
import numpy as np

## Load Datasets
We load the game statistics and betting odds CSV files into pandas DataFrames.

In [None]:
game_stats_path = 'datasets/Game.csv'
betting_odds_path = 'datasets/BettingOdds_History.csv'

game_stats = pd.read_csv(game_stats_path)
betting_odds = pd.read_csv(betting_odds_path)
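Before cleaning, it can help to see how much data each column is missing. The sketch below defines `missing_summary`, a hypothetical helper that is not part of the original notebook, which reports the fraction of missing values per column, most-missing first. It runs on a small made-up frame; `PTS_HOME` and `PTS_AWAY` are placeholder column names, not necessarily columns of `Game.csv`.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fraction of missing values per column, most-missing first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy data standing in for a loaded CSV; column names and values are made up
toy = pd.DataFrame({
    'GAME_ID': [1, 2, 3, 4],
    'PTS_HOME': [101.0, np.nan, 95.0, np.nan],
    'PTS_AWAY': [99.0, 104.0, np.nan, 110.0],
})

summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(game_stats).head(10)` would list the ten sparsest columns, which is a quick way to judge whether the 20% row threshold and mean imputation below are reasonable.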

## Preprocessing and Cleaning

### Handling Missing Values in Game Statistics Data

We start with the game statistics. Rows without a `GAME_ID` cannot be matched to betting odds, so they are dropped. Rows in which more than 20% of the values are missing are also dropped, and missing entries in numeric columns are filled with the column mean.

In [None]:
# Handle missing values in game stats data
game_stats.dropna(subset=['GAME_ID'], inplace=True)

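In [None]:
# Drop rows with more than 20% missing values. This runs before imputation;
# otherwise the mean-filled numeric columns would hide their missingness.
min_non_null = int(np.ceil(0.8 * game_stats.shape[1]))
game_stats.dropna(thresh=min_non_null, inplace=True)

In [None]:
# Fill remaining missing numeric values with the column mean
numeric_cols = game_stats.select_dtypes(include=[np.number]).columns
game_stats[numeric_cols] = game_stats[numeric_cols].fillna(game_stats[numeric_cols].mean())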
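Because mean imputation fills every numeric gap, the order of the imputation and threshold steps matters. The sketch below applies the 20% rule to a small made-up frame (columns `A` to `D` are placeholders) and contrasts the two orderings. When every column is numeric, as here, imputing first leaves the threshold step nothing to remove, so dropping sparse rows before filling is what actually enforces the rule.

```python
import numpy as np
import pandas as pd

# Toy game-stats frame with 5 columns; names other than GAME_ID are placeholders
toy = pd.DataFrame({
    'GAME_ID': [1, 2, 3],
    'A': [1.0, np.nan, np.nan],
    'B': [2.0, 2.0, np.nan],
    'C': [3.0, 3.0, 3.0],
    'D': [4.0, 4.0, 4.0],
})

# A row needs at least ceil(80% of the columns) non-null values to be kept,
# i.e. rows with more than 20% missing values are dropped
min_non_null = int(np.ceil(0.8 * toy.shape[1]))  # 4 of 5 columns

# Threshold first, then impute: GAME_ID 3 (2 of 5 values missing, 40%) is dropped
thresh_first = toy.dropna(thresh=min_non_null)
numeric_cols = thresh_first.select_dtypes(include=[np.number]).columns
thresh_first = thresh_first.fillna(thresh_first[numeric_cols].mean())

# Impute first, then threshold: every gap is already filled, so nothing is dropped
impute_first = toy.fillna(toy.select_dtypes(include=[np.number]).mean())
impute_first = impute_first.dropna(thresh=min_non_null)

print(thresh_first['GAME_ID'].tolist())  # [1, 2]
print(impute_first['GAME_ID'].tolist())  # [1, 2, 3]
```

Using `np.ceil` rather than `int()` truncation keeps the cut-off at exactly 20%: with 11 columns, `int(0.8 * 11)` is 8, which would keep rows with 3 of 11 values (about 27%) missing.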

### Handling Missing Values in Betting Odds Data

The betting odds data gets the same treatment. Rows without a `GAME_ID` are dropped, as are rows missing either of the two key odds columns, `HomeSpread_AtClose` (the home team's closing point spread) and `HomeML` (the home moneyline). Other gaps in numeric columns are filled with the column mean.

In [None]:
# Handle missing values in betting odds data
betting_odds.dropna(subset=['GAME_ID'], inplace=True)

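In [None]:
# Drop rows missing the key odds columns before imputation; if these columns
# are numeric, mean-filling them first would leave nothing for this step to drop
betting_odds.dropna(subset=['HomeSpread_AtClose', 'HomeML'], inplace=True)

In [None]:
# Fill remaining missing numeric data with the column mean
betting_numeric_cols = betting_odds.select_dtypes(include=[np.number]).columns
betting_odds[betting_numeric_cols] = betting_odds[betting_numeric_cols].fillna(betting_odds[betting_numeric_cols].mean())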
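The same ordering question applies to the key odds columns. The sketch below uses a made-up three-game frame with the notebook's column names; `drop_then_fill` and `fill_then_drop` are hypothetical helper names. Because `HomeSpread_AtClose` and `HomeML` are numeric here, mean-filling first means `dropna(subset=...)` has nothing left to remove.

```python
import numpy as np
import pandas as pd

# Toy odds frame; column names follow the notebook, values are made up
odds = pd.DataFrame({
    'GAME_ID': [1, 2, 3],
    'HomeSpread_AtClose': [-3.5, np.nan, 2.0],
    'HomeML': [-160.0, 120.0, np.nan],
})
key_cols = ['HomeSpread_AtClose', 'HomeML']


def drop_then_fill(df):
    # Remove rows missing any key odds column, then mean-fill the numeric gaps left
    out = df.dropna(subset=key_cols)
    num = out.select_dtypes(include=[np.number]).columns
    return out.fillna(out[num].mean())


def fill_then_drop(df):
    # Mean-fill first: the key columns are numeric, so the dropna removes nothing
    num = df.select_dtypes(include=[np.number]).columns
    return df.fillna(df[num].mean()).dropna(subset=key_cols)


print(drop_then_fill(odds)['GAME_ID'].tolist())  # [1]
print(fill_then_drop(odds)['GAME_ID'].tolist())  # [1, 2, 3]
```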

### Merging the Datasets

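We merge the cleaned game statistics and betting odds datasets on the `GAME_ID` column. `pd.merge` performs an inner join by default, so only games present in both datasets are kept; columns that appear in both DataFrames (other than `GAME_ID`) receive `_x` and `_y` suffixes.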

In [None]:
# Inner-join the two dataframes on the shared 'GAME_ID' column
merged_df = pd.merge(game_stats, betting_odds, on='GAME_ID', how='inner')
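An inner join silently discards games that lack a match, so it can be worth auditing the join first. The sketch below, on made-up toy frames (`PTS_HOME` is a placeholder column), runs an outer merge with `indicator=True`, which adds a `_merge` column labelling each row `both`, `left_only`, or `right_only`, and checks that the inner join keeps exactly the `both` rows. It also assumes both `GAME_ID` columns share a dtype; pandas raises an error when merging an integer key against a string (object) key, so the real columns may need casting to a common type first.

```python
import pandas as pd

# Toy inputs; 'PTS_HOME' is a placeholder column, values are made up
games = pd.DataFrame({'GAME_ID': [1, 2, 3], 'PTS_HOME': [101, 95, 110]})
odds = pd.DataFrame({'GAME_ID': [2, 3, 4], 'HomeML': [120.0, -150.0, -110.0]})

# Outer merge with indicator=True shows which games matched and which did not
audit = pd.merge(games, odds, on='GAME_ID', how='outer', indicator=True)
print(audit['_merge'].value_counts())

# The inner join keeps only the rows labelled 'both'
merged = pd.merge(games, odds, on='GAME_ID', how='inner')
assert len(merged) == (audit['_merge'] == 'both').sum()
```

If each game should appear once in both files, passing `validate='one_to_one'` to `pd.merge` makes pandas raise an error on duplicate keys; whether that holds for `BettingOdds_History.csv` (which may list several rows per game) is not established here.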