# Exploratory Data Analysis

In this notebook, we perform exploratory data analysis (EDA) on the fantasy statistics dataset. The goal is to check data quality, see how fantasy points are distributed and how they vary by position, and identify correlated features ahead of feature engineering and model training.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
df = pd.read_csv('../data/raw/fantasy_stats.csv')

# Display the first few rows of the dataset
df.head()

In [2]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values[missing_values > 0]
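
Raw counts of missing values are easier to judge relative to the size of the dataset. The following is a minimal sketch, not part of the original analysis, that reports both the count and the percentage of missing values per column. The helper name `missing_summary` and its output column names are hypothetical.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column.

    Only columns with at least one missing value are returned,
    sorted from most to least missing.
    """
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually contain missing values
    summary = summary[summary["missing_count"] > 0]
    return summary.sort_values("missing_count", ascending=False)
```

Calling `missing_summary(df)` on the loaded dataset would give a compact table to consult when deciding whether a column should be imputed or dropped.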

In [3]:
# Visualize the distribution of fantasy points
plt.figure(figsize=(10, 6))
sns.histplot(df['fantasy_points_ppr'], bins=30, kde=True)
plt.title('Distribution of Fantasy Points (PPR)')
plt.xlabel('Fantasy Points (PPR)')
plt.ylabel('Frequency')
plt.show()

In [4]:
# Average fantasy points by position
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='position', y='fantasy_points_ppr', estimator='mean')
plt.title('Average Fantasy Points (PPR) by Position')
plt.xticks(rotation=45)
plt.ylabel('Fantasy Points (PPR)')
plt.show()
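
A bar chart of means can hide how many rows each position contributes and how spread out the values are. As a rough companion to the plot, the sketch below tabulates count, mean, median, and standard deviation per position. The helper name `points_by_position` is hypothetical; the default column names are the ones used in the cells above.

```python
import pandas as pd


def points_by_position(
    frame: pd.DataFrame,
    value_col: str = "fantasy_points_ppr",
    group_col: str = "position",
) -> pd.DataFrame:
    """Hypothetical helper: summary statistics of value_col for each group."""
    return (
        frame.groupby(group_col)[value_col]
        .agg(["count", "mean", "median", "std"])
        # Highest-scoring positions first, matching how the bar chart is usually read
        .sort_values("mean", ascending=False)
    )
```

Comparing the `mean` and `median` columns of `points_by_position(df)` gives a quick indication of whether a few very high scores are pulling a position's average up.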

In [5]:
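# Correlation heatmap (numeric columns only; text columns such as
# 'position' cannot be correlated and make df.corr() fail in pandas >= 2.0)
plt.figure(figsize=(12, 8))
correlation_matrix = df.select_dtypes(include='number').corr()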
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
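
A full heatmap can be hard to read when there are many columns. One possible follow-up, sketched here under the assumption that `fantasy_points_ppr` is the quantity of interest, is to rank the numeric features by the absolute value of their correlation with that column. The helper name `correlations_with_target` is hypothetical.

```python
import pandas as pd


def correlations_with_target(frame: pd.DataFrame, target: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation of each numeric column with target.

    The result is ordered by absolute correlation, strongest first,
    and keeps the sign of each correlation.
    """
    numeric = frame.select_dtypes(include="number")
    if target not in numeric.columns:
        raise KeyError(f"{target!r} is not a numeric column")
    corr = numeric.corr()[target].drop(target)
    # Order by magnitude while preserving the original signed values
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

For example, `correlations_with_target(df, 'fantasy_points_ppr').head(10)` would list the ten numeric features most strongly associated with PPR points. Correlation only captures linear association, so a low value does not by itself rule a feature out.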

## Conclusion

In this exploratory data analysis, we checked for missing values, visualized the distribution of fantasy points, compared average points by position, and examined the correlations among the numeric features. These findings will guide our feature engineering and model training.