# Exploratory Data Analysis

In this notebook, we perform exploratory data analysis (EDA) on player statistics and match outcomes from the four Grand Slam tournaments. The goal is to visualize the data, understand its distributions, and identify patterns that may help in predicting match outcomes. The cells below load only the US Open file (`us_open_data.csv`).

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style
sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is an alias for it

In [None]:
# Load the data
raw_data_path = '../data/raw/us_open_data.csv'
data = pd.read_csv(raw_data_path)

# Display the first few rows of the dataset
data.head()
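
Before looking at distributions, it can help to check column types and missing values, since both affect the plots and correlations that follow. The sketch below is one possible way to do that. The helper name `summarize_columns` and the columns `aces` and `surface` are hypothetical, and the small frame is made up purely to show the output shape; in the notebook you would pass `data` instead.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, missing share (%), unique values."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean().mul(100).round(1),
        "n_unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    # Columns with the most missing values come first
    return summary.sort_values("n_missing", ascending=False)


# Made-up illustration frame (not real match data); column names are assumptions
example = pd.DataFrame({
    "aces": [5, None, 12],
    "surface": ["hard", "hard", None],
    "match_outcome": [1, 0, 1],
})
summary = summarize_columns(example)
print(summary)
```

Calling `summarize_columns(data)` on the loaded dataset would show which columns need cleaning or imputation before modeling.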

In [None]:
# Summary statistics
data.describe()

In [None]:
# Visualize the distribution of match outcomes
plt.figure(figsize=(10, 6))
sns.countplot(x='match_outcome', data=data)
plt.title('Distribution of Match Outcomes')
plt.xlabel('Match Outcome')
plt.ylabel('Count')
plt.show()
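
The count plot shows class balance visually; the same information as numbers is often easier to compare and report. The following is a minimal sketch, assuming the outcome is stored in a single `match_outcome` column as in the plot above. The helper name `outcome_shares` is hypothetical and the four-row frame is invented for illustration only.

```python
import pandas as pd


def outcome_shares(df: pd.DataFrame, column: str = "match_outcome") -> pd.DataFrame:
    """Return the count and relative share of each value in `column`."""
    counts = df[column].value_counts(dropna=False)
    shares = df[column].value_counts(normalize=True, dropna=False)
    return pd.DataFrame({"count": counts, "share": shares.round(3)})


# Made-up illustration frame (not real match data)
example = pd.DataFrame({"match_outcome": [1, 0, 1, 1]})
balance = outcome_shares(example)
print(balance)
```

A strongly skewed share would suggest looking at metrics other than plain accuracy when models are evaluated later.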

In [None]:
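# Correlation heatmap (numeric columns only; pandas >= 2.0 raises an error on non-numeric columns)
plt.figure(figsize=(12, 8))
correlation_matrix = data.select_dtypes(include='number').corr()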
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
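
A full heatmap can be hard to read when there are many columns. If the goal is feature selection, one option is to rank the numeric features by the absolute value of their correlation with the target. The sketch below assumes `match_outcome` is encoded numerically (for example 0/1), which the source does not confirm. The helper name `rank_by_target_correlation` and the columns `aces`, `double_faults`, and `surface` are hypothetical, and the values are invented for illustration.

```python
import pandas as pd


def rank_by_target_correlation(df: pd.DataFrame, target: str = "match_outcome") -> pd.Series:
    """Return Pearson correlations with `target`, ordered by absolute strength."""
    numeric = df.select_dtypes(include="number")
    if target not in numeric.columns:
        raise ValueError(f"'{target}' must be a numeric column")
    corr = numeric.corr()[target].drop(target)
    # Order by |r| so that strong negative relationships are not buried
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up illustration frame (not real match data)
example = pd.DataFrame({
    "aces": [2, 4, 6, 8],
    "double_faults": [5, 4, 2, 1],
    "surface": ["hard", "hard", "hard", "hard"],  # non-numeric, ignored
    "match_outcome": [0, 0, 1, 1],
})
ranked = rank_by_target_correlation(example)
print(ranked)
```

Correlation only captures linear association, so a low value here does not by itself mean a feature is uninformative.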

## Conclusion

In this exploratory analysis, we visualized the distribution of match outcomes and examined pairwise correlations between the numeric player statistics. These observations will guide feature selection and model training for predicting match outcomes.