# Exploratory Data Analysis

This notebook explores the Tide dynamic pricing dataset. The aim is to understand how individual features are distributed, visualize relationships between them, and flag patterns or anomalies worth investigating before modeling.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name; sns.set is an alias for it)
sns.set_theme(style='whitegrid')

In [None]:
# Load the processed data
data_path = '../data/processed/processed_data.csv'
data = pd.read_csv(data_path)

# Display the first few rows of the dataset
data.head()

In [None]:
# Summary statistics
data.describe()
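
`describe()` only summarizes numeric columns by default and does not show missing values directly. The following is a minimal sketch of a complementary check, assuming a helper named `missing_summary` (hypothetical, not part of the original notebook) and a toy frame with made-up `price` and `segment` columns used purely for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, missing count, and missing percentage per column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": df.isna().mean() * 100,
    })
    # Columns with the most missing values come first
    return summary.sort_values("n_missing", ascending=False)


# Toy frame for illustration only; the column names are assumptions.
toy = pd.DataFrame({
    "price": [10.0, None, 12.5, 11.0],
    "segment": ["a", "b", None, None],
})
report = missing_summary(toy)
print(report)
```

In the notebook itself, the same helper could be called as `missing_summary(data)` right after loading.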

In [None]:
# Visualize the distribution of a key feature
plt.figure(figsize=(10, 6))
sns.histplot(data['key_feature'], bins=30, kde=True)
plt.title('Distribution of Key Feature')
plt.xlabel('Key Feature')
plt.ylabel('Count')
plt.show()
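
A histogram shows anomalies visually, but it can help to flag them explicitly as well. One common rule of thumb, sketched here as an assumption rather than as the method used for this dataset, marks values more than 1.5 interquartile ranges outside the quartiles. The helper name `iqr_outliers` and the toy values are hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Toy values for illustration only; in the notebook, pass data['key_feature'].
toy = pd.Series([10, 11, 12, 11, 10, 12, 11, 95], name="key_feature")
mask = iqr_outliers(toy)
print(toy[mask])
```

The threshold `k=1.5` is a convention, not a property of the data; heavily skewed features may need a different rule or a transformation first.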

In [None]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
# Restrict to numeric columns; pandas >= 2.0 raises an error on non-numeric ones
correlation_matrix = data.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()
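
With many features, an annotated heatmap becomes hard to read. A ranked list of the strongest pairwise correlations can serve as a companion view. The sketch below assumes a hypothetical helper `top_correlations` and a toy frame with made-up columns `x`, `y`, `z`, and `label`; none of these names come from the dataset.

```python
from itertools import combinations

import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n feature pairs with the largest absolute Pearson correlation."""
    corr = df.corr(numeric_only=True)
    # Visit each unordered pair once, skipping the diagonal
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "corr"])
    pairs = pairs.dropna(subset=["corr"])
    # Rank by absolute value so strong negative correlations are kept
    order = pairs["corr"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(n).reset_index(drop=True)


# Toy frame for illustration only; the non-numeric column is ignored.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 3, 4, 1, 2],
    "label": ["a", "b", "c", "d", "e"],
})
result = top_correlations(toy, n=3)
print(result)
```

In the notebook, `top_correlations(data)` would list the pairs most worth a closer look, for example with a scatter plot.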

## Conclusion

This notebook loads the processed data, reviews its summary statistics, plots the distribution of a key feature, and maps pairwise correlations between numeric features. These views are a starting point: further analysis and feature engineering are still needed to prepare the data for modeling.