# 01 - Data Exploration: Shaved Ice Dataset

**ICPE 2026 Data Challenge**  
**Objective:** Initial exploration of Snowflake's VM demand dataset

---

## Setup

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Configure notebook display
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 6)
sns.set_style('whitegrid')

# Import custom modules
import sys
sys.path.append('../src')

from data_loader import load_shaved_ice_data, validate_time_range
from plotting import setup_plot_style
from utils import add_time_features

# Setup plotting style
setup_plot_style()

print("✅ Imports complete")

## Load Dataset

**TODO:** Update the file path once the dataset is downloaded.

In [None]:
# Load the Shaved Ice dataset
# Update this path if the dataset is stored somewhere else
DATA_PATH = '../data/raw/shavedice-dataset/demand.csv.gz'

# Uncomment the next line after downloading the dataset
# df = load_shaved_ice_data(DATA_PATH)

print("⏳ Dataset not yet downloaded. See README.md for download instructions.")
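
For reference while the real file is unavailable, here is a minimal sketch of what a loader for a gzipped CSV could look like. It is not the implementation of `load_shaved_ice_data`. The function name `load_demand_csv` is hypothetical, and the `timestamp` and `demand` column names are assumptions taken from the commented-out cells in this notebook. The sketch writes a tiny synthetic file so that it runs on its own.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd


def load_demand_csv(path, timestamp_col="timestamp"):
    """Hypothetical stand-in loader: read a (gzipped) CSV and sort it by time."""
    # pandas infers gzip compression from the .gz suffix by default
    df = pd.read_csv(path, parse_dates=[timestamp_col])
    return df.sort_values(timestamp_col).reset_index(drop=True)


# Tiny synthetic file (assumed schema) so the sketch runs without the dataset
tmp_dir = Path(tempfile.mkdtemp())
sample_path = tmp_dir / "demand.csv.gz"
with gzip.open(sample_path, "wt") as fh:
    fh.write("timestamp,demand\n")
    fh.write("2026-01-01 01:00:00,12\n")
    fh.write("2026-01-01 00:00:00,10\n")

sample = load_demand_csv(sample_path)
print(sample)
```

Sorting by timestamp on load keeps later steps such as `df.head(7 * 24)` meaningful even if the raw file is not in chronological order.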

## Basic Data Inspection

In [None]:
# Display first few rows
# df.head()

In [None]:
# Dataset info
# df.info()

In [None]:
# Statistical summary
# df.describe()

## Time Range Validation

In [None]:
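# Validate time coverage and check for gaps
# Note: pandas >= 2.2 deprecates the 'H' frequency alias in favor of 'h';
# adjust expected_freq if validate_time_range passes it straight to pandas.
# validation = validate_time_range(df, timestamp_col='timestamp', expected_freq='H')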
# print(f"\nData spans: {validation['start_date']} to {validation['end_date']}")
# print(f"Completeness: {validation['completeness']:.2f}%")
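
The idea behind a completeness check can be sketched with plain pandas: build the full regular grid between the first and last timestamp, then see which grid points are absent. This is only an illustration, not the code in `data_loader.validate_time_range`. The function name `check_time_coverage`, the returned `missing` key, and the synthetic data are assumptions. The other dictionary keys mirror the ones used in the cell above.

```python
import pandas as pd


def check_time_coverage(df, timestamp_col="timestamp", freq="h"):
    """Hypothetical sketch: compare observed timestamps to a full regular grid."""
    ts = pd.to_datetime(df[timestamp_col])
    # Every timestamp we would expect between the first and last observation
    expected = pd.date_range(ts.min(), ts.max(), freq=freq)
    # Grid points with no matching observation (duplicates are ignored)
    missing = expected.difference(pd.DatetimeIndex(ts.unique()))
    present = len(expected) - len(missing)
    return {
        "start_date": ts.min(),
        "end_date": ts.max(),
        "missing": missing,
        "completeness": 100.0 * present / len(expected),
    }


# Synthetic example with one missing hour (02:00)
demo = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2026-01-01 00:00", "2026-01-01 01:00", "2026-01-01 03:00",
    ]),
    "demand": [10, 12, 11],
})
report = check_time_coverage(demo)
print(f"Completeness: {report['completeness']:.2f}%")  # Completeness: 75.00%
print(f"Missing timestamps: {list(report['missing'])}")
```

Returning the missing timestamps themselves, rather than only a percentage, makes it easier to tell whether gaps are isolated or clustered in an outage window.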

## Simple Time Series Plot

In [None]:
# Plot demand over time (first 7 days as an example)
# Assumes one row per hour, sorted by timestamp
# sample_df = df.head(7 * 24)  # First week of hourly data
# 
# plt.figure(figsize=(12, 6))
# plt.plot(sample_df['timestamp'], sample_df['demand'], linewidth=1.5)
# plt.xlabel('Time')
# plt.ylabel('VM Demand')
# plt.title('VM Demand Over Time (First Week)', fontweight='bold')
# plt.grid(True, alpha=0.3)
# plt.tight_layout()
# plt.show()

## TODO: Next Steps

After downloading the dataset, continue with:

1. **Data Quality Assessment**
   - Check for missing values
   - Identify outliers
   - Examine data distributions

2. **Exploratory Analysis**
   - Analyze demand patterns by hour of day
   - Compare weekday vs weekend patterns
   - Examine regional differences (if applicable)
   - Look for seasonal trends

3. **Feature Engineering**
   - Add time-based features (hour, day of week, etc.)
   - Create lag features
   - Calculate rolling statistics

4. **Initial Insights**
   - Document interesting patterns
   - Note potential forecasting challenges
   - Update `lab_notebook.md` with observations
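
As a starting point for steps 1 to 3, the sketch below runs the same kinds of checks and features on a synthetic hourly series. It is not based on the real dataset: the `timestamp` and `demand` column names, the Poisson-distributed values, and the 24-hour lag and window sizes are all assumptions chosen for illustration. The project's own `add_time_features` helper may behave differently.

```python
import numpy as np
import pandas as pd

# Synthetic two-week hourly series standing in for the real data
rng = np.random.default_rng(0)
idx = pd.date_range("2026-01-01", periods=14 * 24, freq="h")
demo = pd.DataFrame({
    "timestamp": idx,
    "demand": rng.poisson(100, size=len(idx)).astype(float),
})

# 1. Data quality: missing values and outliers by the 1.5 * IQR rule
n_missing = int(demo["demand"].isna().sum())
q1, q3 = demo["demand"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (demo["demand"] < q1 - 1.5 * iqr) | (demo["demand"] > q3 + 1.5 * iqr)
print(f"Missing values: {n_missing}, IQR outliers: {int(is_outlier.sum())}")

# 3. Time-based features
demo["hour"] = demo["timestamp"].dt.hour
demo["day_of_week"] = demo["timestamp"].dt.dayofweek  # Monday = 0
demo["is_weekend"] = demo["day_of_week"] >= 5

# Lag feature: demand at the same hour on the previous day
demo["demand_lag_24h"] = demo["demand"].shift(24)

# Rolling mean over the previous 24 hours; shift(1) excludes the current
# row so the feature does not leak the value being predicted
demo["demand_roll_mean_24h"] = demo["demand"].shift(1).rolling(24).mean()

# 2. Exploratory analysis: mean demand by hour, weekday vs. weekend
hourly_profile = (
    demo.groupby(["is_weekend", "hour"])["demand"].mean().unstack("is_weekend")
)
print(hourly_profile.head())
```

The first 24 rows of both the lag and rolling columns are `NaN` by construction, so they need to be dropped or imputed before model fitting.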