### Exploratory Data Analysis

Our goal is to understand temperature patterns at RDU Airport to inform feature engineering decisions.

Key questions:
1. Is there a trend component?
2. What seasonality patterns exist (daily, weekly, annual)?
3. What are the autocorrelation patterns?
4. Are there anomalies in the data?
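
Questions 2 and 3 can be probed with very little code before reaching for a full decomposition. The sketch below is illustrative only: it uses a synthetic hourly series (not the RDU data) with an assumed daily and annual cycle, and the names `s`, `acf`, and `hourly_profile` are hypothetical. It compares the autocorrelation at a few candidate seasonal lags and averages the series by hour of day.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series (NOT the RDU data): annual + daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 365 * 2)  # two years of hourly steps
values = (
    15
    + 10 * np.sin(2 * np.pi * hours / (24 * 365.25))  # assumed annual cycle
    + 5 * np.sin(2 * np.pi * hours / 24)              # assumed daily cycle
    + rng.normal(0, 1, hours.size)                    # measurement noise
)
index = pd.date_range("2020-01-01", periods=hours.size, freq=pd.Timedelta(hours=1))
s = pd.Series(values, index=index, name="temp")

# Autocorrelation at candidate seasonal lags (in hours).
lags = {"12 hours": 12, "1 day": 24, "1 week": 168}
acf = {name: s.autocorr(lag=lag) for name, lag in lags.items()}
for name, value in acf.items():
    print(f"lag {name:>8}: {value:.3f}")

# Mean value by hour of day exposes the shape of the daily cycle.
hourly_profile = s.groupby(s.index.hour).mean()
print(f"Daily range of the hourly means: {hourly_profile.max() - hourly_profile.min():.1f}")
```

With a daily cycle present, the autocorrelation at a 24-hour lag should exceed the one at a 12-hour lag, since a half-period shift puts the daily component out of phase. The same two checks can be run on `df['temp']` once the data is loaded.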

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load data; the first CSV column holds the timestamps and becomes the index
df = pd.read_csv("rdu_weather_full.csv", index_col=0, parse_dates=True)

# Count missing temperature readings once and reuse the result below
n_missing = df['temp'].isna().sum()

print("Dataset Overview")
print("="*60)
print(f"Date range: {df.index.min()} to {df.index.max()}")
print(f"Total hours: {len(df)}")
print(f"Missing values: {n_missing} ({n_missing / len(df) * 100:.2f}%)")
print("\nTemperature statistics:")
print(df['temp'].describe())

Dataset Overview
Date range: 2018-01-01 00:00:00-05:00 to 2025-09-30 23:00:00-04:00
Total hours: 67919
Missing values: 0 (0.00%)

Temperature statistics:
count    67919.000000
mean        17.045498
std          9.192286
min        -15.600000
25%         10.000000
50%         18.300000
75%         23.900000
max         40.000000
Name: temp, dtype: float64
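
A zero count of missing values only says that no stored row is empty. It does not show that every hour is present. One way to check is to compare the row count with the number of hours between the first and last timestamps. The sketch below assumes the timestamps are in the `America/New_York` zone (consistent with the `-05:00` and `-04:00` offsets shown above, but not confirmed by the source); `expected_hourly_count` and `find_gaps` are hypothetical helper names.

```python
import pandas as pd

def expected_hourly_count(start, end):
    """Number of hourly timestamps from start to end, inclusive."""
    # Subtracting tz-aware Timestamps measures elapsed time, so DST shifts are handled.
    return int((end - start) / pd.Timedelta(hours=1)) + 1

def find_gaps(index):
    """Return the steps between consecutive timestamps that are not exactly one hour."""
    steps = index.to_series().diff().dropna()
    return steps[steps != pd.Timedelta(hours=1)]

# Endpoints taken from the printed date range; the time zone is an assumption.
start = pd.Timestamp("2018-01-01 00:00", tz="America/New_York")
end = pd.Timestamp("2025-09-30 23:00", tz="America/New_York")

n_expected = expected_hourly_count(start, end)
print(f"Expected hourly rows: {n_expected}")
```

Under that time zone assumption the expected count is 67919, which matches the reported total: 2830 calendar days give 67920 wall-clock hours, and one hour drops out because the range starts in standard time and ends in daylight time. On the loaded data, `find_gaps(df.index)` should come back empty if the index is tz-aware and strictly hourly.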
