# 1a. Temporal features

These are exogenous, time-varying features that correlate with daily transit usage.
Because we first analyze aggregate ridership, we select features that affect the whole city uniformly: the weather, holidays, the days of the week, and the months of the year.

In [None]:
import pandas as pd
from final_project.config import FEATURES_DIR, RIDERSHIP_DIR, STATION_IDS
from final_project.data import noaa
from final_project.features import temporal

These features should go as far back as the ridership series goes, so we load the series to get its date range.

In [None]:
y = pd.read_csv(RIDERSHIP_DIR / 'y.csv', index_col='date', parse_dates=True)
y.head()

## Weather

Get weather data as reported from O'Hare and Midway International Airports.
We assume that variations in temperature and precipitation across the city on any given day are minuscule enough that one daily weather measurement is representative of the whole city on that day.

O'Hare has been the official weather reporting station for Chicago since 1980.
Fill missing values in the O'Hare data with corresponding values from Midway.
Impute 0 for any value missing from both airports.
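
The fill-then-impute rule above can be sketched with plain pandas. This is only an illustration on toy data: the column names `tmax` and `prcp` are hypothetical, and `temporal.get_weather_features` may implement the rule differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative, not the NOAA schema.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
ohare = pd.DataFrame(
    {"tmax": [30.0, np.nan, 28.0, np.nan], "prcp": [0.1, 0.0, np.nan, np.nan]},
    index=dates,
)
midway = pd.DataFrame(
    {"tmax": [31.0, 33.0, np.nan, np.nan], "prcp": [0.2, 0.0, 0.5, np.nan]},
    index=dates,
)

# Prefer O'Hare; fall back to Midway wherever O'Hare is missing.
filled = ohare.combine_first(midway)

# Impute 0 for anything still missing, i.e. missing at both airports.
filled = filled.fillna(0)
```

Here `combine_first` keeps every O'Hare value that exists and only consults Midway for the gaps, which matches the priority given to the official reporting station.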

In [None]:
date_range = y.index
weather_df = temporal.get_weather_features(date_range[0], date_range[-1])
weather_df.head()

Count the `nan` values at each airport, and those missing from both, to confirm that the weather data is complete enough.

In [None]:
ohare_weather = noaa.get_weather(
    STATION_IDS['ohare'],
    date_range[0],
    date_range[-1]
)
midway_weather = noaa.get_weather(
    STATION_IDS['midway'],
    date_range[0],
    date_range[-1]
)
# nan counts from O'Hare, Midway, and both.
pd.DataFrame({
    'ohare_nan': ohare_weather.isna().sum(),
    'midway_nan': midway_weather.isna().sum(),
    'both_nan': (ohare_weather.isna() & midway_weather.isna()).sum()
})

## Weekends, months, and holidays

Create dummies for weekends, each month, and federal holidays.
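
As a rough sketch of what such dummies look like, the following builds them directly with pandas for one week in July 2019. It assumes the U.S. federal holiday calendar that ships with pandas; the helper functions in `final_project.features.temporal` may be written differently, and the column names here are hypothetical.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# One illustrative week: Monday 2019-07-01 through Sunday 2019-07-07.
dates = pd.date_range("2019-07-01", "2019-07-07", freq="D")

# Weekend dummy: dayofweek is 5 for Saturday and 6 for Sunday.
is_weekend = pd.Series(dates.dayofweek >= 5, index=dates, name="is_weekend").astype(int)

# Holiday dummy: 1 on dates in the pandas U.S. federal holiday calendar.
holidays = USFederalHolidayCalendar().holidays(start=dates[0], end=dates[-1])
is_holiday = pd.Series(dates.isin(holidays), index=dates, name="is_holiday").astype(int)

# Month dummies: one column per month that appears in the date range.
month_dummies = pd.get_dummies(
    pd.Series(dates.month, index=dates), prefix="month"
).astype(int)
```

In this week only Independence Day (Thursday, July 4) is flagged as a holiday, and only the last two days are flagged as a weekend.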

In [None]:
is_weekend = temporal.create_weekend_dummies(date_range)
is_holiday = temporal.create_holiday_dummies(date_range)
month_dummies = temporal.create_month_dummies(date_range)

## Save data

Concatenate these data into a daily feature matrix, and save it.
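
Because `pd.concat` with `axis=1` joins on the index, it can be worth checking that the pieces share the same dates before saving. A minimal sketch on toy data, with hypothetical column names:

```python
import pandas as pd

# Toy pieces that share one daily index; names and values are illustrative.
dates = pd.date_range("2020-01-01", periods=3, freq="D")
weather = pd.DataFrame({"tmax": [30.0, 32.0, 28.0]}, index=dates)
is_weekend = pd.Series([0, 0, 0], index=dates, name="is_weekend")

# axis=1 places the pieces side by side, aligned on the date index.
X = pd.concat([weather, is_weekend], axis=1)

# Misaligned dates would show up as extra rows or NaNs after the outer join.
assert X.index.equals(dates)
assert not X.isna().any().any()
```

If either assertion fails, one of the pieces covers a different date range than the others.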

In [None]:
X_temp = pd.concat([weather_df, is_weekend, is_holiday, month_dummies], axis=1)
X_temp.to_csv(FEATURES_DIR / 'X_temp.csv')
X_temp.head()