### [Feature Engineering for Time Series Forecasting](https://python.plainenglish.io/feature-engineering-for-time-series-forecasting-in-python-7c469f69e260)

Feature engineering is the process of creating additional input features from raw time series data to improve the performance of predictive models.

In [1]:
!pip install -q numpy pandas matplotlib
!pip install -q scikit-learn statsmodels

##### Scaling Values

Scaling methods include:

- Min-Max Scaling: Rescales values to a specific range, e.g., `[0, 1]`.

- Standardization (Z-score): Rescales values to have a mean of 0 and standard deviation of 1.

Use *Min-Max Scaling* when the data has natural bounds or when the model expects inputs in a fixed range.

Use *Standardization* when the range of the data is unknown or when the model is sensitive to the variance of its inputs.

In [2]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np
# Example Time Series
y = np.array([10, 20, 30, 40, 50]).reshape(-1, 1)
# Min-Max Scaling
scaler = MinMaxScaler()
y_minmax = scaler.fit_transform(y)
# Standardization
scaler_std = StandardScaler()
y_std = scaler_std.fit_transform(y)
print("Min-Max Scaled:", y_minmax.flatten())
print("Standardized:", y_std.flatten())

Min-Max Scaled: [0.   0.25 0.5  0.75 1.  ]
Standardized: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]
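
In a forecasting setting, the scaler is usually fit on the training portion only and then reused on later data, so that statistics from the hold-out period do not leak into the features. The sketch below assumes a hypothetical split of the toy series (`y_train` and `y_future` are illustrative names) and shows how `inverse_transform` maps scaled values back to the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical split of the toy series: first four points for training, last one held out
y_train = np.array([10, 20, 30, 40], dtype=float).reshape(-1, 1)
y_future = np.array([50], dtype=float).reshape(-1, 1)

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(y_train)  # min and max are learned from training data only
future_scaled = scaler.transform(y_future)    # reuses the training min and max

# Map scaled values (for example, model predictions) back to the original units
future_restored = scaler.inverse_transform(future_scaled)

print("Scaled train:", train_scaled.flatten())
print("Scaled future:", future_scaled.flatten())
print("Restored future:", future_restored.flatten())
```

A scaled value above 1 for the future point is expected here, because 50 exceeds the largest value seen during training.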


##### Looking at Changes in Values

Instead of analyzing absolute values, focusing on the changes between consecutive observations can remove trends and bring the series closer to stationarity.

First-order differencing removes a linear trend, which often makes the data stationary, and highlights short-term changes in the series.

In [3]:
import pandas as pd
# Example Time Series
y = pd.Series([10, 12, 15, 19, 24])
# First Difference
y_diff = y.diff().dropna()
print("First Difference:", y_diff.values)

First Difference: [2. 3. 4. 5.]
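
A model trained on differences predicts changes, so its output usually has to be converted back to levels. As a minimal sketch on the same toy series, the original values can be rebuilt from the first observation plus the cumulative sum of the differences (`y_rebuilt` is an illustrative name):

```python
import pandas as pd

y = pd.Series([10, 12, 15, 19, 24])
y_diff = y.diff().dropna()

# Rebuild the levels: start from the first value and accumulate the changes
y_rebuilt = pd.concat([y.iloc[:1].astype(float), y.iloc[0] + y_diff.cumsum()])
print("Rebuilt:", y_rebuilt.values)
```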


##### Derivatives: Rate of Change and Acceleration
Derivatives measure the rate of change in a time series, which can highlight momentum or acceleration patterns.

In [4]:
import numpy as np
# Example Time Series
y = np.array([10, 12, 15, 19, 24])
# First Derivative (Rate of Change)
dy = np.gradient(y)
print("First Derivative (Rate of Change):", dy)
# Second Derivative (Acceleration)
d2y = np.gradient(dy)
print("Second Derivative (Acceleration):", d2y)

First Derivative (Rate of Change): [2.  2.5 3.5 4.5 5. ]
Second Derivative (Acceleration): [0.5  0.75 1.   0.75 0.5 ]


*First derivatives* measure the rate of change and help capture trends and momentum.

*Second derivatives* measure the rate of change of the rate of change and help detect points of inflection or changes in acceleration.
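
Note that `np.gradient` uses central differences for interior points, so its estimate at time `t` depends on the observation at `t + 1`. When the derivative is meant as a model input for forecasting, a backward difference that only looks at past values may be preferable. A minimal sketch of that alternative on the same toy series:

```python
import pandas as pd

y = pd.Series([10, 12, 15, 19, 24])

# Backward differences: each value depends only on the current and earlier observations
rate_of_change = y.diff()             # y[t] - y[t-1]
acceleration = rate_of_change.diff()  # change in the rate of change

features = pd.DataFrame({
    "y": y,
    "rate_of_change": rate_of_change,
    "acceleration": acceleration,
})
print(features)
```

The leading rows are `NaN` because there is no earlier observation to difference against.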

##### Embedding Prior Values: Building “Memory”
Embedding previous observations as features allows models to “remember” past values. This is especially important for models that do not inherently capture temporal dependencies (e.g., linear regression).

In [5]:
import pandas as pd
import numpy as np
# Simulated Time Series Data
np.random.seed(42)
data = pd.Series(np.cumsum(np.random.randn(200))) # Random walk time series
# Create Features: Lagged Values and Rate of Change
df = pd.DataFrame({
    'value': data,
    'lag_1': data.shift(1),
    'lag_2': data.shift(2),
    'rate_of_change': data.diff()
}).dropna()

df.head()

| | value | lag_1 | lag_2 | rate_of_change |
|---|---|---|---|---|
| 2 | 1.006138 | 0.35845 | 0.496714 | 0.647689 |
| 3 | 2.529168 | 1.006138 | 0.35845 | 1.52303 |
| 4 | 2.295015 | 2.529168 | 1.006138 | -0.234153 |
| 5 | 2.060878 | 2.295015 | 2.529168 | -0.234137 |
| 6 | 3.640091 | 2.060878 | 2.295015 | 1.579213 |
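
To illustrate how lagged values give a non-temporal model its “memory”, the sketch below fits a scikit-learn `LinearRegression` on `lag_1` and `lag_2` of the same simulated random walk. The 20-step test split is an arbitrary choice for illustration. `rate_of_change` is left out on purpose: computed as `value - lag_1`, it contains the current value and would leak the target.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same simulated random walk as above
np.random.seed(42)
data = pd.Series(np.cumsum(np.random.randn(200)))

lagged = pd.DataFrame({
    "target": data,
    "lag_1": data.shift(1),
    "lag_2": data.shift(2),
}).dropna()

# Chronological split: the last 20 rows are held out (arbitrary size for illustration)
split = len(lagged) - 20
X_train = lagged[["lag_1", "lag_2"]].iloc[:split]
y_train = lagged["target"].iloc[:split]
X_test = lagged[["lag_1", "lag_2"]].iloc[split:]
y_test = lagged["target"].iloc[split:]

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
mae = np.mean(np.abs(pred - y_test.values))

print("Coefficients (lag_1, lag_2):", model.coef_)
print("One-step-ahead MAE on the last 20 points:", mae)
```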


##### Rolling Statistics
Calculate rolling means, medians, or variances over a window to smooth the series or to track how its local level and volatility change over time.

In [6]:
import pandas as pd
# Example Time Series
y = pd.Series([10, 12, 15, 19, 24])
rolling_mean = y.rolling(window=3).mean()
print("Rolling Mean:", rolling_mean)

Rolling Mean: 0          NaN
1          NaN
2    12.333333
3    15.333333
4    19.333333
dtype: float64
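
The same pattern extends to other statistics. The sketch below adds a rolling median and standard deviation, then shifts the result by one step. The shift is an assumption about how the features will be used: if they feed a model that predicts `y[t]`, the window should not include `y[t]` itself.

```python
import pandas as pd

y = pd.Series([10, 12, 15, 19, 24])
window = y.rolling(window=3)

rolling_features = pd.DataFrame({
    "rolling_mean": window.mean(),
    "rolling_median": window.median(),
    "rolling_std": window.std(),
})

# Shift by one step so the feature at time t summarizes t-3..t-1 and excludes y[t]
rolling_features_lagged = rolling_features.shift(1)
print(rolling_features_lagged)
```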


##### Extracting Seasonality

Decompose a series into trend, seasonal, and residual components.

```python
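from statsmodels.tsa.seasonal import seasonal_decompose
# y needs either a DatetimeIndex with a frequency or an explicit period (as here),
# and at least two full cycles (24+ observations for period=12)
result = seasonal_decompose(y, model="additive", period=12)
result.seasonal.head()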
```

##### Fourier Transforms
Use Fourier transformations to identify dominant frequencies in seasonal data.
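
As a minimal sketch, NumPy's real FFT can recover the dominant period of a synthetic hourly signal. The 24-hour cycle, amplitude, and noise level below are assumptions made for the example, not properties of any dataset in this notebook.

```python
import numpy as np

# Synthetic hourly signal: a 24-hour cycle plus noise (illustrative values)
rng = np.random.default_rng(0)
n = 240
t = np.arange(n)
y_hourly = 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, n)

# Remove the mean so the zero-frequency term does not dominate the spectrum
spectrum = np.abs(np.fft.rfft(y_hourly - y_hourly.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per hour, since samples are one hour apart

dominant_freq = freqs[np.argmax(spectrum)]
dominant_period = 1.0 / dominant_freq
print("Dominant period (hours):", dominant_period)
```

The detected period can then guide choices such as the `period` argument of a seasonal decomposition or the frequency of sine and cosine features.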

##### Time-Based Features
Extract calendar-related features like month, day of the week, or hour to capture seasonality.

###### Adding Time Features
```python
df.index = pd.date_range(start="2023-01", periods=len(df), freq="MS")  # "MS" = month start
df_features = pd.DataFrame({"month": df.index.month, "year": df.index.year})
print(df_features.head())
```
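
For data sampled more often than monthly, the hour and day of the week are often more informative than month and year. The sketch below builds them from a hypothetical hourly index. The start date and the `is_weekend` flag are illustrative choices.

```python
import pandas as pd

# Hypothetical hourly index covering two days, starting on Sunday 2023-01-01
idx = pd.date_range("2023-01-01", periods=48, freq=pd.Timedelta(hours=1))

time_features = pd.DataFrame({
    "hour": idx.hour,
    "dayofweek": idx.dayofweek,                   # Monday=0, Sunday=6
    "month": idx.month,
    "is_weekend": (idx.dayofweek >= 5).astype(int),
}, index=idx)

print(time_features.head())
```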

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the ERCOT data
df = pd.read_csv("https://raw.githubusercontent.com/jgscott/ECO395M/refs/heads/master/data/ercot/load_data.csv")
df['date'] = pd.to_datetime(df['Time'])  # Ensure 'date' is in datetime format
df['values'] = pd.to_numeric(df['ERCOT'], errors='coerce')  # Convert 'values' to numeric
df = df.sort_values('date')  # Sort by date

# Drop rows with missing or NaN values
df = df.dropna()

# Resample the data to hourly frequency (mean aggregation)
df = df.set_index('date').resample('h').mean(numeric_only=True).reset_index()  # Skip the non-numeric 'Time' column when averaging

# Define hold-out period (e.g., last 24 hours)
hold_out_hours = 24  # Hold-out size (24 hours = 1 day)
train = df.iloc[:-hold_out_hours].copy()
hold_out = df.iloc[-hold_out_hours:].copy()  # Copy so new columns can be added later without a SettingWithCopyWarning

In [None]:
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from statsmodels.tsa.seasonal import seasonal_decompose

# Uses train and hold_out (and the np, pd, plt imports) from the previous cell

# Scaling: Min-Max and Standardization (for train data only)
scaler_minmax = MinMaxScaler()
scaler_std = StandardScaler()

train_values = train['values'].values.reshape(-1, 1)  # Reshape for scalers
values_scaled_minmax = scaler_minmax.fit_transform(train_values)
values_scaled_std = scaler_std.fit_transform(train_values)

print("Min-Max Scaled Train Values:\n", values_scaled_minmax.flatten())
print("Standardized Train Values:\n", values_scaled_std.flatten())

# First Difference (for train data only)
values_diff = train['values'].diff().dropna()
print("First Difference:\n", values_diff.head())

# First and Second Derivative (Rate of Change and Acceleration) (for train data only)
values_gradient = np.gradient(train['values'].values)
values_acceleration = np.gradient(values_gradient)
print("First Derivative (Rate of Change):\n", values_gradient[:5])
print("Second Derivative (Acceleration):\n", values_acceleration[:5])

# Rolling Mean (for train data only)
rolling_mean = train['values'].rolling(window=3).mean()
print("Rolling Mean (window=3):\n", rolling_mean.head())

# Seasonal Decomposition (for train data only)
seasonal_decomposition = seasonal_decompose(train['values'], period=24, model='additive')  # 24 hours = one daily cycle in hourly data
print("Seasonal Component Head:\n", seasonal_decomposition.seasonal.head())

# Create Lagged Features and Rate of Change (for train data only)
df_features_train = pd.DataFrame({
    'value': train['values'],
    'lag_1': train['values'].shift(1),
    'lag_2': train['values'].shift(2),
    'rate_of_change': train['values'].diff()
}).dropna()

# Adding Time-Based Features (for train data only)
df_features_train['month'] = train['date'].dt.month
df_features_train['year'] = train['date'].dt.year
print("Time-Based Features (Train Data):\n", df_features_train[['month', 'year']].head())

# Plot the train values, their rolling mean, and the hold-out values
plt.figure(figsize=(10, 6))
plt.plot(train['date'], train['values'], label="Train Values", color='blue')
plt.plot(train['date'], rolling_mean, label="Rolling Mean (window=3)", color='red')

plt.plot(hold_out['date'], hold_out['values'], label="Hold-Out Values", color='green')
plt.title("Train, Rolling Mean, and Hold-Out Values")
plt.xlabel("Date")
plt.ylabel("Values")
plt.legend()
plt.grid(True)
# plt.savefig("holdout_values_ercot.png")
plt.show()

In [None]:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_absolute_percentage_error

# Fit an Exponential Smoothing model on the training data
# Adjust seasonal_periods and trend/seasonal type as per data characteristics
model = ExponentialSmoothing(
    train['values'],
    trend="additive",
    seasonal="additive",
    seasonal_periods=24  # Assuming daily seasonality in hourly data
)
fitted_model = model.fit()

# Forecast values for the hold-out period
forecast_values = fitted_model.forecast(steps=len(hold_out))

# Add forecast to the hold_out DataFrame for easier plotting
hold_out['forecast'] = forecast_values.values

# Calculate MAPE (Mean Absolute Percentage Error)
mape_value = mean_absolute_percentage_error(hold_out['values'], hold_out['forecast'])
print(f"MAPE: {mape_value * 100:.2f}%")

# Plot the results
plt.figure(figsize=(12, 6))

# Plot training data
plt.plot(train['date'], train['values'], label="Training Data", color='blue')

# Plot hold-out data
plt.plot(hold_out['date'], hold_out['values'], label="Hold-Out Data (Actual)", color='green')

# Plot forecasted data
plt.plot(hold_out['date'], hold_out['forecast'], label="Forecast", color='red', linestyle="--")

# Customize the plot
plt.title(f"Forecasting Hold-Out Data \n MAPE: {mape_value * 100:.2f}%")
plt.xlabel("Date")
plt.ylabel("Load Values")
plt.legend()
plt.grid(True)
plt.tight_layout()

# Save the plot
# plt.savefig("forecast_holdout_ercot.png")
plt.show()