# Student Assignment: Multi-Step-Ahead Electricity Demand Forecasting

**Objective:** Adapt the provided lab script to forecast the next 24 hours of electricity demand using a real-world dataset. This will involve modifying the data preparation, model architecture, and evaluation steps to handle a sequence-to-sequence forecasting problem.

**Dataset:** AEP Hourly Energy Consumption (available on Kaggle).

---

### Step 1: Data Loading and Exploration

1.  **Download and Load:** Download the "AEP_hourly.csv" dataset. Load it into a pandas DataFrame.
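2.  **Data Cleaning & Preparation:**
    * Parse the timestamp column (named `Datetime` in the Kaggle file) as dates and set it as the DataFrame index.
    * Rename the load column (`AEP_MW`) to `demand`, the name used in the steps below.
    * Check for duplicated timestamps (`df.index.duplicated()`) and resolve any you find, for example by averaging them.
    * Sort the DataFrame by the index to ensure the time series is in chronological order.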
3.  **Visualize:** Plot the entire time series to observe its trends, yearly seasonality, and weekly patterns. You may also want to plot a smaller slice (e.g., one month) to see the daily patterns more clearly.
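
A minimal loading sketch for items 1 and 2, assuming the Kaggle file's columns are named `Datetime` and `AEP_MW` (check the header of your copy). The helper name `load_aep` is hypothetical, and averaging duplicated timestamps is one reasonable choice rather than the only one.

```python
import pandas as pd


def load_aep(path_or_buffer):
    """Load the AEP hourly file into a DataFrame with a sorted DatetimeIndex.

    Assumes the CSV columns are 'Datetime' and 'AEP_MW'.
    """
    df = pd.read_csv(path_or_buffer, parse_dates=["Datetime"], index_col="Datetime")
    # Use the column name expected by the rest of the assignment
    df = df.rename(columns={"AEP_MW": "demand"})
    # Collapse any duplicated timestamps into one row by averaging them
    df = df.groupby(level=0).mean()
    # Guarantee chronological order
    return df.sort_index()
```

For item 3, `df["demand"].plot()` shows the full series, and partial-string indexing on the DatetimeIndex, such as `df.loc["2016-07", "demand"].plot()`, shows a single month.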

---

### Step 2: Preprocessing for Multi-Step Forecasting

This is the most critical part of the assignment. The goal is to create sequences where the input (`X`) is a window of past hours and the output (`y`) is the subsequent 24 hours.

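1.  **Normalize the Data:** Just like in the lab, use `MinMaxScaler` to scale the `demand` column to a range between 0 and 1. The scaler expects a 2D input, so pass `df[['demand']]` rather than `df['demand']`. Ideally, fit the scaler on the training portion of the series only, so that no information from the test period leaks into the scaling.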
2.  **Modify `create_sequences`:** Rewrite the sequence creation function. It should now accept three arguments: the data, `n_past` (the number of hours to use as input), and `n_future` (the number of hours to predict).
    * **Logic:** The function should slide a window through the data. For each start index `i`, it takes the `n_past` hours beginning at `i` as the input and the *next* `n_future` hours as the output, stopping once fewer than `n_past + n_future` hours remain.
    * **Example:** If `n_past = 72` (3 days) and `n_future = 24` (1 day), the first sample would be:
        * `X[0]`: Data from hour 0 to 71.
        * `y[0]`: Data from hour 72 to 95.
3.  **Create the Data:** Use your new function to generate `X` and `y`. A good starting point is `n_past = 168` (1 week) and `n_future = 24`.
4.  **Reshape:** Reshape `X` to be `[samples, n_past, 1]` as required by the LSTM layer. `y` should have the shape `[samples, n_future]`.
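
One way to write the modified function, sketched with NumPy. The loop bound makes the last sample end exactly at the final hour, and the function returns arrays already in the shapes from item 4. The synthetic `series` below is a stand-in for the scaled demand values and simply reproduces the 72/24 example from item 2.

```python
import numpy as np


def create_sequences(data, n_past, n_future):
    """Slice a series into (input window, following horizon) pairs."""
    data = np.asarray(data, dtype=float).ravel()  # accepts [n] or [n, 1] input
    X, y = [], []
    # The last valid start index leaves room for n_past inputs plus n_future targets
    for i in range(len(data) - n_past - n_future + 1):
        X.append(data[i : i + n_past])
        y.append(data[i + n_past : i + n_past + n_future])
    X = np.array(X).reshape(-1, n_past, 1)  # [samples, n_past, 1] for the LSTM
    y = np.array(y)                         # [samples, n_future]
    return X, y


series = np.arange(100, dtype=float)  # hour i holds the value i
X_demo, y_demo = create_sequences(series, n_past=72, n_future=24)
print(X_demo.shape, y_demo.shape)  # (5, 72, 1) (5, 24)
```

With the real data, call it on the scaler output, e.g. `X, y = create_sequences(scaled, n_past=168, n_future=24)`; the `ravel()` call means the `[n, 1]` array returned by `MinMaxScaler` can be passed directly.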

---

### Step 3: Build and Train the LSTM Model

You need to modify the model so it can output a sequence of 24 values.

1.  **Modify the Output Layer:** The simplest way to achieve this is to change the final `Dense` layer. Instead of having 1 unit, it should have `n_future` (e.g., 24) units.
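    ```python
    # Example model architecture
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Input, LSTM, Dense

    n_past, n_future = 168, 24  # lookback window and horizon from Step 2

    model = Sequential()
    model.add(Input(shape=(n_past, 1)))  # [timesteps, features]
    model.add(LSTM(units=100))
    model.add(Dense(units=n_future))  # Output layer predicts 24 steps
    model.compile(optimizer='adam', loss='mean_squared_error')
    ```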
2.  **Train the Model:** Split your samples chronologically into training and testing sets (for example, the first 80% for training and the last 20% for testing; do not shuffle before splitting) and train the model with the `.fit()` method as before.
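
A sketch of a chronological split on the windowed arrays. `chronological_split` and its `gap` parameter are illustrative additions, not part of the lab script. Because consecutive windows overlap, the last training targets and the first test targets cover some of the same hours; skipping `n_future - 1` samples between the two sets removes that overlap.

```python
import numpy as np


def chronological_split(X, y, train_frac=0.8, gap=0):
    """Split windowed samples in time order: earliest samples train, latest test.

    `gap` drops samples between the two sets; gap = n_future - 1 ensures that
    no test target hour also appears as a training target.
    """
    X, y = np.asarray(X), np.asarray(y)
    split = int(len(X) * train_frac)
    return X[:split], X[split + gap:], y[:split], y[split + gap:]


# Demo on dummy windows: 200 samples of 168 input hours and 24 target hours
X_all = np.zeros((200, 168, 1))
y_all = np.zeros((200, 24))
X_train, X_test, y_train, y_test = chronological_split(X_all, y_all, 0.8, gap=23)
print(len(X_train), len(X_test))  # 160 17
```

Training then proceeds as in the lab, e.g. `model.fit(X_train, y_train, epochs=20, batch_size=64, validation_split=0.1)` (placeholder values). Keras takes the `validation_split` fraction from the end of the arrays before any shuffling, so the validation samples are also later in time than the ones trained on. The default per-epoch shuffling of training samples is harmless here, because each sample is a self-contained window.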

---

### Step 4: Evaluate and Visualize the Forecast

1.  **Make Predictions:** Use `model.predict()` on your `X_test` data.
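2.  **Inverse Transform:** Remember that your predictions and the actual test values (`y_test`) are scaled. Use `scaler.inverse_transform()` to return them to their original MW scale before calculating the error. Because the scaler was fit on a single column, it expects an array of shape `[n, 1]`: reshape each `[samples, n_future]` array with `.reshape(-1, 1)` before inverting, then reshape the result back to `[samples, n_future]`.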
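3.  **Calculate RMSE:** Calculate the Root Mean Squared Error between your predictions and the actual values over all test samples and all 24 forecast hours. Computing it separately for each lead time (1 hour ahead, 2 hours ahead, and so on) also shows how the error changes across the horizon.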
4.  **Visualize Results:** The most important step! Create a plot that shows the actual vs. predicted values.
    * Pick a few example periods from your test set (e.g., 3 different 24-hour windows).
    * For each period, plot the `n_past` hours of actual data that the model used as input.
    * On the same plot, continue the line with the 24 hours of *actual* future data and overlay your model's 24-hour *prediction*. This will clearly show how well your model is forecasting.
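
Helpers for items 2 to 4, sketched with NumPy and Matplotlib. `inverse_scale`, `rmse`, `rmse_per_horizon`, and `plot_forecast` are hypothetical names; `inverse_scale` assumes a `MinMaxScaler` fit on the single `demand` column, as in Step 2. The commented lines at the end show how they might be combined, assuming `model`, `scaler`, `X_test`, and `y_test` from the earlier steps.

```python
import numpy as np
import matplotlib.pyplot as plt


def inverse_scale(scaler, arr):
    """Undo MinMax scaling for an array of any shape, using a scaler fit on one column."""
    arr = np.asarray(arr)
    # The scaler expects [n, 1]; flatten, invert, then restore the original shape
    return scaler.inverse_transform(arr.reshape(-1, 1)).reshape(arr.shape)


def rmse(actual, predicted):
    """RMSE over every sample and every forecast hour."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def rmse_per_horizon(actual, predicted):
    """RMSE for each lead time: element k is the error k + 1 hours ahead."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=0))


def plot_forecast(history, actual_future, predicted_future, title=""):
    """Plot the input window, then actual vs. predicted values for the horizon."""
    history = np.ravel(history)
    n_past = len(history)
    future_t = np.arange(n_past, n_past + len(actual_future))
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(np.arange(n_past), history, label="input (actual)")
    ax.plot(future_t, actual_future, label="actual future")
    ax.plot(future_t, predicted_future, "--", label="prediction")
    ax.set_xlabel("hour")
    ax.set_ylabel("demand (MW)")
    ax.set_title(title)
    ax.legend()
    return fig


# Typical use (assumes `model`, `scaler`, `X_test`, `y_test` from earlier steps):
# y_pred = inverse_scale(scaler, model.predict(X_test))
# y_true = inverse_scale(scaler, y_test)
# print("RMSE (MW):", rmse(y_true, y_pred))
# for i in (0, len(X_test) // 2, len(X_test) - 1):
#     plot_forecast(inverse_scale(scaler, X_test[i]), y_true[i], y_pred[i], f"window {i}")
# plt.show()
```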

---

### Step 5: Optimize Your Model (Your Main Task)

Your initial model will be a good baseline, but the goal is to improve it. Experiment with the following and document how each change affects your final RMSE and visualizations.

* **Hyperparameters:**
    * Change the number of `units` in the LSTM layer (e.g., try 50, 100, 150).
    * Experiment with the `n_past` lookback window. Does using more or less history help?
    * Adjust the `batch_size` and number of `epochs` during training.
* **Architecture:**
    * **Stack more layers:** Add a second LSTM layer. Remember to set `return_sequences=True` on the first LSTM layer so it passes its full output sequence to the next one.
    * **Add Dropout:** Add `Dropout` layers after your LSTM layers to help prevent overfitting (e.g., `Dropout(0.2)`).
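
A configurable builder makes it easier to run these experiments from one place. This is a sketch: `build_model` and its parameters are illustrative, and placing a `Dropout` after every LSTM layer is one common arrangement rather than the only one. Every LSTM except the last sets `return_sequences=True`, so stacked layers receive the full sequence.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout


def build_model(n_past, n_future, units=100, n_lstm_layers=2, dropout=0.2):
    """Stacked LSTM forecaster with optional Dropout after each LSTM layer."""
    model = Sequential()
    model.add(Input(shape=(n_past, 1)))
    for i in range(n_lstm_layers):
        is_last = i == n_lstm_layers - 1
        # Intermediate LSTMs pass their full output sequence to the next LSTM;
        # the last one returns only its final state for the Dense head
        model.add(LSTM(units=units, return_sequences=not is_last))
        if dropout > 0:
            model.add(Dropout(dropout))
    model.add(Dense(units=n_future))  # one output per forecast hour
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model
```

A sweep then becomes a loop such as `for units in (50, 100, 150): model = build_model(168, 24, units=units)`, followed by the same fit, inverse-transform, and RMSE steps each time, so that every configuration is scored on the same test windows.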
