# Machine Learning Energy Demand Forecasting

This project forecasts electricity demand for the **Western Regional Hub of PJM**, a regional transmission organization (RTO) in the eastern United States. The hub is a defined geographic area over which electricity prices are aggregated for commercial energy trading and financial futures contracts.

To build the forecast, I trained an **XGBoost regression model** on 12 years of historical hourly demand data (measured in megawatts). The model generates predictions of future electricity demand, which could support use cases such as validating infrastructure projects or informing energy trading strategies.  

The primary aim of this project, however, is to demonstrate my ability to apply **machine learning techniques to demand forecasting**. The same approach generalizes to virtually any product or service for which demand has been recorded consistently over time.

## Methodology  

### Data Exploration
Begin with a thorough exploration of the dataset to understand its structure, quality, and key patterns. This step involves examining time-series trends, identifying seasonal or cyclical effects, and checking for anomalies or missing data.  
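As a rough illustration of these checks, the sketch below summarizes an hourly series with pandas. The series name `demand_mw`, the helper `summarize_hourly_series`, and the synthetic data are assumptions made for the example; they are not taken from the project's notebook.

```python
import numpy as np
import pandas as pd


def summarize_hourly_series(demand: pd.Series) -> dict:
    """Basic quality checks for a demand series indexed by hourly timestamps."""
    demand = demand.sort_index()
    # Every hour that should exist between the first and last observation.
    expected = pd.date_range(
        demand.index.min(), demand.index.max(), freq=pd.Timedelta(hours=1)
    )
    return {
        "n_rows": len(demand),
        "n_duplicate_timestamps": int(demand.index.duplicated().sum()),
        "n_missing_hours": len(expected.difference(demand.index)),
        "n_missing_values": int(demand.isna().sum()),
        # Mean demand by hour of day exposes the daily cycle.
        "mean_by_hour": demand.groupby(demand.index.hour).mean(),
    }


# Synthetic two-day series (hypothetical values) with one dropped hour and one NaN.
idx = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
values = 30000 + 5000 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
demand = pd.Series(values, index=idx, name="demand_mw")
demand = demand.drop(idx[10])
demand.iloc[5] = np.nan

report = summarize_hourly_series(demand)
print({k: v for k, v in report.items() if k != "mean_by_hour"})
```

The same grouping idea extends to day of week or month when looking for weekly and seasonal patterns.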

### Feature Engineering  
Derive informative features that capture the underlying drivers of demand. These may include calendar variables (hour, day, month, season), lagged values of the target variable, rolling statistics to capture recent trends, and domain-specific indicators such as peak hours, business hours, or special events. Where relevant, external variables like weather or pricing data can also be integrated.  
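A minimal sketch of such features is shown below, assuming an hourly pandas series with a `DatetimeIndex`. The function name `build_features`, the column names, the lag choices (24 and 168 hours), and the 08:00 to 18:00 business-hours definition are illustrative assumptions, not the project's actual configuration.

```python
import numpy as np
import pandas as pd


def build_features(demand: pd.Series, lags=(24, 168), rolling_window=24) -> pd.DataFrame:
    """Calendar, lag, and rolling features for an hourly demand series."""
    df = pd.DataFrame({"demand_mw": demand})
    idx = df.index

    # Calendar variables.
    df["hour"] = idx.hour
    df["dayofweek"] = idx.dayofweek
    df["month"] = idx.month
    df["is_weekend"] = (idx.dayofweek >= 5).astype(int)
    # Assumed definition: weekdays, 08:00 up to (not including) 18:00.
    df["is_business_hours"] = (
        (idx.hour >= 8) & (idx.hour < 18) & (idx.dayofweek < 5)
    ).astype(int)

    # Lagged targets: demand observed `lag` hours before each row.
    for lag in lags:
        df[f"lag_{lag}h"] = df["demand_mw"].shift(lag)

    # Shift by one hour before rolling so the window never sees the current target.
    df[f"rolling_mean_{rolling_window}h"] = (
        df["demand_mw"].shift(1).rolling(rolling_window).mean()
    )

    # Rows at the start lack enough history for the longest lag.
    return df.dropna()


# Two weeks of synthetic hourly data (hypothetical values).
idx = pd.date_range("2020-01-06", periods=336, freq=pd.Timedelta(hours=1))
demand = pd.Series(np.arange(336, dtype=float), index=idx)
features = build_features(demand)
print(features.head())
```

Lag and rolling features are only usable in production if they are at least as old as the forecast horizon; a day-ahead forecast, for example, cannot rely on a one-hour lag.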

### Model Development  
Select and train an appropriate machine learning model, ensuring that the temporal nature of the data is respected. This typically involves time-based train/test splits or rolling cross-validation. Hyperparameters are tuned to balance accuracy and generalization, while evaluation metrics such as MAE, RMSE, or R² are used to benchmark performance.  
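The sketch below illustrates one way to build rolling (expanding-window) splits and compute the metrics named above with NumPy. The helper names `expanding_window_splits` and `regression_metrics` are hypothetical; any regressor, such as an XGBoost model, could be fitted on each training fold and scored on the test block that follows it.

```python
import numpy as np


def expanding_window_splits(n_samples: int, n_folds: int, test_size: int):
    """Yield (train_idx, test_idx) pairs where each test block follows its training data."""
    first_test_start = n_samples - n_folds * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * test_size
        # Training data grows with each fold; the test block is always in the future.
        yield np.arange(0, test_start), np.arange(test_start, test_start + test_size)


def regression_metrics(y_true, y_pred) -> dict:
    """MAE, RMSE, and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


for train_idx, test_idx in expanding_window_splits(n_samples=100, n_folds=3, test_size=10):
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Because every test index is later than every training index, no fold lets the model learn from observations that come after the period it is asked to predict.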

### Results and Challenges  
Evaluate the model’s predictive accuracy and assess how well it captures both baseline demand and extreme values. Discuss challenges encountered, such as handling missing data, underestimation of peaks, or the absence of external drivers like weather variables.  
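One possible way to quantify peak underestimation is to compare errors on the highest-demand hours with errors overall. The sketch below assumes peaks are defined as hours at or above the 95th percentile of actual demand; that threshold and the helper name `peak_error_report` are illustrative choices, and the data is synthetic.

```python
import numpy as np


def peak_error_report(y_true, y_pred, quantile: float = 0.95) -> dict:
    """Compare forecast error on peak hours with error across all hours."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    threshold = np.quantile(y_true, quantile)
    is_peak = y_true >= threshold
    err = y_pred - y_true  # negative values mean the model under-forecasts
    return {
        "threshold": float(threshold),
        "n_peak_hours": int(is_peak.sum()),
        "overall_mae": float(np.mean(np.abs(err))),
        "peak_mae": float(np.mean(np.abs(err[is_peak]))),
        "peak_mean_error": float(np.mean(err[is_peak])),
    }


# Synthetic example: a forecast that is exact except that it saturates at 90.
actual = np.arange(1, 101, dtype=float)
forecast = np.minimum(actual, 90.0)
peak_report = peak_error_report(actual, forecast)
print(peak_report)
```

A strongly negative `peak_mean_error` alongside a small `overall_mae` indicates a model that tracks baseline demand well but misses extremes.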

### Conclusion  
Summarize the effectiveness of the approach, highlight the main findings, and reflect on limitations. Finally, outline opportunities for improvement, such as incorporating additional data sources, experimenting with alternative algorithms, or applying the framework to other domains.

## Data Exploration

![Energy usage](./Graphs/energy_usage.png)