## Advanced Time Series Forecasting with Attention Mechanism

This project focuses on forecasting a multivariate time series consisting of three correlated variables: Sales_Target, Foot_Traffic, and Market_Noise. The dataset contains 1095 daily observations, approximately three years of data. The objective is to compare a deep learning sequence model against a traditional machine learning baseline to understand how modeling temporal dependencies affects prediction performance.

The deep learning model implemented in this work is an Attention-based LSTM network. A standard LSTM forecaster reads past observations sequentially and typically predicts from its final hidden state alone, so information from early timesteps must survive many recurrent updates before it can influence the output. Real-world time series, however, often contain periods that are more informative than others. The attention mechanism addresses this by assigning a weight to the hidden state at each historical timestep and summarizing the window as a weighted sum of those states. Relevant timesteps receive higher weights and noisy observations contribute less. Because every timestep has a direct path to the output, the model can capture longer-range temporal relationships, which also improves prediction stability across time windows.
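The weighting step described above can be sketched as dot-product attention pooling over a window of LSTM hidden states. This is a minimal, framework-free illustration under stated assumptions: the names `attention_pool` and `w` are hypothetical, the document does not specify its scoring function, and in a trained model `w` would be a learned parameter rather than a fixed list.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention over LSTM hidden states (hypothetical helper).

    hidden_states: one hidden vector h_t per timestep (lookback x hidden_dim).
    w: scoring vector; learned during training in a real model, fixed here.
    Returns (context, weights), where context = sum_t weights[t] * h_t.
    """
    # One relevance score per timestep: s_t = h_t . w
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden_states]
    # Normalise scores into weights that are positive and sum to 1
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of hidden states, one output component at a time
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    # Made-up hidden states for a 3-step window with hidden_dim = 2
    hidden = [[0.1, 0.0], [0.5, 0.2], [2.0, 1.0]]
    context, weights = attention_pool(hidden, w=[1.0, 1.0])
    print("weights:", [round(a, 3) for a in weights])
    print("context:", [round(c, 3) for c in context])
```

In this toy window the last timestep has the highest score, so it receives most of the weight and dominates the context vector; a dense output layer would then map that context vector to the forecast.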

As a baseline, an XGBoost regression model was used. Unlike sequence models, XGBoost treats forecasting as a tabular supervised learning problem: each training row is a fixed set of input features, and the model has no built-in notion of temporal order. It can capture short-term patterns when past values are supplied as input features, but any sequential dependency must be engineered into those features rather than modeled explicitly.
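The difference in how the two models see the data can be sketched as follows, assuming both are fed the same sliding windows of past observations (the document does not state its window length, and `make_windows` / `flatten_windows` are hypothetical helpers). The LSTM consumes each window as an ordered sequence; for XGBoost the same window is flattened into one row of independent columns.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]], lookback: int, target_col: int = 0
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build sequence samples for the LSTM.

    series: rows ordered in time, one column per variable.
    Each sample is the `lookback` rows before time t (shape lookback x n_vars);
    its label is the target column at time t.
    """
    X, y = [], []
    for t in range(lookback, len(series)):
        X.append([list(row) for row in series[t - lookback:t]])
        y.append(series[t][target_col])
    return X, y


def flatten_windows(windows: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Tabular features for XGBoost: concatenate each window's rows into one flat row.

    Each column corresponds to one (lag, variable) pair; the tree model treats the
    columns as independent features and is not told that they form a sequence.
    """
    return [[value for row in window for value in row] for window in windows]


if __name__ == "__main__":
    # Toy rows: [Sales_Target, Foot_Traffic, Market_Noise] (made-up values)
    toy = [[1.0, 10.0, 0.1], [2.0, 20.0, 0.2], [3.0, 30.0, 0.3], [4.0, 40.0, 0.4]]
    windows, targets = make_windows(toy, lookback=2)
    print(windows[0])                    # [[1.0, 10.0, 0.1], [2.0, 20.0, 0.2]]
    print(flatten_windows(windows)[0])   # [1.0, 10.0, 0.1, 2.0, 20.0, 0.2]
    print(targets)                       # [3.0, 4.0]
```

The flattened rows and targets could then be passed to `xgboost.XGBRegressor().fit(X, y)`. Treating column 0 as the target (Sales_Target) is an assumption made for illustration.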

To evaluate the models, rolling-origin cross-validation was used instead of a single train-test split. In this strategy, each model is trained on all observations up to a forecast origin and tested on the block of unseen observations that follows; the origin then moves forward, so the training window expands over time. This mirrors real-world forecasting, where future data is unavailable at training time, and evaluating over several origins gives a more reliable estimate of generalization performance than a single split.
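An expanding-window split generator makes the procedure concrete. This is a sketch only: the initial training length, test horizon, and step size below are hypothetical, since the document does not report the values it used, and `rolling_origin_splits` is an illustrative name.

```python
from typing import Iterator, Tuple


def rolling_origin_splits(
    n_obs: int, initial_train: int, horizon: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) index ranges for expanding-window evaluation.

    Training always starts at index 0 and stops just before the forecast origin;
    the test block is the next `horizon` observations. The origin then advances
    by `step`, so each fold trains on more history than the previous one.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


if __name__ == "__main__":
    # Hypothetical settings: two years of initial training, 90-day test blocks
    for train_idx, test_idx in rolling_origin_splits(1095, 730, 90, 90):
        print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

With these settings the generator yields four folds (test blocks starting at 730, 820, 910, and 1000); the final five observations are not tested because they do not fill a whole 90-day block. scikit-learn's `TimeSeriesSplit` implements a similar expanding-window scheme. Any scaler should be fitted on each fold's training window only, so that no information from the test block leaks into training.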

Model performance was measured using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). Predictions were transformed back to the original scale, inverting the scaling applied during preprocessing, before the metrics were computed, so that errors are expressed in the units of the original data.
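The inverse transform and the three metrics can be sketched as below. The document does not name its scaler, so min-max scaling is an assumption here (for a fitted scikit-learn `MinMaxScaler`, `inverse_transform` performs the same step). The helper names are illustrative.

```python
import math
from typing import List, Sequence


def inverse_min_max(scaled: Sequence[float], data_min: float, data_max: float) -> List[float]:
    """Undo min-max scaling x_scaled = (x - data_min) / (data_max - data_min)."""
    return [s * (data_max - data_min) + data_min for s in scaled]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error; penalises large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent. Undefined when any actual value is 0."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical: model outputs in [0, 1], target originally ranged from 100 to 200
    actual = inverse_min_max([0.0, 1.0], data_min=100.0, data_max=200.0)  # [100.0, 200.0]
    predicted = inverse_min_max([0.1, 0.9], data_min=100.0, data_max=200.0)
    print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

Computing MAPE on scaled values would be misleading, because min-max scaling maps the minimum to 0 and shifts every denominator; this is one concrete reason to invert the scaling first.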

The results show that the Attention LSTM produces more stable predictions across the rolling forecasting windows and captures temporal dependencies more effectively. The XGBoost model occasionally achieves lower short-term error but is less consistent from one window to the next, because it has no explicit representation of sequential context. Rolling-origin validation indicates that the deep learning model generalizes better on this forecasting task.
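One way to make "more stable across forecasting windows" measurable is to compute a metric such as MAE on every rolling-origin fold for each model and compare the spread as well as the average. The sketch below shows only that aggregation; `summarize_folds` is a hypothetical helper, and it does not reproduce any numbers from this project.

```python
import statistics
from typing import Dict, Sequence


def summarize_folds(fold_scores: Sequence[float]) -> Dict[str, float]:
    """Aggregate one error metric over rolling-origin folds.

    mean: average error across forecast windows.
    std: sample standard deviation across windows; a smaller value means the
         error varies less from one window to the next (more stable).
    Requires at least two folds.
    """
    if len(fold_scores) < 2:
        raise ValueError("need at least two folds to measure spread")
    return {
        "mean": statistics.mean(fold_scores),
        "std": statistics.stdev(fold_scores),
    }
```

Calling this once per model on its list of per-fold MAE values lets the two models be compared on both average error and consistency.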

In conclusion, attention-based sequence models are better suited to multivariate time series forecasting problems such as this one because they learn temporal structure and focus on the most informative historical patterns, whereas traditional regression models mainly learn static relationships between input features.
