# Store Sales Time Series Forecasting with Neural Networks

## Project Overview

This project investigates neural networks for demand forecasting in retail. The main question is whether neural models, from simple dense networks (MLP) to recurrent architectures such as LSTM, can predict product demand accurately enough to support supply chain cost optimization and inventory management.

## Project Objectives

1. **Develop and Compare Neural Network Architectures** for time series forecasting in retail sales
2. **Evaluate Performance** against traditional baseline methods (Linear Regression, Prophet)
3. **Provide Practical Insights** for inventory management and demand planning

## Dataset Description

### Source
**Kaggle Competition**: [Store Sales - Time Series Forecasting](https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data)

**Corporación Favorita Grocery Sales Dataset**
- **Time Period**: 2013-01-01 to 2017-08-15 in `train.csv` (1,684 distinct dates); the forecast horizon in `test.csv` extends to 2017-08-31
- **Stores**: 54 retail stores across Ecuador
- **Product Families**: 33 different product categories
- **Total Records**: ~3 million daily sales records
- **Coverage**: Complete grid format (store × product family × date)

### Data Files
1. **train.csv** - Historical sales data
   - `date`: Date of sale
   - `store_nbr`: Store identifier (1-54)
   - `family`: Product family/category
   - `sales`: Units sold (target variable)
   - `onpromotion`: Number of items on promotion

2. **test.csv** - Test set for predictions (15 days after training period)

3. **stores.csv** - Store metadata
   - `store_nbr`: Store identifier
   - `city`: Store location
   - `state`: State/province
   - `type`: Store type (A, B, C, D, E)
   - `cluster`: Store cluster (1-17)

4. **oil.csv** - Daily oil prices (Ecuador's economy is oil-dependent)
   - `date`: Date
   - `dcoilwtico`: Oil price

5. **holidays_events.csv** - Holiday and event information
   - `date`: Date
   - `type`: Holiday type (Holiday, Event, etc.)
   - `locale`: Geographic scope (National, Regional, Local)
   - `transferred`: Whether holiday was transferred

6. **transactions.csv** - Daily transaction counts by store
   - `date`: Date
   - `store_nbr`: Store identifier
   - `transactions`: Number of transactions
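
The files above join on `date` and `store_nbr`. The following is a minimal sketch of assembling one modeling table with pandas, assuming the column names listed above and `date` columns already parsed as datetimes. The function name `build_modeling_frame`, the `store_type` rename, and the forward-fill of oil prices are illustrative choices, not part of the project code.

```python
import pandas as pd


def build_modeling_frame(train, stores, oil, transactions):
    """Join store metadata, oil prices and transaction counts onto train rows."""
    # Oil prices are not quoted every day, so reindex to every calendar day
    # in the training range and forward-fill the gaps (an assumption).
    days = pd.date_range(train["date"].min(), train["date"].max(), freq="D")
    oil_daily = (
        oil.set_index("date")["dcoilwtico"]
        .reindex(days)
        .ffill()
        .rename_axis("date")
        .reset_index()
    )
    # Rename the store `type` column so it cannot be confused with the
    # holiday `type` column if holidays are merged in later.
    stores = stores.rename(columns={"type": "store_type"})
    df = train.merge(stores, on="store_nbr", how="left")
    df = df.merge(oil_daily, on="date", how="left")
    df = df.merge(transactions, on=["date", "store_nbr"], how="left")
    return df
```

Left joins keep every training row; store-days without a row in `transactions.csv` end up with a missing `transactions` value that the preprocessing step has to handle.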

### Data Characteristics
- **Temporal Granularity**: Daily sales records
- **Zero Sales**: ~31% of records have zero sales; these are recorded zeros in the grid rather than missing values
- **Sales Range**: 0.01 to 3,502 per day per store-product for non-zero records, in the units of the `sales` column
- **Seasonality**: Clear weekly and monthly patterns
- **External Factors**: Oil prices, holidays, promotions impact sales


## Neural Network Architectures

### 1. Baseline Models
- **Linear Regression**: Simple linear relationship modeling
- **Prophet**: Facebook's time series forecasting tool
- **Moving Average**: Rolling mean of recent sales (listed as a baseline in Phase 3)
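
A linear regression baseline can be as small as a least-squares fit on lagged sales. The sketch below assumes a single store-family series held in a NumPy array; the lag set `(1, 7, 14)` and the helper names are illustrative and not the project's configuration.

```python
import numpy as np


def make_lag_matrix(series, lags=(1, 7, 14)):
    """Return (X, y) where each row of X holds series[t - lag] for every lag."""
    series = np.asarray(series, dtype=float)
    start = max(lags)
    X = np.column_stack(
        [series[start - lag:len(series) - lag] for lag in lags]
    )
    y = series[start:]
    return X, y


def fit_linear_baseline(series, lags=(1, 7, 14)):
    """Ordinary least squares with an intercept on lagged values."""
    X, y = make_lag_matrix(series, lags)
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # [intercept, one weight per lag]


def predict_next(series, coef, lags=(1, 7, 14)):
    """One-step-ahead forecast from the most recent observations."""
    series = np.asarray(series, dtype=float)
    features = np.array([1.0] + [series[-lag] for lag in lags])
    return float(features @ coef)
```

On a series with a clean weekly cycle the 7-day lag alone explains the target, which makes this a useful sanity check before training larger models.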

### 2. Multi-Layer Perceptron (MLP)
- **Architecture**: Dense feedforward network
- **Input**: Windowed time series features + categorical embeddings
- **Hidden Layers**: 2-4 layers with 128-512 neurons
- **Activation**: ReLU for hidden layers, Linear for output
- **Regularization**: Dropout (0.2-0.5), L2 regularization
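
The MLP described above could look roughly like the following PyTorch sketch. PyTorch itself is an assumption (the project does not name a framework), as are the class name `SalesMLP`, the embedding size, the 28 numeric input features, and the specific widths and dropout rate, which are picked from the ranges listed above.

```python
import torch
from torch import nn


class SalesMLP(nn.Module):
    """Dense network over numeric window features plus categorical embeddings."""

    def __init__(self, n_numeric, n_stores=54, n_families=33,
                 emb_dim=8, hidden=(256, 128), dropout=0.2):
        super().__init__()
        self.store_emb = nn.Embedding(n_stores, emb_dim)
        self.family_emb = nn.Embedding(n_families, emb_dim)
        layers = []
        in_dim = n_numeric + 2 * emb_dim
        for width in hidden:
            layers += [nn.Linear(in_dim, width), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))  # linear output for regression
        self.net = nn.Sequential(*layers)

    def forward(self, numeric, store_idx, family_idx):
        # store_idx and family_idx are zero-based integer codes.
        x = torch.cat(
            [numeric, self.store_emb(store_idx), self.family_emb(family_idx)],
            dim=1,
        )
        return self.net(x).squeeze(-1)


model = SalesMLP(n_numeric=28)
# L2 regularization is applied through the optimizer's weight_decay term.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
```

Because `store_nbr` runs from 1 to 54, it needs to be shifted to 0-53 before it is passed as `store_idx`.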

### 3. Long Short-Term Memory (LSTM)
- **Architecture**: Recurrent neural network with memory cells
- **Input**: Sequential time series data (lookback window)
- **LSTM Layers**: 1-3 layers with 64-256 units
- **Output**: Dense layer for final prediction
- **Variants**: Vanilla LSTM, Bidirectional LSTM, Stacked LSTM
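
A sketch of the lookback-window input and a stacked LSTM with a dense output, again assuming PyTorch. The 28-day lookback, the univariate input, and the names `make_windows` and `SalesLSTM` are illustrative assumptions; the layer count and unit size fall inside the ranges listed above.

```python
import numpy as np
import torch
from torch import nn


def make_windows(series, lookback=28):
    """Slice one series into (samples, lookback, 1) inputs and next-day targets."""
    series = np.asarray(series, dtype=np.float32)
    X = np.stack(
        [series[i:i + lookback] for i in range(len(series) - lookback)]
    )
    y = series[lookback:]
    return torch.from_numpy(X).unsqueeze(-1), torch.from_numpy(y)


class SalesLSTM(nn.Module):
    """Stacked (optionally bidirectional) LSTM followed by a dense output layer."""

    def __init__(self, hidden_size=64, num_layers=2, bidirectional=False):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=1, hidden_size=hidden_size, num_layers=num_layers,
            batch_first=True, bidirectional=bidirectional,
        )
        out_dim = hidden_size * (2 if bidirectional else 1)
        self.head = nn.Linear(out_dim, 1)

    def forward(self, x):
        output, _ = self.lstm(x)  # (batch, lookback, directions * hidden)
        if self.lstm.bidirectional:
            h = self.lstm.hidden_size
            # The forward pass finishes at the last step, the backward pass at the first.
            summary = torch.cat([output[:, -1, :h], output[:, 0, h:]], dim=1)
        else:
            summary = output[:, -1]
        return self.head(summary).squeeze(-1)
```

Setting `num_layers=1` gives the vanilla variant, `num_layers > 1` the stacked variant, and `bidirectional=True` the bidirectional one.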

### 4. TabPFN (Tabular Prior-Data Fitted Networks)
- **Architecture**: Pre-trained Transformer for small tabular datasets
- **Approach**: Prior-Data Fitted Networks using synthetic datasets
- **Advantages**:
  - No hyperparameter tuning required
  - Extremely fast inference (seconds)
  - Strong performance on small tabular problems
  - No training needed - uses pre-trained weights
- **Application**: Feature-based forecasting (convert time series to tabular format)
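- **Limitations**: Tight input size limits that depend on the release. The original TabPFN targets roughly 1,000 training samples and 100 features and supports classification only; TabPFN v2 raises this to roughly 10,000 samples and 500 features and adds regression. Either way, the ~3 million training rows must be subsampled or split per series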
- **Use Case**: Benchmark against traditional tabular ML approaches
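
Converting the time series to a tabular format means giving each (store, family, date) row its own lagged features. The sketch below shows one way to do this with pandas and then subsample to a row cap. The lag set, the 7-day rolling mean, the `day_of_week` feature, and the function name `to_tabular` are assumptions for illustration; the fitting step with a TabPFN model is deliberately left out.

```python
import pandas as pd


def to_tabular(df, lags=(1, 7, 14), max_rows=10_000, seed=0):
    """Build lag features per store-family series and cap the number of rows."""
    df = df.sort_values(["store_nbr", "family", "date"]).copy()
    grouped = df.groupby(["store_nbr", "family"])["sales"]
    feature_cols = []
    for lag in lags:
        col = f"sales_lag_{lag}"
        df[col] = grouped.shift(lag)
        feature_cols.append(col)
    # Rolling mean of the previous 7 days; shift(1) keeps the current day out
    # of its own feature and so avoids target leakage.
    df["sales_roll_mean_7"] = grouped.transform(
        lambda s: s.shift(1).rolling(7).mean()
    )
    feature_cols.append("sales_roll_mean_7")
    df["day_of_week"] = df["date"].dt.dayofweek
    # The first rows of each series have no complete history.
    df = df.dropna(subset=feature_cols)
    # Subsample to stay within the model's sample limit.
    if len(df) > max_rows:
        df = df.sample(n=max_rows, random_state=seed)
    return df
```

Random subsampling is the simplest way to respect a sample limit; sampling per store or per family would be an alternative if some segments would otherwise be underrepresented.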


## Evaluation Metrics

### Primary Metrics

#### 1. Root Mean Square Logarithmic Error (RMSLE)
Advantages:
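- Penalizes underestimation more than overestimation of the same absolute size
- Handles zero values through the log(1 + x) transform
- Measures relative rather than absolute error, so high- and low-volume series contribute comparably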

#### 2. Mean Absolute Error (MAE)
Advantages:
- Easy to interpret (average prediction error)
- Less sensitive to outliers than RMSE
- Same units as target variable

#### 3. Root Mean Square Error (RMSE)
Advantages:
- Penalizes large errors more heavily
- Commonly used benchmark
- Same units as target variable

### Secondary Metrics

#### 4. Mean Absolute Percentage Error (MAPE)
Advantages:
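- Scale-independent
- Easy to interpret as percentage
- Good for business understanding

Limitation: MAPE is undefined when actual sales are zero (~31% of records), so it has to be restricted to non-zero records or replaced by a zero-tolerant variant.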
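
The four metrics can be written in a few lines of NumPy. This is a sketch under two assumptions that the text does not specify: negative forecasts are clipped to zero before the logarithm in RMSLE, and MAPE is computed only over records with non-zero actual sales.

```python
import numpy as np


def _arrays(y_true, y_pred):
    return np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)


def rmsle(y_true, y_pred):
    y_true, y_pred = _arrays(y_true, y_pred)
    y_pred = np.clip(y_pred, 0.0, None)  # clip negative forecasts (an assumption)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def mae(y_true, y_pred):
    y_true, y_pred = _arrays(y_true, y_pred)
    return float(np.mean(np.abs(y_pred - y_true)))


def rmse(y_true, y_pred):
    y_true, y_pred = _arrays(y_true, y_pred)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mape(y_true, y_pred):
    y_true, y_pred = _arrays(y_true, y_pred)
    mask = y_true != 0  # MAPE is undefined where actual sales are zero
    if not mask.any():
        return float("nan")
    ratio = np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])
    return float(100.0 * np.mean(ratio))
```

The asymmetry of RMSLE is easy to check: for an actual value of 100, a forecast of 50 scores worse than a forecast of 150 even though both miss by 50 units.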


### Evaluation Strategy
#### Expanding Window Validation (Time Series Cross-Validation)
Expanding window validation is a cross-validation scheme for time series data, where temporal order matters and random shuffling would leak future information into training.

- **Growing training sets**: Each fold trains on more history than the previous one
- **Fixed validation window**: Every validation period has the same length
- **Chronological order**: Future data is never used to predict the past
- **Progressive validation**: Mimics how forecasts are produced in practice
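
The fold layout of expanding window validation can be sketched in plain Python. The fold count, the 15-day validation window, and the function name `expanding_window_splits` are illustrative assumptions; indices refer to positions in the sorted list of dates.

```python
def expanding_window_splits(n_days, n_folds=4, val_size=15):
    """Yield (train_days, val_days) index ranges in chronological order."""
    first_val_start = n_days - n_folds * val_size
    if first_val_start <= 0:
        raise ValueError("not enough days for the requested folds")
    for fold in range(n_folds):
        val_start = first_val_start + fold * val_size
        # Training always starts at day 0 and ends right before validation.
        yield range(0, val_start), range(val_start, val_start + val_size)
```

With `n_days=1684`, the four folds validate on day indices 1624-1638, 1639-1653, 1654-1668, and 1669-1683, and each training range ends where its validation range begins. scikit-learn's `TimeSeriesSplit` with its `test_size` argument produces a comparable layout.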

#### Evaluation by Segments
1. **By Product Family**: Performance for each of 33 product categories
2. **By Store Type**: Performance across different store types (A, B, C, D, E)
3. **By Store Cluster**: Performance across 17 store clusters
4. **By Time Period**: Performance during different seasons/months
5. **By Sales Volume**: Performance for high- vs. low-volume products
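
Segment-level evaluation reduces to a group-by over per-row errors. The sketch below computes RMSLE per segment with pandas; the `prediction` column name and the function name `rmsle_by_segment` are hypothetical, and negative forecasts are clipped to zero as an assumption.

```python
import numpy as np
import pandas as pd


def rmsle_by_segment(df, segment_col, actual_col="sales", pred_col="prediction"):
    """RMSLE per segment, sorted from worst to best."""
    pred = df[pred_col].clip(lower=0)
    sq_log_err = (np.log1p(pred) - np.log1p(df[actual_col])) ** 2
    return (
        sq_log_err.groupby(df[segment_col])
        .mean()
        .pow(0.5)
        .sort_values(ascending=False)
        .rename("rmsle")
    )
```

The same function covers product family, store type, and cluster by changing `segment_col`. For the sales-volume split, a bin column can be derived first, for example with `pd.qcut` on each series' mean sales.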

## Experimental Design

### Model Training Pipeline

#### Phase 1: Data Exploration
- [x] Load and explore dataset
- [x] Analyze sales patterns and distributions
- [x] Identify data quality issues
- [x] Visualize temporal patterns
- [x] Analyze store and product performance

#### Phase 2: Data Preprocessing
- [ ] Implement feature engineering pipeline
- [ ] Handle zero sales appropriately
- [ ] Create categorical embeddings
- [ ] Implement time series data splitting
- [ ] Create data loaders for different model types

#### Phase 3: Baseline Models
- [ ] Implement linear regression baseline
- [ ] Implement Prophet model
- [ ] Implement moving averages
- [ ] Establish baseline performance metrics

#### Phase 4: Neural Network Implementation
- [ ] Implement MLP architecture
- [ ] Implement LSTM architecture
- [ ] Implement TabPFN approach (feature engineering + pre-trained model)
- [ ] Hyperparameter tuning
- [ ] Model training and validation

#### Phase 5: Model Comparison and Analysis
- [ ] Compare all models using established metrics
- [ ] Statistical significance testing
- [ ] Error analysis by segments
- [ ] Feature importance analysis
- [ ] Computational efficiency comparison

---

## Expected Outcomes

### Research Questions to Answer

1. **Model Performance Hierarchy**
   - Do neural networks outperform traditional methods?
   - Which architecture performs best for different product categories?
   - How does performance vary across stores and time periods?
   - Can TabPFN compete with specialized time series methods?

2. **Feature Importance**
   - Which features are most predictive of sales?
   - How important are external factors (oil prices, holidays)?
   - Do categorical embeddings improve performance?
   - What feature engineering works best for TabPFN?

3. **Practical Considerations**
   - What is the computational cost vs accuracy trade-off?
   - How stable are the models across different time periods?
   - Can the models handle seasonal variations effectively?
   - Is TabPFN's zero-shot approach viable for retail forecasting?