This notebook outlines the modeling workflow I used to solve this challenge.

## EDA

From the provided EDA (notebooks/eda1.ipynb), I got a sense of the features in the dataset.
Since the series were not all of the same length, it was essential to choose a model that could handle that.
Each store-dept combination has a different amount of data to learn from.

The sales were skewed, with some weeks having very large sales numbers (the sales were also skewed for certain departments).
This led me to add a configurable option to log-transform weekly_sales.

The ACF plot indicated positive correlation up to 5 weeks, but the lags after that are not significant.
Temperature, Fuel_price, and Markdown 2, 3, 4 were removed as they were not highly correlated with weekly sales. However, correlation only measures a linear relationship, so the flexibility to add these features back was provided through a CONFIG.yaml file.
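
The autocorrelation reading above can be checked numerically. The sketch below computes a sample ACF in plain Python and flags lags outside the approximate 95% band of ±1.96/√n. The function names are hypothetical and not taken from the project code, and the band assumes a white-noise null.

```python
from statistics import fmean


def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag."""
    n = len(series)
    mean = fmean(series)
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    acf = []
    for lag in range(max_lag + 1):
        # Sum of products between the series and itself shifted by `lag`.
        cov = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(cov / variance)
    return acf


def significant_lags(series, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / len(series) ** 0.5
    acf = autocorrelation(series, max_lag)
    return [lag for lag in range(1, max_lag + 1) if abs(acf[lag]) > bound]
```

Running `significant_lags` on one store-dept series with `max_lag` around 10 would show whether the significant lags stop near week 5 for that series.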

Markdown 4 is highly correlated with markdown 1, so it was dropped.

Store size was highly correlated with sales, so it seems to be an important feature (understandably, larger stores have more sales).

I wanted to do some extra exploratory analysis to assess data quality. About 0.23% of sales are negative; I converted those to 1 so that they log-transform to zero. The markdowns had a lot of missing values, so I imputed them with zero (to nullify their effect). This imputation could be better informed with more knowledge of what the markdown columns represent.
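
A minimal sketch of these cleaning rules, in plain Python. The helper names are hypothetical, `transform_target` mirrors the config option mentioned later, and treating zero sales the same as negative sales is an assumption made here so that the logarithm is always defined.

```python
import math
from typing import Optional


def clean_sales(value: float, transform_target: bool = True) -> float:
    """Floor non-positive sales at 1, then optionally log-transform.

    Assumption: zeros are floored as well as negatives, since log(0)
    is undefined. A floored value maps to log(1) = 0.
    """
    floored = value if value > 0 else 1.0
    return math.log(floored) if transform_target else floored


def impute_markdown(value: Optional[float]) -> float:
    """Treat a missing markdown (None or NaN) as zero, i.e. no markdown."""
    if value is None or math.isnan(value):
        return 0.0
    return value
```

Predictions made on the log scale need `math.exp` applied before they are compared with actual sales.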

Summary statistics of numeric columns
![](figures/describe.png)

Since the sales are on different scales for each store and dept, I min-max scaled them.
This helped me analyse their trends and seasonality.
The data did not have a significant trend but was very seasonal. The last month (December) recorded high sales across all years.
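
To illustrate the per-series scaling, here is a small sketch that min-max scales sales within each (store, dept) group. The row layout (dicts with `store`, `dept`, `weekly_sales` keys) and the function name are assumptions for the example, not the project's actual data structures.

```python
from collections import defaultdict


def minmax_by_group(rows):
    """Scale `weekly_sales` to [0, 1] within each (store, dept) series."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["store"], row["dept"])].append(row["weekly_sales"])
    bounds = {key: (min(vals), max(vals)) for key, vals in groups.items()}

    scaled = []
    for row in rows:
        low, high = bounds[(row["store"], row["dept"])]
        span = high - low
        # A constant series has zero span; map it to 0.0 to avoid dividing by zero.
        value = (row["weekly_sales"] - low) / span if span else 0.0
        scaled.append({**row, "scaled_sales": value})
    return scaled
```

Because each series is scaled by its own minimum and maximum, a small department and a large one can be plotted on the same axis to compare their seasonal shape.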

Store sales
![Store sales](figures/store_sales.png)
Department sales
![Random departments for a random store](figures/department_sales.png)
Trends and Seasonality
![Trends](figures/trend_seasonality.png)
Sales per year
![Sales per year](figures/sales_per_year.png)


From the above plots, it seems that while all stores follow a similar pattern, each department is unique.
So department should be an important feature for forecasting sales.


## Data preparation
Data is prepared sequentially: it is preprocessed, features are created, and the dataset is split into train and test.
The numeric columns are on different scales, so they are standardized in a way that avoids data leakage between train and test.
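
One common way to avoid leakage when standardizing is to estimate the mean and standard deviation on the training split only and reuse them for the test split. The sketch below shows that idea in plain Python; the helper names are hypothetical, and the project code may do this differently.

```python
from statistics import fmean, pstdev


def fit_scaler(train_values):
    """Estimate mean and standard deviation from the training data only."""
    mean = fmean(train_values)
    std = pstdev(train_values) or 1.0  # fall back to 1.0 for a constant column
    return mean, std


def standardize(values, mean, std):
    """Apply previously fitted statistics to any split."""
    return [(v - mean) / std for v in values]


# Usage: fit on train, then apply the same statistics to both splits.
train_cpi = [210.0, 212.0, 214.0]
test_cpi = [216.0]
mean, std = fit_scaler(train_cpi)
train_scaled = standardize(train_cpi, mean, std)
test_scaled = standardize(test_cpi, mean, std)
```

The test values never influence `mean` or `std`, so no information from the forecast period reaches the model during training.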


## Feature engineering
Here, I mainly focused on encoding the categorical features. I created time-based features such as year, month, and week number. I also added lag features so the model can be used in a supervised setting (the number of lags can be controlled from the config file).
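
As a rough illustration of these two feature families, the sketch below derives calendar features from a date and builds lagged rows from a single sales series. The function names and the `lag_k` naming are assumptions for the example.

```python
from datetime import date


def time_features(d: date) -> dict:
    """Calendar features: year, month, and ISO week number."""
    return {"year": d.year, "month": d.month, "week": d.isocalendar()[1]}


def add_lags(sales, n_lags):
    """Build one supervised row per week that has a full lag history.

    `lag_1` is the previous week's sales, `lag_2` the week before, and so on.
    The first `n_lags` weeks are dropped because their lags are incomplete.
    """
    rows = []
    for t in range(n_lags, len(sales)):
        row = {f"lag_{k}": sales[t - k] for k in range(1, n_lags + 1)}
        row["target"] = sales[t]
        rows.append(row)
    return rows
```

In a multi-series dataset the lags would be built separately per store-dept pair so that values from one series do not leak into another.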

However, forecasting with lags would require recursively updating them at each predicted time step (the current prediction becomes a lag for the next prediction). I left that out due to time constraints.
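
For reference, the recursive step described here could look roughly like the sketch below. It is not part of the project; `predict` is a hypothetical stand-in for any fitted model that maps a list of lags (most recent first) to a single forecast.

```python
def recursive_forecast(history, predict, n_lags, horizon):
    """Forecast `horizon` steps, feeding each prediction back in as a lag."""
    window = list(history[-n_lags:])  # oldest to newest
    forecasts = []
    for _ in range(horizon):
        lags = window[::-1]  # lags[0] is lag_1, the most recent value
        y_hat = predict(lags)
        forecasts.append(y_hat)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [y_hat]
    return forecasts


# Usage with a toy "model" that averages its lags.
def mean_of_lags(lags):
    return sum(lags) / len(lags)


three_week_forecast = recursive_forecast([10.0, 20.0, 30.0], mean_of_lags, n_lags=2, horizon=3)
```

Errors compound with this approach because later steps depend on earlier predictions, which is one reason to keep the horizon short.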



## Model and evaluation
I unfortunately was not able to experiment as much as I'd have liked. I saw that the data would need to be processed differently to use models like VARIMA, but I did not consider it as powerful (I was not sure whether the series followed an ARIMA process or whether there are linear dependencies), and I wanted to use the continuous and categorical (exogenous) features. Hierarchical forecasting was also on my list to try, but I was not sure about the results or implementation.
Rather than try models which may or may not work, I focused on XGBoost, which can handle the differing scales and types of features. It also proved robust to seasonality.
I added the time-based features to convert it to a supervised problem.

I also experimented with DeepAR from PyTorch Forecasting and was able to get it running. However, the results were not that strong (see the implementation in notebooks/deepar.py), and I did not have time to investigate. Still, I believe it is a strong candidate model for this problem, as it is designed for multiple time series and uses recurrent neural networks.


For evaluation, I used MAPE, symmetric MAPE (sMAPE), and WMAPE. I chose them for the advantages outlined in this [post](https://medium.com/@vinitkothari.24/time-series-evaluation-metrics-mape-vs-wmape-vs-smape-which-one-to-use-why-and-when-part1-32d3852b4779).
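
For concreteness, the three metrics can be written as below. These are common textbook definitions returned as fractions rather than percentages. sMAPE in particular has several variants, so the exact formulas used in the project are an assumption here.

```python
def mape(actual, forecast):
    """Mean absolute percentage error. Undefined if any actual value is zero."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE: error relative to the mean of |actual| and |forecast|."""
    return sum(
        2 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    ) / len(actual)


def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total absolute actuals."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / sum(abs(a) for a in actual)
```

WMAPE weights each week by its sales volume, so weeks with tiny actual sales do not dominate the score the way they can with plain MAPE.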

The results of my hyperparameter tuning are in notebooks/hyperparameter.ipynb. Squared error was used as the objective function for the model.

Final results for evaluation - {'mape': 12.54, 'smape': 0.103, 'wmape': 0.113}

Qualitative evaluation

Summed forecasts for all stores and departments
![](figures/summed_forecasts.png)


Forecast for store 1
![](figures/1.png)


Forecast for store 4 department 3
![](figures/1-3.png)


## Pipeline
The pipeline is streamlined: running main.py prints and saves the forecasts for the 3-week period. The store and department numbers can be passed as command-line arguments to print specific forecasts.
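
A command-line interface of this kind could be wired up with `argparse` roughly as follows. The flag names `--store` and `--dept` are hypothetical; the actual arguments accepted by main.py may be named differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with optional store and department filters (hypothetical flags)."""
    parser = argparse.ArgumentParser(description="Print and save sales forecasts.")
    parser.add_argument("--store", type=int, default=None,
                        help="Store number to print forecasts for")
    parser.add_argument("--dept", type=int, default=None,
                        help="Department number to print forecasts for")
    return parser


# Usage: parse an explicit argument list (sys.argv is used when none is given).
args = build_parser().parse_args(["--store", "1", "--dept", "3"])
```

Leaving both flags unset would correspond to printing forecasts for every store and department.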

All the flexibility is in CONFIG.yaml, where the features, lags, train_split_date, forecast_create_date, transform_target, etc. can be controlled.

I was not sure how to include forecast_create_date, but in general the test set includes dates greater than train_split_date and less than or equal to forecast_create_date.
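
The date rule above can be sketched as a small split function. The test window follows the description directly; placing everything up to and including train_split_date in the training set is an assumption, as is the row layout with a `date` key.

```python
from datetime import date


def split_by_date(rows, train_split_date, forecast_create_date):
    """Split rows into train and test sets using the two config dates.

    Train: date <= train_split_date (assumption).
    Test:  train_split_date < date <= forecast_create_date.
    """
    train = [r for r in rows if r["date"] <= train_split_date]
    test = [r for r in rows if train_split_date < r["date"] <= forecast_create_date]
    return train, test
```

Rows dated after forecast_create_date fall into neither set under this rule.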

There are many ways this could be improved, but I refrained due to time constraints:
1. Create a checkpoint folder for version control and save model params as metadata
2. Create a command-line argument to use a specific model for the forecasts
3. Use MLflow tracking to record experiments
4. Automatically pick parameters from the best model rather than from model_config.yaml
5. Create an endpoint for serving the model predictions given data. This could include a feature store which would take in the provided features (like store, department), calculate others from databases (size, type of store), and obtain the rest from different APIs (CPI, unemployment, etc.)


As for well-engineered code, I did not get to documenting and adding asserts/tests for all functions. Ideally, this would be done.

## Final Comments

All in all, I had a lot of fun with this assessment. I did my best given the time, but I can definitely identify areas for improvement. It was a challenging problem to think through and implement, so it helped me understand the complexities of the role.