<br>
<p style="color:red; font-size: 50px " > Research </p>
<br>

# 1. 6 Powerful Feature Engineering Techniques For Time Series Data (using Python)

[Source](https://www.analyticsvidhya.com/blog/2019/12/6-powerful-feature-engineering-techniques-time-series/)

## Feature engineering:
1. Date Features
    - Extract day, month, year, day_of_week_num, day_of_week_name

1. Lag Features
    - Previous day's prices are important to make decisions about the future
    - Shift the series by 1, 2, ..., k steps and use the shifted copies as features
1. Rolling window features
    - Compute summary statistics over a rolling window
    - e.g. a rolling mean over a 7-day window
1. Domain specific features
    - Adding domain-specific features can improve the quality of predictions
    - e.g. adding stock-market features such as technical indicators can help us better predict the value of a stock
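
The date, lag, and rolling window features listed above can be sketched with pandas as follows. This is a minimal illustration on a made-up daily price series; the column names (`price`, `lag_k`, `rolling_mean_7`) are assumptions for the example and do not come from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical daily price series, used only for illustration.
dates = pd.date_range("2020-01-01", periods=30, freq="D")
df = pd.DataFrame({"date": dates, "price": np.arange(30, dtype=float)})

# Date features
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
df["day_of_week_num"] = df["date"].dt.dayofweek  # Monday = 0
df["day_of_week_name"] = df["date"].dt.day_name()

# Lag features: the value observed k days earlier
for k in (1, 2, 7):
    df[f"lag_{k}"] = df["price"].shift(k)

# Rolling window feature: mean of the previous 7 days.
# Shifting by 1 first keeps the current day's value out of its own feature.
df["rolling_mean_7"] = df["price"].shift(1).rolling(window=7).mean()
```

The first rows of the lag and rolling columns are NaN because there is no history to look back on; they can be dropped or left for a model that handles missing values.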
    
## Comments:
1. These features can be added to our feature set

# 2. Some simple forecasting methods

[Source](https://otexts.com/fpp2/simple-methods.html)

1. Average method
    - We forecast all future values as the average of the historical observations
    - Let historical data be $y_{1},\dots,y_{T}$
    >- $\hat{y}_{T+h|T} = \bar{y} = (y_{1}+\dots+y_{T})/T$

1. Naïve method
    - For naïve forecasts, we simply set all forecasts to be the value of the last observation.
    - This method works very well for financial time series
    >- $\hat{y}_{T+h|T} = y_{T}$

1. Seasonal naïve method
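    - In this method we set each forecast equal to the last observed value from the same season (e.g. the same day of the previous week)
    >- $\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$
    >- Here $m$ is the seasonal period and $k$ is the integer part of $(h-1)/m$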
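
The three formulas above translate almost directly into NumPy. The sketch below is one possible reading; the function names are made up for this example, and the seasonal version assumes the history contains at least one full season (`len(y) >= m`).

```python
import numpy as np

def average_forecast(y, h):
    """Every future value is the mean of the history."""
    y = np.asarray(y, dtype=float)
    return np.full(h, y.mean())

def naive_forecast(y, h):
    """Every future value is the last observation y_T."""
    y = np.asarray(y, dtype=float)
    return np.full(h, y[-1])

def seasonal_naive_forecast(y, h, m):
    """Forecast y_{T+h-m(k+1)} with k = floor((h-1)/m)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    steps = np.arange(1, h + 1)        # forecast horizons 1..h
    k = (steps - 1) // m               # complete seasons already forecast
    idx = T + steps - m * (k + 1) - 1  # -1 converts to 0-based indexing
    return y[idx]

history = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
avg = average_forecast(history, 3)
naive = naive_forecast(history, 3)
snaive = seasonal_naive_forecast(history, 6, m=4)
```

With `m=4`, the seasonal naive forecast repeats the last four observations, so horizons 5 and 6 reuse the same values as horizons 1 and 2.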
    
## Comments:
1. These are naïve methods for forecasting
1. We can use them as baseline models for forecasting

# 3. The Complete Guide to Time Series Analysis and Forecasting

[Source]( https://towardsdatascience.com/the-complete-guide-to-time-series-analysis-and-forecasting-70d476bfe775)

## Overview:
This source explains various terms associated with time series such as:
1. Stationarity
    - A time series is said to be stationary if its statistical properties do not change over time.
2. Seasonality
    - Seasonality refers to periodic fluctuations.
3. Autocorrelation
    - Autocorrelation is a mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals.
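
As a small illustration of autocorrelation and seasonality, the sketch below builds a synthetic series with a 7-step period (an assumption made for the example) and measures its correlation with lagged copies of itself using pandas' `Series.autocorr`.

```python
import numpy as np
import pandas as pd

# Synthetic series that repeats every 7 steps (8 full periods).
t = np.arange(8 * 7)
y = pd.Series(np.sin(2 * np.pi * t / 7))

# Correlation between the series and itself shifted by the given lag.
acf_7 = y.autocorr(lag=7)  # lag equal to the seasonal period
acf_3 = y.autocorr(lag=3)  # lag that lands near the opposite phase
```

The autocorrelation is close to 1 at the seasonal lag and negative at lag 3, which is the kind of pattern that hints at a weekly season in daily data.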

It also briefly explains several ways to model a time series:
1. Moving average
2. Exponential moving average
3. Double exponential moving average
4. Triple exponential moving average
5. ARIMA method (discussed in detail in a later part)
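
The first three smoothing methods can be sketched as follows. The simple and exponential moving averages use pandas directly; the double exponential version is a hand-written Holt-style recursion whose initialisation (first value as level, first difference as trend) is an assumption of this sketch, not something taken from the article. Triple exponential smoothing, which adds a seasonal term, is not shown.

```python
import pandas as pd

y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Moving average over a window of 2 observations.
sma = y.rolling(window=2).mean()

# Exponential moving average: s_t = alpha * y_t + (1 - alpha) * s_{t-1}.
ema = y.ewm(alpha=0.5, adjust=False).mean()

def holt_forecast(values, alpha, beta, h):
    """Double exponential smoothing: track a level and a trend."""
    level = values[0]
    trend = values[1] - values[0]  # assumed initial trend
    for obs in values[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend       # h-step-ahead forecast

forecast = holt_forecast([1.0, 3.0, 5.0, 7.0, 9.0], alpha=0.5, beta=0.5, h=2)
```

On a perfectly linear series the level and trend are tracked exactly, so the two-step forecast simply continues the line.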

## Comments:
1. A time series has various properties; understanding them is important for choosing techniques suited to that series
2. A time series can be modelled using moving averages.
    - We can also try using these as features for a machine learning model.
    - Using several moving averages might add value to the feature set
    - Similarly, we can compute moving averages on lag features and add them to the feature set
3. We can experiment with this feature set for modelling

# 4. LSTM-MSNET

[Source](https://arxiv.org/pdf/1909.04293.pdf)
 
## Overview:

This paper introduces LSTM-MSNet, a new architecture for forecasting time series with multiple seasonal patterns. The proposed method draws on both statistical and deep learning approaches.

The paper is divided into 4 parts:
1. Time series preprocessing
    - Normalization 
    - Variance stabilization using log
    - Moving window transformation
2. Seasonal Decomposition
    - Multiple STL Decomposition (MSTL)
    - Seasonal-Trend decomposition by Regression (STR)
    - Trigonometric, Box-Cox, ARMA, Trend, Seasonal (TBATS)
    - Prophet
    - Fourier Transformation
3. Modelling using LSTM MSNet
4. Deseasonalized approach

We will focus mainly on parts 1 and 2.
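
Two of the preprocessing steps from part 1, the log transform and the moving window transformation, can be sketched in NumPy. This is a rough illustration, not the paper's code: the use of `log1p` (so that zero values stay finite) and the helper name `moving_windows` are assumptions of this example.

```python
import numpy as np

def moving_windows(y, n_in, n_out):
    """Cut a series into (input window, output window) training pairs."""
    y = np.asarray(y, dtype=float)
    inputs, outputs = [], []
    for start in range(len(y) - n_in - n_out + 1):
        inputs.append(y[start:start + n_in])
        outputs.append(y[start + n_in:start + n_in + n_out])
    return np.array(inputs), np.array(outputs)

sales = np.array([0, 3, 1, 4, 1, 5, 9, 2])

# Variance stabilisation: log(1 + y) is defined even when sales are zero.
log_sales = np.log1p(sales)

# Each row of X holds 4 consecutive values, each row of Y the next 2.
X, Y = moving_windows(log_sales, n_in=4, n_out=2)
```

Forecasts made on the log scale can be mapped back to the original scale with `np.expm1`.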

## Main ideas:
1. Time series preprocessing for stability can be done using methods discussed in this paper
1. Deseasonalising time series prior to modelling is useful

## Comments:
1. We will use the preprocessing step for our time series data
1. We will try using deseasonalising before modelling

<br>
<p style="color:blue; font-size: 40px " > My unique ideas </p>
<br>

1. Using technical indicators for feature engineering
1. Using the sales quantity of an item as volume
1. Creating multi-seasonal time series
    - Say we have a time series y1, y2, y3, ..., yk
    - Use y1, y8, y15, ... (every 7th observation) to create a weekly seasonal time series
    - Similarly, create series for other seasonalities
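
The sub-sampling idea can be sketched with NumPy slicing. The array below stands in for y1, ..., y28, and the step of 7 assumes daily data with a weekly season.

```python
import numpy as np

y = np.arange(1, 29)  # stand-in for y1 .. y28

# Every 7th observation starting from y1: y1, y8, y15, y22
weekly = y[::7]

# One sub-series per position in the week (offset 0 starts at y1).
by_weekday = [y[offset::7] for offset in range(7)]
```

Changing the step (for example to 28 or 365) gives sub-series for other assumed seasonal periods.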

<br>
<p style="color:green; font-size: 40px " > Existing Solutions </p>
<br>

# 1. Very fst model [kernel]

[Source](https://www.kaggle.com/ragnar123/very-fst-model/notebook)


## Data preparation:
1. Reducing Memory usage of dataframe
>- Change datatypes
>- Melt function usage
1. Join and prepare data for training
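
A minimal sketch of these preparation steps on toy data is given below. The tables and column names are small stand-ins modelled on my understanding of the M5 files and should be checked against the real CSVs; `reduce_memory` is a hypothetical helper written for this example, not the kernel's function.

```python
import pandas as pd

def reduce_memory(df):
    """Downcast numeric columns and store string columns as categoricals."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_string_dtype(out[col]):
            out[col] = out[col].astype("category")
    return out

# Toy stand-ins for the three input tables.
sales = pd.DataFrame({"id": ["I1_S1"], "item_id": ["I1"], "store_id": ["S1"],
                      "d_1": [0], "d_2": [3]})
calendar = pd.DataFrame({"d": ["d_1", "d_2"],
                         "date": pd.to_datetime(["2011-01-29", "2011-01-30"]),
                         "wm_yr_wk": [11101, 11101]})
sell_prices = pd.DataFrame({"store_id": ["S1"], "item_id": ["I1"],
                            "wm_yr_wk": [11101], "sell_price": [2.5]})

# Melt: one row per (item, day) instead of one column per day.
long = sales.melt(id_vars=["id", "item_id", "store_id"],
                  var_name="d", value_name="demand")

# Join the calendar on the day key, then prices on store, item and week.
full = (long.merge(calendar, on="d", how="left")
            .merge(sell_prices, on=["store_id", "item_id", "wm_yr_wk"], how="left"))
full = reduce_memory(full)
```

Downcasting is done after the joins here so that the merge keys keep matching dtypes.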

## Feature engineering:
1. Lag features
>- Shift columns by a fixed number of periods to create lagged features
1. Rolling demand features
>- Rolling features calculated on demand of items
>- Rolling mean, std, skewness, kurtosis
1. Rolling price features
>- Rolling features calculated on price of items
>- Rolling max price, max price change in lag prices
1. Use date features
>- Date day
>- Date month
>- Date year
>- Week day
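
The lag and rolling demand features can be sketched per item with a pandas groupby, so that one item's history never leaks into another's. The shift of 3 and the window sizes are arbitrary choices for this toy example; they are not the kernel's settings.

```python
import pandas as pd

# Two hypothetical items with 10 days of demand each.
df = pd.DataFrame({
    "id": ["A"] * 10 + ["B"] * 10,
    "demand": list(range(10)) + list(range(100, 110)),
})

grouped = df.groupby("id")["demand"]

# Lag feature: demand observed 3 days earlier for the same item.
df["lag_3"] = grouped.shift(3)

# Rolling statistics computed on the lagged demand of each item.
df["rolling_mean_3"] = grouped.transform(lambda s: s.shift(3).rolling(3).mean())
df["rolling_std_3"] = grouped.transform(lambda s: s.shift(3).rolling(3).std())
df["rolling_skew_5"] = grouped.transform(lambda s: s.shift(3).rolling(5).skew())
df["rolling_kurt_5"] = grouped.transform(lambda s: s.shift(3).rolling(5).kurt())
```

Shifting before rolling means each feature only uses values that would already be known at prediction time.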


## Modelling:
1. LGBM used for training simple baseline model

## Key takeaways:
1. Memory reduction techniques
1. Lag features
1. Rolling features
1. Using date features with LGBM lets us measure the importance of time as a feature
1. Attained RMSE of 2.119324

# 2. M5 First Public Notebook Under 0.50 [kernel]

[Source]( https://www.kaggle.com/kneroma/m5-first-public-notebook-under-0-50)


## Data preparation:
1. Create training data using merge
1. Reduce memory using melt operation
1. Changed data types to reduce memory


## Feature engineering:
1. Lag features
>- Shift columns by a fixed number of periods to create lagged features
1. Rolling features on Lag features
>- Using rolling features such as rolling mean on Lag features for different windows
1. Use date features
>- Date day
>- Date month
>- Date year
>- Week day
>- Week month
>- Quarter
1. Use of FIRST_DAY parameter
>- This was done to avoid memory overflow
1. Categorical features
>- item_id, dept_id, store_id, cat_id, state_id, event_name_1, event_name_2, event_type_1, event_type_2 used as is


## Modelling:
1. LGBM used for training model to predict sales
1. Attained score below 0.50
1. Used RMSE as metric to monitor performance of model
1. RMSE attained 2.32389
1. Parameters used to train the LightGBM model:
```python
params = {
    "objective": "poisson",
    "metric": ["rmse"],        # was listed twice; a dict keeps only the last value
    "force_row_wise": True,
    "learning_rate": 0.075,
    "sub_row": 0.75,           # alias of bagging_fraction
    "bagging_freq": 1,
    "lambda_l2": 0.1,
    "num_iterations": 1200,
    "num_leaves": 128,
    "min_data_in_leaf": 100,
}
```

## Key takeaways:
- Use of new date features: Quarter and week month
- lag_*period*: shifting a feature by *period* gives the model context from that far back, e.g. lag_7 provides the value from one week earlier
- The reason for using lagged values of the target variable is to reduce the effect of self-propagating errors across multiple predictions from the same model
- Similarly, rolling statistics computed on lag features provide very rich information
- For example, at day 7 the 7-day rolling mean of sales is the average of sales over days 1 to 7; a rolling mean on lag features lets the model capture this information
- Looking at the parameters, the following appear to play an important role in training:
    - sub_row
    - num_iterations
    - num_leaves
    - min_data_in_leaf

# 3. M5 - Simple FE [kernel]

[Source]( https://www.kaggle.com/kyakovlev/m5-simple-fe)


## Data preparation:
1. Data preparation was done in much the same way as in the earlier kernels

## Feature engineering:
1. Basic aggregations of the prices dataframe were computed at the [store_id, item_id] level
1. The min and max values from these aggregations can be used to normalise prices
1. Some momentum-based features were added in this kernel
1. price_momentum, price_momentum_m (on a monthly basis), price_momentum_y (on a yearly basis)
1. price_nunique, item_nunique: some items are inflation dependent whereas others are stable; these features capture the effect of inflation in the data
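
A rough sketch of these price features on a toy table is given below. It follows the description in these notes (min-max normalisation within each store and item, and momentum as the ratio to the previous price) and is not a copy of the kernel's code; the sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly prices for one item in one store.
prices = pd.DataFrame({
    "store_id": ["S1"] * 4,
    "item_id": ["I1"] * 4,
    "wm_yr_wk": [1, 2, 3, 4],
    "sell_price": [2.0, 2.0, 3.0, 4.0],
})

g = prices.groupby(["store_id", "item_id"])["sell_price"]

# Aggregations at the [store_id, item_id] level.
prices["price_min"] = g.transform("min")
prices["price_max"] = g.transform("max")

# Min-max normalisation; a zero range (constant price) becomes NaN.
price_range = (prices["price_max"] - prices["price_min"]).replace(0, np.nan)
prices["price_norm"] = (prices["sell_price"] - prices["price_min"]) / price_range

# Number of distinct prices the item has had in this store.
prices["price_nunique"] = g.transform("nunique")

# Momentum: current price relative to the previous week's price.
prices["price_momentum"] = prices["sell_price"] / g.shift(1)
```

An item whose price never changes gets `price_nunique == 1`, which is one simple way to separate stable items from inflation-sensitive ones.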

## Key takeaways:
1. Aggregation features were a new approach here
1. Normalising within each aggregation group adds new information about the values
1. The normalisation technique was min-max scaling
1. This kernel introduces momentum-based features
1. price_nunique and item_nunique may turn out to be important features that capture the inflation of items


# References:
Important Kaggle discussion:
- [Top discussion](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/163414)

Forecasting Principles and Practice:
- [Notes](https://robjhyndman.com/uwafiles/fpp-notes.pdf)



# First Cut Solution

1. Data Loading:
    - The existing kernels show that this problem requires a large amount of memory.
    - To load the dataset we need to reduce memory usage by changing the data types of columns.
    - The complete training dataset can be obtained by joining 3 tables (CSV files): sales_train_validation.csv, calendar.csv, sell_prices.csv.
    - If the dataset is too large, we will work with a subset of it before moving to a cloud platform.
    - Dask can also be used if we are working on a local machine.

1. Exploratory Data Analysis:
    - We will do a basic time series analysis to get insights about the data.
    - We will try to understand the distribution of sales.
    - We will look at how the distribution of sales changes over time (e.g. due to the occurrence of special events).
    - Check how sales differ between weekdays and weekends.
    - Check how sales vary on a monthly basis, etc.

1. Feature Engineering:
    - For time series data we can engineer following features in the beginning:
        - `Lag features` : These features help us understand how past values affect current ones. We can use several lag features with different shifts, e.g. lag_7 (a 7-day shift) shows how the value from 7 days earlier relates to the current value. Similarly, features like lag_14 and lag_28 cover other periods.
        - `Rolling features` : Rolling features summarize how changes within a window affect sales. Features we can extract from a rolling window include the rolling mean and rolling standard deviation over different periods; the same statistics can also be applied to lag features.
        - `Date features` : Extracting day, month, year, week day, week number, etc. helps us identify whether a given date is a special day that boosts sales of a certain item.
    
    - Some complex features that we can generate in later parts are:
        - `Seasonal decomposition`: Decomposing the time series into seasonal and trend components will help us understand whether it is sensitive to a particular season or follows a trend. We will decompose the series over several different periods.
        - `Technical indicators`: Some technical indicators from the stock market might serve as good features for time series, since they are designed to capture complex stock price movements.

1. Data Normalization and preprocessing:
    - Taking the log of prices stabilizes the variance of the time series.
    - We can normalize window by window, dividing each price by the sum of all prices in that window
    - To handle non-stationarity we will take the log of these normalized prices
    - Normalizing might not be useful for tree-based models such as decision trees or random forests, which are insensitive to monotonic scaling of features; we can skip this step in that case.
    
1. Creating a data pipeline:
    - For training we need to pass data in one of two formats:
        - Sliding window method: preferred when there is a lot of data and limited memory
        - Growing window method: preferred when there is little data (more robust)
    - We will create both pipelines to see which works better on our dataset

1. Creating a baseline model:
    - Just as a random model is used to gauge the performance of a classification model, here we will use a naive prediction model
    - There are several candidates, such as the average method, naive method, and seasonal naive method.
    - We will use the method that gives the lowest value of the key performance metric for this case study, i.e. the Weighted Root Mean Squared Scaled Error (WRMSSE).
    - For example, if the seasonal naive method gives the lowest WRMSSE, we will use it as the baseline model

1. Modelling:
    - We will model this case study using LightGBM.
    - XGBoost is known to produce very good results, but its training time can be very high.
    - We will pass data through the pipelines created above, saving the model after each iteration and loading it for the next iteration (the next window of data).
    - We will initially do this on a smaller subset of the data to see which features drive the sales predictions.
    - We will monitor the performance of our model using RMSE.
    - Once we know the feature importances, we will trim the feature set accordingly.

1. Hyper parameter optimization:
    - To get the best results we can tune our model using the hyperopt library.
    - This library uses a form of Bayesian optimization (the Tree-structured Parzen Estimator algorithm) to search for good hyperparameters

1. Future improvements:
    - To improve this solution we can use deep learning models to model this complex data.
    - LSTMs can be used in later stages of this project.
    - Training LSTMs can be very complex and might require substantial resources.
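
The data pipeline and baseline steps of this plan can be sketched together in plain NumPy. Everything here is an illustrative assumption: the helper names are made up, the per-series RMSSE ignores the competition's series weights (so it is not the full WRMSSE), and a constant training window would give a zero scale and needs separate handling.

```python
import numpy as np

def sliding_window_splits(n, train_size, test_size):
    """Fixed-length training window that moves forward in time."""
    splits, start = [], 0
    while start + train_size + test_size <= n:
        mid = start + train_size
        splits.append((slice(start, mid), slice(mid, mid + test_size)))
        start += test_size
    return splits

def growing_window_splits(n, initial_train_size, test_size):
    """Training window that always starts at 0 and grows over time."""
    splits, end = [], initial_train_size
    while end + test_size <= n:
        splits.append((slice(0, end), slice(end, end + test_size)))
        end += test_size
    return splits

def rmsse(train, actual, forecast):
    """Forecast MSE scaled by the in-sample MSE of the one-step naive forecast."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    scale = np.mean(np.diff(train) ** 2)
    return float(np.sqrt(np.mean((actual - forecast) ** 2) / scale))

# Candidate baselines: each maps (training history, horizon) to a forecast.
baselines = {
    "average": lambda train, h: np.full(h, train.mean()),
    "naive": lambda train, h: np.full(h, train[-1]),
}

def pick_baseline(y, splits, methods):
    """Return the method with the lowest mean RMSSE over all splits."""
    y = np.asarray(y, dtype=float)
    scores = {}
    for name, method in methods.items():
        errors = [rmsse(y[tr], y[te], method(y[tr], len(y[te])))
                  for tr, te in splits]
        scores[name] = float(np.mean(errors))
    return min(scores, key=scores.get), scores

y = np.arange(1.0, 11.0)  # toy upward-trending series
sliding = sliding_window_splits(len(y), train_size=4, test_size=2)
growing = growing_window_splits(len(y), initial_train_size=4, test_size=2)
best, scores = pick_baseline(y, growing, baselines)
```

On this trending toy series the naive method wins because the historical average lags further and further behind; on real sales data the ranking could easily differ, which is why both kinds of split are worth evaluating.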