# Predict tomorrow's fashion

### The "Why" of the exercise

The clothing industry occupies a significant weight in the world economy: on its own, it represents approximately 10% of international trade.

Before reaching the stores, clothes follow a long and complex journey: on average, one year passes between the initial design and delivery to the point of sale, with the sourcing of the textiles and their assembly on the production lines in between, to name just a few of the stages.

In the traditional sector, **the production stages can take from one to two years**. With such inertia, anticipating the future market as accurately as possible is essential:
- producing too much is expensive, and getting rid of the surplus leads to scandals (H&M is regularly singled out for destroying unsold stock on a massive scale),
- producing too little is a missed opportunity that competitors may not miss.

In this context, predicting the future makes it possible to make the best production choices.

### The "What"

For years now, **social networks have been the nexus of fashion**, where some show off their best clothes, and where others seek inspiration to revamp their wardrobe.

Understanding fashion can therefore be done online. This is the niche chosen by Heuritech, a Parisian fashion trend prediction startup, which offers customers its predictions for a gigantic directory of clothing trends.

Here we are going to take a (tiny) part of the data used by Heuritech, taken from social networks, to see how it can be used to **predict the fashion of tomorrow**.

### The "How"

We seek to predict the future of time series. It is therefore a **time regression problem**.

To properly model the temporal dynamics, it is necessary to identify the **period of seasonality**, with dynamics that can recur on a daily, weekly, monthly or annual basis. Care must also be taken with the "train" / "test" split, so that the performance of the algorithm is evaluated properly.

## 1. Exploratory Analysis

Download the file [`trends.csv`](https://drive.google.com/file/d/1mWMq6cY6PGPdrJFBZvGJNHLBa4Kif0dR/view?usp=sharing). Its columns correspond to different types of trends, and its rows to dates from 2015 to 2019.

The values correspond to an internal scale of **popularity on social networks**.

This popularity is the data that we are going to study and predict.

<b>1.A)</b> Load the csv file and store it as a Data Frame which you will call `df`.

What is the basic time unit (the "time step") of the data?
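As a minimal sketch of how the time step can be checked, the snippet below loads a small inline excerpt with `pandas` and looks at the most common gap between consecutive dates. The excerpt and the column name `date` are assumptions made for illustration; the real `trends.csv` may name its columns differently.

```python
import io

import pandas as pd

# Hypothetical excerpt standing in for trends.csv (column names are assumed).
csv_text = """date,pants_denim,top_tanksleeve_tshirtneck
2015-01-05,0.42,0.10
2015-01-12,0.44,0.11
2015-01-19,0.43,0.09
2015-01-26,0.45,0.12
"""

# With the real file, io.StringIO(csv_text) would be replaced by the path to trends.csv.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The time step is taken as the most frequent gap between consecutive dates.
time_step = df["date"].diff().dropna().mode().iloc[0]
print(time_step)
```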

<b>1.B)</b> Display the data of the 5 trends on a single graph using `plotly.express`.

What can you say about the magnitude of these trends?

<b>1.C)</b> Do you see any trends appearing, i.e. overall upward or downward dynamics, across the entire dataset?

<b>1.D)</b> **Seasonality** is an essential element in temporal modelling.

Do you see a common seasonality appearing in these time series? If so, which one? What is the period of the seasonality, that is to say the length <i>m</i> of the cycle that makes <i>y</i><sub><i>t</i>+<i>m</i></sub> close to <i>y</i><sub><i>t</i></sub>?
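One possible way to back up a visual guess about the period is to measure, for a few candidate lags <i>m</i>, how strongly the series correlates with itself shifted by <i>m</i> steps. The sketch below does this on a synthetic series, which is an assumption and not the real data; the helper `autocorr` and the list of candidate lags are illustrative choices.

```python
import numpy as np


def autocorr(y, lag):
    """Pearson correlation between y_t and y_{t+lag}."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[:-lag], y[lag:])[0, 1]


# Synthetic series (assumed): a cycle of 52 steps plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(5 * 52)
y = 10 + 3 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.3, t.size)

# Candidate periods in time steps (illustrative choices).
candidates = [4, 13, 26, 52]
scores = {m: autocorr(y, m) for m in candidates}

# The lag with the highest correlation is the most plausible period.
best_m = max(scores, key=scores.get)
print(scores, best_m)
```

A lag of half a cycle gives a strongly negative correlation, which is another clue that the full cycle is twice as long.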

<b>1.E)</b> To confirm our intuitions on the trend and on the seasonal nature of time series, we will use the decomposition into `Trend` + `Season` + `Residual`.

Run the cell below to decompose the "denim pants" trend into these three elements.

- Can you interpret the seasonality of denim? What times of the year do people wear more denim?

- What do you think of the popularity of denim between 2015 and 2018?

Modify the code to visualize the decompositions of other time series, trying to see if your fashion intuitions hold true.

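```python
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Decompose the series into trend, seasonal and residual parts (52 time steps per cycle)
r = seasonal_decompose(df['pants_denim'], period=52)
fig = r.plot()
fig.set_figwidth(12)
fig.set_figheight(8)
plt.tight_layout()
# Hide the x-axis ticks to keep the stacked panels readable
for ax in fig.get_axes():
    ax.set_xticks([])
```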

- Reading the `Seasonal` line, we see that people wear denim in the shoulder seasons, i.e. spring and autumn (these are the two recurring "humps" of seasonality).

- Reading the `Trend` line, we see that the trend is up between 2015 and 2018, then stabilizes

## 2. Data preparation before modeling

<b>2.A)</b> To prepare the modeling, the Data Frame must be well formatted. In particular, we want to have:

1. an index that is of periodic type, with a weekly frequency (`dtype: 'period[W-MON]'`)

2. as well as a column named `date` and which is of time type (`dtype: datetime64[ns]`)

Do the necessary pre-processing.

<details>
<summary><i>Click for a hint</i></summary>
    ⟿ you will need to use the `to_datetime` and `to_period` functions (the course details the procedure)
</details>
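The two requirements above can be sketched as follows, on a tiny hand-made frame. The raw column name `date`, the sample values and the index name `week` are assumptions; adapt them to whatever the loaded file actually contains.

```python
import pandas as pd

# Hypothetical raw frame: dates stored as strings in a column assumed to be named "date".
df = pd.DataFrame({
    "date": ["2015-01-05", "2015-01-12", "2015-01-19"],
    "pants_denim": [0.42, 0.44, 0.43],
})

# 1. Parse the strings into real timestamps.
df["date"] = pd.to_datetime(df["date"])

# 2. Index the rows by weekly periods (weeks ending on Monday), keeping the "date" column.
#    The index is renamed so that it does not clash with the column of the same name.
df = df.set_index(df["date"].dt.to_period("W-MON").rename("week"))

print(df.index.dtype)
print(df["date"].dtype)
```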

<b>2.B)</b> Split your dataset into two: everything except the last time period for training (`df_train`), and the last time period for validating performance (`df_test`).
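A minimal sketch of such a split is shown below. It assumes that "the last time period" means the last full seasonal cycle of 52 weeks, and it uses a placeholder frame rather than the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly frame covering four cycles of 52 weeks; the values are placeholders.
season = 52
n_weeks = 4 * season
df = pd.DataFrame(
    {"top_tanksleeve_tshirtneck": np.linspace(0.0, 1.0, n_weeks)},
    index=pd.period_range("2015-01-05", periods=n_weeks, freq="W-MON"),
)

# Keep the chronological order: a time series is never shuffled before splitting.
df_train = df.iloc[:-season]  # everything except the last seasonal cycle
df_test = df.iloc[-season:]   # the last 52 weeks, held out for evaluation

print(len(df_train), len(df_test))
```

Splitting by position rather than at random matters here: a random split would let the model see weeks that come after the ones it is asked to predict.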

## 3. Modeling

With all the previous steps done, it's now time to tackle the modeling.

For reasons of simplicity, we will restrict ourselves in this part to the time series `top_tanksleeve_tshirtneck`.

<b>3.A)</b> Apply a so-called "naive" seasonal model, and visualize the predictions alongside the real data. What do you think of the quality of this model?

<details>
<summary><i>Click for a hint</i></summary>
To display two curves on the same plot with `plotly.express`, the easiest way is to build a Data Frame with two columns, one for actual data and […]
</details>
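As an illustration of what a "naive" seasonal model does, the sketch below simply repeats the last observed season, i.e. it predicts the value seen <i>m</i> steps earlier. The helper `seasonal_naive`, the toy series and its period of 4 are assumptions chosen to keep the numbers readable; the weekly data would use <i>m</i> = 52. The resulting two-column frame is one possible layout for plotting both curves together.

```python
import numpy as np
import pandas as pd


def seasonal_naive(y_train, horizon, m):
    """Repeat the last observed season: y_hat[t] = y[t - m]."""
    last_season = np.asarray(y_train, dtype=float)[-m:]
    reps = int(np.ceil(horizon / m))
    return np.tile(last_season, reps)[:horizon]


# Toy series with a period of 4 (assumed for readability).
y = np.array([1.0, 3.0, 5.0, 3.0, 2.0, 4.0, 6.0, 4.0, 3.0, 5.0, 7.0, 5.0])
y_train, y_test = y[:-4], y[-4:]

y_pred = seasonal_naive(y_train, horizon=len(y_test), m=4)

# Two columns side by side: the actual values and the predictions.
comparison = pd.DataFrame({"actual": y_test, "predicted": y_pred})
print(comparison)
```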

<b>3.B)</b> Calculate the MASE, and give its interpretation.

How should this metric be interpreted? Does the result make sense to you?
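To make the metric concrete, here is a small sketch of the usual definition of the MASE: the mean absolute error of the forecast, divided by the mean absolute error that the seasonal naive forecast achieves on the training data. The function below is a hand-written illustration, not the implementation of any particular library, and the toy numbers are assumptions.

```python
import numpy as np


def mase(y_true, y_pred, y_train, m=1):
    """Forecast MAE scaled by the in-sample MAE of the seasonal naive forecast."""
    y_true, y_pred, y_train = (
        np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train)
    )
    forecast_mae = np.mean(np.abs(y_true - y_pred))
    naive_mae = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return forecast_mae / naive_mae


# Toy series with a period of 4 (assumed).
y_train = np.array([1.0, 3.0, 5.0, 3.0, 2.0, 4.0, 6.0, 4.0])
y_test = np.array([3.0, 5.0, 7.0, 5.0])
y_pred = y_train[-4:]  # seasonal naive forecast with m=4

score = mase(y_test, y_pred, y_train, m=4)
print(score)
```

A value below 1 means the forecast beats the seasonal naive baseline on average, and a value above 1 means it does worse. When the forecast being scored is itself the seasonal naive one, a value close to 1 is therefore to be expected.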

<b>3.C)</b> Now calculate the MAE and the SMAPE, and give their interpretation.

How should these metrics be interpreted? Did you expect such a difference?
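The sketch below spells out both metrics on two made-up points. Several definitions of the SMAPE coexist; the one used here, the mean of 2|y − ŷ| / (|y| + |ŷ|), is an assumption and may differ from the one in the library you use.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, expressed in the unit of the data."""
    return np.mean(np.abs(y_true - y_pred))


def smape(y_true, y_pred):
    """Symmetric MAPE: mean of 2|y - y_hat| / (|y| + |y_hat|), unit-free."""
    return np.mean(2 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))


# Made-up values for illustration.
y_true = np.array([100.0, 200.0])
y_pred = np.array([110.0, 180.0])

mae_value = mae(y_true, y_pred)
smape_value = smape(y_true, y_pred)

# Rescaling the data changes the MAE but leaves the SMAPE untouched.
mae_scaled = mae(10 * y_true, 10 * y_pred)
smape_scaled = smape(10 * y_true, 10 * y_pred)
print(mae_value, smape_value, mae_scaled, smape_scaled)
```

This difference in behavior under rescaling is why the two numbers can look very different on the same forecast: the MAE depends on the magnitude of the series, while the SMAPE is a relative error.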

<b>3.D)</b> Take a good look at seasonality and trend, and decide how you want to model:

- the trend, which can be additive (`trend='add'`) or multiplicative (`trend='mul'`)

- seasonality, which can also be additive (`seasonal='add'`) or multiplicative (`seasonal='mul'`)

Justify this choice, then apply a Holt-Winters type model to this time series.

View predictions and error metrics. Are you getting better performance?

<b>3.E)</b> To verify the intuition we had in the previous question, we now want to compare all the Holt-Winters models:

- `Tadd-Sadd` (additive trend and additive season)
- `Tadd-Smul` (additive trend and multiplicative season)
- `Tmul-Sadd` (multiplicative trend and additive season)
- `Tmul-Smul` (multiplicative trend and multiplicative season)

Compare the MASEs of these models. Did you have the right intuition?

## 4. To go further

To finish the practical work, we now suggest that you repeat the same approach as before, but with the SARIMA model, which was mentioned at the end of the course and whose relevance is mainly historical.

If you still have time, we suggest you apply TBATS, a model dating from 2011 that is in practice very effective on many problems. Like the other models seen here, it is implemented in the `sktime` library (reference [here](https://www.sktime.org/en/v0.5.2/api_reference/modules/auto_generated/sktime.forecasting.tbats.TBATS.html)). Its mathematical derivation is complex, so we prefer to treat it here as a "black box" (we refer the bravest to the succinct explanations on [this page](https://yintingchou.com/posts/2017-05-03-bats-and-tbats-model/) and to the [original paper](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.450.8320&rep=rep1&type=pdf)).