# What is happening at the end of October every year?

> While trying different approaches to build a better recommendation system, I noticed that every year in October there is a spike in the sales data. There are actually several other spikes as well, but since we need to build a recommender system for the end of September, I think the October spikes are the most interesting ones to look at.

> We are not told whether H&M ran special promotions at these times or whether the spikes are due to the start of seasonal sales. Nevertheless, I recommend keeping these spikes in mind if you are building a model that relies on the most popular (most sold) items. Here, data from the same season in previous years would be quite helpful.

```python
# necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
```

```python
# Load the data.
# We are only interested in sales over time, so we read just the date and price columns.
# t_dat is parsed as dates and used as the index; a DatetimeIndex makes daily sums easy later.
transactions = pd.read_csv(
    '../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv',
    usecols=['t_dat', 'price'], index_col='t_dat', parse_dates=['t_dat'],
)
transactions.head()
```

```python
# pandas' resample() is very useful for time-series data:
# sum the price column per day and plot it over the full two years
transactions['price'].resample('1D').sum().plot(figsize=(20, 6), alpha=0.6)
plt.title('Daily sales data for 2 years', fontsize=18, color='r');
```
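To make the resample step concrete, here is a tiny sketch on made-up prices (not H&M data). It shows that `resample('1D').sum()` returns one row per calendar day and puts 0 on days without transactions, and that swapping `sum()` for `count()` gives the number of items sold per day instead, which separates sales volume from price.

```python
import pandas as pd

# Made-up prices: three sales on 1 Jan, none on 2 Jan, one on 3 Jan
toy = pd.DataFrame(
    {"price": [1.0, 2.0, 0.5, 3.0]},
    index=pd.DatetimeIndex(
        ["2019-01-01", "2019-01-01", "2019-01-01", "2019-01-03"], name="t_dat"
    ),
)

# One row per calendar day; sum() puts 0 on days with no transactions
daily_sum = toy["price"].resample("1D").sum()
# count() gives the number of items sold per day instead of their total price
daily_count = toy["price"].resample("1D").count()

print(daily_sum)
print(daily_count)
```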

> **Obviously, we have several spikes, especially in October 2019. Let's have a closer look at these spikes for 2019 and 2018.**
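Instead of eyeballing the plots, spikes can also be flagged programmatically. The sketch below uses one simple heuristic, which is my own assumption rather than anything the data or the competition prescribes: flag days whose value exceeds a multiple of a centred rolling median. The helper `find_spikes` and its `window`/`factor` defaults are hypothetical and untuned, and the example runs on a synthetic series.

```python
import pandas as pd


def find_spikes(daily: pd.Series, window: int = 15, factor: float = 1.5) -> pd.Series:
    """Return the days whose value exceeds `factor` times the centred rolling median.

    `window` (in days) and `factor` are illustrative defaults, not tuned on H&M data.
    """
    # The rolling median is barely affected by a single spike, so it acts as a local baseline
    baseline = daily.rolling(window, center=True, min_periods=1).median()
    return daily[daily > factor * baseline]


# Synthetic daily series: flat at 100 with a single jump to 300
days = pd.date_range("2019-09-15", periods=30, freq="D")
daily = pd.Series(100.0, index=days)
daily.iloc[12] = 300.0

spikes = find_spikes(daily)
print(spikes)
```

On the real data you would pass the daily series, e.g. `find_spikes(transactions['price'].resample('1D').sum())`, and then adjust `window` and `factor` by looking at which days come back.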

```python
# Same approach as above, restricted to the data up to 2018-10-30
transactions['price'].resample('1D').sum().loc[:'2018-10-30'].plot(figsize=(20, 6), alpha=0.6)
plt.title('Daily sales data up to 30 October 2018', fontsize=18, color='r');
```

```python
# Same approach as above, restricted to the data from 2019-09-15 to 2019-10-30
transactions['price'].resample('1D').sum().loc['2019-09-15':'2019-10-30'].plot(figsize=(20, 6), alpha=0.6)
plt.title('Daily sales data, 15 September to 30 October 2019', fontsize=18, color='r');
```

```python
# Last date in the dataset; predictions are for the 7 days after it
transactions.index.max()
```
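Given the last training date, the target window and the same window one year earlier can be computed with `pd.Timedelta` and `pd.DateOffset`. In this sketch `prediction_windows` is a hypothetical helper, and the last date is hard-coded to 2020-09-22 as an assumption; on the real data, take it from `transactions.index.max()` instead.

```python
import pandas as pd


def prediction_windows(last_date: pd.Timestamp, days: int = 7):
    """Return (start, end) of the target window and of the same window one year earlier."""
    # Target window: the `days` days right after the last training date (inclusive bounds)
    start = last_date + pd.Timedelta(days=1)
    end = last_date + pd.Timedelta(days=days)
    # Same calendar window one year earlier
    prev_start = start - pd.DateOffset(years=1)
    prev_end = end - pd.DateOffset(years=1)
    return (start, end), (prev_start, prev_end)


# Assumed last training date; on the real data use transactions.index.max()
last_date = pd.Timestamp("2020-09-22")
(start, end), (prev_start, prev_end) = prediction_windows(last_date)
print("target window:   ", start.date(), "to", end.date())
print("one year earlier:", prev_start.date(), "to", prev_end.date())
```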

## Surprise! Surprise!
**H&M asks us to predict the items each customer will buy in the 7-day period right after the training data ends. As we have seen above, the end of September was a busy period in both 2018 and 2019. So if your model relies on the most popular items sold in the last few weeks, keep in mind that recent best-sellers may not work ideally for this window. It is worth also looking at the same period in previous years.**
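One way to act on this advice is to compare the best-sellers of the last training week with those of the same week one year earlier. The sketch below assumes the transactions are loaded with `usecols=['t_dat', 'article_id']` and indexed by `t_dat`, as in the loading cell above. `top_items` is a hypothetical helper, `n=12` matches the 12 predictions per customer scored by the competition's MAP@12 metric, and the transactions are made up.

```python
import pandas as pd


def top_items(tx: pd.DataFrame, start: str, end: str, n: int = 12) -> list:
    """Most frequently sold article_ids between start and end, both days included.

    Assumes tx is indexed by a date-sorted DatetimeIndex (call sort_index() first if not).
    """
    # Partial-string slicing on a sorted DatetimeIndex includes both endpoint days
    window = tx.loc[start:end]
    return window["article_id"].value_counts().head(n).index.tolist()


# Made-up transactions indexed by t_dat, like the loading cell above
tx = pd.DataFrame(
    {"article_id": [111, 222, 222, 333, 333, 111]},
    index=pd.DatetimeIndex(
        ["2019-09-24", "2019-09-25", "2019-09-25", "2020-09-16", "2020-09-17", "2020-09-18"],
        name="t_dat",
    ),
)

recent = top_items(tx, "2020-09-16", "2020-09-22")     # last training week
last_year = top_items(tx, "2019-09-23", "2019-09-29")  # target week, one year earlier
print("recent:   ", recent)
print("last year:", last_year)
print("overlap:  ", sorted(set(recent) & set(last_year)))
```

If the two lists overlap little on the real data, that supports mixing in last year's seasonal best-sellers rather than relying on the last weeks alone; how to weight them is left open here.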

> If you liked this observation and found it helpful, please upvote it! If you have comments, please leave them below!