# Data Modeling Seasonality

Now that we've explored what the seasonality dataset looks like, the goal is to adjust the average prediction from our initial regression to account for the seasonality of pricing. In practice, we apply a +/- adjustment to the average prediction based on the day we are projecting for. We can do the same for months of the year and for holidays. This should reduce the residuals: seasonal pricing makes the residuals correlated by time of year, which violates the OLS assumption of uncorrelated errors. We use averages as a simple way to explore seasonality. With more data spanning several years, more advanced approaches could be used, such as an ARIMA or SARIMA model.

In [13]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cmx
import matplotlib.colors as colors
import seaborn as sb
import numpy as np
import datetime as dt
from datetime import datetime
%matplotlib inline

### Process Overview:
The general idea behind this analysis is as follows: we aggregate prices by weekday for each listing. Then we normalize each listing's weekday prices by its Monday price, which gives a multiplier for each listing for each day. Next, for each day we average across all listings to get a final average multiplier. Lastly, we compare the resulting predictions to a subset of the listings.
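The first three steps can be sketched on toy data as follows. This is a minimal sketch rather than the notebook's actual preprocessing: it assumes a long-format table with hypothetical columns `listing_id`, `date`, and `price`, and all prices are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format calendar data: one row per listing per date.
# Column names and prices are illustrative assumptions, not the real dataset.
dates = pd.date_range("2016-01-04", periods=14, freq="D")  # starts on a Monday
calendar = pd.DataFrame({
    "listing_id": [1] * 14 + [2] * 14,
    "date": list(dates) * 2,
    "price": [100, 100, 100, 100, 120, 120, 100] * 2  # listing 1: Fri/Sat bump
             + [80] * 14,                              # listing 2: flat pricing
})

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
calendar["weekday"] = calendar["date"].dt.dayofweek.map(dict(enumerate(days)))

# Step 1: average price per listing per weekday (one row per listing).
by_weekday = calendar.pivot_table(
    index="listing_id", columns="weekday", values="price", aggfunc="mean"
)[days]

# Step 2: normalize each listing's row by its own Monday price.
multipliers = by_weekday.div(by_weekday["Mon"], axis=0)

# Step 3: average the multipliers across listings.
avg_multiplier = multipliers.mean()
print(avg_multiplier)
```

Normalizing per listing before averaging keeps expensive listings from dominating the result, since every listing contributes a unitless ratio rather than a price.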

In [21]:
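#Importing Datafile
# Per-listing weekday prices, kept unmodified
results_nona = pd.read_csv('../datasets/seasonality_tomodel.csv')
# Second copy that we convert into Monday-relative multipliers
results_multiplier = pd.read_csv('../datasets/seasonality_tomodel.csv')
days = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
# Divide every other weekday's price by the listing's Monday price
for day in days[1:]:
    results_multiplier[day] = results_multiplier[day] / results_multiplier['Mon']
# Monday is the baseline, so its multiplier is 1 by definition
results_multiplier['Mon'] = 1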
results_multiplier.head(5)

Unnamed: 0,Mon,Tue,Wed,Thu,Fri,Sat,Sun,listing_id
0,1,1.0,1.0,1.0,1.0,1.0,1.0,3604481.0
1,1,1.0,1.0,1.0,1.0,1.0,1.0,2949128.0
2,1,0.991826,0.991826,0.999846,1.138965,1.138965,1.0,4325397.0
3,1,1.0,1.0,1.0,1.0,1.0,1.0,4325398.0
4,1,0.991494,0.994952,1.004395,1.015027,1.011263,0.99895,3426149.0


We see that the dataframe now contains a multiplier for each day of the week for each listing. Now we average each day's column across all listings to get an average multiplier for each day.

In [22]:
results_multiplier.mean()

Mon           1.000000e+00
Tue           9.998007e-01
Wed           9.998361e-01
Thu           1.000367e+00
Fri           1.030631e+00
Sat           1.030674e+00
Sun           1.000927e+00
listing_id    2.560792e+06
dtype: float64

The results are very much in line with what we saw earlier in our seasonality-exploration file. Relative to Monday, Tuesday and Wednesday see a negligible dip in prices (about 99.98%), while Friday and Saturday see a noticeable increase (about 103%). These are the numbers we will use to apply seasonality to the averages from our previous models. The `listing_id` entry in the output is just the mean of the ID column and can be ignored.
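As a rough illustration of how these multipliers could be applied, here is a minimal sketch. The multipliers are relative to Monday, while the regression predicts an average price, so the sketch rescales them to average 1 over the week. That rescaling, the `seasonal_price` helper, and the base price of 150 are assumptions made for illustration, not part of the analysis above.

```python
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Monday-relative multipliers, rounded from the averages computed above.
MONDAY_RELATIVE = [1.0000, 0.9998, 0.9998, 1.0004, 1.0306, 1.0307, 1.0009]

# Assumption of this sketch: rescale so the multipliers average to 1 over a week,
# since the base prediction is an average price rather than a Monday price.
_week_mean = sum(MONDAY_RELATIVE) / len(MONDAY_RELATIVE)
MULTIPLIER = {d: m / _week_mean for d, m in zip(DAYS, MONDAY_RELATIVE)}


def seasonal_price(base_price, day):
    """Adjust an average predicted price for the weekday of `day` (a datetime.date)."""
    return base_price * MULTIPLIER[DAYS[day.weekday()]]


base = 150.0  # hypothetical average nightly price from the regression
for d in (date(2016, 1, 4), date(2016, 1, 8)):  # a Monday and a Friday
    print(d, DAYS[d.weekday()], round(seasonal_price(base, d), 2))
```

Without the rescaling step, every adjusted price would be at or above the Monday level, which would push the weekly average above the regression's prediction.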

## Predicting prices using our seasonality averages

Now, it is important to test the performance of the averages we arrived at.
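One way such a test could look is sketched below. It is only an illustration under stated assumptions: the held-out listings and their prices are invented, the `mape` helper is hypothetical, and mean absolute percentage error is one possible metric among many. The sketch predicts each weekday price as the listing's Monday price times the average multiplier and compares that against a baseline with no weekday adjustment.

```python
import numpy as np
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Average multipliers, rounded from the output above.
avg_multiplier = pd.Series(
    [1.0000, 0.9998, 0.9998, 1.0004, 1.0306, 1.0307, 1.0009], index=days
)

# Hypothetical held-out listings (average price per weekday); values are made up.
holdout = pd.DataFrame(
    [[100, 100, 100, 100, 105, 105, 100],
     [200, 198, 198, 200, 210, 212, 201]],
    columns=days, index=[101, 102],
)


def mape(actual, predicted):
    """Mean absolute percentage error over all cells of two aligned DataFrames."""
    return ((actual - predicted).abs() / actual).to_numpy().mean()


# Seasonal prediction: each listing's Monday price scaled by the average multiplier.
seasonal_pred = pd.DataFrame(
    np.outer(holdout["Mon"], avg_multiplier), index=holdout.index, columns=days
)
# Baseline: no weekday adjustment, i.e. the Monday price on every day.
flat_pred = pd.DataFrame({d: holdout["Mon"] for d in days})

seasonal_mape = mape(holdout, seasonal_pred)
flat_mape = mape(holdout, flat_pred)
print(f"seasonal MAPE: {seasonal_mape:.4f}, flat MAPE: {flat_mape:.4f}")
```

Because the held-out prices here are invented, the numbers this prints say nothing about how the multipliers perform on the real listings; the point is only the shape of the comparison.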

## Using ARIMA time series models for future forecasting