# Time Series
A time series model attempts to relate the value of one or more variables at a given time point to their values at previous time points, for example,
$$GNP_{t+1} = f(GNP_t, GNP_{t-1}, GNP_{t-2}, \ldots) + \text{Error}.$$
Here, $t$ denotes time. “Simple” time series models like the one above are “black-box”: they forecast the variable from its own history without attempting to explain what drives it.
More complex time series models are explanatory, in that they relate the variable of interest not only to its own previous values but also to previous values of other “explanatory” variables.
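
As a rough illustration of the “black-box” idea, the sketch below fits the simplest such model, in which the next value depends linearly on the current value only. The series is synthetic (an assumption made for illustration, not the GNP data), and the linear form of $f$ is also an assumption.

```python
import numpy as np

# Synthetic series, invented for illustration: y[t+1] = 0.8 * y[t] + noise
rng = np.random.default_rng(0)
n = 500
y = np.zeros(n)
for t in range(n - 1):
    y[t + 1] = 0.8 * y[t] + rng.normal(scale=0.5)

# Black-box model: regress the next value on the current value only
X = np.column_stack([np.ones(n - 1), y[:-1]])  # intercept and lagged value
coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
intercept, slope = coef

# One-step-ahead forecast from the last observed value
forecast = intercept + slope * y[-1]
print(f"estimated slope: {slope:.3f}, next-value forecast: {forecast:.3f}")
```

The estimated slope should land close to the value 0.8 used to generate the data, although the model knows nothing about why the series behaves as it does.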


In [None]:
#Importing required libraries
from pandas import read_excel
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

Australian monthly electricity production displays a clear trend and
seasonality. The code cell below loads the dataset into a pandas DataFrame and plots it.

In [None]:
fig, ax = plt.subplots(figsize=(12,10))
series = read_excel('Electricity.xls', sheet_name='Data', header=0, index_col=0, parse_dates=True)
series.plot(ax=ax)
plt.show()

Below, seasonal graphs are produced and plotted for the same dataset.

In [None]:
# Plot one curve per column of the 'SeasData' sheet to show the seasonal pattern
series = read_excel('Electricity.xls', sheet_name='SeasData', header=0, index_col=0, parse_dates=True)
x = np.arange(12)  # month positions 0..11
months = ['Jan','Feb','Mar','Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
fig, ax = plt.subplots(figsize=(12,10))
for column in series.columns:
    plt.plot(x, series[column])

plt.xticks(x, months)
years = [1957,1958, 1960, 1961, 1963, 1968]
plt.legend(years)

plt.show()

In contrast, the following Australian clay brick production data does not show a clear trend.

In the cell below, following the previous example, load the dataset contained in the file 'ClayBricks.xls' (sheet 'BRICKSQ') into a pandas DataFrame and plot it to look visually for trends and variations. The data correspond to Australian clay brick production.

In [None]:
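fig, ax = plt.subplots(figsize=(12,10))
# Load the 'BRICKSQ' sheet, following the electricity example, and plot it
series = read_excel('ClayBricks.xls', sheet_name='BRICKSQ', header=0, index_col=0, parse_dates=True)
series.plot(ax=ax)
plt.show()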

Australian clay brick production contains occasional large fluctuations which
are difficult to explain, and hence predict, without knowing the underlying causes. In the cell below, obtain and plot the seasonal graphs. 

In [None]:
# add your code here




# Correlation
The sample covariance between two variables X and Y is
$$Cov_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}),$$
where:
- $Cov_{XY}$ is the covariance between X and Y;
- $n$ is the number of observations;
- $X_i$ and $Y_i$ are the individual observations of X and Y, respectively;
- $\bar{X}$ and $\bar{Y}$ are the means of X and Y, respectively.

The sum is taken over all $n$ observations.

Pearson's correlation coefficient is defined as
$$r_{XY} = \frac{Cov_{XY}}{S_X S_Y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}},$$
where $S_X$ and $S_Y$ are the standard deviations of X and Y, respectively, and $n$, $X_i$, $Y_i$, $\bar{X}$ and $\bar{Y}$ are as defined for the covariance.
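
To make the two formulas concrete, here is a minimal sketch that evaluates them term by term on a small set of hypothetical paired observations (the numbers are invented for illustration) and compares the results with numpy's built-in `np.cov` and `np.corrcoef`.

```python
import numpy as np

# Hypothetical paired observations, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# Deviations from the means
dx = x - x.mean()
dy = y - y.mean()

# Sample covariance and standard deviations, all with the 1/(n-1) factor
cov_xy = np.sum(dx * dy) / (n - 1)
s_x = np.sqrt(np.sum(dx**2) / (n - 1))
s_y = np.sqrt(np.sum(dy**2) / (n - 1))

# Pearson's correlation coefficient
r_xy = cov_xy / (s_x * s_y)

print(cov_xy, np.cov(x, y)[0, 1])
print(r_xy, np.corrcoef(x, y)[0, 1])
```

The $1/(n-1)$ factors cancel in the ratio, which is why the second form of $r_{XY}$ contains only sums.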


Below, some bank data with four features, labeled DEOM, AAA, Tto4 and D3to4, are loaded, and the pairwise scatter plots and correlation coefficients are produced.

In [None]:
# Load the bank data
series = read_excel('Bank.xls', sheet_name='Data3', header=0,
                     dtype=float)

#Plotting the scatter plots of each variable against the other one
pd.plotting.scatter_matrix(series, figsize=(8, 8))
plt.show()

# Pairwise (Pearson) correlation matrix for all the variables
CorrelationMatrix = series.corr()
print(CorrelationMatrix)

Below, automobile data for 19 Japanese cars are loaded, and the correlation between mileage and price is computed.

In [None]:
# Correlation calculation
series1 = read_excel('JapaneseCars.xls', sheet_name='Data', header=0,
                      dtype=float)
correlval=np.corrcoef(series1['Mileage'], series1['Price'])
correlval=correlval[1,0]
print(correlval)

Now compute the same Pearson's correlation coefficient manually using numpy. To obtain a numpy array from a column of a pandas DataFrame, you can, for example, use `mileage_array = series1['Mileage'].to_numpy()`.

In [None]:
# add your code here

Following the example discussed in the slides this morning, now compute Spearman's correlation coefficient instead. You can use, for instance, scipy's or pandas' built-in functions for that.

In [None]:
# add your code here
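
As background for this exercise, Spearman's coefficient is Pearson's coefficient applied to the ranks of the observations rather than to the observations themselves. The sketch below shows this with numpy only, on invented data; the helper `ranks` is hypothetical and assumes there are no tied values (the built-in functions handle ties for you).

```python
import numpy as np

def ranks(values):
    """Return the ranks 1..n of the values (hypothetical helper; assumes no ties)."""
    order = np.argsort(values)
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    return r

# Invented data: y increases with x, but not linearly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.exp(x)

pearson = np.corrcoef(x, y)[0, 1]
spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]
print(pearson, spearman)
```

Because the relationship is monotonic, the ranks of x and y coincide and Spearman's coefficient is 1, whereas Pearson's coefficient is lower because the relationship is not linear.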

Below, seasonal plots and the autocorrelation function (ACF) plot are shown for a cement production dataset.

In [None]:
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf
series1 = read_excel('CementProduction.xls', sheet_name='Data', header=0,
              index_col=0, parse_dates=True)
series2 = read_excel('CementProduction.xls', sheet_name='SeasonalData', header=0,
                    index_col=0, parse_dates=True)
fig, ax = plt.subplots(figsize=(12,10))
series2.plot(title='Seasonal plots building materials time series', ax=ax)
plt.show()

fig, ax = plt.subplots(figsize=(12,10))
plot_acf(series1, title='ACF plot of building materials time series', lags=60, ax=ax)
plt.show()

fig, ax = plt.subplots(figsize=(12,10))
autocorrelation_plot(series1, ax=ax)
plt.show()
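
For reference, the quantity shown in an ACF plot at lag $k$ is usually the sample autocorrelation $r_k = \sum_{t=k+1}^{n}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y}) \,/\, \sum_{t=1}^{n}(Y_t - \bar{Y})^2$. The sketch below evaluates this estimator with numpy on a synthetic monthly series (invented for illustration, not the cement data); `acf_manual` is a hypothetical helper name.

```python
import numpy as np

def acf_manual(y, nlags):
    """Sample autocorrelation r_k for k = 0..nlags (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d**2)
    return np.array([np.sum(d[k:] * d[:len(d) - k]) / denom
                     for k in range(nlags + 1)])

# Synthetic monthly series with a 12-month cycle plus noise
t = np.arange(120)
rng = np.random.default_rng(1)
y = 10 + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=t.size)

r = acf_manual(y, nlags=24)
print(np.round(r[[0, 6, 12]], 2))  # lags 0, 6 and 12
```

For a seasonal series like this one, the autocorrelation is strongly positive at multiples of the seasonal period and strongly negative half a period away.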

Now, use statsmodels to compute the ACF for 90 lags. Plot the results and compare with the previous plot: you should obtain the same graph.

Documentation on statsmodels: https://www.statsmodels.org/stable/generated/statsmodels.tsa.stattools.acf.html


In [None]:
# add your code here

# Decomposition
The basic approach to analysing the underlying structure of a time series is to decompose it as
$$Y_t = f(S_t, T_t, E_t),$$
where $Y_t$ is the observed value at time $t$ and the variables are defined as follows:
- $S_t$ is the seasonal component at time $t$;
- $T_t$ is the trend-cycle component at time $t$;
- $E_t$ is an irregular (random) component at time $t$.
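
The two most common choices of $f$ are the additive form $Y_t = S_t + T_t + E_t$ and the multiplicative form $Y_t = S_t \times T_t \times E_t$. As a rough sketch of what an additive decomposition involves, the code below applies a classical moving-average decomposition by hand to a synthetic monthly series (invented for illustration). It is one common way to do it, not necessarily identical in every detail to what a library routine does.

```python
import numpy as np

# Synthetic monthly series: linear trend + 12-month seasonality + noise
period = 12
t = np.arange(10 * period)
rng = np.random.default_rng(2)
seasonal_true = 5 * np.sin(2 * np.pi * t / period)
y = 0.5 * t + seasonal_true + rng.normal(scale=0.3, size=t.size)

# Trend-cycle T_t: centred 2x12 moving average (half weight on the two end points)
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend = np.full(y.size, np.nan)  # undefined for the first and last 6 months
trend[half:-half] = np.convolve(y, weights, mode="valid")

# Seasonal S_t: mean detrended value for each month, shifted to average zero
detrended = y - trend
seasonal_means = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
seasonal_means -= seasonal_means.mean()
seasonal = np.tile(seasonal_means, y.size // period)

# Irregular E_t: whatever remains after removing trend and seasonality
resid = y - trend - seasonal
```

By construction, the three components add back up to the observed series wherever the trend is defined.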


# Additive Decomposition

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

series = read_excel('HouseSales.xls', sheet_name='Data', header=0, index_col=0, parse_dates=True)
result = seasonal_decompose(series, model='additive')

fig, (ax1,ax2,ax3,ax4) = plt.subplots(4,1, figsize=(12,10))

# Plot original series
ax1.plot(result.observed)
ax1.set_title('Original Series')

# Plot trend component
ax2.plot(result.trend)
ax2.set_title('Trend')

# Plot seasonal component
ax3.plot(result.seasonal)
ax3.set_title('Seasonal')

# Plot residual component
ax4.plot(result.resid)
ax4.set_title('Residual')

plt.tight_layout()
plt.show()


# Multiplicative Decomposition

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

series = read_excel('HouseSales.xls', sheet_name='Data', header=0, index_col=0, parse_dates=True)
result = seasonal_decompose(series, model='multiplicative')

fig, (ax1,ax2,ax3,ax4) = plt.subplots(4,1, figsize=(12,10))

# Plot original series
ax1.plot(result.observed)
ax1.set_title('Original Series')

# Plot trend component
ax2.plot(result.trend)
ax2.set_title('Trend')

# Plot seasonal component
ax3.plot(result.seasonal)
ax3.set_title('Seasonal')

# Plot residual component
ax4.plot(result.resid)
ax4.set_title('Residual')

plt.tight_layout()
plt.show()
