# Practice #1. "Trend analysis"

This notebook is dedicated to:
- Time series smoothing
- Trend estimation and extraction

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from statsmodels.tsa.seasonal import seasonal_decompose

import warnings
warnings.filterwarnings('ignore')

## 0. Data reading and visualization

Please specify the path to the data file.

In [None]:
path_to_datafile = "../data/opsd_germany_daily.csv"

In [None]:
# read the data into a pandas.DataFrame
df = pd.read_csv(path_to_datafile)

Please rename the time column to `ds` and the data column to `y` (you can use `df.rename`). If the dataset has multiple features, select only one of them and drop NaN values.
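A minimal sketch of the renaming step on a small toy frame. The column names `Date`, `Consumption` and `Wind` and all values are assumptions for illustration only (check `df.columns` for your file): one value column is kept, both columns are renamed to `ds`/`y`, and rows with missing values are dropped.

```python
import numpy as np
import pandas as pd

# Toy frame; column names and values are assumed, not read from the real file
raw = pd.DataFrame({
    "Date": ["2006-01-01", "2006-01-02", "2006-01-03"],
    "Consumption": [1000.0, 1200.0, np.nan],
    "Wind": [np.nan, np.nan, 5.0],
})

# Keep the time column and ONE value column, then rename them to ds / y
toy = raw[["Date", "Consumption"]].rename(columns={"Date": "ds", "Consumption": "y"})

# Drop rows where the selected value is missing
toy = toy.dropna()
print(toy)
```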

In [None]:
# your code here
# df.rename(columns={"...": "ds", "...": "y"}, inplace=True)


Convert the date column to datetime format and set it as the index.

In [None]:
df["ds"] = pd.to_datetime(df["ds"])
df.set_index("ds", inplace=True)

Number of data points:

In [None]:
df.shape[0]

Print the first rows of the time series:

In [None]:
df.head()

Let's plot the data

In [None]:
plt.figure(figsize=(20, 5))
plt.ylabel("y")
plt.xlabel("ds")
plt.plot(df);

## 1. Time series smoothing

### 1.1 Moving average

Pandas provides the `pandas.DataFrame.rolling` method for calculating moving averages.<br/>There are multiple ways to calculate a moving average:

#### Centered moving average

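$$
    \hat{x}[t] = \frac{x[t - \lfloor n / 2 \rfloor] + \dots + x[t - 1] + x[t] + x[t + 1] + \dots + x[t + \lfloor n / 2 \rfloor]}{n}
$$
where $\hat{x}[t]$ is the smoothed value and $n$ is the window size (assumed odd, so that the numerator has exactly $n$ terms)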
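In pandas, a centered average of this kind corresponds to `rolling(window=n, center=True).mean()`. A small sketch on a made-up series with an odd window $n = 3$, so exactly one neighbour is taken on each side:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 6.0, 4.0, 5.0])  # made-up values
n = 3  # odd window: floor(n/2) = 1 neighbour on each side

centered = s.rolling(window=n, center=True).mean()
print(centered.tolist())
# -> [nan, 3.0, 4.0, 5.0, nan]
# The first and last floor(n/2) values are NaN because the window does not fit there
```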

Please calculate the centered moving average.

In [None]:
# your code here
# df_centered = ...

#### Trailing moving average

$$
\hat{x}[t] = \frac{x[t - n + 1] + \dots + x[t - 1] + x[t]}{n}
$$
where $\hat{x}[t]$ is the smoothed value and $n$ is the window size
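The default `rolling(window=n).mean()` in pandas is a trailing average: each value averages the current point and the $n - 1$ previous ones, and the first $n - 1$ values are NaN. A sketch on the same kind of made-up series, checked against the sum written out by hand:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 6.0, 4.0, 5.0])  # made-up values
n = 3  # the window covers x[t-2], x[t-1], x[t]

trailing = s.rolling(window=n).mean()
print(trailing.tolist())
# -> [nan, nan, 3.0, 4.0, 5.0]

# Same value at t = 4, computed directly from the formula
manual_t4 = (s[2] + s[3] + s[4]) / n
print(manual_t4)  # -> 5.0
```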

Please calculate the trailing moving average.

In [None]:
# your code here
# df_trailing = ...

Visualization and comparison of the smoothed data

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df, linewidth=2, label='base')
plt.plot(df_centered, label='centered MA')
plt.plot(df_trailing, label='trailing MA')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Please smooth the data using different window sizes and compare the results.
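Besides looking at the plots, the comparison can be made numeric. A minimal sketch, assuming a synthetic daily series (linear trend plus white noise) instead of the real data and using the standard deviation of day-to-day changes as an ad hoc "roughness" measure (not a standard library metric): larger windows give smoother curves, at the price of more NaN points at the edges (`n - 1` in total for a centered window).

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumption): linear trend plus white noise
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=1000, freq="D")
s = pd.Series(0.01 * np.arange(1000) + rng.normal(size=1000), index=idx)

roughness = {}
n_missing = {}
for n in (3, 7, 31):
    smoothed = s.rolling(window=n, center=True).mean()
    # Std of day-to-day changes: smaller means a smoother curve
    roughness[n] = smoothed.diff().std()
    # Points lost at the edges because the window does not fit
    n_missing[n] = int(smoothed.isna().sum())
    print(n, round(roughness[n], 3), n_missing[n])
```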

In [None]:
# your code here

Questions:<br>
- How does the window size influence the smoothed time series?

### 1.2 Exponential smoothing

Pandas provides the `pandas.DataFrame.ewm` method for exponential smoothing.
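With `adjust=False`, `ewm(alpha=...).mean()` follows the recursion $s_0 = x_0$, $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$ (the default `adjust=True` normalizes the weights instead, so early values differ). A sketch on made-up values comparing pandas with the recursion written by hand:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 6.0, 4.0, 5.0])  # made-up values
alpha = 0.5

# Recursive form of exponential smoothing in pandas
ewm = s.ewm(alpha=alpha, adjust=False).mean()

# The same recursion written out explicitly
manual = [s.iloc[0]]
for x in s.iloc[1:]:
    manual.append(alpha * x + (1 - alpha) * manual[-1])

print(ewm.tolist())  # -> [1.0, 1.5, 3.75, 3.875, 4.4375]
print(manual)        # -> [1.0, 1.5, 3.75, 3.875, 4.4375]
```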

Please compute the exponentially smoothed series.

In [None]:
# your code here
# df_exp = ...

The statsmodels library also provides exponential smoothing via `statsmodels.tsa.holtwinters.ExponentialSmoothing`.<br>
Please also apply the statsmodels method with the same alpha.

In [None]:
# your code here
# df_exp_statm = ...

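The results of both methods should be identical, provided that pandas uses the recursive form (`adjust=False`) and the statsmodels initial level is set to the first observation.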

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df, linewidth=2, label='base')
plt.plot(df_exp, label='pandas ewm (alpha = 0.5)')
plt.plot(df_exp_statm.level, "--", label='ExponentialSmoothing')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Please smooth the data using different alpha values and compare the results.
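One way to reason about this comparison: in the recursive form, the observation $k$ steps back (other than the very first one) receives weight $\alpha (1 - \alpha)^k$, so the weights halve every $\ln 0.5 / \ln(1 - \alpha)$ steps. A sketch of that arithmetic for a few illustrative alpha values (pandas can also take this half-life directly via `ewm(halflife=...)`):

```python
import numpy as np

half_lives = {}
for alpha in (0.1, 0.5, 0.9):
    # Weights of x[t], x[t-1], ..., x[t-4] in the smoothed value
    weights = alpha * (1 - alpha) ** np.arange(5)
    # Number of steps after which a weight drops to one half
    half_lives[alpha] = np.log(0.5) / np.log(1 - alpha)
    print(alpha, np.round(weights, 3), round(half_lives[alpha], 2))
```

A small alpha spreads the weight over a long history (smoother, slower to react); a large alpha concentrates it on the latest points (closer to the raw data).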

In [None]:
# your code here

Questions:<br>
- How does the alpha coefficient influence the smoothed time series?

### 1.3 Double exponential smoothing

`statsmodels.tsa.holtwinters.ExponentialSmoothing` can also compute double and triple exponential smoothing.<br>
Please calculate the double exponential smoothing.

In [None]:
# your code here
# df_dexp = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df, linewidth=2, label='base')
plt.plot(df_dexp.level, label='smoothed')
plt.ylabel("y")
plt.xlabel("ds")
plt.legend();

Please smooth the data using different alpha and beta coefficients and compare the results. Also try different trend types (`"add"`, `"mul"`).

In [None]:
# your code here

Questions:<br>
- How does the beta coefficient influence the smoothed time series?
- Is there a relationship between the alpha and beta coefficients?

## 2. Trend extraction

### 2.1 Linear regression

One of the simplest ways to estimate a trend is a linear regression model.<br>
It is better to smooth the time series with one of the methods above before fitting the model. You can use `sklearn.linear_model.LinearRegression`.
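A minimal sketch of this approach, assuming a synthetic daily series with a known slope of 0.5 per day in place of the real data. The series is smoothed with a centered moving average, the number of days since the first date is used as the single feature, and the fitted line is evaluated on all original dates; the name `trend_lr` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily series (assumption): slope 0.5 per day plus noise
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=200, freq="D")
s = pd.Series(0.5 * np.arange(200) + rng.normal(scale=1.0, size=200), index=idx)

# Smooth first; the centered rolling mean leaves NaNs at the edges, so drop them
smoothed = s.rolling(window=7, center=True).mean().dropna()

# Feature: days since the first date, shaped (n_samples, 1) for sklearn
t = (smoothed.index - s.index[0]).days.to_numpy().reshape(-1, 1)
reg = LinearRegression().fit(t, smoothed.to_numpy())

# Evaluate the fitted line on every date of the original series
t_all = (s.index - s.index[0]).days.to_numpy().reshape(-1, 1)
trend_lr = pd.Series(reg.predict(t_all), index=s.index)
print(reg.coef_[0], reg.intercept_)
```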

In [None]:
# your code here
# trend = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df, linewidth=2, label='base')
plt.plot(trend, label='trend')
plt.legend();

### 2.2 Time series decomposition

The `statsmodels` library also provides a method for time series decomposition, `statsmodels.tsa.seasonal.seasonal_decompose`. Please extract the trend using this method.

In [None]:
# your code here
# trend = ...

In [None]:
plt.figure(figsize=(20, 5))
plt.plot(df, linewidth=2, label='base')
plt.plot(trend, label='trend')
plt.legend();

Please find the difference between the trends extracted from the smoothed and the non-smoothed time series.
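Trends from moving averages or `seasonal_decompose` contain NaNs at the edges, and `mean_squared_error` does not accept NaNs, so the two trends should be aligned on the dates where both are defined. A sketch with two hypothetical toy trend series:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Two hypothetical trend estimates on the same dates; NaNs mark missing edge values
idx = pd.date_range("2020-01-01", periods=5, freq="D")
trend_raw = pd.Series([np.nan, 2.0, 3.0, 4.0, np.nan], index=idx)
trend_smoothed = pd.Series([1.0, 2.5, 3.0, 3.5, np.nan], index=idx)

# Keep only the dates where both trends are defined, then compare
both = pd.concat([trend_raw, trend_smoothed], axis=1, keys=["raw", "smoothed"]).dropna()
mse = mean_squared_error(both["raw"], both["smoothed"])
print(mse)  # three common dates with errors -0.5, 0, 0.5 -> 0.5 / 3
```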

In [None]:
# your code here
# mean_squared_error(...)