Univariate time series are the most common type of temporal data: a single numeric observation is recorded sequentially at equally spaced points in time. The analysis considers only the observed variable and its relation to time.

Forecasting future values of such data is called univariate modeling. The predictions depend only on the historical values of the series itself. Several statistical methods can be used, such as the moving average and the autoregressive model covered below.

## Moving Average Forecast

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np

First, let's work through a very simple example (from [the documentation of Pandas itself](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html)).

In [None]:
df = pd.DataFrame({'value': [0, 1, 2, 3, 4]})
df.rolling(2)

In [None]:
df.rolling(2).sum()

See what happens if we introduce missing values (`NaN`):

In [None]:
df = pd.DataFrame({'value': [0, 1, 2, np.nan, 4, np.nan, 7, 8, 9]})
df.rolling(2).sum()
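
The `NaN` results above follow from the default `min_periods`, which for an integer window equals the window size: any window holding fewer than two valid observations yields `NaN`. As a minimal sketch (the names `strict` and `lenient` are illustrative and not part of the original notebook), lowering `min_periods` lets the sum use whatever valid values the window contains:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, np.nan, 4, np.nan, 7, 8, 9])

# Default: min_periods == window size, so every window touching a NaN is NaN
strict = s.rolling(2).sum()

# min_periods=1: a window needs only one valid value to produce a result
lenient = s.rolling(2, min_periods=1).sum()

print(pd.DataFrame({'value': s, 'strict': strict, 'lenient': lenient}))
```

Whether the lenient variant is appropriate depends on the data: a sum over one value is not comparable to a sum over two.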

Now let's use some real data.

In [None]:
df = pd.read_csv('../data/GDPUS.csv', header=0)
df.head()

In [None]:
plt.plot(df.Year, df.GDP, label='GDP')
plt.legend(loc='best')
plt.show()

In [None]:
df_avg = df.copy()
# rolling mean over a window of 5 observations; the first 4 rows are NaN
# because the window is not yet full
df_avg['moving_avg_forecast'] = df['GDP'].rolling(5).mean()

In [None]:
df_avg

In [None]:
# plot against Year so the x-axis matches the earlier GDP plot
plt.plot(df['Year'], df['GDP'], label='GDP')
plt.plot(df_avg['Year'], df_avg['moving_avg_forecast'], label='GDP MA(5)')
plt.legend(loc='best')
plt.show()
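
Note that the rolling mean stored at a given row includes that row's own observation, so as computed it smooths the series rather than predicting it. One way to turn it into a one-step-ahead forecast is to shift it by one row. The sketch below uses a small made-up series (the numbers are illustrative, not taken from `GDPUS.csv`) and a window of 3:

```python
import pandas as pd

# Made-up values for illustration only
gdp = pd.Series([100.0, 102.0, 105.0, 103.0, 108.0, 110.0, 115.0], name='GDP')
window = 3

# Mean of the current value and the two before it
smoothed = gdp.rolling(window).mean()

# Shift by one row: the forecast for row t uses only rows t-3 .. t-1
forecast = smoothed.shift(1)

# Mean absolute error over the rows where a forecast exists
mae = (gdp - forecast).abs().mean()
print(pd.DataFrame({'GDP': gdp, 'forecast': forecast}))
print('MAE:', mae)
```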

## Autoregressive model (AR)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.graphics.tsaplots import plot_pacf

In [None]:
df = pd.read_csv('../data/opsd_germany_daily.csv')
df.head()

Note that the `Date` column is already in the ISO-8601 format, so we can simply convert it to `datetime`. Alternatively, we could have told Pandas to parse this column as a date while reading the file.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
df.info()

# the alternative would have been as follows (date_format requires pandas >= 2.0):
#df = pd.read_csv('../data/opsd_germany_daily.csv', parse_dates=[0], date_format='ISO8601')
#df.info()

In [None]:
import matplotlib.dates as mdates
import datetime

years = mdates.YearLocator()   # every year
months = mdates.MonthLocator()  # every month
yearsFmt = mdates.DateFormatter('%Y')

fig, ax = plt.subplots()
ax.plot(df['Date'], df['Consumption'])

ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
ax.tick_params(axis='x', labelrotation=90)

datemin = datetime.date(df.Date.min().year, 1, 1)
datemax = datetime.date(df.Date.max().year + 1, 1, 1)
ax.set_xlim(datemin, datemax)
ax.set_xlabel('Year')
ax.set_ylabel('Consumption')

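Create and train a model. The last 100 observations are held out as a test set; the model is fitted on everything before them.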

In [None]:
train_df = df['Consumption'][:len(df)-100]
test_df = df['Consumption'][len(df)-100:]
test_yrs = df['Date'][len(df)-100:].astype(str)

In [None]:
model = AutoReg(train_df, lags=8).fit()
model.summary()
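
Conceptually, an AR(p) model regresses each observation on its p predecessors: y_t = c + phi_1 * y_(t-1) + ... + phi_p * y_(t-p) + noise. The sketch below is only an illustration of that idea, not a description of how `AutoReg` works internally: it simulates an AR(1) series with known parameters (`c_true` and `phi_true` are made-up values) and recovers them with ordinary least squares in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c_true, phi_true = 2000, 1.0, 0.7  # illustrative values

# Simulate y_t = c + phi * y_{t-1} + noise, starting at the process mean
y = np.empty(n)
y[0] = c_true / (1 - phi_true)
for t in range(1, n):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal()

# Design matrix: a column of ones (intercept) and the series lagged by one step
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]

# Least-squares estimate of (c, phi)
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, target, rcond=None)
print(f'c_hat={c_hat:.3f}, phi_hat={phi_hat:.3f}')
```

With `lags=8`, as in the cell above, the design matrix would hold eight lagged columns instead of one.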


In [None]:
test_df.head()

In [None]:
preds = model.predict(start=len(train_df), end=(len(df)-1), dynamic=False)
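
The model was fitted on the training data only, so beyond the end of that data it has no observed values to condition on. Multi-step predictions therefore have to be built recursively, feeding each prediction back in as a lag for the next one. A minimal sketch of that recursion, assuming a hypothetical helper `ar_forecast` and made-up coefficients:

```python
import numpy as np

def ar_forecast(history, const, phis, steps):
    """Recursive multi-step forecast for an AR(p) model.

    phis[0] multiplies the most recent value, phis[1] the one before it, etc.
    """
    p = len(phis)
    buf = list(history[-p:])          # the p most recent values, oldest first
    out = []
    for _ in range(steps):
        lagged = buf[::-1]            # most recent first, to line up with phis
        nxt = const + float(np.dot(phis, lagged))
        out.append(nxt)
        buf = (buf + [nxt])[-p:]      # the prediction becomes the newest lag
    return np.array(out)

# Illustrative AR(2) with made-up coefficients
forecast = ar_forecast(history=[1.0, 2.0], const=0.5, phis=[0.5, 0.2], steps=3)
print(forecast)
```

Because errors compound through the recursion, forecasts far beyond the training data tend to be less reliable than the first few steps.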

We want to make one dataframe again, with the dates, the actual and the predicted values.

In [None]:
plot_df = pd.concat([test_yrs, test_df, preds], axis=1)
plot_df.rename(columns={0:'Predicted'}, inplace=True)
plot_df['Date'] = pd.to_datetime(plot_df['Date'])
plot_df
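
The `concat` call above relies on two pandas behaviours: series are aligned on their index, and an unnamed series receives an integer column label (here `0`), which is why the rename maps `0` to `'Predicted'`. A small sketch with made-up values shows both:

```python
import pandas as pd

# Made-up values; all three series share the index labels 5 and 6
dates = pd.Series(['2017-12-30', '2017-12-31'], index=[5, 6], name='Date')
actual = pd.Series([1200.0, 1100.0], index=[5, 6], name='Consumption')
predicted = pd.Series([1180.0, 1150.0], index=[5, 6])  # deliberately unnamed

# Named series keep their names as column labels; the unnamed one becomes 0
combined = pd.concat([dates, actual, predicted], axis=1)
combined = combined.rename(columns={0: 'Predicted'})
print(combined)
```

If the predictions carried a different index than the test data, `concat` would produce extra rows filled with `NaN` instead of lining the values up.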

In [None]:
fig, ax = plt.subplots()

ax.plot(plot_df['Date'], plot_df['Consumption'], label='Actual')
ax.plot(plot_df['Date'], plot_df['Predicted'], label='Predicted')

ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
ax.xaxis.set_minor_locator(months)
ax.tick_params(axis='x', labelrotation=90)

# datemin = datetime.date(plot_df['Date'].min().year, 1, 1)
# datemax = datetime.date(plot_df['Date'].max().year + 1, 1, 1)
# ax.set_xlim(datemin, datemax)
ax.set_xlabel('Year')
ax.set_ylabel('Consumption')
ax.legend()