```statsmodels``` is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at ```statsmodels.org```.

In [6]:
# We will display plots right inside Jupyter Notebook
%matplotlib inline

In [7]:
from imports import * 
from structured import *

Using TensorFlow backend.


In [121]:
#for the notebook only (not for JupyterLab) run this command once per session
alt.renderers.enable('notebook')

RendererRegistry.enable('notebook')

#### Introduction to Time Series Data

**Time** does play a role in normal machine learning datasets. Predictions are made for new
data when the actual outcome may not be known until some future date. The future is being
predicted, but all prior observations are treated equally, perhaps with some very minor temporal
dynamics to address concept drift, such as using only the last year of observations
rather than all available data.
A time series dataset is different. *Time series adds an explicit order dependence between
observations: a time dimension. This additional dimension is both a constraint and a structure
that provides a source of additional information.*

**A time series is a sequence of observations taken sequentially in time.** - *Time Series Analysis: Forecasting and Control.*

```Time Series Jargon```

- ```t-n```: A prior or lag time (e.g. ```t-1``` for the previous time).
- ```t```: The current time and point of reference.
- ```t+n```: A future or forecast time (e.g: ```t+1``` for the next time).

*Describing vs. Predicting*

###### Time Series Analysis

In *descriptive modeling*, or time series analysis, a time series is modeled to determine its components in terms of seasonal patterns, trends, relation to external factors, and the like. [...] In contrast, time series forecasting uses the information in a time series (perhaps with additional information) to forecast future values of that series.

Time series analysis involves developing models that best capture or describe an observed time series in order to understand the underlying causes. This field of study seeks the why behind a time series dataset. This often involves making assumptions about the form of the data and decomposing the time series into constituent components. The quality of a descriptive model is determined by how well it describes all available data and the interpretation it provides to better inform the problem domain.

The primary objective of time series analysis is to develop mathematical models that provide plausible descriptions from sample data. *-  Page 11, Time Series Analysis and Its Applications: With R Examples.*

###### Time Series Forecasting

Making predictions about the future is called extrapolation in the classical statistical handling of time series data. More modern fields focus on the topic and refer to it as time series forecasting. Forecasting involves taking models fit on historical data and using them to predict future observations. Descriptive models can borrow from the future (e.g. to smooth or remove noise); they only seek to best describe the data. An important distinction in forecasting is that the future is completely unavailable and must be estimated only from what has already happened. The skill of a time series forecasting model is determined by its performance at predicting the future. This often comes at the expense of being able to explain why a specific prediction was made, of confidence intervals, and even of a better understanding of the underlying causes behind the problem.

###### Components of a Time Series

A time series can be broken down into 4 constituent parts:

- *Level*: The baseline value for the series if it were a straight line.
- *Trend*: The optional and often linear increasing or decreasing behaviour of the series over time.
- *Seasonality*: The optional repeating patterns or cycles of behaviour over time.
- *Noise*: The optional variability in the observations that cannot be explained by the model.

The main features of many time series are trends and seasonal variations [...] another
important feature of most time series is that observations close together in time tend
to be correlated (serially dependent)
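To make the four components concrete, here is a minimal sketch that builds a synthetic series by adding them together. The additive form and every number in it (the level of 50, the slope, the yearly sine cycle, the noise scale) are assumptions chosen for illustration, not values from any dataset used here.

```python
import numpy as np

# Hypothetical daily series covering two years; all constants are made up.
n = 730
t = np.arange(n)

level = np.full(n, 50.0)                          # constant baseline
trend = 0.02 * t                                  # slow linear increase
seasonality = 5.0 * np.sin(2 * np.pi * t / 365)   # one cycle per year
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=n)    # variability left unexplained

# Additive combination of the four components (an assumed model form).
series = level + trend + seasonality + noise

print(series[:5])
```

Decomposition works in the opposite direction: given only `series`, a model tries to recover estimates of the separate parts.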

**Concerns of Forecasting**: 

When forecasting, it is important to understand your goal. Use the Socratic method and ask lots
of questions to help zoom in on the specifics of your predictive modeling problem. For example:
    
1. How much data do you have available and are you able to gather it all together?
More data is often more helpful, offering greater opportunity for exploratory data analysis,
model testing and tuning, and model fidelity.

2. What is the time horizon of predictions that is required? 
Short, medium or long term? Shorter time horizons are often easier to predict with higher confidence.
3. Can forecasts be updated frequently over time or must they be made once and remain static? 
Updating forecasts as new information becomes available often results in more accurate predictions.
4. At what temporal frequency are forecasts required?
Often forecasts can be made at lower or higher frequencies, allowing you to harness down-sampling and up-sampling
of data, which in turn can offer benefits while modeling.

Time series data often requires cleaning, scaling, and even transformation. For example:

- *Frequency*. Perhaps data is provided at a frequency that is too high to model or is
unevenly spaced through time requiring resampling for use in some models.
- *Outliers*. Perhaps there are corrupt or extreme outlier values that need to be identified
and handled.
- *Missing*. Perhaps there are gaps or missing data that need to be interpolated or imputed.
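The three cleaning steps above can be sketched with pandas on a tiny hand-made series. The dates, readings, and the plausible range of -50 to 60 are all hypothetical; a real project would pick bounds and an interpolation method from domain knowledge.

```python
import pandas as pd

# Hypothetical daily readings: one date is missing and one value is corrupt.
dates = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05", "2024-01-06"]
)
raw = pd.Series([20.0, 21.0, 23.0, 999.0, 25.0], index=dates)

# Frequency: put the series on a regular daily grid; the missing day becomes NaN.
regular = raw.asfreq("D")

# Outliers: treat values outside an assumed plausible range as missing.
cleaned = regular.where(regular.between(-50.0, 60.0))

# Missing: fill the gaps by linear interpolation between neighbours.
filled = cleaned.interpolate(method="linear")

print(filled)
```

Marking outliers as missing first lets a single interpolation step handle both problems, which is one possible design among several.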

There is almost an endless supply of time series forecasting problems. Below are 10 examples
from a range of industries to make the notions of time series analysis and forecasting more
concrete.

- Forecasting the corn yield in tons by state each year.
- Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or
not.
- Forecasting the closing price of a stock each day.
- Forecasting the birth rate at all hospitals in a city each year.
- Forecasting product sales in units sold each day for a store.
- Forecasting the number of passengers through a train station each day.
- Forecasting unemployment for a state each quarter.
- Forecasting utilization demand on a server each hour.
- Forecasting the size of the rabbit population in a state each breeding season.
- Forecasting the average price of gasoline in a city each day.

I expect that you will be able to relate one or more of these examples to your own time
series forecasting problems that you would like to address.

##### Time series as Supervised Learning Problem

Time series forecasting can be framed as a supervised learning problem. Re-framing your time series data this way gives you access to the suite of standard linear and nonlinear machine learning algorithms for your problem.

**Sliding Window**

The use of prior time steps to predict the next time step is called the *sliding window method*.
For short, it may be called the window method in some literature. In statistics and time series
analysis, this is called a *lag or lag method*. The number of previous time steps is called the
window width or size of the lag. This sliding window is the basis for how we can turn any time
series dataset into a supervised learning problem. From this simple example, we can notice a
few things:

- We can see how this can work to turn a time series into either a regression or a classification supervised learning problem for real-valued or labeled time series values.
- We can see how once a time series dataset is prepared this way, any of the standard linear and nonlinear machine learning algorithms may be applied, as long as the order of the rows is preserved.
- We can see how the width of the sliding window can be increased to include more previous time steps.
- We can see how the sliding window approach can be used on a time series that has more than one value, or so-called multivariate time series.

univariate_timeseries             |  univariate_timeseries_supervised
:-------------------------:|:-------------------------:
![](./univariate_timeseries.PNG) | ![](./univariate_timeseries_supervised.PNG)

Multivariate timeseries turned into supervised problem.

multivariate_timeseries             |  multivariate_timeseries_supervised
:-------------------------:|:-------------------------:
![](./multivariate_timeseries.PNG)  |  ![](./multivariate_timeseries_supervised.PNG)

- **Univariate Time Series**: These are datasets where only a single variable is observed at each time, such as temperature each hour. The example in the previous section is a univariate time series dataset.


- **Multivariate Time Series**: These are datasets where two or more variables are observed at each time.

Supervised learning is the most popular way of framing problems for machine learning as a collection of observations with inputs and outputs. Sliding window is the way to restructure a time series dataset as a supervised learning problem. Multivariate and multi-step forecasting time series can also be framed as supervised
learning using the sliding window method.
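As a minimal sketch of the sliding window idea, the helper below turns a univariate sequence into lag columns plus a target column. The function name `to_supervised`, its `n_lags` parameter, and the toy input values are hypothetical and introduced only for this illustration.

```python
import pandas as pd

def to_supervised(values, n_lags=1):
    """Frame a univariate sequence as lag columns plus a 't+1' target column."""
    frame = pd.DataFrame({"t+1": list(values)})
    # Each lag column holds the value observed k steps before the target.
    for k in range(1, n_lags + 1):
        label = "t" if k == 1 else f"t-{k - 1}"
        frame.insert(0, label, frame["t+1"].shift(k))
    # Rows without a full window contain NaN and cannot be used for training.
    return frame.dropna().reset_index(drop=True)

supervised = to_supervised([10, 20, 30, 40, 50], n_lags=2)
print(supervised)
```

With a window width of 2, the five observations yield three usable rows; the first is inputs (10, 20) with target 30.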

#### Data Preparation

##### Chapter 4: Load and Explore Time Series Data

In [9]:
series = pd.read_csv('./daily-total-female-births-in-cal.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')

In [11]:
type(series)

pandas.core.series.Series

In [12]:
series.head()

Date
1959-01-01    35
1959-01-02    32
1959-01-03    30
1959-01-04    31
1959-01-05    44
Name: Daily total female births in California, 1959, dtype: int64

In [14]:
len(series), series.size

(366, 366)

In [16]:
series.describe()

count     366.000000
mean       47.218579
std       100.472534
min        23.000000
25%        37.000000
50%        42.000000
75%        46.000000
max      1959.000000
Name: Daily total female births in California, 1959, dtype: float64

In [20]:
series['1959-01-01']

35
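A note on loading: to my knowledge, recent pandas releases no longer accept a `squeeze` argument in `read_csv`, so it is worth checking against your installed version. One equivalent is to call `squeeze("columns")` on the returned DataFrame. The sketch below uses an in-memory stand-in for the CSV file; the column name `Births` is an assumption.

```python
import io
import pandas as pd

# Stand-in for the CSV file: a few rows in the same two-column layout.
csv_text = """Date,Births
1959-01-01,35
1959-01-02,32
1959-01-03,30
"""

# read_csv returns a one-column DataFrame; squeeze("columns") turns it into a Series.
frame = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0, parse_dates=True)
series = frame.squeeze("columns")

print(type(series))
print(series.head())
```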

Time Series data must be re-framed as a supervised learning dataset before we can start using machine learning algorithms. There is no concept of input and output features in time series. Instead, we must choose the variable to be predicted and use feature engineering to construct all of the inputs that will be used to make predictions for future time steps.

##### Chapter 5: Basic Feature Engineering

At first we will look into three classes of features that we can create from our time series data.

- **Date Time Features**: these are components of the time step itself for each observation.
- **Lag Features**: these are values at prior time steps.
- **Window Features**: these are a summary of values over a fixed window of prior time steps.

The goal of feature engineering is to provide strong and ideally simple relationships between
new input features and the output feature for the supervised learning algorithm to model. In
effect, we are moving complexity.

Complexity exists in the relationships between the input and output data. In the case of time
series, there is no concept of input and output variables; we must invent these too and frame
the supervised learning problem from scratch. We may lean on the capability of sophisticated
models to decipher the complexity of the problem. We can make the job for these models easier
(and even use simpler models) if we can better expose the inherent relationship between inputs
and outputs in the data.

The difficulty is that we do not know the underlying inherent functional relationship between
inputs and outputs that we're trying to expose. If we did know, we probably would not need
machine learning. Instead, the only feedback we have is the performance of models developed
on the supervised learning datasets or views of the problem we create. In effect, the best default
strategy is to use all the knowledge available to create many good datasets from your time series
dataset and use model performance (and other project requirements) to help determine what
good features and good views of your problem happen to be.

For clarity, we will focus on a univariate (one variable) time series dataset in the examples,
but these methods are just as applicable to multivariate time series problems. Next, let's take a
look at the dataset we will use in this tutorial.

In [69]:
series = pd.read_csv('./daily-minimum-temperatures-in-me.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')

In [70]:
series.head()

Date
1981-01-01    20.7
1981-02-01    17.9
1981-03-01    18.8
1981-04-01    14.6
1981-05-01    15.8
Name: Daily minimum temperatures in Melbourne, Australia, 1981-1990, dtype: object

In [25]:
series.size

3650

In [39]:
type(series)

pandas.core.series.Series

In [40]:
df = pd.DataFrame()

In [41]:
# Derive date-time features directly from the DatetimeIndex
df['month'] = series.index.month
df['day'] = series.index.day
df['temperature'] = series.values

Other date-time features that could be derived for each observation include:

- Minutes elapsed for the day.
- Hour of day.
- Business hours or not.
- Weekend or not.
- Season of the year.
- Business quarter of the year.
- Daylight savings or not.
- Public holiday or not.
- Leap year or not.

From these examples, you can see that you're not restricted to the raw integer values. You
can use binary flag features as well, like whether or not the observation was recorded on a public
holiday. In the case of the minimum temperature dataset, maybe the season would be more
relevant. Creating domain-specific features like this is more likely to add value to
your model. Date-time based features are a good start, but it is often a lot more useful to
include the values at previous time steps. These are called lagged values and we will look at
adding these features in the next section.
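A few of the features listed above can be read straight off a pandas `DatetimeIndex`. The sketch below places the first five temperatures on consecutive days starting 1981-01-01, which is an assumption about the intended dates; the column names are hypothetical.

```python
import pandas as pd

# First five temperatures, assumed to fall on consecutive days.
index = pd.date_range("1981-01-01", periods=5, freq="D")
series = pd.Series([20.7, 17.9, 18.8, 14.6, 15.8], index=index)

features = pd.DataFrame(
    {
        "month": series.index.month,
        "day": series.index.day,
        "quarter": series.index.quarter,
        "is_weekend": series.index.dayofweek >= 5,   # Saturday=5, Sunday=6
        "is_leap_year": series.index.is_leap_year,
        "temperature": series.values,
    },
    index=series.index,
)
print(features)
```

Features such as public holidays or seasons are not built into the index and would need a calendar or a rule that you supply.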

**Lag Features**

Lag features are the classical way that time series forecasting problems are transformed into
supervised learning problems. The simplest approach is to predict the value at the next time
(t+1) given the value at the current time (t). The supervised learning problem with shifted
values looks as follows: 

![](./lag_overview.PNG)

The Pandas library provides the ```shift()``` function to help create these shifted or lag
features from a time series dataset. Shifting the dataset by 1 creates the t column, adding a NaN
(unknown) value for the first row. The time series dataset without a shift represents the t+1 column.
Let's make this concrete with an example. The first 3 values of the temperature dataset are
20.7, 17.9, and 18.8. The shifted and unshifted lists of temperatures for the first 3 observations
are therefore:

![](./lag_pandas_shift.PNG)

We can concatenate the shifted columns together into a new DataFrame using the ```concat()``` function along the column axis (```axis=1```). Putting this all together, the cells below build the lagged dataset for the temperature series.

In [47]:
temps = pd.DataFrame(series.values)

In [48]:
temps

      0
0  20.7
1  17.9
2  18.8
3  14.6
4  15.8
5  15.8
6  15.8
7  17.4
8  21.8
9    20


In [49]:
df = pd.concat([temps.shift(1), temps], axis=1)
df.columns = ['t', 't+1']

In [51]:
df.head()

      t   t+1
0   NaN  20.7
1  20.7  17.9
2  17.9  18.8
3  18.8  14.6
4  14.6  15.8


The same approach extends to more lag terms:

In [53]:
temps = pd.DataFrame(series.values)
dataframe = pd.concat([temps.shift(3), temps.shift(2), temps.shift(1), temps], axis=1)
dataframe.columns = ['t-2', 't-1', 't', 't+1']

In [54]:
dataframe

    t-2   t-1     t   t+1
0   NaN   NaN   NaN  20.7
1   NaN   NaN  20.7  17.9
2   NaN  20.7  17.9  18.8
3  20.7  17.9  18.8  14.6
4  17.9  18.8  14.6  15.8
5  18.8  14.6  15.8  15.8
6  14.6  15.8  15.8  15.8
7  15.8  15.8  15.8  17.4
8  15.8  15.8  17.4  21.8
9  15.8  17.4  21.8    20


Again, you can see that we must discard the first few rows that do not have enough data to
train a supervised model. A difficulty with the sliding window approach is how large to make
the window for your problem. Perhaps a good starting point is to perform a sensitivity analysis
and try a suite of different window widths to in turn create a suite of different views of your
dataset and see which results in better performing models. There will be a point of diminishing
returns. Additionally, why stop with a linear window? Perhaps you need a lag value from last week,
last month, and last year. Again, this comes down to the specific domain. In the case of the
temperature dataset, a lag value from the same day in the previous year or previous few years
may be useful. We can do more with a window than include the raw values. In the next section,
we'll look at including features that summarize statistics across the window.
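The idea of a non-contiguous window can be sketched by shifting by different amounts. The series below is synthetic (the values are simply 0, 1, 2, ...), and the column names `lag_365`, `lag_7`, `lag_1`, and `target` are hypothetical. Note that shifting by 365 rows only approximates "the same day last year" when the data contains leap days or gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series of 400 observations with values 0, 1, 2, ...
index = pd.date_range("1981-01-01", periods=400, freq="D")
temps = pd.Series(np.arange(400, dtype=float), index=index)

lagged = pd.DataFrame(
    {
        "lag_365": temps.shift(365),  # 365 rows earlier, roughly one year
        "lag_7": temps.shift(7),      # one week earlier
        "lag_1": temps.shift(1),      # previous day
        "target": temps,              # value to predict
    }
)

# Only rows where every lag is available can be used for training.
train = lagged.dropna()
print(train.head())
```

The longest lag determines how many initial rows are lost: here the first 365 rows are dropped, leaving 35.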

**Rolling Window Statistics**

A step beyond adding raw lagged values is to add a summary of the values at previous time
steps. We can calculate summary statistics across the values in the sliding window and include
these as features in our dataset. Perhaps the most useful is the mean of the previous few values,
also called the ```rolling mean```.
We can calculate the mean of the current and previous values and use that to predict the
next value. For the temperature data, we would have to wait 3 time steps before we had 2
values to take the average of before we could use that value to predict a 3rd value. For example:

![](roll_window.PNG)

Pandas provides a ```rolling()``` function that creates a new data structure with the window
of values at each time step. We can then perform statistical functions on the window of values
collected for each time step, such as calculating the mean. First, the series must be shifted.
Then the rolling dataset can be created and the mean values calculated on each window of two
values. Here are the values in the first three rolling windows:

In [59]:
temps = pd.DataFrame(series.values)

In [60]:
shifted = temps.shift(1)

In [61]:
window = shifted.rolling(window=2)

In [62]:
means = window.mean()

In [63]:
dataframe = pd.concat([means, temps], axis=1)

In [64]:
dataframe.columns = ['mean(t-1,t)', 't+1']

In [65]:
print(dataframe.head(5))

  mean(t-1,t)   t+1
0         NaN  20.7
1        20.7  17.9
2        17.9  18.8
3        18.8  14.6
4        14.6  15.8
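As a hand-checkable sketch of the same steps, the block below applies them to just the first five temperatures (20.7, 17.9, 18.8, 14.6, 15.8), assuming the values are numeric.

```python
import pandas as pd

# First five temperatures quoted earlier, as numeric values.
temps = pd.DataFrame([20.7, 17.9, 18.8, 14.6, 15.8])

shifted = temps.shift(1)                   # values available before each target
means = shifted.rolling(window=2).mean()   # mean of the two previous values

dataframe = pd.concat([means, temps], axis=1)
dataframe.columns = ["mean(t-1,t)", "t+1"]
print(dataframe)
```

Here the first two rows of the mean column are NaN, and the third row holds (20.7 + 17.9) / 2 = 19.3, which is consistent with having to wait until two prior values are available before predicting a third.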


There are more statistics we can calculate and even different mathematical ways of calculating
the definition of the window. Below is another example that shows a window width of 3 and a
dataset comprised of more summary statistics, specifically the minimum, mean, and maximum
value in the window. You can see in the code that we are explicitly specifying the sliding window width as a
named variable. This allows us to use it both in calculating the correct shift of the series and in
specifying the width of the window to the rolling() function. In this case, the window width of 3 means we must shift the series forward by 2 time steps. This makes the first two rows NaN. Next, we need to calculate the window statistics with 3
values per window. It takes 3 rows before we even have enough data from the series in the
window to start calculating statistics. The values in the first 5 windows are as follows:

In [71]:
temps = pd.DataFrame(series.values)
width = 3
shifted = temps.shift(width - 1)
window = shifted.rolling(window=width)
dataframe = pd.concat([window.min(), window.mean(), window.max(), temps], axis=1)
dataframe.columns = ['min', 'mean', 'max', 't+1']

In [72]:
dataframe.head()

    min  mean   max   t+1
0   NaN   NaN   NaN  20.7
1   NaN   NaN   NaN  17.9
2  20.7  20.7  20.7  18.8
3  17.9  17.9  17.9  14.6
4  18.8  18.8  18.8  15.8


**Expanding Window Statistics**

Another type of window that may be useful includes all previous data in the series. This is
called an expanding window and can help with keeping track of the bounds of observable data.
Like the ```rolling()``` function on DataFrame, Pandas provides an ```expanding()``` function that
collects sets of all prior values for each time step.

These lists of prior numbers can be summarized and included as new features.

In [75]:
temps = pd.DataFrame(series.values)
window = temps.expanding()

In [77]:
window.mean()

      0
0  20.7
1  17.9
2  18.8
3  14.6
4  15.8
5  15.8
6  15.8
7  17.4
8  21.8
9    20


In [74]:
dataframe = pd.concat([window.min(), window.mean(), window.max(), temps.shift(-1)], axis=1)
dataframe.columns = ['min', 'mean', 'max', 't+1']
print(dataframe.head(5))

    min  mean   max   t+1
0  20.7  20.7  20.7  17.9
1  17.9  17.9  17.9  18.8
2  18.8  18.8  18.8  14.6
3  14.6  14.6  14.6  15.8
4  15.8  15.8  15.8  15.8


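> Note: the outputs in this section contain errors and still need to be corrected. For example, the min, mean, and max columns are identical in every row, which should not happen once a window aggregates more than one value.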
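For reference, here is a hand-checkable sketch of expanding window statistics on the first five temperatures (20.7, 17.9, 18.8, 14.6, 15.8), assuming the values are numeric.

```python
import pandas as pd

# First five temperatures quoted earlier, as numeric values.
temps = pd.DataFrame([20.7, 17.9, 18.8, 14.6, 15.8])

# Each expanding window contains every observation up to and including that row.
window = temps.expanding()
dataframe = pd.concat(
    [window.min(), window.mean(), window.max(), temps.shift(-1)], axis=1
)
dataframe.columns = ["min", "mean", "max", "t+1"]
print(dataframe)
```

In the second row, for instance, the window holds 20.7 and 17.9, so the minimum is 17.9, the mean is 19.3, and the maximum is 20.7. The last row has no following observation, so its `t+1` value is NaN.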

##### Chapter 6: Data Visualization


Time series lends itself naturally to visualization. Line plots of observations over time are
popular, but there is a suite of other plots that you can use to learn more about your problem.
The more you learn about your data, the more likely you are to develop a better forecasting
model.

Specifically, after completing this tutorial, you will know:
- How to explore the temporal structure of time series with line plots, lag plots, and
autocorrelation plots.
- How to understand the distribution of observations using histograms and density plots.
- How to tease out the change in distribution over intervals using box and whisker plots
and heat map plots.

In this tutorial, we will take a look at 6
different types of visualizations that you can use on your own time series data. They are:

1. Line Plots.
2. Histograms and Density Plots.
3. Box and Whisker Plots.
4. Heat Maps.
5. Lag Plots or Scatter Plots.
6. Autocorrelation Plots.
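Lag plots and autocorrelation plots both visualize the relationship between an observation and the one before it. The sketch below computes the numbers behind such plots without drawing anything, using a synthetic series (a yearly sine wave plus noise) whose constants are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: yearly cycle plus a little random noise.
t = np.arange(730)
rng = np.random.default_rng(seed=1)
values = 10 + 5 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.5, size=730)
series = pd.Series(values, index=pd.date_range("1981-01-01", periods=730, freq="D"))

# The points of a lag plot: each observation paired with the previous one.
lag_pairs = pd.concat([series.shift(1), series], axis=1).dropna()
lag_pairs.columns = ["t", "t+1"]

# One bar of an autocorrelation plot: the correlation at lag 1.
r1 = series.autocorr(lag=1)
print(round(r1, 3))
```

A tight diagonal cloud in the lag plot corresponds to a lag-1 autocorrelation close to 1; repeating `autocorr` for a range of lags gives the values an autocorrelation plot displays.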

In [145]:
series = pd.read_csv('./daily-minimum-temperatures-in-me.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')

In [146]:
type(series)

pandas.core.series.Series

In [148]:
series.head()

Date
1981-01-01    20.7
1981-02-01    17.9
1981-03-01    18.8
1981-04-01    14.6
1981-05-01    15.8
Name: min_temp, dtype: object

In [151]:
series.index

DatetimeIndex(['1981-01-01', '1981-02-01', '1981-03-01', '1981-04-01',
               '1981-05-01', '1981-06-01', '1981-07-01', '1981-08-01',
               '1981-09-01', '1981-10-01',
               ...
               '1990-12-22', '1990-12-23', '1990-12-24', '1990-12-25',
               '1990-12-26', '1990-12-27', '1990-12-28', '1990-12-29',
               '1990-12-30', '1990-12-31'],
              dtype='datetime64[ns]', name='Date', length=3650, freq=None)