# Definition of Time Series

A time series is a series of data points indexed (or listed/graphed) in time order.

## Stationarity

For a series to be considered **stationary**, three conditions must be met:

 1. The mean of the series should not be a function of time,
 ![mean](http://www.seanabu.com/img/Mean_nonstationary.png)
 2. The variance of the series should not be a function of time (series must be [homoscedastic](https://en.wikipedia.org/wiki/Homoscedasticity)),
 ![variance](http://www.seanabu.com/img/Var_nonstationary.png)
 3. The covariance between the i-th term and the (i+m)-th term should depend only on the lag m, not on time.
 ![covariance](http://www.seanabu.com/img/Cov_nonstationary.png)
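
The three conditions above are commonly written as the definition of weak (covariance) stationarity. The notation below is an assumption introduced here for illustration and does not appear in the original text: $X_t$ is the series, $\mu$ and $\sigma^2$ are constants, and $\gamma(h)$ is the autocovariance at lag $h$.

```latex
\begin{aligned}
\mathbb{E}[X_t] &= \mu && \text{for all } t, \\
\operatorname{Var}(X_t) &= \sigma^2 < \infty && \text{for all } t, \\
\operatorname{Cov}(X_t, X_{t+h}) &= \gamma(h) && \text{for all } t \text{ and every lag } h.
\end{aligned}
```

Setting $h = 0$ in the third line gives back the second, since $\gamma(0) = \sigma^2$.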
 
Stationarity is a core assumption of most time series analysis (TSA) models. However, most real-world series have statistics that change over time, so they need to be transformed (for example, by differencing or detrending) before being fed into TSA algorithms.
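
One common conversion of this kind is first-order differencing. The sketch below is only illustrative: it assumes a synthetic random walk with drift (the seed, the drift of 0.5, and the length of 500 are arbitrary choices, not values from any dataset) and compares the mean of each half of the series before and after differencing.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# A random walk with drift: each value is the previous one plus a noisy step,
# so the level of the series depends on time.
steps = 0.5 + rng.normal(size=500)
walk = np.cumsum(steps)

# First-order differencing recovers the individual steps,
# whose mean does not drift over time.
diffed = np.diff(walk)

half = len(walk) // 2
print("walk  : first-half mean %.2f, second-half mean %.2f"
      % (walk[:half].mean(), walk[half:].mean()))
print("diffed: first-half mean %.2f, second-half mean %.2f"
      % (diffed[:half].mean(), diffed[half:].mean()))
```

The two halves of the raw walk have very different means, while the halves of the differenced series should be close to each other. Differencing is only one option; which transformation is appropriate depends on the data.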

There are several ways to identify whether the series is stationary. Some of these are:
- Visualizing the data (an informal check that can reveal obvious trends or changing variance)
- Applying the [Augmented Dickey-Fuller test](https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) (ADF)

Below is template code to visualize a time series and test it for stationarity:


In [18]:
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(ts):
    """Plot rolling statistics of `ts` (a pandas Series) and print ADF test results."""

    # Determine rolling statistics
    # (pd.rolling_mean / pd.rolling_std were removed from pandas; use Series.rolling)
    rolmean = ts.rolling(window=5).mean()
    rolstd = ts.rolling(window=5).std()

    # Plot rolling statistics
    plt.figure(figsize=(12, 8))
    plt.plot(ts, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()

    # Perform the Augmented Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(ts, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    # dftest[4] maps significance levels ('1%', '5%', '10%') to critical values
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
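
The ADF output printed by the function still has to be interpreted. The null hypothesis of the test is that the series has a unit root (is non-stationary), so a small p-value, or a test statistic below a critical value, is evidence against non-stationarity. The helper below is a hypothetical sketch of that reading: `interpret_adf` and its `alpha` parameter are not part of statsmodels, and the input numbers are illustrative values, not results computed from any dataset.

```python
def interpret_adf(test_statistic, p_value, critical_values, alpha=0.05):
    """Summarize an ADF result; the null hypothesis is a unit root."""
    # Reject the unit-root null when the p-value falls below the chosen level.
    reject_null = p_value < alpha
    # ADF critical values are negative; a statistic below a critical value
    # rejects the null at that significance level.
    rejected_at = [level for level, crit in critical_values.items()
                   if test_statistic < crit]
    return {"reject_unit_root": reject_null, "rejected_at": rejected_at}


# Illustrative inputs only (hypothetical, not taken from a real test run)
critical_values = {"1%": -3.45, "5%": -2.87, "10%": -2.57}
summary = interpret_adf(-3.2, 0.02, critical_values)
print(summary)
# {'reject_unit_root': True, 'rejected_at': ['5%', '10%']}
```

With these example numbers the unit-root null is rejected at the 5% and 10% levels but not at 1%, which would suggest, without proving, that the series is stationary.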

Let's apply this test to a sample dataset, ["Atmospheric CO2 from Continuous Air Samples at Mauna Loa Observatory, Hawaii, U.S.A."][1], which contains CO2 samples collected from March 1958 to December 2001.

[1]: http://cdiac.ess-dive.lbl.gov/trends/co2/sio-keel-flask/sio-keel-flaskmlo_c.html

In [35]:
import statsmodels.api as sm
data = sm.datasets.co2.load_pandas()

TypeError: __new__() got an unexpected keyword argument 'format'