# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources (README.md file)
- Happy learning!

In [None]:
import numpy as np 
import pandas as pd

# Challenge 1 - Loading and Evaluating The Data

In this lab, we will look at a dataset of sensor data from a cellular phone. The phone was carried in the subject's pocket for a few minutes while they walked around.

To load the data, run the code below.

In [None]:
# Run this code:

sensor = pd.read_csv('../sub_1.csv')
sensor.drop(columns=['Unnamed: 0'], inplace=True)

Examine the data using the `head` function.

In [None]:
# Your code here:
sensor.head()

Check whether there is any missing data. If there is any missing data, remove the rows containing missing data.

In [None]:
# Your code here:
# Count the missing values first, then drop any row that contains one
print(sensor.isnull().sum().sum())
sensor_2 = sensor.dropna()
sensor_2.head()
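
As a minimal sketch of the check-then-drop pattern, the toy frame below (hypothetical column names and values, not taken from `sub_1.csv`) counts missing values per column with `isnull().sum()` before calling `dropna()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the sensor readings
toy = pd.DataFrame({
    "accel_x": [0.1, np.nan, 0.3, 0.4],
    "rot_x": [1.0, 2.0, np.nan, 4.0],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Keep only the rows in which every column has a value
toy_clean = toy.dropna()
print(toy_clean.shape)
```

Comparing the shape before and after `dropna()` is a quick way to confirm how many rows were removed.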

How many rows and columns are in our data?

In [None]:
# Your code here:
sensor_2.shape  # (rows, columns); no rows were dropped because no values were missing

In [None]:
'''
L.S. Good!
'''

pd.DatetimeIndex(['2018-01-01', '2018-01-03', '2018-01-05'], freq='infer')

To perform time series analysis on the data, we must change the index from a range index to a time series index. In the cell below, create a time series index using the `pd.date_range` function. Create a time series index starting at 1/1/2018 00:00:00 and ending at 1/1/2018 00:29:10. The number of periods is equal to the number of rows in `sensor`. The frequency should be set to `infer`.

In [None]:
# Your code here:
'''
L.S. Correct!
'''

# `freq` is left out: date_range rejects start, end, periods and freq all given together
date = pd.date_range(start='1/1/2018 00:00:00', end='1/1/2018 00:29:10', periods=sensor_2.shape[0])
date
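
When `start`, `end` and `periods` are given, `pd.date_range` spaces the timestamps evenly, with a step of `(end - start) / (periods - 1)`. A small sketch with made-up endpoints:

```python
import pandas as pd

# Illustrative endpoints: 11 timestamps spread over 10 seconds
idx = pd.date_range(start="2018-01-01 00:00:00",
                    end="2018-01-01 00:00:10",
                    periods=11)

# Spacing between consecutive timestamps
step = idx[1] - idx[0]
print(len(idx), step)
```

With the lab's endpoints, 00:00:00 to 00:29:10 spans 1750 seconds, so the step is exactly one second only when the frame has 1751 rows.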

Assign the time series index to the dataframe's index.

In [None]:
# Your code here:
# `date` is already a DatetimeIndex, so only its frequency still needs to be inferred
sensor_2.index = pd.DatetimeIndex(date, freq='infer')
sensor_2
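
To see what `freq='infer'` contributes, the sketch below builds a small index the same way and compares its frequency before and after inference. The frame and its `reading` column are hypothetical.

```python
import numpy as np
import pandas as pd

# Evenly spaced timestamps built from start/end/periods carry no frequency yet
stamps = pd.date_range(start="2018-01-01 00:00:00",
                       end="2018-01-01 00:00:04",
                       periods=5)

# Hypothetical readings, used only to have a frame to index
frame = pd.DataFrame({"reading": np.arange(5.0)})

# Attach the timestamps and let pandas infer the regular spacing
frame.index = pd.DatetimeIndex(stamps, freq="infer")
print(stamps.freq, frame.index.freq)
```

An index with a known frequency lets time series tools work out the spacing of the observations without being told.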

Our next step is to decompose the time series and evaluate the patterns in the data. Load the `statsmodels.api` submodule and plot the decomposed plot of `userAcceleration.x`. Set `freq=60` in the `seasonal_decompose` function (newer `statsmodels` releases call this argument `period`). Your graph should look like the one below.

![time series decomposition](../images/tsa_decompose.png)

In [None]:
# Your code here:
import statsmodels.api as sm
# Newer statsmodels releases name this argument `period`; older ones used `freq`
res = sm.tsa.seasonal_decompose(sensor_2["userAcceleration.x"], period=60)
res.plot()
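
To make the trend, seasonal and residual panels of the decomposition plot concrete, here is a hand-rolled sketch of a classical additive decomposition on synthetic data. It is a simplification, not the `statsmodels` implementation: the helper name `additive_decompose` is hypothetical, and it assumes an odd `period` so that the centered moving average is symmetric.

```python
import numpy as np
import pandas as pd

def additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (odd `period` assumed)."""
    # Trend: centered moving average over one full period
    trend = series.rolling(window=period, center=True).mean()
    # Seasonal: average detrended value at each position within the period
    phase = np.arange(len(series)) % period
    phase_means = (series - trend).groupby(phase).mean()
    phase_means = phase_means - phase_means.mean()  # center the seasonal part on zero
    seasonal = pd.Series(phase_means.loc[phase].to_numpy(), index=series.index)
    # Residual: whatever trend and seasonality leave unexplained
    resid = series - trend - seasonal
    return pd.DataFrame({"trend": trend, "seasonal": seasonal, "resid": resid})

# Synthetic signal: linear trend plus a repeating zero-mean pattern of length 5
pattern = np.array([1.0, -1.0, 2.0, 0.0, -2.0])
t = np.arange(50)
signal = pd.Series(0.1 * t + pattern[t % 5],
                   index=pd.date_range("2018-01-01", periods=50, freq="s"))

parts = additive_decompose(signal, period=5)
print(parts.head(8))
```

The first and last few trend values are missing because the moving average needs a full window on both sides of each point.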

Plot the decomposed time series of `rotationRate.x` also with a frequency of 60.

# Challenge 2 - Modelling the Data

To model our data, we should look at a few assumptions. First, let's create a lag plot with `lag_plot` to detect any autocorrelation. Do this for `userAcceleration.x`.

In [None]:
'''
L.S. Well done!
'''

# Your code here:
from pandas.plotting import lag_plot
lag_plot(sensor_2["userAcceleration.x"])

Create a lag plot for `rotationRate.x`

In [None]:
# Your code here:
lag_plot(sensor_2["rotationRate.x"])
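
A lag plot scatters each value against the one that follows it, so its visual impression can be backed by a number: the lag-1 autocorrelation. The sketch below uses two synthetic series (white noise and a random walk, chosen purely for illustration) rather than the sensor data, and the helper name `lag_pairs` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(size=1000))   # no memory between steps
walk = noise.cumsum()                      # each value builds on the previous one

def lag_pairs(series: pd.Series, lag: int = 1) -> pd.DataFrame:
    """Return the (y(t), y(t+lag)) pairs that a lag plot scatters."""
    return pd.DataFrame({"y_t": series, "y_t_plus_lag": series.shift(-lag)}).dropna()

pairs = lag_pairs(walk)
noise_corr = noise.autocorr(lag=1)
walk_corr = walk.autocorr(lag=1)
print(round(noise_corr, 3), round(walk_corr, 3))
```

A correlation near zero matches a shapeless cloud in the lag plot, while a value near one matches points hugging the diagonal.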

What are your conclusions from both visualizations?

In [None]:
# Your conclusions here:
'''
A lag plot is a scatter plot of each value T against the previous value (T-1).
If the dots of the lag plot lie along one smooth line, then you can state that
T is strongly related to (T-1); in other words, there is a good chance of predicting T quite well from the data at T-1.

This is not the case in either graph; in other words, you cannot rely much on the data from one second ago
to predict the next second.

'''


The next step will be to test both variables for stationarity. Perform the Augmented Dickey Fuller test on both variables below.

In [None]:
# Your code here:
''' If there is learning, there is stationarity '''
# H0: The data is not stationary (it has a unit root)
# H1: The data is stationary
# P is low, H0 must go: reject the null hypothesis if p is lower than 0.05

from statsmodels.tsa.stattools import adfuller
p1 = adfuller(sensor_2['rotationRate.x'])[1]
p2 = adfuller(sensor_2['rotationRate.y'])[1]

'''
L.S. Good!
'''

def hypothese_check(p):
    """Return the p-value together with a verdict at the 5% significance level."""
    if p < 0.05:
        output = "P is low, H0 must go"
    else:
        output = "H0 cannot be rejected"
    return p, output

print('rotationRate.x', hypothese_check(p1), 'rotationRate.y', hypothese_check(p2))

What are your conclusions from this test?

In [None]:
# Your conclusions here:
'''H0 is rejected in both cases, so the data appear to be stationary.
In other words, there might be a learning process.'''

Finally, we'll create an ARMA model for `userAcceleration.x`. Load the `ARMA` function from `statsmodels`. The order of the model is (2, 1). Split the data into training and test sets. Use the last 10 observations as the test set and all other observations as the training set.

In [None]:
# Your code here:
from statsmodels.tsa.arima_model import ARMA

'''
L.S. Good!
'''

series = sensor_2["userAcceleration.x"]
train, test = series[:-10], series[-10:]

# Fit on the training set only, so the last 10 observations stay unseen
model = ARMA(train, order=(2, 1))
model_fit = model.fit(disp=False)

# Forecast the 10 held-out steps that follow the training data
predictions = model_fit.predict(start=len(train), end=len(series) - 1)
to_compare = pd.DataFrame({'observed': test, 'predicted': predictions})
to_compare

To compare our predictions with the observed data, we can compute the RMSE (Root Mean Squared Error) from the submodule `statsmodels.tools.eval_measures`. You can read more about this function [here](https://www.statsmodels.org/dev/generated/statsmodels.tools.eval_measures.rmse.html). Compute the RMSE for the last 10 rows of the data by comparing the observed and predicted data for the `userAcceleration.x` column.

In [None]:
# Your code here:
from statsmodels.tools.eval_measures import rmse

rmse(to_compare.observed, to_compare.predicted)
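
For reference, RMSE is the square root of the mean squared difference between observed and predicted values. A plain NumPy sketch with made-up numbers (the helper name `rmse_manual` is hypothetical):

```python
import numpy as np

def rmse_manual(observed, predicted) -> float:
    """Root mean squared error between two equal-length sequences."""
    errors = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

# Made-up values: the errors are 1, -1, 1 and -1, so the RMSE is exactly 1
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 5.0, 5.0, 9.0]
print(rmse_manual(observed, predicted))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.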

In [None]:
maximum = np.max(to_compare.observed)
minimum = np.min(to_compare.observed)
print(maximum, minimum)

In [None]:
# Looking at the range of my data, we can see that an RMSE of 0.09 is quite a big number.
# If the range were 1000, the predictions would be very good,
# but as it stands the predictions are not very accurate.
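
One way to make this comparison explicit is to divide the RMSE by the range of the observed values, sometimes called a normalized RMSE. The helper name `range_normalized_rmse` and the numbers below are illustrative assumptions, not results from this lab.

```python
import numpy as np

def range_normalized_rmse(observed, predicted) -> float:
    """RMSE divided by the spread (max - min) of the observed values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = np.sqrt(np.mean((observed - predicted) ** 2))
    return float(error / (observed.max() - observed.min()))

# The same absolute error of 1 is large on a narrow range and small on a wide one
narrow = range_normalized_rmse([0.0, 2.0], [1.0, 3.0])        # range 2
wide = range_normalized_rmse([0.0, 1000.0], [1.0, 1001.0])    # range 1000
print(narrow, wide)
```

A ratio close to zero suggests the error is small relative to how much the signal varies, while a ratio near one suggests the error is as large as the signal's spread.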