# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources (README.md file)
- Happy learning!

In [47]:
# Import numpy, pandas and matplotlib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# Challenge 1 - Loading and Evaluating The Data

In this lab, we will look at a dataset of sensor data from a cellular phone. The phone was carried in the subject's pocket for a few minutes while they walked around.

To load the data, run the code below.

In [48]:
# Run this code:

sensor = pd.read_csv('../sub_1.csv')
sensor.drop(columns=['Unnamed: 0'], inplace=True)

Examine the data using the `head` function.

In [49]:
sensor.head()

|   | attitude.roll | attitude.pitch | attitude.yaw | gravity.x | gravity.y | gravity.z | rotationRate.x | rotationRate.y | rotationRate.z | userAcceleration.x | userAcceleration.y | userAcceleration.z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.528132 | -0.733896 | 0.696372 | 0.741895 | 0.669768 | -0.031672 | 0.316738 | 0.77818 | 1.082764 | 0.294894 | -0.184493 | 0.377542 |
| 1 | 1.527992 | -0.716987 | 0.677762 | 0.753099 | 0.657116 | -0.032255 | 0.842032 | 0.424446 | 0.643574 | 0.219405 | 0.035846 | 0.114866 |
| 2 | 1.527765 | -0.706999 | 0.670951 | 0.759611 | 0.649555 | -0.032707 | -0.138143 | -0.040741 | 0.343563 | 0.010714 | 0.134701 | -0.167808 |
| 3 | 1.516768 | -0.704678 | 0.675735 | 0.760709 | 0.647788 | -0.04114 | -0.025005 | -1.048717 | 0.03586 | -0.008389 | 0.136788 | 0.094958 |
| 4 | 1.493941 | -0.703918 | 0.672994 | 0.760062 | 0.64721 | -0.05853 | 0.114253 | -0.91289 | 0.047341 | 0.199441 | 0.353996 | -0.044299 |
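Every column name here contains a dot, so pandas attribute access such as `sensor.userAcceleration.x` looks for a column called `userAcceleration` and raises an `AttributeError`. Bracket indexing works with any label. A minimal sketch on a hypothetical two-column frame whose labels mimic the sensor columns:

```python
import pandas as pd

# Hypothetical frame; only the column labels mimic the sensor data
df = pd.DataFrame({"userAcceleration.x": [1.0, 2.0], "rotationRate.x": [3.0, 4.0]})

# Bracket indexing accepts any column label, dots included
ua_x = df["userAcceleration.x"]

# Attribute access looks up a column named "userAcceleration", which does not exist
try:
    df.userAcceleration.x
    message = ""
except AttributeError as err:
    message = str(err)

print(ua_x.tolist())
print(message)
```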


Check whether there is any missing data. If there is any missing data, remove the rows containing missing data.

In [50]:
# Your code here:
sensor.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1751 entries, 0 to 1750
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   attitude.roll       1751 non-null   float64
 1   attitude.pitch      1751 non-null   float64
 2   attitude.yaw        1751 non-null   float64
 3   gravity.x           1751 non-null   float64
 4   gravity.y           1751 non-null   float64
 5   gravity.z           1751 non-null   float64
 6   rotationRate.x      1751 non-null   float64
 7   rotationRate.y      1751 non-null   float64
 8   rotationRate.z      1751 non-null   float64
 9   userAcceleration.x  1751 non-null   float64
 10  userAcceleration.y  1751 non-null   float64
 11  userAcceleration.z  1751 non-null   float64
dtypes: float64(12)
memory usage: 164.3 KB


In [51]:
sensor.isnull().head()

|   | attitude.roll | attitude.pitch | attitude.yaw | gravity.x | gravity.y | gravity.z | rotationRate.x | rotationRate.y | rotationRate.z | userAcceleration.x | userAcceleration.y | userAcceleration.z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | False | False |
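`isnull().head()` only displays the mask for the first five rows, so on its own it cannot show whether any of the remaining rows have gaps (here `info()` already reports 1751 non-null values in every column). Summing the boolean mask per column checks every row. A sketch on a small made-up frame:

```python
import numpy as np
import pandas as pd

# Small made-up frame with one missing value in each column
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Count missing values per column across every row, not just the first five
missing_per_column = df.isnull().sum()
# Single True/False answer for the whole frame
has_missing = bool(df.isnull().values.any())

print(missing_per_column)
print(has_missing)
```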


In [52]:
# Number of elements:
sensor.size

21012

How many rows and columns are in our data?

In [53]:
# Number of rows:
len(sensor)

1751

In [54]:
# Number of columns:
len(sensor.columns)

12

In [55]:
# Your code here:
sensor.shape

(1751, 12)

In [56]:
# Remove the rows containing missing data:
sensor_remove = sensor.dropna(how='any')  # drop every row with at least one missing value

In [57]:
sensor_remove.head()

|   | attitude.roll | attitude.pitch | attitude.yaw | gravity.x | gravity.y | gravity.z | rotationRate.x | rotationRate.y | rotationRate.z | userAcceleration.x | userAcceleration.y | userAcceleration.z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.528132 | -0.733896 | 0.696372 | 0.741895 | 0.669768 | -0.031672 | 0.316738 | 0.77818 | 1.082764 | 0.294894 | -0.184493 | 0.377542 |
| 1 | 1.527992 | -0.716987 | 0.677762 | 0.753099 | 0.657116 | -0.032255 | 0.842032 | 0.424446 | 0.643574 | 0.219405 | 0.035846 | 0.114866 |
| 2 | 1.527765 | -0.706999 | 0.670951 | 0.759611 | 0.649555 | -0.032707 | -0.138143 | -0.040741 | 0.343563 | 0.010714 | 0.134701 | -0.167808 |
| 3 | 1.516768 | -0.704678 | 0.675735 | 0.760709 | 0.647788 | -0.04114 | -0.025005 | -1.048717 | 0.03586 | -0.008389 | 0.136788 | 0.094958 |
| 4 | 1.493941 | -0.703918 | 0.672994 | 0.760062 | 0.64721 | -0.05853 | 0.114253 | -0.91289 | 0.047341 | 0.199441 | 0.353996 | -0.044299 |
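Because this sensor data has no missing values, any setting of `how` keeps all 1751 rows, but the settings differ on incomplete data. The task asks to drop rows *containing* missing data, which is `how='any'` (the default); `how='all'` only drops rows in which every value is missing. A sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame: row 0 is complete, row 1 has one gap, row 2 is entirely missing
df = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [4.0, 5.0, np.nan]})

# how='any' (the default) drops a row if at least one value is missing
dropped_any = df.dropna(how="any")
# how='all' drops a row only when every value in it is missing
dropped_all = df.dropna(how="all")

print(dropped_any.index.tolist())
print(dropped_all.index.tolist())
```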


Assign the time series index to the dataframe's index.

In [58]:
# Your code here:
# Assumption: one reading per second; the start timestamp is arbitrary
sensor_remove.index = pd.date_range(start='2018-01-01 00:00:00', periods=len(sensor_remove), freq='s')

Our next step is to decompose the time series and evaluate the patterns in the data. Load the `statsmodels.api` submodule and plot the decomposition of `userAcceleration.x`. Set `period=60` in the `seasonal_decompose` function (older statsmodels releases called this argument `freq`). Your graph should look like the one below.

![time series decomposition](../images/tsa_decompose.png)

In [59]:
from statsmodels.tsa.seasonal import seasonal_decompose

In [60]:
# Column names contain dots, so select them with brackets
res = seasonal_decompose(sensor_remove['userAcceleration.x'], model="additive", period=60)

In [61]:
# Plot the raw series before looking at its components
plt.plot(sensor_remove['userAcceleration.x'])

In [62]:
res.plot();

Plot the decomposed time series of `rotationRate.x`, also with a period of 60.

In [65]:
# Decompose rotationRate.x itself rather than reusing the userAcceleration.x result
res_rot = seasonal_decompose(sensor_remove['rotationRate.x'], model="additive", period=60)
plt.figure(figsize=(60,5))
plt.plot(sensor_remove['rotationRate.x'])
plt.plot(res_rot.seasonal, c="g")
plt.plot(res_rot.trend, c='r')
plt.title("rotationRate.x");

# Challenge 2 - Modelling the Data

To model our data, we should check a few assumptions. First, let's use `lag_plot` to detect any autocorrelation. Do this for `userAcceleration.x`.
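A lag plot scatters each value against the value `lag` steps later; points hugging a diagonal line suggest autocorrelation, while a shapeless cloud suggests little. The strength of that pattern can also be read off as a single number. A sketch on a synthetic AR(1) series (made up for illustration, not the sensor data):

```python
import numpy as np
import pandas as pd

# Synthetic AR(1) series: y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(42)
phi, n = 0.8, 2000
noise = rng.normal(size=n)
y = np.empty(n)
y[0] = noise[0]
for t in range(1, n):
    y[t] = phi * y[t - 1] + noise[t]
series = pd.Series(y)

# The points a lag-1 plot draws: each value paired with the one before it
pairs = pd.DataFrame({"previous": series.shift(1), "current": series}).dropna()
lag1_corr = pairs["previous"].corr(pairs["current"])

# Series.autocorr reports the same lag-1 correlation directly
print(round(lag1_corr, 3), round(series.autocorr(lag=1), 3))
```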

In [67]:
# Your code here:

from pandas.plotting import lag_plot
# Scatter each value against the next one (lag=1); a diagonal pattern indicates autocorrelation
lag_plot(sensor_remove['userAcceleration.x'], lag=1)
plt.title("userAcceleration.x");

Create a lag plot for `rotationRate.x`.

In [10]:
# Your code here:



What are your conclusions from both visualizations?

In [11]:
# Your conclusions here:



The next step is to test both variables for stationarity. Perform the Augmented Dickey-Fuller (ADF) test on both variables below.

In [12]:
# Your code here:



What are your conclusions from this test?

In [13]:
# Your conclusions here:



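Finally, we'll create an ARMA model for `userAcceleration.x`. Load the `ARMA` function from `statsmodels` (it was removed in statsmodels 0.13; in newer releases, use `ARIMA` from `statsmodels.tsa.arima.model` with `order=(2, 0, 1)`, which is the same model). The order of the model is (2, 1). Split the data into training and test sets: use the last 10 observations as the test set and all other observations as the training set.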

In [15]:
# Your code here:



To compare our predictions with the observed data, we can compute the RMSE (root mean squared error) with the `rmse` function from the submodule `statsmodels.tools.eval_measures`. You can read more about this function [here](https://www.statsmodels.org/dev/generated/statsmodels.tools.eval_measures.rmse.html). Compute the RMSE for the last 10 rows of the data by comparing the observed and predicted values of the `userAcceleration.x` column.
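RMSE is the square root of the mean squared difference between observed and predicted values. A small sketch with made-up numbers, comparing a hand-written NumPy version with `statsmodels.tools.eval_measures.rmse`:

```python
import numpy as np
from statsmodels.tools.eval_measures import rmse

# Made-up observed and predicted values
observed = np.array([1.0, 2.0, 3.0])
predicted = np.array([1.0, 2.0, 5.0])

# RMSE = sqrt(mean((observed - predicted)^2))
manual = np.sqrt(np.mean((observed - predicted) ** 2))
library = rmse(observed, predicted)
print(manual, library)
```

In the notebook, the two arrays would be the last 10 observed values of `userAcceleration.x` and the model's 10 forecasts.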

In [16]:
# Your code here:

