[credit: The Data Analysis Workshop](https://smile.amazon.com/Data-Analysis-Workshop-state-art/dp/1839211385/ref=sr_1_1?dchild=1&keywords=The+Data+Analysis+Workshop+Solve+business+problems+with+state-of-the-art+data+analysis+models&qid=1612045402&sr=8-1)

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [None]:
hourly_data = pd.read_csv('../input/bike-sharing-dataset/hour.csv')

In [None]:
preprocessed_data = hourly_data.copy()
seasons_mapping = {1: 'winter', 2: 'spring', 3: 'summer', 4: 'fall'}
preprocessed_data['season'] = preprocessed_data['season'].apply(lambda x: seasons_mapping[x])
yr_mapping = {0: 2011, 1: 2012}
preprocessed_data['yr'] = preprocessed_data['yr'].apply(lambda x: yr_mapping[x])
weekday_mapping = {0: 'Sunday', 1: 'Monday', 2: 'Tuesday',
                   3: 'Wednesday', 4: 'Thursday', 5: 'Friday', 6: 'Saturday'}
preprocessed_data['weekday'] = preprocessed_data['weekday'].apply(lambda x: weekday_mapping[x])
weather_mapping = {1: 'clear', 2: 'cloudy', 3: 'light_rain_snow', 4: 'heavy_rain_snow'}
preprocessed_data['weathersit'] = preprocessed_data['weathersit'].apply(lambda x: weather_mapping[x])
# hum and windspeed are stored normalized (divided by 100 and 67); rescale them to original units
preprocessed_data['hum'] = preprocessed_data['hum'] * 100
preprocessed_data['windspeed'] = preprocessed_data['windspeed'] * 67

We perform a time series analysis on the rides columns (registered and casual) in the bike sharing dataset.  
A time series is *weakly stationary* if its mean, standard deviation, and covariance do not change with time.  
We use two techniques for identifying time series stationarity: rolling statistics and the augmented Dickey-Fuller (ADF) stationarity test.
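
To make the idea of weak stationarity concrete, here is a minimal sketch on synthetic data (not the bike sharing data). The series names, the seed, and the window of 10 are illustrative assumptions. White noise keeps a roughly constant rolling mean, while a random walk built from the same noise drifts.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 500

# White noise: constant mean and variance, so it is weakly stationary.
noise = pd.Series(rng.normal(size=n))

# Random walk: cumulative sum of the noise; its variance grows with time.
walk = noise.cumsum()

window = 10
# How much the rolling mean itself moves around over time.
noise_spread = noise.rolling(window).mean().std()
walk_spread = walk.rolling(window).mean().std()

print(f"std of rolling mean, white noise: {noise_spread:.2f}")
print(f"std of rolling mean, random walk: {walk_spread:.2f}")
```

A rolling mean that wanders far more than the noise level is the visual cue the rolling-statistics plots look for.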

Let's define a function that plots the rolling statistics of a time series and reports the p-value of the ADF test in the plot title:

In [None]:
from statsmodels.tsa.stattools import adfuller
def test_stationarity(ts, window=10, **kwargs):
    # create dataframe for plotting
    plot_data = pd.DataFrame(ts)
    plot_data['rolling_mean'] = ts.rolling(window).mean()
    plot_data['rolling_std'] = ts.rolling(window).std()
    
    # compute p-value of Dickey-Fuller test
    p_val = adfuller(ts)[1]
    ax = plot_data.plot(**kwargs)
    ax.set_title(f"Dickey-Fuller p-value: {p_val:.3f}")
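
As a small illustration of what `rolling(window)` computes inside the function, the sketch below uses a toy series (the values are made up). The first `window - 1` entries have no full window behind them, so pandas returns NaN there, which is why the rolling curves start a little later than the raw series in the plots.

```python
import pandas as pd

toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values
window = 3

# Each value is the statistic over the current point and the window - 1 points before it.
rolling_mean = toy.rolling(window).mean()
rolling_std = toy.rolling(window).std()

print(pd.DataFrame({"value": toy, "rolling_mean": rolling_mean, "rolling_std": rolling_std}))
```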

Extract the daily registered and casual rides from our preprocessed data:

In [None]:
# get daily rides
daily_rides = preprocessed_data[["dteday", "registered", "casual"]]
daily_rides = daily_rides.groupby("dteday").sum()

# convert index to DateTime object
daily_rides.index = pd.to_datetime(daily_rides.index)
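
The aggregation step can be checked on a tiny hand-made frame. The sketch below uses invented numbers with the same column names as the dataset; it only shows how `groupby("dteday").sum()` collapses hourly rows into one row per day.

```python
import pandas as pd

# Three invented hourly records spread over two days.
toy_hourly = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-02"],
    "registered": [13, 32, 40],
    "casual": [3, 8, 5],
})

# Sum all hourly records that share the same date.
toy_daily = toy_hourly.groupby("dteday").sum()

# Convert the string index to a DatetimeIndex, as done for daily_rides.
toy_daily.index = pd.to_datetime(toy_daily.index)
print(toy_daily)
```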

Apply the function to the registered and casual data:

In [None]:
#registered
test_stationarity(daily_rides.registered, figsize=(10, 8))

In [None]:
#casual
test_stationarity(daily_rides.casual, figsize=(10, 8))

We can see that neither the rolling mean nor the rolling standard deviation is constant over time.  
Furthermore, the Dickey-Fuller test returns p-values of 0.355 and 0.372 for the registered and casual columns, respectively. Both are well above the usual 0.05 threshold, so we cannot reject the null hypothesis of a unit root: the test gives no evidence that the time series are stationary.  
  
We therefore need to process the series in order to obtain stationary ones.
A common way to detrend a time series and make it stationary is to *subtract either its rolling mean or its last value*, or to *decompose it into trend, seasonality, and residual components*.
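
As a rough sketch of why the first two approaches help, consider a synthetic series with a linear trend plus noise (the slope, seed, and window of 10 are assumptions for illustration). Both subtractions remove the trend, so the mean of the early and late parts of the series ends up nearly the same.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
t = np.arange(300)
trended = pd.Series(0.5 * t + rng.normal(size=t.size))  # linear trend + noise

# Option 1: subtract the rolling mean.
ma_diff = (trended - trended.rolling(10).mean()).dropna()
# Option 2: subtract the previous value.
last_diff = (trended - trended.shift()).dropna()

for name, s in [("raw", trended),
                ("minus rolling mean", ma_diff),
                ("minus last value", last_diff)]:
    first, last = s.iloc[:100].mean(), s.iloc[-100:].mean()
    print(f"{name}: mean of first 100 = {first:.2f}, mean of last 100 = {last:.2f}")
```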

# Let's first check whether the time series become stationary after *subtracting their rolling means and last values*:

Subtract the rolling mean:

In [None]:
registered = daily_rides["registered"]
registered_ma = registered.rolling(10).mean()
registered_ma_diff = registered - registered_ma
registered_ma_diff.dropna(inplace=True)

casual = daily_rides["casual"]
casual_ma = casual.rolling(10).mean()
casual_ma_diff = casual - casual_ma
casual_ma_diff.dropna(inplace=True)

In [None]:
#registered
test_stationarity(registered_ma_diff, figsize=(10, 8))

In [None]:
#casual
test_stationarity(casual_ma_diff, figsize=(10, 8))

Subtract the last value:

In [None]:
registered = daily_rides["registered"]
registered_diff = registered - registered.shift()
registered_diff.dropna(inplace=True)

casual = daily_rides["casual"]
casual_diff = casual - casual.shift()
casual_diff.dropna(inplace=True)
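
Subtracting the shifted series is first-order differencing, and pandas also offers `Series.diff()` for it. The sketch below, on made-up numbers, checks that both give the same result; which one to use is a matter of taste.

```python
import pandas as pd

# Made-up daily counts.
s = pd.Series([10, 13, 12, 20], index=pd.date_range("2011-01-01", periods=4))

manual = (s - s.shift()).dropna()   # what the cell above does
builtin = s.diff().dropna()         # pandas shortcut for the same operation

print(builtin)
print("identical:", manual.equals(builtin))
```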

In [None]:
#registered
test_stationarity(registered_diff, figsize=(10, 8))

In [None]:
#casual
test_stationarity(casual_diff, figsize=(10, 8))

# Now let's also check whether the time series is stationary using the previously mentioned technique, that is, *time series decomposition* into trend, seasonality, and residual components.

Use the *statsmodels.tsa.seasonal.seasonal_decompose()* function to decompose the registered and casual rides:

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
registered_decomposition = seasonal_decompose(daily_rides["registered"])
casual_decomposition = seasonal_decompose(daily_rides["casual"])

To access each of these three signals, use the .trend, .seasonal, and .resid attributes.  
Obtain visual results from the generated decompositions by calling the .plot() method:

In [None]:
#registered plot
registered_plot = registered_decomposition.plot()
registered_plot.set_size_inches(10, 8)

In [None]:
#casual plot
casual_plot = casual_decomposition.plot()
casual_plot.set_size_inches(10, 8)

Test the residuals obtained for stationarity:

In [None]:
#registered
test_stationarity(registered_decomposition.resid.dropna(), figsize=(10, 8))

In [None]:
#casual
test_stationarity(casual_decomposition.resid.dropna(), figsize=(10, 8))