## Hypothesis Testing:

### **Q1 — Commuter pattern**

#### Do average hourly rides (remember each row in the data is a ride) differ between working days and non-working days?

<br>

● What stakeholder would care about this test?

● How should you sample the data?

● What are the hypotheses, what alpha would you choose and why, and how
would you explain the result (be sure to include confidence intervals)?


In [2]:
# imports
import pandas as pd
from scipy import stats
from scipy.stats import t

In [11]:
path = '/Users/Marcy_Student/Desktop/Marcy-Modules/Mod4/M4-Final-Project/data/cleaned_hour.csv'
df = pd.read_csv(path)
df['dteday']= pd.to_datetime(df['dteday'])
df.keys()

Index(['Unnamed: 0', 'instant', 'dteday', 'season', 'yr', 'mnth', 'hr',
       'holiday', 'weekday', 'workingday', 'weathersit', 'temp', 'atemp',
       'hum', 'windspeed', 'casual', 'registered', 'cnt', 'season_names',
       'month_names', 'day_name'],
      dtype='object')

***
### Create groups to test:

In [97]:
# splitting into groups: working vs non-working
workingdays = df[df['workingday']==1]
nonworkingdays = df[df['workingday']==0]
# where 1 = working day, and 0 = non working day


#avg_working = workingdays.groupby('hr')['cnt'].sum().reset_index()
#avg_nonworking = nonworkingdays.groupby('hr')['cnt'].sum().reset_index()

***
### T-Test:

**Null Hypothesis**: Mean hourly rides (`cnt`) are the same on working days and non-working days.

**Alt. Hypothesis**: Mean hourly rides (`cnt`) differ between working days and non-working days.

In [96]:
#statistical scores
commuter_pattern = stats.ttest_ind(workingdays['cnt'], nonworkingdays['cnt'], equal_var=False, alternative='two-sided')
commuter_pattern.pvalue

np.float64(4.249478377549554e-05)

In [None]:
# standard error
working_mean = workingdays['cnt'].mean()
nonworking_mean = nonworkingdays['cnt'].mean()
working_n = len(workingdays)
nonworking_n = len(nonworkingdays)
working_s = workingdays['cnt'].std()
nonworking_s = nonworkingdays['cnt'].std()

se = (((working_s**2)/working_n)+((nonworking_s**2)/nonworking_n))**0.5  # square root of the summed variance terms
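
The standard error of the difference is the ingredient needed for the confidence interval the question asks for. A minimal sketch of one way to finish the calculation, assuming a Welch-style interval with Welch-Satterthwaite degrees of freedom (to match `equal_var=False` in the test above). The helper `welch_ci` and the `demo_*` arrays are hypothetical; the arrays are synthetic stand-ins for `workingdays['cnt']` and `nonworkingdays['cnt']`, not the real data.

```python
import numpy as np
from scipy.stats import t

def welch_ci(a, b, confidence=0.95):
    """Return (diff, lower, upper) for mean(a) - mean(b), unequal variances allowed."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    var_a = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    var_b = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    se = np.sqrt(var_a + var_b)     # standard error of the difference
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (var_a + var_b) ** 2 / (var_a ** 2 / (a.size - 1) + var_b ** 2 / (b.size - 1))
    diff = a.mean() - b.mean()
    margin = t.ppf(1 - (1 - confidence) / 2, dof) * se
    return diff, diff - margin, diff + margin

# Synthetic data only; swap in the real columns to get the actual interval
rng = np.random.default_rng(0)
demo_working = rng.poisson(190, size=500)
demo_nonworking = rng.poisson(180, size=300)
diff, lower, upper = welch_ci(demo_working, demo_nonworking)
print(f"diff={diff:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

If the interval excludes zero, that agrees with rejecting the null at the matching alpha, and its endpoints give stakeholders a range for the size of the difference in rides per hour.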

***

### **Q2 — Multi-group comparison**

#### Do mean hourly rides differ across categories of multi-level categorical variables such as season or weather condition (choose one)? If you find a difference, describe the appropriate post-hoc (after the test what other tests would you do) approach and what it would tell stakeholders.

<br>

● What stakeholder would care about this test? Does this change what multi-level
categorical variable you choose?

● How should you sample the data?

● What are the hypotheses, what alpha would you choose and why, and how would you
explain the result (be sure to include confidence intervals)?

***
### Creating groups:

In [141]:
# our groups
clear = df[df['weathersit']==1]
mist = df[df['weathersit']==2]
lightrain = df[df['weathersit']==3]
heavyrain = df[df['weathersit']==4]

***
### Our ANOVA Test:

In [142]:
# imports
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

In [143]:
# ANOVA
f_stat1, p_stat1 = f_oneway(
    clear['cnt'],
    mist['cnt'],
    lightrain['cnt'],
    heavyrain['cnt']
)
print(f'F-Statistic: {f_stat1}, P-Value: {p_stat1}')

F-Statistic: 127.17386949967266, P-Value: 1.7347820521802623e-81


In [144]:
# Tukey's Test
tukey_result = pairwise_tukeyhsd(endog=df['cnt'],
                                 groups=df['weathersit'],
                                 alpha=0.05)
print(tukey_result)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
group1 group2  meandiff p-adj    lower    upper   reject
--------------------------------------------------------
     1      2  -29.7038    0.0  -37.7909 -21.6166   True
     1      3    -93.29    0.0 -106.2676 -80.3123   True
     1      4 -130.5359 0.5886 -396.7534 135.6815  False
     2      3  -63.5862    0.0  -77.6067 -49.5658   True
     2      4 -100.8322 0.7649 -367.1025 165.4381  False
     3      4  -37.2459 0.9841 -303.7096 229.2177  False
--------------------------------------------------------
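
Every pair that includes group 4 has an interval more than 500 rides wide and is not rejected, even though those pairs show the largest mean differences in the table. Intervals that wide usually point to a group with very few observations, so it may be worth checking group sizes before telling stakeholders that the worst weather category makes no difference. A minimal sketch of that check; the `demo` frame is hypothetical and its values are made up, standing in for `df[['weathersit', 'cnt']]`.

```python
import pandas as pd

# Made-up stand-in for df[['weathersit', 'cnt']]
demo = pd.DataFrame({
    "weathersit": [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "cnt": [210, 190, 250, 180, 170, 160, 150, 110, 90, 60],
})

# Rows, mean, and spread per weather category
summary = demo.groupby("weathersit")["cnt"].agg(["count", "mean", "std"])
print(summary)
```

A very small `count` for one category would explain a wide Tukey interval for every pair involving it, and would be a reason to report that category as inconclusive rather than as "no difference".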


In [145]:
# checking means for contextual clarification
clear.describe(), mist.describe(), lightrain.describe(), heavyrain.describe()

(         Unnamed: 0       instant                         dteday  \
 count  11413.000000  11413.000000                          11413   
 mean    8695.750723   8696.750723  2012-01-02 10:23:55.205467392   
 min        0.000000      1.000000            2011-01-01 00:00:00   
 25%     4493.000000   4494.000000            2011-07-11 00:00:00   
 50%     8714.000000   8715.000000            2012-01-03 00:00:00   
 75%    12955.000000  12956.000000            2012-06-29 00:00:00   
 max    17378.000000  17379.000000            2012-12-31 00:00:00   
 std     4910.639765   4910.639765                            NaN   
 
              season            yr          mnth            hr       holiday  \
 count  11413.000000  11413.000000  11413.000000  11413.000000  11413.000000   
 mean       2.511084      0.505389      6.511872     11.731709      0.029878   
 min        1.000000      0.000000      1.000000      0.000000      0.000000   
 25%        2.000000      0.000000      4.000000      6.0

***

## Part C: Simulated A/B Test: Objective

This section is more guided than Part A & Part B. Here are some
highlights:

● PM’s objective: Does ridership increase on working days
during early evening after launching a small app feature
change? (It is not necessary to know what was changed to
conduct the analysis.)


- Primary Metric (fixed): Average hourly rides (cnt) during 17:00–19:00 on working
days when weather is good.


Eligibility (fixed) — keep a row only if all are true:

● workingday == 1

● hr ∈ {17, 18, 19}

● weathersit ∈ {1, 2} and hum ≤ 0.70


***

In [111]:
filtered_df = df[(df['workingday'] ==1) 
                 & (df['hr'].isin([17,18,19])) 
                 & (df['weathersit'].isin([1,2]))
                 & (df['hum']<=0.70)]

filtered_df

Unnamed: 0.1,Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,...,temp,atemp,hum,windspeed,casual,registered,cnt,season_names,month_names,day_name
62,62,63,2011-01-03,1,0,1,17,0,1,1,...,0.24,0.2273,0.30,0.2239,11,146,157,Spring,January,Monday
63,63,64,2011-01-03,1,0,1,18,0,1,1,...,0.24,0.2576,0.32,0.1045,9,148,157,Spring,January,Monday
64,64,65,2011-01-03,1,0,1,19,0,1,1,...,0.20,0.2576,0.47,0.0000,8,102,110,Spring,January,Monday
85,85,86,2011-01-04,1,0,1,17,0,2,1,...,0.28,0.2727,0.48,0.2239,10,202,212,Spring,January,Tuesday
86,86,87,2011-01-04,1,0,1,18,0,2,1,...,0.26,0.2576,0.48,0.1940,3,179,182,Spring,January,Tuesday
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17301,17301,17302,2012-12-28,1,1,12,18,0,5,1,...,0.24,0.2424,0.60,0.1343,16,197,213,Spring,December,Friday
17302,17302,17303,2012-12-28,1,1,12,19,0,5,1,...,0.24,0.2273,0.65,0.1940,15,113,128,Spring,December,Friday
17372,17372,17373,2012-12-31,1,1,12,17,0,1,1,...,0.26,0.2879,0.48,0.0896,14,150,164,Spring,December,Monday
17373,17373,17374,2012-12-31,1,1,12,18,0,1,1,...,0.26,0.2727,0.48,0.1343,10,112,122,Spring,December,Monday


Windows (fixed) — pretend launch is 2012-09-01

● Pre (Baseline): 2012-08-04 → 2012-08-31 (inclusive)

● Post (Feature On): 2012-09-01 → 2012-09-28 (inclusive)

In [120]:
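# NOTE: both frames are sliced from df, not filtered_df, so the eligibility filter is not applied here
pre_filtered_df = df[(df['dteday']>='2012-08-04') & (df['dteday']<='2012-08-31')]
# NOTE: this post window ends on 2012-09-08; the stated window ends on 2012-09-28
post_filtered_df = df[(df['dteday']>='2012-09-01') & (df['dteday']<='2012-09-08')]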

for x in pre_filtered_df['dteday'], post_filtered_df['dteday']:
    print(f'Min:{x.min()}, Max: {x.max()}')

Min:2012-08-04 00:00:00, Max: 2012-08-31 00:00:00
Min:2012-09-01 00:00:00, Max: 2012-09-08 00:00:00
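
The stated design applies the eligibility rules first and then cuts two inclusive date windows of equal length. A minimal sketch of that slicing, assuming the windows are taken from the eligibility-filtered frame; `demo_filtered` is a hypothetical stand-in for `filtered_df` with one made-up row per day, and `window` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for filtered_df: one row per day, made-up counts
days = pd.date_range("2012-08-01", "2012-10-05", freq="D")
demo_filtered = pd.DataFrame({"dteday": days, "cnt": range(len(days))})

def window(frame, start, end):
    """Rows whose dteday falls in [start, end], both ends inclusive."""
    mask = frame["dteday"].between(pd.Timestamp(start), pd.Timestamp(end))
    return frame[mask]

pre_demo = window(demo_filtered, "2012-08-04", "2012-08-31")    # baseline window
post_demo = window(demo_filtered, "2012-09-01", "2012-09-28")   # feature-on window

for name, part in [("pre", pre_demo), ("post", post_demo)]:
    print(name, part["dteday"].min().date(), part["dteday"].max().date(), len(part))
```

Both stated windows span 28 calendar days, so printing the row count next to the min and max dates is a quick way to confirm the two groups cover comparable periods.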


Use the appropriate hypothesis test to compare if there is a difference in the
average total bike count between Group A (Pre) and Group B (Post).

● State the null and alternative hypotheses

● α = 0.05 (default; you may choose a different α if you justify the product risk).

● Report: test statistic, p-value, 95% CI for the difference in means, and a
stakeholder-specific decision

***

### **Hypotheses:**
**Null:** Mean hourly rides (`cnt`) are the same in Group A (Pre) and Group B (Post).

**Alternative:** Mean hourly rides (`cnt`) differ between Group A (Pre) and Group B (Post).

In [154]:
group_test = stats.ttest_ind(pre_filtered_df['cnt'], post_filtered_df['cnt'], equal_var=False, alternative='two-sided')
group_test.pvalue

np.float64(0.3422455391059873)

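I used an independent two-sample t-test with `equal_var=False` (Welch's t-test), which does not assume the two groups have equal variances. The p-value of about 0.34 is above α = 0.05, so we fail to reject the null: there is no statistically significant difference between the two groups.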
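
Because the PM's objective asks specifically whether ridership increased, a one-sided alternative (post greater than pre) is another possible framing. A minimal sketch using the `alternative="greater"` option of `scipy.stats.ttest_ind`; `demo_pre` and `demo_post` are synthetic arrays, not the notebook's groups, and the choice of a one-sided test is an assumption that should be fixed before looking at the data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the pre and post 'cnt' columns
rng = np.random.default_rng(1)
demo_pre = rng.normal(400, 80, size=60)
demo_post = rng.normal(430, 80, size=60)

# H0: mean(post) <= mean(pre); H1: mean(post) > mean(pre); Welch's variant
result = stats.ttest_ind(demo_post, demo_pre, equal_var=False, alternative="greater")
print(f"t={result.statistic:.3f}, one-sided p={result.pvalue:.4f}")
```

The first argument is the group expected to be larger, so a positive t statistic with a small p-value supports an increase; a lower post mean, as in the describe output below, would give a large one-sided p-value.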

In [159]:
pre_filtered_df['cnt'].describe()

count    672.000000
mean     286.438988
std      227.057677
min        3.000000
25%       86.750000
50%      263.500000
75%      431.250000
max      941.000000
Name: cnt, dtype: float64

In [160]:
post_filtered_df['cnt'].describe()

count    192.000000
mean     268.973958
std      223.607955
min        4.000000
25%       72.250000
50%      217.000000
75%      451.000000
max      863.000000
Name: cnt, dtype: float64

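On closer inspection, the mean for the post group (Group B) is lower, about 269 rides per hour versus about 286 for the pre group, but with p ≈ 0.34 that difference is not statistically significant. The groups are also unequal in size (192 rows versus 672), so the post mean rests on far fewer hours. I would continue to track the row count (each row is one hour) and the mean rides, both overall and per hour.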