In [200]:
import pandas as pd
import numpy as np
import scipy.stats as stats

### Data Processing

In [201]:
import pandas as pd

file_path = "../raw-data/raw-data-processing.csv"
df = pd.read_csv(file_path)

df = df.dropna()
#df = df.drop(columns = df.columns[0]) 
#data = df.to_numpy()

df

Unnamed: 0.1,Unnamed: 0,shaking,face-down
0,walking_hand,25,0
1,walking_bag,22,5
2,pocket,15,5
3,bag,12,0
4,passing_flipped,7,5
5,hold_phone_flat,2,12
6,selfie,9,7
7,dropping_phone,11,3


In [202]:
N = 30
df['diff'] = df['shaking'] - df['face-down']
df

Unnamed: 0.1,Unnamed: 0,shaking,face-down,diff
0,walking_hand,25,0,25
1,walking_bag,22,5,17
2,pocket,15,5,10
3,bag,12,0,12
4,passing_flipped,7,5,2
5,hold_phone_flat,2,12,-10
6,selfie,9,7,2
7,dropping_phone,11,3,8


### 1. t-test

In [203]:
N = 30  # denominator used to convert activation counts into proportions
df['diff_prop'] = df['shaking'] / N - df['face-down'] / N
df

Unnamed: 0.1,Unnamed: 0,shaking,face-down,diff,diff_prop
0,walking_hand,25,0,25,0.833333
1,walking_bag,22,5,17,0.566667
2,pocket,15,5,10,0.333333
3,bag,12,0,12,0.4
4,passing_flipped,7,5,2,0.066667
5,hold_phone_flat,2,12,-10,-0.333333
6,selfie,9,7,2,0.066667
7,dropping_phone,11,3,8,0.266667


##### t-test for all the scenarios (assuming each scenario has similar daily occurrence)

In [204]:
mean_diff = df['diff_prop'].mean()
s_std_diff = df['diff_prop'].std()

print(f'The mean of difference is {mean_diff} and the sample standard deviation is {s_std_diff}')

The mean of difference is 0.27499999999999997 and the sample standard deviation is 0.3531041484923955


In [205]:
n = len(df)  # number of scenarios (8)
t_statistics = (mean_diff - 0) / (s_std_diff / np.sqrt(n))
critical_t = 1.895  # one-sided critical value at alpha = 0.05, degrees of freedom = n - 1 = 7
if t_statistics > critical_t:
    print(f'T value = {t_statistics}, Reject the Null Hypothesis (H0): There is no difference between the shaking and flipping gestures')
else:
    print(f'T value = {t_statistics}, Failed to reject the Null Hypothesis (H0): There is no difference between the shaking and flipping gestures')


T value = 2.202798983320224, Reject the Null Hypothesis (H0): There is no difference between the shaking and flipping gestures


##### **Reasoning for this test and conclusion:**
We tested whether the difference in accidental-activation proportions between the two gestures (the proportion of trials in which the shaking gesture was activated minus the proportion in which the flip-down gesture was activated) is statistically different from 0 across the 8 scenarios. This is a one-sided (right-tailed) test at the 5\% significance level, since we are interested in which gesture is more prone to accidental activation. The t statistic is larger than the corresponding critical value.

The t statistic is calculated as $$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}}$$ where $\bar{d}$ is the mean of the per-scenario differences, $s_d$ is their sample standard deviation, and $n = 8$ is the number of scenarios. The degrees of freedom are $n - 1 = 7$ because this is a one-sample t-test: the single sample is the set of per-scenario differences between the shaking and flipping activation proportions.
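
As a worked check of the formula, substituting the values printed by the notebook above (mean 0.275, sample standard deviation 0.3531, 8 scenarios) gives approximately the reported t value:

$$t = \frac{0.275 - 0}{0.3531/\sqrt{8}} \approx \frac{0.275}{0.12484} \approx 2.203$$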

The critical value, 1.895, is read from a t table for a one-sided test at the 5\% significance level (95\% confidence) with 7 degrees of freedom.
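
The table lookup can also be reproduced in code. The following is a minimal sketch, assuming a one-sided test at the 5\% level with 7 degrees of freedom as described here; the variable names `alpha` and `dof` are illustrative and do not appear in the notebook.

```python
from scipy import stats

alpha = 0.05   # one-sided significance level (assumed from the text)
dof = 8 - 1    # 8 scenarios, one-sample test on the differences

# Upper-tail critical value: the point with probability alpha to its right.
critical_t = stats.t.ppf(1 - alpha, dof)
print(f"one-sided critical t (alpha={alpha}, dof={dof}): {critical_t:.3f}")
```

Computing the value this way avoids hardcoding a table entry if the number of scenarios changes.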

Our t value (2.203) is greater than the critical t value (1.895), so we reject the null hypothesis. Because the t value lies between the one-sided critical values for the 5\% level (1.895) and the 2.5\% level (2.365) at 7 degrees of freedom, the p-value (P(T > our t value)) is between 0.025 and 0.05. The robustness of the two gestures to accidental activation therefore differs, and because we ran the right-tailed test on shaking minus flipping, the direction is clear: shaking is activated accidentally more often, so the **flip-down gesture is more robust to accidental activation**. This completes our argument.
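
As a cross-check on the manual calculation, the same right-tailed one-sample test can be run with `scipy.stats.ttest_1samp`. This is a sketch rather than the notebook's own method: the counts are copied from the table above instead of being read from the CSV, and the `alternative` argument requires a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats

# Counts copied from the table above; N = 30 as in the notebook.
shaking = np.array([25, 22, 15, 12, 7, 2, 9, 11])
face_down = np.array([0, 5, 5, 0, 5, 12, 7, 3])
N = 30
diff_prop = shaking / N - face_down / N

# One-sample t-test of H0: mean difference = 0 against H1: mean difference > 0.
result = stats.ttest_1samp(diff_prop, popmean=0, alternative="greater")
t_value, p_one_sided = result.statistic, result.pvalue
print(f"t = {t_value:.4f}, one-sided p = {p_one_sided:.4f}")
```

The t value should agree with the manual result, and the p-value replaces the table comparison with an exact tail probability.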


##### **Limitation:**
This test assumes that the per-scenario differences in proportions are normally distributed, which is hard to verify with only 8 scenarios.
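
One way to probe this assumption is a Shapiro-Wilk test on the differences. The sketch below is an illustration, not part of the original analysis; with only 8 values the test has little power, so a large p-value is weak evidence of normality at best.

```python
import numpy as np
from scipy import stats

# Differences in proportions, rebuilt from the counts in the table above.
shaking = np.array([25, 22, 15, 12, 7, 2, 9, 11])
face_down = np.array([0, 5, 5, 0, 5, 12, 7, 3])
diff_prop = (shaking - face_down) / 30

# H0 of Shapiro-Wilk: the sample comes from a normal distribution.
w_stat, p_normal = stats.shapiro(diff_prop)
print(f"W = {w_stat:.3f}, p = {p_normal:.3f}")
```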


### 2. Rank Sum Test

In [206]:
shaking = list(df['shaking'])
face_down = list(df['face-down'])
p_value = stats.mannwhitneyu(shaking, face_down).pvalue
if p_value < 0.05:
    print(f'P value = {p_value}, Reject the Null Hypothesis (H0): There is no difference between the shaking and flipping gestures')
else:
    print(f'P value= {p_value}, Failed to reject the Null Hypothesis(H0): There is no difference between the shaking and flipping gestures')


P value = 0.023227980334535776, Reject the Null Hypothesis (H0): There is no difference between the shaking and flipping gestures


##### **Reasoning for this test and conclusion:**
The Rank Sum Test is a non-parametric test used to compare two independent groups or samples to determine whether there is a significant difference between them. It does not require the samples to be normally distributed. It tests whether one group tends to have higher values than the other by ranking the pooled data and comparing the ranks of each group.
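
To make the ranking step concrete, the sketch below pools the 16 counts from the table above, ranks them, and derives the U statistic for the shaking sample by hand. It is an illustration of the idea under those assumptions, not the code path SciPy uses internally.

```python
import numpy as np
from scipy import stats

shaking = np.array([25, 22, 15, 12, 7, 2, 9, 11])
face_down = np.array([0, 5, 5, 0, 5, 12, 7, 3])

# Rank all 16 counts together; tied values share the average of their ranks.
ranks = stats.rankdata(np.concatenate([shaking, face_down]))
n1 = len(shaking)
rank_sum_shaking = ranks[:n1].sum()

# U statistic for the shaking sample: rank sum minus its smallest possible value.
u_shaking = rank_sum_shaking - n1 * (n1 + 1) / 2
print(f"rank sum (shaking) = {rank_sum_shaking}, U = {u_shaking}")
```

With 8 values in each group, U ranges from 0 to 64, so a value well above the midpoint of 32 indicates that the shaking counts tend to rank higher.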

Here we use the rank sum test directly from the scipy package (`stats.mannwhitneyu`). The p-value obtained is less than 0.05, so at the 5\% significance level the number of times the shaking gesture was activated is significantly different from that of the flip-down gesture.

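Note that `stats.mannwhitneyu` was called without an `alternative` argument, and the reported p-value (0.0232) is the two-sided one, so the test by itself only establishes that the two gestures differ. The direction comes from the data: shaking has the higher count in 7 of the 8 scenarios, so shaking is more prone to accidental activation than the face-down gesture.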
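
The direction of the test can be stated explicitly through the `alternative` argument of `stats.mannwhitneyu`. The following sketch, using the counts from the table above, compares the two-sided p-value with the one-sided ("greater") p-value.

```python
from scipy import stats

shaking = [25, 22, 15, 12, 7, 2, 9, 11]
face_down = [0, 5, 5, 0, 5, 12, 7, 3]

# Two-sided: H1 is "the two distributions differ" (direction not specified).
p_two_sided = stats.mannwhitneyu(shaking, face_down, alternative="two-sided").pvalue
# One-sided: H1 is "shaking counts tend to be larger than face-down counts".
p_greater = stats.mannwhitneyu(shaking, face_down, alternative="greater").pvalue
print(f"two-sided p = {p_two_sided:.4f}, one-sided (greater) p = {p_greater:.4f}")
```

Naming the alternative in the call keeps the code and the written conclusion in agreement about which hypothesis was tested.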

##### **Limitation:**
The Rank Sum Test compares the groups only through the ranks of the activation counts, not their magnitudes, which can lose information. It also treats the two samples as independent, although here the counts are paired by scenario.