# Hypothesis Testing

Conduct a two-sample t-test to determine whether there is a statistically significant difference in the mean number of drives between iPhone and Android users.

## Python Libraries

**pandas** – For data loading, cleaning, and transformation \
**scipy** – For statistical analysis

In [1]:
import pandas as pd
from scipy import stats

In [2]:
df = pd.read_csv(r'C:\Users\mqtth\Desktop\Projects\Waze_Churn_ML_Project\data\waze_dataset.csv')

**Binary Label Encoding**

In [3]:
# dictionary mapping each device name to an integer label
map_dict = {'Android': 2, 'iPhone': 1}

# create the encoded column directly from the original `device` column
df['device_type'] = df['device'].map(map_dict)

df['device_type'].head()

0    2
1    1
2    2
3    1
4    2
Name: device_type, dtype: int64
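
As an optional sanity check, one could confirm that every `device` value is covered by the mapping, since `Series.map` returns `NaN` for any value missing from the dictionary. The sketch below uses a small hypothetical DataFrame named `sample` in place of the Waze data, so it only illustrates the idea.

```python
import pandas as pd

# Hypothetical stand-in for the Waze data; only the `device` column matters here
sample = pd.DataFrame({'device': ['Android', 'iPhone', 'Android', 'iPhone', 'Android']})

map_dict = {'Android': 2, 'iPhone': 1}
sample['device_type'] = sample['device'].map(map_dict)

# Count rows whose device name was not found in the dictionary
unmapped = int(sample['device_type'].isna().sum())
print(unmapped)
```

A non-zero count would indicate device names that the dictionary does not cover.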

In [4]:
df.groupby('device_type')['drives'].mean()

device_type
1    67.859078
2    66.231838
Name: drives, dtype: float64

Based on these values, iPhone users (label 1) average about 67.9 drives, compared with about 66.2 for Android users (label 2), a gap of roughly 1.6 drives. This gap could be due to random sampling variation rather than a true difference between the two groups, so a hypothesis test is needed to assess whether it is statistically significant.

**Hypotheses:**

\begin{align*}
H_0 &: \text{There is no difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.} \\
H_1 &: \text{There is a difference in average number of drives between drivers who use iPhone devices and drivers who use Androids.}
\end{align*}

In [5]:
# isolate drives column for iPhone users
iPhone = df[df['device_type'] == 1]['drives']

# isolate drives column for Android users
Android = df[df['device_type'] == 2]['drives']

# perform Welch's t-test (equal_var=False does not assume equal group variances)
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)

TtestResult(statistic=np.float64(1.463523206885235), pvalue=np.float64(0.14335197268020597), df=np.float64(11345.066049381952))
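
Because `equal_var=False` is passed, `stats.ttest_ind` runs Welch's t-test, which does not assume equal variances in the two groups. The sketch below computes the statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand, then compares them with SciPy. The samples `a` and `b` are synthetic stand-ins generated with arbitrary parameters, not the Waze data, so the numbers will not match the result above.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in samples (NOT the Waze data); sizes and parameters are arbitrary
rng = np.random.default_rng(0)
a = rng.normal(loc=68, scale=65, size=500)
b = rng.normal(loc=66, scale=64, size=800)

# Estimated variance of each group's sample mean
va = a.var(ddof=1) / a.size
vb = b.var(ddof=1) / b.size

# Welch t statistic: difference in means over its standard error
t_manual = (a.mean() - b.mean()) / np.sqrt(va + vb)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))

# Two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_manual)

# Compare against SciPy's implementation
result = stats.ttest_ind(a=a, b=b, equal_var=False)
print(np.isclose(t_manual, result.statistic), np.isclose(p_manual, result.pvalue))
```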

The p-value from the t-test is approximately 0.14, which is greater than the chosen significance level of 5%, so we fail to reject the null hypothesis. The data do not show a statistically significant difference in the average number of drives between drivers who use iPhones and those who use Android devices.
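
The decision rule used here can be written as a short sketch. The helper `decide` and the constant `ALPHA` are hypothetical names introduced only for illustration; the p-value is the one reported by the t-test above.

```python
ALPHA = 0.05  # significance level of 5% stated in the text


def decide(p_value, alpha=ALPHA):
    """Return the test decision for a given p-value (hypothetical helper)."""
    return 'reject H0' if p_value < alpha else 'fail to reject H0'


p_value = 0.14335197268020597  # p-value reported by the t-test above
print(decide(p_value))  # prints "fail to reject H0"
```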