# **Waze Project - Data exploration and hypothesis testing**


**Problem:** Conduct a two-sample hypothesis test (t-test) to determine whether the mean number of drives differs between iPhone users and Android users.

In [1]:
# Import any relevant packages or libraries
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

In [3]:
# 1. Create `map_dictionary`

map_dictionary = {'iPhone': 1, 'Android': 2}

# 2. Create new `device_type` column
# 3. Map the new column to the dictionary

df['device_type'] = df['device'].map(map_dictionary)
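
One thing worth checking after a dictionary mapping is whether any value was left unmapped, because `Series.map` returns `NaN` for keys missing from the dictionary. The sketch below uses a small hypothetical frame, not the Waze data; the `'Windows Phone'` value is invented purely to trigger the unmapped case.

```python
import pandas as pd

# Hypothetical toy frame; 'Windows Phone' is an invented value, not from the dataset.
toy = pd.DataFrame({'device': ['iPhone', 'Android', 'Windows Phone']})

map_dictionary = {'iPhone': 1, 'Android': 2}
toy['device_type'] = toy['device'].map(map_dictionary)

# Values absent from the dictionary become NaN, so count them explicitly.
unmapped = toy['device_type'].isna().sum()
print(toy)
print('unmapped rows:', unmapped)
```

If the count is zero on the real dataframe, every row belongs to one of the two groups compared later.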

In [4]:
df.groupby('device')['drives'].mean().reset_index().rename(columns={'drives':'mean amount of rides'})

    device  mean amount of rides
0  Android             66.231838
1   iPhone             67.859078


Based on these averages, iPhone users appear to have slightly more drives on average than Android users (67.86 vs. 66.23). However, this difference might reflect random sampling variability rather than a true difference between the two populations. To assess whether the difference is statistically significant, we will conduct a hypothesis test.

$H_0$: There is no difference in the mean number of drives between iPhone users and Android users.

$H_1$: There is a difference in the mean number of drives between iPhone users and Android users.
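
Even when the null hypothesis holds, two sample means will rarely be identical. The sketch below illustrates this with simulated data: both groups are drawn from the same distribution, yet their means differ from draw to draw. The exponential distribution, its mean of 67, and the group sizes are all assumptions chosen for illustration; none of them come from the Waze dataset.

```python
import random
import statistics

random.seed(42)

def sample_mean_difference(n_a=500, n_b=300, population_mean=67.0):
    """Draw two samples from the SAME hypothetical population and
    return the difference between their sample means."""
    a = [random.expovariate(1 / population_mean) for _ in range(n_a)]
    b = [random.expovariate(1 / population_mean) for _ in range(n_b)]
    return statistics.mean(a) - statistics.mean(b)

# Repeat the experiment many times to see how much the difference varies by chance.
diffs = [sample_mean_difference() for _ in range(200)]
print('smallest difference:', min(diffs))
print('largest difference:', max(diffs))
```

The spread of these chance differences is what the t-test weighs the observed difference against.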

In [5]:
# 1. Isolate the `drives` column for iPhone users.
iphone = df[df['device_type']==1]['drives']

# 2. Isolate the `drives` column for Android users.
android = df[df['device_type']==2]['drives']

# 3. Perform the t-test
# `equal_var=False` requests Welch's t-test, which does not assume equal variances
stats.ttest_ind(a=iphone, b=android, equal_var=False)

Ttest_indResult(statistic=1.463523206885235, pvalue=0.143351972680206)
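
To make the `equal_var=False` option less of a black box, here is a minimal sketch that computes Welch's t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand, then compares them with `stats.ttest_ind`. The data are synthetic stand-ins generated with arbitrary parameters, not the Waze dataset, and `welch_t` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (NOT the Waze dataset); sizes and parameters are arbitrary.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=68.0, scale=65.0, size=500)
group_b = rng.normal(loc=66.0, scale=60.0, size=300)

def welch_t(a, b):
    """Return Welch's t statistic, degrees of freedom, and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each sample mean: sample variance / n.
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1)
    )
    # Two-sided p-value from the t distribution's survival function.
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return t_stat, dof, p_value

t_manual, dof_manual, p_manual = welch_t(group_a, group_b)
result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

print('manual:', t_manual, p_manual)
print('scipy: ', result.statistic, result.pvalue)
```

The manual and SciPy values should agree up to floating-point error, which confirms that the test above is Welch's variant rather than the pooled-variance Student's t-test.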

- The p-value (0.143) is greater than conventional significance levels (for example, 5%), so we fail to reject the null hypothesis. The data do not show a statistically significant difference in the mean number of drives between iPhone and Android users.
- The observed mean number of drives is similar for iPhone and Android users (67.86 vs. 66.23).
- Other factors that influence user behaviour should be explored in further analysis.
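
The decision rule behind the first bullet can be sketched as a small function. The name `decide` and the listed significance levels are assumptions for illustration; the notebook itself does not state which level was chosen.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError('p_value must be between 0 and 1')
    return 'reject H0' if p_value < alpha else 'fail to reject H0'

p_value = 0.143351972680206  # p-value reported by the t-test above

# Check the decision at several commonly used significance levels.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, decide(p_value, alpha))
```

Because 0.143 exceeds each of these levels, the conclusion does not depend on which of them is used.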