# **Waze Churn Analysis Project - Hypothesis Testing**

## Research Question
Is there a statistically significant difference in the average number of drives between iPhone and Android users?

## **Imports and Data Loading**




In [1]:
# Import any relevant packages or libraries
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# Load dataset into dataframe
df = pd.read_csv('waze_dataset.csv')

## **Data Exploration**

Use descriptive statistics to get an overview of the data and pointers toward answering the research question.

In [8]:
# Get overview of the data
df.head()

| Unnamed: 0 | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | retained | 283 | 226 | 296.748273 | 2276 | 208 | 0 | 2628.845068 | 1985.775061 | 28 | 19 | Android |
| 1 | 1 | retained | 133 | 107 | 326.896596 | 1225 | 19 | 64 | 13715.92055 | 3160.472914 | 13 | 11 | iPhone |
| 2 | 2 | retained | 114 | 95 | 135.522926 | 2651 | 0 | 0 | 3059.148818 | 1610.735904 | 14 | 8 | Android |
| 3 | 3 | retained | 49 | 40 | 67.589221 | 15 | 322 | 7 | 913.591123 | 587.196542 | 7 | 3 | iPhone |
| 4 | 4 | retained | 84 | 68 | 168.24702 | 1562 | 166 | 5 | 3950.202008 | 1219.555924 | 27 | 18 | Android |


In [10]:
# Show the mean of drives for each device type
print('Mean drives - iPhone:', df[df['device']=='iPhone']['drives'].mean())
print('Mean drives - Android:', df[df['device']=='Android']['drives'].mean())

Mean drives - iPhone: 67.85907775020678
Mean drives - Android: 66.23183780739629


Based on the means shown above, drivers who use iPhone devices to access Waze have a slightly higher number of drives on average than drivers who use Android devices. However, this difference might arise by chance from sampling variability rather than reflect a true difference in the number of drives. To check whether the difference is statistically significant, we'll conduct a **hypothesis test using a two-sample t-test**.

*Note: we use a two-sample t-test because we are comparing samples from two independent populations (Android users vs. iPhone users) and the population standard deviations are unknown.*
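
To make the mechanics of a two-sample t-test concrete, here is a minimal sketch that computes the unequal-variance (Welch) form of the statistic by hand and checks it against `scipy.stats.ttest_ind`. The helper `welch_t` and the two synthetic samples are hypothetical and exist only for illustration; they are not the Waze data.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Variance of each sample mean: sample variance divided by sample size
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof

# Hypothetical samples for illustration only (NOT the Waze data)
rng = np.random.default_rng(0)
group_a = rng.normal(loc=68, scale=65, size=500)
group_b = rng.normal(loc=66, scale=60, size=300)

t_manual, dof = welch_t(group_a, group_b)
# Two-sided p-value from the t distribution's survival function
p_manual = 2 * stats.t.sf(abs(t_manual), dof)

# Same test via SciPy; equal_var=False selects the unequal-variance form
t_scipy, p_scipy = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```

The statistic is the difference in sample means divided by its estimated standard error, so a small difference relative to the spread of the data yields a large p-value.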

## **Hypothesis Testing**

### **1. State Null Hypothesis and Alternative Hypothesis**
$H_0$: There is no difference in the average number of drives between drivers who use iPhone devices and drivers who use Android devices.

$H_A$: There is a difference in the average number of drives between drivers who use iPhone devices and drivers who use Android devices.

### **2. Determine Significance Level**
We'll use the most common significance level, 5% (0.05).


### **3. Calculate p-value**

In [14]:
# Isolate the `drives` column for iPhone users.
iphone_drives = df[df['device']=='iPhone']['drives']

# Isolate the `drives` column for Android users.
android_drives = df[df['device']=='Android']['drives']

# Perform the two-sample t-test; equal_var=False selects Welch's version, which does not assume equal variances
statistic, pvalue = stats.ttest_ind(a=iphone_drives, b=android_drives, equal_var=False)
print('p-value:', pvalue)

p-value: 0.1433519726802059


### **4. Evaluate the Null Hypothesis**
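Based on the result above, the p-value is approximately 0.14, which is larger than the significance level of 0.05 (5%). Therefore, we fail to reject $H_0$. We conclude that the difference in the average number of drives between drivers who use iPhone and drivers who use Android is **not** statistically significant. Note that failing to reject $H_0$ does not prove the two averages are equal; it only means this sample does not provide enough evidence of a difference.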
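
The decision rule used in this step can be sketched as a small helper. The function name `evaluate_null` is hypothetical and not part of the original notebook; the p-value passed in is the one printed by the t-test above.

```python
def evaluate_null(pvalue, alpha=0.05):
    """Return the test decision for a given p-value and significance level."""
    # Reject H0 only when the p-value falls below the chosen significance level
    if pvalue < alpha:
        return "reject H0"
    return "fail to reject H0"

# p-value reported by the t-test above
decision = evaluate_null(0.1433519726802059, alpha=0.05)
print(decision)
```

Fixing `alpha` before looking at the p-value, as done in step 2, keeps the decision from being tuned to the result.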

## **Business Insights**
1. Drivers who use iPhone devices and drivers who use Android devices have a similar number of drives on average; the observed difference is not statistically significant.
2. Potential next step: explore other factors that may influence the variation in the number of drives, and then run additional hypothesis tests.
3. Other factors, such as changes to the Waze app's user interface or to marketing, may provide more data for investigating user churn.