# Churn Analysis - Waze App

## Step 03: Hypothesis Testing

This notebook demonstrates **data exploration, descriptive statistics and hypothesis testing** using Python. The analysis focuses on
comparing user behavior between iPhone and Android users in the Waze navigation app.

### The research question for this project: <br>
**"Do drivers who open the application using an iPhone have the same number of drives on average as drivers who use Android?"**

### Imports & Data Loading

In [16]:
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

from scipy import stats

In [17]:
#Load dataset
df = pd.read_csv('waze_dataset.csv')

In [18]:
df.head(10)

Unnamed: 0,ID,label,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,device
0,0,retained,283,226,296.748273,2276,208,0,2628.845068,1985.775061,28,19,Android
1,1,retained,133,107,326.896596,1225,19,64,13715.92055,3160.472914,13,11,iPhone
2,2,retained,114,95,135.522926,2651,0,0,3059.148818,1610.735904,14,8,Android
3,3,retained,49,40,67.589221,15,322,7,913.591123,587.196542,7,3,iPhone
4,4,retained,84,68,168.24702,1562,166,5,3950.202008,1219.555924,27,18,Android
5,5,retained,113,103,279.544437,2637,0,0,901.238699,439.101397,15,11,iPhone
6,6,retained,3,2,236.725314,360,185,18,5249.172828,726.577205,28,23,iPhone
7,7,retained,39,35,176.072845,2999,0,0,7892.052468,2466.981741,22,20,iPhone
8,8,retained,57,46,183.532018,424,0,26,2651.709764,1594.342984,25,20,Android
9,9,churned,84,68,244.802115,2997,72,0,6043.460295,2341.838528,7,3,iPhone


### Data exploration

#### Label encoding

For the descriptive statistics below, we encode the categorical 'device' variable as an integer.

The following code assigns a '1' for 'iPhone' and a '2' for 'Android' in a new 'device_type' column:

In [19]:
# Create a map dictionary
map_dictionary = {'Android': 2, 'iPhone': 1}

# Create a new 'device_type' column by mapping 'device' through the dictionary
df['device_type'] = df['device'].map(map_dictionary)

In [20]:
df['device_type'].head()

Unnamed: 0,device_type
0,2
1,1
2,2
3,1
4,2
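
A quick sanity check on the encoding could look like the sketch below. The small DataFrame `toy` is made-up stand-in data, not the Waze dataset. The point is that `Series.map` returns `NaN` for any value missing from the dictionary, so it is worth confirming that nothing was left unmapped.

```python
import pandas as pd

# Hypothetical stand-in data (not the Waze dataset)
toy = pd.DataFrame({'device': ['Android', 'iPhone', 'Android', 'iPhone']})

map_dictionary = {'Android': 2, 'iPhone': 1}
toy['device_type'] = toy['device'].map(map_dictionary)

# Values absent from the dictionary would become NaN, so count them
n_unmapped = int(toy['device_type'].isna().sum())
encoded = toy['device_type'].tolist()

print(n_unmapped, encoded)
```

On the real data, the same `isna().sum()` check on `df['device_type']` would reveal any device label other than 'Android' or 'iPhone'.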


##### Relationship between device type & number of drives

In [21]:
df.groupby('device_type')['drives'].mean()

device_type,drives
1,67.859078
2,66.231838


**Conclusion:** <br>
Drivers who use an iPhone have slightly more drives on average (about 67.9) than drivers who use an Android device (about 66.2). <br>
To assess whether the difference is statistically significant, we will conduct a hypothesis test.
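
A difference in means is easier to judge next to each group's size and spread. The sketch below shows one way to get those with `groupby(...).agg(...)`. The DataFrame `toy` and its values are invented for illustration and are not taken from the Waze dataset.

```python
import pandas as pd

# Hypothetical stand-in data (not the Waze dataset)
toy = pd.DataFrame({
    'device_type': [1, 1, 1, 2, 2, 2],
    'drives': [10, 20, 30, 5, 15, 25],
})

# Per-group sample size, mean and sample standard deviation
summary = toy.groupby('device_type')['drives'].agg(['count', 'mean', 'std'])
print(summary)
```

Applied to `df`, the same call would show whether the gap between the two means is small relative to the variation within each group.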

### Hypothesis testing

To conduct a two-sample t-test, we use the following steps:

1. State the null hypothesis and the alternative hypothesis
2. Choose a significance level
3. Find the p-value
4. Reject or fail to reject the null hypothesis
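
The four steps above can be sketched end to end as follows. The samples `group_a` and `group_b` are synthetic (drawn from normal distributions with arbitrary, assumed parameters) and stand in for the two device groups; they are not the Waze data.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic samples (not the Waze data), drawn with a fixed seed
rng = np.random.default_rng(0)
group_a = rng.normal(loc=67.0, scale=65.0, size=500)
group_b = rng.normal(loc=66.0, scale=65.0, size=500)

# Step 1: H0 says the two population means are equal; HA says they differ
# Step 2: choose the significance level
alpha = 0.05

# Step 3: find the p-value with a two-sample t-test
# (equal_var=False selects Welch's test, which does not assume equal variances)
result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)

# Step 4: reject H0 only if the p-value is below alpha
decision = 'reject H0' if result.pvalue < alpha else 'fail to reject H0'
print(f'p-value = {result.pvalue:.3f} -> {decision}')
```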

#### 1. Hypotheses

**Null Hypothesis:** <br>
There is no difference in the average number of drives between drivers who use iPhone devices and drivers who use Android devices.

**Alternative Hypothesis:** <br>
There is a difference in the average number of drives between drivers who use iPhone devices and drivers who use Android devices.

#### 2. Significance level


We choose **5% as the significance level**.

#### 3. Find p-value


In [22]:
# 1. Isolate the 'drives' column for iPhone users.
iPhone = df[df['device_type'] == 1]['drives']

# 2. Isolate the 'drives' column for Android users.
Android = df[df['device_type'] == 2]['drives']

# 3. Perform the t-test (equal_var=False runs Welch's t-test, which does not assume equal variances)
stats.ttest_ind(a=iPhone, b=Android, equal_var=False)

TtestResult(statistic=np.float64(1.463523206885235), pvalue=np.float64(0.143351972680206), df=np.float64(11345.066049381952))
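
To make the three numbers in the result above less of a black box, here is a minimal sketch of how a Welch t-statistic, its degrees of freedom (Welch-Satterthwaite approximation) and the two-sided p-value can be computed by hand. The helper `welch_t` and the two short sample lists are hypothetical, written only for illustration. The sketch compares its output with `stats.ttest_ind`.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Hypothetical helper: Welch's t-statistic, degrees of freedom, two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each sample mean (sample variance / n)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided p-value from the t distribution's survival function
    p = 2 * stats.t.sf(abs(t), dof)
    return t, dof, p

# Made-up samples, not the Waze data
a = [12, 15, 9, 20, 14, 17]
b = [10, 8, 13, 7, 11]

t, dof, p = welch_t(a, b)
ref = stats.ttest_ind(a=a, b=b, equal_var=False)
print(np.isclose(t, ref.statistic), np.isclose(p, ref.pvalue))
```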

#### 4. Reject or fail to reject the null hypothesis

p-value (=0.143) **>** significance level (=0.05)

-> Fail to reject the null hypothesis

### Conclusion

There is **not** a statistically significant difference in the average number of drives between drivers who use iPhones and drivers who use Android devices (p = 0.143 at the 5% significance level).