# **Waze Project**
**Course 4 - The Power of Statistics**

# **Course 4 End-of-course project: Data exploration and hypothesis testing**

**The goal** is to apply descriptive statistics and hypothesis testing in Python to analyze the relationship between the mean number of drives and device type.
<br/>

*This activity has three parts:*

**Part 1:** Imports and data loading

**Part 2:** Conduct hypothesis testing

**Part 3:** Communicate insights with stakeholders

<br/>


# **Data exploration and hypothesis testing**

### Imports and data loading




In [2]:
import pandas as pd
from scipy import stats


In [3]:
df = pd.read_csv('dataset_encoded_features.csv')

### Data exploration

In [4]:
df.head()

Unnamed: 0.2,Unnamed: 0.1,Unnamed: 0,ID,sessions,drives,total_sessions,n_days_after_onboarding,total_navigations_fav1,total_navigations_fav2,driven_km_drives,duration_minutes_drives,activity_days,driving_days,km_per_driving_day,drives_per_driving_day,km_per_drive,sessions_per_activity_day,is_churned,device_code
0,0,0,0,283,226,296.748273,2276,208,0,2628.845068,1985.775061,28,19,138.360267,11.894737,11.632058,10.107143,0,2
1,1,1,1,133,107,326.896596,1225,19,64,13715.92055,3160.472914,13,11,1246.901868,9.727273,128.186173,10.230769,0,1
2,2,2,2,114,95,135.522926,2651,0,0,3059.148818,1610.735904,14,8,382.393602,11.875,32.201567,8.142857,0,2
3,3,3,3,49,40,67.589221,15,322,7,913.591123,587.196542,7,3,304.530374,13.333333,22.839778,7.0,0,1
4,4,4,4,84,68,168.24702,1562,166,5,3950.202008,1219.555924,27,18,219.455667,3.777778,58.091206,3.111111,0,2


We are interested in the relationship between device type and the number of drives. One approach is to look at the average number of drives for each device type.

In [7]:
device_to_code = {"iPhone": 1, "Android": 2}

In [6]:
df.groupby("device_code")["drives"].mean()

device_code
1    80.943270
2    78.183946
Name: drives, dtype: float64

Based on the averages shown, it appears that drivers who use an iPhone (code 1) device to interact with the application have a higher number of drives on average. However, this difference might arise from random sampling, rather than being a true difference in the number of drives. To assess whether the difference is statistically significant, we can conduct a hypothesis test.
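
Group means alone do not show how many users fall in each group or how spread out the values are, and both matter for the test. Here is a minimal sketch that summarizes each group with its count, mean, and sample standard deviation. The `toy` DataFrame and its values are made up for illustration and are not taken from the Waze dataset.

```python
import pandas as pd

# Hypothetical toy data; the notebook itself uses `df` loaded from the CSV.
toy = pd.DataFrame({
    "device_code": [1, 1, 1, 2, 2, 2],
    "drives": [90, 70, 80, 60, 100, 71],
})

# Count, mean, and sample standard deviation of drives per device code.
summary = toy.groupby("device_code")["drives"].agg(["count", "mean", "std"])
print(summary)
```

Running the same `agg` call on `df` should give the per-device sample sizes and spreads that the t-test relies on.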


### Hypothesis testing

1.   State the null hypothesis and the alternative hypothesis
2.   Choose a significance level
3.   Find the p-value
4.   Reject or fail to reject the null hypothesis

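**Note:** This is a t-test for two independent samples. It is the appropriate test because the two groups are independent (Android users vs. iPhone users). The call below passes `equal_var=False`, so SciPy runs Welch's version of the test, which does not assume that the two groups have equal variances.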

Recall the difference between the null hypothesis ($H_0$) and the alternative hypothesis ($H_A$).

**Question:** What are your hypotheses for this data project?

> $H_0$: the mean number of drives is the same for iPhone and Android users

> $H_A$: the mean number of drives differs between iPhone and Android users

Next, choose 5% as the significance level and perform a two-sample t-test.

In [8]:
# 1. Isolate the `drives` column for iPhone users.
iphone_drives = df[df["device_code"] == device_to_code["iPhone"]]["drives"]

# 2. Isolate the `drives` column for Android users.
android_drives = df[df["device_code"] == device_to_code["Android"]]["drives"]

# 3. Perform the t-test
stats.ttest_ind(a=iphone_drives, b=android_drives, equal_var=False)

TtestResult(statistic=2.1178691333186728, pvalue=0.034215567109655586, df=8365.185886553507)
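
To see what `equal_var=False` computes, the sketch below reproduces Welch's t statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand, then compares them with `stats.ttest_ind`. The helper `welch_t` and the samples `group_a` and `group_b` are hypothetical and randomly generated. They are not the Waze data.

```python
import numpy as np
from scipy import stats

# Hypothetical samples for illustration only (not the Waze data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=81, scale=65, size=500)
group_b = rng.normal(loc=78, scale=60, size=300)


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each group mean (sample variance / n).
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return t, dof


t_manual, dof_manual = welch_t(group_a, group_b)
# Two-sided p-value from the t distribution's survival function.
p_manual = 2 * stats.t.sf(abs(t_manual), dof_manual)

result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)
print(t_manual, p_manual, dof_manual)
print(result.statistic, result.pvalue)
```

The manual values should agree with SciPy's up to floating-point error. The non-integer `df` in the notebook output above is produced by the same Welch-Satterthwaite approximation.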

**Question:** Based on the p-value you got above, do you reject or fail to reject the null hypothesis?

> The p-value is about 0.034 (3.4%), which is less than the 5% significance level, so we **reject** the null hypothesis. The difference in the mean number of drives between Android and iPhone users is statistically significant.
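
The decision rule can be written out explicitly. This is a minimal sketch in which `decide` is a hypothetical helper, not part of SciPy. The p-value is copied from the test output above, and `alpha` is the 5% significance level chosen earlier.

```python
alpha = 0.05                     # significance level chosen for this test
p_value = 0.034215567109655586   # p-value copied from the t-test output


def decide(p_value, alpha):
    """Hypothetical helper: apply the standard p-value decision rule."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


decision = decide(p_value, alpha)
print(decision)
```

Fixing `alpha` before looking at the p-value keeps the decision from depending on the result.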

### Communicate insights with stakeholders

Now that you've completed your hypothesis test, the next step is to share your findings with the Waze leadership team. Consider the following question as you prepare to write your executive summary:

* What business insight(s) can you draw from the result of your hypothesis test?

>
> * On average, Android and iPhone users have a similar number of drives (Android: \~78 drives last month, iPhone: \~81).
> * The hypothesis test found a statistically significant difference in the mean number of drives between the two groups (p ≈ 0.034 at the 5% significance level), so it is worth investigating how the OS used may influence the number of drives.

