## Waze Churn Analysis Project

This notebook contains the initial steps in analyzing the Waze app user data to prepare for a churn prediction model. The goal is to understand user behavior and identify features that contribute to user churn.

In [1]:
import pandas as pd
import numpy as np

In [3]:
df = pd.read_csv("waze_dataset.csv")
df.head()

| | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | retained | 283 | 226 | 296.748273 | 2276 | 208 | 0 | 2628.845068 | 1985.775061 | 28 | 19 | Android |
| 1 | 1 | retained | 133 | 107 | 326.896596 | 1225 | 19 | 64 | 13715.92055 | 3160.472914 | 13 | 11 | iPhone |
| 2 | 2 | retained | 114 | 95 | 135.522926 | 2651 | 0 | 0 | 3059.148818 | 1610.735904 | 14 | 8 | Android |
| 3 | 3 | retained | 49 | 40 | 67.589221 | 15 | 322 | 7 | 913.591123 | 587.196542 | 7 | 3 | iPhone |
| 4 | 4 | retained | 84 | 68 | 168.24702 | 1562 | 166 | 5 | 3950.202008 | 1219.555924 | 27 | 18 | Android |


### Next Steps:
I’ll now:

- Summarize the data types of each column.
- Gather descriptive statistics to support EDA.
- Draft a brief executive summary for both technical and non-technical stakeholders.

In [4]:
# Note: this is an alias, not a copy; waze and df refer to the same DataFrame.
waze = df

In [7]:
waze.dtypes

ID                           int64
label                       object
sessions                     int64
drives                       int64
total_sessions             float64
n_days_after_onboarding      int64
total_navigations_fav1       int64
total_navigations_fav2       int64
driven_km_drives           float64
duration_minutes_drives    float64
activity_days                int64
driving_days                 int64
device                      object
dtype: object

### Observations:
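The target column is `label`, a categorical variable with two values: `retained` and `churned`.

Ten of the eleven feature columns are numeric (`int64` or `float64`) and can be passed to most machine learning models directly. `ID` is an identifier rather than a feature.

`device` is a categorical variable (e.g., "Android", "iPhone") and needs encoding before modeling.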
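The split between target, numeric features, and categorical features can be made explicit in code. The following is a minimal sketch on a hypothetical three-row sample (the `sample` frame and its values are illustrative, not the real dataset); it assumes `ID` should be excluded from the feature set.

```python
import pandas as pd

# Hypothetical sample that mirrors a few columns of the dataset.
sample = pd.DataFrame(
    {
        "ID": [0, 1, 2],
        "label": ["retained", "churned", None],
        "sessions": [283, 133, 114],
        "driven_km_drives": [2628.8, 13715.9, 3059.1],
        "device": ["Android", "iPhone", "Android"],
    }
)

target = "label"

# Numeric feature candidates; ID is an identifier, not a feature.
numeric_cols = sample.select_dtypes(include="number").columns.drop("ID").tolist()

# Non-numeric feature candidates, excluding the target column.
categorical_cols = [
    col for col in sample.select_dtypes(exclude="number").columns if col != target
]

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

The same two `select_dtypes` calls should apply unchanged to `waze`, where the categorical list would be expected to contain only `device`.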

Next, I’ll compute descriptive statistics to assist in exploratory data analysis.

In [8]:
waze_summary = waze.describe(include='all')
waze_summary

| | ID | label | sessions | drives | total_sessions | n_days_after_onboarding | total_navigations_fav1 | total_navigations_fav2 | driven_km_drives | duration_minutes_drives | activity_days | driving_days | device |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 14999.0 | 14299 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999.0 | 14999 |
| unique | | 2 | | | | | | | | | | | 2 |
| top | | retained | | | | | | | | | | | iPhone |
| freq | | 11763 | | | | | | | | | | | 9672 |
| mean | 7499.0 | | 80.633776 | 67.281152 | 189.964447 | 1749.837789 | 121.605974 | 29.672512 | 4039.340921 | 1860.976012 | 15.537102 | 12.179879 | |
| std | 4329.982679 | | 80.699065 | 65.913872 | 136.405128 | 1008.513876 | 148.121544 | 45.394651 | 2502.149334 | 1446.702288 | 9.004655 | 7.824036 | |
| min | 0.0 | | 0.0 | 0.0 | 0.220211 | 4.0 | 0.0 | 0.0 | 60.44125 | 18.282082 | 0.0 | 0.0 | |
| 25% | 3749.5 | | 23.0 | 20.0 | 90.661156 | 878.0 | 9.0 | 0.0 | 2212.600607 | 835.99626 | 8.0 | 5.0 | |
| 50% | 7499.0 | | 56.0 | 48.0 | 159.568115 | 1741.0 | 71.0 | 9.0 | 3493.858085 | 1478.249859 | 16.0 | 12.0 | |
| 75% | 11248.5 | | 112.0 | 93.0 | 254.192341 | 2623.5 | 178.0 | 43.0 | 5289.861262 | 2464.362632 | 23.0 | 19.0 | |


#### Early Observations:
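`sessions`, `drives`, `activity_days`, and `driving_days` all have a minimum of 0, so some users recorded no activity during the month. These users are likely candidates for churn.

`label` has 14,299 non-null values against 14,999 rows, so 700 users have no target value and must be handled before modeling.

Among labeled users, 11,763 of 14,299 (about 82%) are retained, so the two classes are imbalanced.

High values in `driven_km_drives` and `duration_minutes_drives` indicate heavy usage, which may reflect loyal users; this should be checked against `label`.

`total_navigations_fav1` and `total_navigations_fav2` are widely dispersed (their standard deviations, about 148 and 45, exceed their means, about 122 and 30), so they could be key behavioral signals for retention.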
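The observations above are hypotheses until they are checked against `label`. The following is a minimal sketch of such a check, run on a hypothetical six-row sample (the `sample` frame, the `zero_activity` and `active` group names, and all values are illustrative assumptions, not results from the dataset).

```python
import pandas as pd

# Hypothetical sample; values are illustrative, not taken from the dataset.
sample = pd.DataFrame(
    {
        "label": ["retained", "churned", "retained", None, "churned", "retained"],
        "sessions": [120, 0, 45, 10, 3, 0],
        "drives": [100, 0, 40, 8, 2, 0],
        "activity_days": [25, 0, 14, 5, 1, 2],
    }
)

# Rows without a target value cannot be used for supervised training.
n_missing_label = int(sample["label"].isna().sum())

# Flag users with zero sessions, zero drives, or zero active days.
inactive = (sample[["sessions", "drives", "activity_days"]] == 0).any(axis=1)
group = inactive.map({True: "zero_activity", False: "active"})

# Churn rate among labeled users, split by the inactivity flag.
labeled = sample["label"].notna()
is_churned = sample.loc[labeled, "label"].eq("churned")
churn_rate = is_churned.groupby(group[labeled]).mean()

print("missing labels:", n_missing_label)
print(churn_rate)
```

Applied to `waze`, a clearly higher churn rate in the `zero_activity` group would support the first observation; a similar rate would suggest that zero activity alone is a weak signal.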

### 🧾 Column Descriptions

| Column Name               | Type   | Description                                                                 |
|---------------------------|--------|-----------------------------------------------------------------------------|
| `ID`                      | int    | A sequential numbered index (unique identifier for each user)              |
| `label`                   | object | **Target variable** – whether the user was _retained_ or _churned_         |
| `sessions`                | int    | Number of times a user opened the app during the month                     |
| `drives`                  | int    | Number of times a user drove at least 1 km during the month                |
| `device`                  | object | Type of device used to start a session (e.g., Android, iPhone)             |
| `total_sessions`          | float  | Estimated total number of sessions since the user onboarded                |
| `n_days_after_onboarding` | int    | Number of days since the user signed up for the app                        |
| `total_navigations_fav1` | int    | Total navigations to the user’s favorite place 1 since onboarding          |
| `total_navigations_fav2` | int    | Total navigations to the user’s favorite place 2 since onboarding          |
| `driven_km_drives`        | float  | Total kilometers driven by the user during the month                       |
| `duration_minutes_drives`| float  | Total duration (in minutes) the user drove during the month                |
| `activity_days`           | int    | Number of days the user opened the app during the month                    |
| `driving_days`            | int    | Number of days the user drove at least 1 km during the month               |
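A description table like this can double as a lightweight schema check when the data is reloaded. The sketch below is one possible approach, not part of the original notebook: `EXPECTED_KINDS`, `KIND_CHECKS`, and `check_schema` are hypothetical names, and the `object` kind is treated as "any non-numeric dtype" so the check does not depend on how a given pandas version stores strings. The single sample row is the first row of the `df.head()` output.

```python
import pandas as pd
from pandas.api import types as ptypes

# Expected kind per column, transcribed from the description table.
EXPECTED_KINDS = {
    "ID": "int", "label": "object", "sessions": "int", "drives": "int",
    "device": "object", "total_sessions": "float",
    "n_days_after_onboarding": "int", "total_navigations_fav1": "int",
    "total_navigations_fav2": "int", "driven_km_drives": "float",
    "duration_minutes_drives": "float", "activity_days": "int",
    "driving_days": "int",
}

# Assumption: "object" means any non-numeric dtype.
KIND_CHECKS = {
    "int": ptypes.is_integer_dtype,
    "float": ptypes.is_float_dtype,
    "object": lambda dtype: not ptypes.is_numeric_dtype(dtype),
}


def check_schema(frame, expected=EXPECTED_KINDS):
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for col, kind in expected.items():
        if col not in frame.columns:
            problems.append(f"missing column: {col}")
        elif not KIND_CHECKS[kind](frame[col].dtype):
            problems.append(f"{col}: expected {kind}, found {frame[col].dtype}")
    return problems


# First row of the df.head() output, used as a one-row sample.
sample = pd.DataFrame([{
    "ID": 0, "label": "retained", "sessions": 283, "drives": 226,
    "total_sessions": 296.748273, "n_days_after_onboarding": 2276,
    "total_navigations_fav1": 208, "total_navigations_fav2": 0,
    "driven_km_drives": 2628.845068, "duration_minutes_drives": 1985.775061,
    "activity_days": 28, "driving_days": 19, "device": "Android",
}])

# A deliberately broken variant: one column dropped, one dtype changed.
broken = sample.drop(columns="device").astype({"sessions": "float64"})

print(check_schema(sample))
print(check_schema(broken))
```

Calling `check_schema(waze)` right after loading would surface renamed columns or dtype drift before any analysis depends on them.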


### Next Step:
I’ll draft a brief executive summary for both technical and cross-functional team members.