# Churn Rate Analysis

## Assumptions of the data
- Every row (tuple) references one customer
- `account_length` is in months
- Charges are per year
- Customers without a voicemail plan cannot have any voicemail messages
- Customers without an international plan can still make international calls (at a cost)

## Import BigQuery into VS Code

In [2]:
# To use BigQuery in VS Code, import the package and create the client.
from google.cloud import bigquery
from IPython.display import display

client = bigquery.Client()

# query_and_display runs a SQL string and displays the result as a DataFrame.
# Each query is passed to the function as a triple-quoted multiline string,
# e.g. query_and_display("""SELECT ...""").

def query_and_display(sql):
    return display(client.query(sql).to_dataframe())

## Churn rate overview
The churn rate is the annual percentage of customers who cancel their service, calculated as the number of customers who churned divided by the total number of customers, multiplied by 100.
First, the churn rate and total_customers (sample size) are calculated. This establishes the baseline.
Then basic analysis (mean, median, mode, standard deviation) is done to get a better understanding of the key data points.
Finally, segmentation and analysis of the groups is conducted, along with location-specific analysis.



In [45]:
query_and_display("""
    SELECT 
        COUNT(DISTINCT unique_id) AS total_customers,
        SUM(churn_binary) AS churned_customers
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
"""
)

Unnamed: 0,total_customers,churned_customers
0,4250,598


In [49]:
query_and_display("""
    SELECT 
        SUM(churn_binary)/COUNT(area_code)*100 AS churn_rate
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
"""
)

Unnamed: 0,churn_rate
0,14.070588
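
The formula can be checked by hand against the two counts reported earlier (4250 customers, 598 churned). The snippet below is a minimal sketch of that arithmetic; the function name `churn_rate` is only illustrative and is not part of the notebook.

```python
def churn_rate(churned: int, total: int) -> float:
    """Return the churn rate as a percentage of all customers."""
    if total <= 0:
        raise ValueError("total must be positive")
    return churned / total * 100


# Counts copied from the first query output in this notebook.
total_customers = 4250
churned_customers = 598

rate = churn_rate(churned_customers, total_customers)
print(f"{rate:.6f}")  # 14.070588
```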


## Mean, Median, Mode and Standard Deviation
For basic analysis, the mean, median, mode and standard deviation of several variables are compared between churned and active accounts.
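
As a rough local equivalent of the grouped SQL queries in this section, the sketch below computes the same four statistics per churn group with Python's `statistics` module. The sample rows are hypothetical and are not taken from the dataset. Two differences are worth keeping in mind: `statistics.median` is exact and averages the two middle values for an even-sized group, whereas `APPROX_QUANTILES` returns an approximate value; and `statistics.stdev` is the sample standard deviation, which is what BigQuery's `STDDEV` (an alias of `STDDEV_SAMP`) computes.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (churn flag, number of customer service calls).
rows = [
    (False, 0), (False, 1), (False, 1), (False, 2), (False, 3),
    (True, 1), (True, 1), (True, 4), (True, 5),
]

# Collect the values for each churn group.
groups = defaultdict(list)
for churn, value in rows:
    groups[churn].append(value)

# Compute the four summary statistics per group.
summary = {}
for churn, values in groups.items():
    summary[churn] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),  # exact median
        "mode": statistics.mode(values),      # most common value
        "stddev": statistics.stdev(values),   # sample standard deviation
    }

for churn, stats in summary.items():
    print(churn, stats)
```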

### Number of Voicemail messages
Higher mean number of voicemail messages for active accounts
Median is 0 for both churned and active accounts


In [68]:
query_and_display("""
    SELECT 
        churn,
        AVG(number_vmail_messages) AS mean,
        APPROX_QUANTILES(number_vmail_messages,100)[OFFSET(50)] AS median,
        APPROX_TOP_COUNT(number_vmail_messages,1) AS mode,
        STDDEV(number_vmail_messages) AS stddev
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,mean,median,mode,stddev
0,False,8.177437,0,"[{'value': 0, 'count': 2623}]",13.706304
1,True,4.299331,0,"[{'value': 0, 'count': 516}]",11.124649


### Total International calls
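Comparable mean number of total international calls, very slightly higher for active accounts
Median is 4 for both churned and active accounts
Note: the stddev column in this query is computed on total_intl_charge, not total_intl_calls, so it describes the spread of charges rather than calls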

In [69]:
query_and_display("""
    SELECT 
        churn,
        AVG(total_intl_calls) AS mean, 
        APPROX_QUANTILES(total_intl_calls,100)[OFFSET(50)] AS median,
        APPROX_TOP_COUNT(total_intl_calls,1) AS mode,
        STDDEV(total_intl_charge) AS stddev
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,mean,median,mode,stddev
0,False,4.46057,4,"[{'value': 3, 'count': 743}]",0.742466
1,True,4.217391,4,"[{'value': 2, 'count': 124}]",0.754413


### Number of Customer service calls
Higher mean number of customer service calls for churned accounts
Median is 1 call higher for churned accounts

In [70]:
query_and_display("""
    SELECT 
        churn,
        AVG(number_customer_service_calls) AS mean, 
        APPROX_QUANTILES(number_customer_service_calls,100)[OFFSET(50)] AS median,
        APPROX_TOP_COUNT(number_customer_service_calls,1) AS mode,
        STDDEV(number_customer_service_calls) AS stddev
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,mean,median,mode,stddev
0,False,1.441676,1,"[{'value': 1, 'count': 1358}]",1.165159
1,True,2.27592,2,"[{'value': 1, 'count': 166}]",1.827334


### Account length
Slightly higher mean account length for churned accounts
Median is 2 months higher for churned accounts

In [71]:
query_and_display("""
    SELECT 
        churn,
        AVG(account_length) AS mean,
        APPROX_QUANTILES(account_length,100)[OFFSET(50)] AS median,
        APPROX_TOP_COUNT(account_length,1) AS mode,
        STDDEV(account_length) AS stddev
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,mean,median,mode,stddev
0,False,99.924973,99,"[{'value': 104, 'count': 332}]",39.748743
1,True,102.137124,101,"[{'value': 65, 'count': 55}]",39.369162


### Total charges
Higher mean total charges for churned accounts
Median is $8.15 higher for churned accounts

In [72]:
query_and_display("""
    SELECT
        churn,
        AVG(total_charges) AS mean,
        APPROX_QUANTILES(total_charges,100)[OFFSET(50)] AS median,
        APPROX_TOP_COUNT(total_charges,1) AS mode,
        STDDEV(total_charges) AS stddev
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,mean,median,mode,stddev
0,False,58.45784,59.06,"[{'value': 43.26, 'count': 162}]",9.274097
1,True,65.532174,67.21,"[{'value': 77.0, 'count': 28}]",13.882922


## Segmenting
Segmenting the customers into different cohorts can make it easier to identify the type of customers at risk of churn.
The following variables are segmented: plan type (voicemail and international), account length, YRR and number of customer service calls.
A churn analysis was run to compare the segments within each variable.
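
The sketch below illustrates the idea of cohorting and then computing a churn rate per cohort. How `account_length_group_years` was actually derived is not shown in this notebook, so the bucketing rule here (half-open 4-year ranges built from whole years) and the function name `account_length_group` are assumptions, and the sample accounts are hypothetical.

```python
from collections import Counter


def account_length_group(months: int, width_years: int = 4) -> str:
    """Map an account length in months to a cohort label such as '8-12'.

    Assumption: cohorts are half-open [lower, upper) ranges in whole years.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    lower = (months // 12) // width_years * width_years
    return f"{lower}-{lower + width_years}"


# Hypothetical accounts: (account_length in months, churned?).
accounts = [(30, False), (100, True), (100, False), (110, False), (200, True)]

totals, churned = Counter(), Counter()
for months, churn in accounts:
    group = account_length_group(months)
    totals[group] += 1
    churned[group] += int(churn)

# Churn rate per cohort, as a percentage rounded to 2 decimals.
rates = {g: round(churned[g] / totals[g] * 100, 2) for g in totals}
print(rates)
```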

### Account length
The cohorts with the most churned accounts are 8-12 and 4-8 years; they also have the most active accounts, so this is to be expected. As a percentage, the longest-tenure cohort (16-20 years) has a higher churn rate than the other groups, although it contains only 51 accounts. The 0-4 and 4-8 year cohorts show the lowest churn rates.

In [9]:
query_and_display("""
    SELECT 
        SUM(churn_binary) AS churn_count,
        COUNT(
            CASE 
                WHEN NOT churn THEN 1 
            END) AS active_count,
        account_length_group_years
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY account_length_group_years
    ORDER BY churn_count DESC
""")

Unnamed: 0,churn_count,active_count,account_length_group_years
0,250,1442,8-12
1,215,1370,4-8
2,72,436,12-16
3,50,364,0-4
4,11,40,16-20


In [6]:
query_and_display("""
    SELECT
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate,
        account_length_group_years
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY account_length_group_years
    ORDER BY churn_rate DESC
""")

Unnamed: 0,churn_rate,account_length_group_years
0,21.57,16-20
1,14.78,8-12
2,14.17,12-16
3,13.56,4-8
4,12.08,0-4
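
The rates above can be cross-checked from the churn and active counts in the previous output, since each cohort's rate is churn_count / (churn_count + active_count) * 100. This is only a sanity-check sketch, using the cohort labels from the counts output.

```python
# (churn_count, active_count) per cohort, copied from the counts output.
counts = {
    "8-12": (250, 1442),
    "4-8": (215, 1370),
    "12-16": (72, 436),
    "0-4": (50, 364),
    "16-20": (11, 40),
}

# Churn rate per cohort, rounded the same way as the SQL query.
rates = {
    group: round(churn / (churn + active) * 100, 2)
    for group, (churn, active) in counts.items()
}

for group, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(group, rate)
```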


### YRR
The cohorts with the most churned accounts are $60.00 - $79.99 and $40.00 - $59.99; they also have the most active accounts, so this is to be expected. As a percentage, the $80.00 - $99.99 cohort has a much higher churn rate than the other groups (75.93%, from 108 accounts).

In [4]:
query_and_display("""
    SELECT 
        SUM(churn_binary) AS churn_count,
        COUNT(
            CASE 
                WHEN NOT churn THEN 1 
            END) AS active_count,
        yrr_group
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY yrr_group
    ORDER BY churn_count DESC
""")

Unnamed: 0,churn_count,active_count,yrr_group
0,274,1664,$60.00 - $79.99
1,226,1844,$40.00 - $59.99
2,82,26,$80.00 - $99.99
3,16,118,$20.00 - $39.99


In [5]:
query_and_display("""
    SELECT 
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate,
        yrr_group
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY yrr_group
    ORDER BY churn_rate DESC
""")

Unnamed: 0,churn_rate,yrr_group
0,75.93,$80.00 - $99.99
1,14.14,$60.00 - $79.99
2,11.94,$20.00 - $39.99
3,10.92,$40.00 - $59.99


### Voicemail plan
Number of accounts with voicemail messages (and therefore a voicemail plan), grouped by churn.


In [90]:
query_and_display("""
    SELECT 
        churn,
        COUNTIF(number_vmail_messages > 0) AS voicemail_count
    FROM 
        `nomadic-ocean-395807.churn_rate.customer_data`
    WHERE voice_mail_plan
    GROUP BY churn
    ORDER BY voicemail_count
""")

Unnamed: 0,churn,voicemail_count
0,True,82
1,False,1029


#### Churn rate - Voicemail plan
Churn rate comparison between accounts with and without a voicemail plan.
This can help identify whether having a voicemail plan is associated with a different churn rate.

In [91]:
query_and_display("""
    SELECT 
        voice_mail_plan,
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY voice_mail_plan
""")

Unnamed: 0,voice_mail_plan,churn_rate
0,False,16.44
1,True,7.37


### International plan
Number of accounts with an international plan, grouped by churned and active accounts.

In [92]:
query_and_display("""
    SELECT 
        churn,
        COUNT(
            CASE 
                WHEN NOT churn THEN 1 
            END) AS active_count,
        COUNT(
            CASE 
                WHEN churn THEN 1 
            END) AS churn_count,
        COUNT(
            CASE 
                WHEN international_plan THEN 1 
            END) AS international_plan
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY churn
""")

Unnamed: 0,churn,active_count,churn_count,international_plan
0,False,3652,0,229
1,True,0,598,167


#### Churn rate - International plan
Churn rate comparison between accounts with and without an international plan.
This can help identify whether having an international plan is associated with a higher churn rate.

In [93]:
query_and_display("""
    SELECT
        international_plan,
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY international_plan
""")

Unnamed: 0,international_plan,churn_rate
0,False,11.18
1,True,42.17
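
These two rates can be reproduced from the counts in the previous International plan output (3652 active accounts of which 229 have the plan, 598 churned accounts of which 167 have the plan). The sketch below only restates that arithmetic; the variable names are illustrative.

```python
# Counts copied from the International plan output grouped by churn.
active_total, active_with_plan = 3652, 229
churned_total, churned_with_plan = 598, 167

# Accounts with and without the international plan.
with_plan = active_with_plan + churned_with_plan
without_plan = (active_total + churned_total) - with_plan

# Churn rate within each group, as a percentage.
rate_with_plan = round(churned_with_plan / with_plan * 100, 2)
rate_without_plan = round(
    (churned_total - churned_with_plan) / without_plan * 100, 2
)

print(with_plan, rate_with_plan)        # 396 42.17
print(without_plan, rate_without_plan)  # 3854 11.18
```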


#### International plan costs and churn rate
Comparison of total international charges, calls and minutes, grouped by churn and international plan.

In [94]:
query_and_display("""
    SELECT
        churn,
        international_plan,
        SUM(total_intl_charge) AS intl_charge, 
        SUM(total_intl_calls) AS intl_calls, 
        SUM(total_intl_minutes) AS intl_minutes
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY 
        churn, 
        international_plan
""")

Unnamed: 0,churn,international_plan,intl_charge,intl_calls,intl_minutes
0,False,False,9449.06,15164,34990.0
1,False,True,604.96,1126,2240.1
2,True,False,1203.28,1874,4455.7
3,True,True,513.73,648,1902.5
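
Because the four groups differ greatly in size, the raw sums are easier to compare once normalised. As a sketch, dividing each group's total charge by its total minutes gives an average charge per minute; with the figures above this comes out at roughly 0.27 in every group, which suggests the per-minute charge in this dataset does not depend on the international plan. The dictionary names are illustrative.

```python
# (churn, international_plan, intl_charge, intl_calls, intl_minutes),
# copied from the output above.
rows = [
    (False, False, 9449.06, 15164, 34990.0),
    (False, True, 604.96, 1126, 2240.1),
    (True, False, 1203.28, 1874, 4455.7),
    (True, True, 513.73, 648, 1902.5),
]

# Average charge per minute and minutes per call for each group.
charge_per_minute = {}
minutes_per_call = {}
for churn, plan, charge, calls, minutes in rows:
    charge_per_minute[(churn, plan)] = charge / minutes
    minutes_per_call[(churn, plan)] = minutes / calls

for key in charge_per_minute:
    print(key, round(charge_per_minute[key], 3), round(minutes_per_call[key], 2))
```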


In [95]:
query_and_display("""
    SELECT 
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate,
        state
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY state
    ORDER BY churn_rate DESC
""")

Unnamed: 0,churn_rate,state
0,27.08,NJ
1,25.64,CA
2,22.5,WA
3,22.09,MD
4,21.25,MT
5,20.51,OK
6,20.48,NV
7,19.44,SC
8,19.39,TX
9,18.29,MS



### Number of Customer service calls
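There is a significant increase in churn rate once an account has made 4 or more customer service calls: the rate is around 11% for 0 to 3 calls, 44% at 4 calls, and 50% or higher beyond that. The groups with 7 or more calls are very small, so their rates should be read with caution.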

In [97]:
query_and_display("""
    SELECT 
        number_customer_service_calls,
        COUNT (number_customer_service_calls) AS call_count,
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY number_customer_service_calls 
    ORDER BY churn_rate DESC
""")

Unnamed: 0,number_customer_service_calls,call_count,churn_rate
0,9,2,100.0
1,6,28,67.86
2,5,81,60.49
3,7,13,53.85
4,8,2,50.0
5,4,209,44.02
6,3,558,11.29
7,0,886,10.95
8,1,1524,10.89
9,2,947,10.77
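
To summarise the threshold, the rows above can be pooled into "fewer than 4 calls" and "4 or more calls". The sketch below reconstructs the churned count in each row from its account count and rounded rate, which is an approximation that happens to add back up to the 598 churned customers. The helper name `pooled_rate` is illustrative.

```python
# (number_customer_service_calls, account count, churn_rate %) from the output.
rows = [
    (9, 2, 100.0), (6, 28, 67.86), (5, 81, 60.49), (7, 13, 53.85),
    (8, 2, 50.0), (4, 209, 44.02), (3, 558, 11.29), (0, 886, 10.95),
    (1, 1524, 10.89), (2, 947, 10.77),
]


def pooled_rate(rows, predicate):
    """Churn rate (%) over all rows whose call count satisfies predicate."""
    accounts = sum(n for calls, n, _ in rows if predicate(calls))
    # Recover each row's churned count from its rounded percentage.
    churned = sum(round(n * rate / 100) for calls, n, rate in rows if predicate(calls))
    return round(churned / accounts * 100, 2)


low = pooled_rate(rows, lambda calls: calls < 4)
high = pooled_rate(rows, lambda calls: calls >= 4)
print(low, high)  # 10.93 50.75
```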


## Location data
### State
Location-specific data is evaluated. This can help individual branches with marketing decisions and tailor the retention process on a state-by-state basis.

In [99]:
query_and_display("""
    SELECT 
        state,
        COUNT(state) AS account_count,
        ROUND((SUM(churn_binary)/COUNT(unique_id)*100),2) AS churn_rate,
        SUM(churn_binary) AS churn_count, 
        SUM(number_vmail_messages) AS vmail_count,
        SUM(total_charges) AS charges_count, 
        SUM(number_customer_service_calls) AS cs_calls_count
    FROM `nomadic-ocean-395807.churn_rate.customer_data`
    GROUP BY state
    ORDER BY churn_rate DESC
""")

Unnamed: 0,state,account_count,churn_rate,churn_count,vmail_count,charges_count,cs_calls_count
0,NJ,96,27.08,26,653,5881.55,152
1,CA,39,25.64,10,229,2220.7,51
2,WA,80,22.5,18,437,4735.04,121
3,MD,86,22.09,19,549,5219.17,141
4,MT,80,21.25,17,666,4690.35,131
5,OK,78,20.51,16,517,4774.07,138
6,NV,83,20.48,17,642,4993.03,140
7,SC,72,19.44,14,606,4183.46,114
8,TX,98,19.39,19,618,5901.71,164
9,MS,82,18.29,15,677,4796.32,134
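
The summed columns (vmail_count, charges_count, cs_calls_count) scale with the number of accounts in each state, so they are hard to compare directly: CA has 39 accounts while TX has 98. One option, sketched below for three of the rows above, is to divide by account_count to get per-account figures. The field names in the result are illustrative.

```python
# (state, account_count, churn_count, charges_count, cs_calls_count),
# copied from three rows of the output above.
rows = [
    ("NJ", 96, 26, 5881.55, 152),
    ("CA", 39, 10, 2220.7, 51),
    ("TX", 98, 19, 5901.71, 164),
]

# Normalise each summed column by the number of accounts in the state.
per_account = {}
for state, accounts, churned, charges, cs_calls in rows:
    per_account[state] = {
        "churn_rate": round(churned / accounts * 100, 2),
        "charges_per_account": round(charges / accounts, 2),
        "cs_calls_per_account": round(cs_calls / accounts, 2),
    }

for state, stats in per_account.items():
    print(state, stats)
```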
