## Inferential Statistics: Telecom Churn Data

### Overview

The purpose of this analysis is to test whether the **churn proportions of monthly and one year plus contracts differ**, and whether that difference is statistically **AND** practically significant enough to justify action on the respective contract types.

In [4]:
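# Assumes math, numpy (np), pandas (pd) and scipy.stats (stats) were imported in an earlier cell
df = pd.read_csv('churn.csv') # Import csv
churn = df[['Churn','Contract']].copy() # Make a copy of only the necessary columns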

In [13]:
conditions = [ # Create list of current values
        df.Contract == 'One year',
        df.Contract == 'Month-to-month',
        df.Contract == 'Two year']
choices = ['One Year Plus', 'Month', 'One Year Plus'] # Create list of new values

In [15]:
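df.Contract = np.select(conditions, choices, default='Unknown') # Map conditions to choices; the string default (a placeholder, never hit here) avoids a dtype clash on newer NumPy
churn = df[['Churn','Contract']].copy() # Make a copy of only the necessary columns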

In [16]:
churn.head()

| | Churn | Contract |
|---|---|---|
| 0 | No | Month |
| 1 | No | One Year Plus |
| 2 | Yes | Month |
| 3 | No | One Year Plus |
| 4 | Yes | Month |


### Sample Proportion Calculation for Churn / No Churn

In [18]:
churn_m = churn[(churn.Contract == 'Month') & (churn.Churn == 'Yes')]['Churn'].count() # Month churn count (single combined mask avoids the reindexing warning)

churn_y = churn[(churn.Contract == 'One Year Plus') & (churn.Churn == 'Yes')]['Churn'].count() # One Year Plus churn count

In [20]:
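n = len(churn) # total number of customers (all contract types)

In [21]:
prop_m = churn_m/n # month-contract churners as a share of all n customers (not of month-contract customers only)

prop_y = churn_y/n # one-year-plus churners as a share of all n customers (not of one-year-plus customers only)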

print("The sample proportion of churn for month contracts is "+str(round(prop_m,3)))
print("The sample proportion of churn for one year plus contracts is "+str(round(prop_y,3)))

The sample proportion of churn for month contracts is 0.235
The sample proportion of churn for one year plus contracts is 0.03
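
The cell above divides both churn counts by the total number of customers `n`. If the quantity of interest is instead the churn rate *within* each contract type, each count would be divided by the size of its own group. The sketch below illustrates that alternative in plain Python on made-up records (the `records` list and the helper `churn_rate_by_contract` are hypothetical and not part of the original notebook); it does not reproduce the figures printed above.

```python
from collections import Counter

# Made-up (contract, churn) records for illustration only; NOT the churn.csv data.
records = [
    ("Month", "Yes"), ("Month", "No"), ("Month", "Yes"), ("Month", "No"),
    ("One Year Plus", "No"), ("One Year Plus", "No"),
    ("One Year Plus", "Yes"), ("One Year Plus", "No"),
]

def churn_rate_by_contract(rows):
    """Return {contract: churned customers / all customers on that contract}."""
    totals = Counter(contract for contract, _ in rows)
    churned = Counter(contract for contract, churn in rows if churn == "Yes")
    # Each group is divided by its own size, not by the overall row count.
    return {contract: churned[contract] / totals[contract] for contract in totals}

rates = churn_rate_by_contract(records)
print(rates)
```

With per-group denominators, the standard error and test statistic later in the analysis would also use each group's own sample size rather than a shared `n`.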


### Central Limit Theorem Conditions

**Random Condition:** Each customer is randomly obtained and recorded, so our sample **meets** the random condition of the Central Limit Theorem.

**Normal Condition:** Both sample proportions of month and one year plus churn, when multiplied by the sample size, give **more than 10** expected churners. The condition also requires more than 10 expected non-churners, which holds here because both proportions are well below 1. Both proportions lie **between 0 and 1** and are backed by a **large number of records**. Thus, the sampling distributions of both sample proportions **meet** the normal condition.

**Independence Condition:** Both samples have sample sizes that are **less than 10%** of the overall customer population. Thus, the sampling distributions of both sample proportions **meet** the independence condition.
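
The normal (large counts) condition can be checked mechanically. The helper below is a minimal sketch, assuming the common rule of thumb that both the expected successes and the expected failures should be at least 10; the function name `meets_large_counts` and the example numbers are illustrative and do not come from the notebook.

```python
def meets_large_counts(n, p_hat, threshold=10):
    """True when expected successes AND expected failures both reach the threshold."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold

# Illustrative values only: a small proportion needs a larger sample to qualify.
print(meets_large_counts(500, 0.03))  # about 15 expected successes, 485 failures
print(meets_large_counts(200, 0.03))  # only about 6 expected successes
```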


### Null & Alternative Hypothesis

*Null Hypothesis:* In terms of churn, there is **no difference** for customers who have monthly contracts and customers who have one year plus contracts.

*Alternative Hypothesis:* In terms of churn, there is **a difference** between customers who have monthly contracts and customers who have one year plus contracts.
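
In symbols, writing $p_m$ and $p_y$ for the true churn proportions of monthly and one year plus contracts (notation introduced here for illustration, not used in the original cells), the two statements correspond to a two-sided test:

$$
H_0:\; p_m - p_y = 0 \qquad\qquad H_a:\; p_m - p_y \neq 0
$$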

### Significance Level & Power

**Significance Level**: α = 0.01

**Power**: We are worried about making a **Type I error**. If there is no true difference between the churn proportions of monthly and one year plus contracts and we reject the null hypothesis anyway, telecom companies will act on a difference that does not exist. As a result, they will be **wasting time** and **losing money on action items**. This is why we use a strict significance level of 0.01.

### Margin of Error & Confidence Interval

In [22]:
prop_diff = prop_m - prop_y # sample statistic

std_error = math.sqrt((prop_m * (1-prop_m) / n) + (prop_y * (1-prop_y)/n)) # standard error

z_score = round(stats.norm.ppf(.995),3) # Z-score of 99% confidence interval

print("The difference between the sample proportions of monthly and one year plus contracts is", round(prop_diff,3))

The difference between the sample proportions of monthly and one year plus contracts is 0.205


In [23]:
moe = z_score*std_error # margin of error

lower = prop_diff - moe # lower level of confidence interval
upper = prop_diff + moe # upper level of confidence interval

print("Margin of Error:", round(moe,3))
print("Confidence Interval:", [round(lower,3),round(upper,3)])

Margin of Error: 0.014
Confidence Interval: [0.191, 0.219]


We are **99% confident** that the true difference between monthly and one year plus churn proportions is between *0.191 and 0.219.*

Because this interval does not contain 0, the data support, at the **99% confidence level**, that a difference exists between monthly and one year plus churn proportions.
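
The interval calculation can be wrapped in a small reusable function. The sketch below is one way to do it using only the standard library (`statistics.NormalDist` in place of `scipy.stats.norm`); the function name `diff_prop_ci` and the example counts are assumptions made for illustration. Unlike the cells above, which use the same `n` for both groups, it accepts a separate sample size per group.

```python
import math
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, confidence=0.99):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided critical value, e.g. the 0.995 quantile for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Made-up counts for illustration: 120 of 400 versus 45 of 300.
ci_low, ci_high = diff_prop_ci(120, 400, 45, 300)
print(round(ci_low, 3), round(ci_high, 3))
```

If the resulting interval excludes 0, the two-sided test at the matching significance level will usually reject the null hypothesis of no difference as well.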

### Z-Score & P-Value

In [26]:
# Under the null hypothesis, prop_m and prop_y estimate one common proportion p (comment only; assigning here would overwrite prop_m)

p_hat = (churn_m + churn_y) / (n + n) # pooled proportion assuming null-hypothesis is true

std_error = math.sqrt((2 * p_hat * (1-p_hat))/n) # standard error of sampling proportion assuming null-hypothesis is true

z_score = (prop_diff - 0)/std_error # calculate z score

p_value = 2 * stats.norm.sf(abs(z_score)) # two-sided p-value, matching the "a difference" alternative

print('Z-score:', round(z_score,2))
print('P-Value:', round(p_value,4))

Z-score: 35.79
P-Value: 0.0


The probability of getting a Z-score **as extreme or more extreme** than 35.79 is effectively 0 (the p-value rounds to 0.0), *assuming the null-hypothesis is true*.

Since our p-value is **less than** our predetermined significance level of 0.01, we **reject** the null hypothesis and conclude that the difference between the proportions of monthly and one year plus churn is **statistically significant**.
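
For reference, the test can be expressed as a single function. This is a minimal sketch of a pooled two-proportion z-test using only the standard library; the name `two_prop_ztest` and the example counts are hypothetical. When both groups share the same size `n`, its standard error reduces to the `sqrt(2 * p_hat * (1 - p_hat) / n)` form used in the cell above.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value

# Made-up counts for illustration: 120 of 400 versus 45 of 300.
z, p_value = two_prop_ztest(120, 400, 45, 300)
print(round(z, 2), p_value < 0.01)
```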

### Conclusion

Our statistical analysis says that we are *99% confident* that the true difference between churn proportions for monthly and one year plus contracts is between 0.191 and 0.219. Because this interval lies entirely above 0 and the p-value is far below 0.01, the difference is statistically significant.

Contract type will therefore be a good predictor variable to look into further when implementing our model.

When advising a telecom on whether to act on this analysis, the clear recommendation is to focus on customers' contract types, which was largely expected beforehand.