# Understanding the Drivers of Churn in a Telecom Company

T11: Karshni Mitra (karshnim) & Yi-Hsueh Yang (yihsuehy)

Data Source: https://data.world/mcc450/telecom-churn


## Motivation

According to Harvard Business Review, acquiring a new customer is five times more expensive than retaining an existing one. When a customer leaves, a company loses not only the future revenue from that customer but also the resources spent to acquire them in the first place. The value proposition of retaining customers is undeniable, especially in the telecom industry, where the average churn rate can range from 10% to 65%, consistently higher than in most industries. In the USA, this rate is about 22%. While calculating revenue churn (loss of revenue) may not be possible from the given data, we aim to identify likely churners so the company can act before they leave, addressing a crucial issue for a subscription-based company in a high-churn industry.

Although a company can recover some of the lost revenue from newly acquired users, this is not sustainable because acquisition costs are generally higher than retention costs. Additionally, by the law of diminishing marginal returns, the number of new users of a new product may grow significantly at first, but that growth eventually levels off, and the company must then rely on existing customers renewing their contracts to earn revenue.

The churn rate of a company not only reflects the revenue it makes but also indicates whether the company is performing well overall. The reasons for churn vary, from the quality of the product and the attitude of the service provider to emerging competitors, and all of them affect the churn rate and reflect the performance and public image of the company at a given time. A model that predicts churn more accurately is therefore essential for the company to quickly find flaws in its service and make the needed fixes.


## Project Plan

In this project, we act as data scientists at a telecom company. Our primary goal is to accurately predict the probability that a customer will churn and the likely reason for doing so, backed by data-driven evidence. We could then pass our analysis to the marketing team so that they can set up differentiated marketing strategies for different groups of customers. The final goal of the whole project, although not shown in this notebook, is still worth mentioning: we aim to optimize the marketing / retention budget to gain more revenue by reaching a high customer retention rate. For example, we may not want to spend money on customers who have a high probability of churning, and instead focus on 'medium risk' churners.
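
The budget idea above can be sketched as a simple bucketing of predicted churn probabilities. This is only an illustration, not code from the notebook: the function name `risk_tier`, the 0.3 / 0.7 cut-offs, and the example probabilities are all hypothetical and would need to be tuned against real retention costs.

```python
def risk_tier(churn_probability, low_cutoff=0.3, high_cutoff=0.7):
    """Map a predicted churn probability to a coarse risk tier.

    The cut-offs are illustrative assumptions, not values derived from our data.
    """
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be between 0 and 1")
    if churn_probability < low_cutoff:
        return "low"
    if churn_probability < high_cutoff:
        return "medium"
    return "high"


# Hypothetical model outputs for four customers.
predicted = {"A": 0.12, "B": 0.45, "C": 0.68, "D": 0.91}

# Retention spend would be focused on the medium-risk tier.
retention_targets = [cid for cid, p in predicted.items() if risk_tier(p) == "medium"]
print(retention_targets)
```

With these placeholder cut-offs, customers B and C would be the retention targets, while D would be treated as too likely to churn to be worth the spend.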

We aim to approach this in two possible ways:
1. Compare and contrast users who churn with those who do not, to understand why customers are churning. Is it a quality issue such as weak signal in a particular geographic area, or perhaps cheaper rates at a competitor?

2. For customers who we predict will churn, analyze the 'most similar' customer(s) who do *not* churn to identify a retention strategy, e.g. a discount on annual subscriptions.
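
One possible reading of 'most similar' in the second approach is a nearest-neighbour lookup restricted to customers who stayed. The sketch below assumes each customer is described by a small numeric feature vector (here a hypothetical monthly charge and tenure in months) and uses plain Euclidean distance; real features would need to be scaled first, and the helper name `most_similar_non_churner` is our own.

```python
import math


def most_similar_non_churner(target, customers):
    """Return the non-churned customer closest to `target` in feature space.

    `customers` is a list of dicts with a numeric `features` list and a
    boolean `churned` flag. Returns None if nobody in the list stayed.
    """
    non_churners = [c for c in customers if not c["churned"]]
    if not non_churners:
        return None
    return min(non_churners,
               key=lambda c: math.dist(target["features"], c["features"]))


# Toy customers; features are [monthly_charge, tenure_months] (made up).
customers = [
    {"id": "N1", "features": [70.0, 24.0], "churned": False},
    {"id": "N2", "features": [30.0, 60.0], "churned": False},
    {"id": "C1", "features": [72.0, 5.0], "churned": True},
]
at_risk = {"id": "R1", "features": [75.0, 6.0]}

match = most_similar_non_churner(at_risk, customers)
print(match["id"])
```

Comparing the plan or contract of the matched non-churner with that of the at-risk customer could then suggest which offer to make.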

The main idea is to predict each customer's probability of churning and to classify customers according to their reasons for churning. This gives us a clearer picture of how our customers are segmented, which is valuable for setting the marketing strategy.

The primary type of learning we plan to implement is supervised learning, specifically classification. The main classification will be binary (churn / no churn). We also plan to predict the reason for churn, which can be treated as a multiclass classification problem.

Additionally, we may look into clustering (unsupervised learning) to identify similar customers and devise a retention strategy. A good evaluation metric for the churn classifier would be recall, which measures what fraction of the actual churners we manage to identify.
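
To make the metric concrete, recall is the number of correctly flagged churners divided by the number of actual churners. A minimal sketch, using made-up labels where 1 marks a churner:

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    actual_pos = true_pos + false_neg
    # Define recall as 0.0 when there are no actual churners at all.
    return true_pos / actual_pos if actual_pos else 0.0


# Illustrative labels only: four actual churners, three of them predicted.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(recall(y_true, y_pred))  # 0.75
```

Note that the false alarm on the fifth customer does not lower recall; it would show up in precision instead.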

## Data


The source of our data is linked above. The data covers customer churn at a telecom company whose customers are primarily on the West Coast of the USA (California).

It consists of three CSV files: telecom_customer_churn.csv contains the primary data, telecom_data_dictionary.csv explains all columns in telecom_customer_churn.csv, and telecom_zipcode_population.csv provides zip codes, which can be linked to the primary dataset, and their corresponding populations. The main dataset covers plans, geolocations, demographic information about customers, and whether they churned or not.
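
Linking the population file to the primary data could look roughly like the sketch below. The column names `Zip Code` and `Population` are assumptions on our part and should be checked against telecom_data_dictionary.csv; the helper name `attach_population` is also hypothetical.

```python
import csv


def attach_population(churn_path, population_path,
                      zip_column="Zip Code", population_column="Population"):
    """Add each customer's zip-code population to their row.

    Column names are assumed, not confirmed against the data dictionary.
    Values are kept as strings, exactly as read from the CSV files.
    """
    with open(population_path, newline="") as f:
        population_by_zip = {row[zip_column]: row[population_column]
                             for row in csv.DictReader(f)}

    with open(churn_path, newline="") as f:
        customers = list(csv.DictReader(f))

    for row in customers:
        # Customers whose zip code is missing from the lookup get None.
        row[population_column] = population_by_zip.get(row[zip_column])
    return customers
```

It would be called as `attach_population("telecom_customer_churn.csv", "telecom_zipcode_population.csv")`, which behaves like a left join: every customer row is kept, whether or not its zip code has a population entry.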
 
Although churn prediction is a common topic in data science, these particular datasets are relatively unexplored, and other than some EDA, we did not find any predictive modelling work on them online. Therefore, we aim to take a new approach to this common churn problem by identifying not only *if* a customer will churn, but also *why*. Since we also have geographical data available, we could identify regions with a high churn rate and regions where our company is the dominant player; analyzing these could give us valuable insights into the company's strengths and weaknesses.



## Questions to Answer (Hypotheses)

* Among the reasons customers churn, is there a clear majority?
* Are the churners concentrated in a particular area? Do churners in an area of high churn have similar reasons for churning (e.g. customer service issues)?
* Are churners / non-churners on a particular type of plan (Family / Unlimited)? This would be valuable in identifying the best and worst selling services.
* Do some types of customers have a higher or lower propensity to churn than others? If so, we could propose focusing company resources proportionately on the relevant customer groups.
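
Several of these questions reduce to comparing churn rates across groups (area, plan type, customer type). A minimal sketch of that comparison is shown below; the field names `plan`, `city`, and `churned` and all the records are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def churn_rate_by(records, key):
    """Return {group value: share of customers in that group who churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        churned[group] += 1 if record["churned"] else 0
    return {group: churned[group] / totals[group] for group in totals}


# Toy records; field names and values are hypothetical.
records = [
    {"plan": "Family", "city": "San Diego", "churned": True},
    {"plan": "Family", "city": "San Diego", "churned": False},
    {"plan": "Unlimited", "city": "Los Angeles", "churned": False},
    {"plan": "Unlimited", "city": "San Diego", "churned": True},
    {"plan": "Unlimited", "city": "Los Angeles", "churned": False},
    {"plan": "Unlimited", "city": "Los Angeles", "churned": False},
]

print(churn_rate_by(records, "plan"))
print(churn_rate_by(records, "city"))
```

The same helper can be reused for any categorical column, so each question above becomes one call with a different `key`.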









## Possible Findings and Implications



We might find that the predicted probability of churning varies considerably across churn reasons. For example, the probability range for customers who churn because of network reliability might be [0.3, 0.7], while for those who churn because of product dissatisfaction it might be [0.5, 0.9]. We anticipate such a pattern, and it would be a useful reference when deciding which groups of customers should be targeted with more ads or offered free / discounted service. It could also indicate where the company can improve (customer service, network infrastructure).

Sources: 

https://hbr.org/2022/12/in-a-downturn-focus-on-existing-customers-not-potential-ones

https://www.techmahindra.com/en-in/blog/maximizing-business-profit-through-telecom-analytics-solutions/

https://www.statista.com/statistics/816735/customer-churn-rate-by-industry-us/