# Telecom Customer Churn Analysis & Prediction

## Introduction 
Customer attrition, also known as churn, is a major cost for any organization. It is the proportion of customers who stop using a company’s product or service within a given time frame. Our project, “Telecom Churn Analysis and Prediction”, uses machine learning to predict customer churn in the telecom industry. Churn prediction matters to telecom companies because it supports customer retention and reduces the cost of acquiring new customers.

## Problem Statement 
Telecommunication companies face a significant challenge with customer churn, the loss of customers who discontinue their services. Addressing it means identifying customers who are at risk of churning and taking proactive steps to retain them. Machine learning models can help telecom companies predict which customers are most likely to churn. These predictions are based on a variety of factors, including customer usage patterns, payment history, and demographic information.

Telecom company Databel has been in the market for five years and has experienced significant growth. However, increasing competition and shifting consumer preferences have raised concerns about the churn rate, the percentage of subscribers who cancel their memberships each month. This project identifies the underlying causes of churn and develops strategies to retain valuable customers.
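To make the churn rate concrete, here is a minimal sketch of the calculation. The function name and the subscriber counts are made up for illustration and are not taken from the Databel data.

```python
def churn_rate(churned: int, subscribers_at_start: int) -> float:
    """Return the churn rate as a percentage of subscribers at period start."""
    if subscribers_at_start <= 0:
        raise ValueError("subscribers_at_start must be positive")
    return 100.0 * churned / subscribers_at_start


# Hypothetical month: 1,000 subscribers at the start, 25 of them cancel.
print(f"{churn_rate(25, 1000):.1f}%")  # 2.5%
```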

In this project we analyze the Databel customer data, draw insights from it, visualize it, and build a churn prediction model using Python, scikit-learn (Sklearn), Power Query, Power BI, and the Microsoft Fabric environment.

## Data Set 
The Databel customer data (Databel_Data.csv) is available on Kaggle and in my personal GitHub repository (https://github.com/mdagteki/data_sources).

## Methodology

We will use the Microsoft Fabric environment, Power Query, Power BI, Python, and scikit-learn to build a churn prediction model.

## Data Preparation
### Importing Data into Fabric Environment
First, we create a new workspace named Churn_WS in Microsoft Fabric.

![Churn1.png](images/Churn1.png)

Now we can select our new workspace and start working in it.

![Churn2.png](images/Churn2.png)

From the bottom-left corner, choose Synapse Data Engineering, and on the Data Engineering home page choose new Lakehouse.

![Churn3.png](images/Churn3.png)

![Churn4.png](images/Churn4.png)

We will name our Lakehouse Churn_LH.

![Churn5.png](images/Churn5.png)

Next we need to ingest the data into Churn_LH. We will use a dataflow: from the menu, choose New Dataflow Gen2. (Dataflow Gen2 is a great tool for data ingestion, preparation, cleaning, and transformation.)

![Churn6.png](images/Churn6.png)

On the Dataflow page, use the top-right corner to change the dataflow name from Dataflow1 to Churn_DF.

![Churn7.png](images/Churn7.png)

Because our data is located in a GitHub repository, we choose Import from a Text/CSV file in Churn_DF.

![Churn8.png](images/Churn8.png)

In the connection settings, choose the Link to file option and paste the file path URL. Leave all other settings as they are and click Next, then click Create on the next page.

![Churn9.png](images/Churn9.png)

Our data is now loaded into the Power Query Online editor, where we can clean and transform it as needed.

![Churn19.png](images/Churn19.png)

In Power Query Online we apply the necessary transformations, such as replacing empty Churn Category and Churn Reason values with Unknown, and check data types, column quality, column distribution, and other statistics.
To preview these column profile features, click Options and check the relevant boxes under Column profile.

![Churn10.png](images/Churn10.png)

By default, Power Query Online (and Desktop) profiles only the top 1,000 rows of a dataset. To see statistics for the whole dataset, enable the Column profiling based on entire dataset option, either from the bottom-left corner or in the Options section.

![Churn11.png](images/Churn11.png)

Replacing empty values with Unknown

![Churn12.png](images/Churn12.png)
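For readers who prefer code, the same replacement can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library `csv` module on a tiny in-memory sample, treats empty or whitespace-only strings as "empty", and the `Customer ID` column, the sample values, and the helper name `fill_unknown` are all hypothetical. Only the `Churn Category` and `Churn Reason` column names come from the dataset described above.

```python
import csv
import io

# Tiny stand-in for Databel_Data.csv. "Churn Category" and "Churn Reason" are
# the columns named above; "Customer ID" and all values are hypothetical.
SAMPLE_CSV = """Customer ID,Churn Category,Churn Reason
C001,Competitor,Better offer elsewhere
C002,,
C003,Price,
"""

FILL_COLUMNS = ("Churn Category", "Churn Reason")


def fill_unknown(rows, columns=FILL_COLUMNS, placeholder="Unknown"):
    """Return copies of rows with empty or whitespace-only cells replaced."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        for col in columns:
            if not (new_row.get(col) or "").strip():
                new_row[col] = placeholder
        cleaned.append(new_row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = fill_unknown(rows)
for row in cleaned:
    print(row["Customer ID"], row["Churn Category"], row["Churn Reason"], sep=" | ")
```

Returning new rows instead of editing in place mirrors how Power Query records each transformation as a separate applied step, leaving the source data unchanged.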

The Column profiling options.

![Churn13.png](images/Churn13.png)
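The difference between profiling the first 1,000 rows and profiling the entire dataset can be illustrated with a short Python sketch. It is not how Power Query computes its statistics; it is a rough approximation of the column quality and distribution figures, and the function name `profile_column` and the sample column values are hypothetical.

```python
from collections import Counter
from itertools import islice


def profile_column(values, limit=None):
    """Summarize one column: counts of rows, empty cells, and distinct values.

    limit=1000 mimics a preview-based profile; limit=None scans everything.
    """
    sample = list(islice(values, limit))
    non_empty = [v for v in sample if v is not None and str(v).strip()]
    counts = Counter(non_empty)
    return {
        "rows": len(sample),
        "empty": len(sample) - len(non_empty),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical column: the first 1,000 rows are fully populated, while the
# empty cells and a third category only appear later in the data.
churn_category = ["Competitor"] * 600 + ["Price"] * 400 + [""] * 150 + ["Attitude"] * 50

preview = profile_column(churn_category, limit=1000)
full = profile_column(churn_category)
print(preview)
print(full)
```

In this made-up column, the preview reports no empty cells and two distinct values, while the full scan finds 150 empty cells and three distinct values, which is why profiling the entire dataset matters before deciding on cleaning steps.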

After finishing all the necessary transformations, we can publish our Churn_DF dataflow to the Churn_LH lakehouse. This creates a Delta table inside the lakehouse that we can work on. Delta tables are a great option when working with high-volume data, although that is not the case here.

![Churn14.png](images/Churn14.png)