# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>Telco Churn Prediction with Logistic Regression</b></p>
![](https://kranthi.me/wp-content/uploads/2020/04/Telecom_Churn_Prediction-e1587281300645.jpg)

<a id="toc"></a>
# **Table of Contents**

**1.**  [**Reading Data**](#Step1)<br>
**2.**  [**Understanding Data**](#Step2)<br>
**3.**  [**Data Visualization**](#Step3)<br>
**4.**  [**Data Preprocessing**](#Step4)<br>
**5.**  [**Model Building**](#Step5)<br>
**6.**  [**Model Evaluation**](#Step6)<br>
**7.**  [**Predicting New Data**](#Step7)<br>

<a id="Step1"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>1. Reading Data</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color: #8B0000" data-toggle="popover">Content</a>

In [None]:
import pandas as pd 
df = pd.read_csv("../input/telco-customer-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv")
df.head()

<a id="Step2"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>2. Understanding Data</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color: #8B0000" data-toggle="popover">Content</a>

In [None]:
df.shape

In [None]:
df.dtypes

<a id="Step3"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>3. Data Visualization</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color: #8B0000" data-toggle="popover">Content</a>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
# sns.set is an alias of sns.set_theme, so a single call is enough
sns.set_theme(rc={"figure.figsize": (10, 6), "figure.dpi": 300})

## What is the ratio of male to female customers?

In [None]:
x = round(df["gender"].value_counts()/df.shape[0]*100,2)
# value_counts() sorts by frequency, so take the labels from its index instead of hard-coding their order
plt.pie(x, labels=x.index, explode=[0.1,0], autopct='%.2f%%')
plt.legend()
plt.show()

## What share of customers are senior citizens?

In [None]:
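x = round(df["SeniorCitizen"].value_counts()/df.shape[0]*100,2)
# SeniorCitizen is coded 0/1; map each code to its label so the wedges are not mislabeled
plt.pie(x, labels=x.index.map({0: "No", 1: "Yes"}), explode=[0.1,0], autopct='%.2f%%')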
plt.legend()
plt.show()

## What share of customers have a partner?

In [None]:
x = round(df["Partner"].value_counts()/df.shape[0]*100,2)
# value_counts() sorts by frequency, so take the labels from its index instead of hard-coding their order
plt.pie(x, labels=x.index, explode=[0.1,0], autopct='%.2f%%')
plt.legend()
plt.show()

## What share of customers have dependents?

In [None]:
x = round(df["Dependents"].value_counts()/df.shape[0]*100,2)
# value_counts() sorts by frequency, so take the labels from its index instead of hard-coding their order
plt.pie(x, labels=x.index, explode=[0.1,0], autopct='%.2f%%')
plt.legend()
plt.show()

## What share of customers have multiple lines?

In [None]:
x = round(df["MultipleLines"].value_counts()/df.shape[0]*100,2)
# value_counts() sorts by frequency, so take the labels from its index instead of hard-coding their order
plt.pie(x, labels=x.index, explode=[0.05,0.05,0.05], autopct='%.2f%%')
plt.legend(loc='lower right')
plt.show()

## What is the distribution of payment methods?

In [None]:
plt.figure(figsize=(10,6))
sns.countplot(x="PaymentMethod", data=df)

<a id="Step4"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>4. Data Preprocessing</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color:#8B0000" data-toggle="popover">Content</a>

In [None]:
df.TotalCharges = pd.to_numeric(df.TotalCharges, errors="coerce")

In [None]:
df.isnull().sum()
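
The conversion above can be illustrated on a toy column. This is a minimal sketch, assuming the text column contains a blank entry; the values in `raw` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for a numeric column stored as text, with one blank entry
# (hypothetical values, for illustration only).
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
converted = pd.to_numeric(raw, errors="coerce")

print(converted.dtype)
print(converted.isnull().sum())
```

Any entry that fails to parse shows up as a missing value, which is why the null count is checked right after the conversion.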

## Handling Missing Data

In [None]:
df.TotalCharges=df.TotalCharges.fillna(0)

In [None]:
df.isnull().sum().sum()

## Normalizing Column Names

In [None]:
df.columns = df.columns.str.lower().str.replace(" ", "_")

## Normalizing String Values

In [None]:
string_columns = list(df.dtypes[df.dtypes=="object"].index)
for col in string_columns:
    df[col]=df[col].str.lower().str.replace(" ","_")
df.head()
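
As a small illustration of what this normalization does to individual values, the sketch below applies the same two string operations to a toy Series. The sample strings are assumptions chosen to resemble category values.

```python
import pandas as pd

# Toy category values (assumed for illustration).
sample = pd.Series(["Month-to-month", "Bank transfer (automatic)", "Fiber optic"])

# Same operations as the cell above: lower-case, then spaces to underscores.
normalized = sample.str.lower().str.replace(" ", "_")

print(normalized.tolist())
```

Hyphens and parentheses are left untouched, so any dictionary built by hand for prediction has to use exactly these normalized spellings.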

## Encoding the Target Variable

In [None]:
df.churn = (df.churn == "yes").astype(int)
df.head()

## Splitting the Dataset

In [None]:
from sklearn.model_selection import train_test_split
df_train_full, df_test = train_test_split(df, test_size=0.2, random_state=42)
df_train, df_val = train_test_split(df_train_full, test_size=0.25, random_state=42)
y_train = df_train.churn.values
y_val = df_val.churn.values
del df_train["churn"]
del df_val["churn"]
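
The two-step split yields roughly 60% training, 20% validation, and 20% test data: the second call takes 25% of the remaining 80%, which is 20% of the whole. A minimal sketch on 1000 dummy rows (not the real dataset) makes the arithmetic visible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1000 dummy row ids stand in for the DataFrame.
rows = np.arange(1000)

# First hold out 20% as the test set, then 25% of the rest as validation.
full_train, test = train_test_split(rows, test_size=0.2, random_state=42)
train, val = train_test_split(full_train, test_size=0.25, random_state=42)

print(len(train), len(val), len(test))  # 600 200 200
```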

## Determining Categorical and Numerical Features

In [None]:
categorical = ['gender', 'seniorcitizen', 'partner', 'dependents',  
               'phoneservice', 'multiplelines', 'internetservice',  
               'onlinesecurity', 'onlinebackup', 'deviceprotection',  
               'techsupport', 'streamingtv', 'streamingmovies',  'contract', 
               'paperlessbilling', 'paymentmethod']
numerical = ['tenure', 'monthlycharges', 'totalcharges']

## One-Hot Encoding 

In [None]:
train_dict = df_train[categorical + numerical].to_dict(orient="records")
train_dict[:1]

In [None]:
from sklearn.feature_extraction import DictVectorizer
dv = DictVectorizer(sparse = False)
dv.fit(train_dict)
X_train = dv.transform(train_dict)
X_train[0]

In [None]:
dv.get_feature_names_out()

<a id="Step5"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>5. Model Building</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color:#8B0000" data-toggle="popover">Content</a>

In [None]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(solver="liblinear", random_state=42)
model.fit(X_train, y_train)

<a id="Step6"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>6. Model Evaluation</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color:#8B0000" data-toggle="popover">Content</a>

In [None]:
val_dict = df_val[categorical+numerical].to_dict(orient="records")
X_val = dv.transform(val_dict)
y_pred = model.predict_proba(X_val)
y_pred[:5]

In [None]:
# For a classifier, score() returns the mean accuracy
print("Accuracy of the model on the validation dataset: ",model.score(X_val, y_val))
print("Accuracy of the model on the training dataset: ",model.score(X_train, y_train))
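
The accuracy reported above can also be derived by hand from the predicted probabilities. The sketch below uses made-up probabilities and labels, and assumes the usual 0.5 cut-off for a binary classifier; `proba` and `y_true` are hypothetical stand-ins for `y_pred` and `y_val`.

```python
import numpy as np

# Hypothetical predict_proba output: column 0 is P(no churn), column 1 is P(churn).
proba = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8]])
y_true = np.array([0, 1, 1, 1])

# Predict churn when the churn probability reaches the 0.5 threshold.
churn_decision = proba[:, 1] >= 0.5

# Accuracy is the share of decisions that match the true labels.
accuracy = (churn_decision == y_true).mean()
print(accuracy)  # 0.75
```

Working from the probabilities also makes it possible to try thresholds other than 0.5 if missing a churner is costlier than a false alarm.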

## Coefficients of the Model

In [None]:
print("Bias: ",model.intercept_[0])
print(dict(zip(dv.get_feature_names_out(), model.coef_[0].round(3))))
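
The bias and coefficients fully determine the prediction: the churn probability is the sigmoid of the bias plus the weighted sum of the features. A minimal sketch on a toy dataset (the arrays `X_toy` and `y_toy` are made up) checks this against `predict_proba`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small made-up dataset with two numeric features.
X_toy = np.array([[1, 0.5], [2, 1.0], [3, 0.2], [4, 1.5], [5, 0.1], [6, 2.0]])
y_toy = np.array([0, 0, 1, 0, 1, 1])

toy_model = LogisticRegression(solver="liblinear", random_state=42)
toy_model.fit(X_toy, y_toy)

# Linear score: bias + dot product of features and weights.
score = toy_model.intercept_[0] + X_toy @ toy_model.coef_[0]

# The sigmoid maps the score to a probability between 0 and 1.
manual_proba = 1.0 / (1.0 + np.exp(-score))

print(np.allclose(manual_proba, toy_model.predict_proba(X_toy)[:, 1]))  # True
```

A positive coefficient therefore pushes the churn probability up when its feature is present, and a negative one pushes it down.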

<a id="Step7"></a>
# <p style="background-color:coral;font-family:newtimeroman;font-size:150%;color:white;text-align:center;border-radius:20px 20px;"><b>7. Predicting New Data</b></p>
<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:white; background-color:#8B0000" data-toggle="popover">Content</a>

In [None]:
customer = {
 'customerid': '8879-zkjof',
 'gender': 'male',
 'seniorcitizen': 1,
 'partner': 'no',
 'dependents': 'no',
 'tenure': 41,
 'phoneservice': 'yes',
 'multiplelines': 'no',
 'internetservice': 'dsl',
 'onlinesecurity': 'yes',
 'onlinebackup': 'no',
 'deviceprotection': 'yes',
 'techsupport': 'yes',
 'streamingtv': 'yes',
 'streamingmovies': 'yes',
 'contract': 'one_year',
 'paperlessbilling': 'yes',
 'paymentmethod': 'bank_transfer_(automatic)',
 'monthlycharges': 79.85,
 'totalcharges': 2990.75,
}
x_new = dv.transform([customer])
model.predict_proba(x_new)

In [None]:
customer2 = {
 'gender': 'female',
 'seniorcitizen': 1,
 'partner': 'no',
 'dependents': 'no',
 'phoneservice': 'yes',
 'multiplelines': 'yes',
 'internetservice': 'fiber_optic',
 'onlinesecurity': 'no',
 'onlinebackup': 'no',
 'deviceprotection': 'no',
 'techsupport': 'no',
 'streamingtv': 'yes',
 'streamingmovies': 'no',
 'contract': 'month-to-month',
 'paperlessbilling': 'yes',
 'paymentmethod': 'electronic_check',
 'tenure': 1,
 'monthlycharges': 85.7,
 'totalcharges': 85.7
}
x_new2 = dv.transform([customer2])
model.predict_proba(x_new2)
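
The two cells above repeat the same transform-then-predict steps, which could be wrapped in a small helper. This is only a sketch: `predict_single` is a hypothetical name, and the toy vectorizer and model below stand in for the fitted `dv` and `model` so that the example runs on its own.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data (made up) so the example is self-contained.
toy_customers = [
    {"contract": "month-to-month", "tenure": 1},
    {"contract": "month-to-month", "tenure": 3},
    {"contract": "one_year", "tenure": 24},
    {"contract": "two_year", "tenure": 60},
]
toy_churn = [1, 1, 0, 0]

toy_dv = DictVectorizer(sparse=False)
toy_model = LogisticRegression(solver="liblinear", random_state=42)
toy_model.fit(toy_dv.fit_transform(toy_customers), toy_churn)

def predict_single(customer, dv, model):
    """Return the predicted churn probability for one customer dict."""
    X = dv.transform([customer])
    # Column 1 of predict_proba holds the probability of the positive class.
    return float(model.predict_proba(X)[0, 1])

# Keys the vectorizer never saw (such as an id) are ignored by transform.
p_new = predict_single(
    {"customerid": "demo", "contract": "month-to-month", "tenure": 2},
    toy_dv, toy_model,
)
p_long = predict_single({"contract": "two_year", "tenure": 60}, toy_dv, toy_model)
print(round(p_new, 3), round(p_long, 3))
```

With the notebook's own objects, the call would be `predict_single(customer, dv, model)`.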
