## 1. Load the dataset

In [1]:
# Load the dataset customer_churn_netflix.csv
import pandas as pd
df = pd.read_csv('customer_churn_netflix.csv')
df.head()

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets,churn
0,1,Standard,10.1,1,51,50,Tablet,0,7,0,1
1,2,Premium,4.7,3,14,62,Mobile,0,6,0,1
2,3,Premium,12.0,3,45,46,Mobile,0,9,0,0
3,4,Standard,14.8,1,24,19,Tablet,0,0,0,0
4,5,Basic,9.0,1,20,52,Mobile,1,1,1,0


## 2. Data Preprocessing
Two columns, **subscription_plan** and **device_type**, hold their categories as text. We need to convert them to numeric codes, and for that we can use **_Label Encoding_**. <br>
Here is what label encoding does:
1. It converts text labels into numeric values so that machine learning models can work with them.
2. It assigns a unique integer to each category, which makes the column usable with most ML algorithms.
3. Each category gets a distinct number, so two different labels never share a code. The integers are identifiers assigned in sorted order of the labels, not measurements.
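The mapping idea above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation, and `label_encode` is a hypothetical helper name. It numbers the categories in sorted order, which is also how `LabelEncoder` orders its `classes_`, so for the first five rows of the dataset the codes agree with the encoded column printed by the next cell.

```python
# Minimal pure-Python sketch of label encoding (illustrative only).
def label_encode(values):
    """Map each distinct label to an integer, numbering labels in sorted order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    codes = [mapping[v] for v in values]
    return codes, mapping

# subscription_plan values from the first five rows shown by df.head()
plans = ["Standard", "Premium", "Premium", "Standard", "Basic"]
codes, mapping = label_encode(plans)

print(mapping)  # {'Basic': 0, 'Premium': 1, 'Standard': 2}
print(codes)    # [2, 1, 1, 2, 0]
```

With a fitted scikit-learn encoder, the same mapping can be read from its `classes_` attribute, and `inverse_transform` turns codes back into the original labels.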


In [3]:
# Label Encoding for subscription_plan column
from sklearn.preprocessing import LabelEncoder
le_1 =  LabelEncoder()
df['subscription_plan'] = le_1.fit_transform(df['subscription_plan'])
df['subscription_plan']

0       2
1       1
2       1
3       2
4       0
       ..
1583    0
1584    2
1585    2
1586    2
1587    2
Name: subscription_plan, Length: 1588, dtype: int64

In [4]:
# Label Encoding for device_type column; a separate encoder object keeps its own mapping
le_2 = LabelEncoder()
df['device_type'] = le_2.fit_transform(df['device_type'])
df['device_type']

0       3
1       1
2       1
3       3
4       1
       ..
1583    2
1584    2
1585    2
1586    1
1587    1
Name: device_type, Length: 1588, dtype: int64

In [5]:
df.sample(5)

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets,churn
1283,392,2,15.110092,1,39,68,2,0,6,0,1
362,363,2,11.7,4,52,55,3,0,7,0,0
883,884,0,7.8,1,23,17,1,0,9,0,0
1382,16,2,8.210292,2,42,51,1,0,6,0,1
214,215,1,6.1,3,38,31,0,0,3,1,1


## 3. Split the dependent variable (y) and independent variables (X)

In [6]:
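# Separate the target (churn) from the features
# Note: X still contains 'Unnamed: 0' and 'user_id', which are row/ID columns rather than behavioural features
X = df.drop(columns=['churn'])
y = df['churn']
X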

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets
0,1,2,10.100000,1,51,50,3,0,7,0
1,2,1,4.700000,3,14,62,1,0,6,0
2,3,1,12.000000,3,45,46,1,0,9,0
3,4,2,14.800000,1,24,19,3,0,0,0
4,5,0,9.000000,1,20,52,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...
1583,665,0,9.064565,4,35,38,2,0,3,0
1584,470,2,4.242942,3,37,24,2,0,5,0
1585,327,2,8.228009,2,53,36,2,0,1,0
1586,228,2,4.605227,2,51,22,1,0,4,0


## 4. Train Test Split


In [7]:
# Hold out 10% of the rows as a test set; random_state fixes the shuffle so the split is reproducible
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=32)
X_train.shape, X_test.shape

((1429, 10), (159, 10))

## 5. Train the model 

In [21]:
# Import the LogisticRegression model from sklearn
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(max_iter=1500)

In [22]:
# Train the model 
model.fit(X_train,y_train)

LogisticRegression(max_iter=1500)


## 6. Predicting on the Test Set

In [24]:
# Predicting the dependent variable values for the test set
y_pred = model.predict(X_test)
y_pred

array([0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0,
       0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
       1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0,
       0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0,
       1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1,
       0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0,
       1, 1, 0, 1, 0])
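For intuition on where these 0/1 labels come from: a fitted binary logistic regression computes a linear score for each row, passes it through the sigmoid to get a probability for class 1, and predicts 1 when that probability is above 0.5. The sketch below reproduces that rule in plain Python. The weights, intercept, and the choice of two features are made-up values for illustration, not the coefficients of the model trained above, and the helper names are hypothetical.

```python
import math

def sigmoid(z):
    """Squash a real-valued score into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(row, weights, intercept):
    """Probability of class 1 for a single row of features."""
    score = intercept + sum(w * x for w, x in zip(weights, row))
    return sigmoid(score)

def predict_one(row, weights, intercept, threshold=0.5):
    """Hard 0/1 label: 1 when the class-1 probability exceeds the threshold."""
    return 1 if predict_proba_one(row, weights, intercept) > threshold else 0

# Hypothetical coefficients for two features (illustrative only):
# days_since_last_login and monthly_watch_hours
weights = [0.08, -0.25]
intercept = -1.0

rows = [[51, 10.1], [20, 9.0]]
for row in rows:
    p = predict_proba_one(row, weights, intercept)
    print(row, round(p, 3), predict_one(row, weights, intercept))
```

On the real model, `model.predict_proba(X_test)` returns these probabilities, which is useful when a threshold other than 0.5 is wanted.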

In [25]:
y_test

588     0
654     0
1154    1
831     1
458     0
       ..
1444    1
757     0
944     0
1064    1
608     0
Name: churn, Length: 159, dtype: int64

## 7. Evaluate the model 

In [26]:
# compute confusion_matrix, classification_report
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test, y_pred)
clf = classification_report(y_test, y_pred)

print('Confusion Matrix:\n', cm)
print('\nClassification Report:\n', clf)

Confusion Matrix:
 [[55 19]
 [12 73]]

Classification Report:
               precision    recall  f1-score   support

           0       0.82      0.74      0.78        74
           1       0.79      0.86      0.82        85

    accuracy                           0.81       159
   macro avg       0.81      0.80      0.80       159
weighted avg       0.81      0.81      0.80       159
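The figures in the classification report follow directly from the confusion matrix. As a worked check, the sketch below recomputes them by hand from the matrix printed above, using scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are illustrative.

```python
# Confusion matrix reported above: rows = true class, columns = predicted class
cm = [[55, 19],
      [12, 73]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Class 1 (churn)
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Class 0 (no churn)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print("accuracy:", round(accuracy, 2))                                          # 0.81
print("class 0:", round(precision_0, 2), round(recall_0, 2), round(f1_0, 2))    # 0.82 0.74 0.78
print("class 1:", round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))    # 0.79 0.86 0.82
```

Read this way, the model misses 12 of the 85 churners in the test set (recall 0.86 for class 1) and raises 19 false alarms among the 74 non-churners.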



**What we learned:**
- A quick revision of _Logistic Regression_
- A new term, **_Label Encoding_**, and how it is used. 🎉🎉