## 1. Load the dataset

In [2]:
# Load the dataset customer_churn_netflix.csv
import pandas as pd
df = pd.read_csv('customer_churn_netflix.csv')
df.head()

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets,churn
0,1,Standard,10.1,1,51,50,Tablet,0,7,0,1
1,2,Premium,4.7,3,14,62,Mobile,0,6,0,1
2,3,Premium,12.0,3,45,46,Mobile,0,9,0,0
3,4,Standard,14.8,1,24,19,Tablet,0,0,0,0
4,5,Basic,9.0,1,20,52,Mobile,1,1,1,0


In [3]:
df.columns

Index(['user_id', 'subscription_plan', 'monthly_watch_hours', 'num_profiles',
       'days_since_last_login', 'age', 'device_type', 'payment_issue_flag',
       'international_usage', 'support_tickets', 'churn'],
      dtype='object')

## 2. Data Preprocessing
Two columns, **subscription_plan** and **device_type**, store their categories as text. Most machine learning models need numeric input, so we convert these columns to numeric codes with **_Label Encoding_**. <br>
How label encoding works:
1. It converts text labels into numeric values that machine learning models can process.
2. It assigns a unique integer to each category, so two different labels never share a code.
3. The codes follow the sorted (alphabetical) order of the labels, so their numeric order carries no meaning of its own.
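To make the mapping concrete, here is a minimal sketch of `LabelEncoder` on a handful of toy plan names. The list `plans` and the variable names are illustrative assumptions, not values read from the CSV.

```python
from sklearn.preprocessing import LabelEncoder

# Toy values for illustration only (not read from the CSV)
plans = ["Standard", "Premium", "Premium", "Basic"]

encoder = LabelEncoder()
codes = encoder.fit_transform(plans)

# classes_ holds the sorted unique labels; a label's position is its code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)         # {'Basic': 0, 'Premium': 1, 'Standard': 2}
print(codes.tolist())  # [2, 1, 1, 0]

# inverse_transform recovers the original text labels
restored = encoder.inverse_transform(codes).tolist()
print(restored)        # ['Standard', 'Premium', 'Premium', 'Basic']
```

Because the codes are assigned alphabetically, `Premium` (1) ends up below `Standard` (2). A linear model may still read the codes as a ranking, so one-hot encoding is a common alternative for unordered categories.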


In [5]:
# Label Encoding for subscription_plan column
from sklearn.preprocessing import LabelEncoder
le_1 = LabelEncoder()
df['subscription_plan'] = le_1.fit_transform(df['subscription_plan'])

In [6]:
# Label-encode device_type with a separate encoder so each column keeps its own mapping
le_2 = LabelEncoder()
df['device_type'] = le_2.fit_transform(df['device_type'])


In [7]:
df.sample(5)

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets,churn
1487,936,2,12.849618,3,38,38,2,0,2,0,1
195,196,2,13.6,3,15,43,1,0,3,0,0
1045,576,2,8.78505,3,59,46,2,0,1,0,1
443,444,2,9.6,4,21,17,0,0,1,1,0
18,19,2,10.2,1,17,17,3,0,0,0,0


## 3. Split the dependent variable (y) and independent variables (X)

In [9]:
X = df.drop(columns = ['churn']) 
y = df['churn']
X

Unnamed: 0,user_id,subscription_plan,monthly_watch_hours,num_profiles,days_since_last_login,age,device_type,payment_issue_flag,international_usage,support_tickets
0,1,2,10.100000,1,51,50,3,0,7,0
1,2,1,4.700000,3,14,62,1,0,6,0
2,3,1,12.000000,3,45,46,1,0,9,0
3,4,2,14.800000,1,24,19,3,0,0,0
4,5,0,9.000000,1,20,52,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...
1583,665,0,9.064565,4,35,38,2,0,3,0
1584,470,2,4.242942,3,37,24,2,0,5,0
1585,327,2,8.228009,2,53,36,2,0,1,0
1586,228,2,4.605227,2,51,22,1,0,4,0


## 4. Train Test Split


In [11]:
# Import train_test_split
from sklearn.model_selection import train_test_split

## 5. Train the model 

In [13]:
# Import the LogisticRegression model from sklearn and create an instance
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()

In [14]:
# Split the data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
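As a rough check on what `test_size=0.2` produces, the sketch below splits a synthetic stand-in array. It assumes the frame has 1587 rows (the preview of `X` ends at index 1586); the `_demo` names and the `stratify` argument are additions for illustration and are not part of the notebook's call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with an assumed 1587 rows (not the real churn data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1587, 3))
y_demo = rng.integers(0, 2, size=1587)

# stratify=y_demo keeps the class ratio similar in both parts (optional)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)  # (1269, 3) (318, 3)
```

The test part is rounded up to 318 rows, which agrees with the support total in the classification report.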


## 6. Fit the Model and Predict on the Test Set

In [16]:
# Fit (train) the model on the training set
logistic.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
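The warning above suggests either raising `max_iter` or scaling the data. One possible way to do both is to chain `StandardScaler` and `LogisticRegression` in a pipeline. The sketch below runs on synthetic data with deliberately mismatched feature scales; the `_demo` arrays and the name `scaled_logistic` are assumptions for illustration, not the notebook's variables.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with features on very different scales
rng = np.random.default_rng(42)
X_demo = np.column_stack([
    rng.integers(1, 2000, size=500),  # large-range column, like an ID
    rng.normal(10, 3, size=500),      # small-range column, like watch hours
])
y_demo = (X_demo[:, 1] + rng.normal(0, 2, size=500) > 10).astype(int)

# The scaler is fitted inside the pipeline, so it only sees the data passed to fit()
scaled_logistic = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_logistic.fit(X_demo, y_demo)

iterations_used = int(scaled_logistic[-1].n_iter_[0])
print(iterations_used, scaled_logistic.score(X_demo, y_demo))
```

In the notebook, the same pipeline could be fitted on `X_train` and `y_train` in place of the bare `logistic` model. Whether that changes the test accuracy would need to be checked by running it.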


In [17]:
y_pred = logistic.predict(X_test)

In [18]:
y_pred

array([0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1,
       0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0,
       1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0,
       1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1,
       0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
       1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0,
       1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1,
       1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0,
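The 0/1 labels above come from class probabilities. For a binary `LogisticRegression`, `predict` agrees with thresholding `predict_proba` at 0.5. The sketch below illustrates this on synthetic data; `X_demo`, `y_demo` and `model` are illustrative assumptions rather than the notebook's objects.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (not the churn dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X_demo, y_demo)

labels = model.predict(X_demo)
proba_positive = model.predict_proba(X_demo)[:, 1]  # column 1 = P(class 1)

# Thresholding the class-1 probability at 0.5 reproduces predict()
thresholded = (proba_positive > 0.5).astype(int)
print(np.array_equal(labels, thresholded))
```

Keeping the probabilities makes it possible to try a different threshold later, for example to flag more potential churners at the cost of more false alarms.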

## 7. Evaluate the model 

In [20]:
# Mean accuracy on the test set
logistic.score(X_test, y_test)

0.7421383647798742

In [21]:
# compute confusion_matrix, classification_report
from sklearn.metrics import confusion_matrix, classification_report
cm = confusion_matrix(y_test,y_pred)
clf = classification_report(y_test,y_pred)

print('Confusion Matrix:\n', cm)
print('\nClassification Report:\n', clf)

Confusion Matrix:
 [[125  46]
 [ 36 111]]

Classification Report:
               precision    recall  f1-score   support

           0       0.78      0.73      0.75       171
           1       0.71      0.76      0.73       147

    accuracy                           0.74       318
   macro avg       0.74      0.74      0.74       318
weighted avg       0.74      0.74      0.74       318
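The figures in the report can be reproduced by hand from the confusion matrix, which helps show what each metric measures. The sketch below copies the four counts printed above and recomputes accuracy plus precision, recall and F1 for class 1 (churn); the variable names are illustrative.

```python
# Counts copied from the confusion matrix printed above
# (rows = actual class, columns = predicted class)
tn, fp = 125, 46   # actual 0: predicted 0, predicted 1
fn, tp = 36, 111   # actual 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_1 = tp / (tp + fp)   # of predicted churners, how many churned
recall_1 = tp / (tp + fn)      # of actual churners, how many were caught
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(round(accuracy, 4))     # 0.7421
print(round(precision_1, 2))  # 0.71
print(round(recall_1, 2))     # 0.76
print(round(f1_1, 2))         # 0.73
```

The accuracy matches the value returned by `logistic.score`, and the class 1 row matches the classification report.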



**What we learned:** 
- A quick revision of _Logistic Regression_
- A new term, **_Label Encoding_**, and how to use it. 🎉🎉