# Telco Customer Churn Project

Description of Project:

The Telco Customer Churn project predicts customer churn for a telecommunications company. Churn occurs when a customer discontinues service or switches to a competitor. The goal is to build a predictive model that identifies customers who are likely to churn, so the company can take proactive measures to retain them.
The dataset contains information about telecommunication customers: demographic details, service usage patterns, account information, and churn status. Preprocessing in the code below is limited to one-hot encoding of the categorical variables; handling of missing values and scaling of numerical features are not shown.

In [69]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [70]:
# Load the data
data = pd.read_csv('telco-custmer-churn.csv', delimiter='\t')

In [71]:
# Print the columns in the dataset
print(data.columns)

Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
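
Before encoding, it can help to check how each column was read. The following is a minimal sketch on a toy frame: the rows are made up, and `summarize_columns` is a hypothetical helper, not part of the original notebook. It reports the dtype, the number of distinct values, and the missing count per column, which is usually enough to spot identifier columns and numeric columns that were parsed as text.

```python
import pandas as pd

# Toy stand-in for the real file; the values are invented for illustration.
toy = pd.DataFrame({
    "customerID": ["A-1", "B-2", "C-3"],
    "tenure": [1, 0, 34],
    "MonthlyCharges": [29.85, 52.55, 56.95],
    "TotalCharges": ["29.85", " ", "1889.5"],  # assumed: could be read as text
    "Churn": ["No", "No", "Yes"],
})

def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return dtype, number of unique values, and missing count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_unique": df.nunique(),
        "n_missing": df.isna().sum(),
    })

summary = summarize_columns(toy)
print(summary)
```

A column whose `n_unique` equals the number of rows, like `customerID` here, is an identifier rather than a feature. Calling `summarize_columns(data)` on the loaded frame would give the same view of the real columns.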


In [72]:
# Convert categorical variables to numerical using one-hot encoding.
# Note: get_dummies encodes every text column, including customerID if it is stored as text.
data_encoded = pd.get_dummies(data, drop_first=True)
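
One possible refinement of the encoding step, shown as a sketch rather than the method used for the results in this notebook: drop the identifier column and coerce `TotalCharges` to a number before calling `get_dummies`. The toy rows, the assumption that `TotalCharges` may contain blank strings, the median fill, and the helper name `encode_features` are all illustrative.

```python
import pandas as pd

# Toy rows with made-up values; column names follow the dataset above.
toy = pd.DataFrame({
    "customerID": ["A-1", "B-2", "C-3", "D-4"],
    "Contract": ["Month-to-month", "One year", "Two year", "One year"],
    "TotalCharges": ["29.85", " ", "1889.5", "108.15"],
    "Churn": ["No", "No", "Yes", "Yes"],
})

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the identifier, coerce TotalCharges to numeric, then one-hot encode."""
    cleaned = df.drop(columns=["customerID"])
    # Entries that cannot be parsed as numbers (such as a blank string) become NaN.
    cleaned["TotalCharges"] = pd.to_numeric(cleaned["TotalCharges"], errors="coerce")
    # Assumption: a median fill is acceptable; dropping those rows is another option.
    cleaned["TotalCharges"] = cleaned["TotalCharges"].fillna(
        cleaned["TotalCharges"].median()
    )
    return pd.get_dummies(cleaned, drop_first=True)

encoded = encode_features(toy)
print(encoded.columns.tolist())
```

Without the `drop`, a text identifier would contribute one dummy column per customer, which gives the model nothing that generalizes to unseen customers.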


In [73]:
# Split the data into features (X) and target (y)
X = data_encoded.drop('Churn_Yes', axis=1)  # Remove the target column
y = data_encoded['Churn_Yes']

In [74]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
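
Because churners are a minority class, a stratified split is a common variant of the step above. The sketch below uses synthetic data (not the real dataset) with about 26.5% positives and passes `stratify` to `train_test_split` so that both partitions keep that proportion. The names `X_demo` and `y_demo` are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 265 positives and 735 negatives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 265 + [0] * 735)

# stratify=y_demo keeps the class ratio the same in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(f"train positive rate: {y_tr.mean():.3f}")
print(f"test positive rate:  {y_te.mean():.3f}")
```

In the notebook this would correspond to adding `stratify=y` to the existing call; whether it changes the reported metrics noticeably is not something this sketch establishes.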

In [79]:
# Train the Random Forest classifier
rf = RandomForestClassifier(random_state=42)  # fixed seed so repeated runs give the same model
rf.fit(X_train, y_train)

In [80]:
# Make predictions on the testing set
y_pred = rf.predict(X_test)


In [81]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)

In [82]:
# Print the results
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{confusion_mat}")

Accuracy: 0.7998580553584103
Confusion Matrix:
[[965  71]
 [211 162]]
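
Accuracy alone hides how the model treats the churn class. The sketch below derives precision, recall, and F1 for churners from the confusion matrix printed above, along with the accuracy of always predicting "no churn". It assumes scikit-learn's layout (rows are true classes, columns are predicted classes) and that the second class is churn, which follows from `y = data_encoded['Churn_Yes']`.

```python
# Counts copied from the confusion matrix printed above.
# Layout: rows are true classes, columns are predicted classes.
tn, fp = 965, 71    # true non-churners: predicted stay / predicted churn
fn, tp = 211, 162   # true churners:     predicted stay / predicted churn

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of predicted churners who really churned
recall = tp / (tp + fn)      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)
baseline = (tn + fp) / total  # accuracy of always predicting "no churn"

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} baseline={baseline:.3f}")
```

This prints accuracy 0.800, precision 0.695, recall 0.434, F1 0.535, and a baseline of 0.735. The model catches fewer than half of the actual churners in the test set, and its accuracy is about 6.5 percentage points above the majority-class baseline.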
