This project uses machine learning to predict whether a customer of a telecom service is likely to leave (churn).
👉 Churn occurs when a customer stops using a service.
Companies use churn prediction to identify customers who might leave and take actions to retain them.
Given customer data such as tenure, monthly charges, and contract type, the goal is to predict:
➡️ Will the customer churn? (Yes / No)
This is a binary classification problem.
The dataset contains customer information such as:
- Tenure (how long the customer stayed)
- Monthly Charges
- Total Charges
- Contract Type
- Internet Service
- Payment Method
- Churn (Target Variable)
Tools and libraries used:
- Python
- Pandas & NumPy → data handling
- Matplotlib & Seaborn → visualization
- Scikit-learn → machine learning models
- XGBoost → advanced model
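The dataset was loaded with pandas and cleaned as follows:
- Converted columns with incorrect data types to numeric (e.g., TotalCharges)
- Handled missing values by passing errors='coerce' during the conversion, so that entries that cannot be parsed become NaN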
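The cleaning step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the toy rows, the column names other than TotalCharges, the helper name `clean_total_charges`, and the choice to drop unparseable rows are all assumptions.

```python
import pandas as pd

# Toy stand-in for the raw data (hypothetical values); the real project
# would load its CSV with pd.read_csv instead.
raw = pd.DataFrame({
    "tenure": [1, 34, 0, 45],
    "MonthlyCharges": [29.85, 56.95, 52.55, 42.30],
    "TotalCharges": ["29.85", "1889.5", " ", "1840.75"],  # blank string = missing
})


def clean_total_charges(df: pd.DataFrame) -> pd.DataFrame:
    """Convert TotalCharges to numeric and drop rows that could not be parsed."""
    out = df.copy()
    # errors="coerce" turns unparseable strings into NaN instead of raising.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    # Assumed policy: drop the rows with NaN. Imputing a value is another option.
    return out.dropna(subset=["TotalCharges"]).reset_index(drop=True)


cleaned = clean_total_charges(raw)
print(cleaned.dtypes)
```

Dropping rows is only one way to deal with the resulting NaN values; filling them (for example with 0 for brand-new customers) may suit the data better.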
Exploratory data analysis:
- Analyzed churn distribution
- Studied relationships between features (e.g., tenure vs churn)
- Used Matplotlib and Seaborn plots to make these patterns easier to see
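As a rough sketch of the analysis described above, the snippet below computes the churn distribution and the churn rate per tenure band. The data is a toy example invented for illustration, and the column names `tenure` and `Churn` and the band edges are assumptions.

```python
import pandas as pd

# Toy data for illustration only (not taken from the project's dataset).
df = pd.DataFrame({
    "tenure": [1, 2, 5, 14, 30, 48, 60, 70],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

# Share of churned vs retained customers.
churn_share = df["Churn"].value_counts(normalize=True)

# Group tenure into bands (assumed edges) and compute the churn rate in each.
bands = pd.cut(
    df["tenure"],
    bins=[0, 12, 24, 48, 72],
    labels=["0-12", "13-24", "25-48", "49-72"],
    include_lowest=True,
)
churn_rate_by_band = (df["Churn"] == "Yes").groupby(bands, observed=False).mean()

print(churn_share)
print(churn_rate_by_band)
```

The resulting series can be passed straight to a bar plot (for example `churn_rate_by_band.plot(kind="bar")`) to visualize the relationship.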
Preparing the data for modeling:
- Converted categorical variables into numeric form using label encoding or one-hot encoding
- Split data into training and testing sets (80:20)
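The encoding and 80:20 split might look something like this. It is a sketch under assumptions: the rows are toy data, the column names are illustrative, and stratifying the split on the target is a choice made here rather than something the project states.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data for illustration only.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 62, 13, 16],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65,
                       89.10, 29.75, 56.15, 49.95, 18.95],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Month-to-month",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No", "No", "No", "Yes", "No"],
})

# Encode the binary target as 0/1.
y = (df["Churn"] == "Yes").astype(int)

# One-hot encode the categorical feature; drop_first avoids a redundant column.
X = pd.get_dummies(df.drop(columns="Churn"), columns=["Contract"], drop_first=True)

# 80:20 split; stratify keeps the churn ratio similar in both parts (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X.columns.tolist())
print(len(X_train), len(X_test))
```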
Trained multiple models:
- Logistic Regression
- Decision Tree
- Random Forest
- XGBoost
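A minimal sketch of training these models side by side is shown below. It uses synthetic data from scikit-learn in place of the churn dataset, and every hyperparameter is an illustrative assumption rather than the project's setting. XGBoost is added only if the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced binary data standing in for the prepared churn features.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.73, 0.27], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters below are placeholders, not tuned values.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

try:
    from xgboost import XGBClassifier  # optional dependency

    models["XGBoost"] = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
except ImportError:
    pass  # skip XGBoost when the package is not available

# fit() returns the estimator itself, so the dict holds the trained models.
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

for name, model in fitted.items():
    print(name, round(model.score(X_test, y_test), 3))
```

Keeping the models in a dictionary makes it easy to loop over them with the same evaluation code later.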
Each model was evaluated with multiple metrics:
- Accuracy
- Precision
- Recall
- F1 Score
- ROC-AUC
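The metrics listed above can all be computed with scikit-learn. The sketch below uses a small set of made-up labels and predicted probabilities; the helper name `evaluate` and the 0.5 decision threshold are assumptions for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return the five metrics for true labels and predicted churn probabilities."""
    # Turn probabilities into hard Yes/No predictions at the given threshold.
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # ROC-AUC is computed from the probabilities, not the hard predictions.
        "roc_auc": roc_auc_score(y_true, y_prob),
    }


# Made-up labels (1 = churn) and model probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_prob = [0.1, 0.4, 0.2, 0.6, 0.8, 0.3, 0.9, 0.2, 0.7, 0.1]

scores = evaluate(y_true, y_prob)
print(scores)
```

Because churners are usually the minority class, accuracy alone can look good even when few churners are caught, which is why recall, F1, and ROC-AUC are worth reporting alongside it.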
XGBoost performed the best of the four models, likely because of its ability to capture complex patterns and nonlinear relationships.
Key insights:
- Customers with low tenure are more likely to churn
- Month-to-month contracts have higher churn rates
- Higher monthly charges may increase churn probability
How to run:
- Clone the repository
- Install dependencies
- Run