# Telco Customer Churn ML Pipeline

This notebook implements a complete machine learning pipeline for predicting customer churn at a telecommunications company. The pipeline covers data loading, preprocessing, model training, and evaluation.

## 1. Import Required Libraries

Import essential Python libraries for data manipulation, visualization, and machine learning model development.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, accuracy_score
import joblib
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Set visualization style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## 2. Load Dataset

Load the Telco Customer Churn dataset from the raw data directory into a pandas DataFrame.

In [None]:
# Load the Telco Churn dataset (path is relative to the notebook's working directory)
DATA_PATH = '../data/raw/telco_churn.csv'
df = pd.read_csv(DATA_PATH)

print("Dataset loaded successfully!")
print(f"File path: {DATA_PATH}")
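
The load above uses a relative path, so it only works when the notebook is started from the expected directory. A minimal sketch of a more defensive loader follows. `load_csv_checked` is a hypothetical helper introduced here for illustration; it is not part of pandas or of this project.

```python
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Load a CSV into a DataFrame, failing early with a clear message.

    Hypothetical helper for illustration; not part of pandas.
    """
    csv_path = Path(path)
    # Fail before parsing if the file is not where we expect it.
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Expected dataset at {csv_path.resolve()}; "
            "check the notebook's working directory."
        )
    frame = pd.read_csv(csv_path)
    # A header-only file parses without error but has no rows to model.
    if frame.empty:
        raise ValueError(f"{csv_path} was read but contains no rows.")
    return frame
```

In this notebook it could be called as `df = load_csv_checked('../data/raw/telco_churn.csv')`. Reporting the resolved absolute path in the error makes a wrong working directory easier to spot.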

## 3. Dataset Overview

Inspect the dataset's dimensions, first rows, column types, and summary statistics.

In [None]:
# Display dataset shape
print("=" * 80)
print("DATASET SHAPE")
print("=" * 80)
print(f"Number of rows: {df.shape[0]}")
print(f"Number of columns: {df.shape[1]}")
print()

In [None]:
# Display first few rows
print("=" * 80)
print("FIRST 5 ROWS")
print("=" * 80)
print(df.head())

In [None]:
# Display dataset information
print("\n" + "=" * 80)
print("DATASET INFORMATION")
print("=" * 80)
df.info()

In [None]:
# Display statistical summary
print("\n" + "=" * 80)
print("STATISTICAL SUMMARY")
print("=" * 80)
print(df.describe())
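
By default, `df.describe()` summarizes only numeric columns, so any text columns are left out of the summary above. A minimal sketch of a per-column overview that covers every dtype follows. `column_overview` is a hypothetical helper, and the toy frame's column names are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def column_overview(frame):
    """Return one row per column: dtype, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(dropna=True),
    })


# Toy data for illustration only; these column names are assumptions.
toy = pd.DataFrame({
    "plan": ["basic", "pro", None, "basic"],
    "monthly_fee": [20.0, 55.5, 30.0, 20.0],
})
overview = column_overview(toy)
print(overview)
```

On the notebook's data this would be called as `column_overview(df)`. The built-in `df.describe(include='all')` is an alternative that reports counts and top values for non-numeric columns alongside the numeric statistics.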