# Project Walkthrough: Telecom Churn Prediction

This notebook provides a step-by-step demonstration of the end-to-end machine learning pipeline for the telecom churn use case. We will cover:

1.  **Data Loading & Validation:** Reading the raw data and validating it against a schema.
2.  **Model Training:** Training a logistic regression model on the processed data.
3.  **Making a Recommendation:** Using the trained model to make a prediction and generate a recommendation for a sample customer.

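This notebook calls the scripts in the `src/` directory directly, so it demonstrates the project's actual production code rather than a reimplementation of it. All paths are relative to the notebook's own directory, which sits one level below the project root (hence the `../` prefixes).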

## 1. Data Loading and Validation

First, we run the data loading script (`src/data_loader.py`) on our raw telecom churn CSV. This script will:

- Validate the data using the schema at `src/schemas/telecom_schema.yml`.
- Create a clean, processed Parquet file in `data/processed/`.
- Generate a JSON log file to confirm the process ran successfully.
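
The schema file's contents are not shown in this notebook, so the following is only a minimal sketch of what validation plus a JSON run log can look like. It assumes a hypothetical schema written as a Python dict (column name mapped to an expected type and optional allowed values). The real `telecom_schema.yml` and `src/data_loader.py` may be structured quite differently, and this sketch stops before the Parquet write.

```python
import csv
import io
import json

# Hypothetical schema: column -> (expected type, allowed values or None).
# This is NOT the contents of src/schemas/telecom_schema.yml; it only
# illustrates the kind of rules such a schema might encode.
SCHEMA = {
    "gender": (str, {"Female", "Male"}),
    "SeniorCitizen": (int, {0, 1}),
    "tenure": (int, None),
    "MonthlyCharges": (float, None),
}


def validate_rows(rows, schema):
    """Split CSV rows into type-converted valid rows and error messages."""
    clean, errors = [], []
    # Line 1 is the header; assumes one physical line per record.
    for line_no, row in enumerate(rows, start=2):
        missing = [col for col in schema if col not in row]
        if missing:
            errors.append(f"line {line_no}: missing columns {missing}")
            continue
        try:
            typed = {}
            for col, (col_type, allowed) in schema.items():
                value = col_type(row[col])  # e.g. int("abc") raises ValueError
                if allowed is not None and value not in allowed:
                    raise ValueError(f"{col}={value!r} not in {sorted(allowed)}")
                typed[col] = value
        except (TypeError, ValueError) as exc:
            errors.append(f"line {line_no}: {exc}")
            continue
        clean.append(typed)
    return clean, errors


raw_csv = """gender,SeniorCitizen,tenure,MonthlyCharges
Female,0,1,29.85
Male,2,34,56.95
"""
clean, errors = validate_rows(csv.DictReader(io.StringIO(raw_csv)), SCHEMA)

# Hypothetical run log; the fields the real loader writes are not shown here.
log = {"rows_valid": len(clean), "rows_rejected": len(errors), "errors": errors}
print(json.dumps(log, indent=2))
```

The second row is rejected because `SeniorCitizen=2` falls outside the allowed values, while the first row comes back with its fields converted to `int` and `float`.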

In [None]:
# We use the '!' symbol to execute shell commands from within the notebook.
!python ../src/data_loader.py telecom ../data/telecom_churn.csv

Now, let's inspect the processed data using pandas to confirm it's what we expect.

In [None]:
import pandas as pd

# Reading Parquet requires an engine such as pyarrow or fastparquet to be installed.
df = pd.read_parquet('../data/processed/telecom_processed.parquet')
df.head()

## 2. Model Training

Next, we run the training script (`src/train.py`). This script loads the processed Parquet file and:

- Applies a feature engineering pipeline (scaling numerical features, one-hot encoding categorical ones).
- Trains a `LogisticRegression` model.
- Saves the trained model pipeline and a JSON file with performance metrics to the `artifacts/` directory.
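
To make the two feature-engineering steps concrete, here is a plain-Python sketch of standard scaling and one-hot encoding. It does not reproduce `src/train.py`, whose implementation is not shown here; the column lists and training records are hypothetical, chosen from fields that appear in the sample customer later in this notebook.

```python
from statistics import mean, pstdev

# Hypothetical column split; the real pipeline's feature lists may differ.
NUMERIC = ["tenure", "MonthlyCharges"]
CATEGORICAL = ["gender", "Partner"]


def fit(records):
    """Learn scaling parameters and category vocabularies from training data."""
    params = {}
    for col in NUMERIC:
        values = [r[col] for r in records]
        std = pstdev(values)  # population std, as scikit-learn's StandardScaler uses
        params[col] = (mean(values), std if std > 0 else 1.0)  # guard constant columns
    vocab = {col: sorted({r[col] for r in records}) for col in CATEGORICAL}
    return params, vocab


def transform(record, params, vocab):
    """Map one record to a numeric feature vector using the fitted parameters."""
    features = []
    for col in NUMERIC:
        mu, sigma = params[col]
        features.append((record[col] - mu) / sigma)  # zero mean, unit variance
    for col in CATEGORICAL:
        # One-hot: 1.0 in the slot of the observed category; unseen categories give all zeros.
        features.extend(1.0 if record[col] == cat else 0.0 for cat in vocab[col])
    return features


# Hypothetical training records (not taken from the real dataset).
train = [
    {"gender": "Female", "Partner": "No", "tenure": 1, "MonthlyCharges": 30.0},
    {"gender": "Male", "Partner": "Yes", "tenure": 3, "MonthlyCharges": 50.0},
]
params, vocab = fit(train)
print(transform(train[0], params, vocab))  # [-1.0, -1.0, 1.0, 0.0, 1.0, 0.0]
```

The parameters are learned once from training data and then reused unchanged for every later prediction; saving the fitted preprocessing together with the model, as the script does with its pipeline, keeps training-time and prediction-time features consistent.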

In [None]:
!python ../src/train.py telecom

Let's view the metrics that were generated during training.

In [None]:
import json

with open('../artifacts/telecom_metrics.json', 'r') as f:
    metrics = json.load(f)

print(json.dumps(metrics, indent=4))
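
The notebook does not show which metrics `telecom_metrics.json` contains. Assuming it reports common binary-classification scores, this sketch shows how they are derived from the four confusion-matrix counts; the labels and predictions below are made up, with `1` meaning the customer churned.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # loyal customers wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners the model missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

For churn, recall (the share of actual churners the model flags) is often worth checking alongside accuracy, since every missed churner is a missed retention opportunity.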

## 3. Making a Recommendation

Finally, we import the recommendation module (`src/recommend.py`) and use it to predict churn for a single new customer, relying on the trained model pipeline saved in the previous step.

Here, we define a sample customer who looks like a high-risk case (short tenure, basic services).
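
The internals of `recommend_telecom_retention` are not shown in this notebook. As a hypothetical sketch only, this is how a logistic regression model turns an encoded customer into a churn probability (the sigmoid of a bias plus a weighted sum of features), and how a simple threshold rule could map that probability to an action. The weights, bias, 0.5 threshold, and action strings are all invented for illustration.

```python
import math


def churn_probability(features, weights, bias):
    """Logistic regression score: sigmoid of bias + weighted sum of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def recommend(probability, threshold=0.5):
    """Hypothetical rule mapping a churn probability to a retention action."""
    action = "offer retention incentive" if probability >= threshold else "no action needed"
    return {"churn_probability": round(probability, 3), "action": action}


# Hypothetical learned parameters for two standardized features: tenure, MonthlyCharges.
weights = [-1.2, 0.4]  # negative tenure weight: longer tenure lowers churn risk
bias = -0.5
features = [-1.0, -1.0]  # e.g. a short-tenure, low-charge customer after scaling

p = churn_probability(features, weights, bias)
print(recommend(p))
```

Here z is approximately 0.3, so the probability is about 0.574 and the customer crosses the 0.5 threshold.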

In [None]:
import sys
sys.path.append('..')  # make the project root importable so `src.recommend` resolves

from src.recommend import recommend_telecom_retention

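# Define a sample customer. Keys must use the raw column names from the schema; depending on
# how the training pipeline was built, omitting a feature column here may raise an error.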
sample_customer = {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "No",
    "Dependents": "No",
    "tenure": 1,
    "PhoneService": "No",
    "MonthlyCharges": 29.85,
    "TotalCharges": 29.85
}

# Get the recommendation
recommendation = recommend_telecom_retention(sample_customer)
print(recommendation)

This completes the end-to-end flow of the system: from raw data, to validated and processed data, to a trained model, to an actionable recommendation.