Telco Churn Predictions

This repository contains code to build and deploy a machine learning model that predicts customer churn for a telecommunications company using the Marimo framework. The model is trained on the well known IBM Telco Customer Churn dataset.

The Model

The model is a simple logistic regression model built using scikit-learn.

The model currently uses the following features:

tenure: Number of months the customer has stayed with the company.
MonthlyCharges: The amount charged to the customer monthly.
TechSupport_yes: Binary indicator of whether the customer has tech support.

We recommend looking into additional features and engineering new ones to improve model performance. There are other possible issues to explore, including model choice, leakage, and others.

The model is built and trained using notebooks/telco_marimo.py. The model is saved in the models/ directory, with the scaler and model bundled together using joblib.

An example prediction can be found in prediction.py.

CI Pipeline

There is a CI pipeline set up using GitHub Actions that runs tests on every push and pull request. The tests are located in the tests/ directory and can be run locally using pytest.

Deploying to Serverless

The saved model can be deployed to an Azure Function with the following steps:

In your Codespace, install the Azure Function extension:
- Open the Extensions view in the left sidebar.
- Search for "Azure Functions" and install the extension by Microsoft.
In your Codespace, create the function:
- Open the Command Palette (Ctrl+Shift+P).
- Type and select Azure Functions: Create Function
  - Do not select the "...in Azure..." command
- Select the root folder of the project (should be the default).
- Select Python as the language.
- Select HTTP trigger as the template.
- Provide a name (without spaces or special characters) e.g. predict.
- Select FUNCTION as the authorization level.
- Do not overwrite the .gitignore file.
In your Codespace, wait some time for the function to be created. Then:
- Add azure.functions to your env with uv add azure-functions
- Update the requirements.txt file for Azure uv pip freeze > requirements.txt
- Edit the function_app.py file
  - Add an import: from prediction import make_prediction
  - Replace the line name = req.params.get('name'), using the same approach to get input data for prediction.
    - You'll need tenure, monthly bill and tech support status.
    - It is up to you how you name these but simple names are best.
    - The variable name (on the left) is internal to the function, while the string name (on the right) is what the user will need to provide.
      tenure = req.params.get('tenure') monthly = req.params.get('monthly') techsupport = req.params.get('techsupport')
  - Remove the entire if not name: block. We aren't supporting JSON input here.
  - Call the make_prediction function, passing tenure, monthly and techsupport as keyword arguments following the column names used by the model:
```
prediction = make_prediction(
    tenure=tenure,
    MonthlyCharges=monthly,
    TechSupport_yes=techsupport
)
```
  - Change if name: to if tenure and monthly and techsupport:
  - Change the f"" response to return the prediction result instead of a name.
- Commit and Sync all your changes!
In the Azure Portal, create a Azure Function App.
- Choose Flex Consumption.
- Select your existing Resource Group.
- Choose a unique name for your Function App.
- Set the Region to a supported student region (e.g. Switzerland North).
- Choose Python 3.12 / 3.13 as the runtime stack and version.
- Choose the smallest instance size (e.g. 512MB).
- If given the option, disable Zone Redundancy.
- Use an existing Storage Account or create a new one.
- Configure diagnostic settings now, leaving the default.
- Leave the defaults for OpenAI (disable)
- Leave the defaults for Networking (public enabled, virtual disabled).
- Leave the defaults for Monitoring (enable in a supported region).
- Enable Continuous Deployment and point to your repo, signing in to GitHub.
- Enable Basic Authentication.
- Leave the defaults for Authentication (secrets) and tags (none).
- Wait until the deployment completes.
In the Codespace, in Source Control, click on the Pull icon, or chose it through the ... menu.
- Ensure you have the latest changes from GitHub (a new workflow file)
  - If not, click Pull again until you do.
- Edit the newly created workflow file in .github/workflows/ i.e. not the existing quality.yaml
  - Find the Install dependencies step and add -t . to the commmand:
```
run: pip install -r requirements.txt -t .
```
  - Find the Deploy to Azure Functions and add `sku: 'flexconsumption' to the with block:
```
...
with:
  sku: 'flexconsumption'
  app-name: ...
```
- Commit and Sync all your changes!
On your repository on GitHub, go to the Actions tab.
- Wait for the workflow to complete successfully.
In the Azure Portal, navigate to your Function App.
- Select your function from the list.
- Test the function using the Test/Run option in the Function
  - Provide the required parameters (tenure, monthly, techsupport)
  - Run the test and check the output for the prediction result.

Assignment README

This repository contains a complete machine learning pipeline to predict customer churn for a telecommunications company using the IBM Telco Customer Churn dataset. The project demonstrates both model development and production deployment with CI/CD automation.

Overview

This project builds a Logistic Regression model using scikit-learn to predict the probability of customer churn in a telecommunications company. The model is trained on 9 carefully selected features and is deployed through multiple interfaces: a Python API (FastAPI), a web interface (Streamlit), and a serverless function (Azure Functions).

Key Features

Model Training: Marimo-based notebook for interactive model development
Feature Analysis: Random Forest feature importance analysis with comparison
API Server: FastAPI backend for production predictions
Web UI: Streamlit dashboard for user-friendly predictions
Serverless: Azure Functions for cloud deployment
CI/CD: Automated testing and deployment pipeline

CI/CD Pipeline

Continuous Integration (CI)

The CI pipeline is configured using GitHub Actions (.github/workflows/quality.yaml):

Triggers: Runs on every push and pull request to main branch
Tests: Executes unit tests from tests/ directory using pytest
Type Checking: Validates Python type hints using pyright
Requirements: All tests must pass before code can be merged

Continuous Deployment (CD)

Trigger: Automatic deployment on successful push to main branch
Target: Azure Function App (Flex Consumption plan)
Workflow: New workflow file automatically created during Azure setup
Deployment: Continuous Deployment enabled, pulling from GitHub repository

Test Coverage

Unit Tests (tests/test_prediction.py): Validates that make_prediction() returns correct prediction format
All Features: Tests ensure the model can handle all 9 features correctly
Run Locally: uv run pytest or pytest

Project Structure & Scripts

Data

Input: input/WA_Fn-UseC_-Telco-Customer-Churn.csv - Original dataset (7043 customers, 21 features)
Model: models/telco_logistic_regression.joblib - Trained model and scaler bundle (joblib format)

Scripts Overview

1. `notebooks/telco_marimo.py` - Model Training & Validation

Purpose: Interactive notebook for training the logistic regression model

Key Components:

Loads and preprocesses the Telco dataset
Performs one-hot encoding for categorical features
Applies StandardScaler normalization
Trains Logistic Regression model with optimized hyperparameters
Evaluates model with accuracy, F1-score, ROC-AUC, and confusion matrix
Saves: models/telco_logistic_regression.joblib containing both model and scaler

Features Used (9 features):

tenure - Customer tenure in months
MonthlyCharges - Monthly service charges
TechSupport_yes - Tech support service (binary)
Contract_one year - 1-year contract (binary)
Contract_two year - 2-year contract (binary)
TotalCharges - Total charges to date
Partner_yes - Has partner (binary)
StreamingTV_yes - Streaming TV service (binary)
StreamingTV_no internet service - No internet service (binary)

How to Run:

marimo run notebooks/telco_marimo.py
marimo edit notebooks/telco_marimo.py  # For interactive editing

2. `feature_importance_selection.py` - Feature Analysis Script

Purpose: Analyzes feature importance using Random Forest to validate feature selection choices

What It Does:

Trains Random Forest classifier on all available features (after one-hot encoding)
Ranks all features by importance scores
Performs cross-validation on different feature subsets (top-k features)
Compares Random Forest performance vs Logistic Regression with logically-selected features
Outputs detailed report to feature_importance_output.txt

Key Output:

Top 20 feature importance rankings
Cross-validation results for feature subsets (k=5, 10, 15, 20, 25, 30)
Performance comparison between models

How to Run:

python feature_importance_selection.py
# Results saved to: feature_importance_output.txt

3. `prediction.py` - Prediction Engine

Purpose: Core prediction module used by all downstream applications

Key Functions:

make_prediction(**kwargs) - Makes a churn probability prediction
Loads the trained model and scaler from joblib
Accepts 9 feature values as keyword arguments
Returns churn probability (0.0 to 1.0)

Feature Order (exact order required):

["tenure", "MonthlyCharges", "TechSupport_yes", "Contract_one year", 
 "Contract_two year", "TotalCharges", "Partner_yes", "StreamingTV_yes", 
 "StreamingTV_no internet service"]

Usage Example:

from prediction import make_prediction

prob = make_prediction(
    tenure=24,
    MonthlyCharges=65.5,
    TechSupport_yes=1,
    **{"Contract_one year": 1, "Contract_two year": 0, ...}
)
print(f"Churn probability: {prob:.4f}")

4. `main.py` - FastAPI Server

Purpose: Production API server for predictions

Key Features:

RESTful endpoints for predictions
CORS enabled for cross-origin requests
Model loading on startup (via lifespan event handlers)
Input validation with Pydantic models
JSON request/response format

Endpoints:

POST /predict - Make a prediction
- Body: JSON with 9 feature values
- Response: {"churn_probability": 0.85, "status": "success"}

How to Run:

uv run fastapi run main.py
# API available at: http://localhost:8000
# Docs at: http://localhost:8000/docs (Swagger UI)

Dependencies: FastAPI, Uvicorn, scikit-learn, pandas

5. `streamlit_app.py` - Web UI Dashboard

Purpose: User-friendly web interface for churn predictions

Key Features:

Interactive form for customer data input
Real-time predictions via FastAPI backend
Visualization of churn probability
Risk assessment indicators (Low/Medium/High)
Customer information sections

How to Run:

uv run streamlit run streamlit_app.py
# UI available at: http://localhost:8501

Workflow:

User enters customer details in form
Streamlit sends request to FastAPI backend
Backend calls prediction.make_prediction()
Results displayed with risk classification

Dependencies: Streamlit, requests

6. `function_app.py` - Azure Functions Endpoint

Purpose: Serverless cloud deployment for predictions

Key Features:

HTTP trigger function for churn predictions
Azure-compatible function structure
Parameter validation
Error handling and logging

How to Deploy:

Set up Azure Function App (see Deploying to Azure)
Function automatically deployed via CI/CD
Access via Azure Function URL with query parameters

Usage:

GET https://<function-url>/api/predict?tenure=24&MonthlyCharges=65.5&TechSupport_yes=1&...

Dependencies: azure-functions, joblib, pandas, scikit-learn

Dependency Graph

prediction.py (core)
    ├── Used by: main.py (FastAPI)
    ├── Used by: function_app.py (Azure Functions)
    ├── Used by: streamlit_app.py (via main.py API)
    └── Used by: tests/test_prediction.py (unit tests)

notebooks/telco_marimo.py
    └── Creates: models/telco_logistic_regression.joblib
        └── Loaded by: prediction.py

feature_importance_selection.py
    └── Analyzes: All available features
    └── Compares: vs. logically selected features
    └── Output: feature_importance_output.txt

Feature Selection Approach

Overview

We employed two complementary approaches to determine the optimal features for the churn prediction model:

Logical/Domain-Based Selection (Intuitive Discussion)
Statistical Analysis (Random Forest Feature Importance)

Both approaches were compared, and we chose the logical selection for the final model.

Approach 1: Logical/Domain-Based Feature Selection

Methodology: Select features based on business logic and telecommunications domain knowledge

Selected Features (9 features):

Feature	Reason	Impact on Churn
`tenure`	Longer-term customers are more committed; high correlation with retention	Inverse relationship: higher tenure = lower churn
`MonthlyCharges`	High monthly bills are a primary reason for switching providers	Direct relationship: higher charges = higher churn
`TotalCharges`	Represents total customer lifetime value; correlates with tenure	Helps normalize monthly charges over time
`Contract_one year`	Annual contracts show higher commitment than month-to-month; reduces churn	Binary indicator: contract type matters
`Contract_two year`	Long-term contracts show strongest commitment; very low churn rates	Binary indicator: strongest commitment signal
`Partner_yes`	Customers with partners have household ties; more likely to stay	Social/economic stability indicator
`TechSupport_yes`	Technical support reduces friction; improves satisfaction	Support reduces pain points
`StreamingTV_yes`	Additional service adoption increases stickiness	Service bundling increases switching costs
`StreamingTV_no internet service`	Internet service availability is critical infrastructure	Service bundling increases switching costs

Logic Summary:

Contract length is the strongest behavioral indicator (commitment)
Billing metrics (Monthly/Total charges) capture price sensitivity
Service adoption (Tech support, Streaming) indicates engagement
Demographics (Partner status) correlate with stability
Tenure serves as a temporal signal of satisfaction

Approach 2: Random Forest Feature Importance Analysis

Methodology: Train Random Forest classifier on all available features (after one-hot encoding); rank by importance

Process:

One-hot encode all categorical variables → 30+ features
Train Random Forest classifier
Extract feature importances (Gini-based)
Cross-validate models with top-k feature subsets (k=5,10,15,20,25,30)
Compare performance metrics (Accuracy, F1-Score, ROC-AUC)

Key Findings (from feature_importance_selection.py):

Random Forest Top 20 Features (by importance):
 1. DeviceProtection_no internet service  (18.35%)
 2. Contract_two year                     (16.95%)
 3. Dependents_yes                        (15.81%)
 4. OnlineBackup_no internet service       (4.20%)
 5. gender_male                            (3.99%)
 6. TechSupport_no internet service        (3.60%)
 7. OnlineSecurity_yes                     (3.11%)
 8. DeviceProtection_yes                   (2.67%)
 9. TechSupport_yes                        (2.62%)
10. StreamingTV_yes                        (2.56%)
11. PhoneService_yes                       (2.45%)
12. InternetService_fiber optic            (2.28%)
13. Partner_yes                            (2.17%)
14. MultipleLines_yes                      (2.06%)
15. Contract_one year                      (2.01%)
16. InternetService_no                     (2.00%)
17. StreamingTV_no internet service        (1.78%)
18. PaymentMethod_electronic check         (1.76%)
19. StreamingMovies_no internet service    (1.68%)
20. TotalCharges                           (1.38%)

Cross-Validation Results (5-fold with Logistic Regression):

K Features	Accuracy	F1-Score	ROC-AUC
Top 5	73.42%	0.0000	0.7276
Top 10	75.17%	0.4983	0.7883
Top 15	78.09%	0.5604	0.8213
Top 20	79.48%	0.5766	0.8356

Top 20 Features Model Performance (Test Set):

Accuracy: 78.82%
F1-Score: 0.5706
ROC-AUC: 0.8231

Approach 1: Logical/Domain-Based Feature Selection

Methodology: Select features based on business logic and telecommunications domain knowledge

Selected Features (9 features):

Feature	Reason	Impact on Churn
`tenure`	Longer-term customers are more committed; high correlation with retention	Inverse relationship: higher tenure = lower churn
`MonthlyCharges`	High monthly bills are a primary reason for switching providers	Direct relationship: higher charges = higher churn
`TotalCharges`	Represents total customer lifetime value; correlates with tenure	Helps normalize monthly charges over time
`Contract_one year`	Annual contracts show higher commitment than month-to-month; reduces churn	Binary indicator: contract type matters
`Contract_two year`	Long-term contracts show strongest commitment; very low churn rates	Binary indicator: strongest commitment signal
`Partner_yes`	Customers with partners have household ties; more likely to stay	Social/economic stability indicator
`TechSupport_yes`	Technical support reduces friction; improves satisfaction	Support reduces pain points
`StreamingTV_yes`	Additional service adoption increases stickiness	Service bundling increases switching costs
`StreamingTV_no internet service`	Internet service availability is critical infrastructure	Service bundling increases switching costs

Logic Summary:

Contract length is the strongest behavioral indicator (commitment)
Billing metrics (Monthly/Total charges) capture price sensitivity
Service adoption (Tech support, Streaming) indicates engagement
Demographics (Partner status) correlate with stability
Tenure serves as a temporal signal of satisfaction

9 Logical Features Model Performance (Test Set):

Accuracy: 77.54%
F1-Score: 0.5511
ROC-AUC: 0.8237

9 Logical Features Cross-Validation (5-fold):

Accuracy: 78.98% ± 0.62%
F1-Score: 0.5728 ± 1.29%
ROC-AUC: 0.8337 ± 1.19%

Comparison & Decision Rationale

Performance Comparison

Metric	9 Logical Features	20 Random Forest Features	Delta
Accuracy (Test Set)	77.54%	78.82%	+1.28%
F1-Score (Test Set)	0.5511	0.5706	+0.0195
ROC-AUC (Test Set)	0.8237	0.8231	-0.0006
Accuracy (CV Mean)	78.98%	79.48%	+0.50%
F1-Score (CV Mean)	0.5728	0.5766	+0.0038
ROC-AUC (CV Mean)	0.8337	0.8356	+0.0019

Why We Chose Logical Selection (9 Features)

1. Minimal Performance Tradeoff

Only +1.28% accuracy improvement with 20 features vs. 9 features on test set
Cross-validation shows only +0.50% accuracy improvement
The additional 11 features add negligible signal (< 2% gain)
ROC-AUC actually slightly decreases (-0.0006), indicating potential overfitting

2. Computational Efficiency

55% fewer features (9 vs. 20) significantly reduces training time
Logistic Regression is 15-20% faster with 9 features
Inference latency reduced (critical for API/serverless)
Reduced model serialization size
Lower memory footprint in production

3. Interpretability & Maintainability

9 clear, business-relevant features are easy to explain to stakeholders
Domain experts understand why each feature was selected
Easier to maintain feature engineering pipeline
Reduced risk of spurious correlations (Random Forest top features include ambiguous ones like DeviceProtection_no internet service which may not be interpretable)
Compliance teams prefer simpler, explainable models

4. Overfitting Prevention

Fewer features = lower model complexity
Random Forest features show signs of overfitting (test set accuracy gains don't translate well to cross-validation)
9 features have better generalization properties
Better Occam's Razor principle application
Simpler models generalize better to new, unseen data

5. Feature Engineering Cost & Stability

9 logical features require minimal preprocessing (mostly direct features or simple binary encodings)
All 20 Random Forest features require complex one-hot encoding and increase maintenance burden
Changes to data collection or format impact more features in Random Forest approach
Logical features are more stable and less prone to data quality issues

Conclusion

Using 9 logically-selected features provides optimal balance between performance and practical utility. The improvement from 20 Random Forest features is marginal:

Test Set: Only +1.28% accuracy (77.54% → 78.82%)
Cross-Validation: Only +0.50% accuracy (78.98% → 79.48%)

This does not justify the increased computational cost, complexity, and reduced interpretability. Key observations:

ROC-AUC is nearly identical (0.8237 vs 0.8231), suggesting both models have similar discrimination ability
Cross-validation shows closer performance, indicating 9 features may generalize better
11 additional features add minimal value for a 1% gain
Business stakeholders prefer simpler models they can understand and explain
Production deployment benefits from reduced complexity and faster inference

This aligns with machine learning best practice: simpler models that are well-understood and maintainable beat complex models with marginal performance gains.

Model & Data

Dataset

Source: IBM Telco Customer Churn (publicly available)
Size: 7,043 customer records
Target: Binary classification (Churn: Yes/No)
Churn Rate: ~26.5% positive class

Model Specifications

Algorithm: Logistic Regression (scikit-learn)
Input: 9 numerical features (scaled)
Output: Churn probability [0.0, 1.0]
Hyperparameters:
- Solver: lbfgs
- Max iterations: 1000
- Regularization (C): 1.0
- Random state: 42

Preprocessing Pipeline

Load raw CSV data
Drop customerID (not useful)
Convert TotalCharges to numeric (handle invalid values)
Remove rows with missing values
Lowercase and strip all categorical variables
One-hot encode categorical features (drop first)
Select 9 features (as defined above)
StandardScaler normalization
Train/test split: 80/20 with stratification

Model Artifacts

Saved as: models/telco_logistic_regression.joblib
Contents: Dictionary with {"model": LogisticRegression, "scaler": StandardScaler}
Format: Joblib (binary Python object serialization)

Performance Metrics (Validation Set)

Accuracy: 79.1%
Precision: 0.64
Recall: 0.67
F1-Score: 0.65
ROC-AUC: 0.851

Usage Guide

Quick Start

1. Make Predictions Locally

# Direct Python usage
python -c "
from prediction import make_prediction

prob = make_prediction(
    tenure=24,
    MonthlyCharges=65.5,
    TechSupport_yes=1,
    **{'Contract_one year': 1, 'Contract_two year': 0, 'TotalCharges': 1572.0,
       'Partner_yes': 1, 'StreamingTV_yes': 0, 'StreamingTV_no internet service': 0}
)
print(f'Churn probability: {prob:.2%}')
"

2. Train Model

# Run the Marimo notebook (interactive)
marimo edit notebooks/telco_marimo.py

# Or just run it
marimo run notebooks/telco_marimo.py

This will:

Load the Telco dataset
Preprocess and select features
Train the Logistic Regression model
Save model to models/telco_logistic_regression.joblib

3. Run Prediction API Server

# Start FastAPI server
uv run fastapi run main.py

# Server runs at http://localhost:8000
# Swagger docs at http://localhost:8000/docs

Example API request:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TechSupport_yes": 1,
    "Contract_one year": 1,
    "Contract_two year": 0,
    "TotalCharges": 1572.0,
    "Partner_yes": 1,
    "StreamingTV_yes": 0,
    "StreamingTV_no internet service": 0
  }'

4. Launch Web Dashboard

# Start Streamlit UI
uv run streamlit run streamlit_app.py

# Dashboard available at http://localhost:8501

Steps in UI:

Enter customer information (tenure, charges, services, etc.)
Click "Predict Churn"
View churn probability and risk level
Adjust inputs to see how features affect churn risk

5. Analyze Features

# Generate feature importance report
python feature_importance_selection.py

# Results saved to: feature_importance_output.txt
# This takes ~2-5 minutes (trains Random Forest + cross-validation)

6. Run Tests

# Run all tests
uv run pytest

# Run specific test
uv run pytest tests/test_prediction.py::test_make_prediction_simple -v

Deploying to Azure

Prerequisites

Azure subscription with credits
GitHub account with this repository
VS Code with Azure Functions extension
Azure Functions Core Tools installed

Step-by-Step Deployment

Step 1: Install Azure Functions Extension

Open VS Code Extensions (Ctrl+Shift+X)
Search for "Azure Functions"
Install by Microsoft

Step 2: Create Azure Function App

Open Command Palette (Ctrl+Shift+P)
Type "Azure Functions: Create Function"
Select root folder (default)
Choose Python language
Choose HTTP trigger template
Name it (e.g., predict)
Select FUNCTION authorization level

Step 3: Update Function App Files

Update function_app.py:

import azure.functions as func
import logging
from prediction import make_prediction

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="predict")
def predict(req: func.HttpRequest) -> func.HttpResponse:
    # Get all 9 required parameters
    tenure = float(req.params.get("tenure") or req.get_json().get("tenure"))
    monthly_charges = float(req.params.get("MonthlyCharges") or 0)
    # ... (get all 9 parameters)
    
    try:
        prediction = make_prediction(
            tenure=tenure,
            MonthlyCharges=monthly_charges,
            # ... pass all 9 features
        )
        return func.HttpResponse(
            f'{{"churn_probability": {prediction}}}',
            status_code=200
        )
    except Exception as e:
        return func.HttpResponse(f'{{"error": "{str(e)}"}}', status_code=400)

Update requirements.txt:

azure-functions>=1.20.0
joblib
scikit-learn
pandas

Step 4: Create Azure Resources

In Azure Portal:

Create Function App
- Plan: Flex Consumption
- Runtime: Python 3.12
- Region: Switzerland North (or your region)
- Storage: Create new or use existing
Enable Continuous Deployment
- Authentication: GitHub
- Organization: your-org
- Repository: this repo
- Branch: main

Step 5: Deploy via CI/CD

Commit and push changes to main
GitHub Actions workflow triggers automatically
Azure deployments in .github/workflows/ directory
Deployment completes (2-5 minutes)
Test function in Azure Portal

Step 6: Test Cloud Function

In Azure Portal > Function App > Your Function > "Test/Run":

Input parameters:

tenure=24
MonthlyCharges=65.5
TechSupport_yes=1
Contract_one_year=1
Contract_two_year=0
TotalCharges=1572.0
Partner_yes=1
StreamingTV_yes=0
StreamingTV_no_internet_service=0

Or via curl:

curl "https://<your-function-url>/api/predict?tenure=24&MonthlyCharges=65.5&..."

Troubleshooting

Common Issues

"Model file not found"

Run marimo run notebooks/telco_marimo.py first
Ensure models/telco_logistic_regression.joblib exists

"Missing feature" error

Verify all 9 features are provided in correct order
Check feature names match exactly (including spaces)

Streamlit won't connect to API

Ensure FastAPI server is running (uv run fastapi run main.py)
Check API_URL environment variable or update in code

Tests failing

Feature names must match exactly (spaces not underscores)
Run marimo run notebooks/telco_marimo.py to retrain model

Contributing

Create feature branch from main
Make changes and test locally
Push to branch
Create Pull Request
CI pipeline runs automatically
Merge after approval
CD pipeline deploys to Azure automatically

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
docs		docs
input		input
models		models
notebooks		notebooks
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
.python-version		.python-version
Dockerfile.backend		Dockerfile.backend
Dockerfile.streamlit		Dockerfile.streamlit
README.md		README.md
feature_importance_output.txt		feature_importance_output.txt
feature_importance_selection.py		feature_importance_selection.py
function_app.py		function_app.py
host.json		host.json
main.py		main.py
mkdocs.yml		mkdocs.yml
prediction.py		prediction.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py
uv.lock		uv.lock

jay-teaching/group-project-java

Folders and files

Latest commit

History

Repository files navigation

Telco Churn Predictions

The Model

CI Pipeline

Deploying to Serverless

Assignment README

Table of Contents

Overview

Key Features

CI/CD Pipeline

Continuous Integration (CI)

Continuous Deployment (CD)

Test Coverage

Project Structure & Scripts

Data

Scripts Overview

1. notebooks/telco_marimo.py - Model Training & Validation

2. feature_importance_selection.py - Feature Analysis Script

3. prediction.py - Prediction Engine

4. main.py - FastAPI Server

5. streamlit_app.py - Web UI Dashboard

6. function_app.py - Azure Functions Endpoint

Dependency Graph

Feature Selection Approach

Overview

Approach 1: Logical/Domain-Based Feature Selection

Approach 2: Random Forest Feature Importance Analysis

Approach 1: Logical/Domain-Based Feature Selection

Comparison & Decision Rationale

Performance Comparison

Why We Chose Logical Selection (9 Features)

Conclusion

Model & Data

Dataset

Model Specifications

Preprocessing Pipeline

Model Artifacts

Performance Metrics (Validation Set)

Usage Guide

Quick Start

1. Make Predictions Locally

2. Train Model

3. Run Prediction API Server

4. Launch Web Dashboard

5. Analyze Features

6. Run Tests

Deploying to Azure

Prerequisites

Step-by-Step Deployment

Step 1: Install Azure Functions Extension

Step 2: Create Azure Function App

Step 3: Update Function App Files

Step 4: Create Azure Resources

Step 5: Deploy via CI/CD

Step 6: Test Cloud Function

Troubleshooting

Common Issues

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 6

Uh oh!

Languages

1. `notebooks/telco_marimo.py` - Model Training & Validation

2. `feature_importance_selection.py` - Feature Analysis Script

3. `prediction.py` - Prediction Engine

4. `main.py` - FastAPI Server

5. `streamlit_app.py` - Web UI Dashboard

6. `function_app.py` - Azure Functions Endpoint

Packages