Skip to content

jay-teaching/group-project-java

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Telco Churn Predictions

This repository contains code to build and deploy a machine learning model that predicts customer churn for a telecommunications company using the Marimo framework. The model is trained on the well known IBM Telco Customer Churn dataset.

The Model

The model is a simple logistic regression model built using scikit-learn.

The model currently uses the following features:

  • tenure: Number of months the customer has stayed with the company.
  • MonthlyCharges: The amount charged to the customer monthly.
  • TechSupport_yes: Binary indicator of whether the customer has tech support.

We recommend looking into additional features and engineering new ones to improve model performance. There are other possible issues to explore, including model choice, leakage, and others.

The model is built and trained using notebooks/telco_marimo.py. The model is saved in the models/ directory, with the scaler and model bundled together using joblib.

An example prediction can be found in prediction.py.

CI Pipeline

There is a CI pipeline set up using GitHub Actions that runs tests on every push and pull request. The tests are located in the tests/ directory and can be run locally using pytest.

Deploying to Serverless

The saved model can be deployed to an Azure Function with the following steps:

  1. In your Codespace, install the Azure Function extension:

    • Open the Extensions view in the left sidebar.
    • Search for "Azure Functions" and install the extension by Microsoft.
  2. In your Codespace, create the function:

    • Open the Command Palette (Ctrl+Shift+P).
    • Type and select Azure Functions: Create Function
      • Do not select the "...in Azure..." command
    • Select the root folder of the project (should be the default).
    • Select Python as the language.
    • Select HTTP trigger as the template.
    • Provide a name (without spaces or special characters) e.g. predict.
    • Select FUNCTION as the authorization level.
    • Do not overwrite the .gitignore file.
  3. In your Codespace, wait some time for the function to be created. Then:

    • Add azure.functions to your env with uv add azure-functions
    • Update the requirements.txt file for Azure uv pip freeze > requirements.txt
    • Edit the function_app.py file
      • Add an import: from prediction import make_prediction
      • Replace the line name = req.params.get('name'), using the same approach to get input data for prediction.
        • You'll need tenure, monthly bill and tech support status.
        • It is up to you how you name these but simple names are best.
        • The variable name (on the left) is internal to the function, while the string name (on the right) is what the user will need to provide.
          tenure = req.params.get('tenure')
          monthly = req.params.get('monthly')
          techsupport = req.params.get('techsupport')
      • Remove the entire if not name: block. We aren't supporting JSON input here.
      • Call the make_prediction function, passing tenure, monthly and techsupport as keyword arguments following the column names used by the model:
        prediction = make_prediction(
            tenure=tenure,
            MonthlyCharges=monthly,
            TechSupport_yes=techsupport
        )
      • Change if name: to if tenure and monthly and techsupport:
      • Change the f"" response to return the prediction result instead of a name.
    • Commit and Sync all your changes!
  4. In the Azure Portal, create a Azure Function App.

    • Choose Flex Consumption.
    • Select your existing Resource Group.
    • Choose a unique name for your Function App.
    • Set the Region to a supported student region (e.g. Switzerland North).
    • Choose Python 3.12 / 3.13 as the runtime stack and version.
    • Choose the smallest instance size (e.g. 512MB).
    • If given the option, disable Zone Redundancy.
    • Use an existing Storage Account or create a new one.
    • Configure diagnostic settings now, leaving the default.
    • Leave the defaults for OpenAI (disable)
    • Leave the defaults for Networking (public enabled, virtual disabled).
    • Leave the defaults for Monitoring (enable in a supported region).
    • Enable Continuous Deployment and point to your repo, signing in to GitHub.
    • Enable Basic Authentication.
    • Leave the defaults for Authentication (secrets) and tags (none).
    • Wait until the deployment completes.
  5. In the Codespace, in Source Control, click on the Pull icon, or chose it through the ... menu.

    • Ensure you have the latest changes from GitHub (a new workflow file)
      • If not, click Pull again until you do.
    • Edit the newly created workflow file in .github/workflows/ i.e. not the existing quality.yaml
      • Find the Install dependencies step and add -t . to the commmand:

        run: pip install -r requirements.txt -t .
      • Find the Deploy to Azure Functions and add `sku: 'flexconsumption' to the with block:

        ...
        with:
          sku: 'flexconsumption'
          app-name: ...
    • Commit and Sync all your changes!
  6. On your repository on GitHub, go to the Actions tab.

    • Wait for the workflow to complete successfully.
  7. In the Azure Portal, navigate to your Function App.

    • Select your function from the list.
    • Test the function using the Test/Run option in the Function
      • Provide the required parameters (tenure, monthly, techsupport)
      • Run the test and check the output for the prediction result.

Assignment README

This repository contains a complete machine learning pipeline to predict customer churn for a telecommunications company using the IBM Telco Customer Churn dataset. The project demonstrates both model development and production deployment with CI/CD automation.

Table of Contents

  1. Overview
  2. CI/CD Pipeline
  3. Project Structure & Scripts
  4. Feature Selection Approach
  5. Model & Data
  6. Usage Guide
  7. Deploying to Azure

Overview

This project builds a Logistic Regression model using scikit-learn to predict the probability of customer churn in a telecommunications company. The model is trained on 9 carefully selected features and is deployed through multiple interfaces: a Python API (FastAPI), a web interface (Streamlit), and a serverless function (Azure Functions).

Key Features

  • Model Training: Marimo-based notebook for interactive model development
  • Feature Analysis: Random Forest feature importance analysis with comparison
  • API Server: FastAPI backend for production predictions
  • Web UI: Streamlit dashboard for user-friendly predictions
  • Serverless: Azure Functions for cloud deployment
  • CI/CD: Automated testing and deployment pipeline

CI/CD Pipeline

Continuous Integration (CI)

The CI pipeline is configured using GitHub Actions (.github/workflows/quality.yaml):

  • Triggers: Runs on every push and pull request to main branch
  • Tests: Executes unit tests from tests/ directory using pytest
  • Type Checking: Validates Python type hints using pyright
  • Requirements: All tests must pass before code can be merged

Continuous Deployment (CD)

  • Trigger: Automatic deployment on successful push to main branch
  • Target: Azure Function App (Flex Consumption plan)
  • Workflow: New workflow file automatically created during Azure setup
  • Deployment: Continuous Deployment enabled, pulling from GitHub repository

Test Coverage

  • Unit Tests (tests/test_prediction.py): Validates that make_prediction() returns correct prediction format
  • All Features: Tests ensure the model can handle all 9 features correctly
  • Run Locally: uv run pytest or pytest

Project Structure & Scripts

Data

  • Input: input/WA_Fn-UseC_-Telco-Customer-Churn.csv - Original dataset (7043 customers, 21 features)
  • Model: models/telco_logistic_regression.joblib - Trained model and scaler bundle (joblib format)

Scripts Overview

1. notebooks/telco_marimo.py - Model Training & Validation

Purpose: Interactive notebook for training the logistic regression model

Key Components:

  • Loads and preprocesses the Telco dataset
  • Performs one-hot encoding for categorical features
  • Applies StandardScaler normalization
  • Trains Logistic Regression model with optimized hyperparameters
  • Evaluates model with accuracy, F1-score, ROC-AUC, and confusion matrix
  • Saves: models/telco_logistic_regression.joblib containing both model and scaler

Features Used (9 features):

  1. tenure - Customer tenure in months
  2. MonthlyCharges - Monthly service charges
  3. TechSupport_yes - Tech support service (binary)
  4. Contract_one year - 1-year contract (binary)
  5. Contract_two year - 2-year contract (binary)
  6. TotalCharges - Total charges to date
  7. Partner_yes - Has partner (binary)
  8. StreamingTV_yes - Streaming TV service (binary)
  9. StreamingTV_no internet service - No internet service (binary)

How to Run:

marimo run notebooks/telco_marimo.py
marimo edit notebooks/telco_marimo.py  # For interactive editing

2. feature_importance_selection.py - Feature Analysis Script

Purpose: Analyzes feature importance using Random Forest to validate feature selection choices

What It Does:

  • Trains Random Forest classifier on all available features (after one-hot encoding)
  • Ranks all features by importance scores
  • Performs cross-validation on different feature subsets (top-k features)
  • Compares Random Forest performance vs Logistic Regression with logically-selected features
  • Outputs detailed report to feature_importance_output.txt

Key Output:

  • Top 20 feature importance rankings
  • Cross-validation results for feature subsets (k=5, 10, 15, 20, 25, 30)
  • Performance comparison between models

How to Run:

python feature_importance_selection.py
# Results saved to: feature_importance_output.txt

3. prediction.py - Prediction Engine

Purpose: Core prediction module used by all downstream applications

Key Functions:

  • make_prediction(**kwargs) - Makes a churn probability prediction
  • Loads the trained model and scaler from joblib
  • Accepts 9 feature values as keyword arguments
  • Returns churn probability (0.0 to 1.0)

Feature Order (exact order required):

["tenure", "MonthlyCharges", "TechSupport_yes", "Contract_one year", 
 "Contract_two year", "TotalCharges", "Partner_yes", "StreamingTV_yes", 
 "StreamingTV_no internet service"]

Usage Example:

from prediction import make_prediction

prob = make_prediction(
    tenure=24,
    MonthlyCharges=65.5,
    TechSupport_yes=1,
    **{"Contract_one year": 1, "Contract_two year": 0, ...}
)
print(f"Churn probability: {prob:.4f}")

4. main.py - FastAPI Server

Purpose: Production API server for predictions

Key Features:

  • RESTful endpoints for predictions
  • CORS enabled for cross-origin requests
  • Model loading on startup (via lifespan event handlers)
  • Input validation with Pydantic models
  • JSON request/response format

Endpoints:

  • POST /predict - Make a prediction
    • Body: JSON with 9 feature values
    • Response: {"churn_probability": 0.85, "status": "success"}

How to Run:

uv run fastapi run main.py
# API available at: http://localhost:8000
# Docs at: http://localhost:8000/docs (Swagger UI)

Dependencies: FastAPI, Uvicorn, scikit-learn, pandas


5. streamlit_app.py - Web UI Dashboard

Purpose: User-friendly web interface for churn predictions

Key Features:

  • Interactive form for customer data input
  • Real-time predictions via FastAPI backend
  • Visualization of churn probability
  • Risk assessment indicators (Low/Medium/High)
  • Customer information sections

How to Run:

uv run streamlit run streamlit_app.py
# UI available at: http://localhost:8501

Workflow:

  1. User enters customer details in form
  2. Streamlit sends request to FastAPI backend
  3. Backend calls prediction.make_prediction()
  4. Results displayed with risk classification

Dependencies: Streamlit, requests


6. function_app.py - Azure Functions Endpoint

Purpose: Serverless cloud deployment for predictions

Key Features:

  • HTTP trigger function for churn predictions
  • Azure-compatible function structure
  • Parameter validation
  • Error handling and logging

How to Deploy:

  1. Set up Azure Function App (see Deploying to Azure)
  2. Function automatically deployed via CI/CD
  3. Access via Azure Function URL with query parameters

Usage:

GET https://<function-url>/api/predict?tenure=24&MonthlyCharges=65.5&TechSupport_yes=1&...

Dependencies: azure-functions, joblib, pandas, scikit-learn


Dependency Graph

prediction.py (core)
    ├── Used by: main.py (FastAPI)
    ├── Used by: function_app.py (Azure Functions)
    ├── Used by: streamlit_app.py (via main.py API)
    └── Used by: tests/test_prediction.py (unit tests)

notebooks/telco_marimo.py
    └── Creates: models/telco_logistic_regression.joblib
        └── Loaded by: prediction.py

feature_importance_selection.py
    └── Analyzes: All available features
    └── Compares: vs. logically selected features
    └── Output: feature_importance_output.txt

Feature Selection Approach

Overview

We employed two complementary approaches to determine the optimal features for the churn prediction model:

  1. Logical/Domain-Based Selection (Intuitive Discussion)
  2. Statistical Analysis (Random Forest Feature Importance)

Both approaches were compared, and we chose the logical selection for the final model.


Approach 1: Logical/Domain-Based Feature Selection

Methodology: Select features based on business logic and telecommunications domain knowledge

Selected Features (9 features):

Feature Reason Impact on Churn
tenure Longer-term customers are more committed; high correlation with retention Inverse relationship: higher tenure = lower churn
MonthlyCharges High monthly bills are a primary reason for switching providers Direct relationship: higher charges = higher churn
TotalCharges Represents total customer lifetime value; correlates with tenure Helps normalize monthly charges over time
Contract_one year Annual contracts show higher commitment than month-to-month; reduces churn Binary indicator: contract type matters
Contract_two year Long-term contracts show strongest commitment; very low churn rates Binary indicator: strongest commitment signal
Partner_yes Customers with partners have household ties; more likely to stay Social/economic stability indicator
TechSupport_yes Technical support reduces friction; improves satisfaction Support reduces pain points
StreamingTV_yes Additional service adoption increases stickiness Service bundling increases switching costs
StreamingTV_no internet service Internet service availability is critical infrastructure Service bundling increases switching costs

Logic Summary:

  • Contract length is the strongest behavioral indicator (commitment)
  • Billing metrics (Monthly/Total charges) capture price sensitivity
  • Service adoption (Tech support, Streaming) indicates engagement
  • Demographics (Partner status) correlate with stability
  • Tenure serves as a temporal signal of satisfaction

Approach 2: Random Forest Feature Importance Analysis

Methodology: Train Random Forest classifier on all available features (after one-hot encoding); rank by importance

Process:

  1. One-hot encode all categorical variables → 30+ features
  2. Train Random Forest classifier
  3. Extract feature importances (Gini-based)
  4. Cross-validate models with top-k feature subsets (k=5,10,15,20,25,30)
  5. Compare performance metrics (Accuracy, F1-Score, ROC-AUC)

Key Findings (from feature_importance_selection.py):

Random Forest Top 20 Features (by importance):
 1. DeviceProtection_no internet service  (18.35%)
 2. Contract_two year                     (16.95%)
 3. Dependents_yes                        (15.81%)
 4. OnlineBackup_no internet service       (4.20%)
 5. gender_male                            (3.99%)
 6. TechSupport_no internet service        (3.60%)
 7. OnlineSecurity_yes                     (3.11%)
 8. DeviceProtection_yes                   (2.67%)
 9. TechSupport_yes                        (2.62%)
10. StreamingTV_yes                        (2.56%)
11. PhoneService_yes                       (2.45%)
12. InternetService_fiber optic            (2.28%)
13. Partner_yes                            (2.17%)
14. MultipleLines_yes                      (2.06%)
15. Contract_one year                      (2.01%)
16. InternetService_no                     (2.00%)
17. StreamingTV_no internet service        (1.78%)
18. PaymentMethod_electronic check         (1.76%)
19. StreamingMovies_no internet service    (1.68%)
20. TotalCharges                           (1.38%)

Cross-Validation Results (5-fold with Logistic Regression):

K Features Accuracy F1-Score ROC-AUC
Top 5 73.42% 0.0000 0.7276
Top 10 75.17% 0.4983 0.7883
Top 15 78.09% 0.5604 0.8213
Top 20 79.48% 0.5766 0.8356

Top 20 Features Model Performance (Test Set):

  • Accuracy: 78.82%
  • F1-Score: 0.5706
  • ROC-AUC: 0.8231

Approach 1: Logical/Domain-Based Feature Selection

Methodology: Select features based on business logic and telecommunications domain knowledge

Selected Features (9 features):

Feature Reason Impact on Churn
tenure Longer-term customers are more committed; high correlation with retention Inverse relationship: higher tenure = lower churn
MonthlyCharges High monthly bills are a primary reason for switching providers Direct relationship: higher charges = higher churn
TotalCharges Represents total customer lifetime value; correlates with tenure Helps normalize monthly charges over time
Contract_one year Annual contracts show higher commitment than month-to-month; reduces churn Binary indicator: contract type matters
Contract_two year Long-term contracts show strongest commitment; very low churn rates Binary indicator: strongest commitment signal
Partner_yes Customers with partners have household ties; more likely to stay Social/economic stability indicator
TechSupport_yes Technical support reduces friction; improves satisfaction Support reduces pain points
StreamingTV_yes Additional service adoption increases stickiness Service bundling increases switching costs
StreamingTV_no internet service Internet service availability is critical infrastructure Service bundling increases switching costs

Logic Summary:

  • Contract length is the strongest behavioral indicator (commitment)
  • Billing metrics (Monthly/Total charges) capture price sensitivity
  • Service adoption (Tech support, Streaming) indicates engagement
  • Demographics (Partner status) correlate with stability
  • Tenure serves as a temporal signal of satisfaction

9 Logical Features Model Performance (Test Set):

  • Accuracy: 77.54%
  • F1-Score: 0.5511
  • ROC-AUC: 0.8237

9 Logical Features Cross-Validation (5-fold):

  • Accuracy: 78.98% ± 0.62%
  • F1-Score: 0.5728 ± 1.29%
  • ROC-AUC: 0.8337 ± 1.19%

Comparison & Decision Rationale

Performance Comparison

Metric 9 Logical Features 20 Random Forest Features Delta
Accuracy (Test Set) 77.54% 78.82% +1.28%
F1-Score (Test Set) 0.5511 0.5706 +0.0195
ROC-AUC (Test Set) 0.8237 0.8231 -0.0006
Accuracy (CV Mean) 78.98% 79.48% +0.50%
F1-Score (CV Mean) 0.5728 0.5766 +0.0038
ROC-AUC (CV Mean) 0.8337 0.8356 +0.0019

Why We Chose Logical Selection (9 Features)

1. Minimal Performance Tradeoff

  • Only +1.28% accuracy improvement with 20 features vs. 9 features on test set
  • Cross-validation shows only +0.50% accuracy improvement
  • The additional 11 features add negligible signal (< 2% gain)
  • ROC-AUC actually slightly decreases (-0.0006), indicating potential overfitting

2. Computational Efficiency

  • 55% fewer features (9 vs. 20) significantly reduces training time
  • Logistic Regression is 15-20% faster with 9 features
  • Inference latency reduced (critical for API/serverless)
  • Reduced model serialization size
  • Lower memory footprint in production

3. Interpretability & Maintainability

  • 9 clear, business-relevant features are easy to explain to stakeholders
  • Domain experts understand why each feature was selected
  • Easier to maintain feature engineering pipeline
  • Reduced risk of spurious correlations (Random Forest top features include ambiguous ones like DeviceProtection_no internet service which may not be interpretable)
  • Compliance teams prefer simpler, explainable models

4. Overfitting Prevention

  • Fewer features = lower model complexity
  • Random Forest features show signs of overfitting (test set accuracy gains don't translate well to cross-validation)
  • 9 features have better generalization properties
  • Better Occam's Razor principle application
  • Simpler models generalize better to new, unseen data

5. Feature Engineering Cost & Stability

  • 9 logical features require minimal preprocessing (mostly direct features or simple binary encodings)
  • All 20 Random Forest features require complex one-hot encoding and increase maintenance burden
  • Changes to data collection or format impact more features in Random Forest approach
  • Logical features are more stable and less prone to data quality issues

Conclusion

Using 9 logically-selected features provides optimal balance between performance and practical utility. The improvement from 20 Random Forest features is marginal:

  • Test Set: Only +1.28% accuracy (77.54% → 78.82%)
  • Cross-Validation: Only +0.50% accuracy (78.98% → 79.48%)

This does not justify the increased computational cost, complexity, and reduced interpretability. Key observations:

  1. ROC-AUC is nearly identical (0.8237 vs 0.8231), suggesting both models have similar discrimination ability
  2. Cross-validation shows closer performance, indicating 9 features may generalize better
  3. 11 additional features add minimal value for a 1% gain
  4. Business stakeholders prefer simpler models they can understand and explain
  5. Production deployment benefits from reduced complexity and faster inference

This aligns with machine learning best practice: simpler models that are well-understood and maintainable beat complex models with marginal performance gains.


Model & Data

Dataset

  • Source: IBM Telco Customer Churn (publicly available)
  • Size: 7,043 customer records
  • Target: Binary classification (Churn: Yes/No)
  • Churn Rate: ~26.5% positive class

Model Specifications

  • Algorithm: Logistic Regression (scikit-learn)
  • Input: 9 numerical features (scaled)
  • Output: Churn probability [0.0, 1.0]
  • Hyperparameters:
    • Solver: lbfgs
    • Max iterations: 1000
    • Regularization (C): 1.0
    • Random state: 42

Preprocessing Pipeline

  1. Load raw CSV data
  2. Drop customerID (not useful)
  3. Convert TotalCharges to numeric (handle invalid values)
  4. Remove rows with missing values
  5. Lowercase and strip all categorical variables
  6. One-hot encode categorical features (drop first)
  7. Select 9 features (as defined above)
  8. StandardScaler normalization
  9. Train/test split: 80/20 with stratification

Model Artifacts

  • Saved as: models/telco_logistic_regression.joblib
  • Contents: Dictionary with {"model": LogisticRegression, "scaler": StandardScaler}
  • Format: Joblib (binary Python object serialization)

Performance Metrics (Validation Set)

  • Accuracy: 79.1%
  • Precision: 0.64
  • Recall: 0.67
  • F1-Score: 0.65
  • ROC-AUC: 0.851

Usage Guide

Quick Start

1. Make Predictions Locally

# Direct Python usage
python -c "
from prediction import make_prediction

prob = make_prediction(
    tenure=24,
    MonthlyCharges=65.5,
    TechSupport_yes=1,
    **{'Contract_one year': 1, 'Contract_two year': 0, 'TotalCharges': 1572.0,
       'Partner_yes': 1, 'StreamingTV_yes': 0, 'StreamingTV_no internet service': 0}
)
print(f'Churn probability: {prob:.2%}')
"

2. Train Model

# Run the Marimo notebook (interactive)
marimo edit notebooks/telco_marimo.py

# Or just run it
marimo run notebooks/telco_marimo.py

This will:

  • Load the Telco dataset
  • Preprocess and select features
  • Train the Logistic Regression model
  • Save model to models/telco_logistic_regression.joblib

3. Run Prediction API Server

# Start FastAPI server
uv run fastapi run main.py

# Server runs at http://localhost:8000
# Swagger docs at http://localhost:8000/docs

Example API request:

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "tenure": 24,
    "MonthlyCharges": 65.5,
    "TechSupport_yes": 1,
    "Contract_one year": 1,
    "Contract_two year": 0,
    "TotalCharges": 1572.0,
    "Partner_yes": 1,
    "StreamingTV_yes": 0,
    "StreamingTV_no internet service": 0
  }'

4. Launch Web Dashboard

# Start Streamlit UI
uv run streamlit run streamlit_app.py

# Dashboard available at http://localhost:8501

Steps in UI:

  1. Enter customer information (tenure, charges, services, etc.)
  2. Click "Predict Churn"
  3. View churn probability and risk level
  4. Adjust inputs to see how features affect churn risk

5. Analyze Features

# Generate feature importance report
python feature_importance_selection.py

# Results saved to: feature_importance_output.txt
# This takes ~2-5 minutes (trains Random Forest + cross-validation)

6. Run Tests

# Run all tests
uv run pytest

# Run specific test
uv run pytest tests/test_prediction.py::test_make_prediction_simple -v

Deploying to Azure

Prerequisites

  • Azure subscription with credits
  • GitHub account with this repository
  • VS Code with Azure Functions extension
  • Azure Functions Core Tools installed

Step-by-Step Deployment

Step 1: Install Azure Functions Extension

  1. Open VS Code Extensions (Ctrl+Shift+X)
  2. Search for "Azure Functions"
  3. Install by Microsoft

Step 2: Create Azure Function App

  1. Open Command Palette (Ctrl+Shift+P)
  2. Type "Azure Functions: Create Function"
  3. Select root folder (default)
  4. Choose Python language
  5. Choose HTTP trigger template
  6. Name it (e.g., predict)
  7. Select FUNCTION authorization level

Step 3: Update Function App Files

Update function_app.py:

import azure.functions as func
import logging
from prediction import make_prediction

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="predict")
def predict(req: func.HttpRequest) -> func.HttpResponse:
    # Get all 9 required parameters
    tenure = float(req.params.get("tenure") or req.get_json().get("tenure"))
    monthly_charges = float(req.params.get("MonthlyCharges") or 0)
    # ... (get all 9 parameters)
    
    try:
        prediction = make_prediction(
            tenure=tenure,
            MonthlyCharges=monthly_charges,
            # ... pass all 9 features
        )
        return func.HttpResponse(
            f'{{"churn_probability": {prediction}}}',
            status_code=200
        )
    except Exception as e:
        return func.HttpResponse(f'{{"error": "{str(e)}"}}', status_code=400)

Update requirements.txt:

azure-functions>=1.20.0
joblib
scikit-learn
pandas

Step 4: Create Azure Resources

In Azure Portal:

  1. Create Function App

    • Plan: Flex Consumption
    • Runtime: Python 3.12
    • Region: Switzerland North (or your region)
    • Storage: Create new or use existing
  2. Enable Continuous Deployment

    • Authentication: GitHub
    • Organization: your-org
    • Repository: this repo
    • Branch: main

Step 5: Deploy via CI/CD

  1. Commit and push changes to main
  2. GitHub Actions workflow triggers automatically
  3. Azure deployments in .github/workflows/ directory
  4. Deployment completes (2-5 minutes)
  5. Test function in Azure Portal

Step 6: Test Cloud Function

In Azure Portal > Function App > Your Function > "Test/Run":

Input parameters:

tenure=24
MonthlyCharges=65.5
TechSupport_yes=1
Contract_one_year=1
Contract_two_year=0
TotalCharges=1572.0
Partner_yes=1
StreamingTV_yes=0
StreamingTV_no_internet_service=0

Or via curl:

curl "https://<your-function-url>/api/predict?tenure=24&MonthlyCharges=65.5&..."

Troubleshooting

Common Issues

"Model file not found"

  • Run marimo run notebooks/telco_marimo.py first
  • Ensure models/telco_logistic_regression.joblib exists

"Missing feature" error

  • Verify all 9 features are provided in correct order
  • Check feature names match exactly (including spaces)

Streamlit won't connect to API

  • Ensure FastAPI server is running (uv run fastapi run main.py)
  • Check API_URL environment variable or update in code

Tests failing

  • Feature names must match exactly (spaces not underscores)
  • Run marimo run notebooks/telco_marimo.py to retrain model

Contributing

  1. Create feature branch from main
  2. Make changes and test locally
  3. Push to branch
  4. Create Pull Request
  5. CI pipeline runs automatically
  6. Merge after approval
  7. CD pipeline deploys to Azure automatically

About

Group Project for the Java group

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors 6

Languages