mnoorchenar/AutoMLOps

title: AutoMLOps
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

🤖 AutoMLOps


🤖 AutoMLOps — A full-stack ML experiment tracking and pipeline orchestration platform. Train 50+ algorithms, run visual DAG pipelines powered by Apache Airflow, and manage models — all in one Docker container deployed to HuggingFace Spaces.




✨ Features

  • 🎨 Pipeline Studio: Interactive full-screen DAG canvas with clickable nodes, a slide-in config panel, and a live execution terminal
  • ✈️ Real Apache Airflow: Pipelines execute as genuine Airflow DAGs with XCom, TaskInstance tracking, and DagRun polling
  • 🤖 AutoML Engine: Automated hyperparameter search across all algorithm categories for classification and regression tasks
  • 📈 MLflow Tracking: Every training run logs parameters, metrics, and model artifacts to a persistent SQLite-backed MLflow store
  • 📦 Model Registry: Register, version, and transition models through the Staging → Production → Archived lifecycle stages
  • 🌙 Theme Toggle: Dark and light mode with instant CSS variable switching and localStorage persistence
  • 🐳 Single-Container Deployment: Flask + Airflow Scheduler + SQLite in one Docker image, with no external services required
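
The Staging → Production → Archived lifecycle named under Model Registry can be pictured as a small state machine. The sketch below is only an illustration of that idea, not the project's registry code: the `ModelVersion` class, the `ALLOWED` table, and the unstaged `"None"` starting stage are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Allowed moves between lifecycle stages. The "None" (unstaged) entry is an
# assumption; the README only names Staging, Production, and Archived.
ALLOWED = {
    "None": {"Staging"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}


@dataclass
class ModelVersion:
    """Hypothetical record for one registered model version."""
    name: str
    version: int
    stage: str = "None"
    history: list = field(default_factory=list)

    def transition(self, new_stage: str) -> None:
        """Move to new_stage, rejecting moves the ALLOWED table does not list."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage} -> {new_stage} is not allowed")
        self.history.append((self.stage, new_stage))
        self.stage = new_stage


mv = ModelVersion("iris-classifier", 1)
for stage in ("Staging", "Production", "Archived"):
    mv.transition(stage)
print(mv.stage, mv.history)
```

Keeping the allowed moves in one table makes it easy to see, and to change, which promotions are legal without touching the transition logic.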

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           AutoMLOps                                 │
│                                                                     │
│  ┌─────────────┐    ┌──────────────────┐    ┌───────────────────┐  │
│  │  Datasets   │───▶│   MLOps Engine   │───▶│   Flask API       │  │
│  │  (sklearn + │    │  (sklearn /      │    │   Backend         │  │
│  │   custom)   │    │   XGBoost /      │    └────────┬──────────┘  │
│  └─────────────┘    │   LightGBM /     │             │             │
│                     │   MLP)           │    ┌────────▼──────────┐  │
│                     └──────────────────┘    │  Pipeline Studio  │  │
│                                             │  AutoML Page      │  │
│  ┌─────────────┐    ┌──────────────────┐    │  Model Registry   │  │
│  │  MLflow DB  │◀───│ Airflow          │    └───────────────────┘  │
│  │  (SQLite)   │    │ Scheduler        │                           │
│  └─────────────┘    │ (DAGs / XCom /   │                           │
│                     │  TaskInstance)   │                           │
│                     └──────────────────┘                           │
└─────────────────────────────────────────────────────────────────────┘

🚀 Getting Started

Prerequisites

  • Python 3.11+
  • Docker (for containerised deployment)
  • Git

Local Installation

# 1. Clone the repository
git clone https://github.com/mnoorchenar/AutoMLOps.git
cd AutoMLOps

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# 3. Install app dependencies
pip install -r requirements.txt

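# 4. Install Apache Airflow with the official constraints for your Python version
AIRFLOW_VERSION=2.10.4
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
pip install "apache-airflow==${AIRFLOW_VERSION}" \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"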

# 5. Initialise Airflow metadata DB
export AIRFLOW_HOME=$(pwd)/airflow_home
export AIRFLOW__CORE__DAGS_FOLDER=$(pwd)/dags
airflow db migrate

# 6. Start the Airflow scheduler (background)
airflow scheduler &

# 7. Run the Flask application
python app.py

Open your browser at http://localhost:7860 🎉
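
Because the scheduler and Flask start separately, it can help to confirm the app is answering before opening the browser. The following is a minimal readiness probe using only the standard library; `wait_for_app` is a hypothetical helper, not part of the repository, and it assumes the app answers plain HTTP on the port given above.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll url until it returns any HTTP response or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status; it is still up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or similar: not listening yet, retry.
            time.sleep(interval)
    return False
```

After step 7, `wait_for_app("http://localhost:7860")` should return `True` once Flask is serving, and `False` if nothing is listening before the timeout.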


🐳 Docker Deployment

# Build and run
docker build -t automlops .
docker run -p 7860:7860 automlops

# Or deploy directly to HuggingFace Spaces
# Push to a Space with sdk: docker in README frontmatter

The Dockerfile builds a single image containing:

  • Flask application (served via Gunicorn on port 7860)
  • Apache Airflow Scheduler (started by start.sh)
  • All Python dependencies with Airflow constraint-file pinning
  • Pre-initialised Airflow SQLite metadata DB (airflow db migrate)

📄 Pages

| Page | Route | Description | Status |
| --- | --- | --- | --- |
| 🎨 Pipeline Studio | / | Interactive DAG canvas: click nodes to configure and execute pipelines | ✅ Live |
| 🤖 AutoML | /automl | Automated algorithm search across 50+ models for any dataset | ✅ Live |
| 📦 Model Registry | /models | Browse registered models, versions, and lifecycle stages | ✅ Live |

🔗 Pipelines

Three pipelines are pre-built and can be run immediately from the Pipeline Studio:

Training Pipeline (training_pipeline)

Load Data → Validate → Preprocess → Feature Engineering → Train Model → Evaluate → Report → Register → Deploy to Staging

Retraining Pipeline (retraining_pipeline)

Drift Detection → Fetch New Data → Merge Datasets → Retrain Champion → A/B Test → Promote to Production

Data Processing Pipeline (data_pipeline)

Ingest Raw Data → Clean Data → Encode Features → Scale Features → Save to Feature Store

Each pipeline node is clickable in the UI — configurable nodes (dataset picker, algorithm picker) show a purple indicator dot. Execution logs stream live in the built-in terminal panel.
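
Each of the three pipelines above is a linear chain of stages, which is the simplest kind of DAG. The sketch below shows one way a fallback engine could order and run such stages with the standard library's `graphlib`. It is an assumption-laden illustration, not the contents of `dag_engine.py` or the Airflow DAG files: `as_graph`, `run_pipeline`, and the per-stage `task` callable are hypothetical.

```python
from graphlib import TopologicalSorter

# Stage names copied from the three pipeline descriptions above.
PIPELINES = {
    "training_pipeline": [
        "Load Data", "Validate", "Preprocess", "Feature Engineering",
        "Train Model", "Evaluate", "Report", "Register", "Deploy to Staging",
    ],
    "retraining_pipeline": [
        "Drift Detection", "Fetch New Data", "Merge Datasets",
        "Retrain Champion", "A/B Test", "Promote to Production",
    ],
    "data_pipeline": [
        "Ingest Raw Data", "Clean Data", "Encode Features",
        "Scale Features", "Save to Feature Store",
    ],
}


def as_graph(stages):
    """Turn a linear stage list into {node: {predecessors}} for graphlib."""
    graph = {stages[0]: set()}
    for upstream, downstream in zip(stages, stages[1:]):
        graph[downstream] = {upstream}
    return graph


def run_pipeline(name, task=lambda stage, ctx: None):
    """Run every stage in dependency order; ctx is a dict shared by all stages."""
    ctx, log = {}, []
    for stage in TopologicalSorter(as_graph(PIPELINES[name])).static_order():
        task(stage, ctx)  # hypothetical per-stage callable
        log.append(stage)
    return log


print(" -> ".join(run_pipeline("data_pipeline")))
```

The shared `ctx` dict plays a role loosely similar to XCom in Airflow: an earlier stage can leave a value that a later stage reads.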


🧠 ML Algorithms

# AutoMLOps Algorithm Registry — 50+ algorithms across 2 tasks
ALGORITHMS = {
    "classification": {
        "Linear Models":          ["Logistic Regression", "Logistic Regression (L1)", "Ridge Classifier",
                                   "SGD Classifier", "Passive Aggressive", "Linear Discriminant Analysis"],
        "Tree-Based":             ["Decision Tree", "Random Forest", "Extra Trees",
                                   "Quadratic Discriminant Analysis"],
        "Ensemble / Boosting":    ["Gradient Boosting", "AdaBoost", "Bagging Classifier",
                                   "XGBoost", "LightGBM"],
        "Support Vector Machines":["SVC (RBF Kernel)", "SVC (Polynomial)", "SVC (Linear)", "LinearSVC"],
        "Probabilistic":          ["Gaussian Naive Bayes", "Bernoulli Naive Bayes", "Complement Naive Bayes"],
        "Instance-Based (KNN)":   ["KNN (k=3)", "KNN (k=5)", "KNN (k=9)"],
        "Neural Networks":        ["MLP (Small)", "MLP (Medium)", "MLP (Deep)"],
    },
    "regression": {
        "Linear Models":          ["Linear Regression", "Ridge Regression", "Lasso",
                                   "ElasticNet", "Bayesian Ridge", "Huber Regressor"],
        "Tree-Based":             ["Decision Tree Regressor", "Random Forest Regressor",
                                   "Extra Trees Regressor"],
        "Ensemble / Boosting":    ["Gradient Boosting Regressor", "AdaBoost Regressor",
                                   "Bagging Regressor", "XGBoost Regressor", "LightGBM Regressor"],
        "Support Vector Machines":["SVR (RBF)", "SVR (Linear)"],
        "Instance-Based (KNN)":   ["KNN Regressor (k=3)", "KNN Regressor (k=5)"],
        "Neural Networks":        ["MLP Regressor (Small)", "MLP Regressor (Medium)"],
    }
}
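
To make the registry more concrete, here is a minimal sketch of how a few of the display names above might map to scikit-learn estimators and be compared by cross-validation. It performs model selection only, not hyperparameter search, and it is not the project's `trainer.py`: the `CANDIDATES` mapping, the `search` helper, and the constructor arguments are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Display name -> estimator factory. Names come from the registry above;
# the constructor arguments are illustrative assumptions.
CANDIDATES = {
    "Logistic Regression": lambda: make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN (k=5)": lambda: make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}


def search(X, y, candidates=CANDIDATES, cv=5):
    """Return (best_name, scores) ranked by mean cross-validated accuracy."""
    scores = {
        name: cross_val_score(make(), X, y, cv=cv, scoring="accuracy").mean()
        for name, make in candidates.items()
    }
    return max(scores, key=scores.get), scores


X, y = load_iris(return_X_y=True)
best, scores = search(X, y)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22} {score:.3f}")
print("best:", best)
```

Using factories rather than shared estimator instances gives every evaluation a fresh, unfitted model.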

Built-in Datasets:

| Dataset | Task | Samples | Features |
| --- | --- | --- | --- |
| Iris Flowers | Classification | 150 | 4 |
| Wine Quality | Classification | 178 | 13 |
| Breast Cancer | Classification | 569 | 30 |
| Diabetes Progression | Regression | 442 | 10 |
| California Housing | Regression | 20,640 | 8 |
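
The sample and feature counts in the table match scikit-learn's bundled loaders, so they can be checked directly. The mapping below from display names to loaders is an assumption (for instance, that "Wine Quality" corresponds to `load_wine`, whose shape is 178 × 13); `LOADERS` and `describe` are hypothetical names, not the contents of `datasets.py`. California Housing is left out because `fetch_california_housing` downloads its data on first use.

```python
from sklearn.datasets import (
    load_breast_cancer,
    load_diabetes,
    load_iris,
    load_wine,
)

# Assumed mapping from the table's display names to scikit-learn loaders.
LOADERS = {
    "Iris Flowers": (load_iris, "classification"),
    "Wine Quality": (load_wine, "classification"),
    "Breast Cancer": (load_breast_cancer, "classification"),
    "Diabetes Progression": (load_diabetes, "regression"),
}


def describe(name):
    """Return (task, n_samples, n_features) for a bundled dataset."""
    loader, task = LOADERS[name]
    X, _ = loader(return_X_y=True)
    return task, X.shape[0], X.shape[1]


for name in LOADERS:
    print(name, *describe(name))
```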

📁 Project Structure

AutoMLOps/
│
├── 📂 mlops/
│   ├── algorithms.py        # 50+ algorithm registry (classification + regression)
│   ├── datasets.py          # Dataset loaders (sklearn built-ins + California Housing)
│   ├── trainer.py           # Training & AutoML job management
│   └── airflow_runner.py    # Apache Airflow DAG trigger & watcher
│
├── 📂 pipelines/
│   ├── dag_engine.py        # Built-in DAG execution engine (fallback)
│   └── pipeline_defs.py     # Training / Retraining / Data pipeline definitions
│
├── 📂 dags/                 # Apache Airflow DAG files (parsed by scheduler)
│
├── 📂 templates/
│   ├── base.html            # Base layout: sidebar + topnav + theme toggle
│   ├── pipeline.html        # Pipeline Studio (home page — interactive DAG canvas)
│   ├── automl.html          # AutoML experiment launcher
│   └── models.html          # Model Registry browser
│
├── 📂 static/
│   ├── css/style.css        # Global styles + dark/light theme CSS variables
│   └── js/app.js            # Shared JS (toasts, theme switching)
│
├── 📄 app.py                # Flask application entry point + all API routes
├── 📄 Dockerfile            # Single-container image (Flask + Airflow)
├── 📄 start.sh              # Startup: Airflow scheduler → Gunicorn Flask
├── 📄 requirements.txt      # Python dependencies
└── 📄 README.md
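
The tree describes `airflow_runner.py` as a DAG trigger and watcher. A watcher of that kind usually reduces to a polling loop over the run's state. The sketch below shows that loop in isolation, under stated assumptions: `watch_dag_run` and its `get_state` callable are hypothetical, and how the state is actually read (REST API, metadata DB, or otherwise) is left open.

```python
import time

# Terminal DagRun states in Airflow 2.x; "queued" and "running" are non-terminal.
TERMINAL_STATES = {"success", "failed"}


def watch_dag_run(get_state, poll_interval=1.0, timeout=600.0, sleep=time.sleep):
    """Poll get_state() until the run reaches a terminal state or times out.

    get_state is a hypothetical callable returning the current DagRun state
    as a string. sleep is injectable so the loop can be tested without waiting.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"DagRun still {state!r} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval


# Simulated run: queued -> running -> running -> success.
states = iter(["queued", "running", "running", "success"])
print(watch_dag_run(lambda: next(states), sleep=lambda _: None))
```

Passing `sleep` as a parameter keeps the loop deterministic in tests while defaulting to real waiting in normal use.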

👨‍💻 Author

Mohammad Noorchenarboo

Data Scientist  |  AI Researcher  |  Biostatistician

📍  Ontario, Canada    📧  mohammadnoorchenarboo@gmail.com


🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Disclaimer

This project is developed strictly for educational and research purposes and does not constitute professional advice of any kind. All datasets used are either synthetically generated or publicly available — no real user data is stored. This software is provided "as is" without warranty of any kind; use at your own risk.


📜 License

Distributed under the MIT License. See LICENSE for more information.


The name "AutoMLOps" is used purely for academic and research purposes. Any similarity to existing company names, products, or trademarks is entirely coincidental and unintentional. This project has no affiliation with any commercial entity.
