gaocegege/awesome-open-source-mlops

Awesome Open Source MLOps

An awesome, curated list of the best open source MLOps tools for data scientists.

Contribute

Contributions are most welcome. Please adhere to the contribution guidelines.

Community

You can join our Gitter channel to discuss.

Training

IDEs and Workspaces

  • code server - Run VS Code on any machine anywhere and access it in the browser.
  • conda - OS-agnostic, system-level binary package manager and ecosystem.
  • Docker - Moby is an open-source project created by Docker to enable and accelerate software containerization.
  • Jupyter Notebooks - The Jupyter notebook is a web-based notebook environment for interactive computing.

Frameworks for Training

  • Caffe - A fast open framework for deep learning.
  • ColossalAI - An integrated large-scale model training system with efficient parallelization techniques.
  • DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • Jax - Autograd and XLA for high-performance machine learning research.
  • Kedro - Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code.
  • Keras - Keras is a deep learning API written in Python, running on top of the machine learning platform TensorFlow.
  • LightGBM - A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
  • MegEngine - MegEngine is a fast, scalable and easy-to-use deep learning framework, with auto-differentiation.
  • MindSpore - MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler.
  • OneFlow - OneFlow is a performance-centered and open-source deep learning framework.
  • PaddlePaddle - Machine Learning Framework from Industrial Practice.
  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration.
  • PyTorchLightning - The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
  • XGBoost - Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library.
  • TensorFlow - An Open Source Machine Learning Framework for Everyone.
  • VectorFlow - A minimalist neural network library optimized for sparse data and single machine environments.

Experiment Tracking

  • Aim - an easy-to-use and performant open-source experiment tracker.
  • Guild AI - Experiment tracking, ML developer tools.
  • MLRun - Machine Learning automation and tracking.
  • Kedro-Viz - Kedro-Viz is an interactive development tool for building data science pipelines with Kedro. Kedro-Viz also allows users to view and compare different runs in the Kedro project.
  • LabNotebook - LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.
  • Sacred - Sacred is a tool to help you configure, organize, log and reproduce experiments.

Visualization

  • Manifold - A model-agnostic visual debugging tool for machine learning.
  • netron - Visualizer for neural network, deep learning, and machine learning models.
  • OpenOps - Bring multiple data streams into one dashboard.
  • TensorBoard - TensorFlow's Visualization Toolkit.
  • TensorSpace - Neural network 3D visualization framework for building interactive and intuitive models in browsers; supports pre-trained deep learning models from TensorFlow, Keras, and TensorFlow.js.
  • dtreeviz - A Python library for decision tree visualization and model interpretation.
  • Zetane Viewer - ML models and internal tensors 3D visualizer.

Model

Model Management

  • dvc - Data Version Control | Git for Data & Models | ML Experiments Management
  • ModelDB - Open Source ML Model Versioning, Metadata, and Experiment Management
  • ormb - Docker for Your ML/DL Models Based on OCI Artifacts

Pretrained Model

  • HuggingFace - State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
  • PaddleNLP - Easy-to-use and fast NLP library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications.
  • PyTorch Image Models - PyTorch image models, scripts, pretrained weights.

Serving

Frameworks/Servers for Serving

  • BentoML - The Unified Model Serving Framework
  • ForestFlow - Policy-driven Machine Learning Model Server.
  • MOSEC - A machine learning model serving framework with dynamic batching and pipelined stages that provides an easy-to-use Python interface.
  • Multi Model Server - Multi Model Server is a tool for serving neural net models for inference.
  • Neuropod - A uniform interface to run deep learning models from multiple frameworks
  • Pinferencia - Python + Inference - Model Deployment library in Python. Simplest model inference server ever.
  • Service Streamer - Boosting your Web Services of Deep Learning Applications.
  • TFServing - A flexible, high-performance serving system for machine learning models.
  • Torchserve - Serve, optimize and scale PyTorch models in production
  • Triton Server (TRTIS) - The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Optimizations

  • FeatherCNN - FeatherCNN is a high performance inference engine for convolutional neural networks.
  • Forward - A library for high performance deep learning inference on NVIDIA GPUs.
  • NCNN - ncnn is a high-performance neural network inference framework optimized for the mobile platform.
  • PocketFlow - use AutoML to do model compression.
  • TNN - A uniform deep learning inference framework for mobile, desktop and server.

Observability

Large Scale Deployment

ML Platforms

  • ClearML - Auto-Magical CI/CD to streamline your ML workflow. Experiment Manager, MLOps and Data-Management.
  • MLflow - Open source platform for the machine learning lifecycle.
  • Kserve - Standardized Serverless ML Inference Platform on Kubernetes
  • Kubeflow - Machine Learning Toolkit for Kubernetes.
  • PAI - Resource scheduling and cluster management for AI.
  • Polyaxon - Machine Learning Management & Orchestration Platform.
  • Seldon-core - An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Workflow

  • Argo - Workflow engine for Kubernetes.
  • Flyte - Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale.
  • Kubeflow Pipelines - Machine Learning Pipelines for Kubeflow.
  • Metaflow - Build and manage real-life data science projects with ease!
  • ZenML - MLOps framework to create reproducible pipelines.

Scheduling

  • Kueue - Kubernetes-native Job Queueing.
  • PAI - Resource scheduling and cluster management for AI (Open-sourced by Microsoft).
  • Slurm - A Highly Scalable Workload Manager.
  • Volcano - A Cloud Native Batch System (Project under CNCF).
  • Yunikorn - Light-weight, universal resource scheduler for container orchestrator systems.

AutoML

  • Adanet - Tensorflow package for AdaNet.
  • Advisor - open-source implementation of Google Vizier for hyperparameter tuning.
  • Archai - a platform for Neural Architecture Search (NAS) that allows you to generate efficient deep networks for your applications.
  • auptimizer - An automatic ML model optimization tool.
  • autoai - A framework to find the best performing AI/ML model for any AI problem.
  • AutoGL - An autoML framework & toolkit for machine learning on graphs
  • AutoGluon - AutoML for Image, Text, and Tabular Data.
  • automl-gs - Provide an input CSV and a target field to predict, generate a model + code to run it.
  • autokeras - AutoML library for deep learning.
  • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch.
  • auto-sklearn - an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.
  • AutoWeka - hyperparameter search for Weka.
  • Chocolate - A fully decentralized hyperparameter optimization framework.
  • Dragonfly - An open source Python library for scalable Bayesian optimisation.
  • Determined - scalable deep learning training platform with integrated hyperparameter tuning support; includes Hyperband, PBT, and other search methods.
  • DEvol (DeepEvolution) - a basic proof of concept for genetic architecture search in Keras.
  • EvalML - An open source Python library for AutoML.
  • FEDOT - AutoML framework for the design of composite pipelines.
  • FLAML - Fast and lightweight AutoML (paper).
  • Goptuna - A hyperparameter optimization framework, inspired by Optuna.
  • HpBandSter - a framework for distributed hyperparameter optimization.
  • HPOlib2 - a library for hyperparameter optimization and black box optimization benchmarks.
  • Hyperband - open source code for tuning hyperparams with Hyperband.
  • Hypernets - A General Automated Machine Learning Framework.
  • Hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • hyperunity - A toolset for black-box hyperparameter optimisation.
  • Katib - Katib is a Kubernetes-native project for automated machine learning (AutoML).
  • Keras Tuner - Hyperparameter tuning for humans.
  • learn2learn - PyTorch Meta-learning Framework for Researchers.
  • Ludwig - a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code.
  • MOE - a global, black box optimization engine for real world metric optimization by Yelp.
  • Model Search - a framework that implements AutoML algorithms for model architecture search at scale.
  • NASGym - a proof-of-concept OpenAI Gym environment for Neural Architecture Search (NAS).
  • NNI - An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyperparameter tuning.
  • Optuna - A hyperparameter optimization framework.
  • Ray Tune - Scalable Hyperparameter Tuning.
  • REMBO - Bayesian optimization in high-dimensions via random embedding.
  • RoBO - a Robust Bayesian Optimization framework.
  • scikit-optimize(skopt) - Sequential model-based optimization with a scipy.optimize interface.
  • Spearmint - a software package to perform Bayesian optimization.
  • TPOT - one of the very first AutoML methods and open-source software packages.
  • Torchmeta - A Meta-Learning library for PyTorch.
  • Vega - an AutoML algorithm tool chain by Huawei Noah's Ark Lab.

Data

Data Management

  • Dolt - Git for Data.
  • DVC - Data Version Control | Git for Data & Models | ML Experiments Management.
  • Hub - Hub is a dataset format with a simple API for creating, storing, and collaborating on AI datasets of any size.
  • Quilt - A self-organizing data hub for S3.

Data Ingestion

Data Storage

  • LakeFS - Git-like capabilities for your object storage.

Data Transformation

Feature Engineering

  • FeatureTools - An open source Python framework for automated feature engineering

Data & Feature enrichment

  • Upgini - Free automated data & feature enrichment library for machine learning: automatically searches through thousands of ready-to-use features from public and community-shared data sources and enriches your training dataset with only the accuracy-improving features

Performance

ML Compiler

Profiling
