😎 A curated list of awesome machine learning engineering tools

rootAir/awesome-mle

 
 

Awesome MLE

A curated list of awesome machine learning engineering tools.

Inspired by awesome-python.


Cron Job Monitoring

Tools for monitoring cron jobs (recurring jobs).

Data Exploration

Tools for performing data exploration.

  • Google Colab - Hosted Jupyter notebook service that requires no setup to use.
  • Jupyter Notebook - Web-based notebook environment for interactive computing.
  • JupyterLab - The next-generation user interface for Project Jupyter.

Data Processing

Tools related to data processing and data pipelines.

  • Airflow - Platform to programmatically author, schedule, and monitor workflows.
  • Hadoop - Framework that allows for the distributed processing of large data sets across clusters of computers.
  • Spark - Unified analytics engine for large-scale data processing.

Data Version Control

Tools for performing data version control.

  • DVC - Management and versioning of datasets and machine learning models.

Data Visualization

Tools for data visualization, reports and dashboards.

  • Data Studio - Reporting solution for power users who want to go beyond the data and dashboards of Google Analytics.
  • Metabase - The simplest, fastest way to get business intelligence and analytics to everyone.
  • Redash - Connect to any data source, easily visualize, dashboard and share your data.
  • Superset - Modern, enterprise-ready business intelligence web application.
  • Tableau - Powerful, fast-growing data visualization tool used in the business intelligence industry.

Feature Store

Feature store tools for data serving.

  • Feast - End-to-end open source feature store for machine learning.

Hyperparameter Tuning

Tools and libraries to perform hyperparameter tuning.

  • Katib - Kubernetes-based system for hyperparameter tuning and neural architecture search.
  • Tune - Python library for experiment execution and hyperparameter tuning at any scale.

Knowledge Sharing

Tools for sharing knowledge with the entire team or company.

  • Knowledge Repo - Knowledge sharing platform for data scientists and other technical professions.
  • Kyso - One place for data insights so your entire team can learn from your data.

Machine Learning Platform

Complete machine learning platform solutions.

  • Algorithmia - Securely govern your machine learning operations with a healthy ML lifecycle.
  • CNVRG - An end-to-end machine learning platform to build and deploy AI models at scale.
  • Dataiku - Platform democratizing access to data and enabling enterprises to build their own path to AI.
  • DataRobot - AI platform that democratizes data science and automates end-to-end machine learning at scale.
  • Domino - One place for your data science tools, apps, results, models, and knowledge.
  • H2O - Open source leader in AI with a mission to democratize AI for everyone.
  • Hopsworks - Open-source platform for developing and operating machine learning models at scale.
  • Iguazio - Data science platform that automates MLOps with end-to-end machine learning pipelines.
  • KNIME - Create and productionize data science using one easy and intuitive environment.
  • Kubeflow - Making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
  • Modzy - AI platform and marketplace offering scalable, secure, and ready-to-deploy AI models.
  • Pachyderm - Combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise.
  • SageMaker - Fully managed service that provides the ability to build, train, and deploy ML models quickly.

Model Lifecycle

Tools for managing model lifecycle (tracking experiments, parameters and metrics).

  • Comet - Track your datasets, code changes, experimentation history, and models.
  • MLflow - Open source platform for the machine learning lifecycle.
  • Neptune AI - The most lightweight experiment management tool that fits any workflow.

Model Serving

Tools for serving models in production.

  • BentoML - Open-source platform for high-performance ML model serving.
  • Cortex - Machine learning model serving infrastructure.
  • GraphPipe - Machine learning model deployment made simple.
  • KFServing - Kubernetes custom resource definition for serving machine learning (ML) models on arbitrary frameworks.
  • PredictionIO - Supports event collection, deployment of algorithms, evaluation, querying predictive results via REST APIs.
  • Seldon - Take your ML projects from POC to production with maximum efficiency and minimal risk.
  • TensorFlow Serving - Flexible, high-performance serving system for machine learning models, designed for production.

Optimization Tools

Optimization tools related to model scalability in production.

  • Dask - Provides advanced parallelism for analytics, enabling performance at scale for the tools you love.
  • Mahout - Distributed linear algebra framework and mathematically expressive Scala DSL.
  • MLlib - Apache Spark's scalable machine learning library.
  • Modin - Speed up your Pandas workflows by changing a single line of code.
  • Ray - Fast and simple framework for building and running distributed applications.
  • SINGA - Apache top-level project focusing on distributed training of deep learning and machine learning models.
  • TPOT - Automated machine learning tool that optimizes machine learning pipelines using genetic programming.

Workflow Tools

Tools and frameworks to create workflows or pipelines in the machine learning context.

  • Argo - Open source container-native workflow engine for orchestrating parallel jobs on Kubernetes.
  • Kedro - Library that implements software engineering best practices for data and ML pipelines.
  • Metaflow - Human-friendly library that helps scientists and engineers build and manage real-life data science projects.
  • Prefect - A workflow management system, designed for modern infrastructure.

Resources

Where to discover new tools and discuss existing ones.

Articles

Podcasts

Slack

Websites

Contributing

All contributions are welcome! Please take a look at the contribution guidelines first.
