
formlio/mlprague23


Reproducible, Portable, and Distributable ML Solutions in Python

When achieved together, reproducibility, portability, and distributability turn a machine learning (ML) solution into a powerful capability that unlocks numerous operational opportunities.

While reproducibility is a well-established practice in scientific research, it rarely receives the same recognition in the data product industry. Similarly, portability and distributability are usually considered irrelevant for custom solutions and are pursued only when explicitly required. With modern tooling, however, these properties can be achieved with little additional effort, and they offer substantial benefits: highly collaborative research and development, inherent lifecycle management, effective model troubleshooting, seamless and flexible deployment (including latency- or throughput-optimal runtime modes), and even potential commoditization in the form of turnkey solutions.

In this workshop, we will explore these concepts in more depth, examining the available technologies and reviewing existing tools. A significant portion of our time will be dedicated to the ForML framework, with which we will implement a practical end-to-end ML solution demonstrating all of these principles.

Slides

Article

An accompanying article covering the lifecycle patterns in more depth is available on Medium.

Setup

  1. Clone the workshop repository:
$ git clone git@github.com:formlio/mlprague23.git
$ cd mlprague23
  2. Install Docker Engine along with the Docker Compose plugin (it should already be included in any recent Docker Engine version).
  3. Spin up the workspace container from within the mlprague23 project root directory (this binds ports 8888, 8000, and 4040 on your machine, so they must be available):
$ docker compose up -d
  4. Open the workspace notebook interface at http://127.0.0.1:8888/lab in your browser.
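Because step 3 fails if any of the host ports is already taken, and the notebook interface may need a few seconds to come up, a small helper script can check both conditions. The sketch below is a hypothetical `check_workspace.py`, not part of the workshop repository: run it with `pre` before `docker compose up -d` to verify that ports 8888, 8000, and 4040 are free, and with `post` afterwards to poll http://127.0.0.1:8888/lab until it responds. It assumes the port numbers and URL listed above and uses only the Python standard library.

```python
import http.client
import socket
import sys
import time
import urllib.error
import urllib.request

# Host ports the workspace container binds (taken from the setup steps above).
WORKSPACE_PORTS = (8888, 8000, 4040)
LAB_URL = "http://127.0.0.1:8888/lab"


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port.

    Note: this only probes the given host address; a service bound
    exclusively to another interface would not be detected.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll url until the server sends any HTTP response or timeout expires."""
    # Local endpoint: bypass any http_proxy/https_proxy environment settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just with an error status: it is reachable.
            return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset or timed out: not up yet, retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def main(argv: list) -> int:
    stage = argv[1] if len(argv) > 1 else "pre"
    if stage == "pre":
        busy = [port for port in WORKSPACE_PORTS if not port_is_free(port)]
        if busy:
            print(f"Ports already in use: {busy}")
            return 1
        print("All workspace ports are free; run `docker compose up -d`.")
        return 0
    if wait_for_http(LAB_URL):
        print(f"Workspace is up: {LAB_URL}")
        return 0
    print(f"No response from {LAB_URL}; check `docker compose logs`.")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Any HTTP response, including an error status, is treated as "up", since the goal is only to confirm that the server inside the container is listening; whether the page requires a login token is a separate concern.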