# Introduction: MLOps for Research Software Engineers



## What is an RSE?


https://ukrse.github.io/who.html

https://software-carpentry.org/blog/2015/06/what-is-a-research-software-engineer.html 

* employed to develop software for research
* spend more time developing software than doing research

### What are the key principles for RSEs?

* reproducibility
* code quality
* code performance / optimisation
* reliability / robustness
* automation / productionising

## What is MLOps

MLOps stands for Machine Learning Operations and is derived from the term DevOps, short for Development and Operations. So what is DevOps?

### What is DevOps
According to AWS: *DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity*. The idea is to bring together good software practices in order to deliver good-quality software more quickly.

What are the key principles of DevOps? (https://about.gitlab.com/topics/devops/)
* Automation of the software development lifecycle
* Collaboration and communication
* Continuous improvement and minimization of waste
* Hyperfocus on user needs with short feedback loops

![The DevOps lifecycle (GitLab)](https://about.gitlab.com/nuxt-images/topics/lifecycle-2.png)


Links
* https://aws.amazon.com/devops/what-is-devops/
* https://about.gitlab.com/topics/devops/
* https://www.pagerduty.com/resources/learn/essential-devops-roles/

### What is MLOps

MLOps aims to apply a similar philosophy to machine learning software.

https://www.bestdevops.com/what-are-roles-and-responsibilities-of-mlops-engineers/

What are the roles and tasks of an MLOps engineer?
1. Deployment and operationalization of MLOps, with a focus on:
   * Optimization of model hyperparameters
   * Evaluation and explicability of models
   * Automated retraining and model training
   * Model onboarding, operations, and decommissioning workflows.
   * Version control and governance for models
   * Data archiving and version control
   * Monitoring the model and its drift
2. To measure and improve services, create and use benchmarks, metrics, and monitoring.
3. Providing best practises and running proof-of-concepts for automated and efficient model operations on a large scale.
4. Creating and maintaining scalable MLOps frameworks to support client-specific models.
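As an illustration of the "monitoring the model and its drift" task, here is a minimal sketch of a data-drift check. It assumes NumPy is available and uses synthetic data; the two-sample Kolmogorov-Smirnov (KS) statistic is one common way to compare a feature's distribution at training time with its distribution in live data, and the function names and the 0.1 threshold are hypothetical choices for illustration, not a standard.

```python
import numpy as np


def ks_statistic(reference: np.ndarray, current: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    reference = np.sort(reference)
    current = np.sort(current)
    # Evaluate both empirical CDFs at every observed value from either sample.
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / reference.size
    cdf_cur = np.searchsorted(current, grid, side="right") / current.size
    return float(np.max(np.abs(cdf_ref - cdf_cur)))


def has_drifted(reference, current, threshold: float = 0.1) -> bool:
    """Flag drift when the KS statistic exceeds a (hypothetical) threshold."""
    return ks_statistic(np.asarray(reference), np.asarray(current)) > threshold


# Synthetic example: a feature as seen during training, and two "live" batches.
rng = np.random.default_rng(seed=0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
live_same = rng.normal(loc=0.0, scale=1.0, size=5000)
live_shifted = rng.normal(loc=0.5, scale=1.0, size=5000)

print("same distribution flagged:", has_drifted(training_feature, live_same))
print("shifted distribution flagged:", has_drifted(training_feature, live_shifted))
```

In practice a check like this would run on a schedule against incoming data, and a flagged feature might trigger an alert or the automated retraining mentioned above; SciPy's `scipy.stats.ks_2samp` provides the same statistic together with a p-value.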



What are the skills?
* Demand is high for strong programming skills, hands-on experience with ML frameworks and libraries, familiarity with agile environments, and experience deploying machine learning solutions using DevOps principles.
* A combination of machine learning, data engineering, and DevOps practices is required in this field.
* Machine learning is heavily reliant on data, so an experienced MLOps engineer should be well-versed in data structures, data modelling, and database management systems.
* DevOps engineers should always collaborate with Quality Assurance (QA) teams and be aware of the testing history throughout the CI/CD cycle. Understanding how your code is tested and maintained requires an understanding of the frameworks and environments that QA teams run.
* Understand the tools in the pipeline that serve different purposes, such as Continuous Integration servers, Configuration management, Deployment automation, Containers, Infrastructure Orchestration, Monitoring and Analytics, Testing and Cloud Quality tools, and network protocols.
* MLOps is based on the existing DevOps discipline. Knowing how to automate the entire DevOps pipeline, including app performance monitoring, infrastructure settings, and configurations, is a requirement.
* Beyond traditional code tests such as unit and integration tests, an ML system also needs its model training, model validation and other evaluation steps to be tested.
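To show how model validation differs from a traditional unit test, here is a minimal sketch assuming NumPy and pytest-style test functions; the function and test names are hypothetical. The unit test checks exact outputs for a fixed input, while the model validation test trains on synthetic data and only checks that the model clearly beats a trivial baseline on held-out data.

```python
import numpy as np


def min_max_scale(x: np.ndarray) -> np.ndarray:
    """Scale each column of x to the range [0, 1] (assumes no constant columns)."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (x - x_min) / (x_max - x_min)


# Traditional unit test: deterministic input, exact expected output.
def test_min_max_scale_range():
    x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
    scaled = min_max_scale(x)
    assert np.allclose(scaled[:, 0], [0.0, 0.5, 1.0])
    assert np.allclose(scaled[:, 1], [0.0, 0.5, 1.0])


# Model validation test: train on synthetic data, then check that the model
# beats a "predict the training mean" baseline on held-out data.
def test_model_beats_mean_baseline():
    rng = np.random.default_rng(seed=42)
    x = rng.uniform(-1, 1, size=(200, 3))
    y = x @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
    x_train, x_test = x[:150], x[150:]
    y_train, y_test = y[:150], y[150:]

    # Least-squares linear fit with an intercept column.
    design = np.column_stack([x_train, np.ones(len(x_train))])
    weights, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    predictions = np.column_stack([x_test, np.ones(len(x_test))]) @ weights

    model_mse = np.mean((predictions - y_test) ** 2)
    baseline_mse = np.mean((y_train.mean() - y_test) ** 2)
    assert model_mse < 0.5 * baseline_mse


if __name__ == "__main__":
    test_min_max_scale_range()
    test_model_beats_mean_baseline()
    print("all tests passed")
```

Run under a CI server, both kinds of test guard against regressions, but the validation test uses a tolerance rather than exact values, since retraining on new data changes the learned weights.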


# RSE Perspective

Why might an RSE care about adopting MLOps or DevOps practices? Many RSEs already follow these practices, even if they do not realise it, because with research software the same person or small group of people is often responsible for both developing and supporting a software package or library. Applying principles like automation and continuous improvement can reduce the burden of supporting the software and make it easier to respond to researchers' requests for new features, extensions and improvements.


# What is MLOps

### Machine Learning Pipeline
What are the typical components of an end-to-end machine learning pipeline in a research project? 

* *Data Loading & Cleaning* - Start by loading the data and filtering out any data considered unsuitable for training and evaluating machine learning algorithms. Selecting appropriate data is one of the main ways in which domain expertise is vital to getting good results.
* *Feature Engineering* - Next, prepare the data for presentation to the algorithm. Different ways of presenting the data emphasise different features, and choosing the right features is important for getting good results. Domain knowledge of what the features represent is again very important.
* *Train/test Split* - Before training the algorithm, we split the data into training and test sets. Evaluating on data the algorithm has not seen lets us check that it generalises well rather than overfitting, i.e. learning irrelevant details of the training data that are not representative of the whole space of possible data.
* *Data Preparation* - The machine learning algorithm sees numbers only as numbers, with no inherent understanding of meaning or context. We need to scale different features so they are comparable; otherwise features with large values will be treated as more important by the algorithm, irrespective of what those numbers mean. Values are typically scaled to the range [0,1] or, assuming a Gaussian distribution, standardised to have mean 0 and standard deviation 1. The scaling statistics should be computed from the training set only, so that no information from the test set leaks into training.
* *Algorithm Setup* - Here we select the particular algorithm, e.g. a neural network or k-means clustering, and specify the hyperparameters. It is important to distinguish between parameters and hyperparameters.
  * Parameters are the values calculated by the training process.
  * Hyperparameters are values specified in algorithm setup, which are not altered by training. These need to be fine-tuned using an additional outer training loop called hyperparameter tuning.
* *Algorithm Training* - Run the training algorithm to calculate the parameters that best fit the chosen ML algorithm to the supplied training data.
* *Inference* - Once we have a trained model, we use it to produce predictions for both the training and test sets.
* *Evaluation* - We then compare the predictions of the trained algorithm to the expected results. For supervised learning, these are the supplied target values. For unsupervised learning, we explore the results and assess their usefulness, much as in exploratory data analysis.
* *Interpretability & Explainability* - Explaining how the model produced a certain prediction (looking inside the black box) and interpreting what the prediction means in relation to the real-world problem.
* *Model Storage* - Model training can be expensive, so we don't want to repeat it more often than necessary. Once we have a model that performs well, we save its state so it can be reloaded and used for inference on later problems.
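The pipeline stages listed above can be sketched end to end in a few lines. This is a minimal illustration, not a recommended implementation: it assumes NumPy, uses synthetic data in place of a real dataset, and picks ridge regression (solved in closed form) as the algorithm purely for brevity. The added product feature, the 20% test fraction, `alpha = 1.0` and the pickle file name are all arbitrary, hypothetical choices.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(seed=0)

# Data loading & cleaning: synthetic data with some missing values to filter out.
x_raw = rng.uniform(0.0, 10.0, size=(300, 2))
y_raw = 3.0 * x_raw[:, 0] - 2.0 * x_raw[:, 1] + rng.normal(scale=1.0, size=300)
x_raw[::50, 0] = np.nan  # simulate bad records
keep = ~np.isnan(x_raw).any(axis=1)
x_clean, y_clean = x_raw[keep], y_raw[keep]

# Feature engineering: add a product feature (an arbitrary illustrative choice).
features = np.column_stack([x_clean, x_clean[:, 0] * x_clean[:, 1]])

# Train/test split: shuffle the row indices, then hold out 20% for testing.
indices = rng.permutation(len(features))
n_test = len(features) // 5
test_idx, train_idx = indices[:n_test], indices[n_test:]
x_train, x_test = features[train_idx], features[test_idx]
y_train, y_test = y_clean[train_idx], y_clean[test_idx]

# Data preparation: standardise using training-set statistics only.
mean, std = x_train.mean(axis=0), x_train.std(axis=0)
x_train_s = (x_train - mean) / std
x_test_s = (x_test - mean) / std

# Algorithm setup: ridge regression; alpha is a hyperparameter we choose.
alpha = 1.0

# Training: solve (X^T X + alpha I) w = X^T (y - y_offset) for the parameters w.
# Centring y on its training mean handles the intercept, since X is centred too.
y_offset = y_train.mean()
n_features = x_train_s.shape[1]
weights = np.linalg.solve(
    x_train_s.T @ x_train_s + alpha * np.eye(n_features),
    x_train_s.T @ (y_train - y_offset),
)

# Inference on both the training and test sets.
pred_train = x_train_s @ weights + y_offset
pred_test = x_test_s @ weights + y_offset

# Evaluation: compare predictions with the known targets.
mse_train = float(np.mean((pred_train - y_train) ** 2))
mse_test = float(np.mean((pred_test - y_test) ** 2))
print(f"train MSE: {mse_train:.3f}, test MSE: {mse_test:.3f}")

# Model storage: save everything needed to reproduce predictions, then reload it.
model = {"weights": weights, "y_offset": y_offset,
         "mean": mean, "std": std, "alpha": alpha}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    path.write_bytes(pickle.dumps(model))
    restored = pickle.loads(path.read_bytes())

x_test_restored = (x_test - restored["mean"]) / restored["std"]
reloaded_pred = x_test_restored @ restored["weights"] + restored["y_offset"]
assert np.allclose(reloaded_pred, pred_test)
```

Note that the stored model includes the scaling statistics as well as the learned weights: a reloaded model must apply exactly the same data preparation to new inputs, or its predictions will be wrong.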

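The distinction between parameters and hyperparameters, and the outer hyperparameter-tuning loop, can be sketched as follows. This is a minimal illustration assuming NumPy and synthetic zero-mean data (so no intercept is fitted); ridge regression, the candidate `alpha` values and the helper names `fit_ridge` and `mse` are hypothetical choices. The weights are parameters learned by the inner training step, while `alpha` is a hyperparameter chosen by the outer loop using a validation set kept separate from the final test set.

```python
import numpy as np


def fit_ridge(x: np.ndarray, y: np.ndarray, alpha: float) -> np.ndarray:
    """Inner training step: learn the parameters (weights) for a fixed alpha."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_features), x.T @ y)


def mse(x: np.ndarray, y: np.ndarray, weights: np.ndarray) -> float:
    """Mean squared error of a linear model's predictions."""
    return float(np.mean((x @ weights - y) ** 2))


# Synthetic data: only 3 of the 20 features actually influence the target.
rng = np.random.default_rng(seed=1)
n_samples, n_features = 60, 20
x = rng.normal(size=(n_samples, n_features))
true_weights = np.zeros(n_features)
true_weights[:3] = [1.5, -2.0, 1.0]
y = x @ true_weights + rng.normal(scale=1.0, size=n_samples)

# Hold out a validation set for choosing the hyperparameter; a separate test
# set (not shown) would be used only once, for the final evaluation.
x_train, x_val = x[:40], x[40:]
y_train, y_val = y[:40], y[40:]

# Outer loop: train once per candidate hyperparameter value and score it.
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {}
for alpha in candidate_alphas:
    weights = fit_ridge(x_train, y_train, alpha)  # inner loop: training
    scores[alpha] = mse(x_val, y_val, weights)

best_alpha = min(scores, key=scores.get)
print("validation MSE per alpha:", scores)
print("selected alpha:", best_alpha)

# Refit on all the non-test data with the selected hyperparameter.
final_weights = fit_ridge(x, y, best_alpha)
```

A grid of candidates like this is the simplest tuning strategy; libraries such as scikit-learn provide the same idea with cross-validation built in.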
### ML Lifecycle from an RSE perspective
![Uber Michelangelo's ML lifecycle](https://1fykyq3mdn5r21tpna3wkdyi-wpengine.netdna-ssl.com/wp-content/uploads/2018/11/image6.png)

RSEs generally support ML projects in the prototype phase of the ML lifecycle. The focus is on these three elements:
* Data Preparation
* Model Development ("train models")
* Model Evaluation ("Evaluate models")

In many ways, developing software for a machine learning project is no different from developing any other sort of software, but there are some additional or different challenges, as well as some different tools that support tasks unique to a machine learning pipeline. To apply to ML projects the principles of good software development that RSEs typically follow, RSEs need to know what ML software development and support has in common with other sorts of projects, what is different, and which tools are needed to support the different aspects. Many of these tools are about promoting good practice in research ML projects, where researchers may not be familiar with those practices or may lack the technical knowledge to set up and use the infrastructure and tools that implement them. These notebooks will demonstrate some of those tools, which promote good practices leading to reproducible, scalable research software.


References:
* ML Ops Wikipedia https://en.wikipedia.org/wiki/MLOps
* Databricks MLOps https://databricks.com/glossary/mlops
* Who is an RSE? https://ukrse.github.io/who.html
* What is Research Software Engineering https://software-carpentry.org/blog/2015/06/what-is-a-research-software-engineer.html 

Further reading


* ML Engineering - https://databricks.com/wp-content/uploads/2021/09/ML-Engineering-Ebook-Final.pdf
