# Module 11: MLOps

# Introduction

As machine learning has matured as an enterprise competency, ML users and vendors have developed good practices and cloud-based tools to help data scientists manage their workflow.

In this module:

- We will review the ML lifecycle: the sequence of steps involved in building, testing, and deploying a production-quality ML model.
- Then we will look at methods and tools for managing the process.
- Next, we will look at an example of one such lifecycle support tool, MLflow, an open source project that works well with many ML libraries and can scale to big data.

# Learning Outcomes

In this module, you will build your skills in:

* Describing MLOps processes and roles
* Recognizing your role in responsible AI development and discussing how MLOps can help support it
* Applying DevOps principles and processes to data and model engineering
* Practicing MLOps concepts with MLflow as an example

# Readings and Resources

We invite you to supplement this notebook with the following:

* Spark Summit (2019) [Video] A Guide to MLflow Talks at Spark + AI Summit 2019. Organized by Databricks. https://databricks.com/blog/2019/04/18/a-guide-to-mlflow-talks-at-spark-ai-summit-2019.html
* Databricks blog: https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html
* Official MLflow page: https://mlflow.org/
* Databricks guide to MLflow: https://docs.databricks.com/applications/mlflow/index.html

<h1>Table of Contents<span class="tocSkip"></span></h1>
<br>
<div class="toc">
<ul class="toc-item">
<li><span><a href="#Module-11:-MLOps" data-toc-modified-id="Module-11:-MLOps">Module 11: MLOps</a></span>
</li>
<li><span><a href="#Introduction" data-toc-modified-id="Introduction">Introduction</a></span>
</li>
<li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes">Learning Outcomes</a></span>
</li>
<li><span><a href="#Readings-and-Resources" data-toc-modified-id="Readings-and-Resources">Readings and Resources</a></span>
</li>
<li><span><a href="#Challenges-of-Managing-Models-in-Large-Organizations" data-toc-modified-id="Challenges-of-Managing-Models-in-Large-Organizations">Challenges of Managing Models in Large Organizations</a></span>
</li>
<li><span><a href="#The-ML-Lifecycle" data-toc-modified-id="The-ML-Lifecycle">The ML Lifecycle</a></span>
</li>
<li><span><a href="#MLOps" data-toc-modified-id="MLOps">MLOps</a></span>
</li>
<li><span><a href="#MLOps-Roles" data-toc-modified-id="MLOps-Roles">MLOps Roles</a></span>
</li>
<li><span><a href="#MLOps-for-Responsible-AI" data-toc-modified-id="MLOps-for-Responsible-AI">MLOps for Responsible AI</a></span>
</li>
<li><span><a href="#Continuous-Delivery" data-toc-modified-id="Continuous-Delivery">Continuous Delivery</a></span>
</li>
<li><span><a href="#Configuration-Management-for-MLOps" data-toc-modified-id="Configuration-Management-for-MLOps">Configuration Management for MLOps</a></span>
</li>
<li><span><a href="#Testing-Strategies" data-toc-modified-id="Testing-Strategies">Testing Strategies</a></span>
</li>
<li><span><a href="#Test-Driven-Development-(TDD)" data-toc-modified-id="Test-Driven-Development-(TDD)">Test-Driven Development (TDD)</a></span>
</li>
<li><span><a href="#Deployment-Pipeline-Practices" data-toc-modified-id="Deployment-Pipeline-Practices">Deployment Pipeline Practices</a></span>
</li>
<li><span><a href="#Software-Telemetry" data-toc-modified-id="Software-Telemetry">Software Telemetry</a></span>
</li>
<li><span><a href="#MLflow" data-toc-modified-id="MLflow">MLflow</a></span>
<ul class="toc-item">
<li><span><a href="#MLflow-logging-location" data-toc-modified-id="MLflow-logging-location">MLflow logging location</a></span>
</li>
<li><span><a href="#MLflow-experiments" data-toc-modified-id="MLflow-experiments">MLflow experiments</a></span>
</li>
<li><span><a href="#Managing-models-using-MLflow" data-toc-modified-id="Managing-models-using-MLflow">Managing models using MLflow</a></span>
</li>
</ul>
</li>
<li><span><a href="#Serving-the-Model" data-toc-modified-id="Serving-the-Model">Serving the Model</a></span>
</li>
<li><span><a href="#References" data-toc-modified-id="References">References</a></span>
</li>
</ul>
</div>

# Challenges of Managing Models in Large Organizations

Creating a machine learning model is just the first of many steps in creating a useful predictive model in a corporate setting. Managing a model is not too difficult if there is only one, but an implementation in a large company will probably involve several models, each of which may be at a different stage of completion, have multiple versions in development and production, and have one or more teams working on it. This quickly becomes a significant management challenge.

Here are some of the issues:

- **Encouraging collaboration between teams**: If teams work in relative isolation and don't collaborate, they'll duplicate effort, develop different standards, and select incompatible tools and programming languages.


- **Creating standard, repeatable processes**: New teams should benefit from the groundwork of existing teams and not have to reinvent the wheel.


- **Achieving auditability and meeting regulatory requirements**: The models must allow internal and external auditors to verify their correctness if they are used for applications that involve money. In regulated industries, such as financial services or utilities, there may be a variety of industry standards that also must be met.


- **Maintaining explainability**: Management needs to understand enough about how each model works to have confidence in it and to be able to assess the risks should it malfunction.

These challenges are similar in many ways to the ones application development teams face when creating and managing applications. The systems development community developed the concept of **DevOps** over the last two decades to attempt to address these kinds of issues in their work. DevOps combines the disciplines of development and operations into a relatively seamless automated workflow. To address these challenges with machine learning, organizations needed an approach that brings the agility of DevOps to the ML lifecycle. As you might have guessed, the name **MLOps** has emerged.

Let's begin by looking at the lifecycle for creating and deploying machine learning models; then we can discuss how to optimize the process.


# The ML Lifecycle

Development of a model typically involves six stages, many of which could be usefully supported by additional tools:

1. **Train and Test**: Data preparation is often the biggest time commitment in the lifecycle. This phase includes cleaning and standardizing the data, selecting features, then training and testing the model using the training and test holdout datasets.<br><br>

2. **Package**: The model often needs to be packaged in some way before it can be moved into a cloud or server environment. This involves putting together the model with everything it needs to run into a single container of some kind &mdash; often a zip file or a Docker image. This allows all the components to be moved from one location to another as a unit without being concerned that some component will be dropped or forgotten. The additional files in the package may include metadata to configure the model to run in test or production mode, code libraries that the model requires for execution, additional code to load or extract data, monitoring tools and their configuration, and automated tests.<br><br>

3. **Validate**: In this stage, the team evaluates how model performance compares to their expectations and business goals. For example, a company might want to optimize for accuracy over speed in some cases. The first three stages of the lifecycle may need to be iterated many times. It can take hundreds of training hours to find a satisfactory model. The development team can train many versions of the model by adjusting training data, tuning algorithm hyperparameters, or trying different algorithms. Ideally the model improves with each round of adjustment. Ultimately, it's the development team's role to determine which version of the model best fits the business use case.<br><br>

4. **Deploy**: The team deploys the model to a production server, or increasingly to other types of devices, for end use. The latest mobile phones, laptops, and various kinds of machinery, for example, have built-in support for fast inference on the device itself, so model execution is becoming increasingly distributed.<br><br>

5. **Predict and Monitor**: Once deployed, the model can take in new data as it becomes available and use it for prediction. Even if a model works well at first, it needs to be continually monitored and retrained to stay relevant and accurate. The conditions under which the original training was performed may no longer hold. For example, consider a recommender trained on the last ten years of sales experience. People's tastes change, so a model that would have previously performed well may no longer be appropriate. The model's predictions must be continuously re-evaluated to ensure that the model is functioning as expected and producing business value.<br><br>

6. **Retrain**: Certain types of models can incrementally use new observations to improve their predictions and track changing conditions without retraining from scratch. Others require full retraining. It's important to understand the strengths and limitations of each algorithm. It may be preferable to sacrifice some short term accuracy in return for longer-term flexibility by choosing an algorithm that is adaptable but perhaps not quite as performant on the original training set.<br><br>
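
The monitoring and retraining stages above can be illustrated with a small sketch. It assumes that true labels eventually arrive for each prediction, and it uses a rolling accuracy window as the retraining trigger. The class name `AccuracyMonitor`, the window size, and the threshold are all hypothetical choices made for illustration, not part of any particular tool.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window_size=100, min_accuracy=0.8):
        # A bounded deque drops the oldest outcome automatically when full.
        self.window = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched the label that arrived later."""
        self.window.append(predicted == actual)

    def accuracy(self):
        """Return rolling accuracy, or None before any outcome is recorded."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def needs_retraining(self):
        """Signal retraining only once the window is full and accuracy is too low."""
        if len(self.window) < self.window.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window_size=5, min_accuracy=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 0), (1, 0), (0, 0)]:
    monitor.record(predicted, actual)

print(monitor.accuracy())          # 3 of 5 correct -> 0.6
print(monitor.needs_retraining())  # True, because 0.6 is below 0.8
```

In practice the trigger might also look at shifts in the input data, since labels often arrive late or not at all; accuracy alone is used here only to keep the sketch short.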

# MLOps
MLOps, short for Machine Learning Operations, is a discipline that combines Machine Learning (ML), Data Science, and DevOps principles to facilitate the development, deployment, and monitoring of ML models in a scalable, efficient, and reliable manner. The goal of MLOps is to create a smooth, automated pipeline for ML model lifecycle management and to establish best practices for collaboration between data scientists, engineers, and other stakeholders.

MLOps borrows practices from DevOps, which is a set of practices aimed at streamlining the process of developing, deploying, and maintaining software applications. Some key aspects of MLOps include:

- **Version Control**: Tracking changes in code, data, and configurations, enabling easy collaboration and reproducibility.
- **Continuous Integration and Continuous Deployment (CI/CD)**: Automating the process of integrating, testing, and deploying ML models to ensure consistent and efficient delivery.
- **Automated Testing**: Regularly testing models, data pipelines, and infrastructure to ensure quality and reliability.
- **Monitoring**: Tracking the performance of ML models, data pipelines, and infrastructure in real-time, as well as monitoring for anomalies or drift in model performance.
- **Model Management**: Managing the lifecycle of ML models, from development and training to deployment and retirement.
- **Experiment Tracking**: Documenting and organizing experiments, including model parameters, metrics, and results, to facilitate collaboration, analysis, and decision-making.
- **Model Explainability**: Ensuring that the logic behind the model's predictions is understandable and transparent, which is critical for trust, compliance, and debugging.
- **Model Retraining and Updating**: Periodically retraining and updating models based on new data, changing requirements, or performance degradation to maintain their effectiveness and relevance.
- **Scalability and Infrastructure**: Designing and managing the infrastructure needed for training, deploying, and managing ML models at scale, often using cloud-based services and specialized hardware.
- **Collaboration and Communication**: Encouraging clear communication and collaboration among data scientists, engineers, and other stakeholders to ensure that everyone is aligned on goals, requirements, and best practices.

MLOps aims to address the challenges that arise when moving ML models from experimentation to production, ensuring that they are deployed in a robust, efficient, and maintainable manner. This helps organizations realize the full potential of their ML and AI initiatives, driving innovation, and creating a competitive advantage.
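
To make the experiment tracking aspect more concrete, here is a minimal sketch of the idea: every training run records its parameters and metrics so that runs can be compared afterwards. The `ExperimentLog` class and the parameter values are hypothetical and kept in memory for simplicity; dedicated tools such as MLflow, covered later in this module, persist this information and add a user interface.

```python
import json
import time
import uuid


class ExperimentLog:
    """Hypothetical in-memory run log; real tools persist runs to disk or a server."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run with its parameters and resulting metrics."""
        run = {
            "run_id": uuid.uuid4().hex,
            "timestamp": time.time(),
            "params": dict(params),
            "metrics": dict(metrics),
        }
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric, higher_is_better=True):
        """Return the run with the best value for a metric, or None if no run has it."""
        candidates = [r for r in self.runs if metric in r["metrics"]]
        if not candidates:
            return None
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda r: r["metrics"][metric])


log = ExperimentLog()
log.log_run({"max_depth": 3, "learning_rate": 0.1}, {"accuracy": 0.81})
log.log_run({"max_depth": 5, "learning_rate": 0.1}, {"accuracy": 0.86})
log.log_run({"max_depth": 8, "learning_rate": 0.05}, {"accuracy": 0.84})

best = log.best_run("accuracy")
print(json.dumps(best["params"]))  # {"max_depth": 5, "learning_rate": 0.1}
```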

# MLOps Roles

The roles within MLOps have become increasingly specialized as the field has matured.  These are the key roles:

- **Machine Learning Engineer**: These engineers are responsible for designing, building, and deploying machine learning models. They have strong skills in programming, data science, and machine learning frameworks. They work closely with data scientists to turn experimental models into production-ready systems.
- **Data Scientist**: Data scientists are responsible for analyzing and interpreting complex data sets to develop insights, inform decisions, and create data-driven models. They use various techniques, including machine learning, to create predictive models and are often involved in the early stages of model development. In an MLOps context, data scientists work closely with machine learning engineers to operationalize their models.
- **Data Engineer**: Data engineers are responsible for creating and maintaining the data infrastructure necessary for machine learning systems. They build pipelines to ingest, process, and transform data from multiple sources, ensuring that it is clean, consistent, and ready for use by machine learning models. They also manage data storage and implement data security measures.
- **MLOps Engineer**: MLOps engineers are responsible for integrating the work of data scientists, machine learning engineers, and data engineers into a cohesive, automated, and scalable system. They ensure that machine learning models are deployed and monitored correctly, focusing on continuous integration, continuous deployment (CI/CD), and infrastructure management. They also troubleshoot issues that arise during the deployment and operation of machine learning models.
- **Infrastructure Engineer**: These engineers are responsible for managing the underlying infrastructure that supports machine learning models, such as cloud services, on-premises servers, or specialized hardware like GPUs. They ensure that resources are available, scalable, and optimized for the needs of the machine learning systems.
- **Model Validator**: Model validators are responsible for evaluating and assessing machine learning models to ensure that they meet required performance and accuracy standards. They use various techniques, such as cross-validation, to verify that models generalize well to unseen data and that they meet the necessary regulatory requirements.
- **ML Product Manager**: ML product managers are responsible for defining the vision, strategy, and roadmap for machine learning products or features. They work with cross-functional teams, including data scientists, engineers, and stakeholders, to ensure that machine learning solutions align with business goals and provide value to users.
- **Business Analyst**: Business analysts work with stakeholders to identify opportunities for using machine learning to solve problems and improve business processes. They translate business requirements into technical specifications, helping to ensure that machine learning solutions are designed to meet the needs of the organization.

These roles often collaborate and overlap, depending on the organization's size and structure, as well as the specific requirements of a given machine learning project.

# MLOps for Responsible AI
MLOps can help to promote responsible AI by focusing on creating and maintaining machine learning systems that adhere to ethical principles, promote fairness, and ensure transparency and accountability. This extends the traditional MLOps practices to address the unique challenges posed by responsible AI. Here are some key aspects of MLOps for responsible AI:

- **Fairness**: MLOps processes should consider the fairness of machine learning models by ensuring that they do not discriminate against particular groups or perpetuate existing biases. Techniques like re-sampling, re-weighting, and adversarial training can be used to minimize biases in training data and models.

- **Explainability**: MLOps practices for responsible AI should emphasize creating models that are transparent and easy to understand. Explainable AI (XAI) techniques like Local Interpretable Model-agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), and counterfactual explanations can be employed to provide insights into how a model makes decisions, making it easier to detect and correct biases or errors. (See References for more information on these.)

- **Privacy**: MLOps processes should prioritize data privacy and security, ensuring that sensitive information is protected and used responsibly. Privacy-preserving techniques such as federated learning, differential privacy, and secure multi-party computation can be incorporated to protect data while still enabling the development of effective machine learning models.

- **Accountability**: MLOps for responsible AI should establish clear lines of responsibility and ownership for machine learning models, including who is responsible for monitoring and addressing ethical concerns. This includes setting up robust monitoring and auditing systems to track the performance and impact of models throughout their lifecycle.

- **Transparency**: Responsible AI practices in MLOps should emphasize openness and transparency. This includes documenting model development processes, data sources, and decision-making criteria, as well as sharing this information with relevant stakeholders. Transparent practices can help build trust in AI systems and promote more ethical outcomes.

- **Continuous monitoring and evaluation**: MLOps for responsible AI should include ongoing monitoring and evaluation of machine learning models to ensure that they continue to meet ethical standards, as well as performance requirements. This includes regular re-evaluation of model fairness, explainability, and other ethical concerns, and updating models as needed to address any issues.

- **Collaboration**: MLOps for responsible AI should involve cross-functional teams, including data scientists, engineers, ethicists, and domain experts, who collaborate to ensure that AI systems are developed and deployed responsibly. This collaborative approach can help identify and mitigate potential ethical concerns and ensure that AI solutions align with organizational values and goals.

By integrating these responsible AI principles into MLOps practices, organizations can develop and deploy machine learning models that not only perform well but also adhere to ethical guidelines and contribute positively to society.
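
As one illustration of how a fairness check could be automated inside an MLOps pipeline, the sketch below computes the demographic parity difference: the gap between groups in the rate of positive predictions. This metric is our choice for the example and is only one of several fairness measures; the function names and the toy data are hypothetical.

```python
def selection_rates(predictions, groups):
    """Return the share of positive (1) predictions for each group label."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0.0 means equal rates)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Toy data: eight predictions, four for each of two groups.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
gap = demographic_parity_difference(preds, groups)
print(rates)  # {'a': 0.75, 'b': 0.25}
print(gap)    # 0.5
```

A monitoring job could compute this gap on each batch of predictions and raise an alert when it exceeds a threshold agreed with the relevant stakeholders.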

# Continuous Delivery

MLOps draws heavily on ideas from Continuous Delivery (CD).  Continuous Delivery is a software development practice that aims to ensure software can be released to production at any time, in a reliable, safe, and efficient manner. It involves automating various stages of the software development process to minimize manual intervention, reduce errors, and accelerate the release of new features or bug fixes. 

The principles of Continuous Delivery are:

- **Build quality in**: Prioritize quality from the start by adopting practices such as Test-Driven Development (TDD), code reviews, and automated testing at various levels (unit, integration, and system tests). Ensuring high-quality code minimizes the likelihood of defects and reduces the cost of fixing issues later in the development process.

- **Work in small batches**: Break down large features and changes into smaller, manageable increments. Smaller batches make it easier to develop, test, and deploy changes, reducing the risk associated with large releases and allowing for quicker feedback.

- **Automate repetitive tasks**: Automate tasks such as building, testing, and deploying the software to minimize manual intervention, reduce human error, and accelerate the development process. Automation ensures consistent execution of tasks and frees up team members to focus on higher-value work.

- **Use a deployment pipeline**: Implement a deployment pipeline to manage and automate the progression of code changes from development to production. The pipeline should include stages for building, testing, and deploying the software, with automated gates that prevent low-quality code from moving forward.

- **Keep software releasable at all times**: Maintain the software in a state where it can be deployed to production at any time. This involves rigorous testing, monitoring, and addressing issues as they arise, ensuring that the software remains stable and reliable throughout the development process.

- **Everyone is responsible for the delivery process**: Encourage a culture of shared responsibility for the software delivery process, involving not only developers but also operations, quality assurance, and other stakeholders. This promotes collaboration, accountability, and continuous improvement.

- **Continuous improvement**: Regularly evaluate and refine the development, testing, and deployment processes to identify areas for improvement and optimize the overall software delivery process. Continuous improvement encourages learning, adaptability, and a focus on long-term success.

- **Rapid feedback**: Foster a feedback loop that allows for quick identification and resolution of issues. This includes monitoring the application's performance, gathering user feedback, and using automated testing as part of the continuous integration process. Rapid feedback helps teams address problems early and iterate more effectively.

- **Manage infrastructure and configurations as code**: Treat infrastructure and configuration management as code, allowing for version control, automated provisioning, and consistency across environments. This approach simplifies rollbacks, enables auditing, and ensures that infrastructure changes are tracked and managed like any other software artifact.

- **Collaboration and communication**: Facilitate open communication and collaboration among different teams and stakeholders, promoting a shared understanding of goals, challenges, and best practices. This fosters a culture of teamwork, innovation, and shared ownership of the software delivery process.

By adhering to these principles, organizations can create a sustainable and efficient software delivery process that minimizes risk, accelerates time to market, and promotes high-quality, reliable software products.
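
The idea of a deployment pipeline with automated gates can be sketched in a few lines. The example below is a simplified illustration, not a real CI/CD system: the stage names, the candidate dictionary, and the 0.85 accuracy threshold are all assumptions made for the sketch.

```python
def run_pipeline(stages, candidate):
    """Run stages in order and stop at the first gate that fails."""
    completed = []
    for name, gate in stages:
        if not gate(candidate):
            return {"released": False, "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"released": True, "failed_stage": None, "completed": completed}


# Hypothetical gates: each returns True when the candidate may move forward.
def build(candidate):
    return candidate.get("artifact") is not None


def unit_tests(candidate):
    return candidate.get("tests_failed", 1) == 0


def quality_gate(candidate):
    return candidate.get("accuracy", 0.0) >= 0.85


STAGES = [("build", build), ("unit_tests", unit_tests), ("quality_gate", quality_gate)]

good = {"artifact": "model-v2.pkl", "tests_failed": 0, "accuracy": 0.91}
weak = {"artifact": "model-v3.pkl", "tests_failed": 0, "accuracy": 0.78}

print(run_pipeline(STAGES, good)["released"])      # True
print(run_pipeline(STAGES, weak)["failed_stage"])  # quality_gate
```

Because each gate is an ordinary function, new checks can be added as further stages without changing the runner, which keeps changes small and the release decision repeatable.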

# Configuration Management for MLOps

Configuration management is essential in MLOps for ensuring the consistency, reliability, and maintainability of machine learning systems. It helps manage various aspects of ML systems, such as data pipelines, model training, and deployment configurations. Here's how configuration management applies to MLOps:

- **Reproducibility**: In MLOps, managing configurations helps maintain the reproducibility of machine learning experiments, model training, and deployments. By tracking and versioning configurations, it becomes easier to recreate the same environment and conditions for different team members, reducing inconsistencies and ensuring reliable results.

- **Infrastructure as Code (IaC)**: Applying IaC principles to MLOps allows teams to define, provision, and manage infrastructure and configurations using code. This enables version control, automated provisioning, and consistency across development, testing, and production environments. Tools like Terraform, Ansible, and Kubernetes can be used to manage ML infrastructure and configurations.

- **Model training configurations**: Managing configurations for model training, such as hyperparameters, data splits, and feature selection, is crucial for maintaining consistency and transparency in the model development process. Versioning these configurations ensures that the results of different experiments can be compared and tracked over time.

- **Data pipeline configurations**: In MLOps, data pipelines often require configurations for data ingestion, preprocessing, and transformation. Managing these configurations helps maintain consistency in data processing, ensuring that models are trained on high-quality, reliable data.

- **Deployment configurations**: Configuration management is essential for managing deployment settings, such as resource allocation, scaling policies, and monitoring configurations. This ensures that ML models are deployed consistently and reliably in production environments, and can scale as needed to meet demand.

- **Experiment tracking and monitoring**: By managing configurations related to experiment tracking and monitoring, teams can ensure that they collect consistent and relevant metrics, logs, and other data throughout the ML lifecycle. This aids in identifying and addressing issues, as well as informing continuous improvement efforts.

- **Standardization and collaboration**: Configuration management promotes standardization and collaboration within MLOps, as teams can share and reuse configurations for various stages of the ML lifecycle. This fosters a consistent approach to building, training, and deploying models, streamlining the development process and reducing the risk of errors or inconsistencies.

- **Change management**: Implementing a structured process for managing changes to configurations in MLOps helps maintain system stability, minimize the impact of changes, and ensure that modifications are properly reviewed and controlled.

By incorporating configuration management practices into MLOps, teams can maintain consistent, reliable, and maintainable machine learning systems, reducing the risk of errors or inconsistencies and enabling faster, more efficient development and deployment of ML models.
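
One simple way to version a training configuration is to derive a short fingerprint from its contents, so that every experiment result can be tied to the exact settings that produced it. The sketch below assumes the configuration is a JSON-serializable dictionary; the field names and values are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a config deterministically so equal settings always give the same ID."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


training_config = {
    "model": "gradient_boosting",
    "hyperparameters": {"n_estimators": 200, "learning_rate": 0.05},
    "data": {"test_fraction": 0.2, "random_seed": 42},
    "features": ["age", "tenure_months", "plan_type"],
}

# Key order does not matter, because keys are sorted before hashing.
reordered = dict(reversed(list(training_config.items())))
assert config_fingerprint(reordered) == config_fingerprint(training_config)

# Any change to a setting produces a different fingerprint.
changed = json.loads(json.dumps(training_config))  # deep copy via JSON round trip
changed["hyperparameters"]["learning_rate"] = 0.1
assert config_fingerprint(changed) != config_fingerprint(training_config)

print(config_fingerprint(training_config))
```

Storing the fingerprint alongside the logged metrics and the trained model makes it possible to check later whether two results came from the same configuration. Note that the order of list entries, such as the feature list, still affects the hash.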

# Testing Strategies
Software testing in MLOps is crucial for ensuring the reliability, scalability, and maintainability of machine learning systems. It involves verifying not only the machine learning models but also the surrounding infrastructure, pipelines, and processes that enable the model to function in a production environment. Here are some key aspects of software testing as it relates to MLOps:

- **Unit testing**: In MLOps, unit testing involves testing individual components or functions in isolation, such as data preprocessing steps, feature engineering functions, or custom model code. These tests help catch issues early in the development process and ensure that each component is functioning as expected.
- **Integration testing**: Integration testing focuses on the interactions between different components of the machine learning system, such as data pipelines, feature stores, and model serving infrastructure. This ensures that data flows correctly through the system and that components work together as expected.
- **System testing**: System testing evaluates the machine learning system as a whole, assessing its performance, reliability, and stability under various conditions. This may include testing the system's ability to handle different data loads, respond to requests under high concurrency, or recover from failures.
- **Model validation**: Model validation is a critical part of MLOps testing, ensuring that the machine learning model performs well on unseen data and meets the required performance metrics. Techniques such as cross-validation, holdout sets, and performance monitoring can be used to validate models before deployment and during operation.
- **End-to-end testing**: End-to-end testing simulates real-world scenarios to ensure that the entire machine learning system, from data ingestion to model serving, functions correctly. This helps identify issues that may not be apparent when testing components in isolation.
- **Load and performance testing**: Load and performance testing involves subjecting the machine learning system to varying levels of data volume, user requests, or computational load to ensure that it remains stable and performs well under different conditions. This is especially important for systems that need to scale in response to fluctuating demand.
- **Security testing**: Security testing focuses on identifying vulnerabilities in the machine learning system, including data leaks, unauthorized access, or potential attack vectors. This ensures that the system and its data are protected from potential threats.
- **Monitoring and observability**: In MLOps, monitoring and observability are essential for tracking the performance and behavior of machine learning systems in production. This includes collecting metrics, logs, and other information to identify issues, measure system health, and inform continuous improvement efforts.
- **Continuous testing**: MLOps promotes the practice of continuous testing, where tests are run as part of the continuous integration and deployment (CI/CD) pipeline. This ensures that changes to the code, data, or infrastructure are tested and validated before being deployed to production.

Incorporating these software testing practices into MLOps helps ensure the reliability, performance, and maintainability of machine learning systems, reducing the risk of issues and enabling faster deployment of new models and features.
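
Two of the test types above, unit testing and model validation, can be sketched with plain assertions. The example assumes a trivial stand-in model and a tiny holdout set; `fill_missing`, `threshold_model`, and the data are hypothetical and chosen only to keep the sketch self-contained.

```python
def fill_missing(values, default=0.0):
    """Preprocessing step: replace None entries with a default value."""
    return [default if v is None else v for v in values]


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def threshold_model(row):
    """Stand-in for a trained model: predict 1 when the first feature exceeds 0.5."""
    return 1 if row[0] > 0.5 else 0


def test_fill_missing_replaces_none():
    # Unit test: one component, checked in isolation.
    assert fill_missing([1.0, None, 3.0]) == [1.0, 0.0, 3.0]


def test_model_beats_majority_baseline():
    # Model validation: holdout accuracy must beat always predicting the majority class.
    holdout_rows = [[0.9], [0.8], [0.2], [0.1], [0.7], [0.4]]
    holdout_labels = [1, 1, 0, 0, 1, 1]
    majority = max(set(holdout_labels), key=holdout_labels.count)
    baseline = accuracy(lambda row: majority, holdout_rows, holdout_labels)
    assert accuracy(threshold_model, holdout_rows, holdout_labels) > baseline


test_fill_missing_replaces_none()
test_model_beats_majority_baseline()
print("all tests passed")
```

Functions named `test_*` follow the convention used by common Python test runners, so the same checks could run automatically in a CI pipeline instead of being called by hand.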


# Test-Driven Development (TDD)
Test-Driven Development (TDD) is a software development methodology that emphasizes writing tests before implementing the actual code. It involves writing a failing test for a specific functionality, writing the code to pass the test, and then refactoring the code for optimization and readability. In MLOps, TDD can be applied to various aspects of the machine learning system to support its reliability, maintainability, and scalability. Here's how TDD can be used in MLOps:

- **Data preprocessing and feature engineering**: Before implementing data preprocessing and feature engineering functions, write tests to ensure that the functions transform the data as expected. This helps catch issues early in the development process and ensures that data is processed correctly before being used by the machine learning model.

- **Custom model code**: If your machine learning system involves custom model code or non-standard model architectures, use TDD to write tests for the model's behavior, such as ensuring the model produces the correct output shape or meets specific performance criteria. This ensures that the custom code is functioning correctly and helps avoid potential issues in the model's implementation.

- **Data pipeline and workflow**: Write tests for the data pipeline and workflows that are part of the machine learning system, ensuring that data is ingested, processed, and stored correctly. This can help identify issues with data integration, consistency, and processing, ensuring that the model is trained on high-quality data.

- **Model training and validation**: Use TDD to write tests for model training and validation processes, ensuring that models are trained and evaluated correctly. This can involve tests for hyperparameter optimization, model selection, and validation strategies such as cross-validation or holdout sets.

- **Model serving and deployment**: Write tests for the model serving and deployment infrastructure, ensuring that the system can handle requests, serve predictions, and scale as needed. This can include tests for API endpoints, request/response handling, and error handling.

- **Monitoring and observability**: Implement tests for monitoring and observability components, such as logging, metrics collection, and alerting. This ensures that the system is capable of tracking its performance, detecting issues, and providing insights for continuous improvement.

- **Integration with other systems**: In an MLOps context, machine learning systems often need to integrate with other systems, such as databases, message queues, or external APIs. Use TDD to write tests for these integrations, ensuring that data flows correctly between the systems and that the machine learning system functions as expected in the broader context.

By applying Test-Driven Development principles in MLOps, teams can catch issues early, keep the machine learning system reliable and maintainable, and promote a culture of quality and collaboration. TDD requires additional upfront effort, but that effort is usually repaid through fewer regressions and safer refactoring as the system evolves.
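
To make the test-first cycle concrete, here is a minimal sketch for the data preprocessing case. The helper `min_max_scale` and the test names are hypothetical and chosen only for illustration; the tests are written first and the function is added afterwards with just enough logic to make them pass.

```python
# Step 1 (red): write the tests first. They describe the behavior we want
# from a hypothetical preprocessing helper, min_max_scale, before it exists.
def test_scales_values_to_unit_interval():
    assert min_max_scale([10, 20, 30]) == [0.0, 0.5, 1.0]


def test_constant_column_does_not_divide_by_zero():
    assert min_max_scale([5, 5, 5]) == [0.0, 0.0, 0.0]


def test_empty_input_is_rejected():
    try:
        min_max_scale([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")


# Step 2 (green): write just enough code to make the tests pass.
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


# Step 3: run the tests. A runner such as pytest would discover the
# test_* functions automatically; here we simply call them directly.
for test in (
    test_scales_values_to_unit_interval,
    test_constant_column_does_not_divide_by_zero,
    test_empty_input_is_rejected,
):
    test()
print("all tests passed")
```

Once the tests pass, the function can be refactored freely (for example, vectorized for large datasets) while the same tests guard against regressions.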

# Deployment Pipeline Practices
A deployment pipeline is a series of stages through which code changes pass as they move from development to production. The pipeline is designed to automate the build, test, and deployment processes, ensuring that software releases are reliable, efficient, and consistent. Here are some key deployment pipeline practices to consider:

- **Version control**: Use a version control system (such as Git) to manage code changes, track revisions, and collaborate effectively across teams. Version control ensures that developers work on a consistent codebase and makes it easier to roll back changes if necessary.
- **Continuous Integration (CI)**: Merge code changes from multiple developers into a shared repository frequently, ideally several times a day. CI helps to identify and resolve integration issues early, reducing the risk of conflicts and bugs accumulating over time.
- **Automated build**: Automate the process of compiling, building, and packaging the software. This reduces the potential for human error, ensures consistency, and accelerates the development process.
- **Automated testing**: Implement automated tests at various levels, such as unit tests, integration tests, and system tests, to ensure the software's quality and stability. Tests should be run as part of the continuous integration process, allowing developers to catch and fix issues quickly.
- **Deployment gates**: Use automated gates in the deployment pipeline to prevent low-quality code from moving forward. For example, code changes might only progress to the next stage if all tests pass and performance metrics meet predefined thresholds.
- **Environment consistency**: Ensure consistency across environments (e.g., development, staging, and production) by using tools like Docker, Kubernetes, or infrastructure-as-code solutions like Terraform. This minimizes the risk of environment-specific issues and makes it easier to reproduce and troubleshoot problems.
- **Continuous Deployment (CD)**: Automatically deploy code changes to production once they have passed all stages of the pipeline and met quality criteria. CD accelerates the release of new features and bug fixes and minimizes manual intervention in the deployment process.
- **Rollback and roll-forward strategies**: Implement strategies for rolling back or rolling forward changes in case of issues. This includes versioning artifacts, tracking changes, and having a well-defined process for reverting or fixing problematic releases.
- **Monitoring and observability**: Collect metrics, logs, and other data from each stage of the deployment pipeline to gain insights into the performance and health of the software. This information can be used to identify and address issues, as well as to inform continuous improvement efforts.
- **Continuous improvement**: Regularly review and refine the deployment pipeline to optimize efficiency, reliability, and the overall quality of the software. This might involve updating tools, automating additional tasks, or incorporating feedback from stakeholders.

By adopting these deployment pipeline practices, organizations can streamline their software development processes, reduce the time to market for new features, and improve the overall quality and stability of their software products.
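
As an illustration of the deployment gate practice listed above, the following sketch shows one way a pipeline stage might decide whether a candidate may move forward. The function name `evaluate_gate`, the metric names, and the threshold values are assumptions made for this example, not part of any specific CI/CD tool.

```python
def evaluate_gate(test_results, metrics, thresholds):
    """Decide whether a candidate build may move to the next stage.

    test_results: dict mapping test name -> True/False
    metrics:      dict mapping metric name -> measured value
    thresholds:   dict mapping metric name -> minimum acceptable value
    Returns (passed, reasons), where reasons lists every failed check.
    """
    reasons = []
    for name, passed in test_results.items():
        if not passed:
            reasons.append(f"test failed: {name}")
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            reasons.append(f"metric missing: {name}")
        elif value < minimum:
            reasons.append(f"{name}={value} is below the minimum {minimum}")
    return (not reasons, reasons)


# Hypothetical candidate: all tests pass, but recall misses its threshold.
passed, reasons = evaluate_gate(
    test_results={"unit": True, "integration": True},
    metrics={"accuracy": 0.91, "recall": 0.78},
    thresholds={"accuracy": 0.90, "recall": 0.80},
)
print(passed)   # False
print(reasons)  # ['recall=0.78 is below the minimum 0.8']
```

Returning the list of reasons, rather than only a boolean, makes it easier to report why a release was blocked.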

# Software Telemetry
Software telemetry refers to the collection of data regarding the usage, performance, and behavior of a software system. In the context of MLOps (Machine Learning Operations), software telemetry is crucial for monitoring and understanding the performance of machine learning systems in various stages of the lifecycle, such as training, validation, and deployment. The insights gained from software telemetry can help identify issues, optimize performance, and inform continuous improvement efforts. Here are some key aspects of software telemetry in MLOps:

- **Model performance metrics**: Collect metrics related to model performance, such as accuracy, precision, recall, F1 score, or other domain-specific metrics. These metrics help evaluate the effectiveness of the model and compare different models or versions.
- **System performance metrics**: Track system performance metrics, such as CPU utilization, memory consumption, network latency, and response times. Monitoring these metrics helps ensure that the machine learning system is performing optimally and can identify potential bottlenecks or resource constraints.
- **Training and validation metrics**: During the model training and validation stages, collect telemetry data on training loss, validation loss, and other relevant metrics. This information can help identify overfitting, underfitting, or other issues related to the model's generalization capabilities.
- **Data pipeline telemetry**: Monitor the data processing pipeline, collecting metrics on data ingestion, preprocessing, and transformation. This helps ensure that data pipelines are functioning correctly and efficiently, and can identify potential issues with data quality or consistency.
- **Model serving and deployment**: Collect telemetry data related to model serving and deployment, such as request rates, response times, and error rates. This information can help optimize the model serving infrastructure, identify issues with model performance or scalability, and inform decisions related to resource allocation and auto-scaling.
- **Feature importance and drift**: Monitor feature importance and drift to understand how the relevance of input features changes over time. This can help identify the need for model updates or retraining and inform decisions related to feature engineering or data collection.
- **User behavior and feedback**: Collect data on user interactions with the machine learning system, such as usage patterns, feedback, or error reports. This information can help identify potential issues with the system, inform usability improvements, and guide the development of new features or models.
- **Logs and events**: Collect logs and events generated by the machine learning system, such as system logs, error logs, and custom events. Analyzing this data can help identify issues, troubleshoot problems, and understand the overall behavior of the system.
- **Anomaly detection and alerting**: Use software telemetry data to detect anomalies in the machine learning system's performance or behavior and trigger alerts when issues are identified. This helps ensure that potential problems are addressed proactively and that system stability is maintained.

By incorporating software telemetry practices into MLOps, teams can gain valuable insights into the performance and behavior of their machine learning systems, identify and address issues, and continually improve the effectiveness and reliability of their models and infrastructure.
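
The model serving and alerting items above can be sketched in a few lines. This is a simplified illustration that assumes request records are already available as dictionaries with `status` and `latency_ms` fields; the field names, helper functions, and alert limits are all hypothetical. A production system would typically rely on a dedicated metrics and alerting stack instead.

```python
import math


def percentile(values, percent):
    """Nearest-rank percentile for an integer percent between 1 and 100."""
    ordered = sorted(values)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_requests(records):
    """Aggregate raw request records into a few serving metrics."""
    if not records:
        raise ValueError("records must not be empty")
    latencies = [record["latency_ms"] for record in records]
    # Treat HTTP 5xx responses as server-side errors.
    errors = sum(1 for record in records if record["status"] >= 500)
    return {
        "request_count": len(records),
        "error_rate": errors / len(records),
        "p95_latency_ms": percentile(latencies, 95),
    }


def check_alerts(summary, max_error_rate, max_p95_latency_ms):
    """Return a list of alert messages for metrics that exceed their limits."""
    alerts = []
    if summary["error_rate"] > max_error_rate:
        alerts.append(f"error rate {summary['error_rate']:.2%} exceeds limit")
    if summary["p95_latency_ms"] > max_p95_latency_ms:
        alerts.append(f"p95 latency {summary['p95_latency_ms']} ms exceeds limit")
    return alerts


# Hypothetical telemetry from four prediction requests.
records = [
    {"status": 200, "latency_ms": 40},
    {"status": 200, "latency_ms": 55},
    {"status": 500, "latency_ms": 900},
    {"status": 200, "latency_ms": 60},
]
summary = summarize_requests(records)
alerts = check_alerts(summary, max_error_rate=0.05, max_p95_latency_ms=500)
print(summary)
print(alerts)
```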

# MLflow

Now we will look at MLflow as an example of an open-source platform for managing the end-to-end machine learning lifecycle. It was developed by Databricks to help data scientists and engineers track experiments, share and package projects, and deploy models in a systematic and reproducible manner. MLflow is designed to work with any machine learning library, language, or existing codebase, making it a versatile and flexible solution for MLOps.

MLflow consists of four main components:

* **MLflow Tracking**: This component provides a centralized service for tracking and logging experiments, including parameters, metrics, tags, and artifacts, such as trained models or visualizations. MLflow Tracking allows users to compare different experiments and models, making it easier to select the best model for deployment. It also supports visualization of experiment results through a web-based UI or integration with other tools like TensorBoard.
* **MLflow Projects**: MLflow Projects is a packaging format for organizing and sharing code, dependencies, and configurations for a machine learning project. By using a simple YAML configuration file, MLflow Projects ensures that code can be easily shared and reproduced across different environments. It also supports running projects in containers or remote environments, such as Kubernetes or cloud-based virtual machines.
* **MLflow Models**: This component is a general format for packaging machine learning models, allowing them to be easily shared and deployed across various platforms. MLflow Models can store models from different ML libraries and provides a standard interface for serving models, regardless of the underlying library. It also supports multiple model serving options, such as local serving, REST APIs, or deployment to cloud-based platforms like Amazon SageMaker or Microsoft Azure ML.
* **MLflow Model Registry**: The Model Registry is a centralized repository for managing and versioning models. It allows users to register models, manage different versions, and track the deployment of models to different environments, such as staging or production. With the Model Registry, teams can collaborate on models more effectively, ensuring that only the best models are deployed and used in production.

MLflow's modular design and support for various machine learning libraries and platforms make it a popular choice for teams looking to streamline their MLOps processes. By providing tools for tracking experiments, sharing code and models, and deploying models consistently and reproducibly, MLflow helps to improve the efficiency and reliability of machine learning projects.

MLflow can be installed for use with Python ML pipelines by running the command `pip install mlflow`. We will use MLflow in a Databricks notebook for this module. The main site for MLflow is: https://www.mlflow.org/.

**Exercise 1**

In this exercise, we will complete the quickstart tutorial for the `autolog` feature in MLflow. More about the `autolog` function can be found here: https://mlflow.org/docs/latest/python_api/mlflow.spark.html#mlflow.spark.autolog.

Please follow this link for the MLflow quickstart tutorial:

- https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5915990090493625/3852659441213238/6085673883631125/latest.html

Import this notebook to see how specific items can be logged:

- https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5915990090493625/1771997817162911/6085673883631125/latest.html

## MLflow logging location

All MLflow runs are logged to the active experiment, which can be set using any of the following:

1. Using the [`mlflow.set_experiment()` command].

2. Using the `experiment_id` parameter in the [`mlflow.start_run()` command].

3. Setting one of the MLflow environment variables [`MLFLOW_EXPERIMENT_NAME` or `MLFLOW_EXPERIMENT_ID`].

If no active experiment is set, the runs are logged to the [notebook experiment].

[`mlflow.set_experiment()` command]: https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.set_experiment
[`mlflow.start_run()` command]: https://www.mlflow.org/docs/latest/python_api/mlflow.html#mlflow.start_run
[`MLFLOW_EXPERIMENT_NAME` or `MLFLOW_EXPERIMENT_ID`]: https://mlflow.org/docs/latest/cli.html#cmdoption-mlflow-run-arg-uri
[notebook experiment]: https://docs.databricks.com/applications/mlflow/tracking.html#mlflow-notebook-experiments


**Exercise 2**

This notebook creates a random forest model on a simple dataset and uses the MLflow Tracking API to log the model, selected model parameters, and metrics:

- https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5915990090493625/1771997817162911/6085673883631125/latest.html

## MLflow experiments

There are two types of experiments: workspace and notebook.

- You can create a workspace experiment using the Workspace UI or the MLflow API. Workspace experiments are not associated with any notebook, and any notebook can log a run to these experiments by using the experiment ID or the experiment name.


- A notebook experiment is associated with a specific notebook. Databricks automatically creates a notebook experiment if there is no active experiment when you start a run using `mlflow.start_run()`.

You can learn more here: https://docs.databricks.com/applications/mlflow/tracking.html.

Here is an example notebook for Automated MLflow tracking with MLlib: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5915990090493625/3815222091422169/6085673883631125/latest.html

## Managing models using MLflow

MLflow provides a registry for tracking models and their status. Databricks provides the MLflow registry as a cloud service, which makes it convenient to share models between teams without setting up the infrastructure to run a locally hosted registry. The Model Registry provides chronological model lineage (which MLflow experiment and run produced the model at a given time), model versioning, stage transitions (for example, from staging to production or archived), and email notifications of model events. You can also create and view model descriptions and leave comments.

Models can be registered through a UI or programmatically through an API. As your model progresses through the stages of the lifecycle, you update its status in the registry (again, either through the UI or the API). Users who don't have permission to update the status can still request a status change for approval by the model owner.

The registry also provides a search capability that becomes useful when there are a large number of models in play or if you need to find an older version of a model.

To read more about the details of managing models refer to:

- https://docs.databricks.com/applications/machine-learning/manage-model-lifecycle/index.html#create-or-register-a-model

# Serving the Model

When the model is ready for production, it can be *served* (i.e., moved to production) in the Databricks cloud with a few mouse clicks. To learn more about the details of serving models, refer to: https://docs.databricks.com/applications/mlflow/model-serving.html.

In the following example, a model is registered to AWS. (Note: this is for your reference only and won't work in the free Community Edition.)

- https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/5915990090493625/1771997817162797/6085673883631125/latest.html

**End of Module**

You have reached the end of this module.

If you have any questions, please reach out to your peers using the discussion boards. If you and your peers are unable to come to a suitable conclusion, do not hesitate to reach out to your instructor on the designated discussion board.

# References


- EXplainable AI (XAI): LIME & SHAP, Two Great Candidates to Help You Explain Your Machine Learning Models (2021). Retrieved March 30, 2023 from https://towardsdatascience.com/explainable-ai-xai-lime-shap-two-great-candidates-to-help-you-explain-your-machine-learning-a95536a46c4e.
- Interpretable Machine Learning Explanations (2023). Retrieved March 30, 2023 from https://christophm.github.io/interpretable-ml-book/.
- MLflow Documentation. Retrieved from: https://mlflow.org/docs/latest/index.html


- Databricks MLflow Guide. Retrieved from: https://docs.databricks.com/applications/mlflow/index.html


- Databricks blog. Retrieved from: https://databricks.com/blog/2021/04/15/how-not-to-tune-your-model-with-hyperopt.html


- MLflow Model Registry on Databricks, Databricks documentation. Retrieved from: https://docs.databricks.com/applications/mlflow/model-registry.html#register-a-model-in-the-model-registry