# Designing Adaptable ML Systems

## Adapting to Data
### Adapting to Data

Decoupled upstream data producers

Underutilized data dependencies

### Changing Distributions

Distributions change
* Monitor descriptive statistics for your inputs and outputs
* Monitor your residuals as a function of your inputs
* Use custom weights in your loss function to emphasize data recency
* Use dynamic training architecture and regularly retrain your model
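
Two of the bullets above can be sketched in plain Python. This is a minimal illustration rather than a production monitor: `drifted` flags an input whose serving mean has moved away from its training mean, and `recency_weights` produces decaying per-example weights that `weighted_mse` uses as a custom-weighted loss. The function names, the three-sigma threshold, and the 30-day half-life are assumptions made for this example.

```python
import statistics


def drifted(train_values, serving_values, max_sigma=3.0):
    """Return True if the serving mean is more than max_sigma training
    standard deviations away from the training mean."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    serving_mean = statistics.mean(serving_values)
    return abs(serving_mean - train_mean) > max_sigma * train_std


def recency_weights(ages_in_days, half_life_days=30.0):
    """Weight each example by 0.5 ** (age / half_life), so an example that is
    one half-life old counts half as much as a brand-new one."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each example's error is scaled by its weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)
```

With these definitions, `recency_weights([0, 30, 60])` gives weights of 1.0, 0.5 and 0.25, so older examples contribute progressively less to the loss.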

### Right and Wrong Decisions

### System Failure

Systems fail

Feedback Loops

## Mitigating Training-Serving Skew through Design

Training/Serving Skew
1. A discrepancy between how you handle data in the training and serving pipelines
2. A change in the data between when you train and when you serve
3. A feedback loop between your model and your algorithm

How Code Can Create Training/Serving Skew
* Different library versions that are functionally equivalent but optimized differently
* Different library versions that are not functionally equivalent
* Re-implemented functions
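
One common way to avoid the last cause, re-implemented functions, is to keep a single preprocessing function plus the statistics captured at training time, and to call both from the training path and the serving path. The sketch below illustrates that design idea only; `fit_stats`, `scale_feature`, `training_examples`, and `serving_example` are hypothetical names invented for this example.

```python
def fit_stats(values):
    """Compute the statistics the transform needs, once, on training data."""
    return {"min": min(values), "max": max(values)}


def scale_feature(value, stats):
    """Single source of truth for the transform: min-max scale to [0, 1],
    clipping values that fall outside the training range."""
    span = stats["max"] - stats["min"]
    if span == 0:
        return 0.0
    scaled = (value - stats["min"]) / span
    return min(1.0, max(0.0, scaled))


def training_examples(raw_values, stats):
    """Training path: transform a whole batch with the shared function."""
    return [scale_feature(v, stats) for v in raw_values]


def serving_example(raw_value, stats):
    """Serving path: transform one request with the same function and the
    same saved statistics, instead of re-implementing the logic."""
    return scale_feature(raw_value, stats)
```

Because both paths go through `scale_feature`, a change to the transform cannot reach training without also reaching serving.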

## Debugging a Production Model

## Summary

Keep humans in the loop

Prioritize maintainability

Get ready to roll back

# Designing High Performance ML Systems

## Aspects of Performance

### Training

![](tuning_performance.png)

### Why distributed training

Improving performance also adds complexity

Machine learning gets complex quickly

Heterogeneous systems require our code to work anywhere

Deep learning works because datasets are large, but the compute required keeps increasing

Distributed systems are a necessity for managing complex models with large data volumes

### Distributed training architectures

**Data Parallelism**

Two approaches to Data Parallelism
1. Parameter server
2. Sync Allreduce

![](data_parallelism.png)
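
The two approaches can be compared with a toy simulation in plain Python. This is a simplified sketch, not how any framework implements them: it assumes synchronous updates in both cases (parameter servers are often run asynchronously in practice), and the function names are made up for this example. In the parameter server version one central copy of the parameters is updated; in the all-reduce version every worker holds a replica and applies the same averaged gradient.

```python
def average_gradients(worker_grads):
    """Element-wise mean of per-worker gradient vectors."""
    num_workers = len(worker_grads)
    return [sum(column) / num_workers for column in zip(*worker_grads)]


def parameter_server_step(params, worker_grads, lr):
    """Parameter server: workers send gradients to a central server, which
    updates the single authoritative copy of the parameters."""
    avg = average_gradients(worker_grads)
    return [p - lr * g for p, g in zip(params, avg)]


def allreduce_step(replica_params, worker_grads, lr):
    """Synchronous all-reduce: every worker ends up holding the same averaged
    gradient and applies it to its own replica, so replicas stay identical."""
    avg = average_gradients(worker_grads)  # what each worker holds after all-reduce
    return [[p - lr * g for p, g in zip(params, avg)] for params in replica_params]
```

In this synchronous setting both functions produce the same parameter values; the difference is where the parameters live and how gradients travel between machines.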

**Model Parallelism**

## Faster input pipelines

### Faster input pipelines

Reading Data into TensorFlow
1. Directly feed from Python
2. Native TensorFlow Ops
3. Read transformed TFRecords

### Parallel pipelines

![](parallelize_file_reading.png)
![](parallelize_transformations.png)
![](prefetch.png)
![](fuse.png)

## Data parallelism (All Reduce)

**Distribution API Strategy**
![](mirroredstrategy.png)

**Mirrored Strategy**
* No change to the model or training loop
* No change to input function (requires tf.data.Dataset)
* Checkpoints and summaries are seamless

## Parameter Server Approach

Data parallelism is a way to increase training throughput

Model Parallelism lets you distribute a model across GPUs

Large embeddings need multiple machines to map sparse data

Estimator train_and_evaluate() handles all this

Estimator contains the implementation of three functions: training, evaluation and serving

![](estimator_encapsulating.png)

train_and_evaluate bundles together a distributed workflow

## Inference

Aspects of performance during inference
* QPS
* Microservice
* Cost

![](inference_implement.png)
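
QPS and cost are linked through the number of serving replicas. The back-of-the-envelope sketch below shows one way to relate them; the function names, the 20% headroom default, and any figures passed in are hypothetical and not taken from a real deployment.

```python
import math


def replicas_needed(peak_qps, qps_per_replica, headroom=0.2):
    """Number of serving replicas required to handle peak_qps while keeping
    a fraction `headroom` of each replica's capacity unused."""
    usable_qps = qps_per_replica * (1.0 - headroom)
    return math.ceil(peak_qps / usable_qps)


def hourly_cost(replicas, cost_per_replica_hour):
    """Total hourly serving cost for a fixed number of replicas."""
    return replicas * cost_per_replica_hour
```

For example, a peak of 1000 QPS against replicas that each sustain 100 QPS needs 13 replicas once 20% headroom is reserved, and the hourly cost scales linearly from there.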

# Hybrid ML System

You may not be able to do machine learning solely on Google Cloud
* Tied to On-Premise Infrastructure
* Multi Cloud System Architecture
* Running ML on the edge

Kubernetes minimizes infrastructure management

Kubeflow enables hybrid machine learning

## Kubeflow

### Machine Learning on Hybrid Cloud

#### Composability

Building a model is only one part of the entire system

Each ML Stage is an Independent System

Composability is about microservices

#### Portability

#### Scalability

* More accelerators (GPU, TPU)
* More CPUs
* More disk/networking
* More skillsets (data engineers, data scientists)
* More teams
* More experiments

### Kubeflow

What's in the box?
* Jupyter notebook
* Multi-architecture, distributed training
* Multi-framework model serving
* Examples and walkthroughs for getting started
* Ksonnet packaging for customizing it yourself!

### Demo: Kubeflow

https://github.com/amygdala/code-snippets/tree/master/ml/kubeflow-pipelines

Kubeflow Benefits
* Portability
* Composability and Reproducibility
* Scalability
* Visualization and Collaboration

## Optimizing TensorFlow for mobile

### Embedded Models

ML models can help extract meaning from raw data, thus reducing network traffic

On mobile devices, we often can't use the microservices approach, because a network round trip to a remote service can add unwanted latency

In these situations, we'd like to train on the cloud, predict on device

### TensorFlow Lite

TensorFlow supports multiple mobile platforms

TensorFlow Lite
* Reduced code footprint
* Quantization
* Lower precision arithmetic

Even though we have talked primarily about prediction on mobile, a new frontier is federated learning

### Optimizing for Mobile

Large neural networks can be compressed

There are several methods to reduce model size
* Freeze graph
* Transform the graph
* Quantize weights and calculations
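
To make the last bullet concrete, here is a minimal sketch of affine weight quantization in plain Python. It illustrates the general idea only and is not the scheme TensorFlow Lite uses internally; `quantize` and `dequantize` are names made up for this example. Each float weight is mapped to an 8-bit integer code plus a shared scale and offset, so the codes need a quarter of the storage of 32-bit floats.

```python
def quantize(weights, num_bits=8):
    """Map float weights to integers in [0, 2**num_bits - 1] using an
    affine (scale and offset) scheme. Returns (codes, scale, w_min)."""
    levels = 2 ** num_bits - 1
    w_min, w_max = min(weights), max(weights)
    # Fall back to a scale of 1.0 when all weights are equal,
    # which avoids dividing by zero below.
    scale = (w_max - w_min) / levels or 1.0
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from integer codes."""
    return [c * scale + w_min for c in codes]
```

The round trip is lossy: each recovered weight can differ from the original by up to half of `scale`, which is the precision traded for the smaller model.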

## Wrap Up

Article: [KubeFlow on GCP](https://cloud.google.com/blog/products/gcp/simplifying-machine-learning-on-open-hybrid-clouds-with-kubeflow.png)

Article: [Cloud MLE Architecture Review](https://cloud.google.com/ml-engine/docs/tensorflow/technical-overview.png)