# Adapting to Data

### Agenda
- Adapting to Data
- Mitigating Training-Serving Skew Through Design
- Debugging a Production Model

mitigate: to alleviate, to lessen

### Which of these is least likely to change?

1. An upstream model
2. A data source maintained by another team
3. The relationship between features and labels
4. The distribution of inputs

=> All of them can and often do change

### Decoupled upstream data product
- Input -> Model -> Output
- Web Dev Team - Logs
 - Site Reliability(O)
 - Anti Abuse(O)
 - Data Science() - New Features
  - Rigorously assess all features

### Distributions change
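Interpolation is much easier than extrapolation: a model is far more reliable on inputs that resemble its training data than on inputs outside that range.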

- Monitor descriptive statistics for your inputs and outputs
- Monitor your residuals as a function of your inputs
- Use custom weights in your loss function to emphasize data recency
- Use dynamic training architecture and regularly retrain your model
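
One way to read the "custom weights" bullet is as a per-example weight that decays with the age of the example. A minimal sketch, assuming exponential decay; the `half_life_days` parameter and both function names are hypothetical and do not come from the course:

```python
def recency_weights(ages_in_days, half_life_days=30.0):
    """Return one weight per example; a weight halves every half_life_days."""
    return [0.5 ** (age / half_life_days) for age in ages_in_days]

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each squared residual is scaled by its weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

ages = [0, 30, 60]                 # days since each example was collected
weights = recency_weights(ages)    # the newest example gets weight 1.0
loss = weighted_mse([10.0, 10.0, 10.0], [9.0, 8.0, 6.0], weights)
```

With these weights, an error on the 60-day-old example counts a quarter as much as the same error on today's example.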

### Exercise : Adapting to Data

- __Scenario 1: Code Sprint__
- __Scenario 2: A Gift Horse__
 - smell
 
### Right and Wrong Data Decision
- patient age
- gender
- prior medical conditions
- hospital name
- vital signs
- test results

extrapolation: 1. extrapolation (estimating values beyond the observed range) 2. inference (from known facts)<br>
recency: newness, recentness<br>
vigilant: keenly watchful, never off guard<br>

### Data Leakage

### Predict political affiliation from metaphors

__Solution:__ Cross-contamination: you have to split by author
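
Splitting by author means every sentence from one author lands on the same side of the split. A minimal sketch, assuming examples are `(author, text, label)` tuples and using a hash of the author name to pick the side; the function names and toy data are hypothetical:

```python
import hashlib

def assign_split(author, test_fraction=0.2):
    """Deterministically map an author to 'train' or 'test' via a hash of the name."""
    digest = hashlib.md5(author.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "test" if bucket < test_fraction * 100 else "train"

def split_by_author(examples, test_fraction=0.2):
    """examples: iterable of (author, text, label) tuples."""
    splits = {"train": [], "test": []}
    for example in examples:
        splits[assign_split(example[0], test_fraction)].append(example)
    return splits

# Hypothetical toy data: several sentences per author.
examples = [
    ("author_a", "sentence 1", "party_x"),
    ("author_a", "sentence 2", "party_x"),
    ("author_b", "sentence 3", "party_y"),
    ("author_b", "sentence 4", "party_y"),
    ("author_c", "sentence 5", "party_x"),
]
splits = split_by_author(examples)
train_authors = {author for author, _, _ in splits["train"]}
test_authors = {author for author, _, _ in splits["test"]}
```

Because the side depends only on the author name, no author can appear in both sets, which is what removes the cross-contamination.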

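affiliation: 1. (a person's political or religious) membership 2. (an organization's) association, affiliation<br>
suspicious: 1. suspecting (that someone has done something illegal or dishonest), distrustful 2. questionable, arousing suspicion<br>
contamination: 1. [U] pollution, defilement; dirtiness; [C] a contaminant; [figurative] corruption 2. contamination by poison gas or radiation<br>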

### System Failure

Rollback initiated: version 1.0.1, three months old

### Feedback Loops
- Client - Static Model - Stale Recommendations

stale: 1. not fresh, made long ago 2. musty, having an unpleasant smell

# Mitigating Training-Serving Skew

1. A discrepancy between how you handle data in the training and serving pipelines
2. A change in the data between when you train and when you serve
3. A feedback loop between your model and your algorithm

### How Code Can Create Training/Serving Skew
- Different library versions that are functionally equivalent but optimized differently
- Different library versions that are not functionally equivalent
- Re-implemented functions
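
A common way to avoid the "re-implemented functions" case is to keep a single preprocessing function that both pipelines import. A minimal sketch, assuming a batch path fed from CSV rows and a serving path fed one request at a time; all names and features here are hypothetical:

```python
def preprocess(raw):
    """Single source of truth for feature engineering, used by both pipelines."""
    return {
        "age_bucket": min(int(raw["age"]) // 10, 9),   # decades, capped at 90+
        "country": raw["country"].strip().lower(),
    }

def make_training_rows(csv_rows):
    """Batch path: e.g. rows parsed from CSV files (values arrive as strings)."""
    return [preprocess(row) for row in csv_rows]

def handle_request(request):
    """Serving path: e.g. one message from a stream (values may be typed)."""
    return preprocess(request)

training_rows = make_training_rows([{"age": "34", "country": " KR "}])
serving_row = handle_request({"age": 34, "country": "kr"})
```

Both paths produce the same features for the same underlying record, even though the raw inputs differ in type and formatting.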

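discrepancy: a difference (between things that should be the same)<br>
polymorphism: 1. polymorphism (the same substance in different crystal forms) 2. the occurrence of multiple forms, polymorphism ((within a population of the same species...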

- CSV File & Cloud Pub/Sub
 - Cloud DataFlow(Batch) & Cloud DataFlow(Streaming) 
  - Multiple CSV Files & BigQuery

# Debugging a Production Model

1. Multiple Purchase orders -> Cloud Pub/Sub 
2. 1)Cloud Dataflow -> Predicted demand -> Purchasing system 2) Model
3. 1) Google BigQuery Data Warehouse -> Cloud ML engine Model training ->(deploy)-> Model

### Business Catastrophe 1

### An Actual Feedback Loop
- Bad Data -> ML Model -> Predicts Low Demand -> Product Turnover Increases -> ML Model Loop

### Business Catastrophe 2
- Centralized Purchasing

### Business Catastrophe 3
- Solution: Stop automatic model deployment process -> contaminated data

uptick: a slight increase

1. Keep humans in the loop
2. Prioritize maintainability
3. Get ready to roll back
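
The first and third points can be sketched together as a tiny registry that refuses unapproved deployments and remembers earlier versions. This is only an illustration under assumed requirements; `ModelRegistry` and its methods are hypothetical, not a Google Cloud API:

```python
class ModelRegistry:
    """Keeps every deployed version so that serving can fall back to an older one."""

    def __init__(self):
        self._history = []   # versions in deployment order; the last one is live

    def deploy(self, version, approved_by=None):
        # Human in the loop: refuse to deploy without a named approver.
        if not approved_by:
            raise PermissionError(f"version {version} needs human approval")
        self._history.append(version)

    def rollback(self):
        """Drop the live version and return the one that becomes live."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]

    @property
    def live(self):
        return self._history[-1] if self._history else None

registry = ModelRegistry()
registry.deploy("1.0.1", approved_by="alice")
registry.deploy("1.1.0", approved_by="bob")
restored = registry.rollback()   # the older version becomes live again
```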

# Designing Adaptable ML Systems
1. Which of the following models are susceptible to a feedback loop?
> A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).<br>
A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.<br>
An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.<br>
A university-ranking model that rates schools in part by their selectivity—the percentage of students who applied that were admitted.<br>
A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.<br>
A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
>>A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).<br>
A university-ranking model that rates schools in part by their selectivity—the percentage of students who applied that were admitted.<br>
A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.<br>
2. Suppose you are building an ML-based system to predict the likelihood that a customer will leave a positive review. The user interface by which customers leave reviews changed a few months ago, but you don't know about this. Which of these is a potential consequence of mismanaging this data dependency?
> Losses in prediction quality<br>
Change in model serving signature<br>
Change in ability of model to be part of a streaming ingest
>>Losses in prediction quality

## Learn how to ...
1. Identify performance considerations for ML models
2. Choose appropriate ML infrastructure
3. Select a distribution strategy

# Agenda

- Distributed training
- Faster input pipelines
- Data parallelism(All Reduce)
- Parameter Server approach
- Inference

### High performance ML
- One key aspect is the time taken to train a model

### Optimizing your Training Budget
- time, cost, scale

### Model Training can take a long time
### Analyze Benefit of Model vs Running Cost
- Optimize training dataset size
- Choosing optimized infrastructure
- Use earlier model checkpoints
- Tuning Performance to reduce training time, reduce cost, and increase scale

|Constraint|Input/Output|CPU|Memory|
|---|---|---|---|
|Commonly occurs|Large inputs<br>Input requires parsing<br>Small model|Expensive computations<br>Underpowered hardware|Large number of inputs<br>Complex model|
|Take action|Store efficiently<br>Parallelize reads<br>Consider batch size|Train on faster accelerator<br>Upgrade processor<br>Run on TPUs<br>Simplify model|Add more memory<br>Use fewer layers<br>Reduce batch size|

Heterogeneous: made up of many different kinds

### Optimizing your Batch Prediction

- time, cost, scale

### Optimizing your Online Prediction

- What differs: single-machine, microservice, QPS

### Improving performance also adds complexity
- Heterogeneous systems, Distributed systems, Model architectures

### Heterogeneous systems require our code to work anywhere
- CPU, GPU, TPU, Android/iOS, Edge TPU, Raspberry Pi

### Deep learning works because datasets are large, but the compute required keeps increasing
### Large models could have millions of weights
### Training can take a long time
### How can you make model training faster
### Scaling with Distributed Training

### Adding a single accelerator
- Multi-core CPU -> GPU, TPU
### Adding many machines with many possible devices

### Two approaches to Data Parallelism
1. Parameter server
2. Sync Allreduce

### Async Parameter Server
### Sync Allreduce Architecture

### Consider Async Parameter Server if...
- many low-power or unreliable workers
- More mature approach
- Constrained I/O

### Consider Sync Allreduce if...
- Multiple devices on one host
- Fast devices with strong links (e.g. TPUs)
- Better for multiple GPUs
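
The core of synchronous all-reduce is that every worker ends the step with the same averaged gradient. A single-process simulation of that idea, assuming plain lists stand in for gradient tensors; there is no real inter-device communication here and the function names are hypothetical:

```python
def all_reduce_mean(per_worker_grads):
    """Average gradients element-wise; every worker receives the same result."""
    num_workers = len(per_worker_grads)
    summed = [sum(values) for values in zip(*per_worker_grads)]
    return [total / num_workers for total in summed]

def sync_step(weights, per_worker_grads, learning_rate=0.1):
    """One synchronous step: wait for the reduced gradient, then update once."""
    mean_grad = all_reduce_mean(per_worker_grads)
    return [w - learning_rate * g for w, g in zip(weights, mean_grad)]

weights = [1.0, 2.0]
# Gradients computed by two workers on different shards of the same batch.
grads = [[0.5, 1.0], [1.5, 3.0]]
new_weights = sync_step(weights, grads)
```

In the asynchronous parameter-server approach, by contrast, each worker would push its gradient and pull weights independently instead of waiting for the others.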



# 2-3. Faster Input pipelines

Training Data - Input pipeline(Bottleneck) -> Multiple GPU/TPU

# Reading Data into TensorFlow
1. Directly feed from Python
2. Native TensorFlow Ops
3. Read transformed TFRecords

### Input pipeline as an ETL Process
- Extract -> Transform -> Load
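
The three stages can be sketched with plain Python generators so that each stage streams into the next. This is a framework-free illustration (in TensorFlow the `tf.data` API is the usual tool for this); the column names and the in-memory CSV are hypothetical:

```python
import csv
import io

def extract(file_like):
    """Extract: lazily read raw records from a CSV source."""
    yield from csv.DictReader(file_like)

def transform(records):
    """Transform: parse strings into numeric features and a label."""
    for record in records:
        yield ([float(record["x1"]), float(record["x2"])], int(record["label"]))

def load(examples, batch_size):
    """Load: group examples into batches ready to hand to the accelerator."""
    batch = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch   # final partial batch

# Hypothetical in-memory CSV standing in for files on disk.
source = io.StringIO("x1,x2,label\n1,2,0\n3,4,1\n5,6,0\n")
batches = list(load(transform(extract(source)), batch_size=2))
```

Because each stage is lazy, nothing is read or parsed until the consumer asks for the next batch, which is the property a real pipeline exploits to overlap I/O with training.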

# 2-4 Data parallelism(All Reduce)

### Data parallelism is a way to increase training throughput
### Distribution Strategy API
- Easy to use
- Fast to train

### Training with Estimator API
### Mirrored Strategy
- No change to the model or training loop
- No change to input function (requires tf.data.Dataset)
- Checkpoints and summaries are seamless

# Parameter Server approach
### Model parallelism lets you distribute a model across GPUs
### Large embeddings need multiple machines to map sparse data
### Estimator train_and_evaluate() handles all this
### Estimator contains the implementation of three functions - training, eval, serving
### By encapsulating details about sessions and graphs, it also supports exporting the model for serving
### train_and_evaluate bundles together a distributed workflow

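seamless: 1. without seams 2. perfectly smooth (with no interruption), flawless<br>
sparse data: data that occupies only a very small region relative to the dimensionality or the full space<br>
sparse: (usually of distribution over a wide area) scattered, thin (in density)<br>
configuration: 1. arrangement, layout; form of arrangement 2. settings, setup<br>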

# 2-5 Inference

- Aspects of performance during inference - QPS, Microservice, Cost
### Implementation Options
- REST/HTTP API - For Streaming Pipelines
- Cloud Machine Learning Engine - For Batch Pipelines
- Cloud Dataflow - For Batch and Streaming Pipelines
### Batch = Bounded Dataset
### Performance for Batch Pipelines
- CMLE + Microbatching : Best Option for maintainability and speed
- SavedModel : Best option for high-speed inference below some limit
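
Microbatching groups individual requests so that the model service is called once per small batch instead of once per item. A minimal sketch of that idea; `predict_batch` is a hypothetical stand-in for the call to a deployed model, not a Cloud ML Engine API:

```python
def microbatches(requests, max_batch_size):
    """Yield consecutive slices of at most max_batch_size requests."""
    for start in range(0, len(requests), max_batch_size):
        yield requests[start:start + max_batch_size]

def predict_batch(batch):
    """Stand-in for one call to a model service; here it just doubles each input."""
    return [2 * x for x in batch]

def predict_all(requests, max_batch_size=3):
    """Send requests in microbatches and flatten the results back into order."""
    results = []
    calls = 0
    for batch in microbatches(requests, max_batch_size):
        results.extend(predict_batch(batch))
        calls += 1
    return results, calls

results, calls = predict_all([1, 2, 3, 4, 5, 6, 7])
```

Seven requests are served with three service calls here; the per-call overhead is paid once per microbatch rather than once per request.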

enrich: 1. to improve the quality of, to make richer; to fortify (food with a nutrient) 2. to make wealthier

# Designing High-performance ML systems

1. Machine learning training performance can be bound by:
> Input/output speed, Read latency, Number of data points, Number of open ports, Computation speed, Memory
>> Input/output speed,  Computation speed,  Memory

2. If each of your examples is large in terms of size, requires parsing and your model is relatively simple and shallow, then your model is likely to be
>I/O bound, so you should look for ways to store data more efficiently and ways to parallelize the reads, CPU-bound, so you should use GPUs or TPUs, Latency-bound, so you should use faster hardware
>> I/O bound, so you should look for ways to store data more efficiently and ways to parallelize the reads

3. For the fastest I/O performance in TensorFlow
>Read TF records into your model, Read in parallel threads, Use fused operations, prefetch the data
>> Read TF records into your model, Read in parallel threads, Use fused operations, prefetch the data

4. Consider Sync All Reduce if
> You have many distributed workers, You have a single machine that has multiple devices with fast interconnect
>> You have a single machine that has multiple devices with fast interconnect

# 3. Introduction

## __Learn how to__

- Build hybrid cloud machine learning models
- Optimize TensorFlow graphs for mobile

### Agenda
- Kubeflow for hybrid cloud
- Optimizing TensorFlow for mobile

### Choose from ready-made ML models
- Vision, Translation, Speech, Natural Language

### Customize ready-made ML models
- Auto-ML

### Build, train, and serve your own custom ML models
- ML Engine
- Storage, BigQuery, Datalab, Model Management, Pipelines

### ML runtimes in a cloud-native environment
1. Prototype with Cloud Datalab or Deep Learning Image
2. Distribute and autoscale training and predictions with Cloud ML Engine

### You may not be able to do machine learning solely on Google Cloud
- Tied to On-Premise Infrastructure
- Multi Cloud System Architecture
- Running ML on the edge

### Kubernetes minimizes infrastructure management
### Kubeflow enables hybrid machine learning
- GKE

premise: a proposition on which an argument is based

# 3-1. KubeFlow

## Machine Learning on Hybrid Cloud
- Composability
- Portability
- Scalability

## Composability
- Build a model

### Building a model is only one part of the entire system
- Building the model is about 5% of the work; everything else is the remaining 95%

### Composability is about microservices

## Portability
- Experimentation 
 - Model, UX, Tooling, Framework, Storage, Runtime, Drivers, OS
- Training
- Cloud

### "Portability is not a problem" is wrong
- it is essential

### Your Laptop Counts

## Scalability
- More accelerators(GPU,TPU)
- More CPUs
- More disk/networking
- More skillsets(data engineers, data scientist)
- More teams
- More experiments

# KubeFlow

### Oh you want to use ML on K8s?
First become an expert in:
- Containers
- Packaging
- Kubernetes service endpoints
- Persistent volumes
- Scaling
- Immutable deployments
- GPUs, Drivers & the GPL
- Cloud APIs
- DevOps

### Make it Easy for Everyone to Develop, Deploy and Manage Portable, Distributed ML on Kubernetes

### What's in the box?
- Jupyter notebook
- Multi-architecture, distributed training
- Multi-framework model serving
- Examples and walkthroughs for getting started
- Ksonnet packaging for customizing it yourself!

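Composability: a facility that enables web services to which new features can be added incrementally<br>
sonnet: a 14-line poem in which lines of 10 syllables follow a fixed rhyme scheme<br>
asynchronous: not occurring at the same time<br>
deploy: 1. to position (troops, weapons) 2. to use effectively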

https://github.com/amygdala/code-snippets/tree/master/ml/kubeflow-pipelines

ease: 1. easiness, convenience 2. comfort (freedom from worry) 3. to become easier<br>
render: 1. to cause to be (in some state) 2. to give (especially in return for something or as expected)<br>

# Kubeflow Benefits
- Portability
- Composability and reproducibility
- Scalability
- Visualization and Collaboration

# Optimizing TensorFlow for mobile

### Increasingly, applications are combining ML with mobile apps
- Image/OCR
- Speech <=> Text
- Translation

### ML models can help extract meaning from raw data, thus reducing network traffic
- Image recognition: send raw image vs. send detected label
- Motion detection: send raw motion vs. send feature vector

### From mobile devices, we often can't use the microservices approach
- Monolithic Service
- Microservice
 - Microservices can add unwanted latency

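superimpose: 1. to lay over (so that images can be shown combined) 2. to add (an element or quality)<br>
Monolithic: 1. made of a single stone 2. (of an organization, unity, etc.) forming a single body, united as one mass 3. (of a society) uniform and without freedom 4. (electronics) made of a single crystal (chip), monolithic (circuit)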


### TensorFlow supports multiple mobile platforms
- TensorFlow Lite
 - Reduce code footprint
 - Quantization
 - Lower precision arithmetic
 
### Build with Bazel by starting with a git clone
### Cocoapods support for iOS
### Understand how to Code with the API
### Even though we have talked primarily about prediction on mobile, a new frontier is federated learning

### Large neural networks can be compressed
### There are several methods to reduce model size
- Freeze graph
- Transform the graph
- Quantize weights and calculations

### Freezing a graph can optimize load time
- Converts variables to constants and removes checkpoints
### Transform your graph to remove nodes you don't use in prediction
- strip_unused_nodes:
 - Remove training-only operations
- fold_batch_norms:
 - Remove Muls for batch norm
- quantize_weights, quantize_nodes:
 - Add quantization

### Quantizing weights and calculations boosts performance
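
To make the idea concrete, here is a minimal sketch of linear 8-bit quantization: floats are mapped to integer codes in [0, 255] and mapped back with a small, bounded error. The scheme and function names are assumptions for illustration; the quantization that TensorFlow tooling actually applies may differ in its details:

```python
def quantize(weights):
    """Map floats to integers in [0, 255] using a linear scale over [min, max]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0   # fall back to 1.0 when all weights are equal
    return [round((w - lo) / scale) for w in weights], lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the 8-bit codes."""
    return [lo + code * scale for code in codes]

weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four, and the reconstruction error is at most half of one quantization step (`scale / 2`).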
### TensorFlow Lite is optimized for mobile apps

1. Which of these are reasons why you may not be able to do machine learning solely on Google Cloud?
> You are tied to on-premises or multi-cloud infrastructure due to business reasons, You need to run inference on the edge, TensorFlow is not supported on Google Cloud
>> You are tied to on-premises or multi-cloud infrastructure due to business reasons, You need to run inference on the edge

2. A key principle behind Kubeflow is portability so that you can:
> Move your model from on-prem to Google Cloud

# Summary
- Build hybrid cloud machine learning models
- Optimize TensorFlow graphs for mobile


# Summary
## Agenda
- Architecting Production ML Systems
- Ingesting data for Cloud-based analytics and ML
- Designing Adaptable ML systems
- Designing High Performance ML Systems
- Hybrid ML Systems

### Training and Serving Decision
- Cloud Functions and App Engine, Cloud Dataflow

### Data Migration Options


https://cloud.google.com/blog/products/gcp/simplifying-machine-learning-on-open-hybrid-clouds-with-kubeflow<br>
https://cloud.google.com/ml-engine/docs/tensorflow/technical-overview