In [None]:
"""
1. Define the task
2. Develop a model
3. Deploy the model
"""

In [None]:
"""
1. Define the task:

(a) Frame the problem
  (i) Understand the inputs. Understand the targets (annotations). Understand the problem statement and business use case.
  (ii) What is the machine learning task?
  Binary classification, Multiclass classification, Scalar regression, Vector regression, Multi-label classification, ranking,
  image segmentation, generation, or reinforcement learning
  (iii) What do existing solutions look like?
  (iv) Understand constraints (end-to-end encryption, privacy issues)

(b) Collect a dataset
  (i) Annotate the dataset: do it yourself, use a crowdsourcing platform, or outsource to a company with subject-matter experts
  Trade-off: control versus time, money, and required domain knowledge
  (ii) Training data should be representative of the production data

(c) Understand your data
  (i) For numerical features, plot histograms to see range of values and the frequency of different values
  (ii) For images and natural language processing, look at actual samples and confirm annotations
  (iii) Check for target leakage: input features that accidentally reveal the target but will not be available in production
  (iv) Balanced classes or class-imbalance?
  (v) Deal with missing values

(d) Choose a measure of success
  (i) For balanced classification problems: accuracy and ROC AUC
  (ii) For class-imbalanced classification problems: precision and recall, or a weighted form of accuracy or ROC AUC
"""
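
The point in 1(d) about class imbalance can be illustrated with a minimal sketch. The 95/5 class split and the "always predict negative" classifier are made-up assumptions for illustration, and the helper names are hypothetical. Accuracy looks high, while precision and recall expose that the positive class is never found.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Fall back to 0.0 when the denominator is empty (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical imbalanced dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100

acc = accuracy(y_true, always_negative)
precision, recall = precision_recall(y_true, always_negative)
print(acc, precision, recall)
```

In practice a library implementation of these metrics would normally be used; the hand-written version is only meant to make the definitions explicit.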

In [None]:
"""
2. Develop a model

(a) Prepare the data
  (i) Vectorize the inputs and targets to float32 tensors
  (ii) Normalize each feature independently to mean = 0 and standard deviation = 1, so that values are small and homogeneous across features
  (iii) Account for missing values. Categorical: add a dedicated "NA" (missing) category, for example as an extra one-hot slot. Numerical: replace with the mean or median, or train a model to predict the missing feature from the other features

(b) Choose an evaluation protocol
  (i) Hold out a validation set when plenty of data is available
  (ii) K-fold validation when there are too few samples for holdout validation to be reliable
  (iii) Iterated K-fold validation for highly accurate model evaluation when little data is available

(c) Beat a baseline (get the validation metrics to improve)
  (i) Feature engineering
  (ii) Choosing the right architecture priors
  (iii) Training configuration: loss function, batch size, and learning rate (the training loss should go down)

(d) Scale up (once validation metrics are improving, so the model fits and shows some generalization, do the following until it overfits)
  (i) Add layers
  (ii) Make the layers bigger
  (iii) Train for more epochs

(e) Regularize and tune your models (maximize generalization)
  (i) Reducing model capacity
  (ii) L1 and L2 weight regularization
  (iii) Dropout
  (iv) Feature engineering
  (v) Try different hyperparameters
"""
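
Steps 2(a)(ii) and 2(b)(ii) can be sketched together as follows. This is a minimal pure-Python illustration, not a definitive implementation: the helper names `k_fold_indices` and `standardize` are hypothetical, and the sketch assumes the normalization statistics should be computed on the training fold only and then reused for the validation fold.

```python
import statistics


def k_fold_indices(num_samples, k):
    """Yield (train_indices, val_indices) for each of k contiguous folds."""
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any leftover samples.
        stop = start + fold_size if fold < k - 1 else num_samples
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, num_samples))
        yield train_idx, val_idx


def standardize(train_column, other_column):
    """Scale one feature to mean 0 / std 1 using training statistics only."""
    mean = statistics.mean(train_column)
    std = statistics.pstdev(train_column) or 1.0  # avoid division by zero for constant features
    scaled_train = [(v - mean) / std for v in train_column]
    scaled_other = [(v - mean) / std for v in other_column]
    return scaled_train, scaled_other


# Hypothetical single numerical feature with 10 samples.
feature = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0, 5.0, 10.0]

for train_idx, val_idx in k_fold_indices(len(feature), k=3):
    train_values = [feature[i] for i in train_idx]
    val_values = [feature[i] for i in val_idx]
    scaled_train, scaled_val = standardize(train_values, val_values)
    # A model would be trained on scaled_train and scored on scaled_val here;
    # the final validation score is the average over the k folds.
    print(len(scaled_train), len(scaled_val))
```

The samples would normally be shuffled before splitting; that step is left out here to keep the fold boundaries easy to read.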

In [1]:
"""
3. Deploy the model

(a) Explain your work to stakeholders and set expectations
(b) Ship an inference model
  (i) Deploying a model as a REST API -> TensorFlow Serving
  Suitable when latency requirements are not strict, reliable internet connectivity is available, and the input data is not highly sensitive

  (ii) Deploying a model on a device -> TensorFlow Lite [Android and iOS phones]
  Suitable when latency requirements are strict, connectivity is poor, or the input data is sensitive
  The model must be converted to a small one, since it has to run under the memory and power constraints of the target user's device
  Trade-off between runtime efficiency and accuracy

  (iii) Deploying a model in a browser -> TensorFlow.js [web browser]
  Suitable when you want to offload compute cost to the end user, the input data must stay on the end user's computer or phone, or latency requirements are strict (request, inference, answer)
  Also when the app needs to work without connectivity once the model has been downloaded and cached

  (iv) Optimize the model for inference
  Weight pruning (keep only the most significant weights) to reduce the memory and compute footprint
  Weight quantization (convert float32 values to int8) to get a model a quarter the size of the float32 model while keeping accuracy close to the original

(c) Monitor performance in the wild
Check whether business metrics are improving
Compare the model's predictions against regular manual audits with fresh annotations


(d) Maintain your model
Watch for concept drift
Keep checking production data for changes in its distribution


"""
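
The size claim for weight quantization in 3(b)(iv) follows from storage alone: a float32 value takes 4 bytes and an int8 value takes 1. The sketch below assumes a simple symmetric linear scheme with a single scale factor per weight list; it is only an illustration of the arithmetic, not what TensorFlow Lite does internally, and the function names are hypothetical.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to int8 using one shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # all-zero weights: any scale works
    scale = max_abs / 127.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Hypothetical float32 weights of a tiny layer.
weights = array("f", [0.5, -1.0, 0.25, 0.0, 0.8])
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

float_bytes = weights.itemsize * len(weights)
int8_bytes = quantized.itemsize * len(quantized)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(float_bytes, int8_bytes, max_error)
```

Each restored weight differs from the original by at most about half a quantization step (`scale / 2`), which is why accuracy often stays close to that of the float32 model.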