# Deep Learning - Nasir Hussain - 2021/03/14

# 6 The universal workflow of machine learning

- Topics to cover
  - Steps for framing a machine learning problem
  - Steps for developing a working model
  - Steps for deploying your model in production and maintaining it


- In real-life problems
  - you don’t start from a dataset
  - you start from a problem


- Parts of universal workflow
  1. Define the task
     - Understand the problem domain and the business logic underlying what the customer asked for
     - Collect a dataset, understand what the data represents, and choose how you will measure success on the task
  2. Develop a model
     - Prepare your data so that it can be processed by a machine learning model
     - select a model evaluation protocol and a simple baseline to beat
     - train a first model that has generalization power and that can overfit
     - then regularize and tune your model until you achieve the best possible generalization performance
  3. Deploy the model
     - Present your work to stakeholders
     - ship the model to a web server, a mobile app, a web page, or an embedded device, monitor the model’s performance in the wild, and start collecting the data you’ll need to build the next-generation model.


## 6.1 Define the task

- Develop a deep understanding of the context of what you are doing
  - Why is your customer trying to solve this particular problem?
  - What value will they derive from the solution?
  - How will your model be used?
  - How will it fit into your customer’s business processes?
  - What kind of data is available, or could be collected?
  - What kind of machine learning task can be mapped to the business problem?


- Steps to define the task
    - Frame the problem
    - Collect a dataset
    - Understand your data
    - Choose a measure of success

### 6.1.1 Frame the problem

- questions for framing the problem
  - What will your input data be?
  - What are you trying to predict?
  - What type of machine learning task are you facing?
    - binary classification
    - Multiclass classification
    - Scalar regression
    - Vector regression
    - Multiclass, multilabel classification
    - Image segmentation
    - Ranking
    - clustering
    - generation
    - reinforcement learning
  - What do existing solutions look like?
  - Are there particular constraints you will need to deal with?
- Answering these questions tells you
    - what your inputs will be
    - what your targets will be
    - what broad type of machine learning task the problem maps to
- Hypotheses you are making at this stage
    - your targets can be predicted given your inputs
    - the data that’s available is sufficiently informative to learn the relationship between inputs and targets

### 6.1.2 Collect a dataset
- A model’s ability to generalize comes almost entirely from the properties of the data it is trained on
    - the number of data points
    - the reliability of labels
    - the quality of features

#### INVESTING IN DATA ANNOTATION INFRASTRUCTURE
- options to annotate data
    - annotate the data yourself
    - use a crowdsourcing platform like Mechanical Turk to collect labels
        - inexpensive and scales well
        - annotations may end up being quite noisy
    - use the services of a specialized data-labeling company
        - can potentially save you time and money
        - but it takes away control over the labeling process
- Questions to consider when annotating
    - Do the data labelers need to be subject matter experts, or could anyone annotate the data?
    - If annotating the data requires specialized knowledge
        - can you train people to do it?
        - how can you get access to relevant experts?
    - Do you, yourself, understand the way experts come up with the annotations? 
        - If you don’t, you will have to treat your dataset as a black box, and you won’t be able to perform manual feature engineering—this isn’t critical, but it can be limiting.

#### BEWARE OF NON-REPRESENTATIVE DATA
- Models can only make sense of inputs that are similar to what they’ve seen before.
- Data used for training should be representative of the production data.
- If it isn’t possible to train on production data
    - understand how your training and production data differ
    - actively correct for these differences.
- Concept drift
    - occurs when the properties of the production data change over time, causing model accuracy to gradually decay
    - dealing with it requires constant
        - data collection
        - annotation
        - model retraining

##### The problem of sampling bias
- Sampling bias occurs when your data collection process interacts with what you are trying to predict, resulting in biased measurements. 

### 6.1.3 Understand your data
- Explore and visualize your data to gain insights about what makes it predictive
    - If data includes images or natural language text
        - take a look at a few samples directly.
    - If data contains numerical features, 
        - plot the histogram of feature values to get a feel for 
            - the range of values taken
            - the frequency of different values
    - If data includes location information
        - plot it on a map
        - Do any clear patterns emerge?
    - Are some samples missing values for some features? 
        - deal with this when you prepare the data
    - If task is a classification problem
        - print the number of instances of each class in your data.
        - Are the classes roughly equally represented? 
            - If not, you will need to account for this imbalance.
    - Check for target leaking
        - the presence of features in your data that provide information about the targets and which may not be available in production.
        - is every feature in data something that will be available in the same form in production?
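
The class-balance, missing-value, and histogram checks above can be run in a few lines of NumPy. This is a minimal sketch on a made-up array; the names `features` and `labels` are hypothetical, and in practice you would plot the histogram (for example with matplotlib) rather than print bin counts.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 2 numerical features; NaN marks a missing value
features = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [1.5, 180.0],
    [np.nan, 210.0],
    [3.0, 190.0],
    [2.5, 205.0],
])
labels = np.array([0, 0, 0, 0, 1, 1])

# How many instances of each class? Are the classes roughly equally represented?
classes, counts = np.unique(labels, return_counts=True)
for c, n in zip(classes, counts):
    print(f"class {c}: {n} samples ({n / len(labels):.0%})")

# How many samples are missing each feature?
missing_per_feature = np.isnan(features).sum(axis=0)
print("missing values per feature:", missing_per_feature)

# Histogram of feature 0 (observed values only): range and frequency of values
observed = features[:, 0][~np.isnan(features[:, 0])]
hist, bin_edges = np.histogram(observed, bins=3)
print("feature 0 counts per bin:", hist, "bin edges:", bin_edges)
```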

### 6.1.4 Choose a measure of success
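- How will you measure success? Pick a metric aligned with the goal, for example:
    - Accuracy (common for balanced classification problems)
    - Precision and recall (useful for class-imbalanced problems)
    - Customer retention rate (a business-level metric)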
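
Accuracy alone can hide problems on imbalanced data, which is why precision and recall are listed next to it. A minimal sketch computing all three for binary 0/1 labels; the function name `binary_metrics` and the example labels are hypothetical:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary 0/1 labels and predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(accuracy), float(precision), float(recall)

# Hypothetical labels and predictions for an imbalanced problem (2 positives out of 10)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
acc, prec, rec = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")

# A "model" that always predicts the negative class
print(binary_metrics(y_true, [0] * 10))
```

The second call shows why the choice matters: always predicting the negative class keeps the same 0.8 accuracy on this toy data but has zero recall.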

## 6.2 Develop a model

### 6.2.1 Prepare the data
- vectorization
- normalization
- handling missing values

#### VECTORIZATION
- All inputs and targets in a neural network must typically be tensors of floating-point data.
- Whatever data you need to process must first be turned into tensors, a step called data vectorization.
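
As an illustration of vectorization, here is a sketch of two common encodings: multi-hot encoding of sequences of integer token indices, and one-hot encoding of integer class labels. The helper names and toy inputs are hypothetical; deep learning libraries ship their own utilities for this.

```python
import numpy as np

def multi_hot(sequences, dimension):
    """Turn lists of integer token indices into a float32 tensor of shape (samples, dimension)."""
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0  # set a 1 at every index present in the sequence
    return results

def one_hot(labels, num_classes):
    """Turn integer class labels into a float32 tensor of shape (samples, num_classes)."""
    results = np.zeros((len(labels), num_classes), dtype="float32")
    results[np.arange(len(labels)), labels] = 1.0
    return results

# Hypothetical inputs: two "documents" as word indices, and their class labels
x = multi_hot([[1, 3], [0, 2, 3]], dimension=5)
y = one_hot([2, 0], num_classes=3)
print(x)
print(y)
```
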
#### VALUE NORMALIZATION
- Data should be
    - Take small values—Typically, most values should be in the 0–1 range.
    - Be homogenous—All features should take values in roughly the same range.
- Stricter normalization practice (common and often helpful, though not always necessary)
    - Normalize each feature independently to have a mean of 0.
    - Normalize each feature independently to have a standard deviation of 1.
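        ```python
        # Assumes x is a float NumPy array of shape (samples, features)
        x -= x.mean(axis=0)  # center each feature on 0
        x /= x.std(axis=0)   # scale each feature to a standard deviation of 1
        ```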
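
The snippet above normalizes a single array. When data is split into training and test sets, a common practice (not stated in these notes) is to compute the mean and standard deviation on the training data only and reuse them for the test data, so no information from the test set leaks into preprocessing. A sketch with hypothetical arrays:

```python
import numpy as np

# Hypothetical train/test split: rows are samples, columns are features on very different scales
x_train = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
x_test = np.array([[2.0, 2500.0]])

# Statistics come from the training data only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std  # reuse the training statistics

print(x_train_norm.mean(axis=0))  # each feature now has mean ~0
print(x_train_norm.std(axis=0))   # and standard deviation ~1
```
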
#### HANDLING MISSING VALUES
- If the feature is categorical
    - create a new category that means “the value is missing.”
    - The model will automatically learn what this implies with respect to the targets.
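- If the feature is numerical
    - avoid imputing an arbitrary value like "0": it may create a discontinuity in the space formed by your features, making it harder for the model to generalize
    - instead, consider replacing the missing value with the average or median value for the feature in the dataset.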
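
Both strategies can be sketched with NumPy and plain Python. This is a minimal example on hypothetical data; it assumes `NaN` marks a missing numerical value and `None` a missing categorical value (real datasets encode missing values in many ways).

```python
import numpy as np

# Hypothetical numerical feature with missing entries encoded as NaN
ages = np.array([25.0, np.nan, 40.0, 31.0, np.nan])
# Replace missing entries with the median of the observed values instead of an arbitrary 0
median_age = np.nanmedian(ages)
ages_filled = np.where(np.isnan(ages), median_age, ages)
print(ages_filled)

# Hypothetical categorical feature: None marks a missing value
colors = ["red", None, "blue", "red"]
# Give "missing" its own category so the model can learn what it implies
colors_filled = [c if c is not None else "missing" for c in colors]
vocabulary = sorted(set(colors_filled))  # ['blue', 'missing', 'red']
color_indices = np.array([vocabulary.index(c) for c in colors_filled])
print(color_indices)
```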

### 6.2.2 Choose an evaluation protocol
- evaluation protocols
    - Maintaining a holdout validation set
        - This is the way to go when you have plenty of data.
    - Doing K-fold cross-validation
        - This is the right choice when you have too few samples for holdout validation to be reliable
    - Doing iterated K-fold validation (K-fold validation repeated several times, shuffling the data before each split)
        - This is for performing highly accurate model evaluation when little data is available
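
A sketch of K-fold and iterated K-fold validation using only NumPy index arithmetic. The `train_and_evaluate` callback is hypothetical: it stands for whatever code trains a fresh model on the training folds and returns a validation score. The toy `mean_predictor` exists only to make the example runnable.

```python
import numpy as np

def k_fold_scores(data, targets, k, train_and_evaluate, shuffle_seed=None):
    """Return one validation score per fold.

    train_and_evaluate(train_x, train_y, val_x, val_y) is a hypothetical
    user-supplied function that trains a fresh model and returns its score.
    """
    indices = np.arange(len(data))
    if shuffle_seed is not None:
        np.random.default_rng(shuffle_seed).shuffle(indices)  # shuffle before splitting
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_evaluate(data[train_idx], targets[train_idx],
                                         data[val_idx], targets[val_idx]))
    return scores

def iterated_k_fold_score(data, targets, k, p, train_and_evaluate):
    """Iterated K-fold: run K-fold P times with a different shuffle each time, then average."""
    all_scores = []
    for seed in range(p):
        all_scores.extend(k_fold_scores(data, targets, k, train_and_evaluate, shuffle_seed=seed))
    return float(np.mean(all_scores))  # average over P * K evaluations

# Toy stand-in for a model: predict the training-set mean, score = mean absolute error
def mean_predictor(train_x, train_y, val_x, val_y):
    return float(np.mean(np.abs(val_y - train_y.mean())))

x = np.arange(10, dtype="float32").reshape(10, 1)
y = np.arange(10, dtype="float32")
print(k_fold_scores(x, y, k=5, train_and_evaluate=mean_predictor))
print(iterated_k_fold_score(x, y, k=5, p=3, train_and_evaluate=mean_predictor))
```

Iterated K-fold trains P × K models (15 here), so it is considerably more expensive than a single K-fold run.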

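### 6.2.3 Beat a baseline

- Goal: achieve statistical power, that is, develop a small model that is capable of beating a simple baseline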
- Feature engineering
  - Filter out uninformative features (feature selection) and use your knowledge of the problem to develop new features that are likely to be useful.
- Selecting the correct architecture priors
  - What type of model architecture will you use?
    - A densely connected network
    - a convnet
    - a recurrent neural network
    - a Transformer?
    - Is deep learning even a good approach for the task, or should you use something else?
- Selecting a good-enough training configuration
  - What loss function should you use?
  - What batch size and learning rate?
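
A baseline is only useful if you know its score. Two common trivial baselines, sketched here as an assumption rather than taken from the notes above: always predicting the most frequent training class (classification) and always predicting the training mean (regression). Function names and data are hypothetical:

```python
import numpy as np

def majority_class_baseline_accuracy(train_labels, val_labels):
    """Accuracy of always predicting the most frequent class seen in training."""
    classes, counts = np.unique(train_labels, return_counts=True)
    majority = classes[np.argmax(counts)]
    return float(np.mean(np.asarray(val_labels) == majority))

def mean_baseline_mae(train_targets, val_targets):
    """Mean absolute error of always predicting the training-set mean (regression)."""
    prediction = np.mean(train_targets)
    return float(np.mean(np.abs(np.asarray(val_targets) - prediction)))

# Hypothetical data: a model needs more than 0.75 validation accuracy to beat this baseline
print(majority_class_baseline_accuracy([0, 0, 0, 1], [0, 1, 0, 0]))
print(mean_baseline_mae([1.0, 2.0, 3.0], [2.0, 4.0]))
```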

#### Picking the right loss function
| Problem type                            | Last-layer activation | Loss function            |
| --------------------------------------- | --------------------- | ------------------------ |
| Binary classification                   | sigmoid               | binary_crossentropy      |
| Multiclass, single-label classification | softmax               | categorical_crossentropy |
| Multiclass, multilabel classification   | sigmoid               | binary_crossentropy      |
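
To make the table concrete, here is a NumPy sketch of what each activation/loss pair computes. It mirrors the math behind the loss names in the table but is not any library's implementation; the clipping constant `eps` and the example logits are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean over samples and outputs; each output is treated as an independent 0/1 target."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def categorical_crossentropy(y_true_one_hot, p, eps=1e-7):
    """Mean over samples of -sum(y * log p) across classes (single-label targets)."""
    p = np.clip(p, eps, 1.0)
    return float(np.mean(-np.sum(y_true_one_hot * np.log(p), axis=-1)))

# Hypothetical last-layer outputs (logits) for 2 samples and 3 classes
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])

# Multiclass, single-label: softmax + categorical_crossentropy
y_single = np.array([[1, 0, 0], [0, 0, 1]])
print(categorical_crossentropy(y_single, softmax(logits)))

# Multiclass, multilabel: one independent sigmoid per class + binary_crossentropy
y_multi = np.array([[1, 1, 0], [0, 0, 1]])
print(binary_crossentropy(y_multi, sigmoid(logits)))
```

Because binary crossentropy is applied per output, the same loss covers both the binary case (one sigmoid unit) and the multilabel case (one sigmoid per class).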


### 6.2.4 Scale up: Develop a model that overfits
- Once you’ve obtained a model that has statistical power, the next questions are:
    - Is your model sufficiently powerful?
    - Does it have enough layers and parameters to properly model the problem at hand?
- The ideal model is one that stands right at the border between 
    - underfitting and overfitting
    - undercapacity and overcapacity.
- develop a model that overfits
    1. Add layers.
    2. Make the layers bigger.
    3. Train for more epochs.
- Once the model’s performance on the validation data begins to degrade, you’ve achieved overfitting.
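
In practice the degradation point is read off the per-epoch validation metrics. A sketch assuming a plain list of validation losses (in Keras, `model.fit(..., validation_data=...)` records such a list under `history.history["val_loss"]`); the function name and numbers below are made up:

```python
import numpy as np

def overfitting_onset(val_losses):
    """Return the 1-based epoch after which validation loss stops improving,
    or None if the lowest loss is at the final epoch (no overfitting observed yet)."""
    best = int(np.argmin(val_losses))
    return best + 1 if best < len(val_losses) - 1 else None

# Hypothetical per-epoch validation losses
val_loss = [0.62, 0.48, 0.41, 0.38, 0.37, 0.39, 0.43, 0.50]
onset = overfitting_onset(val_loss)
if onset is None:
    print("no degradation yet: add layers, make them bigger, or train longer")
else:
    print(f"validation loss degrades after epoch {onset}: the model overfits")
```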

### 6.2.5 Regularize and tune your model
- maximize generalization performance.
    - Try different architectures; add or remove layers.
    - Add dropout.
    - If your model is small, add L1 or L2 regularization.
    - Try different hyperparameters (such as the number of units per layer or the learning rate of the optimizer) to find the optimal configuration.
    - Optionally, iterate on data curation or feature engineering: collect and annotate more data, develop better features, or remove features that don’t seem to be informative.
- Once you’ve developed a satisfactory model configuration
    - train your final production model on all the available data
    - evaluate it one last time on the test set
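
Dropout and L1/L2 regularization are simple to state numerically. The sketch below shows inverted dropout (the common formulation that rescales surviving units at training time) and the L1 and L2 penalty terms added to the loss. Names, the penalty factor, and the toy arrays are hypothetical; frameworks expose these as layers or regularizer arguments.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units during training and rescale
    the survivors so the expected activation is unchanged; identity at inference."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

def l2_penalty(weights, factor):
    """L2 term added to the loss: factor * sum of squared weights."""
    return factor * sum(float(np.sum(w ** 2)) for w in weights)

def l1_penalty(weights, factor):
    """L1 term added to the loss: factor * sum of absolute weights."""
    return factor * sum(float(np.sum(np.abs(w))) for w in weights)

# Hypothetical layer activations and weights
h = np.ones((2, 4))
print(dropout(h, rate=0.5))                  # dropped entries are 0, kept ones become 2.0
print(dropout(h, rate=0.5, training=False))  # unchanged at inference time
w = [np.array([[1.0, -2.0], [0.5, 0.0]])]
print(l2_penalty(w, factor=0.001), l1_penalty(w, factor=0.001))
```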

## 6.3 Deploy the model

### 6.3.1 Explain your work to stakeholders and set expectations
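- Explain the model’s failure modes (what kinds of errors it makes)
- Set realistic performance expectations, expressed in terms stakeholders care about (for example, false positive and false negative rates) rather than abstract metrics alone

### 6.3.2 Ship an inference model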
#### DEPLOYING A MODEL AS A REST API
- use this approach when
    - The application that will consume the model’s prediction will have reliable access to the internet
    - The application does not have strict latency requirements
    - The input data sent for inference is not highly sensitive
#### DEPLOYING A MODEL ON A DEVICE
- use this approach when
    - Your model has strict latency constraints or needs to run in a low-connectivity environment.
    - Your model can be made sufficiently small that it can run under the memory and power constraints of the target device.
    - Getting the highest possible accuracy isn’t mission critical for your task. There is always a trade-off between runtime efficiency and accuracy.
    - The input data is strictly sensitive and thus shouldn’t be decryptable on a remote server.
#### DEPLOYING A MODEL IN THE BROWSER
- use this approach when
    - You want to offload compute to the end user, which can dramatically reduce server costs.
    - The input data needs to stay on the end user’s computer or phone.
    - Your application has strict latency constraints.
    - You need your app to keep working without connectivity, after the model has been downloaded and cached.
#### INFERENCE MODEL OPTIMIZATION
- Popular optimization techniques
    - Weight pruning
        - Not every coefficient in a weight tensor contributes equally to the predictions. It’s possible to considerably lower the number of parameters in the layers of your model by only keeping the most significant ones. This reduces the memory and compute footprint of your model, at a small cost in performance metrics. By deciding how much pruning you want to apply, you are in control of the trade-off between size and accuracy.
    - Weight quantization
        - Deep learning models are typically trained with single-precision (32-bit) floating-point weights. However, it’s possible to quantize weights to 8-bit signed integers to get an inference-only model that’s a quarter the size but remains near the accuracy of the original model.
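
Both techniques can be illustrated on a single weight matrix with NumPy. This is a sketch under simple assumptions (magnitude-based pruning of a fixed fraction of weights, symmetric per-tensor int8 quantization with one scale factor); production toolchains use more elaborate schemes, and the function names are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the fraction `sparsity` of smallest-magnitude entries set to 0."""
    flat = weights.flatten()                 # flatten() returns a copy
    k = int(sparsity * flat.size)            # number of weights to drop
    smallest = np.argsort(np.abs(flat))[:k]  # indices of the k least significant weights
    flat[smallest] = 0.0
    return flat.reshape(weights.shape)

def quantize_int8(weights):
    """Symmetric per-tensor quantization: int8 values plus one float scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate the original float32 weights from the int8 values."""
    return q.astype(np.float32) * np.float32(scale)

# Hypothetical float32 weight matrix of one layer
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.5)
print("nonzero weights after pruning:", np.count_nonzero(pruned), "of", w.size)

q, scale = quantize_int8(w)
print("bytes: float32 =", w.nbytes, "int8 =", q.nbytes)
print("max abs error:", np.max(np.abs(dequantize(q, scale) - w)))
```

The `nbytes` comparison shows the 4x storage reduction mentioned above, and the reconstruction error stays within about half a quantization step (`scale / 2`).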

### 6.3.3 Monitor your model in the wild
- Keep monitoring its behavior on production data, for example to catch accuracy decaying over time due to concept drift

### 6.3.4 Maintain your model

## Summary
- The universal workflow of machine learning
    - Define the task
        - Frame the problem
        - Collect a dataset
        - Understand your data
        - Choose a measure of success
    - Develop a model
        - Prepare the data
            - vectorization
            - normalization
            - handling missing values
        - Choose an evaluation protocol
            - holdout validation set
            - K-fold cross-validation
            - iterated K-fold cross-validation
        - Beat a baseline
            - Picking the right loss function
        - Scale up: Develop a model that overfits
            - Add layers.
            - Make the layers bigger.
            - Train for more epochs.
        - Regularize and tune your model
            - Try different architectures; add or remove layers.
            - Add dropout.
            - If your model is small, add L1 or L2 regularization.
            - Try different hyperparameters (number of units, learning rate)
            - Optionally, feature engineering
    - Deploy the model
        - Explain your work to stakeholders and set expectations
        - Ship an inference model
            - deploying a model as a REST API
            - deploying a model on a device
            - deploying a model in the browser
            - inference model optimization
        - Monitor your model in the wild
            - keep monitoring its behavior
        - Maintain your model

- When you take on a new machine learning project, first define the problem at hand:
    - Understand the broader context of what you’re setting out to do—what’s the end goal and what are the constraints?
    - Collect and annotate a dataset; make sure you understand your data in depth.
    - Choose how you’ll measure success for your problem—what metrics will you monitor on your validation data?
- Once you understand the problem and you have an appropriate dataset, develop a model:
    - Prepare your data.
    - Pick your evaluation protocol: holdout validation? K-fold validation? Which portion of the data should you use for validation?
    - Achieve statistical power: beat a simple baseline.
    - Scale up: develop a model that can overfit.
    - Regularize your model and tune its hyperparameters, based on performance on the validation data. A lot of machine learning research tends to focus only on this step, but keep the big picture in mind.