# I. What Is Amazon SageMaker?
- Amazon SageMaker is a **fully managed machine learning service**.
- helps to **quickly and easily build and train machine learning models**, and then **directly deploy** them into a production-ready hosted environment.
- provides an **integrated Jupyter** authoring **notebook** instance for **easy access** to your **data** sources
- provides **common machine learning algorithms** that are **optimized** to run efficiently against **extremely large data** in a **distributed environment**.
- deploys models **with a single click from the console**

# II. How It Works
This section provides an **overview of machine learning** and explains how Amazon SageMaker works.
## 2.1. Machine Learning with Amazon SageMaker
Typical workflow for creating a machine learning model:
![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/ml-concepts-10.png)
1. **Generate example data**<br>
    Data type depends on business problem<br>
    a. **Fetch the data** — 
    pull the dataset or datasets into a single repository<br>
    b. **Clean the data** — 
    To improve model training<br>
    c. **Prepare or transform the data** — 
    To improve performance
2. **Train a model**:<br>
    a. **Training** the model — To train a model, you need<br>
    - an algorithm (provided by Amazon SageMaker or your own implementation)
    - compute resources<br>
    b. **Evaluating** the model — 
    to determine whether the accuracy of the inferences is acceptable.<br>
    - use either the AWS SDK for Python (Boto) or the high-level Python library in SageMaker<br>
    - use a Jupyter notebook in a SageMaker notebook instance to train and evaluate the model.
3. **Deploy the model** — independently with Amazon SageMaker hosting services, decoupling from application code. 

Machine learning is a **continuous cycle**: 
- => **deploy** model
- => **monitor** inferences
- => **collect** "ground truth"
- => **evaluate** model
- => **retrain** model
- => **deploy new** model
- => ...

## 2.2. Explore and Preprocess Data

- Use a Jupyter notebook on an Amazon SageMaker notebook instance
- Use a model to transform data with Amazon SageMaker batch transform

## 2.3. Training a Model with Amazon SageMaker

![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/sagemaker-architecture.png)

Create a training job with the SageMaker console or API, including:
- URL to data on S3
- Compute resources, managed by Amazon SageMaker
- URL for output data on S3
- Amazon Elastic Container Registry path where the training code is stored
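
The four items above map onto the request for the `CreateTrainingJob` API. A minimal sketch, assuming the AWS SDK for Python (boto3) is used; the bucket, role ARN, image path, and job name are placeholders, and the helper `build_training_job_request` is hypothetical:

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m4.xlarge", instance_count=1):
    """Assemble keyword arguments for a CreateTrainingJob call."""
    return {
        "TrainingJobName": job_name,
        # ECR path where the training code (Docker image) is stored
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # URL of the training data on S3
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # URL for the output (model artifacts) on S3
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Compute resources that SageMaker launches and manages
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are placeholders, not real resources.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<algorithm>:<tag>",
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
    train_s3_uri="s3://example-bucket/train",
    output_s3_uri="s3://example-bucket/output",
)

# With credentials configured, the request would be submitted as:
#   boto3.client("sagemaker").create_training_job(**request)
```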

**Training** options:
- Use an algorithm provided by Amazon SageMaker
- Use Apache Spark with Amazon SageMaker, similar to using Spark MLlib
- Submit custom code to train with deep learning frameworks: [TensorFlow](https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html), [Apache MXNet](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html)
- Use custom algorithms in  a Docker image

**WorkFlow**:
- User creates the training job
- => Amazon SageMaker launches the ML compute instances
- => SageMaker uses the training code and the training dataset to train the model
- => SageMaker saves the resulting model artifacts and other output in the S3 bucket

**Important**
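- Prevent out-of-memory errors: reserve some memory on the ML compute instances for Amazon SageMaker system processes, so the algorithm container does not compete with them for memory.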

## 2.4. Model Deployment in Amazon SageMaker
- use Amazon **[SageMaker hosting services](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)** to set up a persistent endpoint to get **one prediction** at a time.
- use Amazon **[SageMaker batch transform](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)** to get **predictions** for an **entire dataset**.

### 2.4.1. Deploying a Model on Amazon SageMaker Hosting Services
A three-step process:
1. Create a model in Amazon SageMaker: this tells SageMaker where to find the model components
2. Create an endpoint configuration for an HTTPS endpoint: name the models to deploy and the ML compute instances to launch; the endpoint can be configured to scale those instances elastically in production
3. Create an HTTPS endpoint: provide the endpoint configuration to Amazon SageMaker
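
The three steps correspond to the `CreateModel`, `CreateEndpointConfig`, and `CreateEndpoint` APIs. A minimal sketch of the three requests, assuming boto3; all names, ARNs, and S3 paths are placeholders and the helper `build_hosting_requests` is hypothetical:

```python
def build_hosting_requests(name, image_uri, model_data_url, role_arn,
                           instance_type="ml.m4.xlarge", instance_count=1):
    """Return the request dicts for the three hosting steps, in order."""
    # Step 1: the model points at the inference image and the model artifacts.
    model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    # Step 2: the endpoint configuration lists one or more production variants.
    endpoint_config = {
        "EndpointConfigName": name + "-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InitialInstanceCount": instance_count,
            "InstanceType": instance_type,
            "InitialVariantWeight": 1.0,
        }],
    }
    # Step 3: the endpoint only references the configuration by name.
    endpoint = {
        "EndpointName": name + "-endpoint",
        "EndpointConfigName": endpoint_config["EndpointConfigName"],
    }
    return model, endpoint_config, endpoint


model, endpoint_config, endpoint = build_hosting_requests(
    name="example-model",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<algorithm>:<tag>",
    model_data_url="s3://example-bucket/output/model.tar.gz",
    role_arn="arn:aws:iam::<account>:role/<sagemaker-role>",
)

# With credentials configured, each dict would be passed to the matching call:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model)
#   sm.create_endpoint_config(**endpoint_config)
#   sm.create_endpoint(**endpoint)
```

Deploying several variants to the same endpoint amounts to adding more entries to `ProductionVariants`, with `InitialVariantWeight` controlling the share of traffic each one receives.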

**Considerations**
- a client application sends requests to the Amazon SageMaker HTTPS endpoint to obtain inferences from a deployed model; during testing, requests can also be sent to this endpoint from a Jupyter notebook.
- a model trained with Amazon SageMaker can also be deployed to your own deployment target.
- multiple variants of a model can be deployed to the same Amazon SageMaker HTTPS endpoint.
- a *ProductionVariant* can be configured to use Application Auto Scaling
- an endpoint can be modified without taking models that are already deployed into production out of service.
- **Changing or deleting** model artifacts or changing inference code **after deploying** a model produces **unpredictable results**.

### 2.4.2. Getting Inferences by Using Amazon SageMaker Batch Transform 
![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/batch-transform.png)

Batch transform manages all compute resources necessary to get inferences. This includes launching instances and deleting them after the transform job completes. 

To perform a batch transform, create a transform job including:
- path to data on S3 bucket
- compute resources
- path to S3 for output data
- name of model in the transform job
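
A minimal sketch of a transform job request carrying these four pieces of information, assuming boto3 and the `CreateTransformJob` API; the names and S3 paths are placeholders, and the CSV content type is an assumption for the example:

```python
def build_transform_job_request(job_name, model_name, input_s3_uri, output_s3_uri,
                                instance_type="ml.m4.xlarge", instance_count=1):
    """Assemble keyword arguments for a CreateTransformJob call."""
    return {
        "TransformJobName": job_name,
        # Name of an existing SageMaker model
        "ModelName": model_name,
        # Path to the input data on S3
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3_uri,
            }},
            "ContentType": "text/csv",  # assumed input format
            "SplitType": "Line",        # one record per line
        },
        # Path on S3 for the output data
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        # Compute resources, launched and removed by batch transform
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


transform_request = build_transform_job_request(
    job_name="example-transform-job",
    model_name="example-model",
    input_s3_uri="s3://example-bucket/batch-input",
    output_s3_uri="s3://example-bucket/batch-output",
)

# With credentials configured:
#   boto3.client("sagemaker").create_transform_job(**transform_request)
```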

Batch transform is ideal for situations where:
- You want to get inferences for an entire dataset and store them online.
- You don't need a persistent endpoint that applications can call to get inferences.
- You don't need the sub-second latency that Amazon SageMaker hosted endpoints provide.
- You want to preprocess your data before using the data to train a new model or generate inferences.

**Considerations**:
- a transform job can be created with the SageMaker console or API
- Amazon SageMaker reads the input data, launches the ML compute instances, and saves the output as specified in the transform job
- Amazon SageMaker uses Multipart Upload API to upload output data results from a transform job to S3. 
- For testing model variants, create separate transform jobs for each variant using a validation data set.
- For large datasets or data of indeterminate size, create an infinite stream.

## 2.5. Validating Machine Learning Models 

After training, a model needs to be evaluated to determine whether its performance and accuracy allow you to achieve your business goals.

- **Offline testing**: Deploy trained model to an alpha endpoint, and use historical data to send inference requests to it.
- **Online testing** with live data: choose to send a portion of the traffic to a model variant for evaluation.

**Options** for **offline** evaluation:
- **using a "holdout set"**: use 20-30% of the training data for validation
- **k-fold validation**: split the data into k folds; in each of k runs, use 1 fold for validation and the other k-1 folds for training => k models => aggregate to obtain the final model.
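
A minimal sketch of the fold bookkeeping behind k-fold validation, in plain Python. Each fold serves as the validation set exactly once while the remaining folds are used for training; the function name `k_fold_indices` is hypothetical, and shuffling is left out to keep the example deterministic:

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_indices(n_samples=10, n_folds=3))
for train, validation in splits:
    print("train:", train, "validation:", validation)
```

A holdout set is the special case of a single split: one validation part of 20-30% and one training part.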

## 2.6 The Amazon SageMaker Programming Model 

- use SageMaker APIs to create and manage notebook instances and train and deploy models
- alternatives:
    - Use the Amazon SageMaker console
    - Modify the example Jupyter notebooks
    - Write model training and inference code from scratch
        - high-level Python library
        - AWS SDK

# III. Getting Started

## [3.1: Setting Up](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html)
### [3.1.1: Create an AWS Account and an Administrator User](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html)
- [Create an AWS Account](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html#gs-account-create)
- [Create an IAM Administrator User and Sign In](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html#gs-account-user)
### [3.1.2: Create an S3 Bucket](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-config-permissions.html)

## [3.2: Create an Amazon SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)

## [3.3: Train a Model with a Built-in Algorithm and Deploy It](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1.html)

### [3.3.1: Create a Jupyter Notebook and Initialize Variables](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html)
### [3.3.2: Download, Explore, and Transform the Training Data](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data.html)
- 1: Download the MNIST Dataset
- 2: Explore the Training Dataset
- 3: Transform the Training Dataset and Upload It to S3

### [3.3.3: Train a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html)
- 1: Choose the Training Algorithm
- 2: Create a Training Job

### [3.3.4: Deploy the Model to Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html)
- 1: Deploy the Model to Amazon **SageMaker Hosting** Services
- 2: Deploy the Model to Amazon **SageMaker Batch Transform**

### [3.3.5: Validate the Model](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-test-model.html)

## [3.4: Clean up](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html)
## [3.5: Additional Considerations](https://docs.aws.amazon.com/sagemaker/latest/dg/getting-started-client-app.html)

# IV. Automatic Model Tuning
- Automatic model tuning = **hyperparameter tuning**, finds the **best** version of a **model** by running **many training jobs** on dataset using the **algorithm** and **ranges of hyperparameters** that were specified.
- use Amazon SageMaker automatic model tuning with **built-in** algorithms, **custom algorithms**, and Amazon SageMaker **pre-built containers** for machine learning frameworks
- Before starting hyperparameter tuning, you need a well-defined machine learning problem, including:
    - A **dataset**
    - An understanding of the **type of algorithm** needed to train
    - A clear understanding of **how to measure success**
    
## 4.1 How Hyperparameter Tuning Works
- **Workflow**:
    - hyperparameter tuning makes guesses about which hyperparameter combinations are likely to get the best results, 
    - runs training jobs to test these guesses. 
    - after testing uses regression to choose the next set of hyperparameter values
- Hyperparameter tuning uses an Amazon SageMaker implementation of **Bayesian optimization**
- Uses an explore/exploit trade-off strategy
- **Note**:
    - might not improve model
    - exploring all of the possible combinations is impractical with complex model
    - need to choose the right ranges to explore
- **References**:
    - A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, [link](https://arxiv.org/abs/1012.2599)
    - Practical Bayesian Optimization of Machine Learning Algorithms, [link](https://arxiv.org/abs/1206.2944)
    - Taking the Human Out of the Loop: A Review of Bayesian Optimization, [link](http://ieeexplore.ieee.org/document/7352306/?reload=true)

## 4.2 Defining Objective Metrics
- For SageMaker built-in algorithms, metrics do not need to be defined: just select one as the objective
- For custom algorithms, metrics must be defined with regular expressions (**regex**). The algorithm needs to emit at least one metric by writing evaluation data to *stderr* or *stdout*
- The hyperparameter tuning job returns the training job that returned the best value for the objective metric as the best training job.
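
A minimal sketch of how a regex-based metric definition picks values out of a custom algorithm's log output. The `Name`/`Regex` pair follows the shape of a metric definition; the metric name, the log format, and the helper `extract_metric` are assumptions made for this example:

```python
import re

# Hypothetical metric definition for a custom algorithm.
metric_definition = {
    "Name": "validation:accuracy",
    "Regex": r"validation-accuracy=([0-9\.]+)",
}


def extract_metric(log_text, definition):
    """Return every value captured by the metric regex, as floats."""
    return [float(v) for v in re.findall(definition["Regex"], log_text)]


# Example of what the algorithm might write to stdout or stderr (assumed format).
log_text = (
    "epoch=1 validation-accuracy=0.81\n"
    "epoch=2 validation-accuracy=0.87\n"
)

values = extract_metric(log_text, metric_definition)
print(values)
```

The capture group in the regex is what isolates the number; text outside the parentheses only locates it in the log line.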

## 4.3 Defining Hyperparameter Ranges
Choosing hyperparameters and ranges significantly affects the performance of your tuning job.

Example:
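
A minimal sketch of hyperparameter ranges written in the `ParameterRanges` layout of the tuning API, where numeric bounds are passed as strings. The hyperparameter names and bounds are borrowed from the BlazingText Word2Vec table in section 6.2.7; the helper `validate_ranges` is hypothetical:

```python
parameter_ranges = {
    "IntegerParameterRanges": [
        {"Name": "epochs", "MinValue": "5", "MaxValue": "15"},
    ],
    "ContinuousParameterRanges": [
        {"Name": "learning_rate", "MinValue": "0.005", "MaxValue": "0.01"},
    ],
    "CategoricalParameterRanges": [
        {"Name": "mode", "Values": ["batch_skipgram", "skipgram", "cbow"]},
    ],
}


def validate_ranges(ranges):
    """Check that every numeric range satisfies MinValue <= MaxValue."""
    for key in ("IntegerParameterRanges", "ContinuousParameterRanges"):
        for r in ranges.get(key, []):
            if float(r["MinValue"]) > float(r["MaxValue"]):
                raise ValueError("empty range for " + r["Name"])
    return True


print(validate_ranges(parameter_ranges))
```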

## 4.4 Example: Hyperparameter Tuning Job
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html)

## 4.5 Design Considerations
- Choosing the Number of Hyperparameters
    - It's possible to use up to 20 variables simultaneously in a hyperparameter tuning job
- Choosing Hyperparameter Ranges
    - better results can be obtained by searching only in a small range where all possible values in the range are reasonable.
- Use Logarithmic Scales for Hyperparameters
    - could improve hyperparameter optimization
- Choosing the Best Degree of Parallelism
    - running in parallel gets more work done quickly
    - running one training job at a time achieves the best results with the least amount of compute time. 
- Running Training Jobs on Multiple Instances
    - hyperparameter tuning uses the last-reported objective metric from all instances
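
To illustrate why logarithmic scales can help, here is a small sketch comparing uniform and log-uniform sampling of a learning rate. The range 1e-5 to 1e-1 and the sampling helpers are assumptions for the example, not a description of how the tuning service samples internally:

```python
import math
import random


def sample_uniform(rng, low, high):
    """Sample uniformly on a linear scale."""
    return rng.uniform(low, high)


def sample_log_uniform(rng, low, high):
    """Sample so that each order of magnitude is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
low, high = 1e-5, 1e-1
n = 10_000

linear = [sample_uniform(rng, low, high) for _ in range(n)]
log_scaled = [sample_log_uniform(rng, low, high) for _ in range(n)]

# Fraction of samples that fall in the lower two decades (below 1e-3).
frac_small_linear = sum(v < 1e-3 for v in linear) / n
frac_small_log = sum(v < 1e-3 for v in log_scaled) / n

print("linear scale:", frac_small_linear)
print("log scale:", frac_small_log)
```

On a linear scale only about 1% of the range lies below 1e-3, so small values are almost never tried; on a logarithmic scale that region receives about half of the samples.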

# V. Using Notebook Instances
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html)
- Creating a Notebook Instance
- Accessing Notebook Instances
- Using Example Notebooks
- Set the Notebook Kernel
- Installing External Libraries and Kernels in Notebook Instances

# VI. Using Built-in Algorithms
Because a model is created to **address a business question**, the **first** step is to **understand the problem** needed to solve. Specifically, the format of the answer influences the algorithm.

**Examples:**
- Answers that fit into discrete categories >>> use *Linear Learner* and *XGBoost*
- Answers that are quantitative >>> also use *Linear Learner* and *XGBoost*
- Answers in the form of discrete recommendations >>> use *Factorization Machines*
- Classify customer >>> K-Means Algorithm
- understand customer attributes >>> PCA

## 6.1 Common Information
### 6.1.1 Common Parameters 

**Compute resources**
![](https://imgur.com/f3gNKOm.png)


**AWS region**<br>
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html)

### 6.1.2 Common Data Formats 
#### Training
- Training Data Formats 
    - CSV
    - protobuf [recordIO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html#data-format) format
- Trained Model Deserialization
    - Amazon SageMaker models are stored as **model.tar.gz** in the S3 bucket specified in *OutputDataConfig S3OutputPath* parameter of the *create_training_job* call.

#### Inference
- Inference Request Serialization
    - text/csv, 
    - application/json, 
    - application/x-recordio-protobuf.
    - text/x-libsvm
- Inference Response Deserialization 
    - Amazon SageMaker algorithms return JSON in several layouts.
- Common Request Formats for All Algorithms 
    - JSON
    - JSONLINES
    - CSV
    - RECORDIO
- Using Batch Transform with Built-in Algorithms 
    - JSONLINES

### 6.1.3 Suggested Instance Types 
For training and hosting Amazon SageMaker algorithms, we recommend using the following EC2 instance types:
- ml.m4.xlarge, ml.m4.4xlarge, and ml.m4.10xlarge
- ml.c4.xlarge, ml.c4.2xlarge, and ml.c4.8xlarge
- ml.p2.xlarge, ml.p2.8xlarge, and ml.p2.16xlarge

### 6.1.4 Logs 
**Note**
If a job fails and logs do not appear in CloudWatch, it's likely that an error occurred before the start of training. Reasons include specifying the wrong training image or S3 location.

The contents of logs vary by algorithms. However, you can typically find the following information:
- Confirmation of arguments provided at the beginning of the log
- Errors that occurred during training
- Measurement of an algorithm's accuracy or numerical performance
- Timings for the algorithm, and any major stages within the algorithm

## 6.2 BlazingText
[link](https://dl.acm.org/citation.cfm?doid=3146347.3146354)
- Highly optimized implementations of the **Word2vec** (sentiment analysis, named entity recognition, machine translation) and **text classification** (web search, information retrieval, ranking and document classification) algorithms.
- Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures.

**BlazingText provides the following features**:
- Accelerated training of fastText text classifier on multi-core CPUs or a GPU and Word2Vec on GPUs using highly optimized CUDA kernels.
- Enriched Word Vectors with Subword Information by learning vector representations for character n-grams.
- A batch_skipgram mode for the Word2Vec algorithm that allows faster training and distributed computation across multiple CPU nodes. 

### 6.2.1 Input/Output Interface
### 6.2.2 Training and Validation Data Format
- Word2Vec algorithm: a training sentence per line
- Text Classification algorithm: a training sentence per line along with the labels
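
A minimal sketch of preparing text classification lines. It assumes the fastText-style convention of prefixing each label with `__label__` and using space-separated tokens, which should be checked against the BlazingText documentation; the helper `to_blazingtext_line` and the sample sentences are hypothetical:

```python
def to_blazingtext_line(label, sentence):
    """Format one training example as '__label__<label> token token ...'."""
    # Lowercasing and whitespace tokenization are simplifying assumptions.
    tokens = sentence.lower().split()
    return "__label__{} {}".format(label, " ".join(tokens))


examples = [
    (1, "Linux ready for prime time"),
    (2, "Bowled by the slower one again"),
]

lines = [to_blazingtext_line(label, text) for label, text in examples]
print("\n".join(lines))
```

For the Word2Vec mode the same function without the label prefix would apply: one tokenized sentence per line.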
### 6.2.3 Model artifacts and Inference 
- Word2Vec algorithm: *vectors.txt*, which contains the word-to-vector mapping (compatible with other tools like Gensim and spaCy), and *vectors.bin*.
- Text Classification algorithm: model.bin 
### 6.2.4 EC2 Instance [Recommendation]()
### 6.2.5 BlazingText Sample [Notebooks](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.ipynb)

### 6.2.6 [Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext_hyperparameters.html)
#### Word2Vec Hyperparameters
- mode, batch_size, buckets, epochs, evaluation, learning_rate
- min_char, min_count, max_char, negative_samples
- sampling_threshold, subwords, vector_dim, window_size
#### Text Classification Hyperparameters
- mode, buckets, early_stopping, epochs, learning_rate
- min_count, min_epochs, patience, vector_dim, word_ngrams

### 6.2.7 [Tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext-tuning.html) a BlazingText Model
#### Metrics Computed by the BlazingText Algorithm 
 <div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b7b7">
          <tr>
             <th>Metric Name</th>
             <th>Description</th>
             <th>Optimization Direction</th>
          </tr>
          <tr>
             <td><code class="code">train:mean_rho</code></td>
             <td>
                <p>Mean rho (Spearman's rank correlation coefficient) on <a href="https://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/" target="_blank">WS-353 word similarity datasets</a>.
                </p>
             </td>
             <td>
                <p>Maximize</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">validation:accuracy</code></td>
             <td>
                <p>Classification accuracy on user specified validation dataset.
                </p>
             </td>
             <td>
                <p>Maximize</p>
             </td>
          </tr>
       </table>
   </div>
 </div>

#### Tunable Hyperparameters for Word2Vec
<div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b9b3b5">
          <tr>
             <th>Parameter Name</th>
             <th>Parameter Type</th>
             <th>Recommended Ranges</th>
          </tr>
          <tr>
             <td><code class="code">batch_size</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[8-32]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">epochs</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-15]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">learning_rate</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
                <p>MinValue: 0.005, MaxValue: 0.01</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">min_count</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[0-100]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">mode</code></td>
             <td>
                <p>CategoricalParameterRange</p>
             </td>
             <td>
                <p>['batch_skipgram', 'skipgram', 'cbow']</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">negative_samples</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-25]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">sampling_threshold</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
               <p>MinValue: 0.0001, MaxValue: 0.001</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">vector_dim</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[32-300]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">window_size</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1-10]</p>
             </td>
          </tr>
       </table>
   </div>
 </div>
 
#### Tunable Hyperparameters for Text Classification
 <div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b9b5b5">
          <tr>
             <th>Parameter Name</th>
             <th>Parameter Type</th>
             <th>Recommended Ranges</th>
          </tr>
          <tr>
             <td><code class="code">buckets</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1000000-10000000]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">epochs</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-15]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">learning_rate</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
                <p>MinValue: 0.005, MaxValue: 0.01</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">min_count</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[0-100]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">mode</code></td>
             <td>
                <p>CategoricalParameterRange</p>
             </td>
             <td>
                <p>['supervised']</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">vector_dim</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[32-300]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">word_ngrams</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1-3]</p>
             </td>
          </tr>
       </table>
   </div>
</div>

## 6.3 DeepAR Forecasting
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)

Amazon SageMaker DeepAR is a supervised learning algorithm for forecasting scalar (that is, one-dimensional) **time series** using recurrent neural networks (**RNN**). Classical forecasting methods such as **ARIMA** and **ETS** fit a separate model to each individual time series.

Many applications instead have many similar time series across a set of cross-sectional units. Examples of such time series groupings are **demand** for different **products**, **server loads**, and **requests** for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. 

**Topics**:
### 6.3.1    Input/Output Interface
- DeepAR supports two data channels, **train** and **test**, in *JSON Lines* format (***.json***, optionally gzipped as ***.json.gz***) or Parquet (***.parquet***)
- Use RMSE or weighted quantile loss for evaluation
     
     $$\text{RMSE} = \sqrt{\frac{1}{nT}\sum_{i,t}{\left(\hat{y}_{i,t}-y_{i,t}\right)^2}}$$
     $$\text{wQuantileLoss}[\tau] = 2\frac{\sum_{i,t}{Q_{i,t}^{(\tau)}}}{\sum_{i,t}{|y_{i,t}|}}\quad\text{with}\quad Q_{i,t}^{(\tau)}=\begin{cases}(1-\tau)|q_{i,t}^{(\tau)}-y_{i,t}| & \text{ if } q_{i,t}^{(\tau)}>y_{i,t}\\ \tau|q_{i,t}^{(\tau)}-y_{i,t}| & \text{ otherwise } \end{cases}$$
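
A minimal sketch of the two evaluation metrics in plain Python, following the formulas above term by term. The function names and the toy data are assumptions; series are assumed to have equal length so that the number of (i, t) pairs equals nT:

```python
import math


def rmse(forecasts, targets):
    """Root mean square error over all series i and time points t."""
    squared = [(f - y) ** 2
               for fs, ys in zip(forecasts, targets)
               for f, y in zip(fs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(quantile_forecasts, targets, tau):
    """wQuantileLoss[tau] for forecasts of the tau-quantile."""
    numerator = 0.0
    denominator = 0.0
    for qs, ys in zip(quantile_forecasts, targets):
        for q, y in zip(qs, ys):
            # (1 - tau) penalizes over-prediction, tau penalizes the rest.
            weight = (1 - tau) if q > y else tau
            numerator += weight * abs(q - y)
            denominator += abs(y)
    return 2 * numerator / denominator


# Toy data: two series with two time points each.
targets = [[1.0, 2.0], [3.0, 4.0]]
forecasts = [[1.0, 3.0], [3.0, 2.0]]

print(rmse(forecasts, targets))
print(weighted_quantile_loss(forecasts, targets, tau=0.5))
print(weighted_quantile_loss(forecasts, targets, tau=0.9))
```

With `tau=0.9` the under-prediction in the second series is penalized nine times more heavily than the over-prediction in the first, which is why the loss is higher than for `tau=0.5`.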


### 6.3.2    Recommended Best Practices
- always provide entire time series for training, testing, and when calling the model for prediction
- dataset can be split into training and test datasets for tuning a DeepAR Model at different end points
- do not use very large values (>400) for *prediction length*
- use the same values for *prediction* and *context* lengths.
- train DeepAR model on as **many time series** as available
### 6.3.3 EC2 Instance Recommendations
- can use GPU and CPU
- use large machine for large model size
    
### 6.3.4 DeepAR Sample Notebooks: 
[Time series forecasting with DeepAR - Synthetic data](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/deepar_synthetic/deepar_synthetic.ipynb)

### 6.3.5 How DeepAR Works
- Under the Hood:<br>
    DeepAR automatically creates feature time series<br>
    DeepAR model is trained by randomly sampling several training examples from each of the time series in the training dataset.<br>
    To capture seasonality patterns, DeepAR also automatically feeds lagged values from the target time series. <br>
    For inference, the trained model takes as input target time series, which might or might not have been used during training, and forecasts a probability distribution for the next prediction_length values.
    
### 6.3.6 [DeepAR Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html)
- context_length, prediction_length
- epochs (+ early_stopping_patience)
- time_freq (every minute, hourly, daily, weekly, monthly)
- cardinality (for categorical)
- dropout_rate, embedding_dimension, learning_rate
- likelihood (gaussian, beta, negative-binomial, student-T, deterministic-L1)
- mini_batch_size, num_cells, num_dynamic_feat, num_eval_samples, num_layers, test_quantiles

### 6.3.7 Tuning a DeepAR Model
- **Metrics Computed by the DeepAR Algorithm**:
<table id="w1649aac23c28c21b7b5">
  <tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">test:RMSE</code></td>
     <td>
        <p>Root mean square error between forecast and actual target computed on the test set.
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">test:mean_wQuantileLoss</code></td>
     <td>
        <p>Average overall quantile losses computed on the test set. Setting the <code class="code">test_quantiles</code> hyperparameter controls which quantiles are used. 
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">train:final_loss</code></td>
     <td>
        <p>Training negative log-likelihood loss averaged over the last training epoch for the model.
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
</table>    
- **Tunable Hyperparameters**:
<table id="w1649aac23c28c21b9b5">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">mini_batch_size</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 32, MaxValue: 1028</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">epochs</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 1000</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">context_length</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 200</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">num_cells</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 30, MaxValue: 200</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">num_layers</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 8</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">dropout_rate</code></td>
     <td>
        <p>ContinuousParameterRange</p>
     </td>
     <td>
        <p>MinValue: 0.00, MaxValue: 0.2</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">embedding_dimension</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 50</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">learning_rate</code></td>
     <td>
        <p>ContinuousParameterRange</p>
     </td>
     <td>
        <p>MinValue: 1e-5, MaxValue: 1e-1</p>
     </td>
  </tr>
</table>

### 6.3.8 DeepAR Inference Formats
[**JSON**](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar-in-formats.html)
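
A minimal sketch of building a JSON inference request with an `instances` list and a `configuration` object, as described on the linked page. The time series values are made up and the helper `build_deepar_request` is hypothetical:

```python
import json


def build_deepar_request(series, num_samples=50, quantiles=("0.1", "0.5", "0.9")):
    """Build a DeepAR inference request from a list of time series."""
    return {
        # One entry per time series: start timestamp and observed values.
        "instances": [
            {"start": s["start"], "target": s["target"]} for s in series
        ],
        # What the model should return for each series.
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    }


payload = json.dumps(build_deepar_request([
    {"start": "2018-01-01 00:00:00", "target": [5.0, 7.0, 6.5]},
]))
print(payload)
```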

## 6.4 Factorization Machines
- A factorization machine is a **general-purpose supervised learning** algorithm that can be used for both **classification** and **regression** tasks. 
- It is an **extension of a linear model** that is designed to capture interactions between features within high dimensional sparse datasets economically.

### 6.4.1 Input/Output Interface
- RMSE, Log Loss, Accuracy, F1-score
- application/json, x-recordio-protobuf

### 6.4.2 EC2 Instance Recommendation
- CPUs instances

### 6.4.3 Factorization Machines Sample Notebooks
- [ An Introduction to Factorization Machines with MNIST](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/factorization_machines_mnist/factorization_machines_mnist.ipynb)

### 6.4.4 How Factorization Machines Work
- for prediction task, [FM](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) estimates a function $\hat{y}$ from a feature set $x_i$ to a target domain:
$$\hat{y}=w_0+\sum_i w_ix_i +\sum_i\sum_{j>i}\langle v_i,v_j\rangle x_ix_j$$
- for regression task, FM minimizes:
$$L=\frac{1}{N}\sum_n(y_n-\hat{y}_n)^2$$
- for classification task, FM minimizes:
$$L=-\frac{1}{N}\sum_n\left[y_n\log\hat{p}_n+(1-y_n)\log(1-\hat{p}_n)\right]\text{ where } \hat{p}_n=\frac{1}{1+e^{-\hat{y}_n}}$$
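
A minimal sketch of the prediction function in plain Python, assuming `V` is a list of factor vectors $v_i$ (one per feature). The first version follows the double sum literally; the second uses the linear-time reformulation from the FM paper linked above. Function names and the toy values are assumptions:

```python
def fm_predict_naive(x, w0, w, V):
    """y_hat = w0 + sum_i w_i x_i + sum_i sum_{j>i} <v_i, v_j> x_i x_j."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same value using 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


# Toy example: 3 features, 2 factors per feature.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]

print(fm_predict_naive(x, w0, w, V))
print(fm_predict_fast(x, w0, w, V))
```

The second form is what makes FM economical on high-dimensional sparse data: it is linear in the number of features, and features with $x_i = 0$ contribute nothing to either sum.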

### 6.4.5 Factorization Machines Hyperparameters
- feature_dim, num_factors,
- predictor_type (binary_classifier, regressor)
- bias_init_method, bias_init_scale, bias_init_sigma
- bias_init_value, bias_lr, bias_wd, 
- clip_gradient, epochs, eps
- factors_init_method, factors_init_scale, factors_init_sigma, factors_init_value, factors_lr, factors_wd, 
- linear_lr, linear_init_method, linear_init_scale, linear_init_sigma, linear_init_value, linear_wd, 
- mini_batch_size, rescale_grad

### 6.4.6 Tuning a Factorization Machines Model
- **Metrics Computed by the Factorization Machines Algorithm**

**Regression**
<table id="w1649aac23c31c21b7b5">
  <tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">test:rmse</code></td>
     <td><p>Root Mean Square Error</p></td>
     <td><p>Minimize</p></td>
  </tr>
</table>

**Classification**
<table id="w1649aac23c31c21b7b9">
  <tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">test:binary_classification_accuracy</code></td>
     <td><p>Accuracy</p></td>
     <td><p>Maximize</p></td>
  </tr>
  <tr>
     <td><code class="code">test:binary_classification_cross_entropy</code></td>
     <td><p>Cross Entropy</p></td>
     <td><p>Minimize</p></td>
  </tr>
  <tr>
     <td><code class="code">test:binary_f_beta</code></td>
     <td><p>F-beta score</p></td>
     <td><p>Maximize</p></td>
  </tr>
</table>

- **Tunable Hyperparameters**
<table id="w1649aac23c31c21b9b5">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
     <th>Dependency</th>
  </tr>
  <tr>
     <td><code class="code">bias_init_scale</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>bias_init_method==uniform</p></td>
  </tr>
  <tr>
     <td><code class="code">bias_init_sigma</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>bias_init_method==normal</p></td>
  </tr>
  <tr>
     <td><code class="code">bias_init_value</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>bias_init_method==constant</p></td>
  </tr>
  <tr>
     <td><code class="code">bias_lr</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">bias_wd</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">epochs</code></td>
     <td><p>IntegerParameterRange</p></td>
     <td><p>MinValue: 1, MaxValue: 1000</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">factors_init_scale</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>factors_init_method==uniform</p></td>
  </tr>
  <tr>
     <td><code class="code">factors_init_sigma</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>factors_init_method==normal</p></td>
  </tr>
  <tr>
     <td><code class="code">factors_init_value</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>factors_init_method==constant</p></td>
  </tr>
  <tr>
     <td><code class="code">factors_lr</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">factors_wd</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">linear_init_scale</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>linear_init_method==uniform</p></td>
  </tr>
  <tr>
     <td><code class="code">linear_init_sigma</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>linear_init_method==normal</p></td>
  </tr>
  <tr>
     <td><code class="code">linear_init_value</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>linear_init_method==constant</p></td>
  </tr>
  <tr>
     <td><code class="code">linear_lr</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">linear_wd</code></td>
     <td><p>ContinuousParameterRange</p></td>
     <td><p>MinValue: 1e-8, MaxValue: 512</p></td>
     <td><p>None</p></td>
  </tr>
  <tr>
     <td><code class="code">mini_batch_size</code></td>
     <td><p>IntegerParameterRange</p></td>
     <td><p>MinValue: 100, MaxValue: 10000</p></td>
     <td><p>None</p></td>
  </tr>
</table>
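
The recommended ranges above map onto the `ParameterRanges` structure accepted by the SageMaker `CreateHyperParameterTuningJob` API. The following is a minimal sketch covering only a subset of the table. The helper names `continuous` and `integer` are hypothetical, and the logarithmic scaling is an assumption made here, not something the table states. A real tuning job config also needs a strategy and resource limits, which are omitted.

```python
def continuous(name, low, high, scaling="Logarithmic"):
    """Build one ContinuousParameterRanges entry (the API expects string values)."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high), "ScalingType": scaling}


def integer(name, low, high, scaling="Auto"):
    """Build one IntegerParameterRanges entry."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high), "ScalingType": scaling}


# Partial tuning config for a Factorization Machines regressor (assumed subset).
fm_tuning_config = {
    "HyperParameterTuningJobObjective": {"Type": "Minimize", "MetricName": "test:rmse"},
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            continuous("bias_lr", 1e-8, 512),
            continuous("factors_lr", 1e-8, 512),
            continuous("linear_lr", 1e-8, 512),
        ],
        "IntegerParameterRanges": [
            integer("epochs", 1, 1000),
            integer("mini_batch_size", 100, 10000),
        ],
        "CategoricalParameterRanges": [],
    },
}
```

Parameters with a dependency (for example `bias_init_scale`) are only worth tuning when the matching init method is fixed in the static hyperparameters.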

### 6.4.7 Factorization Machine Response Formats
[JSON](https://docs.aws.amazon.com/sagemaker/latest/dg/fm-in-formats.html#fm-json), [JSONLINES](https://docs.aws.amazon.com/sagemaker/latest/dg/fm-in-formats.html#fm-jsonlines), [RECORDIO](https://docs.aws.amazon.com/sagemaker/latest/dg/fm-in-formats.html#fm-recordio)

## 6.5 Image Classification Algorithm
- A **supervised learning** algorithm that **takes an image as input** and **classifies** it into one of **multiple** output **categories**. 
- Uses a convolutional neural network (**ResNet**) that can be **trained from scratch**, or trained using **transfer learning** when a large number of training images are not available.
- input format: **MXNet [RecordIO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html)** (differs from the *protobuf* data), .jpg, .png
- References:
    - [Deep residual learning for image recognition ](https://arxiv.org/abs/1512.03385)
    - [ImageNet image database](http://www.image-net.org/)
    - [Image classification in MXNet](https://github.com/apache/incubator-mxnet/tree/master/example/image-classification)
    
### 6.5.1 Input/Output Interface
- application/x-recordio
- application/x-image
- training with both, inference only with image format

### 6.5.2 EC2 Instance Recommendation
- prefer GPU
- can train with CPU

### 6.5.3 Image Classification Sample Notebooks
- [ End-to-End Multiclass Image Classification Example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/imageclassification_caltech/Image-classification-fulltraining.ipynb)

### 6.5.4 How Image Classification Works
- takes an image as input and classifies it into one of the output categories
- supports both full training and transfer learning

### 6.5.5 Hyperparameters
- [link](https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html)
- num_classes, num_training_samples, augmentation_type (crop, crop_color, crop_color_transform)
- beta_1, beta_2, checkpoint_frequency, epochs, eps, gamma
- image_shape, kv_store, 
- learning_rate, lr_scheduler_factor, lr_scheduler_step, mini_batch_size
- momentum, multi_label, num_layers, optimizer
- precision_dtype, resize, top_k
- use_pretrained_model, use_weighted_loss, weight_decay

### 6.5.6 Tuning an Image Classification Model
- **Metrics Computed by the Image Classification Algorithm**
<table id="w1649aac23c34c25b7b5">
  <tr>
     <th>Metric Name</th>
      <th>Description</th>
      <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">validation:accuracy</code></td>
     <td>        <p>The ratio of the number of correct predictions to the total           number of predictions made.        </p>     </td>
     <td>        <p>Maximize</p>     </td>
  </tr>
</table>
- **Tunable Hyperparameters**
<table id="w1649aac23c34c25b9b7">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">beta_1</code></td>
     <td>       <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 1e-6, MaxValue: 0.999</p>     </td>
  </tr>
  <tr>
     <td><code class="code">beta_2</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 1e-6, MaxValue: 0.999</p>     </td>
  </tr>
  <tr>
     <td><code class="code">eps</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 1e-8, MaxValue: 1.0</p>     </td>
  </tr>
  <tr>
    <td><code class="code">gamma</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
      <td>        <p>MinValue: 1e-8, MaxValue: 0.999</p>     </td>
  </tr>
  <tr>
    <td><code class="code">learning_rate</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 1e-6, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">mini_batch_size</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>MinValue: 8, MaxValue: 512</p>     </td>
  </tr>
  <tr>
     <td><code class="code">momentum</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.999</p>     </td>
  </tr>
  <tr>
     <td><code class="code">optimizer</code></td>
     <td>        <p>CategoricalParameterRanges</p>     </td>
     <td>        <p>['sgd', 'adam', 'rmsprop', 'nag']</p>     </td>
  </tr>
  <tr>
     <td><code class="code">weight_decay</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.999</p>     </td>
  </tr>
    
</table>     

## 6.6 K-Means Algorithm
- **find discrete groupings within data**, where members of a group are **as similar as possible** to one another and as **different** as possible from members of other groups.
- Amazon SageMaker uses a modified version of the [web-scale k-means](https://www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf) algorithm that is more accurate than the original
- the [k-means algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/algo-kmeans-tech-notes.html) expects tabular data
- similarity between observations is measured by Euclidean distance

### 6.6.1 Input/Output Interface
[k-means Response Formats](https://docs.aws.amazon.com/sagemaker/latest/dg/km-in-formats.html)

### 6.6.2 EC2 Instance Recommendation
- recommended: CPU instances
- can train with GPUs

### 6.6.3 K-Means Sample Notebooks
- [Analyze US census data for population segmentation using Amazon SageMaker](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_applying_machine_learning/US-census_population_segmentation_PCA_Kmeans/sagemaker-countycensusclustering.ipynb)


### 6.6.4 How K-Means Clustering Works
- Step 1: Determine the Initial Cluster Centers
- Step 2: Iterate over the Training Dataset and Calculate Cluster Centers 
- Step 3: Reduce the Clusters from K to k

### 6.6.5 K-Means Hyperparameters
- feature_dim, mini_batch_size
- k, init_method, 
- epochs, eval_metrics, extra_center_factor
- half_life_time_size
- local_lloyd_max_iter, local_lloyd_init_method, local_lloyd_num_trials, local_lloyd_tol

### 6.6.6 Tuning a K-Means Model
- **Metrics Computed by the K-Means Algorithm**
<table id="w1649aac23c37c21b9b5">
  <tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">test:msd</code></td>
     <td>
        <p>Mean squared distances between each record in the test set and the closest center of the model.
        </p>
     </td>
     <td>        <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">test:ssd</code></td>
     <td>
        <p>Sum of the squared distances between each record in the
           test set and the closest center of the model.
        </p>
     </td>
     <td>        <p>Minimize</p>     </td>
  </tr>
</table>
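
As a rough illustration of the two metrics, the sketch below computes both from a list of test records and a list of model centers. The function name and the sample data are hypothetical.

```python
def ssd_and_msd(records, centers):
    """Return (sum, mean) of squared Euclidean distances to the closest center."""
    distances = [
        min(sum((x - c) ** 2 for x, c in zip(record, center)) for center in centers)
        for record in records
    ]
    ssd = sum(distances)
    return ssd, ssd / len(distances)


test_records = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
model_centers = [(1.0, 0.0), (10.0, 10.0)]
ssd, msd = ssd_and_msd(test_records, model_centers)  # squared distances: 1, 1, 0
```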

- **Tunable Hyperparameters**:
<table id="w1649aac23c37c21c11b5">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">epochs</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>MinValue: 1, MaxValue:10</p>     </td>
  </tr>
  <tr>
     <td><code class="code">extra_center_factor</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>MinValue: 4, MaxValue:10</p>     </td>
  </tr>
  <tr>
     <td><code class="code">init_method</code></td>
     <td>        <p>CategoricalParameterRanges</p>     </td>
     <td>        <p>['kmeans++', 'random']</p>     </td>
  </tr>
  <tr>
     <td><code class="code">mini_batch_size</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>MinValue: 3000, MaxValue:15000</p>     </td>
  </tr>
</table>

### 6.6.7 k-means Response Formats
- JSON
- JSONLINES
- RECORDIO

## 6.7 K-Nearest Neighbors
- an index-based algorithm
- uses a non-parametric method for **classification** or **regression**
- For **classification** problems, the algorithm queries the **k points** that are **closest** to the sample point and **returns the most frequently used label** of their class **as the predicted** label. 
- For **regression problems**, the algorithm queries the **k closest points** to the sample point and **returns the average** of their feature values **as the predicted** value.
- **three steps**: sampling, dimension reduction, and index building
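
The classification and regression rules above can be sketched with an exhaustive search over a small in-memory training set. This is an assumption-laden illustration: it skips the sampling, dimension reduction, and index-building steps, the function names are hypothetical, and the regression variant averages the neighbors' target values.

```python
from collections import Counter


def k_nearest(train, query, k):
    """Return the labels of the k training points closest to `query`.

    `train` is a list of (features, label) pairs; distance is squared Euclidean.
    """
    by_distance = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query))
    )
    return [label for _, label in by_distance[:k]]


def knn_classify(train, query, k):
    # Most frequent label among the k neighbors.
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]


def knn_regress(train, query, k):
    # Average of the k neighbors' target values.
    neighbors = k_nearest(train, query, k)
    return sum(neighbors) / len(neighbors)


labeled = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
valued = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 50.0)]
predicted_label = knn_classify(labeled, (0.5, 0.5), k=3)
predicted_value = knn_regress(valued, (1,), k=3)
```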

### 6.7.1 Input/Output Interface
- train in: text/csv, application/x-recordio-protobuf; 
- train out: text/csv
- inference in: application/json, application/x-recordio-protobuf, text/csv, 
- inference out: application/json, application/x-recordio-protobuf
- batch transform: application/jsonlines

### 6.7.2 kNN Sample Notebooks
[K-Nearest Neighbor Covertype](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/k_nearest_neighbors_covtype/k_nearest_neighbors_covtype.ipynb)

### 6.7.3 EC2 Instance Recommendation
- Training: CPU
- Inference: CPU

### 6.7.4 How It Works
- Step 1: Sampling
- Step 2: Dimension Reduction: ***sign*** specifies a random projection and ***fjlt*** specifies a fast Johnson-Lindenstrauss transform
- Step 3: Building an Index
- Model Serialization: preparation for inference

### 6.7.5 K-Nearest Neighbors Hyperparameters
- [feature_dim](https://docs.aws.amazon.com/sagemaker/latest/dg/kNN_hyperparameters.html), k, predictor_type (classifier, regressor)
- sample_size, dimension_reduction_target, dimension_reduction_type (sign, fjlt) 
- faiss_index_ivf_nlists, faiss_index_pq_m, 
- index_metric, index_type, mini_batch_size

### 6.7.6 Tuning a K-Nearest Neighbors Model
- **Metrics Computed by the K-Nearest Neighbors Algorithm**
<table>
    <tr>
        <th>Metric Name</th>
        <th>Optimization Direction</th>
    </tr>
    <tr>
        <td>test:accuracy</td>
        <td>Maximize</td>
    </tr>
    <tr>
        <td>test:mse</td>
        <td>Minimize</td>
    </tr>
</table>
- **Tunable Hyperparameters**
<table>
    <tr>
        <th>Parameter Name</th>
        <th>Parameter Type</th>
        <th>Recommended Ranges</th>
    </tr>
    <tr>
        <td>k</td>
        <td>IntegerParameterRanges</td>
        <td>MinValue: 1, MaxValue: 1024</td>
    </tr>
    <tr>
        <td>sample_size</td>
        <td>IntegerParameterRanges</td>
        <td>MinValue: 256, MaxValue: 20000000</td>
    </tr>
</table>

### 6.7.7 Data Formats for K-Nearest Neighbors Training Input
- [CSV and RECORDIO](https://docs.aws.amazon.com/sagemaker/latest/dg/kNN-in-formats.html)

### 6.7.8 K-NN Request and Response Formats
- [INPUT](https://docs.aws.amazon.com/sagemaker/latest/dg/kNN-inference-formats.html): CSV, JSON, JSONLINES, RECORDIO
- [OUTPUT](https://docs.aws.amazon.com/sagemaker/latest/dg/kNN-inference-formats.html#kNN-output-json): JSON, JSONLINES, VERBOSE JSON, RECORDIO-PROTOBUF, VERBOSE RECORDIO-PROTOBUF

## 6.8 Latent Dirichlet Allocation (LDA)
- attempts to describe a set of observations as a mixture of distinct categories
- used to discover a user-specified number of topics shared by documents within a text corpus

### 6.8.1 Input/Output Interface
- Training: recordIO-wrapped-protobuf (dense and sparse) and CSV file formats
- Inference: text/csv, application/json, and application/x-recordio-protobuf content types

### 6.8.2 EC2 Instance Recommendation
- currently only supports single-instance CPU training

### 6.8.3 LDA Sample Notebooks
[An Introduction to SageMaker LDA](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/lda_topic_modeling/LDA-Introduction.ipynb)

### 6.8.4 How LDA Works
- an unsupervised learning algorithm that attempts to describe a set of observations as a mixture of different categories
- LDA is a generative probability model
- LDA can be used for a variety of tasks, from **clustering customers** based on product purchases to **automatic harmonic** analysis in music. 
- is most commonly associated with **topic modeling** in text corpuses.
- LDA is a "bag-of-words" model
- For each word: 
    - Choose a **topic** z ∼ Multinomial(θ) 
    - Choose the corresponding **topic-word distribution** β_z
    - Draw a **word** w ∼ Multinomial(β_z)
- The goal is to find parameters α and β:
    - α — A prior estimate on topic probability
    - β — "topic-word distribution."
- use Gibbs sampling or Expectation Maximization (EM) techniques to estimate
- **Tensor decomposition algorithm**:
    - The goal is to calculate the spectral decomposition of a V x V x V tensor
    - uses a V x V moment matrix to find a whitening matrix of dimension V x k
    - This same whitening matrix can then be used to find a smaller k x k x k tensor
    - Alternating Least Squares is used to decompose the smaller k x k x k tensor. 
- [More info](https://docs.aws.amazon.com/sagemaker/latest/dg/lda-how-it-works.html)
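
The generative story above (draw a topic mixture, then a topic and a word for each position) can be sketched with the standard library. This illustrates the model only, not the tensor decomposition that estimates its parameters. The function names, vocabulary, and parameter values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, vocabulary, num_words, rng):
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(num_words):
        # Choose a topic z ~ Multinomial(theta).
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # Look up the corresponding topic-word distribution beta_z.
        beta_z = beta[z]
        # Draw a word w ~ Multinomial(beta_z).
        words.append(rng.choices(vocabulary, weights=beta_z)[0])
    return theta, words


vocabulary = ["goal", "team", "vote", "law"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 only emits sports words
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 only emits politics words
theta, words = generate_document([0.5, 0.5], beta, vocabulary, 8, random.Random(0))
```

Because word order never enters the process, the document is fully described by its word counts, which is what "bag-of-words" means here.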
        
### 6.8.5 LDA Hyperparameters
- num_topics, feature_dim, mini_batch_size
- alpha0, max_restarts, max_iterations, tol

### 6.8.6 Tuning an LDA Model
- **Metrics Computed by the LDA Algorithm**
    - Maximize per-word log-likelihood
- **Tunable Hyperparameters**
    - alpha0: 0.1-10
    - num_topics: 1-150

## 6.9 Linear Learner
- algorithms used for solving either **classification** or **regression** problems
- learns a linear function (**Linear regression**), or linear threshold function (**Logistic regression**) for classification, mapping a vector x to an approximation of the label y

### 6.9.1 Input/Output Interface
- recordIO wrapped protobuf and CSV
- application/x-recordio-protobuf, text/csv,
- For inference: application/json, application/x-recordio-protobuf, and text/csv

### 6.9.2 EC2 Instance Recommendation
- single- or multi-machine CPU and GPU instances

### 6.9.3 Linear Learner Sample Notebooks
- [ An Introduction to Linear Learner with MNIST](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/linear_learner_mnist/linear_learner_mnist.ipynb)

### 6.9.4 How It Works
- Step 1: Preprocessing: normalization and standardization
- Step 2: Training 
- Step 3: Validation and Setting the Threshold 
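
Step 3 can be sketched as follows for binary classification: score the validation set with the learned linear function, then keep the threshold that does best on it. This sketch assumes accuracy as the selection criterion and thresholds the raw linear score. The function names, weights, and data are hypothetical.

```python
def linear_score(weights, bias, x):
    """Linear function w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def best_threshold(scores, labels):
    """Pick the candidate threshold with the highest validation accuracy."""
    def accuracy(threshold):
        predictions = [1 if s >= threshold else 0 for s in scores]
        return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

    # Candidate thresholds are the observed scores themselves.
    return max(sorted(set(scores)), key=accuracy)


weights, bias = [1.0, -1.0], 0.0
validation = [([2.0, 0.0], 1), ([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.0, 3.0], 0)]
scores = [linear_score(weights, bias, x) for x, _ in validation]
labels = [y for _, y in validation]
threshold = best_threshold(scores, labels)
```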

### 6.9.5 Linear Learner Hyperparameters
- feature_dim, num_classes, predictor_type, 
- accuracy_top_k, balance_multiclass_weights
- beta_1, beta_2, bias_lr_mult, bias_wd_mult
- binary_classifier_model_selection_criteria
- early_stopping_tolerance, early_stopping_patience, 
- epochs [...](https://docs.aws.amazon.com/sagemaker/latest/dg/ll_hyperparameters.html)


### 6.9.6 Tuning a Linear Learner Model
- **Metrics Computed by the Linear Learner Algorithm**, [source](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner-tuning.html#linear-learner-metrics)
- **Tuning Hyperparameters** [source](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner-tuning.html#linear-learner-tunable-hyperparameters)

### 6.9.7 Linear Learner Response Formats
- JSON, JSONLINES, RECORDIO

## 6.10 Neural Topic Model (NTM)
- an unsupervised learning algorithm that is used to **organize a corpus of documents into topics** that contain word groupings based on their statistical distribution
- [NTM](https://arxiv.org/pdf/1511.06038.pdf) and LDA are distinct algorithms for topic modeling

### 6.10.1 Input/Output Interface
- for train, validation, test, and [auxiliary](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/scientific_details_of_algorithms/ntm_topic_modeling/ntm_wikitext.ipynb): recordIO-wrapped-protobuf, csv
- for inference: text/csv, application/json, application/jsonlines, application/x-recordio-protobuf
- [WETC score](https://arxiv.org/pdf/1809.02687.pdf)


### 6.10.2 EC2 Instance Recommendation
- GPU and CPU instance types

### 6.10.3 NTM Sample Notebooks
- [Introduction to Basic Functionality of NTM](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/ntm_synthetic/ntm_synthetic.ipynb)

### 6.10.4 NTM Hyperparameters
[link](https://docs.aws.amazon.com/sagemaker/latest/dg/ntm_hyperparameters.html)

### 6.10.5 Tuning an NTM Model
- Minimize ***total_loss***
- Tunable Hyperparameters: encoder_layers_activation, learning_rate, mini_batch_size, optimizer, rescale_gradient, weight_decay

### 6.10.6 NTM Response Formats
- JSON, JSONLINES, RECORDIO

## 6.11 Object Detection Algorithm
- **detects and classifies** objects in images using a single deep neural network
- It uses the **Single Shot multibox Detector** (SSD) framework and supports two base networks: **VGG** and **ResNet**. The network can be **trained from scratch**, or trained with models that have been **pre-trained** on the **ImageNet** dataset.

### 6.11.1 Input/Output Interface
- Training with RecordIO Format 
- Training with Image Format

### 6.11.2 EC2 Instance Recommendation
- GPU instances 

### 6.11.3 Object Detection Sample Notebooks
- [Object Detection using the Image and JSON format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_image_json_format.ipynb)

### 6.11.4 How Object Detection Works
- takes an image as input and outputs the category that the object belongs to, along with a confidence score
- Object detection in Amazon SageMaker supports both **VGG-16** and **ResNet-50** as a base network for SSD.
- full/transfer learning

### 6.11.5 Object Detection Hyperparameters
- num_classes, num_training_samples, base_network
- image_shape, epochs, freeze_layer_pattern
- kv_store, label_width
- learning_rate, lr_scheduler_factor, lr_scheduler_step, mini_batch_size, momentum, nms_threshold, 
- optimizer, overlap_threshold, 
- use_pretrained_model, weight_decay

### 6.11.6 Tuning an Object Detection Model
- Maximize **mAP**
- Tunable Hyperparameters: learning_rate, mini_batch_size, momentum, optimizer, weight_decay

### 6.11.7 Object Detection Request and Response Formats
- in: image/jpeg and image/png
- out: JSON

## 6.12 Principal Component Analysis (PCA)
- attempts to **reduce the dimensionality** of a dataset while retaining as **much information** as possible
- uses tabular data

### 6.12.1 Input/Output Interface
- training:  recordIO-wrapped-protobuf, csv
- inference: text/csv, application/json, application/x-recordio-protobuf

### 6.12.2 EC2 Instance Recommendation
- GPU and CPU computation

### 6.12.3 Principal Component Analysis Sample Notebooks
- [Principal Component Analysis Sample Notebooks](https://docs.aws.amazon.com/sagemaker/latest/dg/pca.html#PCA-sample-notebooks)
- [An Introduction to PCA with MNIST](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/pca_mnist/pca_mnist.ipynb)

### 6.12.4 How PCA [Works](https://docs.aws.amazon.com/sagemaker/latest/dg/how-pca-works.html)
- finding a new set of features called **components**, which are composites of the original features, but are **uncorrelated with one another**. 
- The **first** component accounts for the **largest possible variability** in the data, the **second** component the **second most variability**, and so on.
- Mode 1: **Regular**: for datasets with sparse data and a moderate number of observations and features
- Mode 2: **[Randomized](https://docs.aws.amazon.com/sagemaker/latest/dg/how-pca-works.html#mode-2)**: for datasets with both a large number of observations and features, [FJLT transform](https://www.cs.princeton.edu/~chazelle/pubs/FJLT-sicomp09.pdf)
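
To make the idea of a first component concrete, the sketch below finds the direction of largest variance with power iteration on the covariance matrix and projects the data onto it. This is a textbook method chosen for brevity, not either of the two SageMaker modes. The function names are hypothetical.

```python
import math


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_component(rows, iterations=100):
    """Unit vector along the direction of largest variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = column_means(rows)
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Assumes the all-ones start vector is not orthogonal to the top component.
    v = [1.0] * d
    for _ in range(iterations):
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def project(rows, component):
    """Reduce each mean-centered row to one coordinate along `component`."""
    means = column_means(rows)
    return [sum((x - m) * c for x, m, c in zip(r, means, component)) for r in rows]


rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # points on the line y = 2x
component = first_component(rows)
reduced = project(rows, component)  # 2 features reduced to 1
```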

### 6.12.5 PCA Hyperparameters
- feature_dim, mini_batch_size, num_components, algorithm_mode
- extra_components, subtract_mean

### 6.12.6 PCA Response Formats
- JSON, JSONLINES, RECORDIO

## 6.13 Random Cut Forest
- algorithm for **detecting anomalous** data points within a data set

### 6.13.1 Input/Output Interface
- input: <code>application/x-recordio-protobuf, text/csv</code> and <code>application/json</code>
- output: <code>application/x-recordio-protobuf</code> or <code>application/json</code>

### 6.13.2 Instance Recommendations
- training: <code>ml.m4, ml.c4</code>, and <code>ml.c5</code> instance families
- inference: <code>ml.c5.xlarge</code> instance type

### 6.13.3 Random Cut Forest Sample Notebooks
[An Introduction to SageMaker Random Cut Forests](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/random_cut_forest/random_cut_forest.ipynb)

### 6.13.4 How RCF Works
Create a **forest of trees** where each tree is obtained using a **partition of a sample** of the training data<br>
- Randomly Sampling Data<br>
    [Reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling): draw sample data from big training data
- Training and Inference 
    - the sample is partitioned into a **number** of equal-sized **partitions** equal to the **number of trees** in the forest. 
    - each partition is sent to an individual tree. 
    - The **tree recursively organizes its partition** into a **binary** tree by partitioning the data domain into bounding boxes
- Choosing Hyperparameters: <code>num_trees</code> and <code>num_samples_per_tree</code>
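
The sampling and partitioning steps can be sketched as below, assuming the sample holds `num_trees` times `num_samples_per_tree` points. The sketch uses the classic reservoir sampling algorithm (Algorithm R) and does not build the trees. The function names are hypothetical.

```python
import random


def reservoir_sample(stream, sample_size, rng):
    """Uniform sample of `sample_size` items from a stream of unknown length."""
    reservoir = []
    for index, item in enumerate(stream):
        if index < sample_size:
            reservoir.append(item)  # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < sample_size:
                reservoir[slot] = item  # replace with probability sample_size / (index + 1)
    return reservoir


def partition(sample, num_trees):
    """Split the sample into `num_trees` equal-sized partitions, one per tree."""
    size = len(sample) // num_trees
    return [sample[i * size:(i + 1) * size] for i in range(num_trees)]


num_trees, num_samples_per_tree = 4, 8
sample = reservoir_sample(range(1000), num_trees * num_samples_per_tree, random.Random(0))
partitions = partition(sample, num_trees)
```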

### 6.13.5 RCF Hyperparameters
- <code>feature_dim</code>
- <code>eval_metrics</code>
- <code>num_samples_per_tree</code>
- <code>num_trees</code>

### 6.13.6 Tuning a RCF Model
- Metrics Computed by the RCF Algorithm:<br>
    Maximize <code>test:f1</code> - F1 score on the test dataset, based on the difference between calculated labels and actual labels.
- Tunable Hyperparameters:
<table id="w1685aac23c60c21c11b5">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">num_samples_per_tree</code></td>
     <td>  <p>IntegerParameterRanges</p>     </td>
     <td>       <p>MinValue: 1, MaxValue:2048</p>     </td>
  </tr>
  <tr>
     <td><code class="code">num_trees</code></td>
     <td>       <p>IntegerParameterRanges</p>     </td>
     <td>       <p>MinValue: 50, MaxValue:1000</p>    </td>
  </tr>
</table>

### 6.13.7 RCF Response Formats
- JSON/JSONLINES/RECORDIO

## 6.14 Sequence to Sequence (seq2seq)
- a supervised learning algorithm where the **input** is a **sequence of tokens** and the **output** generated is **another sequence** of tokens
- examples: machine translation, text summarization, speech-to-text
- uses **RNNs** and **CNNs** with **attention** as **encoder-decoder** architectures

### 6.14.1 Input/Output Interface
- Training
    - data in **RecordIO-Protobuf** format
    - tokens are expected as **integers**
    - The algorithm expects three channels: *train*, *validation* and *vocab*
- Inference
    - <code>application/json</code> and <code>application/x-recordio-protobuf</code>

### 6.14.2 EC2 Instance Recommendation
- only supported on GPU instance types on single machine

### 6.14.3 Sample Notebooks
[ Machine Translation English-German Example Using SageMaker Seq2Seq](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/seq2seq_translation_en-de/SageMaker-Seq2Seq-Translation-English-German.ipynb)

### 6.14.4 How Sequence to Sequence Works
- neural network for seq2seq includes:
    - An **embedding layer** transforms the one-hot encoding into a dense feature vector
    - An **encoder layer** compresses input to a fixed-length feature vector
    - A **decoder layer** takes encoded feature vector and produces the output sequence
- The whole model is trained jointly to **maximize** the **probability** of the **target** sequence **given** the **source** sequence. [Link](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf)
- **Attention mechanism**: In an [attention mechanism](https://arxiv.org/pdf/1409.0473.pdf), the decoder tries to **find the location** in the encoder sequence where the **most important information** could be located and uses that information and **previously decoded words** to **predict the next** token in the sequence.
- Whitepaper: [1](https://arxiv.org/abs/1508.04025), [2](https://arxiv.org/abs/1609.08144)
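
One decoding step of attention can be sketched as: score every encoder state against the current decoder state, normalize the scores with a softmax, and take the weighted sum as the context vector. The sketch assumes plain dot-product scoring for brevity, whereas the linked paper learns an additive scoring function. The function names and vectors are hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step using dot-product scores."""
    scores = [sum(d * e for d, e in zip(decoder_state, state)) for state in encoder_states]
    weights = softmax(scores)  # how much each source position matters
    context = [
        sum(w * state[j] for w, state in zip(weights, encoder_states))
        for j in range(len(encoder_states[0]))
    ]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 5.0]
weights, context = attend(decoder_state, encoder_states)
```

The context vector is then combined with the previously decoded tokens to predict the next token.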

### 6.14.5 [Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/seq-2-seq-hyperparameters.html)
- batch_size, beam_size, bleu_sample_size, bucket_width
- bucketing_enabled, checkpoint_frequency_num_batches, checkpoint_threshold, clip_gradient
- cnn_activation_type, cnn_hidden_dropout, cnn_kernel_width_decoder, cnn_kernel_width_encoder, cnn_num_hidden
- decoder_type, embed_dropout_source, embed_dropout_target, encoder_type
- fixed_rate_lr_half_life, learning_rate, loss_type, lr_scheduler_type
- max_num_batches, max_num_epochs, max_seq_len_source, max_seq_len_target, min_num_epochs, momentum
- num_embed_source, num_embed_target, num_layers_decoder, num_layers_encoder
- optimized_metric, optimizer_type, plateau_reduce_lr_factor, plateau_reduce_lr_threshold
- rnn_attention_in_upper_layers, rnn_attention_num_hidden, rnn_attention_type, rnn_cell_type, rnn_decoder_state_init, rnn_first_residual_layer, rnn_num_hidden, rnn_residual_connections, rnn_decoder_hidden_dropout
- training_metric, weight_decay, weight_init_scale, weight_init_type, xavier_factor_type

### 6.14.6 Tuning Model
- **Metrics Computed by the Sequence to Sequence Algorithm**
<table>
    <tr>
 <th>Metric Name </th>
 <th>Description</th>
 <th>Optimization Direction</th>
</tr>
<tr>
 <td><code class="code">validation:accuracy</code></td>
 <td>    <p>Accuracy       computed on the validation dataset.    </p> </td>
 <td>    <p>Maximize</p> </td>
</tr>
<tr>
 <td><code class="code">validation:bleu</code></td>
 <td>
    <p><a href="https://en.wikipedia.org/wiki/BLEU" target="_blank">BLEU</a>
       score computed on the validation dataset. Because BLEU
       computation is expensive, you can choose to compute BLEU on a
       random subsample of the validation dataset to speed up the
       overall training process.
       Use
       the <code class="code">bleu_sample_size</code> parameter to specify the
       subsample.
    </p>
 </td>
 <td>    <p>Maximize</p> </td>
</tr>
<tr>
 <td><code class="code">validation:perplexity</code></td>
 <td>
    <p><a href="https://en.wikipedia.org/wiki/Perplexity" target="_blank">Perplexity</a>
       is a loss function computed on the validation dataset.
       Perplexity measures the cross-entropy between an empirical
       sample and the distribution predicted by a model, and so provides
       a measure of how well a model predicts the sample values. Models
       that are good at predicting a sample have a low
       perplexity.
    </p>
 </td>
 <td>    <p>Minimize</p> </td>
</tr>
</table>
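
As a small illustration of the perplexity metric, the sketch below takes the probability the model assigned to each reference token and returns the exponential of the average negative log-probability. The function name and the probabilities are hypothetical.

```python
import math


def perplexity(token_probabilities):
    """exp of the average negative log-probability of the reference tokens."""
    cross_entropy = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(cross_entropy)


uniform_guess = perplexity([0.25, 0.25, 0.25, 0.25])  # model is unsure among 4 tokens
confident = perplexity([0.9, 0.8, 0.95])              # model mostly predicts correctly
```

A model that spreads probability evenly over four tokens has a perplexity of about 4, while a model that predicts the reference tokens well scores close to 1.
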
- **Tunable Hyperparameters**
<table>
  <tr>
    <th>Parameter       Name     </th>
     <th>Parameter        Type     </th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">num_layers_encoder</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
     <td>        <p>[1-10]</p>     </td>
  </tr>
  <tr>
    <td><code class="code">num_layers_decoder</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
    <td>        <p>[1-10]</p>     </td>
  </tr>
 <tr>
     <td><code class="code">batch_size</code></td>
    <td>        <p>CategoricalParameterRange</p>     </td>
    <td>        <p>[16,32,64,128,256,512,1024,2048]</p>     </td>
 </tr>
  <tr>
     <td><code class="code">optimizer_type</code></td>
     <td>        <p>CategoricalParameterRange</p>     </td>
     <td>        <p>['adam', 'sgd', 'rmsprop']</p>     </td>
  </tr>
  <tr>
     <td><code class="code">weight_init_type</code></td>
     <td>        <p>CategoricalParameterRange</p>     </td>
     <td>        <p>['xavier',          'uniform']        </p>     </td>
  </tr>
  <tr>
     <td><code class="code">weight_init_scale</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>
        <p>For the xavier type: MinValue: 2.0, MaxValue: 3.0 For the
           uniform type: MinValue: -1.0, MaxValue: 1.0
        </p>
     </td>
  </tr>
  <tr>
     <td><code class="code">learning_rate</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.00005, MaxValue: 0.2</p>     </td>
  </tr>
  <tr>
     <td><code class="code">weight_decay</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.1</p>     </td>
  </tr>
  <tr>
     <td><code class="code">momentum</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.5, MaxValue: 0.9</p>     </td>
  </tr>
  <tr>
     <td><code class="code">clip_gradient</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 1.0, MaxValue: 5.0</p>     </td>
  </tr>
  <tr>
     <td><code class="code">rnn_num_hidden</code></td>
     <td>        <p>CategoricalParameterRange</p>     </td>
     <td>
        <p>Applicable
           only to recurrent neural networks (RNNs).
           [128,256,512,1024,2048] 
        </p>
     </td>
  </tr>
  <tr>
     <td><code class="code">cnn_num_hidden</code></td>
     <td>        <p>CategoricalParameterRange</p>     </td>
     <td>
        <p>Applicable
           only to convolutional neural networks (CNNs).
           [128,256,512,1024,2048] 
        </p>
     </td>
  </tr>
  <tr>
     <td><code class="code">num_embed_source</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
     <td>        <p>[256-512]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">num_embed_target</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
     <td>        <p>[256-512]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">embed_dropout_source</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
    <td><code class="code">embed_dropout_target</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">rnn_decoder_hidden_dropout</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.5</p>    </td>
  </tr>
  <tr>
     <td><code class="code">cnn_hidden_dropout</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.0, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">lr_scheduler_type</code></td>
     <td>        <p>CategoricalParameterRange</p>     </td>
     <td><p>['plateau_reduce', 'fixed_rate_inv_t', 'fixed_rate_inv_sqrt_t']</p> </td>
  </tr>
  <tr>
     <td><code class="code">plateau_reduce_lr_factor</code></td>
     <td>        <p>ContinuousParameterRange</p>     </td>
     <td>        <p>MinValue: 0.1, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">plateau_reduce_lr_threshold</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
     <td>        <p>[1-5]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">fixed_rate_lr_half_life</code></td>
     <td>        <p>IntegerParameterRange</p>     </td>
     <td>        <p>[10-30]</p> </td></tr>
</table>

## 6.15 XGBoost Algorithm
- [XGBoost](https://github.com/dmlc/xgboost) (eXtreme Gradient Boosting) is a supervised learning algorithm that **attempts to accurately predict** a target variable by **combining** the estimates of a set of **simpler, weaker models**.

### 6.15.1 Input/Output Interface
- CSV and libsvm for training/inference
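
The two formats can be illustrated with a small sketch. The helper names `to_csv_row` and `to_libsvm_row` are hypothetical, and the layout assumptions (label in the first CSV column with no header row, 1-based feature indices and omitted zero values for libsvm) should be checked against the algorithm documentation before use:

```python
from typing import Sequence, Union

Number = Union[int, float]


def to_csv_row(label: Number, features: Sequence[Number]) -> str:
    """Serialize one record as CSV: label first, then every feature value."""
    return ",".join(str(v) for v in [label, *features])


def to_libsvm_row(label: Number, features: Sequence[Number]) -> str:
    """Serialize one record as libsvm: 'label index:value ...'.

    Assumption: indices start at 1 and zero-valued features are omitted,
    which is the usual sparse libsvm convention.
    """
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label), *pairs])


csv_line = to_csv_row(1, [0.5, 0, 2])
libsvm_line = to_libsvm_row(1, [0.5, 0, 2])
```

The libsvm form is sparse, so it tends to suit data sets with many zero-valued features, while CSV keeps every column.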

### 6.15.2 EC2 Instance Recommendation
- Trains on CPUs only (for example, M4 and C4 instances)

### 6.15.3 XGBoost Sample Notebooks
[Regression with Amazon SageMaker XGBoost algorithm](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb)

### 6.15.4 How XGBoost Works
- When using gradient boosting for regression, the **weak learners are regression trees**, and **each** regression tree **maps an input** data point **to one** of its **leaves**, which contains a continuous score.
- **XGBoost minimizes a regularized** (L1 and L2) objective function that combines a **convex loss function** and a **penalty term** for model complexity. 
- The **training** proceeds **iteratively**, **adding new trees** that **predict the residuals** or errors **of prior** trees that are then **combined with previous** trees to **make the final** prediction. 
- It's called **gradient boosting** because it uses a **gradient descent** algorithm **to minimize the loss** when **adding new models**. 
- Link: [1](https://arxiv.org/pdf/1603.02754.pdf), [2](https://xgboost.readthedocs.io/en/latest/tutorials/model.html)
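
The iterative residual-fitting idea described above can be sketched in plain Python. This is a toy illustration under simplifying assumptions, not XGBoost itself: it uses one input feature, depth-1 trees (stumps), squared-error loss, and no regularization term. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]       # (threshold, left_value, right_value)
Model = Tuple[float, float, List[Stump]]  # (base_score, eta, stumps)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a depth-1 regression tree that minimizes squared error on the residuals."""
    best_sse, best_stump = None, None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else 0.0
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_stump = sse, (threshold, left_mean, right_mean)
    return best_stump


def predict_stump(stump: Stump, x: float) -> float:
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs: Sequence[float], ys: Sequence[float],
                num_round: int = 50, eta: float = 0.3) -> Model:
    """Add one stump per round; each stump is fit to the current residuals."""
    base_score = sum(ys) / len(ys)
    preds = [base_score] * len(xs)
    stumps: List[Stump] = []
    for _ in range(num_round):
        # For squared-error loss the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # eta shrinks each tree's contribution (a step size for the descent).
        preds = [p + eta * predict_stump(stump, x) for x, p in zip(xs, preds)]
    return base_score, eta, stumps


def predict(model: Model, x: float) -> float:
    base_score, eta, stumps = model
    return base_score + eta * sum(predict_stump(s, x) for s in stumps)
```

Here `num_round` and `eta` play the same role as the hyperparameters of the same name listed in the next section: the number of trees added and the shrinkage applied to each of them.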

### 6.15.5 [XGBoost Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost_hyperparameters.html)
- num_class, num_round
- alpha, base_score
- booster, colsample_bylevel, colsample_bytree, csv_weights, early_stopping_rounds
- eta, eval_metric, gamma, grow_policy, lambda, lambda_bias
- max_bin, max_delta_step, max_depth, max_leaves, min_child_weight
- normalize_type, nthread, objective
- one_drop, process_type, rate_drop
- refresh_leaf, sample_type, scale_pos_weight, seed
- silent, sketch_eps, skip_drop, subsample, tree_method, tweedie_variance_power

### 6.15.6 Tuning an XGBoost Model
- **Metrics Computed by the XGBoost Algorithm** (choose one of them as the objective to optimize when tuning the hyperparameters):
<table><tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization       Direction     </th>
  </tr>
  <tr>
     <td><code class="code">validation:auc</code></td>
     <td>       <p>Area under the curve.</p>     </td>
     <td>        <p>Maximize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:error</code></td>
     <td>
        <p>Binary classification error rate, calculated as #(wrong
           cases)/#(all cases).
        </p>
     </td>
     <td>       <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:logloss</code></td>
     <td>        <p>Negative log-likelihood.</p>     </td>
     <td>        <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:mae</code></td>
     <td>        <p>Mean absolute           error.        </p>     </td>
     <td>        <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:map</code></td>
     <td>        <p>Mean average           precision.        </p>     </td>
     <td>        <p>Maximize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:merror</code></td>     <td>
        <p>Multiclass classification error rate, calculated as #(wrong
           cases)/#(all cases).        </p>
     </td>
     <td>       <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:mlogloss</code></td>
     <td>        <p>Negative log-likelihood for multiclass classification.</p>     </td>
     <td>        <p>Minimize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:ndcg</code></td>
     <td>        <p>Normalized Discounted Cumulative           Gain.        </p>     </td>
     <td>        <p>Maximize</p>     </td>
  </tr>
  <tr>
     <td><code class="code">validation:rmse</code></td>
     <td>        <p>Root mean square           error.        </p>     </td>
     <td>        <p>Minimize</p>     </td>
  </tr></table>
  
- **Tunable Hyperparameters**:
<table>
<tr>
     <th>Parameter Name</th>
     <th>Parameter        Type     </th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">alpha</code></td>
     <td>         <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0, MaxValue: 1000</p>     </td>
  </tr>
  <tr>
     <td><code class="code">colsample_bylevel</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.1, MaxValue: 1</p>     </td>
  </tr>
  <tr>
     <td><code class="code">colsample_bytree</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.5, MaxValue: 1</p>     </td>
  </tr>
  <tr>
     <td><code class="code">eta</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.1, MaxValue: 0.5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">gamma</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0, MaxValue: 5</p>     </td>
  </tr>
  <tr>
     <td><code class="code">lambda</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0, MaxValue: 1000</p>     </td>
  </tr>
  <tr>
     <td><code class="code">max_delta_step</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>[0, 10]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">max_depth</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>[0, 10]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">min_child_weight</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0, MaxValue: 120</p>     </td>
  </tr>
  <tr>
     <td><code class="code">num_round</code></td>
     <td>        <p>IntegerParameterRanges</p>     </td>
     <td>        <p>[1, 4000]</p>     </td>
  </tr>
  <tr>
     <td><code class="code">subsample</code></td>
     <td>        <p>ContinuousParameterRanges</p>     </td>
     <td>        <p>MinValue: 0.5, MaxValue: 1</p>     </td>
  </tr>
</table>
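
As a rough sketch, an objective metric and a few of the recommended ranges above can be combined into a tuning job configuration. The dictionary follows the shape of the `HyperParameterTuningJobConfig` argument of the low-level `CreateHyperParameterTuningJob` API as I understand it; the function name, the job limits, and the choice of `validation:auc` and of these three parameters are illustrative assumptions.

```python
from typing import Any, Dict


def build_xgboost_tuning_config(max_jobs: int = 20,
                                max_parallel_jobs: int = 2) -> Dict[str, Any]:
    """Build a tuning configuration as a plain dict (no AWS call is made here)."""
    return {
        "Strategy": "Bayesian",
        # One metric from the metrics table, with its optimization direction.
        "HyperParameterTuningJobObjective": {
            "Type": "Maximize",
            "MetricName": "validation:auc",
        },
        # Hypothetical limits; choose values that fit the budget.
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel_jobs,
        },
        # Range bounds are passed as strings in this API.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {"Name": "eta", "MinValue": "0.1", "MaxValue": "0.5"},
                {"Name": "subsample", "MinValue": "0.5", "MaxValue": "1"},
                {"Name": "min_child_weight", "MinValue": "0", "MaxValue": "120"},
            ],
            "IntegerParameterRanges": [
                {"Name": "max_depth", "MinValue": "0", "MaxValue": "10"},
                {"Name": "num_round", "MinValue": "1", "MaxValue": "4000"},
            ],
            "CategoricalParameterRanges": [],
        },
    }


tuning_config = build_xgboost_tuning_config()
```

The resulting dictionary would be passed, together with a training job definition, to the tuning job request; hyperparameters not listed in the ranges stay fixed at the values given in that definition.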

# VII. Using Custom Algorithms
Custom algorithms can be packaged to use with Amazon SageMaker, regardless of programming language or framework. Amazon SageMaker allows:
- training by built-in algorithms and inference with custom code
- training and inference by built-in algorithms
- training and inference by custom algorithms
- training by deep learning containers and inference with custom code.

Amazon SageMaker algorithms are packaged as Docker images.<br>
Separate Docker images can be provided for the training algorithm and the inference code.

## 7.1. Using Custom Training Algorithms
This section explains how Amazon SageMaker interacts with a Docker container that runs a custom training algorithm.

### 7.1.1 How Amazon SageMaker Runs Custom Training Image
- To configure a Docker container to run as an executable, use an <code>ENTRYPOINT</code> instruction in a Dockerfile
- [Note](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html)

### 7.1.2 How Amazon SageMaker Provides Training Information
- A [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html) request to Amazon SageMaker specifies the *Amazon Elastic Container Registry path of the Docker image that contains the training algorithm* and the *S3 location* for training data and parameters.
- **Hyperparameters** are stored in <code>/opt/ml/input/config/hyperparameters.json</code>
- **Environment Variables**: <code>TRAINING_JOB_NAME</code> ...
- **Input Data Configuration**: in <code>/opt/ml/input/config/inputdataconfig.json</code>
- **Training Data**: in **FILE** mode or **PIPE** mode
- **Distributed Training Configuration**: in <code>/opt/ml/input/config/resourceconfig.json</code>
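
A training container might read these files at start-up roughly as follows. This is a minimal sketch: the function name `load_training_config` is hypothetical, the base directory is a parameter so the code can be tried outside a container, and missing files are treated as empty configurations, which is an assumption of this example.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_training_config(
    base_dir: Union[str, Path] = "/opt/ml/input/config",
) -> Dict[str, Dict[str, Any]]:
    """Load the three JSON configuration files named in the list above."""
    base = Path(base_dir)
    config: Dict[str, Dict[str, Any]] = {}
    for name in ("hyperparameters", "inputdataconfig", "resourceconfig"):
        path = base / f"{name}.json"
        # Assumption: a missing file is treated as an empty configuration.
        config[name] = json.loads(path.read_text()) if path.exists() else {}
    return config
```

Hyperparameter values are submitted as strings in the training request, so the script normally converts them itself, for example `float(config["hyperparameters"]["eta"])`.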

### 7.1.3 Signalling Algorithm Success and Failure
- A training algorithm indicates whether it succeeded or failed using the exit code of its process.

### 7.1.4 How Amazon SageMaker Processes Training Output
- Amazon SageMaker returns the first 1024 characters from <code>/opt/ml/output/failure</code> as FailureReason
- All final model artifacts are written to <code>/opt/ml/model</code>
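
The exit-code and failure-file conventions from the last two subsections can be sketched as a small wrapper around the training function. The name `run_training` and the shape of `train_fn` are hypothetical; the directories are parameters so the sketch can be tried locally.

```python
import traceback
from pathlib import Path
from typing import Callable, Union


def run_training(train_fn: Callable[[Path], None],
                 output_dir: Union[str, Path] = "/opt/ml/output",
                 model_dir: Union[str, Path] = "/opt/ml/model") -> int:
    """Run train_fn and return the exit code the container process should use."""
    model_path = Path(model_dir)
    model_path.mkdir(parents=True, exist_ok=True)
    try:
        # train_fn is expected to write its final artifacts into model_path.
        train_fn(model_path)
        return 0  # zero exit code signals success
    except Exception:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Only the beginning of this file is reported back as the failure reason,
        # so a real script may prefer to put a short summary first.
        (out / "failure").write_text(traceback.format_exc())
        return 1  # non-zero exit code signals failure


# In the container entry point one would typically finish with:
#     sys.exit(run_training(my_train_fn))
```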

## 7.2. Using Custom Inference Code
### 7.2.1 Hosting Services
- **How Amazon SageMaker Runs Custom Inference Image**<br>
    use an <code>ENTRYPOINT</code> instruction in a Dockerfile
- **How Amazon SageMaker Loads Custom Model Artifacts**<br>
    Amazon SageMaker copies model artifacts from <code>ModelDataUrl</code> to <code>/opt/ml/model</code>
- **How Containers Serve Requests**:<br>
    Containers need to implement a **web server** that responds to <code>/invocations</code> and <code>/ping</code> on **port 8080**.
- **How Custom Container Should Respond to Inference Requests**:<br>
    To **obtain inferences**, the **client application** sends a **POST request** to the Amazon SageMaker **endpoint**. [InvokeEndpoint API](https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html)
- **How Custom Container Should Respond to Health Check (Ping) Requests**:<br>
    The simplest requirement on the container is to respond with an **HTTP 200 status** code and an **empty body**
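
The container contract above can be sketched with the Python standard library alone. This is an illustrative toy, not a production server: the handler class, `make_server`, and the echo-style response body are all hypothetical, and a real container would load a model from `/opt/ml/model` and return actual predictions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/ping":
            # Health check: HTTP 200 with an empty body.
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def do_POST(self) -> None:
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # Placeholder "inference": report the payload size instead of a prediction.
        body = json.dumps({"received_bytes": len(payload)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Keep the sketch quiet; a real container would log requests.
        pass


def make_server(port: int = 8080) -> HTTPServer:
    """Create a server listening on all interfaces on the given port."""
    return HTTPServer(("0.0.0.0", port), InferenceHandler)


# In the container entry point: make_server().serve_forever()
```

The same two routes are used for batch transform, so one server implementation can usually cover both hosting modes.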
    
### 7.2.2. Batch Transform
- **How Amazon SageMaker Runs Custom Inference Image in Batch Transform**:<br>
    use an <code>ENTRYPOINT</code> instruction in a Dockerfile
- **How Amazon SageMaker Loads Custom Model Artifacts**:<br>
    Amazon SageMaker copies model artifacts from <code>ModelDataUrl</code> to <code>/opt/ml/model</code>
- **How Containers Serve Requests**:<br>
    Containers need to implement a **web server** that responds to <code>/invocations</code> and <code>/ping</code> on **port 8080**.
- **How Custom Container Should Respond to Health Check (Ping) Requests**:<br>
    The simplest requirement on the container is to respond with an **HTTP 200 status** code and an **empty body**
 
## 7.3 [Example](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb)

# VIII. Automatically Scaling Amazon SageMaker Models
- Automatic scaling dynamically adjusts the number of instances provisioned for a production variant in response to changes in our workload. 
- When the **workload increases**, automatic scaling **brings more instances** online. When the **workload decreases**, automatic scaling **removes** unnecessary **instances** so that we don't pay for provisioned variant instances that we aren't using.
- [Topics](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html):
    - [Configure Automatic Scaling for a Variant](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-add-policy.html)
    - [Editing a Scaling Policy](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-edit.html)
    - [Deleting a Scaling Policy](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-delete.html)
    - [Load Testing for Variant Automatic Scaling](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html)
    - [Additional Considerations for Configuring Automatic Scaling](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling-considerations.html)
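
A target-tracking policy for a variant can be sketched as two request payloads for the Application Auto Scaling service. The function name, the policy name, the capacity limits, the cooldowns, and the target of 70 invocations per instance are illustrative assumptions; the field names follow the `RegisterScalableTarget` and `PutScalingPolicy` request shapes as I understand them.

```python
from typing import Any, Dict, Tuple


def build_scaling_requests(endpoint_name: str, variant_name: str,
                           min_capacity: int = 1, max_capacity: int = 4,
                           target_invocations: float = 70.0
                           ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (register_target_request, scaling_policy_request) as plain dicts."""
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Bounds between which the instance count of the variant may move.
    register = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{variant_name}-invocations-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so that each instance handles about this many invocations per minute.
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds; assumed values
            "ScaleOutCooldown": 60,
        },
    }
    return register, policy


register_request, policy_request = build_scaling_requests("my-endpoint", "AllTraffic")
```

With boto3 these payloads would be sent through the `application-autoscaling` client, roughly as `client.register_scalable_target(**register_request)` followed by `client.put_scaling_policy(**policy_request)`. A suitable target value is best derived from load testing, as the linked topic describes.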

# IX. Using TensorFlow
- [Link](https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html)
- We can use Amazon SageMaker to train a model using custom TensorFlow code.
- [PIPE mode](https://github.com/aws/sagemaker-tensorflow-extensions/blob/master/README.rst)
- [Version support](https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators)
- [GitHub repository](https://github.com/aws/sagemaker-tensorflow-containers)

## 9.1. Writing Custom TensorFlow Model Training and Inference Code
The TensorFlow training script must be a Python 2.7 source file containing:
- <code>model_fn</code>: Defines the model that will be trained.
- <code>train_input_fn</code>: Preprocess and load training data.
- <code>eval_input_fn</code>: Preprocess and load evaluation data.
- <code>serving_input_fn</code>: Defines the features to be passed to the model during prediction.

## 9.2. Examples
- [TensorFlow Example 1: Using the tf.estimator ](https://docs.aws.amazon.com/sagemaker/latest/dg/tf-example1.html)
- tf.estimator (iris_dnn_classifier)
- tf.layers (abalone_using_layers)
- tf.contrib.keras (abalone_using_keras)
- distributed TensorFlow (distributed_mnist)
- ResNet CIFAR-10 with Tensorboard (resnet_cifar10_with_tensorboard)

# X. Using Apache MXNet
- [Link](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html)
- We can use Amazon SageMaker to train a model using our own custom Apache MXNet training code.
- [supported versions](https://github.com/aws/sagemaker-python-sdk#mxnet-sagemaker-estimators)
- [GitHub repository](https://github.com/aws/sagemaker-mxnet-containers)

## 10.1. Writing Custom Apache MXNet Model Training and Inference Code 
**Must implement** the [following interfaces](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet-training-inference-code-template.html):
- <code>def train(hyperparameters, input_data_config, channel_input_dirs, output_data_dir, model_dir, num_gpus, num_cpus, hosts, current_host, ** kwargs)</code>
- <code>def save(model, model_dir)</code>

**Optional**:<br>
- <code>def model_fn(model_dir)</code>
- <code>def transform_fn(model, input_data, content_type, accept)</code>
- With the Gluon API:
    - <code>def input_fn(input_data, content_type)</code>
    - <code>def predict_fn(block, array)</code>
    - <code>def output_fn(ndarray, accept)</code>
- With the Module API:
    - <code>def input_fn(model, input_data, content_type)</code>
    - <code>def predict_fn(module, data)</code>
    - <code>def output_fn(data, accept)</code>

## 10.2 Examples:
- The Apache MXNet Module API
- The Apache MXNet Gluon API
- [Apache MXNet Example 1: Using the Module API](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet-example1.html)

# XI. Using Chainer
We can use Amazon SageMaker to train and deploy a model using custom Chainer code.
- [GitHub repository](https://github.com/aws/sagemaker-chainer-container)
- [versions supported](https://github.com/aws/sagemaker-python-sdk#chainer-sagemaker-estimators)

# XII. Using PyTorch
We can use Amazon SageMaker to train and deploy a model using custom PyTorch code.
- [GitHub repository](https://github.com/aws/sagemaker-pytorch-container)
- [versions supported](https://github.com/aws/sagemaker-python-sdk#pytorch-sagemaker-estimators)
- [writing PyTorch training scripts](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/pytorch/README.rst)

# XIII. Using Apache Spark
- [supported versions](https://github.com/aws/sagemaker-spark#getting-sagemaker-spark)
- Amazon SageMaker Spark library <code>com.amazonaws.services.sagemaker.sparksdk</code> contains:
    - <code>SageMakerEstimator</code>
    - <code>KMeansSageMakerEstimator, PCASageMakerEstimator, XGBoostSageMakerEstimator</code>
    - <code>SageMakerModel</code>
- **Downloading the Amazon SageMaker Spark Library**:
    - [GitHub link](https://github.com/aws/sagemaker-spark)
    - <code>pip install sagemaker_pyspark</code>
    - Create a new notebook with the <code>Sparkmagic (PySpark)</code> or the <code>Sparkmagic (PySpark3)</code> kernel, [more](http://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/)
- **Integrating Custom Apache Spark Application with Amazon SageMaker**:
    - Continue data preprocessing using the Apache Spark library
    - Use the estimator in the Amazon SageMaker Spark library to train the model
    - Get inferences from the model hosted in Amazon SageMaker

## 13.1. Example 1
- [Link](https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark-example1.html)
- [Using Custom Algorithms for Model Training and Hosting on Amazon SageMaker with Apache Spark](https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark-example1-cust-algo.html)
- [Using the SageMakerEstimator in a Spark Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/apache-spark-example1-extend-pipeline.html)

## 13.2. Additional Examples
- [GitHub repository](https://github.com/aws/sagemaker-spark/tree/master/examples)

# XIV. Amazon SageMaker Libraries
- [Amazon SageMaker Apache Spark Library](https://github.com/aws/sagemaker-spark)
- [Amazon SageMaker high-level Python library](https://github.com/aws/sagemaker-python-sdk)

# XV. Authentication and Access Control
[Access](https://docs.aws.amazon.com/sagemaker/latest/dg/authentication-and-access-control.html) to Amazon SageMaker requires credentials. Those credentials must have permissions to access AWS resources, such as an Amazon SageMaker notebook instance or an Amazon EC2 instance.

- **Authentication**:
    - AWS account root user: use it only to [create the first IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#create-iam-users)
    - IAM user: an identity within an AWS account that has specific custom permissions
    - IAM role: an IAM identity with specific permissions that we can create in our account; used for federated user access, AWS service access, and applications running on Amazon EC2

- **Access Control** topics:
    - [Overview of Managing Access Permissions to Your Amazon SageMaker Resources](https://docs.aws.amazon.com/sagemaker/latest/dg/access-control-overview.html)
    - [Using Identity-based Policies (IAM Policies) for Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/using-identity-based-policies.html)
    - [Amazon SageMaker API Permissions: Actions, Permissions, and Resources Reference](https://docs.aws.amazon.com/sagemaker/latest/dg/api-permissions-reference.html)

# XVI. Monitoring
Monitoring is an important part of **maintaining the reliability, availability, and performance** of Amazon SageMaker and other AWS solutions.

- [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/) monitors our AWS resources and the applications that we run on AWS in real time.
- [Amazon CloudWatch Logs](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/) enables us to monitor, store, and access our log files from EC2 instances, AWS CloudTrail, and other sources.
- [AWS CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/) captures API calls and related events made by or on behalf of our AWS account and delivers the log files to an Amazon S3 bucket that we specify.

# XVII. Other Topics
- [Best Practices](https://docs.aws.amazon.com/sagemaker/latest/dg/best-pratices.html)
- [Security](https://docs.aws.amazon.com/sagemaker/latest/dg/security.html)
- [Limits and Supported Regions](https://docs.aws.amazon.com/sagemaker/latest/dg/appendix.html)
- [API Reference](https://docs.aws.amazon.com/sagemaker/latest/dg/API_Reference.html)