# I. What Is Amazon SageMaker?
- Amazon SageMaker is a **fully managed machine learning service**.
- helps to **quickly and easily build and train machine learning models**, and then **directly deploy** them into a production-ready hosted environment.
- provides an **integrated Jupyter** authoring **notebook** instance for **easy access** to your **data** sources
- provides **common machine learning algorithms** that are **optimized** to run efficiently against **extremely large data** in a **distributed environment**.
- deploys models **with a single click from the console**

# II. How It Works
This section provides an **overview of machine learning** and explains how Amazon SageMaker works.
## 2.1. Machine Learning with Amazon SageMaker
Typical workflow for creating a machine learning model:
![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/ml-concepts-10.png)
1. **Generate example data**<br>
    Data type depends on business problem<br>
    a. **Fetch the data** — 
    pull the dataset or datasets into a single repository<br>
    b. **Clean the data** — 
    To improve model training<br>
    c. **Prepare or transform the data** — 
    To improve performance
2. **Train a model**:<br>
    a. **Training** the model — To train a model, you need<br>
    - an algorithm (provided by Amazon SageMaker or implemented yourself)
    - compute resources<br>
    b. **Evaluating** the model — 
    to determine whether the accuracy of the inferences is acceptable.<br>
    - use either the AWS SDK for Python (Boto) or the high-level Python library in SageMaker<br>
    - use a Jupyter notebook in a SageMaker notebook instance to train and evaluate the model.
3. **Deploy the model** — independently with Amazon SageMaker hosting services, decoupling from application code. 

Machine learning is a **continuous cycle**: 
- => **deploy** model
- => **monitor** inferences
- => **collect** "ground truth"
- => **evaluate** model
- => **retrain** model
- => **deploy new** model
- => ...

## 2.2. Explore and Preprocess Data

- Use a Jupyter notebook on an Amazon SageMaker notebook instance
- use a model to transform data by using Amazon SageMaker batch transform

## 2.3. Training a Model with Amazon SageMaker

![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/sagemaker-architecture.png)

Create a training job with the SageMaker console or API, specifying:
- URL to data on S3
- Compute resources, managed by Amazon SageMaker
- URL for output data on S3
- Amazon Elastic Container Registry path where the training code is stored
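
The four items above map onto fields of a training job request. The following is a minimal sketch, assuming the parameter names of the boto3 `create_training_job` call; the job name, bucket, role ARN, and image path are hypothetical placeholders, and the request is only assembled here, not submitted.

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m4.xlarge", instance_count=1):
    """Assemble keyword arguments for a CreateTrainingJob request."""
    return {
        "TrainingJobName": job_name,
        # Amazon ECR path where the training code (Docker image) is stored
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # URL of the training data on S3
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        # URL on S3 where the model artifacts are written
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Compute resources that SageMaker launches and manages
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are hypothetical placeholders.
request = build_training_job_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-algorithm:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train",
    output_s3_uri="s3://example-bucket/output",
)
```

With valid credentials, such a dictionary could be passed as `boto3.client("sagemaker").create_training_job(**request)`.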

**Training** options:
- Use an algorithm provided by Amazon SageMaker
- Use Apache Spark with Amazon SageMaker, similar to using Spark MLlib
- Submit custom code to train with deep learning frameworks: [TensorFlow](https://docs.aws.amazon.com/sagemaker/latest/dg/tf.html), [Apache MXNet](https://docs.aws.amazon.com/sagemaker/latest/dg/mxnet.html)
- Use custom algorithms in  a Docker image

**Workflow**:
- User creates the training job
- => Amazon SageMaker launches the ML compute instances
- => SageMaker uses the training code and the training dataset to train the model
- => SageMaker saves the resulting model artifacts and other output in the S3 bucket

**Important**
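- Prevent out-of-memory errors: reserve some memory on the ML compute instances for Amazon SageMaker system processes so that the algorithm container does not contend with them.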

## 2.4. Model Deployment in Amazon SageMaker
- use Amazon **[SageMaker hosting services](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html)** to set up a persistent endpoint to get **one prediction** at a time.
- use Amazon **[SageMaker batch transform](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-batch.html)** to get **predictions** for an **entire dataset**.

### 2.4.1. Deploying a Model on Amazon SageMaker Hosting Services
A three-step process:
1. Create a model in Amazon SageMaker: this tells SageMaker where to find the model components
2. Create an endpoint configuration for an HTTPS endpoint, and configure the endpoint to elastically scale the deployed ML compute instances in production
3. Create an HTTPS endpoint: provide the endpoint configuration to Amazon SageMaker
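
The three steps can be sketched as three request payloads. This is an illustrative sketch, assuming the parameter names of the boto3 `create_model`, `create_endpoint_config`, and `create_endpoint` calls; the resource names, image path, S3 path, and role ARN are hypothetical, and nothing is sent to AWS.

```python
def build_hosting_requests(name, image_uri, model_data_url, role_arn,
                           instance_type="ml.m4.xlarge", instance_count=1):
    """Return the payloads for the three hosting steps, in order."""
    # Step 1: the model points at the inference image and the model artifacts.
    model = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    # Step 2: the endpoint configuration names the model in a production
    # variant and the ML compute instances that should host it.
    endpoint_config = {
        "EndpointConfigName": name + "-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0,
        }],
    }
    # Step 3: the endpoint refers to the endpoint configuration by name.
    endpoint = {
        "EndpointName": name + "-endpoint",
        "EndpointConfigName": endpoint_config["EndpointConfigName"],
    }
    return model, endpoint_config, endpoint


# All values below are hypothetical placeholders.
model, endpoint_config, endpoint = build_hosting_requests(
    name="example-model",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-inference:latest",
    model_data_url="s3://example-bucket/output/model.tar.gz",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
```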

**Considerations**
- client application sends requests to the Amazon SageMaker HTTPS endpoint to obtain inferences from a deployed model, requests can be sent to this endpoint from Jupyter notebook during testing.
- a model trained with Amazon SageMaker can also be deployed to your own deployment target.
- multiple variants of a model can be deployed to the same Amazon SageMaker HTTPS endpoint.
- a *ProductionVariant* can be configured to use Application Auto Scaling
- an endpoint can be modified without taking models that are already deployed into production out of service.
- **Changing or deleting** model artifacts or changing inference code **after deploying** a model produces **unpredictable results**.

### 2.4.2. Getting Inferences by Using Amazon SageMaker Batch Transform 
![](https://docs.aws.amazon.com/sagemaker/latest/dg/images/batch-transform.png)

Batch transform manages all compute resources necessary to get inferences. This includes launching instances and deleting them after the transform job completes. 

To perform a batch transform, create a transform job including:
- path to data on S3 bucket
- compute resources
- path to S3 for output data
- name of model in the transform job

Batch transform is ideal for situations where:
- You want to get inferences for an entire dataset and store them online.
- You don't need a persistent endpoint that applications can call to get inferences.
- You don't need the sub-second latency that Amazon SageMaker hosted endpoints provide.
- You want to preprocess your data before using the data to train a new model or generate inferences.

**Considerations**:
- transform job can be created by SageMaker Console or API
- Amazon SageMaker follows the transform job specification: it reads the input, launches the ML compute instances, and saves the output
- Amazon SageMaker uses Multipart Upload API to upload output data results from a transform job to S3. 
- For testing model variants, create separate transform jobs for each variant using a validation data set.
- For large datasets or data of indeterminate size, create an infinite stream.

## 2.5. Validating Machine Learning Models 

After training, a model needs to be evaluated to determine whether its performance and accuracy allow you to achieve your business goals.

- **Offline testing**: Deploy trained model to an alpha endpoint, and use historical data to send inference requests to it.
- **Online testing** with live data: choose to send a portion of the traffic to a model variant for evaluation.
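
For online testing, the share of traffic each variant receives follows from its weight relative to the sum of all variant weights on the endpoint. A small sketch of that arithmetic; the variant names and weights are hypothetical.

```python
def traffic_shares(variant_weights):
    """Map each variant name to the fraction of requests it receives."""
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in variant_weights.items()}


# Hypothetical setup: send about 10% of live traffic to the candidate model.
shares = traffic_shares({"current-model": 9.0, "candidate-model": 1.0})
```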

**Options** for **offline** evaluation:
- **using a "holdout set"**: use 20-30% of the training data for validation
- **k-fold validation**: split the data into k folds; in each of k runs, use 1 fold for validation and the other k-1 folds for training => k models => aggregate to obtain the final model.
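
The fold rotation can be sketched in plain Python. In this minimal illustration, each fold serves once as the validation set while the remaining folds form the training set; the function name and the `n_folds` parameter are hypothetical, and a real project would more likely use a library splitter.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices), one pair per fold."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


splits = list(k_fold_splits(n_samples=10, n_folds=5))
```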

## 2.6 The Amazon SageMaker Programming Model 

- use SageMaker APIs to create and manage notebook instances and train and deploy models
- alternatives:
    - Use the Amazon SageMaker console
    - Modify the example Jupyter notebooks
    - Write model training and inference code from scratch
        - high-level Python library
        - AWS SDK

# III. Getting Started

## [3.1: Setting Up](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-set-up.html)
### [3.1.1: Create an AWS Account and an Administrator User](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html)
- [Create an AWS Account](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html#gs-account-create)
- [Create an IAM Administrator User and Sign In](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-account.html#gs-account-user)
### [3.1.2: Create an S3 Bucket](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-config-permissions.html)

## [3.2: Create an Amazon SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html)

## [3.3: Train a Model with a Built-in Algorithm and Deploy It](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1.html)

### [3.3.1: Create a Jupyter Notebook and Initialize Variables](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-prepare.html)
### [3.3.2: Download, Explore, and Transform the Training Data](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data.html)
- 1: Download the MNIST Dataset
- 2: Explore the Training Dataset
- 3: Transform the Training Dataset and Upload It to S3

### [3.3.3: Train a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-train-model.html)
- 1: Choose the Training Algorithm
- 2: Create a Training Job

### [3.3.4: Deploy the Model to Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-model-deployment.html)
- 1: Deploy the Model to Amazon **SageMaker Hosting** Services
- 2: Deploy the Model to Amazon **SageMaker Batch Transform**

### [3.3.5: Validate the Model](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-test-model.html)

## [3.4: Clean up](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html)
## [3.5: Additional Considerations](https://docs.aws.amazon.com/sagemaker/latest/dg/getting-started-client-app.html)

# IV. Automatic Model Tuning
- Automatic model tuning = **hyperparameter tuning**, finds the **best** version of a **model** by running **many training jobs** on dataset using the **algorithm** and **ranges of hyperparameters** that were specified.
- use Amazon SageMaker automatic model tuning with **built-in** algorithms, **custom algorithms**, and Amazon SageMaker **pre-built containers** for machine learning frameworks
- Before starting hyperparameter tuning, you need a well-defined machine learning problem, including:
    - A **dataset**
    - An understanding of the **type of algorithm** needed to train
    - A clear understanding of **how to measure success**
    
## 4.1 How Hyperparameter Tuning Works
- **Workflow**:
    - hyperparameter tuning makes guesses about which hyperparameter combinations are likely to get the best results, 
    - runs training jobs to test these guesses. 
    - after testing uses regression to choose the next set of hyperparameter values
- Hyperparameter tuning uses an Amazon SageMaker implementation of **Bayesian optimization**
- Use explore/exploit trade-off strategy
- **Note**:
    - might not improve model
    - exploring all of the possible combinations is impractical with complex model
    - need to choose the right ranges to explore
- **References**:
    - A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, [link](https://arxiv.org/abs/1012.2599)
    - Practical Bayesian Optimization of Machine Learning Algorithms, [link](https://arxiv.org/abs/1206.2944)
    - Taking the Human Out of the Loop: A Review of Bayesian Optimization, [link](http://ieeexplore.ieee.org/document/7352306/?reload=true)

## 4.2 Defining Objective Metrics
- Not required for SageMaker built-in algorithms, just select and use
- Required for custom algorithms, defined with regular expressions (**regex**). The algorithm needs to emit at least one metric by writing evaluation data to *stderr* or *stdout*
- The hyperparameter tuning job returns the training job that returned the best value for the objective metric as the best training job.
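
To illustrate how a regex-based metric definition could pick values out of training output, here is a small sketch. The `Name`/`Regex` pair mirrors the shape of a metric definition; the log line format and the pattern itself are hypothetical and depend entirely on what a custom algorithm prints.

```python
import re

# Hypothetical metric definition: one capture group holds the metric value.
metric_definitions = [
    {"Name": "validation:accuracy", "Regex": r"validation-accuracy=([0-9\.]+)"},
]

# Hypothetical lines written to stdout by a custom training script.
log_lines = [
    "epoch 1 train-loss=0.693 validation-accuracy=0.71",
    "epoch 2 train-loss=0.512 validation-accuracy=0.83",
    "saving checkpoint",
]


def extract_metric(definition, lines):
    """Return every value captured by the definition's regex, in log order."""
    pattern = re.compile(definition["Regex"])
    values = []
    for line in lines:
        match = pattern.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


values = extract_metric(metric_definitions[0], log_lines)
```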

## 4.3 Defining Hyperparameter Ranges
Choosing hyperparameters and ranges significantly affects the performance of your tuning job.

Example:
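
A minimal sketch of how ranges might be declared, assuming the `ParameterRanges` layout of the boto3 `create_hyper_parameter_tuning_job` request, where bounds are passed as strings. The values reuse the recommended ranges for BlazingText text classification listed in section 6.2.7; the helper names are hypothetical.

```python
def numeric_range(name, low, high):
    """Integer and continuous ranges share the same shape."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high)}


def categorical_range(name, values):
    return {"Name": name, "Values": list(values)}


parameter_ranges = {
    "IntegerParameterRanges": [
        numeric_range("epochs", 5, 15),
        numeric_range("word_ngrams", 1, 3),
    ],
    "ContinuousParameterRanges": [
        numeric_range("learning_rate", 0.005, 0.01),
    ],
    "CategoricalParameterRanges": [
        categorical_range("mode", ["supervised"]),
    ],
}
```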

## 4.4 Example: Hyperparameter Tuning Job
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex.html)

## 4.5 Design Considerations
- Choosing the Number of Hyperparameters
    - It's possible to use up to 20 variables simultaneously in a hyperparameter tuning job
- Choosing Hyperparameter Ranges
    - better results can be obtained by searching only in a small range where all possible values in the range are reasonable.
- Use Logarithmic Scales for Hyperparameters
    - could improve hyperparameter optimization
- Choosing the Best Degree of Parallelism
    - running in parallel gets more work done quickly
    - running one training job at a time achieves the best results with the least amount of compute time. 
- Running Training Jobs on Multiple Instances
    - hyperparameter tuning uses the last-reported objective metric from all instances
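
Why a logarithmic scale can help is easiest to see on a range that spans several orders of magnitude. The sketch below compares evenly spaced points on a linear and on a logarithmic scale for a learning rate between 1e-5 and 1e-1. It only illustrates the spacing and is not how the tuning service selects values.

```python
import math


def linear_grid(low, high, n):
    """n points evenly spaced between low and high."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def log_grid(low, high, n):
    """n points evenly spaced between log10(low) and log10(high)."""
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (n - 1)
    return [10 ** (log_low + i * step) for i in range(n)]


low, high = 1e-5, 1e-1
linear_points = linear_grid(low, high, 9)
log_points = log_grid(low, high, 9)

# Count how many candidate values probe the small end of the range.
small_linear = sum(1 for x in linear_points if x < 5e-4)
small_log = sum(1 for x in log_points if x < 5e-4)
```

On the linear scale only the lower bound itself falls below 5e-4, while the logarithmic scale places several candidates there, so small learning rates are explored far more evenly.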

# V. Using Notebook Instances
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html)
- Creating a Notebook Instance
- Accessing Notebook Instances
- Using Example Notebooks
- Set the Notebook Kernel
- Installing External Libraries and Kernels in Notebook Instances

# VI. Using Built-in Algorithms
Because a model is created to **address a business question**, the **first** step is to **understand the problem** you need to solve. Specifically, the format of the answer influences the choice of algorithm.

**Examples:**
- Answers that fit into discrete categories >>> use *Linear Learner* and *XGBoost*
- Answers that are quantitative >>> also use *Linear Learner* and *XGBoost*
- Answers in the form of discrete recommendations >>> use *Factorization Machines*
- Classify customers into groups >>> use the *K-Means* algorithm
- Understand customer attributes >>> use *PCA*

## 6.1 Common Information
### 6.1.1 Common Parameters 

**Computer resources**
![](https://imgur.com/f3gNKOm.png)


**AWS region**<br>
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html)

### 6.1.2 Common Data Formats 
#### Training
- Training Data Formats 
    - CSV
    - protobuf [recordIO](https://mxnet.incubator.apache.org/architecture/note_data_loading.html#data-format) format
- Trained Model Deserialization
    - Amazon SageMaker models are stored as **model.tar.gz** in the S3 bucket specified in *OutputDataConfig S3OutputPath* parameter of the *create_training_job* call.
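
Once downloaded, the archive can be unpacked with the standard library. The sketch below builds a stand-in `model.tar.gz` locally so that it runs without S3 access; the member name `model_algo-1` and the function name are hypothetical, and the real contents depend on the algorithm.

```python
import io
import os
import tarfile
import tempfile


def extract_model_archive(archive_path, target_dir):
    """Unpack a model.tar.gz archive and return the member names."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir)
    return sorted(names)


# Build a stand-in archive instead of downloading one from S3.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "model.tar.gz")
payload = b"placeholder model bytes"
with tarfile.open(archive_path, "w:gz") as archive:
    info = tarfile.TarInfo(name="model_algo-1")  # hypothetical member name
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

target_dir = os.path.join(workdir, "model")
extracted = extract_model_archive(archive_path, target_dir)
```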

#### Inference
- Inference Request Serialization
    - text/csv, 
    - application/json, 
    - application/x-recordio-protobuf.
    - text/x-libsvm
- Inference Response Deserialization 
    - Amazon SageMaker algorithms return JSON in several layouts.
- Common Request Formats for All Algorithms 
    - JSON
    - JSONLINES
    - CSV
    - RECORDIO
- Using Batch Transform with Built-in Algorithms 
    - JSONLINES
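
The difference between the text-based request formats can be sketched as follows. The `instances` and `features` keys are an assumption based on the dense request layout described for the built-in algorithms; check the documentation of the specific algorithm before relying on them.

```python
import json


def to_csv(rows):
    """text/csv: one record per line, values separated by commas."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def to_json(rows):
    """application/json: all records wrapped in one document (assumed keys)."""
    return json.dumps({"instances": [{"features": list(row)} for row in rows]})


def to_jsonlines(rows):
    """JSON Lines: one small JSON document per line (assumed key)."""
    return "\n".join(json.dumps({"features": list(row)}) for row in rows)


rows = [[1.5, 16.0, 14.0], [2.0, 3.0, 4.0]]
csv_body = to_csv(rows)
json_body = to_json(rows)
jsonlines_body = to_jsonlines(rows)
```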

### 6.1.3 Suggested Instance Types 
For training and hosting Amazon SageMaker algorithms, we recommend using the following EC2 instance types:
- ml.m4.xlarge, ml.m4.4xlarge, and ml.m4.10xlarge
- ml.c4.xlarge, ml.c4.2xlarge, and ml.c4.8xlarge
- ml.p2.xlarge, ml.p2.8xlarge, and ml.p2.16xlarge

### 6.1.4 Logs 
**Note**
If a job fails and logs do not appear in CloudWatch, it's likely that an error occurred before the start of training. Reasons include specifying the wrong training image or S3 location.

The contents of logs vary by algorithms. However, you can typically find the following information:
- Confirmation of arguments provided at the beginning of the log
- Errors that occurred during training
- Measurement of an algorithm's accuracy or numerical performance
- Timings for the algorithm, and any major stages within the algorithm

## 6.2 BlazingText
[link](https://dl.acm.org/citation.cfm?doid=3146347.3146354)
- Highly optimized implementations of the **Word2vec** (sentiment analysis, named entity recognition, machine translation) and **text classification** (web search, information retrieval, ranking and document classification) algorithms.
- Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures.

**BlazingText provides the following features**:
- Accelerated training of fastText text classifier on multi-core CPUs or a GPU and Word2Vec on GPUs using highly optimized CUDA kernels.
- Enriched Word Vectors with Subword Information by learning vector representations for character n-grams.
- A batch_skipgram mode for the Word2Vec algorithm that allows faster training and distributed computation across multiple CPU nodes. 

### 6.2.1 Input/Output Interface
### 6.2.2 Training and Validation Data Format
- Word2Vec algorithm: a training sentence per line
- Text Classification algorithm: a training sentence per line along with the labels
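
For text classification, labels are conventionally written as tokens prefixed with `__label__` (the fastText convention) at the start of each line. A minimal sketch of producing such lines; the helper name and the sample sentence are hypothetical, and tokenization here is plain whitespace splitting.

```python
def to_supervised_line(labels, sentence):
    """Format one training example: prefixed labels, then the tokens."""
    tokens = sentence.split()  # collapses repeated whitespace
    prefixed = ["__label__" + label for label in labels]
    return " ".join(prefixed + tokens)


line = to_supervised_line(["positive"], "great  product , works well")
multi = to_supervised_line(["sports", "news"], "the match ended in a draw")
```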
### 6.2.3 Model artifacts and Inference 
- Word2Vec algorithm: *vectors.txt* which contains words to vectors mapping (compatible with other tools like Gensim and Spacy) and *vectors.bin*.
- Text Classification algorithm: model.bin 
### 6.2.4 EC2 Instance [Recommendation]()
### 6.2.5 BlazingText Sample [Notebooks](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.ipynb)

### 6.2.6 [Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext_hyperparameters.html)
#### Word2Vec Hyperparameters
- mode, batch_size, buckets, epochs, evaluation, learning_rate
- min_char, min_count, max_char, negative_samples
- sampling_threshold, subwords, vector_dim, window_size
#### Text Classification Hyperparameters
- mode, buckets, early_stopping, epochs, learning_rate
- min_count, min_epochs, patience, vector_dim, word_ngrams

### 6.2.7 [Tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext-tuning.html) a BlazingText Model
#### Metrics Computed by the BlazingText Algorithm 
 <div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b7b7">
          <tr>
             <th>Metric Name</th>
             <th>Description</th>
             <th>Optimization Direction</th>
          </tr>
          <tr>
             <td><code class="code">train:mean_rho</code></td>
             <td>
                <p>Mean rho (Spearman's rank correlation coefficient) on <a href="https://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/" target="_blank">WS-353 word similarity datasets</a>.
                </p>
             </td>
             <td>
                <p>Maximize</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">validation:accuracy</code></td>
             <td>
                <p>Classification accuracy on user specified validation dataset.
                </p>
             </td>
             <td>
                <p>Maximize</p>
             </td>
          </tr>
       </table>
   </div>
 </div>

#### Tunable Hyperparameters for Word2Vec
<div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b9b3b5">
          <tr>
             <th>Parameter Name</th>
             <th>Parameter Type</th>
             <th>Recommended Ranges</th>
          </tr>
          <tr>
             <td><code class="code">batch_size</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[8-32]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">epochs</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-15]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">learning_rate</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
                <p>MinValue: 0.005, MaxValue: 0.01</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">min_count</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[0-100]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">mode</code></td>
             <td>
                <p>CategoricalParameterRange</p>
             </td>
             <td>
                <p>['batch_skipgram', 'skipgram', 'cbow']</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">negative_samples</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-25]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">sampling_threshold</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
               <p>MinValue: 0.0001, MaxValue: 0.001</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">vector_dim</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[32-300]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">window_size</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1-10]</p>
             </td>
          </tr>
       </table>
   </div>
 </div>
 
#### Tunable Hyperparameters for Text Classification
 <div class="table">
    <div class="table-contents">
       <table id="w1584aac23c25c29b9b5b5">
          <tr>
             <th>Parameter Name</th>
             <th>Parameter Type</th>
             <th>Recommended Ranges</th>
          </tr>
          <tr>
             <td><code class="code">buckets</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1000000-10000000]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">epochs</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[5-15]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">learning_rate</code></td>
             <td>
                <p>ContinuousParameterRange</p>
             </td>
             <td>
                <p>MinValue: 0.005, MaxValue: 0.01</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">min_count</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[0-100]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">mode</code></td>
             <td>
                <p>CategoricalParameterRange</p>
             </td>
             <td>
                <p>['supervised']</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">vector_dim</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[32-300]</p>
             </td>
          </tr>
          <tr>
             <td><code class="code">word_ngrams</code></td>
             <td>
                <p>IntegerParameterRange</p>
             </td>
             <td>
                <p>[1-3]</p>
             </td>
          </tr>
       </table>
   </div>
</div>

## 6.3 DeepAR Forecasting
[Link](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html)

Amazon SageMaker DeepAR is a supervised learning algorithm for forecasting scalar (that is, one-dimensional) **time series** using recurrent neural networks (**RNN**). Classical forecasting methods such as **ARIMA** and **ETS** fit a single model to each individual time series.

In many applications, however, there are many similar time series across a set of cross-sectional units. Examples of such time series groupings are **demand** for different **products**, **server loads**, and **requests** for web pages. In this case, it can be beneficial to train a single model jointly over all of these time series. 

**Topics**:
### 6.3.1    Input/Output Interface
- DeepAR supports two data channels, **train** and **test**, with files in *JSON Lines* format (***.json***, or gzipped ***.json.gz***) or Parquet format (***.parquet***)
- Use RMSE or weighted quantile loss for evaluation
     
     $$\text{RMSE} = \sqrt{\frac{1}{nT}\sum_{i,t}{\left(\hat{y}_{i,t}-y_{i,t}\right)^2}}$$
     $$\text{wQuantileLoss}[\tau] = 2\frac{\sum_{i,t}{Q_{i,t}^{(\tau)}}}{\sum_{i,t}{|y_{i,t}|}}\quad\text{with}\quad Q_{i,t}^{(\tau)}=\begin{cases}(1-\tau)|q_{i,t}^{(\tau)}-y_{i,t}| & \text{ if } q_{i,t}^{(\tau)}>y_{i,t}\\ \tau|q_{i,t}^{(\tau)}-y_{i,t}| & \text{ otherwise } \end{cases}$$
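
Both metrics can be computed directly from the formulas. The following is a plain-Python sketch, assuming every series has the same number of time steps; the function names and sample numbers are hypothetical.

```python
import math


def rmse(forecasts, actuals):
    """Root mean square error over all series i and time steps t."""
    squared = [(f - y) ** 2
               for fs, ys in zip(forecasts, actuals)
               for f, y in zip(fs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def weighted_quantile_loss(quantile_forecasts, actuals, tau):
    """wQuantileLoss[tau] for tau-quantile forecasts q and actual values y."""
    numerator = 0.0
    denominator = 0.0
    for qs, ys in zip(quantile_forecasts, actuals):
        for q, y in zip(qs, ys):
            # Over-prediction is weighted by (1 - tau), otherwise by tau.
            weight = (1 - tau) if q > y else tau
            numerator += weight * abs(q - y)
            denominator += abs(y)
    return 2 * numerator / denominator


actuals = [[10.0, 20.0], [30.0, 40.0]]
forecasts = [[12.0, 18.0], [30.0, 44.0]]

error = rmse(forecasts, actuals)
median_loss = weighted_quantile_loss(forecasts, actuals, tau=0.5)
```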


### 6.3.2    Recommended Best Practices
- always provide entire time series for training, testing, and when calling the model for prediction
- dataset can be split into training and test datasets for tuning a DeepAR Model at different end points
- do not use very large values (>400) for *prediction length*
- use the same values for *prediction* and *context* lengths.
- train DeepAR model on as **many time series** as available
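
One way to build the two datasets is sketched below, assuming the test set keeps each full series while the training set drops the last `prediction_length` points so that they can be used for evaluation. The function name is hypothetical.

```python
def split_for_evaluation(series, prediction_length):
    """Return (train, test): train omits the final prediction_length points."""
    if len(series) <= prediction_length:
        raise ValueError("series must be longer than prediction_length")
    train = series[:-prediction_length]
    test = list(series)  # the full series, including the held-out tail
    return train, test


series = list(range(10))
train, test = split_for_evaluation(series, prediction_length=3)
```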
### 6.3.3 EC2 Instance Recommendations
- can use GPU and CPU
- use large machine for large model size
    
### 6.3.4 DeepAR Sample Notebooks: 
[Time series forecasting with DeepAR - Synthetic data](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/deepar_synthetic/deepar_synthetic.ipynb)

### 6.3.5 How DeepAR Works
- Under the Hood:<br>
    DeepAR automatically creates feature time series<br>
    DeepAR model is trained by randomly sampling several training examples from each of the time series in the training dataset.<br>
    To capture seasonality patterns, DeepAR also automatically feeds lagged values from the target time series. <br>
    For inference, the trained model takes as input target time series, which might or might not have been used during training, and forecasts a probability distribution for the next prediction_length values.
    
### 6.3.6 [DeepAR Hyperparameters](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar_hyperparameters.html)
- context_length, prediction_length
- epochs (+ early_stopping_patience)
- time_freq (every minute, hourly, daily, weekly, monthly)
- cardinality (for categorical)
- dropout_rate, embedding_dimension, learning_rate
- likelihood (gaussian, beta, negative-binomial, student-T, deterministic-L1)
- mini_batch_size, num_cells, num_dynamic_feat, num_eval_samples, num_layers, test_quantiles

### 6.3.7 Tuning a DeepAR Model
- **Metrics Computed by the DeepAR Algorithm**:
<table id="w1649aac23c28c21b7b5">
  <tr>
     <th>Metric Name</th>
     <th>Description</th>
     <th>Optimization Direction</th>
  </tr>
  <tr>
     <td><code class="code">test:RMSE</code></td>
     <td>
        <p>Root mean square error between forecast and actual target computed on the test set.
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">test:mean_wQuantileLoss</code></td>
     <td>
        <p>Average overall quantile losses computed on the test set. Setting the <code class="code">test_quantiles</code> hyperparameter controls which quantiles are used. 
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">train:final_loss</code></td>
     <td>
        <p>Training negative log-likelihood loss averaged over the last training epoch for the model.
        </p>
     </td>
     <td>
        <p>Minimize</p>
     </td>
  </tr>
</table>    
- **Tunable Hyperparameters**:
<table id="w1649aac23c28c21b9b5">
  <tr>
     <th>Parameter Name</th>
     <th>Parameter Type</th>
     <th>Recommended Ranges</th>
  </tr>
  <tr>
     <td><code class="code">mini_batch_size</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 32, MaxValue: 1028</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">epochs</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 1000</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">context_length</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 200</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">num_cells</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 30, MaxValue: 200</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">num_layers</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 8</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">dropout_rate</code></td>
     <td>
        <p>ContinuousParameterRange</p>
     </td>
     <td>
        <p>MinValue: 0.00, MaxValue: 0.2</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">embedding_dimension</code></td>
     <td>
        <p>IntegerParameterRanges</p>
     </td>
     <td>
        <p>MinValue: 1, MaxValue: 50</p>
     </td>
  </tr>
  <tr>
     <td><code class="code">learning_rate</code></td>
     <td>
        <p>ContinuousParameterRange</p>
     </td>
     <td>
        <p>MinValue: 1e-5, MaxValue: 1e-1</p>
     </td>
  </tr>
</table>

### 6.3.8 DeepAR Inference Formats
[**JSON**](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar-in-formats.html)
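
A sketch of an inference request body, assuming the `instances`/`configuration` layout described on the linked page. The start timestamp and target values are made up, and the helper name is hypothetical.

```python
import json


def build_deepar_request(series, num_samples=50, quantiles=("0.1", "0.5", "0.9")):
    """series: iterable of (start_timestamp, target_values) pairs."""
    return json.dumps({
        "instances": [
            {"start": start, "target": list(target)} for start, target in series
        ],
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    })


body = build_deepar_request([("2018-01-01 00:00:00", [4.0, 10.0, 50.0, 100.0])])
parsed = json.loads(body)
```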

## 6.4 Factorization Machines
- A factorization machine is a **general-purpose supervised learning** algorithm that can be used for both **classification** and **regression** tasks. 
- It is an **extension of a linear model** that is designed to capture interactions between features within high dimensional sparse datasets economically.

### 6.4.1 Input/Output Interface
- RMSE, Log Loss, Accuracy, F1-score
- application/json, x-recordio-protobuf

### 6.4.2 EC2 Instance Recommendation
- CPU instances

### 6.4.3 Factorization Machines Sample Notebooks
- [ An Introduction to Factorization Machines with MNIST](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/factorization_machines_mnist/factorization_machines_mnist.ipynb)

### 6.4.4 How Factorization Machines Work
- for a prediction task, [FM](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) estimates a function $\hat{y}$ from a feature set $x_i$ to a target domain, where $\langle v_i,v_j\rangle$ is the dot product of the factor vectors of features $i$ and $j$:
$$\hat{y}=w_0+\sum_i w_ix_i +\sum_i\sum_{j>i}\langle v_i,v_j\rangle x_ix_j$$
- for a regression task, FM minimizes the squared error:
$$L=\frac{1}{N}\sum_n(y_n-\hat{y}_n)^2$$
- for a classification task, FM minimizes the cross-entropy (log loss):
$$L=-\frac{1}{N}\sum_n\left[y_n\log\hat{p}_n+(1-y_n)\log(1-\hat{p}_n)\right]\text{ where } \hat{p}_n=\frac{1}{1+e^{-\hat{y}_n}}$$
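
The prediction formula can be written out directly. The sketch below evaluates it in two ways: the naive double sum over feature pairs, and the equivalent linear-time reformulation from the linked paper. The weights and inputs are made-up numbers for illustration only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fm_predict(x, w0, w, v):
    """Naive evaluation: bias + linear terms + all pairwise interactions."""
    pairwise = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            pairwise += dot(v[i], v[j]) * x[i] * x[j]
    return w0 + dot(w, x) + pairwise


def fm_predict_fast(x, w0, w, v):
    """Same value, computed per factor as 0.5 * ((sum)^2 - sum of squares)."""
    pairwise = 0.0
    for f in range(len(v[0])):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        pairwise += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return w0 + dot(w, x) + pairwise


x = [1.0, 0.0, 2.0]                        # sparse feature vector
w0 = 0.5                                   # global bias
w = [0.1, 0.2, 0.3]                        # linear weights
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # one 2-dimensional factor per feature

y_naive = fm_predict(x, w0, w, v)
y_fast = fm_predict_fast(x, w0, w, v)
```

Only the pair of features 1 and 3 contributes an interaction here, because the second feature is zero; this is the kind of sparsity the factorized form handles economically.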

### 6.4.5 Factorization Machines Hyperparameters
- feature_dim, num_factors,
- predictor_type (binary_classifier, regressor)
- bias_init_method, bias_init_scale, bias_init_sigma
- bias_init_value, bias_lr, bias_wd, 
- clip_gradient, epochs, eps
- factors_init_method, factors_init_scale, factors_init_sigma, factors_init_value, factors_lr, factors_wd, 
- linear_lr, linear_init_method, linear_init_scale, linear_init_sigma, linear_init_value, linear_wd, 
- mini_batch_size, rescale_grad

### 6.4.6 Tuning a Factorization Machines Model
- **Metrics Computed by the Factorization Machines Algorithm**
<table>
<tr>
    <th>Metric Name</th>
    <th>Description</th>
    <th>Optimization Direction</th>
</tr>
</table>

### 6.4.7 Factorization Machine Response Formats



## 6.5 Image Classification Algorithm
## 6.6 K-Means Algorithm
## 6.7 K-Nearest Neighbors
## 6.8 Latent Dirichlet Allocation (LDA)
## 6.9 Linear Learner
## 6.10 Neural Topic Model (NTM)
## 6.11 Object Detection Algorithm
## 6.12 Principal Component Analysis (PCA)
## 6.13 Random Cut Forest
## 6.14 Sequence to Sequence (seq2seq)
## 6.15 XGBoost Algorithm