# Demystifying AWS SageMaker Training for Sklearn Lovers 
> This post is about using AWS SageMaker to train and deploy models.

- toc: true 
- badges: true
- comments: true
- categories: [aws, ml, sagemaker]
- keyword: [aws, ml, sagemaker]
- image: images/copied_from_nb/images/2022-06-08-sagemaker-training-overview.jpeg

![](images/2022-06-08-sagemaker-training-overview.jpeg)

# Environment

This notebook was prepared in Amazon SageMaker Studio using the `Python 3 (Data Science)` kernel and an `ml.t3.medium` instance.

# About

This post is about understanding the end-to-end machine learning workflow in AWS SageMaker. We will apply SageMaker's built-in [Linear Learner](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) algorithm to the [Kaggle Boston Housing dataset](https://www.kaggle.com/c/boston-housing). Our goal is to understand all the steps involved in training a model with SageMaker.

# Introduction

A typical SageMaker machine learning flow has the following steps. Once we understand them well, we can use the same approach to train any model with SageMaker.
1. **Put Data on S3 Bucket**
    In most use cases, you will keep your training data in an S3 bucket. You may also need to preprocess your data, and for this you can use [SageMaker Data Wrangler](https://hassaanbinaslam.github.io/myblog/aws/ml/sagemaker/2022/05/17/aws-sagemaker-wrangler-p1.html). In this post, we will assume that the data has already been processed and is ready for training.
2. **Configure the Training Job**
   While configuring a training job, you need to take care of the following requirements:
   a. select the algorithm you want to use for training
   b. set the hyperparameters (if any)
   c. define the infrastructure requirements, such as the instance type and how many CPU or GPU instances you want to allocate to your training run
3. **Launch Training Job**
   Tell your training job where the input data is located and where the output artifacts should be stored once training is complete. Once input and output are configured, you can start the training run. SageMaker then automatically provisions the required infrastructure and terminates it when training is complete, so you are billed only for what you have used.
4. **Deploy model and make predictions**
   Deploy the model to an endpoint that serves real-time predictions over HTTPS. Again, you need to define the infrastructure on which you want your model to be deployed.
5. **Clean Up (Optional)**
   If you are experimenting, you may want to terminate the machine on which you have deployed your model for testing purposes to avoid unnecessary charges.
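
To make steps 2 and 3 more concrete, here is a minimal sketch of how the algorithm, hyperparameters, infrastructure, and input/output locations fit together in a single training job definition. It uses the request shape of the low-level boto3 `create_training_job` call rather than the higher-level SageMaker Python SDK, because that dictionary can be built and inspected without any AWS access. The bucket name, role ARN, job name, image URI placeholder, instance type, and hyperparameter values are all hypothetical and chosen only for illustration.

```python
import json


def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, hyperparameters,
                               instance_type="ml.m5.large", instance_count=1):
    """Assemble a training job definition without contacting AWS."""
    return {
        "TrainingJobName": job_name,
        # Step 2a: the algorithm is identified by its container image.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        # Step 2b: the low-level API expects hyperparameter values as strings.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
        "RoleArn": role_arn,
        # Step 3: where the training data lives on S3.
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }],
        # Step 3: where the model artifacts should be written.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Step 2c: the infrastructure used for the training run.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# All values below are placeholders (assumptions), not real resources.
request = build_training_job_request(
    job_name="linear-learner-boston-demo",
    image_uri="<linear-learner-image-uri-for-your-region>",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://my-example-bucket/boston-housing/train/",
    output_s3_uri="s3://my-example-bucket/boston-housing/output/",
    hyperparameters={"predictor_type": "regressor", "mini_batch_size": 32},
)

print(json.dumps(request, indent=2))
```

With valid credentials and real values in place of the placeholders, a request like this could be submitted with `boto3.client("sagemaker").create_training_job(**request)`. The SageMaker Python SDK wraps the same pieces of information in a more convenient interface.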

# Put Data on S3 Bucket