# [Module 1.5] Local Mode Training

All notebooks in this workshop run on the `conda_python3` kernel.

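This notebook does the following.
- Trains the PyTorch CIFAR-10 model in SageMaker local mode, where the training container runs on the notebook instance itself.
- Runs the same training script as a SageMaker-managed training job (host mode).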

# PyTorch CIFAR-10 local training  



In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-pytorch-cnn-cifar10"

role = sagemaker.get_execution_role()

In [3]:
import os
import subprocess

instance_type = "local"

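try:
    # nvidia-smi exits with 0 only when an NVIDIA GPU and driver are available
    if subprocess.call("nvidia-smi") == 0:
        instance_type = "local_gpu"
except OSError:
    # nvidia-smi is not installed, so stay on the CPU instance type
    pass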

print("Instance type = " + instance_type)

Instance type = local_gpu
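The detection step above can also be written as a small function, which makes it easier to reuse and to test without a GPU. This is a minimal sketch, not part of the workshop code: the function name `detect_local_instance_type` and its `which` and `call` parameters are hypothetical and exist only so that the lookup and the process call can be swapped out.

```python
import shutil
import subprocess


def detect_local_instance_type(which=shutil.which, call=subprocess.call):
    """Return "local_gpu" if nvidia-smi exists and exits with 0, else "local"."""
    executable = which("nvidia-smi")
    if executable is None:
        # nvidia-smi is not on PATH, so assume a CPU-only host.
        return "local"
    try:
        # Discard the nvidia-smi report; only the exit code matters here.
        exit_code = call(
            [executable], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The binary was found but could not be started.
        return "local"
    return "local_gpu" if exit_code == 0 else "local"


instance_type = detect_local_instance_type()
print("Instance type = " + instance_type)
```

Checking `shutil.which` first avoids relying on an exception for the common CPU-only case.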


### Upload the data
We use the `sagemaker.Session.upload_data` function to upload the dataset in `../data` to S3. The return value, `inputs`, is the S3 URI of the uploaded data; we pass it to `fit` later when we start the training job.

In [4]:
inputs = sagemaker_session.upload_data(path="../data", bucket=bucket, key_prefix="data/cifar10")
print("s3 inputs: ", inputs)

s3 inputs:  s3://sagemaker-ap-northeast-2-057716757052/data/cifar10
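The returned URI has the form `s3://<bucket>/<key_prefix>`, as the output above shows. The following sketch splits such a URI back into its bucket and key prefix using only the standard library. The helper name `split_s3_uri` is hypothetical and is not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: " + uri)
    # The path keeps its leading slash, which is not part of the S3 key.
    return parsed.netloc, parsed.path.lstrip("/")


# URI copied from the cell output above.
bucket_name, key_prefix = split_s3_uri(
    "s3://sagemaker-ap-northeast-2-057716757052/data/cifar10"
)
print(bucket_name)
print(key_prefix)
```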


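# Construct a script for training
The training script is `train.py` in the `source` directory. The PyTorch estimator below runs it in a local container, using the instance type detected earlier.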

In [6]:
from sagemaker.pytorch import PyTorch

cifar10_estimator = PyTorch(
    entry_point="train.py",
    source_dir="source",
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type=instance_type,  # "local" or "local_gpu"
    hyperparameters={"epochs": 1, "lr": 0.1},
)
cifar10_estimator.fit(inputs)

Creating c5xivjdtjf-algo-1-sa2f6 ... 
Creating c5xivjdtjf-algo-1-sa2f6 ... done
Attaching to c5xivjdtjf-algo-1-sa2f6
c5xivjdtjf-algo-1-sa2f6 | 2021-06-01 08:25:35,272 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
c5xivjdtjf-algo-1-sa2f6 | 2021-06-01 08:25:35,315 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
c5xivjdtjf-algo-1-sa2f6 | 2021-06-01 08:25:35,318 sagemaker_pytorch_container.training INFO     Invoking user training script.
c5xivjdtjf-algo-1-sa2f6 | 2021-06-01 08:25:35,471 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
c5xivjdtjf-algo-1-sa2f6 | /opt/conda/bin/python3.6 -m pip install -r requirements.txt
c5xivjdtjf-algo-1-sa2f6 | Collecting torchsummary==1.5.1
c5xivjdtjf-algo-1-sa2f6 |   Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
c5xivjdtjf-algo-1-sa2f6 | Installing collec
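The contents of `source/train.py` are not shown in this notebook. The sketch below only illustrates how a script-mode entry point typically receives the values passed to the estimator: each entry in `hyperparameters` arrives as a command-line argument (for example `--epochs 1 --lr 0.1`), and the container sets environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING`. The argument names `--model-dir` and `--data-dir` and the fallback paths are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as --name value pairs.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--lr", type=float, default=0.01)
    # Inside the training container these environment variables are set;
    # the fallbacks (hypothetical) allow running the script outside it.
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "./model")
    )
    parser.add_argument(
        "--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "../data")
    )
    # Ignore any extra arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Simulate the arguments produced by hyperparameters={"epochs": 1, "lr": 0.1}.
args = parse_args(["--epochs", "1", "--lr", "0.1"])
print(args.epochs, args.lr)
```

Because the paths fall back to local defaults, the same script could in principle be run directly with `python train.py` before it is launched through the estimator.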

## Training in SageMaker Host Mode

In [7]:
from sagemaker.pytorch import PyTorch

# Managed GPU training instance instead of the local container
instance_type = "ml.p3.8xlarge"

cifar10_estimator = PyTorch(
    entry_point="train.py",
    source_dir="source",
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type=instance_type,
    hyperparameters={"epochs": 2, "lr": 0.01},
)
cifar10_estimator.fit(inputs)

2021-06-01 08:28:04 Starting - Starting the training job...
2021-06-01 08:28:27 Starting - Launching requested ML instances......
ProfilerReport-1622536084: InProgress
2021-06-01 08:29:28 Starting - Preparing the instances for training.........
2021-06-01 08:30:58 Downloading - Downloading input data
2021-06-01 08:30:58 Training - Downloading the training image............
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-06-01 08:33:00,433 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2021-06-01 08:33:00,475 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-06-01 08:33:03,497 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-06-01 08:33:03,974 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/opt/conda/bin/pyt

In [8]:
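# Name of the training job launched by the last fit() call
last_job_name = cifar10_estimator.latest_training_job.job_name
# No output_path was set on the estimator, so the artifact is written under the default bucket
artifact_path = "s3://{}/{}/output/model.tar.gz".format(bucket, last_job_name)
print("artifact_path: ", artifact_path)

# Persist artifact_path so that later notebooks can restore it with %store -r
%store artifact_path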

artifact_path:  s3://sagemaker-ap-northeast-2-057716757052/pytorch-training-2021-06-01-08-28-04-095/output/model.tar.gz
Stored 'artifact_path' (str)
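To check what a training job saved, the `model.tar.gz` archive can be inspected once it has been downloaded. The sketch below assumes the archive is already on local disk and lists its files with the standard `tarfile` module. The helper name `list_model_archive` and the file name `model.pth` are hypothetical; the demo builds a stand-in archive so that the block runs on its own.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def list_model_archive(archive_path):
    """Return the sorted names of the regular files in a model.tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


# Self-contained demo: create a stand-in archive, then list its contents.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = Path(tmp) / "model.tar.gz"
    payload = b"placeholder weights"
    with tarfile.open(demo_archive, "w:gz") as archive:
        info = tarfile.TarInfo(name="model.pth")  # hypothetical file name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    demo_members = list_model_archive(demo_archive)

print(demo_members)
```

The real archive can be fetched first, for example with `aws s3 cp` and the value of `artifact_path`, and then passed to the same function.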
