# [Module 2.1] Training on a Local Machine (1)
### Download the data from the internet and pull the training image from ECR

- Required resources
    - SageMaker Session
    - AWS IAM Role
    - Docker installed on the local machine
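
Because local mode depends on a working Docker installation, it can help to check for one before running the cells below. The following is only a convenience sketch and not part of the original notebook: `docker_available` is a hypothetical helper that checks whether a `docker` executable is on the `PATH` and whether `docker info` exits cleanly.

```python
import shutil
import subprocess


def docker_available(timeout=10):
    """Return True if a `docker` CLI is on PATH and `docker info` succeeds."""
    if shutil.which("docker") is None:
        return False  # CLI not installed or not on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # CLI could not be run, or the daemon did not answer in time
    return result.returncode == 0


print("Docker ready:", docker_available())
```

A `False` result usually means Docker is either not installed or its daemon is not running.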

---
### References
- Setting up Amazon SageMaker Environment On Your Local Machine
    - Follow only the local environment setup from the blog post below
    - https://towardsdatascience.com/setting-up-amazon-sagemaker-environment-on-your-local-machine-7329e0178adc
- Train with Amazon SageMaker on your local machine
    - Once the environment is set up, just run the code
    - https://www.youtube.com/watch?v=K3ngZKF31mc

In [1]:
import pandas as pd
import boto3
import sagemaker

In [2]:
print(pd.__version__)
print(boto3.__version__)
print(sagemaker.__version__)

1.2.4
1.17.98
2.46.0


## Set up the SageMaker session and role

In [3]:
sagemaker_session = sagemaker.Session()
role='arn:aws:iam::057716757052:role/local_machine_sagemaker_gsmoon'



In [4]:
print(sagemaker_session.account_id())
print(sagemaker_session.get_caller_identity_arn())


057716757052


Couldn't call 'get_role' to get Role ARN from role name Administrator to get Role path.


arn:aws:iam::057716757052:user/Administrator
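
The role ARN set above follows the standard IAM pattern `arn:aws:iam::<account-id>:role/<role-name>`. As a minimal sketch, the hypothetical helper `build_role_arn` below (not part of the SageMaker SDK) assembles that string from an account ID such as the one returned by `sagemaker_session.account_id()`. It assumes the role has no path prefix, and the values in the example are placeholders.

```python
def build_role_arn(account_id, role_name):
    """Build an IAM role ARN, assuming the role has no path prefix."""
    account_id = str(account_id)
    if len(account_id) != 12 or not account_id.isdigit():
        raise ValueError("AWS account IDs are 12-digit numbers")
    return "arn:aws:iam::{}:role/{}".format(account_id, role_name)


# Placeholder values for illustration only
print(build_role_arn("123456789012", "my-local-sagemaker-role"))
```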


## Download the data from the internet

In [5]:
import os
import keras
import numpy as np
from keras.datasets import fashion_mnist

In [6]:
(x_train, y_train), (x_val, y_val) = fashion_mnist.load_data()

## Save the data locally

In [7]:
os.makedirs("./data", exist_ok=True)
np.savez('./data/training', image=x_train, label=y_train)
np.savez('./data/validation', image=x_val, label=y_val)
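
`np.savez` appends the `.npz` extension automatically and stores each keyword argument as a named array. The sketch below uses small random arrays in a temporary directory (stand-ins for the Fashion-MNIST data, not the real dataset) to show how the `image` and `label` arrays can be read back, which is presumably what the training script does with these files.

```python
import os
import tempfile

import numpy as np

# Small stand-in arrays with the same layout as Fashion-MNIST (28x28 grayscale images)
images = np.random.randint(0, 256, size=(10, 28, 28), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(10,), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp_dir:
    # np.savez adds the ".npz" suffix, so this writes <tmp_dir>/training.npz
    np.savez(os.path.join(tmp_dir, "training"), image=images, label=labels)

    # Arrays are read when indexed, so load them before the file is closed
    with np.load(os.path.join(tmp_dir, "training.npz")) as data:
        keys = sorted(data.files)
        loaded_images = data["image"]
        loaded_labels = data["label"]

print(keys, loaded_images.shape, loaded_labels.shape)
```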


In [8]:
training_input_path = 'file://data/training.npz'
validation_input_path = 'file://data/validation.npz'
output_path = 'file:///tmp/model'
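
The two input paths use `file://` followed by a relative path, while the output path uses `file:///` followed by an absolute path. The sketch below uses only the standard library to show how such URIs split into parts. `local_path_from_uri` is a hypothetical helper, and joining `netloc` and `path` reflects my understanding of how the SDK's local mode resolves these URIs, so treat that as an assumption.

```python
from urllib.parse import urlparse


def local_path_from_uri(uri):
    """Turn a file:// URI into a filesystem path by joining netloc and path."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError("expected a file:// URI, got: {}".format(uri))
    # 'file://data/x'     -> netloc='data', path='/x'         -> relative 'data/x'
    # 'file:///tmp/model' -> netloc='',     path='/tmp/model' -> absolute '/tmp/model'
    return parsed.netloc + parsed.path


for uri in ("file://data/training.npz", "file:///tmp/model"):
    print(uri, "->", local_path_from_uri(uri))
```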


## Define the training Estimator and run in local mode

In [9]:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          instance_count=1, 
                          instance_type='local',   # Train on the local CPU ('local_gpu' if it has a GPU)
                          framework_version='1.15', 
                          py_version='py3',
                          hyperparameters={'epochs': 1},
                          output_path=output_path
                         )
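
The entry point `mnist_keras_tf.py` is not shown in this notebook. As a rough sketch of how such a script typically receives its inputs: SageMaker passes each hyperparameter as a command-line argument (here `--epochs 1`) and exposes the channel and model directories through environment variables such as `SM_CHANNEL_TRAINING`, `SM_CHANNEL_VALIDATION`, and `SM_MODEL_DIR`. The argument names and fallback defaults below are assumptions, not the contents of the actual script.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and data locations the way a script-mode entry point might."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    # Fall back to the usual in-container paths when the variables are not set
    parser.add_argument(
        "--training",
        default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"),
    )
    parser.add_argument(
        "--validation",
        default=os.environ.get("SM_CHANNEL_VALIDATION", "/opt/ml/input/data/validation"),
    )
    parser.add_argument(
        "--sm-model-dir",
        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # parse_known_args ignores extra arguments the framework may add (e.g. --model_dir)
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--epochs", "1"])
print(args.epochs, args.training, args.validation, args.sm_model_dir)
```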

In [10]:
# Train! This will pull (once) the SageMaker CPU/GPU container for TensorFlow to your local machine.
# Make sure that Docker is running and that docker-compose is installed

tf_estimator.fit({'training': training_input_path, 'validation': validation_input_path})

Creating yaat9bkt0p-algo-1-7amcy ... 
Creating yaat9bkt0p-algo-1-7amcy ... done
Docker Compose is now in the Docker CLI, try `docker compose up`

Attaching to yaat9bkt0p-algo-1-7amcy
yaat9bkt0p-algo-1-7amcy | 
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:30,773 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:30,783 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:30,936 botocore.credentials INFO     Found credentials in environment variables.
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:31,510 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:31,532 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
yaat9bkt0p-algo-1-7amcy | 2021-06-23 13:39:31,551 sagemaker

## Check the downloaded training Docker image

In [11]:
!docker images | grep tensorflow

763104351884.dkr.ecr.ap-northeast-2.amazonaws.com/tensorflow-training   1.15-cpu-py3   9b598117e095   2 months ago    1.94GB
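
The repository name in the listing encodes the registry account and region, and the tag encodes the framework version, device type, and Python version. The sketch below splits a combined `repository:tag` reference with a regular expression. `parse_ecr_image` is a hypothetical helper, and the pattern only covers the `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` form seen above.

```python
import re

# Matches <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
ECR_PATTERN = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def parse_ecr_image(image):
    """Split an ECR image reference into account, region, repository, and tag."""
    match = ECR_PATTERN.match(image)
    if match is None:
        raise ValueError("not an ECR image reference: {}".format(image))
    return match.groupdict()


image = "763104351884.dkr.ecr.ap-northeast-2.amazonaws.com/tensorflow-training:1.15-cpu-py3"
for field, value in parse_ecr_image(image).items():
    print(field, "=", value)
```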


In [12]:
!tar tvfz /tmp/model/model.tar.gz

drwxr-xr-x  0 moongons ANT\Domain Users 0 Jun 23 22:41 model/
drwxr-xr-x  0 moongons ANT\Domain Users 0 Jun 23 22:41 model/1/
-rw-r--r--  0 moongons ANT\Domain Users 218515 Jun 23 22:40 model/1/saved_model.pb
drwxr-xr-x  0 moongons ANT\Domain Users      0 Jun 23 22:41 model/1/variables/
-rw-r--r--  0 moongons ANT\Domain Users 19520132 Jun 23 22:40 model/1/variables/variables.data-00000-of-00001
-rw-r--r--  0 moongons ANT\Domain Users     1500 Jun 23 22:40 model/1/variables/variables.index
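
The same listing can be produced from Python with the standard `tarfile` module, which avoids depending on the shell. The sketch below builds a tiny throwaway archive in a temporary directory so that it runs anywhere; with the real artifact you would pass `/tmp/model/model.tar.gz` to the hypothetical helper `list_archive` instead.

```python
import os
import tarfile
import tempfile


def list_archive(path):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a placeholder file that mimics the layout model/1/saved_model.pb
    model_dir = os.path.join(tmp_dir, "model", "1")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "saved_model.pb"), "wb") as handle:
        handle.write(b"placeholder")

    archive_path = os.path.join(tmp_dir, "model.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(os.path.join(tmp_dir, "model"), arcname="model")

    names = list_archive(archive_path)

print(names)
```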


## Upload the data to S3

In [14]:
def upload_s3(bucket, file_path, prefix):
    '''
    Upload a local file to s3://<bucket>/<prefix>/<file_path> and return its S3 URI.

    Example:
        bucket = sagemaker.Session().default_bucket()
        prefix = 'comprehend'
        train_file_name = 'test/train/train.csv'
        s3_train_path = upload_s3(bucket, train_file_name, prefix)
    '''
    # S3 keys always use '/', so build the key explicitly rather than with os.path.join
    prefix_path = '{}/{}'.format(prefix, file_path)

    boto3.Session().resource('s3').Bucket(bucket).Object(prefix_path).upload_file(file_path)
    s3_path = "s3://{}/{}".format(bucket, prefix_path)

    return s3_path

# Local training and validation files to upload
local_train_file = 'data/training.npz'
local_val_file = 'data/validation.npz'


bucket = sagemaker.Session().default_bucket()
prefix = 'train-on-local-machine'
s3_train_path = upload_s3(bucket, local_train_file, prefix)
print("s3_train_path: ", s3_train_path)

s3_val_path = upload_s3(bucket, local_val_file, prefix)
print("s3_val_path: ", s3_val_path)

s3_train_path:  s3://sagemaker-ap-northeast-2-057716757052/train-on-local-machine/data/training.npz
s3_val_path:  s3://sagemaker-ap-northeast-2-057716757052/train-on-local-machine/data/validation.npz
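
The returned URIs have the form `s3://<bucket>/<key>`, where the key is the prefix joined with the local relative path. A minimal sketch for splitting such a URI back into its parts, using a hypothetical helper `split_s3_uri` and the standard library only (the bucket name in the example is a placeholder):

```python
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/key URI, got: {}".format(s3_uri))
    # The key is the path without its leading slash
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, key = split_s3_uri("s3://example-bucket/train-on-local-machine/data/training.npz")
print(bucket_name, key)
```

The resulting bucket and key are the values that boto3 calls such as the S3 client's `head_object(Bucket=..., Key=...)` expect, which is one way to confirm that an upload succeeded.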


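## Define the training Estimator and run in SageMaker hosted mode
Change the two settings below to run the job on a SageMaker-managed instance instead of the local machine. The local file:// output_path is also dropped, so the model artifact is written to the session's default S3 bucket.
- Add sagemaker_session
- Change instance_type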
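
The difference between the two modes can be captured in a small helper that returns only the keyword arguments that change. This is a sketch, not part of the SageMaker SDK: `mode_kwargs` is a hypothetical function, and the instance types are simply the ones used in this notebook.

```python
def mode_kwargs(mode, sagemaker_session=None):
    """Return the Estimator keyword arguments that differ between local and hosted mode."""
    if mode == "local":
        # Local mode: train in a Docker container on this machine
        return {"instance_type": "local"}
    if mode == "hosted":
        if sagemaker_session is None:
            raise ValueError("hosted mode needs a sagemaker_session")
        # Hosted mode: train on a SageMaker-managed instance
        return {
            "instance_type": "ml.p3.2xlarge",
            "sagemaker_session": sagemaker_session,
        }
    raise ValueError("mode must be 'local' or 'hosted', got: {}".format(mode))


print(mode_kwargs("local"))
```

The result could then be unpacked into the constructor, for example `TensorFlow(entry_point='mnist_keras_tf.py', role=role, **mode_kwargs('hosted', sagemaker_session))`, together with the arguments shared by both modes.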

In [15]:
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='mnist_keras_tf.py',
                          role=role,
                          sagemaker_session=sagemaker_session,
                          instance_count=1,
                          instance_type='ml.p3.2xlarge',   # Train on a SageMaker-managed GPU instance (hosted mode)
                          framework_version='1.15', 
                          py_version='py3',
                          hyperparameters={'epochs': 1},
                         )

In [16]:
tf_estimator.fit({'training': s3_train_path, 'validation': s3_val_path})

2021-06-23 14:00:11 Starting - Starting the training job...
2021-06-23 14:00:13 Starting - Launching requested ML instancesProfilerReport-1624456810: InProgress
......
2021-06-23 14:01:38 Starting - Preparing the instances for training......
2021-06-23 14:02:39 Downloading - Downloading input data...
2021-06-23 14:03:18 Training - Downloading the training image......
2021-06-23 14:04:12,460 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
2021-06-23 14:04:12,950 sagemaker-training-toolkit INFO     Invoking user script

Training Env:

{
    "additional_framework_parameters": {},
    "channel_input_dirs": {
        "training": "/opt/ml/input/data/training",
        "validation": "/opt/ml/input/data/validation"
    },
    "current_host": "algo-1",
    "framework_module": "sagemaker_tensorflow_container.training:main",
    "hosts": [
        "algo-1"
    ],
    "hyperparameters": {
        "model_dir": "s3:/