
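# Distributed Training of Mask-RCNN in Amazon SageMaker Using FSx

Mask R-CNN is also known as a heavyweight object detection model and is part of [MLPerf](https://www.mlperf.org/training-results-0-6/).


The steps are outlined below:

1. Prepare the COCO 2017 dataset in [Amazon S3](https://aws.amazon.com/s3/)
2. Create an Amazon FSx Lustre file-system and import data from S3 into the file-system

Before we begin, let us initialize two Python variables, ```aws_region``` and ```s3_bucket```, that we will use throughout the notebook. The ```s3_bucket``` must be located in the region of this notebook instance.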



In [1]:
import boto3

session = boto3.session.Session()
aws_region = session.region_name
s3_bucket = "cf-mask-cnn-handson"  # replace with your S3 bucket name

try:
    s3_client = boto3.client('s3')
    response = s3_client.get_bucket_location(Bucket=s3_bucket)
    # LocationConstraint is None for buckets located in us-east-1
    print(f"Bucket region: {response['LocationConstraint']}")
except Exception:
    print(f"Access Error: Check if '{s3_bucket}' S3 bucket is in '{aws_region}' region")

Bucket region: None
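
The `None` printed above is expected for a bucket in `us-east-1`, because the S3 `GetBucketLocation` API reports a null `LocationConstraint` for that region. A minimal sketch of normalizing that value before comparing it with `aws_region` is shown below. The helper names `normalize_bucket_region` and `bucket_in_region` are hypothetical and are not part of boto3 or of this repository.

```python
from typing import Optional


def normalize_bucket_region(location_constraint: Optional[str]) -> str:
    """Map the LocationConstraint returned by get_bucket_location to a region name."""
    if location_constraint is None:
        # S3 reports a null LocationConstraint for buckets in us-east-1.
        return "us-east-1"
    if location_constraint == "EU":
        # Legacy value that S3 may return for buckets in eu-west-1.
        return "eu-west-1"
    return location_constraint


def bucket_in_region(location_constraint: Optional[str], aws_region: str) -> bool:
    """Return True if the bucket's normalized region matches aws_region."""
    return normalize_bucket_region(location_constraint) == aws_region
```

With the variables from the cell above, `bucket_in_region(response['LocationConstraint'], aws_region)` would return `True` only when the bucket and the notebook instance share a region.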


## Prepare the COCO 2017 dataset in Amazon S3

We use the [COCO 2017 dataset](http://cocodataset.org/#home) for training. We download the COCO 2017 training and validation datasets to this notebook instance, extract the files from the dataset archives, and upload the extracted files to your Amazon [S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html). The ```prepare-s3-bucket.sh``` script executes this step.

In [2]:
!cat ./prepare-s3-bucket.sh

#!/bin/bash

set -e

if [ "$#" -ne 1 ]; then
    echo "usage: $0 <s3-bucket-name>"
    exit 1
fi

S3_BUCKET=$1
S3_PREFIX="mask-rcnn/sagemaker/input"

# Stage directory must be on EBS volume with 100 GB available space
STAGE_DIR=$HOME/SageMaker/coco-2017-$(date +"%Y-%m-%d-%H-%M-%S")

echo "Create stage directory: $STAGE_DIR"
mkdir -p $STAGE_DIR

wget -O $STAGE_DIR/train2017.zip http://images.cocodataset.org/zips/train2017.zip
echo "Extracting $STAGE_DIR/train2017.zip"
unzip -o $STAGE_DIR/train2017.zip  -d $STAGE_DIR | awk 'BEGIN {ORS="="} {if(NR%1000==0)print "="}'
echo "Done."
rm $STAGE_DIR/train2017.zip

wget -O $STAGE_DIR/val2017.zip http://images.cocodataset.org/zips/val2017.zip
echo "Extracting $STAGE_DIR/val2017.zip"
unzip -o $STAGE_DIR/val2017.zip -d $STAGE_DIR | awk 'BEGIN {ORS="="} {if(NR%1000==0)print "="}'
echo "Done."
rm $STAGE_DIR/val2017.zip

wget -O $STAGE_DIR/annotations_trainval2017.zip http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -o $STA

Using your *Amazon S3 bucket* as the argument, run the cells below. If you have already uploaded the COCO 2017 dataset to your Amazon S3 bucket, you may skip this step.

In [5]:
%%sh
chmod +x prepare-s3-bucket.sh
chmod +x stack-fsx.sh

In [4]:
%%time
!./prepare-s3-bucket.sh {s3_bucket}

Create stage directory: /home/ec2-user/SageMaker/coco-2017-2022-03-09-09-34-04
--2022-03-09 09:34:04--  http://images.cocodataset.org/zips/train2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.217.39.172
Connecting to images.cocodataset.org (images.cocodataset.org)|52.217.39.172|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19336861798 (18G) [application/zip]
Saving to: ‘/home/ec2-user/SageMaker/coco-2017-2022-03-09-09-34-04/train2017.zip’


2022-03-09 09:37:31 (89.3 MB/s) - ‘/home/ec2-user/SageMaker/coco-2017-2022-03-09-09-34-04/train2017.zip’ saved [19336861798/19336861798]

Extracting /home/ec2-user/SageMaker/coco-2017-2022-03-09-09-34-04/train2017.zip
--2022-03-09 09:39:57--  http://images.cocodataset.org/zips/val2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.216.26.44
Connecting to images.cocodataset.org (images.cocodataset.org)|52.216.26.44|:80... connected.
HTTP request sent, awaiting response... 200 OK
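
To decide whether the upload step can be skipped, one option is to check whether any object already exists under the `mask-rcnn/sagemaker/input` prefix that `prepare-s3-bucket.sh` uploads to. The sketch below is an assumption about how such a check could look, not part of the repository. The function name `prefix_has_objects` is hypothetical, and the check only confirms that at least one object exists, not that the full dataset was uploaded.

```python
def prefix_has_objects(s3_client, bucket: str, prefix: str) -> bool:
    """Return True if at least one object exists under s3://bucket/prefix."""
    # MaxKeys=1 keeps the request cheap: one key is enough to answer the question.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Example usage in this notebook (requires AWS credentials):
#   import boto3
#   prefix_has_objects(boto3.client("s3"), s3_bucket, "mask-rcnn/sagemaker/input")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub object when no AWS access is available.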

## Create FSx Lustre file-system and import data from S3

Below, we use an [AWS CloudFormation stack](https://docs.aws.amazon.com/en_pv/AWSCloudFormation/latest/UserGuide/stacks.html) to create an FSx Lustre file-system and import the COCO 2017 dataset into the FSx file-system from your S3 bucket.



In [9]:
!cat stack-fsx.sh

#!/bin/bash

if [ $# -lt 5 ]; then
    echo "usage: $0 <aws-region> <s3-import-path> <fsx-capacity> <subnet-id> <security-group-id>"
    exit 1
fi

# AWS Region
AWS_REGION=$1

# S3 import path
S3_IMPORT_PATH=$2

# FSx storage capacity in multiples of 3600
STORAGE_CAPACITY=$3

SUBNET_ID=$4

SG_ID=$5

DATE=`date +%s`

#Customize stack name as needed
STACK_NAME="fsx-stack-$DATE"

# cfn template name
CFN_TEMPLATE='cfn-fsx.yaml'

aws cloudformation create-stack --region $AWS_REGION  --stack-name $STACK_NAME \
--template-body file://$CFN_TEMPLATE \
--capabilities CAPABILITY_NAMED_IAM \
--parameters \
ParameterKey=S3ImportPath,ParameterValue=$S3_IMPORT_PATH \
ParameterKey=StorageCapacityGiB,ParameterValue=$STORAGE_CAPACITY \
ParameterKey=SecurityGroupId,ParameterValue=$SG_ID \
ParameterKey=SubnetId,ParameterValue=$SUBNET_ID 

echo "Creating FSx Luster file-system [eta 600 seconds]"

sleep 30

progress=$(aws cloudformation list-stacks --stack-status-filter 'CREATE_IN_PROGRESS' | grep $STACK_NA

For this step, the [IAM Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) attached to this notebook instance needs full access to the AWS CloudFormation and Amazon FSx services. If you created this notebook instance using the ```./stack-sm.sh``` script in this repository, the IAM Role attached to this notebook instance is already set up with the required access.

```usage: ./stack-fsx.sh <aws-region> <s3-import-path> <fsx-capacity> <subnet-id> <security-group-id>```
    
<table>
    <caption>FSx configuration</caption>
    <tr>
        <th style="text-align:center">Argument</th>
        <th style="text-align:center">Description</th>
        <th style="text-align:center">Value</th>
    </tr>
    <tr>
        <td style="text-align:center">aws-region</td>
        <td style="text-align:left">AWS region name</td>
        <td style="text-align:center">e.g. us-east-1</td>
    </tr>
    <tr>
        <td style="text-align:center">s3-import-path</td>
        <td style="text-align:left">S3 import path for importing data to the FSx file-system</td>
        <td style="text-align:center">s3://&lt;s3-bucket-name&gt;/mask-rcnn/sagemaker/input</td>
    </tr>
    <tr>
        <td style="text-align:center">fsx-capacity</td>
        <td style="text-align:left">FSx Lustre file-system capacity in GiB</td>
        <td style="text-align:center">3600 or 7200</td>
    </tr>
    <tr>
        <td style="text-align:center">subnet-id</td>
        <td style="text-align:left">This is available in the output of the ./stack-sm.sh script you used to create this notebook instance. Specify only one subnet.</td>
        <td style="text-align:center">subnet-xxxx</td>
    </tr>
    <tr>
        <td style="text-align:center">security-group-id</td>
        <td style="text-align:left">Security group ID for the FSx Lustre file-system. This is available in the output of the ./stack-sm.sh script you used to create this notebook instance.</td>
        <td style="text-align:center">sg-xxxx</td>
    </tr>
</table>
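
The five arguments in the table can be assembled and sanity-checked in Python before the script is invoked. The following is a minimal sketch under stated assumptions: the function name `build_fsx_stack_args` is hypothetical, and the capacity check simply mirrors the comment in `stack-fsx.sh` that the storage capacity is given in multiples of 3600.

```python
from typing import List


def build_fsx_stack_args(
    aws_region: str,
    s3_bucket: str,
    fsx_capacity_gib: int,
    subnet_id: str,
    security_group_id: str,
) -> List[str]:
    """Build the stack-fsx.sh command line in the order the script expects."""
    # stack-fsx.sh documents the capacity as a multiple of 3600 (e.g. 3600 or 7200).
    if fsx_capacity_gib <= 0 or fsx_capacity_gib % 3600 != 0:
        raise ValueError(
            f"fsx-capacity must be a positive multiple of 3600, got {fsx_capacity_gib}"
        )
    s3_import_path = f"s3://{s3_bucket}/mask-rcnn/sagemaker/input"
    return [
        "./stack-fsx.sh",
        aws_region,
        s3_import_path,
        str(fsx_capacity_gib),
        subnet_id,
        security_group_id,
    ]


# Example usage (runs the script, so it needs the IAM permissions described above):
#   import subprocess
#   subprocess.run(build_fsx_stack_args(aws_region, s3_bucket, 3600,
#                                       subnet_id, security_group_id), check=True)
```

Building the argument list first makes a wrong capacity value fail fast in the notebook rather than partway through stack creation.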


If you have already created an FSx Lustre file-system and populated it with the COCO 2017 dataset, you may skip this step.
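
One way to see whether an FSx Lustre file-system already exists in the account is to list file systems with the FSx API. The sketch below is illustrative only: the function name `available_lustre_file_systems` is hypothetical, it reads just the first page of results (it does not follow `NextToken`), and it cannot tell whether a file system has been populated with the COCO 2017 dataset.

```python
from typing import List


def available_lustre_file_systems(fsx_client) -> List[str]:
    """Return the IDs of Lustre file systems whose lifecycle is AVAILABLE."""
    response = fsx_client.describe_file_systems()
    return [
        fs["FileSystemId"]
        for fs in response.get("FileSystems", [])
        if fs.get("FileSystemType") == "LUSTRE" and fs.get("Lifecycle") == "AVAILABLE"
    ]


# Example usage in this notebook (requires AWS credentials and FSx read access):
#   import boto3
#   available_lustre_file_systems(boto3.client("fsx", region_name=aws_region))
```

An empty list suggests that the stack creation step below still needs to run.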

In [14]:

# security_group_id = 'sg-xxxxxxxx'
security_group_id = "sg-09cd03c48cfa209b4"

# subnet_id = 'subnet-xxxxxxx'
subnet_id = "subnet-009b8a529792a4279"

    

In [15]:
%%time
!./stack-fsx.sh {aws_region} s3://{s3_bucket}/mask-rcnn/sagemaker/input 3600 {subnet_id} {security_group_id}

{
    "StackId": "arn:aws:cloudformation:us-east-1:189546603447:stack/fsx-stack-1646830945/2b06d440-9fa9-11ec-8982-12d1fd7a2609"
}
Creating FSx Luster file-system [eta 600 seconds]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 31 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 62 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 93 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 123 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 154 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 185 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 215 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 246 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 277 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 307 secs elapsed ]
Stack fsx-stack-1646830945 status: CREATE_IN_PROGRESS: [ 338 s