# PaddlePaddle BYOS

## Pre-requisites

This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This speeds up iterative testing and debugging while keeping the same familiar Python SDK interface. Just change your estimator's `instance_type` to `local` (or `local_gpu` if you're using a GPU notebook instance such as ml.p2 or ml.p3).

To use this feature, you'll need to install docker-compose (and nvidia-docker if training with a GPU).

**Note: you can run only one local-mode notebook at a time.**
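
As a rough illustration of the switch described above, the helper below picks between `local` and `local_gpu`. The function name `pick_local_instance_type` and the `nvidia-smi` check are assumptions made for this sketch; neither is part of the SageMaker SDK.

```python
import shutil


def pick_local_instance_type(has_gpu=None):
    """Return the local-mode instance type string for an estimator.

    If has_gpu is None, guess by checking whether the nvidia-smi
    executable is on PATH (a heuristic assumed for this sketch).
    """
    if has_gpu is None:
        has_gpu = shutil.which("nvidia-smi") is not None
    return "local_gpu" if has_gpu else "local"


# Pass the result as instance_type when constructing the estimator.
instance_type = pick_local_instance_type()
print(instance_type)
```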

In [1]:
# !/bin/bash ./utils/setup.sh

In [1]:
!ls

data		 finetune.py  README.md			   uie_byos.ipynb
doccano_org.py	 lambda       requirements.txt		   utils
doccano.py	 model	      uie_byos_en_stary_gpu.ipynb  utils.py
evaluate.py	 model.py     uie_byos_gpu_en.ipynb
export_model.py  prepare.py   uie_byos_gpu.ipynb


In [1]:
!pip install paddlepaddle paddlenlp

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting paddlepaddle
  Using cached paddlepaddle-2.3.2-cp38-cp38-manylinux1_x86_64.whl (112.6 MB)
Collecting paddlenlp
  Using cached paddlenlp-2.4.0-py3-none-any.whl (1.8 MB)
Collecting paddle-bfloat==0.1.7
  Using cached paddle_bfloat-0.1.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385 kB)
Collecting astor
  Using cached astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting opt-einsum==3.3.0
  Using cached opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting colorlog
  Using cached colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting paddlefsl
  Using cached paddlefsl-1.1.0-py3-none-any.whl (101 kB)
Collecting datasets>=2.0.0
  Using cached datasets-2.4.0-py3-none-any.whl (365 kB)
Collecting sentencepiece
  Using cached sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting jieba
  Using cached jieba-0.42.1-py3-none-any.whl
Collecting paddl

## Overview

The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible, and compatible with TensorFlow, MXNet, PyTorch, and Chainer. This tutorial focuses on fine-tuning a PaddleNLP UIE information-extraction model (`uie-base`) by running your own training script (`finetune.py`) in the SageMaker **PyTorch** container.

### Set up the environment

This notebook was created and tested on a single ml.p2.xlarge notebook instance.

Let's start by specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).

In [12]:
import os
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-PaddleNLP'

role = sagemaker.get_execution_role()

In [13]:
!python prepare.py \
    --mode 'folder' \
    --input_path '../Annotated_Data/Data_Mining' \
    --output_folder './output'

# Prepare data

In [14]:
!python doccano.py \
    --folder_path ./output \
    --task_type ext \
    --save_dir ./data \
    --splits 0.9 0.1 0

[2022-10-11 08:12:24,984] [    INFO] - Converting doccano data...
100%|██████████████████████████████████████| 331/331 [00:00<00:00, 28855.29it/s]
[2022-10-11 08:12:24,997] [    INFO] - Adding negative samples for first stage prompt...
100%|██████████████████████████████████████| 331/331 [00:00<00:00, 48295.92it/s]
[2022-10-11 08:12:25,004] [    INFO] - Adding negative samples for second stage prompt...
100%|██████████████████████████████████████| 331/331 [00:00<00:00, 17172.34it/s]
[2022-10-11 08:12:25,025] [    INFO] - Converting doccano data...
100%|████████████████████████████████████████| 37/37 [00:00<00:00, 44505.09it/s]
[2022-10-11 08:12:25,026] [    INFO] - Adding negative samples for first stage prompt...
100%|████████████████████████████████████████| 37/37 [00:00<00:00, 72620.14it/s]
[2022-10-11 08:12:25,027] [    INFO] - Adding negative samples for second stage prompt...
100%|██████████████████████
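
The `--splits 0.9 0.1 0` argument asks for a train/dev/test partition. The sketch below shows one way such ratios can be turned into slices. It illustrates the general technique (cumulative ratios truncated to integer indices) and is not a copy of `doccano.py`; `split_by_ratios` is a hypothetical helper. With 368 input examples it yields the same 331 and 37 counts that appear in the log above.

```python
def split_by_ratios(items, ratios):
    """Split items into len(ratios) consecutive chunks using cumulative ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    chunks, start, cumulative = [], 0, 0.0
    for ratio in ratios:
        cumulative += ratio
        # Truncate the cumulative boundary to an integer index.
        end = int(len(items) * cumulative)
        chunks.append(items[start:end])
        start = end
    return chunks


train, dev, test = split_by_ratios(list(range(368)), (0.9, 0.1, 0))
print(len(train), len(dev), len(test))
```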

### Upload the data
We use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value, `data_location`, identifies that location; we will use it later when we start the training job.

In [15]:
data_location = sagemaker_session.upload_data(path="./data", key_prefix=prefix)
# base_dir = 'file:///home/ec2-user/SageMaker/paddlenlp_sagemaker/data/'
# inputs = {'training': base_dir}
# print(inputs)

In [16]:
data_location

's3://sagemaker-us-west-2-064542430558/sagemaker/DEMO-PaddleNLP'

## Script Functions

SageMaker invokes the main function defined within your training script for training. When deploying your trained model to an endpoint, `model_fn()` is called to determine how to load your trained model. `model_fn()`, along with a few other functions listed below, is called to enable predictions on SageMaker.

### [Predicting Functions](https://github.com/aws/sagemaker-pytorch-containers/blob/master/src/sagemaker_pytorch_container/serving.py)
* model_fn(model_dir) - loads your model.
* input_fn(serialized_input_data, content_type) - deserializes request data for predict_fn.
* output_fn(prediction_output, accept) - serializes predictions from predict_fn.
* predict_fn(input_data, model) - calls a model on data deserialized in input_fn.

`model_fn()` is the only function that doesn't have a default implementation; you must provide it to use PyTorch on SageMaker.
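
To make the four hooks concrete, here is a minimal sketch of a JSON-in, JSON-out handler set. It is not the notebook's `infer_gpu.py`: the model returned by `model_fn` is a stand-in that only reports text lengths, chosen so the example runs without PaddlePaddle installed.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def model_fn(model_dir):
    """Load the model. Stand-in: a real handler would read files from model_dir."""
    def model(texts):
        return [{"length": len(text)} for text in texts]
    return model


def input_fn(serialized_input_data, content_type):
    """Deserialize the request body into Python objects for predict_fn."""
    if content_type != JSON_CONTENT_TYPE:
        raise ValueError("unsupported content type: " + content_type)
    return json.loads(serialized_input_data)


def predict_fn(input_data, model):
    """Run the model on the deserialized input."""
    return model(input_data)


def output_fn(prediction_output, accept):
    """Serialize the prediction for the response."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError("unsupported accept type: " + accept)
    return json.dumps(prediction_output)


# Chain the hooks the way a request flows through them.
model = model_fn("/opt/ml/model")
request = input_fn('["ab", "cde"]', JSON_CONTENT_TYPE)
response = output_fn(predict_fn(request, model), JSON_CONTENT_TYPE)
print(response)
```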

## Create a training job using the sagemaker.PyTorch estimator

The `PyTorch` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. For local training with a GPU, we could set the instance type to `local_gpu`. In this notebook, `instance_type` is set in the training cell to a managed GPU instance type (`ml.g4dn.12xlarge`).

After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Even when running in local mode, using S3 as the data source makes sense because it stays consistent with how SageMaker's distributed, managed training ingests data.
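
In script mode, SageMaker passes each entry of the estimator's `hyperparameters` dict to the entry point as a `--name value` command-line argument, with values arriving as strings. A minimal sketch of how a script could parse a few of them follows. The parser is an assumption for illustration and is not taken from the actual `finetune.py`; `str2bool` is a hypothetical helper for flags such as `freeze`.

```python
import argparse


def str2bool(value):
    """Interpret strings such as 'True' or 'false' as booleans."""
    return str(value).lower() in ("true", "1", "yes")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_path", type=str,
                        default="/opt/ml/input/data/training/train.txt")
    parser.add_argument("--save_dir", type=str, default="/opt/ml/model")
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--freeze", type=str2bool, default=False)
    # Ignore arguments this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--learning_rate", "1e-05", "--batch_size", "16",
                   "--freeze", "True", "--seed", "1000"])
print(args.learning_rate, args.batch_size, args.freeze)
```

Using `parse_known_args` keeps the script tolerant of hyperparameters it does not declare, which is convenient while the set of options is still changing.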


## SageMaker Training using GPU instance

In [17]:
inputs = {'training': data_location}

print(inputs)

{'training': 's3://sagemaker-us-west-2-064542430558/sagemaker/DEMO-PaddleNLP'}
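
Each key of `inputs` is a channel name, and SageMaker makes a channel's data available inside the training container under `/opt/ml/input/data/<channel>`. That is why the `train_path` and `dev_path` hyperparameters in the training cell point at `/opt/ml/input/data/training/`. The helper below sketches the mapping; `channel_path` is a hypothetical name and the bucket in the example is a placeholder.

```python
import posixpath

# Directory where SageMaker places input channels in the training container.
SM_INPUT_DATA_DIR = "/opt/ml/input/data"


def channel_path(channel, filename=None):
    """Return the in-container path for a channel, or for a file inside it."""
    parts = [SM_INPUT_DATA_DIR, channel]
    if filename:
        parts.append(filename)
    return posixpath.join(*parts)


example_inputs = {"training": "s3://example-bucket/sagemaker/DEMO-PaddleNLP"}  # placeholder URI
for channel in example_inputs:
    print(channel, "->", channel_path(channel))
```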


In [18]:
# Upload the pretrained uie-base-en model

# uie_en_model_s3 = sagemaker.Session().upload_data(path = "../uie-base-en/taskflow/information_extraction/uie-base-en", key_prefix="model_uie_base_en")
uie_en_model_s3 = 's3://sagemaker-us-west-2-064542430558/model_uie_base_en'

In [None]:
from sagemaker.pytorch import PyTorch

hyperparameters = {'train_path': '/opt/ml/input/data/training/train.txt', 
                   'dev_path': '/opt/ml/input/data/training/dev.txt', 
                   'save_dir': '/opt/ml/model', 
                   'learning_rate': 1e-5, 
                   'batch_size': 16, 
                   'max_seq_len': 512,
                   'num_epochs': 20, 
                   'model': 'uie-base',
                   'seed': 1000,
                   'logging_steps': 10,
                   'valid_steps': 1000,
                   'device': 'gpu',
                   'freeze': True}

instance_type = 'ml.g4dn.12xlarge'  # 'ml.p3.2xlarge' or 'ml.p3.8xlarge' or ...

#git_config = {'repo': 'https://github.com/whn09/paddlenlp_sagemaker.git', 'branch': 'main'}

estimator = PyTorch(entry_point='finetune.py',
                    source_dir='./',
                    # git_config=git_config,
                    role=role,
                    hyperparameters=hyperparameters,
                    framework_version='1.9.1',
                    py_version='py38',
                    script_mode=True,
                    instance_count=1,  # 1 or 2 or ...
                    instance_type=instance_type,
                    # Parameters required to enable checkpointing
                    checkpoint_s3_uri=uie_en_model_s3,  # Use your own S3 location for saving/loading the model; note that the bucket needs to be in us-east-1
                    checkpoint_local_path="/opt/ml/checkpoints")

estimator.fit(inputs)

2022-10-11 08:30:52 Starting - Starting the training job...
2022-10-11 08:31:19 Starting - Preparing the instances for trainingProfilerReport-1665477052: InProgress
.........
2022-10-11 08:32:36 Downloading - Downloading input data...
2022-10-11 08:33:16 Training - Downloading the training image.....................
2022-10-11 08:36:49 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-10-11 08:36:52,430 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-10-11 08:36:52,476 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-10-11 08:36:52,482 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-10-11 08:36:53,009 sagemaker-training-toolkit INFO     Installing dependencies from requirements.t

In [11]:
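# S3 URI of the trained model artifact (model.tar.gz)
model_data_uri = estimator.model_data
# Name of the training job that produced it
training_job_name = estimator.latest_training_job.name
print(model_data_uri)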

s3://sagemaker-us-west-2-064542430558/pytorch-training-2022-10-11-07-37-29-125/output/model.tar.gz
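
The printed value is an S3 URI, not a job name. If the bucket, key, or job name is needed separately (for example, to build an `aws s3 cp` command), the URI can be split as sketched below. `split_s3_uri` is a hypothetical helper, and reading the job name from the first key segment assumes the default `<job-name>/output/model.tar.gz` layout seen in the output above.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


model_data = ("s3://sagemaker-us-west-2-064542430558/"
              "pytorch-training-2022-10-11-07-37-29-125/output/model.tar.gz")
bucket_name, key = split_s3_uri(model_data)
# Assumes the default output layout: <job-name>/output/model.tar.gz
job_name = key.split("/")[0]
print(bucket_name, job_name)
```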


In [1]:
#!aws s3 cp s3://$bucket/$training_job_name/output/model.tar.gz ../
!tar -zxvf ../model.tar.gz -C ../

model_1000/
model_6000/
model_2000/
model_5000/
model_best/
model_best/model_state.pdparams
model_best/vocab.txt
model_best/tokenizer_config.json
model_best/special_tokens_map.json
model_best/model_config.json
inference.pdiparams.info
model_3000/
inference.pdiparams
inference.pdmodel
model_4000/


In [None]:
! python evaluate.py \
--model_path ../model_best \
--test_path ./data/test.txt \
--batch_size 4 \
--debug

[2022-09-23 03:00:09,151] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load '../model_best'.
W0923 03:00:09.174229 30833 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.6, Runtime API Version: 11.1
W0923 03:00:09.176781 30833 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
<<<< class dict: dict_keys(['royal', 'family', 'personality', 'person', 'occupation', 'pregnant', 'status', 'parts of body', 'origanization', 'supernature', 'color', 'hobby', 'body type', 'age', 'event', 'race', 'location', 'size', 'Sexual description', 'sexual description', 'gender', 'facility', 'office-work', 'cheating', 'high-tech', 'height', 'love stage', 'shape', 'campus', 'abuse'])
<<< start evaluate
<<< batch 6
predict_start_ids[52, 62] predict_end_ids[52, 62]
pred_set {(52, 52), (62, 62)}
predict_start_ids[43] predict_end_ids[45]
pred_set {(43, 45)}
predict_start_ids[33] predict_end_ids[33]
pred_s

# Deploy the trained model to prepare for predictions

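The `deploy()` method creates an endpoint that serves prediction requests in real time. Here the endpoint runs on an `ml.g4dn.xlarge` instance; setting `instance_type` to `local` would host it locally instead.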

In [23]:
!aws s3 cp s3://sagemaker-us-west-2-064542430558/pytorch-training-2022-10-11-07-37-29-125/output/model.tar.gz /tmp/
!tar -zxvf /tmp/model.tar.gz -C /tmp/

download: s3://sagemaker-us-west-2-064542430558/pytorch-training-2022-10-11-07-37-29-125/output/model.tar.gz to ../../../../../../tmp/model.tar.gz
model_200/
model_300/
inference.pdmodel
model_best/
model_best/tokenizer_config.json
model_best/vocab.txt
model_best/model_state.pdparams
model_best/special_tokens_map.json
model_best/model_config.json
inference.pdiparams
inference.pdiparams.info
model_100/


In [29]:
!cp /tmp/inference.* model/
!cp /tmp/model_best/* model/
!cp model/code/requirements_gpu.txt model/code/requirements.txt
!cd model && tar -czvf ../model-inference-gpu.tar.gz *

#!aws s3 cp model-inference-gpu.tar.gz s3://$bucket/output/model-inference-gpu.tar.gz

code/
code/infer.py
code/.ipynb_checkpoints/
code/.ipynb_checkpoints/infer_gpu-checkpoint.py
code/uie_predictor.py
code/infer_cpu.py
code/requirements.txt
code/requirements_gpu.txt
code/model.py
code/infer_gpu.py
code/requirements_cpu.txt
inference.pdiparams
inference.pdiparams.info
inference.pdmodel
model_config.json
model_state.pdparams
special_tokens_map.json
tokenizer_config.json
vocab.txt
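
The archive listing above shows that `tar ... *` also picked up `code/.ipynb_checkpoints/`. As an alternative sketch, the same archive can be built from Python while skipping checkpoint folders. `package_model` and `_skip_checkpoints` are hypothetical helper names, and the layout (model files at the top level, scripts under `code/`) simply mirrors the listing above.

```python
import os
import tarfile


def _skip_checkpoints(tarinfo):
    """Exclude Jupyter checkpoint folders (and everything inside them)."""
    if ".ipynb_checkpoints" in tarinfo.name.split("/"):
        return None
    return tarinfo


def package_model(model_dir, output_path):
    """Create a gzip-compressed tar of model_dir's contents at the archive root."""
    with tarfile.open(output_path, "w:gz") as tar:
        for entry in sorted(os.listdir(model_dir)):
            tar.add(os.path.join(model_dir, entry),
                    arcname=entry,
                    filter=_skip_checkpoints)
    return output_path
```

A call such as `package_model("model", "model-inference-gpu.tar.gz")` would correspond to the `tar -czvf` command in the cell above.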


In [30]:
!aws s3 cp model-inference-gpu.tar.gz s3://$bucket/output/model-inference-gpu.tar.gz

upload: ./model-inference-gpu.tar.gz to s3://sagemaker-us-west-2-064542430558/output/model-inference-gpu.tar.gz


In [31]:
# instance_type = 'local'
# instance_type = 'ml.m5.xlarge'
instance_type = 'ml.g4dn.xlarge'

# predictor = estimator.deploy(initial_instance_count=1, instance_type=instance_type)

from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data='s3://{}/output/model-inference-gpu.tar.gz'.format(bucket),
    role=role,
    entry_point='infer_gpu.py',
    framework_version='1.9.0',
    py_version='py38',
    model_server_workers=4,  # TODO [For GPU], model_server_workers=6
)

predictor = pytorch_model.deploy(instance_type=instance_type, initial_instance_count=1)

---------------!

In [16]:
# # endpoint_name = 'pytorch-inference-2022-07-05-07-28-16-183'  # m5.2xlarge
# # endpoint_name = 'pytorch-inference-2022-07-06-04-02-11-091'  # g4dn.xlarge, 6 threads
# endpoint_name = 'pytorch-inference-2022-07-06-06-19-21-855'  # g4dn.xlarge, 4 threads
# predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name)

# Invoking the endpoint

In [32]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

In [33]:
texts = ["After a long discussion about it. Selene's brother, Helios, came up with a compromise. 'Alright, Selene, they shall have a chance for change. For now, you will pair them with their own races, but when the time comes you will choose a pure-hearted female to be your Moon Princess. She will have three mates, one of her own kind and two of different races. If she can bring three races together with her mates, then we will not destroy them.' Selene was happy that her children were given a chance. A chan"]

import time
start = time.time()
outputs = predictor.predict(texts)
end = time.time()
print('outputs: ', outputs)
print('time:', end-start)

# for i in range(1000):
#     start = time.time()
#     outputs = predictor.predict(texts)
#     end = time.time()
#     print('time:', end-start)

outputs:  [{'person': [{'text': 'Helios', 'start': 52, 'end': 58, 'probability': 0.9983993172645569}, {'text': 'Selene', 'start': 34, 'end': 40, 'probability': 0.9956828355789185}, {'text': 'Selene', 'start': 97, 'end': 103, 'probability': 0.9991198182106018}, {'text': 'Selene', 'start': 441, 'end': 447, 'probability': 0.9987065196037292}], 'status': [{'text': 'Moon Princess', 'start': 265, 'end': 278, 'probability': 0.7326385974884033}], 'personality': [{'text': 'pure-hearted', 'start': 234, 'end': 246, 'probability': 0.8689507246017456}]}]
time: 0.8999683856964111
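
The response is a list with one dict per input text, mapping each label to a list of spans with `text`, `start`, `end`, and `probability`. A small post-processing sketch that keeps only confident spans is shown below. `filter_entities` is a hypothetical helper, the 0.9 threshold is an arbitrary choice for illustration, and the sample is a subset of the output above.

```python
def filter_entities(outputs, min_probability=0.9):
    """Keep only spans whose probability is at least min_probability."""
    filtered = []
    for result in outputs:
        kept = {}
        for label, spans in result.items():
            confident = [s for s in spans if s["probability"] >= min_probability]
            if confident:
                kept[label] = confident
        filtered.append(kept)
    return filtered


sample = [{
    "person": [{"text": "Helios", "start": 52, "end": 58,
                "probability": 0.9983993172645569}],
    "status": [{"text": "Moon Princess", "start": 265, "end": 278,
                "probability": 0.7326385974884033}],
}]
print(filter_entities(sample))
```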


# Clean-up

Delete the endpoint when you're finished. A hosted endpoint keeps incurring charges until it is deleted, and in local mode you can only run one local endpoint at a time.

In [None]:
# estimator.delete_endpoint()
predictor.delete_endpoint()

In [None]:
x = "I wipe whatever tears had trickled down my face, removing my rings from my fingers and clutching them in my hands.\nThe hallway seems longer than normal but I walk briskly to the office where I find Christian, the elders, the lawyer, Jordan, Derek and Vanessa waiting for me."