# PaddlePaddle BYOS

## Pre-requisites

This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just change your estimator's `instance_type` to `local` (or `local_gpu` if you're using an ml.p2 or ml.p3 notebook instance).

In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU).

**Note: you can only run a single local notebook at a time.**
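
As a rough sketch of how the choice between `local` and `local_gpu` could be automated, the helper below checks whether the relevant tools are on the `PATH`. The function names and the use of `nvidia-smi` as a GPU indicator are assumptions made for illustration; they are not part of the SageMaker SDK.

```python
import shutil


def pick_local_instance_type(which=shutil.which):
    """Return 'local_gpu' if an NVIDIA driver tool is found, else 'local'.

    `which` is injectable so the heuristic can be exercised without a GPU.
    """
    # Assumption: finding `nvidia-smi` is a good-enough sign that a GPU is usable.
    return "local_gpu" if which("nvidia-smi") else "local"


def has_docker_compose(which=shutil.which):
    """Report whether a `docker-compose` executable is on the PATH."""
    return which("docker-compose") is not None


instance_type = pick_local_instance_type()
print("instance_type:", instance_type)
print("docker-compose found:", has_docker_compose())
```

The returned string can then be passed as the estimator's instance type when experimenting locally.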

In [2]:
# !/bin/bash ./utils/setup.sh

In [5]:
!pip install paddlepaddle paddlenlp

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting paddlepaddle
  Downloading paddlepaddle-2.3.2-cp38-cp38-manylinux1_x86_64.whl (112.6 MB)
Collecting paddlenlp
  Downloading paddlenlp-2.4.1-py3-none-any.whl (1.9 MB)
Collecting astor
  Downloading astor-0.8.1-py2.py3-none-any.whl (27 kB)
Collecting opt-einsum==3.3.0
  Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)
Collecting paddle-bfloat==0.1.7
  Downloading paddle_bfloat-0.1.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (385 kB)

## Overview

The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow, MXNet, PyTorch and Chainer. This tutorial focuses on fine-tuning a PaddleNLP UIE information-extraction model on labeled product data, using the SageMaker **PyTorch** container to run a PaddlePaddle training script.

### Set up the environment

This notebook was created and tested on a single ml.p2.xlarge notebook instance.

Let's start by specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note: if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).
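
If you do supply a role as a full ARN string, a quick format check can catch typos before a job is submitted. The sketch below assumes the usual `arn:<partition>:iam::<account-id>:role/<name>` layout; the helper name and the example ARN are placeholders, not values from this notebook.

```python
import re

# Assumption: standard IAM role ARN layout with a 12-digit account ID.
_ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


def is_role_arn(value):
    """Return True if `value` looks like a full IAM role ARN."""
    return bool(_ROLE_ARN.match(value))


example_role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role
print(is_role_arn(example_role))
print(is_role_arn("MySageMakerRole"))  # a bare role name is not a full ARN
```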

In [1]:
import os
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/shulex-jackie'

role = sagemaker.get_execution_role()

In [2]:
!ls ./shulex

鞋-标注.jsonl


In [3]:
# first upload the labeled data to local path ./shulex
!python prepare_shulex.py \
    --input_path './shulex/鞋-标注.jsonl' \
    --output_folder './output_shulex'

# Prepare data
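
The conversion cell below passes `--splits 0.9 0.1 0`, which are the train, dev and test ratios. How `doccano.py` rounds internally is not shown in this notebook, so the following is only a sketch of one plausible scheme (cumulative boundaries). It assumes 20 labeled documents, which is consistent with the 18, 2 and 0 items reported by the progress bars in the cell output.

```python
def split_counts(n_docs, splits):
    """Return (train, dev, test) document counts for three ratios summing to 1."""
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-9:
        raise ValueError("splits must be three ratios that sum to 1")
    # Cumulative boundaries keep every count non-negative and the total exact.
    p1 = round(n_docs * splits[0])
    p2 = round(n_docs * (splits[0] + splits[1]))
    return p1, p2 - p1, n_docs - p2


counts = split_counts(20, (0.9, 0.1, 0))  # 20 documents is an assumed total
print(counts)
```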

In [6]:
!python doccano.py \
    --folder_path ./output_shulex \
    --task_type ext \
    --save_dir ./data_shulex \
    --splits 0.9 0.1 0

[2022-10-25 05:46:19,809] [    INFO] - Converting doccano data...
100%|████████████████████████████████████████| 18/18 [00:00<00:00, 31068.92it/s]
[2022-10-25 05:46:19,812] [    INFO] - Adding negative samples for first stage prompt...
100%|███████████████████████████████████████| 18/18 [00:00<00:00, 142179.80it/s]
[2022-10-25 05:46:19,812] [    INFO] - Converting doccano data...
100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 17886.16it/s]
[2022-10-25 05:46:19,813] [    INFO] - Adding negative samples for first stage prompt...
100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 43018.50it/s]
[2022-10-25 05:46:19,813] [    INFO] - Converting doccano data...
0it [00:00, ?it/s]
[2022-10-25 05:46:19,814] [    INFO] - Adding negative samples for first stage prompt...
0it [00:00, ?it/s]
[2022-10-25 05:46:19,815] [    INFO] - Save 90 examples to ./data_shulex/train.txt.

### Upload the data
We use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value, stored in `data_location`, identifies that location; we will use it later when we start the training job.

In [7]:
data_location = sagemaker.Session().upload_data(path = "./data_shulex", key_prefix=prefix)

In [8]:
data_location

's3://sagemaker-us-east-1-726335585155/sagemaker/shulex-jackie'
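
The value above has the shape `s3://<bucket>/<key_prefix>`. As a small sketch, the helpers below build and take apart such a URI; the function names are illustrative and not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def s3_uri(bucket, key_prefix):
    """Join a bucket name and key prefix into an s3:// URI."""
    return "s3://{}/{}".format(bucket, key_prefix.strip("/"))


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an S3 URI: " + uri)
    return parsed.netloc, parsed.path.lstrip("/")


uri = s3_uri("sagemaker-us-east-1-726335585155", "sagemaker/shulex-jackie")
print(uri)
print(split_s3_uri(uri))
```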

## Script Functions

SageMaker invokes the main function defined within your training script for training. When deploying your trained model to an endpoint, `model_fn()` is called to determine how to load your trained model. `model_fn()`, along with a few other functions listed below, is called to enable predictions on SageMaker.

### [Predicting Functions](https://github.com/aws/sagemaker-pytorch-containers/blob/master/src/sagemaker_pytorch_container/serving.py)
* model_fn(model_dir) - loads your model.
* input_fn(serialized_input_data, content_type) - deserializes the request data for predict_fn.
* output_fn(prediction_output, accept) - serializes predictions from predict_fn.
* predict_fn(input_data, model) - calls a model on data deserialized in input_fn.

model_fn() is the only function that doesn't have a default implementation; you must provide it to use PyTorch on SageMaker.
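
To make the division of labour between these four functions concrete, here is a minimal, self-contained sketch that chains them the way a request flows through an endpoint. The toy "model" (it simply returns capitalized words) is a stand-in chosen for illustration; the real inference script used later in this notebook is not shown here and will differ.

```python
import json


def model_fn(model_dir):
    """Load the model. Here: a toy callable instead of real weights in `model_dir`."""
    def toy_model(text):
        # Stand-in logic: "extract" every word that starts with an upper-case letter.
        return [word for word in text.split() if word[:1].isupper()]
    return toy_model


def input_fn(request_body, content_type):
    """Deserialize the request payload into Python objects."""
    if content_type != "application/json":
        raise ValueError("Unsupported content type: " + content_type)
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Run the model on each deserialized input."""
    return [model(text) for text in input_data]


def output_fn(prediction, accept):
    """Serialize the prediction for the response."""
    if accept != "application/json":
        raise ValueError("Unsupported accept type: " + accept)
    return json.dumps(prediction)


# Simulate one request end to end.
model = model_fn("/opt/ml/model")
body = json.dumps(["Retro boot with Block heel"])
response = output_fn(predict_fn(input_fn(body, "application/json"), model), "application/json")
print(response)
```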

## Create a training job using the sagemaker.PyTorch estimator

The `PyTorch` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. For local training with a GPU, we could set the instance type to `local_gpu`. In this notebook, `instance_type` is set in the estimator cell below to a managed GPU instance (`ml.p3.2xlarge`).

After we've constructed our `PyTorch` object, we fit it using the data we uploaded to S3. Using S3 as the data source also makes sense in local mode, because it stays consistent with how SageMaker's distributed, managed training ingests data.
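
In script mode, SageMaker hands each hyperparameter to the entry point as a `--name value` command-line argument and exposes paths through environment variables such as `SM_CHANNEL_TRAINING` and `SM_MODEL_DIR`. The sketch below shows how a training script might read them; the argument names mirror this notebook's hyperparameters, but the actual `finetune.py` is not shown here, so treat the parser as an assumption.

```python
import argparse
import os
import posixpath


def str2bool(value):
    """Parse 'True'/'False' strings, since booleans arrive as text."""
    return str(value).lower() in ("true", "1", "yes")


def parse_args(argv=None):
    data_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
    parser = argparse.ArgumentParser()
    parser.add_argument("--train_path", default=posixpath.join(data_dir, "train.txt"))
    parser.add_argument("--dev_path", default=posixpath.join(data_dir, "dev.txt"))
    parser.add_argument("--save_dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--num_epochs", type=int, default=50)
    parser.add_argument("--freeze", type=str2bool, default=False)
    # Ignore any arguments this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_args(["--learning_rate", "1e-5", "--batch_size", "16", "--freeze", "True"])
print(args.learning_rate, args.batch_size, args.freeze)
```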


## SageMaker Training using GPU instance

In [9]:
inputs = {'training': data_location}

print(inputs)

{'training': 's3://sagemaker-us-east-1-726335585155/sagemaker/shulex-jackie'}
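
Each key in this dictionary is a channel name, and SageMaker makes a channel's data available inside the training container under `/opt/ml/input/data/<channel>`. The short sketch below (helper name and placeholder URI are illustrative) shows why the `training` channel pairs with a path such as `/opt/ml/input/data/training/train.txt`.

```python
import posixpath

INPUT_ROOT = "/opt/ml/input/data"


def channel_paths(inputs):
    """Map each channel name to its directory inside the training container."""
    return {name: posixpath.join(INPUT_ROOT, name) for name in inputs}


inputs_example = {"training": "s3://my-bucket/my-prefix"}  # placeholder URI
paths = channel_paths(inputs_example)
train_file = posixpath.join(paths["training"], "train.txt")
print(train_file)
```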


In [10]:
# Prepare the pretrained model: this downloads the English model from the model repo (approximately 1-3 minutes)
from paddlenlp import Taskflow

# Define the schema for entity extraction
schema = ['asin',
          'design for Device',
          'Hub/Dock',
          'Number of Ports',  # number of ports
          'usb transfer speed',  # USB port transfer speed
          'SD transfer speed',  # SD card transfer speed
          'contain HDMI hub',  # includes an HDMI port
          'contain VGA hub',  # includes a VGA port
          ]
ie = Taskflow("information_extraction", model='uie-base-en', schema=schema, home_path='../uie-base-en')


[2022-10-25 05:47:39,444] [    INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base_en_v1.1/model_state.pdparams
100%|██████████| 418M/418M [00:58<00:00, 7.52MB/s]
[2022-10-25 05:48:40,673] [    INFO] - Downloading model_config.json from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base_en/model_config.json
100%|██████████| 347/347 [00:00<00:00, 325kB/s]
[2022-10-25 05:48:41,793] [    INFO] - Downloading vocab.txt from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base_en/vocab.txt
100%|██████████| 226k/226k [00:03<00:00, 58.3kB/s]
[2022-10-25 05:48:47,217] [    INFO] - Downloading special_tokens_map.json from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base_en/special_tokens_map.json
100%|██████████| 112/112 [00:00<00:00, 98.8kB/s]
[2022-10-25 05:48:48,402] [    INFO] - Downloading token

In [11]:
# Upload the pretrained uie-base-en model files to S3

uie_en_model_s3 = sagemaker.Session().upload_data(path="../uie-base-en/taskflow/information_extraction/uie-base-en", key_prefix="model_uie_base_en")
# uie_en_model_s3 = 's3://sagemaker-us-west-2-064542430558/model_uie_base_en'
print("uie_en_model path:", uie_en_model_s3)

uie_en_model path: s3://sagemaker-us-east-1-726335585155/model_uie_base_en


In [None]:
from sagemaker.pytorch import PyTorch

hyperparameters = {'train_path': '/opt/ml/input/data/training/train.txt', 
                   'dev_path': '/opt/ml/input/data/training/dev.txt', 
                   'save_dir': '/opt/ml/model', 
                   'learning_rate': 1e-5, 
                   'batch_size': 16, 
                   'max_seq_len':512, 
                   'num_epochs': 50, 
                   'model': 'uie-base',
                   'seed': 1000,
                   'logging_steps': 10,
                   'valid_steps': 50,  # note: this should not exceed the total number of training steps
                   'device': 'gpu',
                   'freeze':True}

instance_type = 'ml.p3.2xlarge'  # 'ml.p3.2xlarge' or 'ml.p3.8xlarge' or ...

#git_config = {'repo': 'https://github.com/whn09/paddlenlp_sagemaker.git', 'branch': 'main'}

estimator = PyTorch(entry_point='finetune.py',
                    source_dir='./',
                    # git_config=git_config,
                    role=role,
                    hyperparameters=hyperparameters,
                    framework_version='1.9.1',
                    py_version='py38',
                    script_mode=True,
                    instance_count=1,  # 1 or 2 or ...
                    instance_type=instance_type,
                    # Parameters required to enable checkpointing
                    checkpoint_s3_uri=uie_en_model_s3,  # use your own S3 URI for saving/loading the model; note that the bucket must be in us-east-1
                    checkpoint_local_path="/opt/ml/checkpoints")

estimator.fit(inputs)

2022-10-25 06:09:42 Starting - Starting the training job...ProfilerReport-1666678182: InProgress
...
2022-10-25 06:10:30 Starting - Preparing the instances for training.........
2022-10-25 06:12:10 Downloading - Downloading input data......
2022-10-25 06:13:11 Training - Downloading the training image............
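
With `checkpoint_s3_uri` set, objects already stored under that S3 URI are copied into `checkpoint_local_path` when the job starts, which is how the uploaded pretrained files can reach the container. A training script could verify that they arrived before loading them. The sketch below is an assumption about how such a check might look: the file list is taken from the model files that appear in this notebook's listings, and the temporary directory merely stands in for `/opt/ml/checkpoints`.

```python
import os
import tempfile

# Assumption: the files a pretrained UIE model directory is expected to contain.
REQUIRED_FILES = (
    "model_state.pdparams",
    "model_config.json",
    "vocab.txt",
    "special_tokens_map.json",
    "tokenizer_config.json",
)


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required file names that are absent from `directory`."""
    present = set(os.listdir(directory)) if os.path.isdir(directory) else set()
    return [name for name in required if name not in present]


# Demo on a temporary directory with only two of the files present.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("model_state.pdparams", "vocab.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_missing = missing_files(demo_dir)

print(demo_missing)
```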

In [14]:
training_job_name = estimator.latest_training_job.name
# training_job_name = 'xxx'
print(training_job_name)

pytorch-training-2022-10-25-06-09-41-771


In [15]:
model_data = estimator.model_data
print (model_data)

s3://sagemaker-us-east-1-726335585155/pytorch-training-2022-10-25-06-09-41-771/output/model.tar.gz


# Deploy the trained model to prepare for predictions

The deploy() method creates an endpoint (in this case on an `ml.g4dn.xlarge` instance) which serves prediction requests in real time.

In [29]:
#!mkdir /tmp
!aws s3 cp $model_data /tmp/
!tar -zxvf /tmp/model.tar.gz -C /tmp/

download: s3://sagemaker-us-east-1-726335585155/pytorch-training-2022-10-25-06-09-41-771/output/model.tar.gz to ../../../../../../../tmp/model.tar.gz
inference.pdmodel
model_best/
model_best/model_config.json
model_best/special_tokens_map.json
model_best/tokenizer_config.json
model_best/model_state.pdparams
model_best/vocab.txt
inference.pdiparams
inference.pdiparams.info


In [None]:
!cp /tmp/inference.* model/
!cp /tmp/model_best/* model/
!cp model/code/requirements_gpu.txt model/code/requirements.txt
!cd model && tar -czvf ../model-inference-gpu.tar.gz *

code/
code/requirements.txt
code/infer_gpu_shulex.py
code/infer.py
code/uie_predictor.py
code/infer_cpu.py
code/model.py
code/requirements_cpu.txt
code/requirements_gpu.txt
code/infer_gpu.py
inference.pdiparams
inference.pdiparams.info
inference.pdmodel
model_config.json
model_state.pdparams
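
The same repackaging can be done from Python with the standard `tarfile` module. The sketch below is an illustrative equivalent of the `tar` command above, not the notebook's own method: it packs a directory so that `code/` and the model files sit at the archive root, then lists the result. The demo uses a throwaway directory with two empty placeholder files.

```python
import os
import tarfile
import tempfile


def pack_model_dir(model_dir, archive_path):
    """Create a .tar.gz whose root holds the entries of `model_dir`."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps paths relative, so `code/` sits at the archive root.
            tar.add(os.path.join(model_dir, name), arcname=name)


def list_archive(archive_path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Demo with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "model")
    os.makedirs(os.path.join(model_dir, "code"))
    for rel in ("code/infer.py", "inference.pdmodel"):
        open(os.path.join(model_dir, *rel.split("/")), "w").close()
    archive = os.path.join(tmp, "model-inference.tar.gz")
    pack_model_dir(model_dir, archive)
    archive_names = list_archive(archive)

print(archive_names)
```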


In [None]:
!aws s3 cp model-inference-gpu.tar.gz s3://$bucket/output/model-inference-gpu.tar.gz

In [None]:
instance_type = 'ml.g4dn.xlarge'

# predictor = estimator.deploy(initial_instance_count=1, instance_type=instance_type)

from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(model_data='s3://{}/output/model-inference-gpu.tar.gz'.format(bucket), role=role,
                             entry_point='infer_gpu_shulex.py', framework_version='1.9.0', py_version='py38', model_server_workers=4)  # TODO [For GPU], model_server_workers=6

predictor = pytorch_model.deploy(instance_type=instance_type, initial_instance_count=1)

# Invoking the endpoint

In [50]:
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

In [55]:
texts = ["Funtasma by Pleaser Women's Gogo-300 Boot\n100% Synthetic  \n Manmade sole  \n Shaft measures approximately 16 1/2\" from arch  \n Heel measures approximately 3\"  \n Boot opening measures approximately 13\" around  \n Retro knee-high boot featuring square toe and block heel"]
import time
start = time.time()
outputs = predictor.predict(texts)
end = time.time()
print('outputs: ', outputs)
print('time:', end-start)

outputs:  [{'Shoe Type': [{'text': 'Retro knee-high boot', 'start': 210, 'end': 230, 'probability': 0.9831036329269409}], 'Shoe Heel Height': [{'text': 'approximately 13"', 'start': 182, 'end': 199, 'probability': 0.44407209753990173}], 'Shoe Heel Type': [{'text': 'block heel', 'start': 256, 'end': 266, 'probability': 0.45025530457496643}], 'Shoe Toe Style': [{'text': 'square toe', 'start': 241, 'end': 251, 'probability': 0.9709370732307434}]}]
time: 0.22002243995666504
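
The response maps each label to a list of spans with a `probability`. Two of the spans above score below 0.5, so a caller may want to drop low-confidence results. The sketch below shows one way to do that; the 0.5 threshold and the helper name are assumptions, and the sample data is an abbreviated copy of the response above with probabilities rounded.

```python
def filter_by_probability(result, threshold=0.5):
    """Keep only spans whose probability is at least `threshold`."""
    kept = {}
    for label, spans in result.items():
        confident = [span for span in spans if span["probability"] >= threshold]
        if confident:
            kept[label] = confident
    return kept


# Abbreviated copy of the first prediction (probabilities rounded).
sample = {
    "Shoe Type": [{"text": "Retro knee-high boot", "start": 210, "end": 230, "probability": 0.983}],
    "Shoe Heel Height": [{"text": 'approximately 13"', "start": 182, "end": 199, "probability": 0.444}],
    "Shoe Heel Type": [{"text": "block heel", "start": 256, "end": 266, "probability": 0.450}],
    "Shoe Toe Style": [{"text": "square toe", "start": 241, "end": 251, "probability": 0.971}],
}

confident = filter_by_probability(sample)
print(sorted(confident))
```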


In [62]:
label = [{"id": 40097, "start_offset": 28, "end_offset": 41, "label": "Shoe Type"}, {"id": 40098, "start_offset": 60, "end_offset": 74, "label": "Shoe Pattern"}, {"id": 40099, "start_offset": 126, "end_offset": 158, "label": "Shoe Heel Height"}, {"id": 40100, "start_offset": 210, "end_offset": 230, "label": "Shoe Type"}, {"id": 40101, "start_offset": 231, "end_offset": 251, "label": "Shoe Toe Style"}, {"id": 40102, "start_offset": 256, "end_offset": 266, "label": "Shoe Heel Type"}]

In [73]:
ls = ['Shoe Type',
     'Shoe Heel Height',
     'Shoe Pattern',
     'Shoe Heel Type',
     'Shoe Toe Style']

# Build the ground-truth spans from the doccano labels, grouped by label type in the order of `ls`
true = []
for i in ls:
    for j in label:
        if j['label'] == i:
            true.append({'type': i,
                         'text': texts[0][j['start_offset']:j['end_offset']],
                         'start': j['start_offset'],
                         'end': j['end_offset']})
            # true.append({'start': j['start_offset'], 'end': j['end_offset']})

In [74]:
true

[{'type': 'Shoe Type', 'text': 'Gogo-300 Boot', 'start': 28, 'end': 41},
 {'type': 'Shoe Type',
  'text': 'Retro knee-high boot',
  'start': 210,
  'end': 230},
 {'type': 'Shoe Heel Height',
  'text': 'Heel measures approximately 3"  ',
  'start': 126,
  'end': 158},
 {'type': 'Shoe Pattern', 'text': 'Manmade sole  ', 'start': 60, 'end': 74},
 {'type': 'Shoe Heel Type', 'text': 'block heel', 'start': 256, 'end': 266},
 {'type': 'Shoe Toe Style',
  'text': 'featuring square toe',
  'start': 231,
  'end': 251}]

In [77]:
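# Compare: collect the predicted spans for each label, in the same order as `ls`

predict = []
for i in ls:
    # Labels the model did not return are recorded as an empty string
    predict.append(outputs[0].get(i, ""))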


In [78]:
predict

[[{'text': 'Retro knee-high boot',
   'start': 210,
   'end': 230,
   'probability': 0.9831036329269409}],
 [{'text': 'approximately 13"',
   'start': 182,
   'end': 199,
   'probability': 0.44407209753990173}],
 '',
 [{'text': 'block heel',
   'start': 256,
   'end': 266,
   'probability': 0.45025530457496643}],
 [{'text': 'square toe',
   'start': 241,
   'end': 251,
   'probability': 0.9709370732307434}]]
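
To turn this side-by-side comparison into numbers, one option is exact-match scoring on `(label, start, end)` triples. This is only one possible metric and not something the notebook itself computes: partially overlapping spans, such as "square toe" against the annotated "featuring square toe", count as misses. The span lists below are copied from the cells above.

```python
def exact_match_scores(gold, predicted):
    """Return (true positives, precision, recall, F1) for exact span matches."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return tp, precision, recall, f1


# (label, start, end) triples taken from `true` and `predict` above.
gold_spans = [
    ("Shoe Type", 28, 41),
    ("Shoe Type", 210, 230),
    ("Shoe Heel Height", 126, 158),
    ("Shoe Pattern", 60, 74),
    ("Shoe Heel Type", 256, 266),
    ("Shoe Toe Style", 231, 251),
]
pred_spans = [
    ("Shoe Type", 210, 230),
    ("Shoe Heel Height", 182, 199),
    ("Shoe Heel Type", 256, 266),
    ("Shoe Toe Style", 241, 251),
]

tp, precision, recall, f1 = exact_match_scores(gold_spans, pred_spans)
print(tp, precision, round(recall, 3), round(f1, 3))
```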

# Clean-up

Deleting the endpoint when you're finished is important, since a hosted endpoint keeps incurring charges while it is running. In local mode it also matters because you can only run one local endpoint at a time.

In [None]:
# estimator.delete_endpoint()
predictor.delete_endpoint()

In [None]:
x = "I wipe whatever tears had trickled down my face, removing my rings from my fingers and clutching them in my hands.\nThe hallway seems longer than normal but I walk briskly to the office where I find Christian, the elders, the lawyer, Jordan, Derek and Vanessa waiting for me."