# Deploy a pretrained PyTorch BERT model from HuggingFace on Amazon SageMaker with Neuron container

## Overview

In this tutorial we will deploy a pretrained BERT Base model from HuggingFace Transformers on SageMaker, using the [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers). We will use the same model as shown in the [Neuron Tutorial "PyTorch - HuggingFace Pretrained BERT Tutorial"](../../../../frameworks/torch/torch-neuronx/tutorials/training/bert.html#). We will compile the model and build a custom AWS Deep Learning Container that includes the HuggingFace Transformers library.

This Jupyter Notebook should run on an ml.c5.4xlarge SageMaker Notebook instance. You can set up your SageMaker Notebook instance by following the [Get Started with Amazon SageMaker Notebook Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-console.html) documentation.

> We recommend increasing the size of the root volume of your SageMaker notebook instance to accommodate the models and containers built locally. A root volume of 10 GB should suffice.
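
Before building anything locally, it can help to confirm how much space the volume actually has. The following is a minimal sketch using only the Python standard library. The path `/` and the helper name `volume_gib` are assumptions for illustration, and the 10 GB threshold is simply the figure from the recommendation above.

```python
import shutil


def volume_gib(path="/"):
    """Return (total, free) space in GiB for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.total / 2**30, usage.free / 2**30


# Assumption: 10 is the volume size mentioned in the recommendation above.
RECOMMENDED_GIB = 10

total, free = volume_gib("/")
print(f"Volume at /: {total:.1f} GiB total, {free:.1f} GiB free")
if total < RECOMMENDED_GIB:
    print("The volume is smaller than the recommended size; consider resizing it.")
```

If the models and containers are stored on a different mount point, pass that path to `volume_gib` instead.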


## Install Dependencies

This tutorial requires the following pip packages:

- torch-neuron
- neuron-cc[tensorflow]
- torch
- torchvision
- transformers

In [1]:
!pip install --upgrade --no-cache-dir torch-neuron neuron-cc[tensorflow] torchvision torch --extra-index-url=https://pip.repos.neuron.amazonaws.com
!pip install --upgrade --no-cache-dir 'transformers==4.6.0'

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com, https://pip.repos.neuron.amazonaws.com
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
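
After the install finishes, it can be useful to confirm which versions ended up in the environment. This is a minimal sketch, not part of the original notebook. The helper name `installed_versions` is hypothetical, and on Python 3.7 it falls back to the `importlib_metadata` backport, which is assumed to be installed.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is present
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands above.
for name, version in installed_versions(
    ["torch-neuron", "neuron-cc", "torch", "torchvision", "transformers"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

A `not installed` entry usually means the corresponding `pip install` line failed or ran in a different kernel environment.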


## Compile the model into an AWS Neuron optimized TorchScript

In [2]:
import torch
import torch_neuron

from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

In [3]:
!pip install torch torch-scatter

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com


In [4]:
import pandas as pd
import torch

from transformers import AutoModel, AutoTokenizer, TapasForQuestionAnswering, TapasTokenizer

# Alternative checkpoints:
# model_name = "google/tapas-base-finetuned-wtq"
# model_name = "google/tapas-small-finetuned-wtq"
model_name = "google/tapas-mini"
model = TapasForQuestionAnswering.from_pretrained(model_name)
tokenizer = TapasTokenizer.from_pretrained(model_name)

data = {"Actors": ["Brad Pitt", "Leonardo Di Caprio", "George Clooney"], "Number of movies": ["87", "53", "69"]}
queries = [
    "What is the name of the first actor?",
    "How many movies has George Clooney played in?",
    "What is the total number of movies?",
]
table = pd.DataFrame.from_dict(data)
inputs = tokenizer(table=table, queries=queries, padding="max_length", return_tensors="pt")
outputs = model(**inputs)
predicted_answer_coordinates, predicted_aggregation_indices = tokenizer.convert_logits_to_predictions(
    inputs, outputs.logits.detach(), outputs.logits_aggregation.detach()
)

"""
# let's print out the results:
id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}
aggregation_predictions_string = [id2aggregation[x] for x in predicted_aggregation_indices]

answers = []
for coordinates in predicted_answer_coordinates:
    if len(coordinates) == 1:
        # only a single cell:
        answers.append(table.iat[coordinates[0]])
    else:
        # multiple cells
        cell_values = []
        for coordinate in coordinates:
            cell_values.append(table.iat[coordinate])
        answers.append(", ".join(cell_values))
"""




# Build tokenizer and model
# tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
# model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc", return_dict=False)

# Setup some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

max_length=128
#paraphrase = tokenizer.encode_plus(sequence_0, sequence_2, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")
#not_paraphrase = tokenizer.encode_plus(sequence_0, sequence_1, max_length=max_length, padding='max_length', truncation=True, return_tensors="pt")

# Run the original PyTorch model on compilation example
#paraphrase_classification_logits = model(**paraphrase)[0]

# Convert example inputs to a format that is compatible with TorchScript tracing
#example_inputs_paraphrase = paraphrase['input_ids'], paraphrase['attention_mask'], paraphrase['token_type_ids']
#example_inputs_not_paraphrase = not_paraphrase['input_ids'], not_paraphrase['attention_mask'], not_paraphrase['token_type_ids']


example_inputs = inputs['input_ids'], inputs['attention_mask'], inputs['token_type_ids']
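
The disabled block inside the triple-quoted string above turns predicted cell coordinates and aggregation indices into readable answers. The sketch below shows the same idea in plain Python, with the table held as a list of rows, so `rows[r][c]` plays the role of `table.iat[(r, c)]`. The helper name `decode_answers` is hypothetical, and the example coordinates and aggregation indices are hand-written for illustration; they are not output of the model.

```python
def decode_answers(rows, answer_coordinates, aggregation_indices, id2aggregation):
    """Pair each query's aggregation label with the text of its selected cells."""
    answers = []
    for coordinates, agg_index in zip(answer_coordinates, aggregation_indices):
        # Each coordinate is a (row, column) pair into the table.
        cells = [rows[r][c] for r, c in coordinates]
        answers.append((id2aggregation[agg_index], ", ".join(cells)))
    return answers


# Same table as in the cell above; columns are "Actors" and "Number of movies".
rows = [["Brad Pitt", "87"], ["Leonardo Di Caprio", "53"], ["George Clooney", "69"]]
id2aggregation = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}

# Hypothetical predictions, one entry per query (illustration only).
example_coordinates = [[(0, 0)], [(2, 1)], [(0, 1), (1, 1), (2, 1)]]
example_aggregation = [0, 0, 1]

decoded = decode_answers(rows, example_coordinates, example_aggregation, id2aggregation)
for label, cells in decoded:
    print(label, "->", cells)
```

With the real outputs, `predicted_answer_coordinates` and `predicted_aggregation_indices` would take the place of the two hand-written lists.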

In [None]:
# Run torch.neuron.trace to generate a TorchScript that is optimized by AWS Neuron.
# This step may take 3-5 minutes.
# strict=False is forwarded to the TorchScript tracer so that the model's dict-like output is accepted.
model_neuron = torch.neuron.trace(model, example_inputs, verbose=1, compiler_workdir='./compilation_artifacts', strict=False)
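
With `verbose=1` the trace call prints a long compiler log. A minimal sketch for summarising such a log once it has been captured as text is shown here. It assumes the two line formats seen in this run (`Compiling function _NeuronGraph$... with neuron-cc` and `wrote <path>.neff`). The helper name `summarize_compile_log` is hypothetical, and the sample text uses a shortened path for readability.

```python
import re

# Patterns assumed from the log lines printed during this run.
GRAPH_RE = re.compile(r"Compiling function (_NeuronGraph\$\d+) with neuron-cc")
NEFF_RE = re.compile(r"wrote (\S+\.neff)")


def summarize_compile_log(log_text):
    """Collect the subgraph names sent to neuron-cc and the NEFF files written."""
    return {
        "subgraphs": GRAPH_RE.findall(log_text),
        "neff_files": NEFF_RE.findall(log_text),
    }


# Shortened sample in the same format as the log (path abbreviated).
sample_log = """\
INFO:Neuron:Compiling function _NeuronGraph$294 with neuron-cc
06/05/2023 06:08:02 AM INFO 16076 [job.Kelper.2]: wrote ./compilation_artifacts/30/graph_def.neff
INFO:Neuron:Compiling function _NeuronGraph$295 with neuron-cc
"""

summary = summarize_compile_log(sample_log)
print(len(summary["subgraphs"]), "subgraphs sent to neuron-cc")
print(len(summary["neff_files"]), "NEFF files written")
```

Comparing the two counts gives a quick indication of how many subgraphs produced a compiled artifact, without reading the full log.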

  self.indices = torch.as_tensor(indices)
  self.num_segments = torch.as_tensor(num_segments, device=indices.device)
  batch_size = torch.prod(torch.tensor(list(index.batch_shape())))
  batch_size = torch.prod(torch.tensor(list(index.batch_shape())))
  [torch.as_tensor([-1], dtype=torch.long), torch.as_tensor(vector_shape, dtype=torch.long)], dim=0
  flat_values = values.reshape(flattened_shape.tolist())
  torch.as_tensor(index.batch_shape(), dtype=torch.long),
  torch.as_tensor(index.batch_shape(), dtype=torch.long),
  torch.as_tensor([index.num_segments], dtype=torch.long),
  torch.as_tensor([index.num_segments], dtype=torch.long),
  torch.as_tensor(vector_shape, dtype=torch.long),
  output_values = segment_means.view(new_shape.tolist())
  batch_shape, dtype=torch.long
  batch_shape, dtype=torch.long
  num_segments = torch.as_tensor(num_segments)  # create a rank 0 tensor (scalar) containing num_segments (e.g. 64)
  new_shape = [int(x) for x in new_tensor.tolist()]
  multiples = torc

06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=8 blocks=1 instructions=8
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:07:41 2023
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:07:41 2023
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Total count: 26
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Save: 13
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: TensorCopy: 8
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Memset: 2
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: Load: 1
06/05/2023 06:07:41 AM INFO [WalrusDriver.0]: ru_maxrss:  1326mb (delta=0mb)
06/05/2023 06:07:41 AM INFO

INFO:Neuron:Compiling function _NeuronGraph$294 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/1/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/1/graph_def.neff --io-config {"inputs": {}, "outputs": ["TapasModel_7/TapasEmbeddings_27/prim_Constant/Const:0"]} --verbose 1'
06/05/2023 06:07:42 AM INFO 15830 [root]: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/1/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_t

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


INFO:Neuron:Compiling function _NeuronGraph$295 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/18/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/18/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[3, 512, 7], "int64"]}, "outputs": ["TapasModel_7/TapasEmbeddings_27/aten_select/Reshape:0"]} --verbose 1'
06/05/2023 06:07:46 AM INFO 15890 [root]: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/18/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-

06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:07:50 2023
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:07:50 2023
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:07:50 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:07:50 AM INFO 15890 [job.WalrusDriver.3]: IR signature: e8dcf23fb56f85807fb17204658ea5b58a4101fdef9ddd89a9c12902fd6f65ff for sg00/walrus_bir.out.json
06/05/2023 06:07:50 AM INFO 15890 [job.WalrusDriver.3]: Job finished
06/05/2023 06:07:50 AM INFO 15890 [pipeline.compile.0]: Finished job job.WalrusDriver.3 with state 0
06/05/2023 06:07:50 AM INFO 15890 [pipeline.compile.0]: Starting job job.Backend.3 state state 0
06/05/2023 06:07:50 AM INFO 15890 [job.Backend.3]: Replay this job by calling: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile --framework TENSORFLOW --state '{"model": ["/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/18/graph_def.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "state_dir": "/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/18/sg00", "state_id": "sg00"}' --pipeline Backend

06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:07:56 2023
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:07:56 2023
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:07:56 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:07:56 AM INFO 15969 [job.WalrusDriver.3]: IR signature: e8dcf23fb56f85807fb17204658ea5b58a4101fdef9ddd89a9c12902fd6f65ff for sg00/walrus_bir.out.json
06/05/2023 06:07:56 AM INFO 15969 [job.WalrusDriver.3]: Job finished
06/05/2023 06:07:56 AM INFO 15969 [pipeline.compile.0]: Finished job job.WalrusDriver.3 with state 0
06/05/2023 06:07:56 AM INFO 15969 [pipeline.compile.0]: Starting job job.Backend.3 state state 0
06/05/2023 06:07:56 AM INFO 15969 [job.Backend.3]: Replay this job by calling: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile --framework TENSORFLOW --state '{"model": ["/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/24/graph_def.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "state_dir": "/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/24/sg00", "state_id": "sg00"}' --pipeline Backend

06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:02 2023
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:02 2023
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:02 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:08:02 AM INFO 16076 [job.Kelper.2]: neuroncc version is 1.15.0.0+eec0c3604, neff version is 1.0 (features 0)
06/05/2023 06:08:02 AM INFO 16076 [job.Kelper.2]: wrote /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/30/graph_def.neff
06/05/2023 06:08:02 AM INFO 16076 [pipeline.compile.0]: Finished job job.Kelper.2 with state 0
06/05/2023 06:08:02 AM INFO 16076 [pipeline.compile.0]: Finished pipeline compile
06/05/2023 06:08:02 AM INFO 16076 [pipeline.compile.0]: Job finished
06/05/2023 06:08:02 AM INFO 16076 [pipeline.custom.0]: Finished job pipeline.compile.0 with state 0
06/05/2023 06:08:02 AM INFO 16076 [pipeline.custom.0]: Starting job job.SaveTemps.0 state state 0
06/05/2023 06:08:02 AM INFO 16076 [pipeline.custom.0]: Finished job job.SaveTemps.0 with state 0
06/05/2023 06:08:02 AM INFO 16076 [pipeline.custom.0]: Finished pipeline custom
06/05/2023 06:08:02 AM INFO 16076 [pipeline.custom.0]: Job finished
06/

06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:08 2023
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:08 2023
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:08 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:08:08 AM INFO 16190 [pipeline.compile.0]: Finished job job.WalrusDriver.3 with state 0
06/05/2023 06:08:08 AM INFO 16190 [pipeline.compile.0]: Starting job job.Backend.3 state state 0
06/05/2023 06:08:08 AM INFO 16190 [job.Backend.3]: Replay this job by calling: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile --framework TENSORFLOW --state '{"model": ["/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/36/graph_def.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "state_dir": "/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/36/sg00", "state_id": "sg00"}' --pipeline Backend --enable-experimental-bir-backend
06/05/2023 06:08:08 AM INFO 16190 [job.Backend.3]: IR signature: d433b862ba67f9bf6f64445a4903880babb3addc69659ed2c5f788b44fc15eb7 for sg00/wavegraph-bin.json
06/05/2023 06:08:08 AM INFO 16190 [job.B

06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:14 2023
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:14 2023
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:14 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:08:14 AM INFO 16279 [job.Kelper.2]: neuroncc version is 1.15.0.0+eec0c3604, neff version is 1.0 (features 0)
06/05/2023 06:08:14 AM INFO 16279 [job.Kelper.2]: wrote /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/42/graph_def.neff
06/05/2023 06:08:14 AM INFO 16279 [pipeline.compile.0]: Finished job job.Kelper.2 with state 0
06/05/2023 06:08:14 AM INFO 16279 [pipeline.compile.0]: Finished pipeline compile
06/05/2023 06:08:14 AM INFO 16279 [pipeline.compile.0]: Job finished
06/05/2023 06:08:14 AM INFO 16279 [pipeline.custom.0]: Finished job pipeline.compile.0 with state 0
06/05/2023 06:08:14 AM INFO 16279 [pipeline.custom.0]: Starting job job.SaveTemps.0 state state 0
06/05/2023 06:08:14 AM INFO 16279 [pipeline.custom.0]: Finished job job.SaveTemps.0 with state 0
06/05/2023 06:08:14 AM INFO 16279 [pipeline.custom.0]: Finished pipeline custom
06/05/2023 06:08:14 AM INFO 16279 [pipeline.custom.0]: Job finished
06/

06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:19 2023
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:19 2023
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:19 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:08:19 AM INFO 16360 [job.Kelper.2]: neuroncc version is 1.15.0.0+eec0c3604, neff version is 1.0 (features 0)
06/05/2023 06:08:19 AM INFO 16360 [job.Kelper.2]: wrote /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/48/graph_def.neff
06/05/2023 06:08:19 AM INFO 16360 [pipeline.compile.0]: Finished job job.Kelper.2 with state 0
06/05/2023 06:08:19 AM INFO 16360 [pipeline.compile.0]: Finished pipeline compile
06/05/2023 06:08:19 AM INFO 16360 [pipeline.compile.0]: Job finished
06/05/2023 06:08:19 AM INFO 16360 [pipeline.custom.0]: Finished job pipeline.compile.0 with state 0
06/05/2023 06:08:19 AM INFO 16360 [pipeline.custom.0]: Starting job job.SaveTemps.0 state state 0
06/05/2023 06:08:19 AM INFO 16360 [pipeline.custom.0]: Finished job job.SaveTemps.0 with state 0
06/05/2023 06:08:19 AM INFO 16360 [pipeline.custom.0]: Finished pipeline custom
06/05/2023 06:08:19 AM INFO 16360 [pipeline.custom.0]: Job finished
06/

06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=4
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:25 2023
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:25 2023
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Total count: 32
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Load: 12
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: TensorCopy: 6
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:25 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05

06/05/2023 06:08:25 AM INFO 16446 [job.Kelper.2]: neuroncc version is 1.15.0.0+eec0c3604, neff version is 1.0 (features 0)
06/05/2023 06:08:25 AM INFO 16446 [job.Kelper.2]: wrote /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/54/graph_def.neff
06/05/2023 06:08:25 AM INFO 16446 [pipeline.compile.0]: Finished job job.Kelper.2 with state 0
06/05/2023 06:08:25 AM INFO 16446 [pipeline.compile.0]: Finished pipeline compile
06/05/2023 06:08:25 AM INFO 16446 [pipeline.compile.0]: Job finished
06/05/2023 06:08:25 AM INFO 16446 [pipeline.custom.0]: Finished job pipeline.compile.0 with state 0
06/05/2023 06:08:25 AM INFO 16446 [pipeline.custom.0]: Starting job job.SaveTemps.0 state state 0
06/05/2023 06:08:25 AM INFO 16446 [pipeline.custom.0]: Finished job job.SaveTemps.0 with state 0
06/05/2023 06:08:25 AM INFO 16446 [pipeline.custom.0]: Finished pipeline custom
06/05/2023 06:08:25 AM INFO 16446 [pipeline.custom.0]: Job finished
06/

06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=16 blocks=1 instructions=11
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:52 2023
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:52 2023
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Total count: 117
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: TensorScalarPtr: 48
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Load: 25
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: Save: 24
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: TensorCopy: 14
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: TensorScalar: 6
06/05/2023 06:08:52 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 0

06/05/2023 06:08:52 AM INFO 16835 [job.WalrusDriver.3]: IR signature: a4b1cd8fd268be95191102734e3951c191c924e87caec6ac7949ff90ba68e7de for sg00/walrus_bir.out.json
06/05/2023 06:08:52 AM INFO 16835 [job.WalrusDriver.3]: Job finished
06/05/2023 06:08:52 AM INFO 16835 [pipeline.compile.0]: Finished job job.WalrusDriver.3 with state 0
06/05/2023 06:08:52 AM INFO 16835 [pipeline.compile.0]: Starting job job.Backend.3 state state 0
06/05/2023 06:08:52 AM INFO 16835 [job.Backend.3]: Replay this job by calling: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile --framework TENSORFLOW --state '{"model": ["/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/100/graph_def.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "state_dir": "/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/100/sg00", "state_id": "sg00"}' --pipeline Backe

06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=10 blocks=1 instructions=5
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:08:57 2023
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:08:57 2023
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Total count: 3872
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Save: 1548
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Load: 1548
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: TensorCopy: 582
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: TensorScalar: 194
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: ru_maxrss:  1386mb (delta=0mb)
06/05/2023 06:08:57 AM INFO [WalrusDriver.0]: Walrus pass: unroll succe

Analyzing dependencies of sg00/Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************


06/05/2023 06:08:58 AM INFO [Stargazer.0]: [Sailfish] Data race analysis found no races, run time: 0:00:00
06/05/2023 06:08:58 AM INFO [Stargazer.0]: [Sailfish] Remove redundant edges
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Data race checker engines
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Transitive reduction start 
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Transitive reduction removed 2 redundant edges, time: 0:00:00
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Sync Critical Load Chains Start
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Sync Critical Load Chains added 0 new Load-2-Load syncs
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Sync Critical Load Chains Done.0:00:00
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Out wavegraph bin file is wavegraph-bin.json
06/05/2023 06:08:58 AM INFO [Stargazer.0]: Writing NN JSON to file 'wavegraph-bin.json'
06/05/2023 06:08:59 AM INFO [Stargazer.0]: Virtual memory peak = 1687916 K bytes
06/05/2023 06:08:59 AM INFO [Stargazer.0]: PASSED - Total 

06/05/2023 06:08:59 AM INFO 16935 [job.WalrusDriver.3]: IR signature: 8d028021dc438232e84a544fcb9dfb70718ec7f96cacb52227c22bdcc522c7c5 for sg00/walrus_bir.out.json
06/05/2023 06:08:59 AM INFO 16935 [job.WalrusDriver.3]: Job finished
06/05/2023 06:08:59 AM INFO 16935 [pipeline.compile.0]: Finished job job.WalrusDriver.3 with state 0
06/05/2023 06:08:59 AM INFO 16935 [pipeline.compile.0]: Starting job job.Backend.3 state state 0
06/05/2023 06:08:59 AM INFO 16935 [job.Backend.3]: Replay this job by calling: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile --framework TENSORFLOW --state '{"model": ["/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/102/graph_def.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "state_dir": "/home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/102/sg00", "state_id": "sg00"}' --pipeline Backe

06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=10 blocks=1 instructions=6
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:09:05 2023
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:09:05 2023
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Total count: 61
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: TensorScalarPtr: 24
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Load: 13
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Save: 12
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: TensorCopy: 8
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: TensorScalar: 2
06/05/2023 06:09:05 AM INFO [WalrusDriver.0]: Memset: 2
06/05/2023 06:09:05 AM INFO [WalrusD

INFO:Neuron:Compiling function _NeuronGraph$311 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/106/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/106/graph_def.neff --io-config {"inputs": {"tensor.1:0": [[3, 512], "int64"], "1:0": [[3, 512, 512], "float32"], "2:0": [[3, 512, 512], "float32"], "3:0": [[3, 512, 512], "float32"], "4:0": [[3, 512, 512], "float32"], "5:0": [[3, 512, 512], "float32"], "6:0": [[3, 512, 512], "float32"], "7:0": [[3, 512, 512], "float32"], "8:0": [[3, 512, 512], "float32"], "9:0": [[3, 512, 512], "float32"], "tensor.9:0": [[3, 512, 7], "int64"], "tensor.25:0": [[], "int64"], "tensor.39:0": [[], "int64"], "tensor.59:0": [[], "int

06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=2
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:09:22 2023
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:09:22 2023
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Total count: 151
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Shuffle: 96
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: TensorCopy: 48
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Save: 6
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Load: 1
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: ru_maxrss:  2756mb (delta=0mb)
06/05/2023 06:09:22 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05/202

INFO:Neuron:Compiling function _NeuronGraph$313 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/114/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/114/graph_def.neff --io-config {"inputs": {"0:0": [[6144], "float32"]}, "outputs": ["aten_zeros/zeros:0"]} --verbose 1'
06/05/2023 06:09:23 AM INFO 17341 [root]: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/114/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tuto

06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=2
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:09:26 2023
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:09:26 2023
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Total count: 151
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Shuffle: 96
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: TensorCopy: 48
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Save: 6
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Load: 1
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: ru_maxrss:  2756mb (delta=0mb)
06/05/2023 06:09:26 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05/202

INFO:Neuron:Compiling function _NeuronGraph$314 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/121/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/121/graph_def.neff --io-config {"inputs": {"0:0": [[6144], "float32"]}, "outputs": ["aten_zeros/zeros:0"]} --verbose 1'
06/05/2023 06:09:28 AM INFO 17416 [root]: /home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/121/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tuto

06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=5 blocks=1 instructions=2
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:09:31 2023
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:09:31 2023
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Total count: 151
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Shuffle: 96
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: TensorCopy: 48
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Save: 6
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Load: 1
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: ru_maxrss:  2756mb (delta=0mb)
06/05/2023 06:09:31 AM INFO [WalrusDriver.0]: Walrus pass: unroll succeeded!
06/05/202

INFO:Neuron:Compiling function _NeuronGraph$315 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/amazonei_pytorch_latest_p37/bin/neuron-cc compile /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/127/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /home/ec2-user/SageMaker/scrub/aws-neuron-sdk/src/examples/pytorch/byoc_sm_bert_tutorial/compilation_artifacts/127/graph_def.neff --io-config {"inputs": {"0:0": [[6144], "float32"], "1:0": [[6144], "float32"], "2:0": [[6144], "float32"], "3:0": [[6144], "float32"], "tensor.1:0": [[3, 2048], "int64"], "tensor.7:0": [[], "int64"], "tensor.9:0": [[], "int64"], "tensor.23:0": [[], "int64"]}, "outputs": ["aten_view/Reshape:0", "aten_reshape/Reshape:0", "aten_expand_2/Cast_1:0", "aten_zeros/zeros:0", "Identity:0", "aten_reshape_1/Reshape:0", "aten_expand_3/Cast_1:0", "aten_zeros_1/zeros:0"]} --verbose 1'
06/05/2023 06:09:33 

06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: max_allowed_parallelism=4
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Running walrus pass: unroll
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Input to unroll: modules=1 functions=1 allocs=26 blocks=1 instructions=20
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: INFO (Unroll) Start unrolling at Mon Jun  5 06:09:46 2023
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: INFO (Unroll) DONE unrolling Mon Jun  5 06:09:46 2023
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Instruction count after Unroll: 
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Total count: 42
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Matmult: 25
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: TensorCopy: 8
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Load: 6
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: Save: 2
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: TensorScalarPtr: 1
06/05/2023 06:09:46 AM INFO [WalrusDriver.0]: ru_maxrss:  2756mb (delta=0mb)
06/05/2023 06:09:46 A

INFO:Neuron:Number of arithmetic operators (post-compilation) before = 582, compiled = 86, percent compiled = 14.78%
INFO:Neuron:The neuron partitioner created 25 sub-graphs
INFO:Neuron:Neuron successfully compiled 15 sub-graphs, Total fused subgraphs = 25, Percent of model sub-graphs successfully compiled = 60.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 11
INFO:Neuron: => aten::ScalarImplicit: 3
INFO:Neuron: => aten::add: 2
INFO:Neuron: => aten::arange: 3
INFO:Neuron: => aten::expand: 1
INFO:Neuron: => aten::linear: 1
INFO:Neuron: => aten::min: 1
INFO:Neuron: => aten::mul: 3
INFO:Neuron: => aten::reshape: 1
INFO:Neuron: => aten::select: 9
INFO:Neuron: => aten::size: 11
INFO:Neuron: => aten::slice: 18
INFO:Neuron: => aten::sub: 1
INFO:Neuron: => aten::to: 9
INFO:Neuron: => aten::unsqueeze: 3
INFO:Neuron: => aten::view: 6
INFO:Neuron: => aten::zeros: 3
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron:

You may inspect **model_neuron.graph** to see which part is running on CPU versus running on the accelerator. All native **aten** operators in the graph will be running on CPU.

In [None]:
# See which part is running on CPU versus running on the accelerator.
print(model_neuron.graph)
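
Because the printed graph can be long, a tally of operators per namespace can be easier to read than the full dump. The following is a minimal sketch, not part of torch-neuron: `count_ops_by_namespace` is a hypothetical helper that scans the text form of a TorchScript graph. It assumes operators are printed as `namespace::name`, with CPU operators under `aten::` and compiled subgraphs under a `neuron::` namespace; the sample graph text is made up for illustration.

```python
import re
from collections import Counter

def count_ops_by_namespace(graph_text):
    """Tally operators in a TorchScript graph dump by namespace (e.g. aten, prim, neuron)."""
    # TorchScript prints each operator as `namespace::op_name`
    ops = re.findall(r"\b([A-Za-z_]\w*)::([A-Za-z_]\w*)", graph_text)
    return Counter(namespace for namespace, _ in ops)

# Hypothetical excerpt standing in for str(model_neuron.graph)
sample_graph = """
graph(%self : __torch__.Model, %x : Tensor):
  %1 : Tensor = aten::slice(%x, %2, %3, %4, %5)
  %6 : Tensor = neuron::forward_v2(%7, %1)
  %8 : Tensor = aten::add(%6, %1, %9)
  return (%8)
"""
counts = count_ops_by_namespace(sample_graph)
print(counts)
```

With the compiled model in memory, the same helper could be called as `count_ops_by_namespace(str(model_neuron.graph))`.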

Save the compiled model, so it can be packaged and sent to S3.

In [None]:
# Save the TorchScript for later use
model_neuron.save('neuron_compiled_model.pt')

### Package the pre-trained model and upload it to S3

To make the model available for the SageMaker deployment, you will TAR the serialized graph and upload it to the default Amazon S3 bucket for your SageMaker session. 

In [None]:
# Now you'll create a model.tar.gz file to be used by SageMaker endpoint
!tar -czvf model.tar.gz neuron_compiled_model.pt
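
SageMaker extracts `model.tar.gz` into the model directory of the container, so the compiled file should sit at the root of the archive. As a sketch of the same packaging step in Python, the hypothetical helper `make_model_archive` below builds the archive with the standard `tarfile` module and returns its member names for a quick check. It uses a placeholder file in a temporary directory so it can run without the real model.

```python
import os
import tarfile
import tempfile

def make_model_archive(model_file, archive_path):
    """Create a gzip-compressed tar with the model file at the archive root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops any directory prefix so the file sits at the top level
        tar.add(model_file, arcname=os.path.basename(model_file))
    # Re-open the archive and list what it contains
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()

# Demonstration with a placeholder file, not a real compiled model
with tempfile.TemporaryDirectory() as tmp:
    placeholder = os.path.join(tmp, "neuron_compiled_model.pt")
    with open(placeholder, "wb") as f:
        f.write(b"placeholder bytes")
    members = make_model_archive(placeholder, os.path.join(tmp, "model.tar.gz"))

print(members)
```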

In [None]:
import boto3
import time
from sagemaker.utils import name_from_base
import sagemaker

In [None]:
# upload model to S3
role = sagemaker.get_execution_role()
sess=sagemaker.Session()
region=sess.boto_region_name
bucket=sess.default_bucket()
sm_client=boto3.client('sagemaker')

In [None]:
model_key = '{}/model/model.tar.gz'.format('inf1_compiled_model')
model_path = 's3://{}/{}'.format(bucket, model_key)
boto3.resource('s3').Bucket(bucket).upload_file('model.tar.gz', model_key)
print("Uploaded model to S3:")
print(model_path)

## Build and Push the container

The following shell code shows how to build the container image using docker build and push the container image to ECR using docker push.
The Dockerfile in this example is available in the ***container*** folder.
Here's an example of the Dockerfile:

```Dockerfile
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference-neuron:1.7.1-neuron-py36-ubuntu18.04

# Install packages 
RUN pip install "transformers==4.7.0"
```

In [None]:
!cat container/Dockerfile

Before running the next cell, make sure your SageMaker IAM role has access to ECR. If not, you can attach the managed policy `AmazonEC2ContainerRegistryPowerUser` to your IAM role, which allows you to upload image layers to ECR.

Building the Docker image and uploading it to ECR takes about 5 minutes.

In [None]:
%%sh

# The name of our algorithm
algorithm_name=neuron-py36-inference

cd container

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR in order to pull down the SageMaker PyTorch image
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
# Build the docker image locally with the image name and then push it to ECR
# with the full name.
docker build  -t ${algorithm_name} . --build-arg REGION=${region}
docker tag ${algorithm_name} ${fullname}

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com
docker push ${fullname}

## Deploy Container and run inference based on the pretrained model

To deploy the pretrained PyTorch model, you'll create a PyTorchModel object that points to the model artifact in S3 and sets the inference script as its entry_point.

You'll use the PyTorchModel object to deploy a PyTorchPredictor. This creates a SageMaker Endpoint -- a hosted prediction service that we can use to perform inference.

In [None]:
import sys

!{sys.executable} -m pip install Transformers

In [None]:
import os
import boto3
import sagemaker

role = sagemaker.get_execution_role()
sess = sagemaker.Session()

bucket = sess.default_bucket()
prefix = "inf1_compiled_model/model"

# Get container name in ECR
client=boto3.client('sts')
account=client.get_caller_identity()['Account']

my_session=boto3.session.Session()
region=my_session.region_name

algorithm_name="neuron-py36-inference"
ecr_image='{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)
print(ecr_image)

An implementation of *model_fn* is required for the inference script.
We are going to implement our own **model_fn** and **predict_fn** for Hugging Face BERT, and use the default implementations of **input_fn** and **output_fn** defined in sagemaker-pytorch-containers.

In this example, the inference script is in the ***code*** folder. Run the next cell to see it:


In [None]:
!pygmentize code/inference.py

Path of compiled pretrained model in S3:

In [None]:
key = os.path.join(prefix, "model.tar.gz")
pretrained_model_data = "s3://{}/{}".format(bucket, key)
print(pretrained_model_data)
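
The path printed here should match the one printed after the upload step. As a small sketch for checking this, the hypothetical helper `split_s3_uri` below splits an `s3://` URI back into bucket and key using only the standard library. The bucket name in the example is a placeholder.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an S3 URI: {}".format(uri))
    # The path starts with '/', which is not part of the object key
    return parsed.netloc, parsed.path.lstrip("/")

# Placeholder bucket name; the key mirrors the one used in this notebook
example_bucket, example_key = split_s3_uri(
    "s3://example-bucket/inf1_compiled_model/model/model.tar.gz"
)
print(example_bucket, example_key)
```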

The model object is defined using the SageMaker Python SDK's PyTorchModel, passing in the model data location in S3 and the entry_point. The endpoint's entry point for inference is defined by model_fn, as seen in the previous code block that prints out **inference.py**. The model_fn function loads the model and the required tokenizer.

Note that **image_uri** must point to an image in your own ECR repository.

In [None]:
from sagemaker.pytorch.model import PyTorchModel

pytorch_model = PyTorchModel(
    model_data=pretrained_model_data,
    role=role,
    source_dir="code",
    framework_version="1.7.1",
    entry_point="inference.py",
    image_uri=ecr_image
)

# Let SageMaker know that we've already compiled the model via neuron-cc
pytorch_model._is_compiled_model = True

The arguments to the deploy function allow us to set the number and type of instances that will be used for the Endpoint.

Here you will deploy the model to a single **ml.inf1.2xlarge** instance.
It may take 6-10 min to deploy.

In [None]:
%%time

predictor = pytorch_model.deploy(initial_instance_count=1, instance_type="ml.inf1.2xlarge")

In [None]:
print(predictor.endpoint_name)

Since in the input_fn we declared that incoming requests are JSON-encoded, we need a JSON serializer to encode the request data into a JSON string. Likewise, since we declared the return content type to be a JSON string, we need a JSON deserializer to parse the response.

In [None]:
predictor.serializer = sagemaker.serializers.JSONSerializer()
predictor.deserializer = sagemaker.deserializers.JSONDeserializer()

Now invoke the SageMaker endpoint with a list of sentences to get predictions.

In [None]:
%%time
result = predictor.predict(
    [
        "Never allow the same bug to bite you twice.",
        "The best part of Amazon SageMaker is that it makes machine learning easy.",
    ]
)
print(result)

In [None]:
%%time
result = predictor.predict(
    [
        "The company HuggingFace is based in New York City",
        "HuggingFace's headquarters are situated in Manhattan",
    ]
)
print(result)
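
The predictor is a convenience wrapper around the SageMaker runtime `InvokeEndpoint` call. For clients that do not use the SageMaker Python SDK, the request could be sketched as below. `invoke_json` is a hypothetical helper, and `FakeRuntimeClient` is a stand-in for `boto3.client("sagemaker-runtime")` that lets the example run offline; its echo response is invented and does not reflect the real model output.

```python
import io
import json

def invoke_json(runtime_client, endpoint_name, payload):
    """Send a JSON payload to an endpoint and decode the JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream of bytes containing the JSON document
    return json.loads(response["Body"].read())

class FakeRuntimeClient:
    """Offline stand-in that mimics the shape of the invoke_endpoint response."""
    def invoke_endpoint(self, EndpointName, ContentType, Accept, Body):
        sentences = json.loads(Body)
        # Echo how many sentences arrived; a real endpoint returns predictions
        body = json.dumps({"received": len(sentences)}).encode("utf-8")
        return {"Body": io.BytesIO(body)}

reply = invoke_json(
    FakeRuntimeClient(), "hypothetical-endpoint", ["first sentence", "second sentence"]
)
print(reply)
```

Against the deployed endpoint, you would pass `boto3.client("sagemaker-runtime")` and `predictor.endpoint_name` instead of the stand-ins.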

## Benchmarking your endpoint

The following cells create a load test for your endpoint. You first define some helper functions: `inference_latency` runs the endpoint request and collects client-side latency and any errors; `random_sentence` builds random sentences to be sent to the endpoint.

In [None]:
import numpy as np
import datetime
import math
import time
import boto3
import matplotlib.pyplot as plt
from joblib import Parallel, delayed
from tqdm import tqdm
import random

In [None]:
def inference_latency(model,*inputs):
    """
    inference_latency is a simple method to measure the latency of a model inference.

        Parameters:
            model: callable to time, e.g. predictor.predict or a torch model object loaded using torch.jit.load
            inputs: model() args

        Returns:
            dict with the latency in seconds, an error flag, and the result
    """
    error = False
    start = time.time()
    try:
        results = model(*inputs)
    except Exception:
        # Record the failure instead of aborting the whole load test
        error = True
        results = []
    return {'latency':time.time() - start, 'error': error, 'result': results}

In [None]:
def random_sentence():
    
    s_nouns = ["A dude", "My mom", "The king", "Some guy", "A cat with rabies", "A sloth", "Your homie", "This cool guy my gardener met yesterday", "Superman"]
    p_nouns = ["These dudes", "Both of my moms", "All the kings of the world", "Some guys", "All of a cattery's cats", "The multitude of sloths living under your bed", "Your homies", "Like, these, like, all these people", "Supermen"]
    s_verbs = ["eats", "kicks", "gives", "treats", "meets with", "creates", "hacks", "configures", "spies on", "retards", "meows on", "flees from", "tries to automate", "explodes"]
    p_verbs = ["eat", "kick", "give", "treat", "meet with", "create", "hack", "configure", "spy on", "retard", "meow on", "flee from", "try to automate", "explode"]
    infinitives = ["to make a pie.", "for no apparent reason.", "because the sky is green.", "for a disease.", "to be able to make toast explode.", "to know more about archeology."]
    
    # Pick the object from either noun list, then append the closing phrase
    obj = random.choice(s_nouns + p_nouns).lower()
    return random.choice(s_nouns) + ' ' + random.choice(s_verbs) + ' ' + obj + ' ' + random.choice(infinitives)

print([random_sentence(), random_sentence()])

The following cell creates `number_of_clients` concurrent threads to run `number_of_runs` requests. Once completed, a `boto3` CloudWatch client will query for the server side latency metrics for comparison.   

In [None]:
# Defining Auxiliary variables
number_of_clients = 2
number_of_runs = 1000
t = tqdm(range(number_of_runs),position=0, leave=True)

# Starting parallel clients
cw_start = datetime.datetime.utcnow()

results = Parallel(n_jobs=number_of_clients,prefer="threads")(delayed(inference_latency)(predictor.predict,[random_sentence(), random_sentence()]) for mod in t)
avg_throughput = t.total/t.format_dict['elapsed']

cw_end = datetime.datetime.utcnow() 

# Computing metrics and print
latencies = [res['latency'] for res in results]
errors = [res['error'] for res in results]
error_p = sum(errors)/len(errors) *100
p50 = np.quantile(latencies[-1000:],0.50) * 1000
p90 = np.quantile(latencies[-1000:],0.90) * 1000
p95 = np.quantile(latencies[-1000:],0.95) * 1000

print(f'Avg Throughput: {avg_throughput:.1f}\n')
print(f'50th Percentile Latency:{p50:.1f} ms')
print(f'90th Percentile Latency:{p90:.1f} ms')
print(f'95th Percentile Latency:{p95:.1f} ms\n')
print(f'Errors percentage: {error_p:.1f} %\n')

# Querying CloudWatch
print('Getting Cloudwatch:')
cloudwatch = boto3.client('cloudwatch')
statistics=['SampleCount', 'Average', 'Minimum', 'Maximum']
extended=['p50', 'p90', 'p95', 'p100']

# Give 5 minute buffer to end
cw_end += datetime.timedelta(minutes=5)

# Period must be 1, 5, 10, 30, or multiple of 60
# Calculate closest multiple of 60 to the total elapsed time
factor = math.ceil((cw_end - cw_start).total_seconds() / 60)
period = factor * 60
print('Time elapsed: {} seconds'.format((cw_end - cw_start).total_seconds()))
print('Using period of {} seconds\n'.format(period))

cloudwatch_ready = False
# Keep polling CloudWatch metrics until datapoints are available
while not cloudwatch_ready:
  time.sleep(30)
  print('Waiting 30 seconds ...')
  # Must use default units of microseconds
  model_latency_metrics = cloudwatch.get_metric_statistics(MetricName='ModelLatency',
                                             Dimensions=[{'Name': 'EndpointName',
                                                          'Value': predictor.endpoint_name},
                                                         {'Name': 'VariantName',
                                                          'Value': "AllTraffic"}],
                                             Namespace="AWS/SageMaker",
                                             StartTime=cw_start,
                                             EndTime=cw_end,
                                             Period=period,
                                             Statistics=statistics,
                                             ExtendedStatistics=extended
                                             )
  # SampleCount should equal number_of_runs
  if len(model_latency_metrics['Datapoints']) > 0:
    print('{} latency datapoints ready'.format(model_latency_metrics['Datapoints'][0]['SampleCount']))
    # ModelLatency is reported in microseconds; divide by 1000 to get milliseconds
    side_avg = model_latency_metrics['Datapoints'][0]['Average'] / 1000
    side_p50 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p50'] / 1000
    side_p90 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p90'] / 1000
    side_p95 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p95'] / 1000
    side_p100 = model_latency_metrics['Datapoints'][0]['ExtendedStatistics']['p100'] / 1000
    
    print(f'50th Percentile Latency:{side_p50:.1f} ms')
    print(f'90th Percentile Latency:{side_p90:.1f} ms')
    print(f'95th Percentile Latency:{side_p95:.1f} ms\n')

    cloudwatch_ready = True
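
Two small calculations in the cell above are easy to get wrong: the CloudWatch period, which is rounded up to a multiple of 60 seconds so that a single datapoint covers the whole test, and the unit conversion, since `ModelLatency` is reported in microseconds. The sketch below isolates both steps; the helper names `cloudwatch_period_seconds` and `microseconds_to_ms` and the sample timestamps are illustrative only.

```python
import datetime
import math

def cloudwatch_period_seconds(start, end):
    """Round the elapsed time up to the next multiple of 60 seconds."""
    elapsed = (end - start).total_seconds()
    # Use at least one full minute, even for a very short window
    return max(1, math.ceil(elapsed / 60)) * 60

def microseconds_to_ms(value_us):
    """Convert a latency in microseconds to milliseconds."""
    return value_us / 1000.0

# Example window: a 2.5 minute test plus the 5 minute buffer
start = datetime.datetime(2023, 6, 5, 6, 0, 0)
end = start + datetime.timedelta(seconds=7 * 60 + 30)

period = cloudwatch_period_seconds(start, end)
latency_ms = microseconds_to_ms(12500)
print(period, latency_ms)
```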




### Cleanup
Endpoints should be deleted when no longer in use, to avoid costs.

In [None]:
predictor.delete_endpoint()