# FAIRSeq in Amazon SageMaker: Translation task - German to English - Distributed / multi machine training

The Facebook AI Research (FAIR) lab has made its state-of-the-art sequence-to-sequence models available through the [FAIRSeq toolkit](https://github.com/pytorch/fairseq).

In this notebook, we will show you how to train a German to English translation model using a fully convolutional architecture on multiple GPUs and machines.

## Permissions

Running this notebook requires permissions in addition to the regular SageMakerFullAccess permissions, because it creates new repositories in Amazon ECR. The easiest way to add these permissions is to attach the managed policy AmazonEC2ContainerRegistryFullAccess to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this; the new permissions are available immediately.

## Prepare dataset

To train the model, we will be using the IWSLT'14 dataset as described [here](https://github.com/pytorch/fairseq/tree/master/examples/translation#prepare-iwslt14sh). This dataset was used in the IWSLT'14 German to English translation task: ["Report on the 11th IWSLT evaluation campaign" by Cettolo et al](http://workshop2014.iwslt.org/downloads/proceeding.pdf).

First, we'll download the dataset and start the pre-processing. Among other steps, this pre-processing cleans the tokens and applies BPE encoding, as you can see [here](https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-iwslt14.sh).
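To make the BPE step less abstract, here is a toy sketch of how byte pair encoding learns merge rules: it repeatedly fuses the most frequent pair of adjacent symbols. This is only an illustration under simplifying assumptions (the function name `learn_bpe` and the `</w>` end-of-word marker are choices made for this sketch); it is not the code that the preparation script runs.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["low", "low", "lower", "lowest"], 2))
```

Frequent character sequences end up as single symbols, which is what lets a small vocabulary cover rare and compound words.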

In [1]:
%%sh
cd data
chmod +x prepare-iwslt14.sh

# Download dataset and start pre-processing
./prepare-iwslt14.sh

Cloning Moses github repository (for tokenization scripts)...
Cloning Subword NMT repository (for BPE pre-processing)...
Downloading data from https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz...
Data successfully downloaded.
de-en/
de-en/IWSLT14.TED.dev2010.de-en.de.xml
de-en/IWSLT14.TED.dev2010.de-en.en.xml
de-en/IWSLT14.TED.tst2010.de-en.de.xml
de-en/IWSLT14.TED.tst2010.de-en.en.xml
de-en/IWSLT14.TED.tst2011.de-en.de.xml
de-en/IWSLT14.TED.tst2011.de-en.en.xml
de-en/IWSLT14.TED.tst2012.de-en.de.xml
de-en/IWSLT14.TED.tst2012.de-en.en.xml
de-en/IWSLT14.TEDX.dev2012.de-en.de.xml
de-en/IWSLT14.TEDX.dev2012.de-en.en.xml
de-en/README
de-en/train.en
de-en/train.tags.de-en.de
de-en/train.tags.de-en.en
pre-processing train data...


pre-processing valid/test data...
orig/de-en/IWSLT14.TED.dev2010.de-en.de.xml iwslt14.tokenized.de-en/tmp/IWSLT14.TED.dev2010.de-en.de

orig/de-en/IWSLT14.TED.tst2010.de-en.de.xml iwslt14.tokenized.de-en/tmp/IWSLT14.TED.tst2010.de-en.de

orig/de-en/IWSLT14

Cloning into 'mosesdecoder'...
Cloning into 'subword-nmt'...
--2019-06-28 21:12:43--  https://wit3.fbk.eu/archive/2014-01/texts/de/en/de-en.tgz
Resolving wit3.fbk.eu (wit3.fbk.eu)... 217.77.80.8
Connecting to wit3.fbk.eu (wit3.fbk.eu)|217.77.80.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19982877 (19M) [application/x-gzip]
Saving to: ‘de-en.tgz’


The next step is the second stage of pre-processing, which binarizes the dataset for the source and target languages. Full information on this script is available [here](https://github.com/pytorch/fairseq/blob/master/preprocess.py).

In [2]:
%%sh

# First we download fairseq in order to have access to the scripts
git clone https://github.com/pytorch/fairseq.git fairseq-git
cd fairseq-git

# Binarize the dataset:
TEXT=../data/iwslt14.tokenized.de-en
python preprocess.py --source-lang de --target-lang en \
  --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
  --destdir ../data/iwslt14.tokenized.de-en

Namespace(alignfile=None, cpu=False, criterion='cross_entropy', dataset_impl='cached', destdir='../data/iwslt14.tokenized.de-en', fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=1000, lr_scheduler='fixed', memory_efficient_fp16=False, min_loss_scale=0.0001, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer='nag', padding_factor=8, seed=1, source_lang='de', srcdict=None, target_lang='en', task='translation', tbmf_wrapper=False, tensorboard_logdir='', testpref='../data/iwslt14.tokenized.de-en/test', tgtdict=None, threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, trainpref='../data/iwslt14.tokenized.de-en/train', user_dir=None, validpref='../data/iwslt14.tokenized.de-en/valid', workers=1)
| [de] Dictionary: 8847 types
| [de] ../data/iwslt14.tokenized.de-en/train.de: 160239 sents, 4035591 tokens, 0.0% replaced by <unk>
| [de] Dictionary: 8847 types
| [de] ../data

Cloning into 'fairseq-git'...
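As a rough mental model of what binarization does, the sketch below builds a token-to-id dictionary from training sentences and then encodes a sentence as a list of integers, falling back to an `<unk>` id for unseen tokens. The helper names and the single special symbol are assumptions made for this illustration; the real `preprocess.py` uses its own dictionary class and binary file format.

```python
from collections import Counter

UNK = "<unk>"  # placeholder for tokens missing from the dictionary


def build_dictionary(sentences):
    """Assign integer ids to tokens, most frequent tokens first."""
    counts = Counter(tok for s in sentences for tok in s.split())
    index = {UNK: 0}
    for tok, _ in counts.most_common():
        index[tok] = len(index)
    return index


def binarize(sentence, index):
    """Replace each token by its id; unknown tokens map to the <unk> id."""
    return [index.get(tok, index[UNK]) for tok in sentence.split()]


train = ["guten morgen", "guten abend"]
index = build_dictionary(train)
ids = binarize("guten tag", index)
unk_rate = ids.count(index[UNK]) / len(ids)
print(ids, unk_rate)
```

The `0.0% replaced by <unk>` figure in the output above is the same kind of ratio, computed over the whole file.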


The dataset is now ready for training one of the FAIRSeq translation models. The next step is to upload the data to Amazon S3 in order to make it available for training.

### Upload data to Amazon S3

In [3]:
import sagemaker

sagemaker_session = sagemaker.Session()
region =  sagemaker_session.boto_session.region_name
account = sagemaker_session.boto_session.client('sts').get_caller_identity().get('Account')

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/DEMO-pytorch-fairseq/datasets/iwslt14'

role = sagemaker.get_execution_role()

In [4]:
inputs = sagemaker_session.upload_data(path='data/iwslt14.tokenized.de-en', bucket=bucket, key_prefix=prefix)
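For orientation, the following sketch spells out where the data is expected to live at each stage. It assumes that `upload_data` returns an `s3://bucket/prefix` URI for a directory upload and that a single S3 URI passed to `fit` is exposed to the container as a channel named `training`; both helper functions are hypothetical and only format strings.

```python
def expected_input_uri(bucket, prefix):
    """S3 location that a directory upload is assumed to return."""
    return "s3://{}/{}".format(bucket, prefix)


def container_data_path(channel="training"):
    """Local directory where a training container reads a given input channel."""
    return "/opt/ml/input/data/{}".format(channel)


print(expected_input_uri("my-bucket", "sagemaker/DEMO-pytorch-fairseq/datasets/iwslt14"))
print(container_data_path())
```

The second path matches the `data=['/opt/ml/input/data/training']` entry that appears in the training log later in this notebook.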

Next we need to build a Docker image that contains the FAIRSeq code and register it in Amazon ECR. Amazon SageMaker pulls this image at training time to train the model and at inference time to serve predictions.

## Build FAIRSeq Translation task container

In [11]:
%%sh
chmod +x create_container.sh 

./create_container.sh pytorch-fairseq

Getting from region us-east-1 and account 578276202366
Login Succeeded
Login Succeeded
Sending build context to Docker daemon  560.4MB
Step 1/21 : FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
 ---> 65dee97b9662
Step 2/21 : ARG PYTHON_VERSION=3.6
 ---> Using cache
 ---> dbabb7c39cda
Step 3/21 : RUN apt-get update && apt-get install -y --no-install-recommends          build-essential          cmake          nginx          jq          wget          git          curl          vim          ca-certificates          libjpeg-dev          libpng-dev &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> fcac7a7c9256
Step 4/21 : RUN curl -o ~/miniconda.sh -O  https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh  &&      chmod +x ~/miniconda.sh &&      ~/miniconda.sh -b -p /opt/conda &&      rm ~/miniconda.sh &&      /opt/conda/bin/conda install -y python=$PYTHON_VERSION numpy pyyaml scipy ipython mkl mkl-include cython typing &&      /opt/conda/bin/conda install -y -c pyt

https://docs.docker.com/engine/reference/commandline/login/#credentials-store



The FAIRSeq image has been pushed into Amazon ECR, the registry from which Amazon SageMaker will be able to pull that image and launch both training and prediction. 

## Training on Amazon SageMaker



Next we will set the hyper-parameters of the model we want to train. Here we are using the recommended ones from the [FAIRSeq example](https://github.com/pytorch/fairseq/tree/master/examples/translation#prepare-iwslt14sh). The full list of available hyper-parameters can be found [here](https://fairseq.readthedocs.io/en/latest/command_line_tools.html). Note that you can pass dataset, training, and generation parameters. For the distributed backend, **gloo** is the only supported option and is set as the default.

In [6]:
hyperparameters = {
    "lr": 0.25,    
    "clip-norm": 0.1,
    "dropout": 0.2,
    "max-tokens": 4000,
    "criterion": "label_smoothed_cross_entropy",
    "label-smoothing": 0.1,
    "lr-scheduler": "fixed",
    "force-anneal": 200,
    "arch": "fconv_iwslt_de_en"
}
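The training log further down shows these hyper-parameters arriving in the container as strings and then being expanded into command-line flags. A minimal sketch of that expansion could look like the following; the function name `to_cli_args` is hypothetical, and the container's actual entry point may do this differently.

```python
def to_cli_args(hyperparameters):
    """Turn a {name: value} dict into a flat ['--name', 'value', ...] list."""
    args = []
    for name, value in hyperparameters.items():
        # Values are stringified, since command-line arguments are plain text.
        args += ["--" + name, str(value)]
    return args


print(to_cli_args({"lr": 0.25, "arch": "fconv_iwslt_de_en"}))
```

The order of the flags does not matter to the training script, which is why the log may list them in a different order than the dictionary above.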

We are ready to define the Estimator, which will encapsulate all the required parameters needed for launching the training on Amazon SageMaker. 

For training, the FAIRSeq toolkit recommends GPU instances, such as the `ml.p3` instance family [available in Amazon SageMaker](https://aws.amazon.com/sagemaker/pricing/instance-types/). In this example, we train on 2 `ml.p3.8xlarge` instances with 4 GPUs each, for a total of 8 GPU workers.

In [12]:
from sagemaker.estimator import Estimator

algorithm_name = "pytorch-fairseq"
image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)

estimator = Estimator(image,
                     role,
                     train_instance_count=2,
                     train_instance_type='ml.p3.8xlarge',
                     train_volume_size=100, 
                     output_path='s3://{}/output'.format(bucket),
                     hyperparameters=hyperparameters)
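To see how the instance count translates into distributed workers, the sketch below derives the world size, the rendezvous address, and one possible rank assignment. The host names `algo-1` and `algo-2`, the port `1112`, and the world size of 8 are taken from the training log in this notebook; the rank layout (host index times GPUs per host, plus local GPU index) is an assumption that is merely consistent with that log, and `distributed_layout` is a hypothetical helper.

```python
def distributed_layout(hosts, gpus_per_host, port=1112):
    """Return (world_size, init_method, ranks) for a multi-host, multi-GPU job."""
    ordered = sorted(hosts)
    world_size = len(ordered) * gpus_per_host
    # The first host acts as the rendezvous point for all workers.
    init_method = "tcp://{}:{}".format(ordered[0], port)
    ranks = {
        (host, gpu): host_idx * gpus_per_host + gpu
        for host_idx, host in enumerate(ordered)
        for gpu in range(gpus_per_host)
    }
    return world_size, init_method, ranks


world_size, init_method, ranks = distributed_layout(["algo-1", "algo-2"], gpus_per_host=4)
print(world_size, init_method, ranks[("algo-2", 3)])
```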

The call to fit will launch the training job and regularly report on the different performance metrics related to the training. 

In [13]:
estimator.fit(inputs=inputs)

2019-06-28 22:06:40 Starting - Starting the training job...
2019-06-28 22:06:55 Starting - Launching requested ML instances......
2019-06-28 22:07:57 Starting - Preparing the instances for training......
2019-06-28 22:09:08 Downloading - Downloading input data
2019-06-28 22:09:08 Training - Downloading the training image............
2019-06-28 22:11:02 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
Starting the training.
{'force-anneal': '200', 'criterion': 'label_smoothed_cross_entropy', 'lr': '0.25', 'dropout': '0.2', 'label-smoothing': '0.1', 'clip-norm': '0.1', 'lr-scheduler': 'fixed', 'max-tokens': '4000', 'arch': 'fconv_iwslt_de_en'}
['--force-anneal', '200', '--criterion', 'label_smoothed_cross_entropy', '--lr', '0.25', '--dropout', '0.2', '--label-smoothing', '0.1', '--clip-norm', '0.1', '--lr-scheduler', '

| distributed init (rank 7): tcp://algo-1:1112
| distributed init (rank 4): tcp://algo-1:1112
| distributed init (rank 5): tcp://algo-1:1112
| distributed init (rank 6): tcp://algo-1:1112
| initialized host algo-1 as rank 0
Namespace(arch='fconv_iwslt_de_en', beam=5, bucket_cap_mb=150, buffer_size=0, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', data=['/opt/ml/input/data/training'], ddp_backend='c10d', decoder_attention='True', decoder_embed_dim=256, decoder_embed_path=None, decoder_layers='[(256, 3)] * 3', decoder_out_embed_dim=256, device_id=0, distributed_backend='gloo', distributed_init_method='tcp://algo-1:1112', distributed_port=-1, distributed_rank=0, distributed_world_size=8, diverse_beam_groups=1, diverse_beam_strength=0.5, dropout=0.2, encoder_embed_dim=256, encoder_embed_path=None, encoder_layers='[(256, 3)] * 4', fix_batches_to_gpus=False, force_anneal=200, fp16=False, fp16_init_scale=128, keep_interval

| epoch 017 | loss 4.995 | nll_loss 3.675 | ppl 12.77 | wps 192844 | ups 6.7 | wpb 27811 | bsz 1128 | num_updates 2414 | lr 0.25 | gnorm 0.156 | clip 100% | oom 0 | wall 357 | train_wall 299
| epoch 017 | valid on 'valid' subset | valid_loss 4.81364 | valid_nll_loss 3.48223 | valid_ppl 11.18 | num_updates 2414 | best 4.81364
| epoch 018 | loss 4.945 | nll_loss 3.618 | ppl 12.27 | wps 196139 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 2556 | lr 0.25 | gnorm 0.155 | clip 100% | oom 0 | wall 378 | train_wall 317
| epoch 018 | valid on 'valid' subset | valid_loss 4.77616 | valid_nll_loss 3.44514 | valid_ppl 10.89 | num_updates 2556 | best 4.77616
| epoch 019 | loss 4.899 | nll_loss 3.564 | ppl 11.83 | wps 195428 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 2698 | lr 0.25 | gnorm 0.153 | clip 100% | oom 0 | wall 399 | train_wall 334
| epoch 019 | valid on 'valid' subset | valid_loss 4.74052 | valid_nll_loss 3.39411 | valid_ppl 10.51 | n

| epoch 041 | loss 4.417 | nll_loss 3.009 | ppl 8.05 | wps 193276 | ups 6.7 | wpb 27811 | bsz 1128 | num_updates 5822 | lr 0.25 | gnorm 0.143 | clip 100% | oom 0 | wall 860 | train_wall 721
| epoch 041 | valid on 'valid' subset | valid_loss 4.40194 | valid_nll_loss 2.99931 | valid_ppl 8.00 | num_updates 5822 | best 4.40194
| epoch 042 | loss 4.405 | nll_loss 2.996 | ppl 7.98 | wps 197481 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 5964 | lr 0.25 | gnorm 0.140 | clip 100% | oom 0 | wall 881 | train_wall 738
| epoch 042 | valid on 'valid' subset | valid_loss 4.40072 | valid_nll_loss 2.98311 | valid_ppl 7.91 | num_updates 5964 | best 4.40072
| epoch 043 | loss 4.394 | nll_loss 2.983 | ppl 7.90 | wps 195480 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 6106 | lr 0.25 | gnorm 0.141 | clip 100% | oom 0 | wall 902 | train_wall 756
| epoch 043 | valid on 'valid' subset | valid_loss 4.39715 | valid_nll_loss 2.97854 | valid_ppl 7.88 | num_upd

| epoch 065 | loss 4.221 | nll_loss 2.783 | ppl 6.88 | wps 196311 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 9230 | lr 0.25 | gnorm 0.136 | clip 100% | oom 0 | wall 1361 | train_wall 1142
| epoch 065 | valid on 'valid' subset | valid_loss 4.29163 | valid_nll_loss 2.866 | valid_ppl 7.29 | num_updates 9230 | best 4.29163
| epoch 066 | loss 4.215 | nll_loss 2.776 | ppl 6.85 | wps 194740 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 9372 | lr 0.25 | gnorm 0.135 | clip 100% | oom 0 | wall 1382 | train_wall 1159
| epoch 066 | valid on 'valid' subset | valid_loss 4.29269 | valid_nll_loss 2.86084 | valid_ppl 7.26 | num_updates 9372 | best 4.29163
| epoch 067 | loss 4.208 | nll_loss 2.768 | ppl 6.81 | wps 194554 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 9514 | lr 0.25 | gnorm 0.133 | clip 100% | oom 0 | wall 1403 | train_wall 1177
| epoch 067 | valid on 'valid' subset | valid_loss 4.28482 | valid_nll_loss 2.85499 | valid_ppl 7.23 | num

| epoch 089 | loss 4.112 | nll_loss 2.656 | ppl 6.30 | wps 195801 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 12638 | lr 0.25 | gnorm 0.128 | clip 100% | oom 0 | wall 1864 | train_wall 1564
| epoch 089 | valid on 'valid' subset | valid_loss 4.24336 | valid_nll_loss 2.79807 | valid_ppl 6.96 | num_updates 12638 | best 4.24185
| epoch 090 | loss 4.109 | nll_loss 2.652 | ppl 6.29 | wps 194820 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 12780 | lr 0.25 | gnorm 0.128 | clip 100% | oom 0 | wall 1885 | train_wall 1582
| epoch 090 | valid on 'valid' subset | valid_loss 4.24087 | valid_nll_loss 2.79213 | valid_ppl 6.93 | num_updates 12780 | best 4.24087
| epoch 091 | loss 4.104 | nll_loss 2.647 | ppl 6.27 | wps 198287 | ups 6.9 | wpb 27811 | bsz 1128 | num_updates 12922 | lr 0.25 | gnorm 0.126 | clip 99% | oom 0 | wall 1906 | train_wall 1599
| epoch 091 | valid on 'valid' subset | valid_loss 4.23347 | valid_nll_loss 2.79233 | valid_ppl 6.93

| epoch 113 | loss 4.038 | nll_loss 2.570 | ppl 5.94 | wps 194889 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 16046 | lr 0.25 | gnorm 0.123 | clip 100% | oom 0 | wall 2365 | train_wall 1986
| epoch 113 | valid on 'valid' subset | valid_loss 4.21808 | valid_nll_loss 2.76184 | valid_ppl 6.78 | num_updates 16046 | best 4.20533
| epoch 114 | loss 4.036 | nll_loss 2.568 | ppl 5.93 | wps 193966 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 16188 | lr 0.25 | gnorm 0.125 | clip 100% | oom 0 | wall 2386 | train_wall 2004
| epoch 114 | valid on 'valid' subset | valid_loss 4.2023 | valid_nll_loss 2.75497 | valid_ppl 6.75 | num_updates 16188 | best 4.2023
| epoch 115 | loss 4.033 | nll_loss 2.564 | ppl 5.92 | wps 196668 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 16330 | lr 0.25 | gnorm 0.126 | clip 99% | oom 0 | wall 2407 | train_wall 2021
| epoch 115 | valid on 'valid' subset | valid_loss 4.20354 | valid_nll_loss 2.7505 | valid_ppl 6.73 | 

| epoch 137 | loss 3.985 | nll_loss 2.508 | ppl 5.69 | wps 194455 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 19454 | lr 0.25 | gnorm 0.120 | clip 99% | oom 0 | wall 2867 | train_wall 2408
| epoch 137 | valid on 'valid' subset | valid_loss 4.1852 | valid_nll_loss 2.72947 | valid_ppl 6.63 | num_updates 19454 | best 4.18275
| epoch 138 | loss 3.982 | nll_loss 2.506 | ppl 5.68 | wps 197923 | ups 6.9 | wpb 27811 | bsz 1128 | num_updates 19596 | lr 0.25 | gnorm 0.120 | clip 99% | oom 0 | wall 2887 | train_wall 2425
| epoch 138 | valid on 'valid' subset | valid_loss 4.18651 | valid_nll_loss 2.72689 | valid_ppl 6.62 | num_updates 19596 | best 4.18275
| epoch 139 | loss 3.980 | nll_loss 2.503 | ppl 5.67 | wps 196977 | ups 6.9 | wpb 27811 | bsz 1128 | num_updates 19738 | lr 0.25 | gnorm 0.120 | clip 100% | oom 0 | wall 2908 | train_wall 2443
| epoch 139 | valid on 'valid' subset | valid_loss 4.19032 | valid_nll_loss 2.72522 | valid_ppl 6.61 |

| epoch 161 | loss 3.943 | nll_loss 2.460 | ppl 5.50 | wps 193856 | ups 6.7 | wpb 27811 | bsz 1128 | num_updates 22862 | lr 0.25 | gnorm 0.117 | clip 96% | oom 0 | wall 3367 | train_wall 2828
| epoch 161 | valid on 'valid' subset | valid_loss 4.18098 | valid_nll_loss 2.71744 | valid_ppl 6.58 | num_updates 22862 | best 4.1692
| epoch 162 | loss 3.941 | nll_loss 2.458 | ppl 5.49 | wps 194801 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 23004 | lr 0.25 | gnorm 0.119 | clip 97% | oom 0 | wall 3388 | train_wall 2845
| epoch 162 | valid on 'valid' subset | valid_loss 4.17317 | valid_nll_loss 2.71119 | valid_ppl 6.55 | num_updates 23004 | best 4.1692
| epoch 163 | loss 3.940 | nll_loss 2.457 | ppl 5.49 | wps 196551 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 23146 | lr 0.25 | gnorm 0.116 | clip 96% | oom 0 | wall 3408 | train_wall 2863
| epoch 163 | valid on 'valid' subset | valid_loss 4.17511 | valid_nll_loss 2.7108 | valid_ppl 6.55 | nu

| epoch 185 | loss 3.910 | nll_loss 2.422 | ppl 5.36 | wps 196490 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 26270 | lr 0.25 | gnorm 0.116 | clip 94% | oom 0 | wall 3869 | train_wall 3250
| epoch 185 | valid on 'valid' subset | valid_loss 4.17393 | valid_nll_loss 2.70348 | valid_ppl 6.51 | num_updates 26270 | best 4.16027
| epoch 186 | loss 3.908 | nll_loss 2.419 | ppl 5.35 | wps 195824 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 26412 | lr 0.25 | gnorm 0.116 | clip 95% | oom 0 | wall 3890 | train_wall 3267
| epoch 186 | valid on 'valid' subset | valid_loss 4.15678 | valid_nll_loss 2.69467 | valid_ppl 6.47 | num_updates 26412 | best 4.15678
| epoch 187 | loss 3.907 | nll_loss 2.418 | ppl 5.34 | wps 196470 | ups 6.8 | wpb 27811 | bsz 1128 | num_updates 26554 | lr 0.25 | gnorm 0.114 | clip 92% | oom 0 | wall 3911 | train_wall 3285
| epoch 187 | valid on 'valid' subset | valid_loss 4.16455 | valid_nll_loss 2.69942 | valid_ppl 6.50 |

Once the model has finished training, we can go ahead and test its translation capabilities by deploying it on an endpoint.
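Before moving on, it can help to know how to read the training log above. The sketch below parses one of its lines into a dictionary and checks that the reported perplexity (`ppl`) is 2 raised to the power of `nll_loss`, which holds because FAIRSeq reports losses in base 2. The parser `parse_log_line` is a hypothetical helper written for this log format, not part of FAIRSeq.

```python
LINE = "| epoch 017 | loss 4.995 | nll_loss 3.675 | ppl 12.77 | wps 192844 | ups 6.7"


def parse_log_line(line):
    """Split a '| key value | key value ...' log line into a dict of strings."""
    fields = {}
    for part in line.split("|"):
        tokens = part.split()
        # Keep only simple 'key value' segments; skip free-text segments.
        if len(tokens) == 2:
            fields[tokens[0]] = tokens[1]
    return fields


stats = parse_log_line(LINE)
ppl_from_nll = 2 ** float(stats["nll_loss"])
print(stats["ppl"], round(ppl_from_nll, 2))
```

Small differences between the two numbers come from the rounding of `nll_loss` in the log.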

## Hosting the model

We first need to define a base JSONPredictor class that will help us send prediction requests to the model once it's hosted on the Amazon SageMaker endpoint.

In [22]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(JSONPredictor, self).__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)

We can now use the estimator object to deploy the model artifacts (the trained model) on a CPU instance, as we no longer need a GPU instance simply to run inference with the model. Let's use an `ml.m5.12xlarge`.

In [32]:
# predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.12xlarge', predictor_cls=JSONPredictor)

# Per https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy,
# the endpoint reuses the training job name if no endpoint name is given.
# When retrying endpoint creation, either pass an explicit endpoint name:
nigenda_predictor = estimator.deploy(initial_instance_count=1, endpoint_name="pytorch-fairseq-20190715T14", instance_type='ml.m5.12xlarge', predictor_cls=JSONPredictor)
# or let the estimator update the existing endpoint:
# predictor = estimator.deploy(initial_instance_count=1, update_endpoint=True, instance_type='ml.m5.12xlarge', predictor_cls=JSONPredictor)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------!
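If endpoint creation has to be retried several times, hard-coding a new name each time gets tedious. One option, sketched here with a hypothetical helper `unique_endpoint_name`, is to append a timestamp to a base name, in the same spirit as the `pytorch-fairseq-20190715T14` name used above. The exact timestamp format is a choice made for this sketch.

```python
from datetime import datetime, timezone


def unique_endpoint_name(base, now=None):
    """Build an endpoint name by appending a UTC timestamp to `base`."""
    now = now or datetime.now(timezone.utc)
    return "{}-{}".format(base, now.strftime("%Y%m%dT%H%M%S"))


print(unique_endpoint_name("pytorch-fairseq"))
```

The resulting string can be passed as `endpoint_name` to `estimator.deploy`; it contains only letters, digits, and hyphens.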


Now it's your turn to play. Enter a sentence in German and get its English translation by calling `predict`.

In [36]:
import html

text_input = 'Guten Morgen'

result = nigenda_predictor.predict(text_input)
# Some characters come back HTML-escaped, so unescape them before printing
print(html.unescape(result))

it 's the same .
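The `html.unescape` call matters because tokenized output can contain HTML entities such as `&apos;` in place of punctuation. The snippet below shows the effect on a made-up raw string; the string is an assumption for illustration and not a captured response from the endpoint.

```python
import html

# Hypothetical raw response containing an HTML-escaped apostrophe.
raw = "it &apos;s the same ."
clean = html.unescape(raw)
print(clean)
```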


Once you're done getting predictions, remember to delete your endpoint, since you no longer need it.

## Delete endpoint

In [20]:
sagemaker_session.delete_endpoint(nigenda_predictor.endpoint)

Voila! For more information, you can check out the [FAIRSeq toolkit homepage](https://github.com/pytorch/fairseq). 