shimdx/sm-fairseq

Fairseq on Amazon SageMaker

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

In this repository, we show how to run Fairseq in an Amazon SageMaker training job using the PyTorch Estimator. Instead of building a custom Docker container, this example uses a shell script as the entry point of the PyTorch Estimator. The script installs the dependencies and runs the data preprocessing commands.
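
The sketch below illustrates what such a shell entry point might look like. It is not the script from this repository's notebooks: the file name `train.sh`, the `de`/`en` language pair, the `train.<lang>`/`valid.<lang>` file layout of the input channel and the fairseq training flags are illustrative assumptions. `SM_CHANNEL_TRAIN`, `SM_MODEL_DIR` and `SM_TRAINING_ENV` are environment variables set by the SageMaker training toolkit inside the training container, and hyperparameters are assumed to reach the script as `--key value` arguments.

```bash
#!/bin/bash
# Hypothetical entry point (e.g. train.sh) for the SageMaker PyTorch Estimator.
# Assumes the "train" channel contains train.<lang> and valid.<lang> text files.
set -euo pipefail

# Paths set by the SageMaker training toolkit, with local fallbacks.
DATA_DIR="${SM_CHANNEL_TRAIN:-./data}"
MODEL_DIR="${SM_MODEL_DIR:-./model}"
BIN_DIR="/tmp/data-bin"

# Illustrative defaults; override them through Estimator hyperparameters.
SRC_LANG="de"
TGT_LANG="en"
MAX_EPOCH=10

# Hyperparameters arrive as "--key value" pairs; keep only the ones used here.
parse_args() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --source-lang) SRC_LANG="$2"; shift 2 ;;
            --target-lang) TGT_LANG="$2"; shift 2 ;;
            --max-epoch)   MAX_EPOCH="$2"; shift 2 ;;
            *)             shift ;;  # ignore anything else
        esac
    done
}

main() {
    parse_args "$@"

    # 1. Install dependencies at job start instead of baking a custom image.
    pip install fairseq

    # 2. Binarize the raw parallel text with fairseq's preprocessing CLI.
    fairseq-preprocess --source-lang "$SRC_LANG" --target-lang "$TGT_LANG" \
        --trainpref "$DATA_DIR/train" --validpref "$DATA_DIR/valid" \
        --destdir "$BIN_DIR"

    # 3. Train; files written to SM_MODEL_DIR are uploaded to S3 when the job ends.
    fairseq-train "$BIN_DIR" \
        --arch transformer --optimizer adam --lr 5e-4 \
        --lr-scheduler inverse_sqrt --warmup-updates 4000 \
        --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --max-tokens 4096 --max-epoch "$MAX_EPOCH" \
        --save-dir "$MODEL_DIR"
}

# SM_TRAINING_ENV is set inside SageMaker training containers, so running
# this file elsewhere only defines the variables and functions above.
if [ -n "${SM_TRAINING_ENV:-}" ]; then
    main "$@"
fi
```

Such a script would be passed to the Estimator as `entry_point`, with hyperparameters such as `{"source-lang": "de", "max-epoch": 10}`. Installing fairseq at startup lets the job use the stock PyTorch image, at the cost of extra install time on every run.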

Example notebooks

Local Mode

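If you use local mode, we recommend adding the following script as the startup script (lifecycle configuration) of your SageMaker notebook instance. It changes Docker's data root, where images and containers are stored, to /home/ec2-user/SageMaker/.container/docker and sets the default shared memory size of containers to 10G.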

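#!/bin/bash

set -ex

DAEMON_PATH="/etc/docker"
MEMORY_SIZE=10G

# jq prints "true" if daemon.json already has a "data-root" key,
# i.e. this script has already modified the file.
FLAG=$(jq 'has("data-root")' "$DAEMON_PATH/daemon.json")

if [ "$FLAG" == true ]; then
    echo "Already revised"
else
    echo "Add data-root and default-shm-size=$MEMORY_SIZE"
    # Back up the original Docker daemon configuration.
    sudo cp "$DAEMON_PATH/daemon.json" "$DAEMON_PATH/daemon.json.bak"
    # Merge the data-root and default-shm-size keys into the existing JSON
    # (jq --arg passes the size in as a string) and write the result back as root.
    sudo cat "$DAEMON_PATH/daemon.json.bak" \
        | jq --arg shm "$MEMORY_SIZE" '. += {"data-root": "/home/ec2-user/SageMaker/.container/docker", "default-shm-size": $shm}' \
        | sudo tee "$DAEMON_PATH/daemon.json" > /dev/null
    # Restart the Docker daemon so the new settings take effect.
    sudo service docker restart
    echo "Docker Restart"
fi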
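
To confirm that the restarted daemon picked up the new data root, a small check like the following can compare the directory Docker reports against the expected path. The function name `verify_docker_config` is hypothetical; it relies only on `docker info --format '{{.DockerRootDir}}'`.

```bash
#!/bin/bash
# Hypothetical helper: check that the Docker daemon now stores its data under
# the expected directory (run after the startup script has restarted Docker).
verify_docker_config() {
    local expected_root="$1"
    local actual_root
    # DockerRootDir reflects the "data-root" setting from daemon.json.
    actual_root=$(docker info --format '{{.DockerRootDir}}')
    if [ "$actual_root" = "$expected_root" ]; then
        echo "OK: Docker data root is $actual_root"
    else
        echo "Unexpected Docker data root: $actual_root" >&2
        return 1
    fi
}
```

For example, `verify_docker_config /home/ec2-user/SageMaker/.container/docker` should print the OK message once the daemon has restarted. The new default shared memory size only applies to containers started after the restart.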

About

This repo contains example notebooks that build and run the fairseq toolkit on a SageMaker notebook instance.
