This repository is inspired by some Hugging Face tutorials and contains notebooks to explore the text dataset and to train the model. There is also a notebook on how to upload the model to the Hugging Face Hub.
The main notebook describes the most relevant steps to train a Hugging Face model in AWS SageMaker, showing how to track experiments and how to deal with some of the problems that arise with custom models when using SageMaker script mode. Some basic SageMaker concepts are not covered in detail, in order to keep the focus on the relevant ones.
The following steps will be explained:
- Create an Experiment and Trial to keep track of our experiments
- Load the training data to our training instance, create the train, validation, and test datasets, and upload them to S3
- Create the scripts to train our Hugging Face model, a RoBERTa-based model pretrained on a Spanish corpus: RuPERTa
- Create an Estimator to train our model in a Hugging Face container in script mode
- Download and deploy the trained model to make predictions
- Create a Batch Transform job to make predictions for the test dataset
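The train/validation/test split from the second step can be sketched in pure Python before uploading the files to S3 (the fractions and seed below are illustrative assumptions, not the values used in the notebooks):

```python
import random

def train_val_test_split(examples, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle a list of examples and split it into train/validation/test."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # → 80 10 10
```

Each split would then be written to its own file and uploaded to the S3 bucket that the training job reads from.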
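The training, deployment, and Batch Transform steps can be sketched with the SageMaker Python SDK. This is a minimal sketch, not the exact code in the notebooks: the entry point `train.py`, the `scripts/` folder, the S3 paths, the instance types, the framework versions, and the hyperparameter names are all illustrative assumptions, and running it requires an AWS account with a suitable IAM role:

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # IAM role of the notebook instance

# Hyperparameters are passed to train.py as command-line arguments (script mode).
hyperparameters = {
    "epochs": 3,
    "train_batch_size": 32,
    "model_name": "mrm8488/RuPERTa-base",  # assumed Hub id for RuPERTa
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",        # our training script
    source_dir="./scripts",        # folder with the script and requirements.txt
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.6.1",  # must be a supported version combination
    pytorch_version="1.7.1",
    py_version="py36",
    hyperparameters=hyperparameters,
)

# Each channel becomes SM_CHANNEL_TRAIN / SM_CHANNEL_VALIDATION inside the container.
huggingface_estimator.fit({
    "train": "s3://my-bucket/data/train",
    "validation": "s3://my-bucket/data/validation",
})

# Deploy the trained model to a real-time endpoint...
predictor = huggingface_estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

# ...or run a Batch Transform job over the test dataset in S3.
transformer = huggingface_estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="SingleRecord",
)
transformer.transform("s3://my-bucket/data/test", content_type="application/json")
```

An `experiment_config` dict (with the Experiment and Trial names from the first step) can also be passed to `fit()` so the training job is recorded under that Trial.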
On development: different models are under development.
If you find a bug or a typo, please let me know, or fix it and open a pull request so it can be reviewed.
These notebooks are released under a public GNU license.