Is it possible to use the run_lm_finetuning.py script to train one of the models from scratch in a language not covered by the available pretrained models (such as Spanish, Italian, or German)?
My idea is to replicate something like CamemBERT for a language other than French, given that I have the corpora needed for the training.
Do you have any suggestions? What changes does the script need in order to run correctly for this purpose? And how can I deal with a corpus of ~150 GB?