Can I use run_lm_finetuning.py for training models in an uncovered language? #2301

Is it possible to use the run_lm_finetuning.py script to train one of the models from scratch in a language not covered by the available pretrained models (such as Spanish, Italian, or German)?

My idea is to replicate something like CamemBERT for a language other than French, given that I have the corpora needed for training.

What suggestions could you give me? What changes does the script need in order to run correctly for this purpose? And how can I deal with a corpus of ~150GB?

Thanks for any help
