Can I use run_lm_finetuning.py for training models in an uncovered language? #2301

@cppntn

Is it possible to use the run_lm_finetuning.py script to train one of the models from scratch in a language not covered by the available pretrained models (for example Spanish, Italian, or German)?

My idea is to replicate something like CamemBERT for a language other than French, given that I have the corpora needed for training.

What suggestions can you give me? What changes would I need to make to the script for it to run correctly for this purpose? And how can I deal with a corpus of ~150GB?
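On the corpus size: the only approach I can think of so far is to stream the raw text and split it into smaller shard files, so that nothing ever needs the full ~150GB in memory at once. Below is a minimal sketch of that idea using only the Python standard library. It assumes the corpus is a single UTF-8 plain-text file with one sentence or document per line; the function names `iter_shards` and `write_shards`, the file names, and the shard size are all hypothetical. I have not checked whether run_lm_finetuning.py can consume shards like these without changes.

```python
from pathlib import Path
from typing import Iterator, List


def iter_shards(corpus_path: str, max_bytes: int) -> Iterator[List[str]]:
    """Yield lists of lines, each list holding at most about max_bytes of text.

    A single line longer than max_bytes still gets a shard of its own.
    """
    shard: List[str] = []
    size = 0
    with open(corpus_path, encoding="utf-8") as handle:
        for line in handle:  # iterates lazily; the file is never read whole
            line_bytes = len(line.encode("utf-8"))
            if shard and size + line_bytes > max_bytes:
                yield shard
                shard, size = [], 0
            shard.append(line)
            size += line_bytes
    if shard:  # flush the last, possibly smaller, shard
        yield shard


def write_shards(corpus_path: str, out_dir: str, max_bytes: int) -> List[Path]:
    """Write each shard to out_dir/shard_00000.txt, shard_00001.txt, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, shard in enumerate(iter_shards(corpus_path, max_bytes)):
        path = out / f"shard_{index:05d}.txt"
        with open(path, "w", encoding="utf-8", newline="") as handle:
            handle.writelines(shard)
        paths.append(path)
    return paths
```

For example, `write_shards("corpus.txt", "shards", max_bytes=1_000_000_000)` would aim for shards of roughly 1GB each. Would something like this be a reasonable starting point, or is there a better-supported way to feed a corpus this large to the script?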

Thanks for any help
