Since I have never run the pre-training step, I have a small question about the parallel-data input format for TRAIN_FILE=/path/to/train/file. Is a separator needed between the source and target sentences? What does the format look like?
In addition, is it possible to fine-tune the xlm-roberta-large model?
My xlm-roberta-base directory contains these files: config.json, gitattributes, pytorch_model.bin, sentencepiece.bpe.model, tokenizer.json
This error occurred while running:
10/26/2021 19:03:46 - INFO - awesome_align.tokenization_utils - Didn't find file xlm-roberta-base/vocab.txt. We won't load it.
10/26/2021 19:03:46 - INFO - awesome_align.tokenization_utils - Didn't find file xlm-roberta-base/added_tokens.json. We won't load it.
10/26/2021 19:03:46 - INFO - awesome_align.tokenization_utils - Didn't find file xlm-roberta-base/special_tokens_map.json. We won't load it.
10/26/2021 19:03:46 - INFO - awesome_align.tokenization_utils - Didn't find file xlm-roberta-base/tokenizer_config.json. We won't load it.
I would really like to continue training the xlm-roberta-base model on bilingual data via the TLM task, and I am asking for advice on how to do that.
As described in the README, the inputs should be tokenized, and each line contains a source-language sentence and its target-language translation, separated by ` ||| `. You can see some examples in the examples folder.
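Concretely, a training-file line can be split on the ` ||| ` separator like this (the sentence pair below is a made-up illustration, not from the repository's example data):

```python
# One line of the parallel training file: a tokenized source sentence
# and its tokenized translation, separated by " ||| ".
line = "das ist ein Haus ||| this is a house"

# Splitting on the separator recovers the two sides of the pair.
src, tgt = line.split(" ||| ")
print(src)  # das ist ein Haus
print(tgt)  # this is a house
```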
I haven't fine-tuned xlm-roberta-large before, but I think you can use the code in the xlmr branch, tune some parameters (e.g., align_layer, learning_rate, max_steps), and see if you can get reasonable performance.
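For reference, a training invocation on the xlmr branch could look roughly like the sketch below. This is only an assumption modeled on the mBERT training command in the README with the model name swapped; the exact flags and values accepted by the branch may differ, so check run_train.py --help before relying on them:

```shell
# Hypothetical sketch: paths and hyperparameters are placeholders,
# and the flags are assumed from the mBERT example, not verified
# against the xlmr branch.
TRAIN_FILE=/path/to/train/file
OUTPUT_DIR=/path/to/output/directory

CUDA_VISIBLE_DEVICES=0 python run_train.py \
    --output_dir=$OUTPUT_DIR \
    --model_name_or_path=xlm-roberta-large \
    --do_train \
    --train_tlm \
    --train_data_file=$TRAIN_FILE \
    --per_gpu_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --learning_rate 2e-5 \
    --max_steps 20000
```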