
Unclear which transformers version should be used when testing Tatoeba #85

Closed

vrmer opened this issue Mar 16, 2022 · 6 comments

vrmer commented Mar 16, 2022

I installed all the necessary dependencies and tried running the Tatoeba task using bash scripts/train.sh "bert-base-multilingual-cased" tatoeba.

However, I immediately ran into an ImportError:

Traceback (most recent call last):
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 39, in <module>
    from bert import BertForRetrieval
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/bert.py", line 4, in <module>
    from transformers.modeling_bert import BertModel, BertPreTrainedModel
ModuleNotFoundError: No module named 'transformers.modeling_bert'

This is with transformers 4.17.0.
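
Presumably the module was moved in newer releases. A version-tolerant import along these lines (my own untested sketch, not code from the repo) would tolerate both layouts:

# Untested sketch: transformers < 4.x exposed transformers.modeling_bert at
# the top level; transformers >= 4.x moved it under transformers.models.bert.
try:
    from transformers.modeling_bert import BertModel, BertPreTrainedModel
except ModuleNotFoundError:
    from transformers.models.bert.modeling_bert import BertModel, BertPreTrainedModel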

I tried downgrading transformers to versions 3.5 and 2.0, but then I ran into other issues.

Traceback (most recent call last):
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 43, in <module>
    from xlm_roberta import XLMRobertaConfig, XLMRobertaForRetrieval, XLMRobertaModel
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/xlm_roberta.py", line 24, in <module>
    from roberta import (
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/roberta.py", line 27, in <module>
    from transformers.modeling_bert import BertEmbeddings, BertLayerNorm, BertModel, BertPreTrainedModel, gelu
ImportError: cannot import name 'BertLayerNorm' from 'transformers.modeling_bert' (/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/modeling_bert.py)

This is with transformers 3.5.
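
As far as I can tell, BertLayerNorm in transformers 2.x was just an alias for torch.nn.LayerNorm before it was removed, so a local alias like this untested sketch would restore the name for code that still imports it:

# Untested sketch, not from the xtreme repo: restore the removed alias.
try:
    from transformers.modeling_bert import BertLayerNorm  # transformers 2.x
except ImportError:
    from torch.nn import LayerNorm as BertLayerNorm  # alias dropped in 3.x+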

Traceback (most recent call last):
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 57, in <module>
    "xlmr": (XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer),
NameError: name 'XLMRobertaTokenizer' is not defined

This is with transformers 2.0.

Do you have any advice? Which transformers version is recommended for running the tests?

I don't know if it matters, but I am trying to run on Apple Silicon using the Rosetta translation layer (since faiss does not install natively).

Thank you!

@sebastianruder (Collaborator)

Hi @vrmer, we used transformers==2.3.0 as far as I am aware. Did you try running the install_tools.sh script? It should install the correct transformers version (see this line).
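
A quick way to confirm which install Python actually picks up (an ad-hoc check, nothing repo-specific):

# Ad-hoc check: print the active transformers version and its location.
import transformers
print(transformers.__version__)  # expect 2.3.0 after running install_tools.sh
print(transformers.__file__)     # should point into the environment you are using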


vrmer commented Mar 16, 2022

Thanks for the response! I ran into issues running the install_tools.sh script when I first started using the library, but I don't have the output for that at the moment.

Nevertheless, I followed the lines you pointed at and installed transformers==2.3.0. However, I still get the following error:

Traceback (most recent call last):
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 57, in <module>
    "xlmr": (XLMRobertaConfig, XLMRobertaModel, XLMRobertaTokenizer),
NameError: name 'XLMRobertaTokenizer' is not defined

@sebastianruder (Collaborator)

Hmm, that's strange. I just checked the HuggingFace Transformers repo and XLMRobertaTokenizer should be available in v2.3.0 (see here). Could you double-check that you have the correct version and that the file tokenization_xlm_roberta.py is present in the transformers version you are using?
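
Something like this (an ad-hoc sketch, not part of the repo) checks both at once:

# Ad-hoc sketch: verify the version and that the XLM-R tokenizer module
# ships with the installed transformers (it does from v2.3.0 onward).
import os
import transformers
print(transformers.__version__)
tok = os.path.join(os.path.dirname(transformers.__file__), "tokenization_xlm_roberta.py")
print(os.path.exists(tok))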


vrmer commented Mar 16, 2022

Apologies, apparently I had commented out the import statement while trying to make the code run and forgot to put it back!

Now the code starts running with this message:

03/16/2022 15:57:00 - INFO - root -   Input args: Namespace(batch_size=100, cache_dir='', candidate_prefix='candidates', concate_layers=False, config_name='', data_dir='/Users/marcellfekete/PycharmProjects/xtreme/download//tatoeba/', dist='cosine', do_lower_case=False, embed_size=768, encoding='utf-8', extract_embeds=False, gold=None, init_checkpoint=None, local_rank=-1, log_file='embed-cosine', max_answer_length=92, max_query_length=64, max_seq_length=512, mine_bitext=False, model_name_or_path='/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16', model_type='bert', no_cuda=False, num_layers=12, output_dir='/Users/marcellfekete/PycharmProjects/xtreme/outputs-temp//tatoeba//mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16_512/', overwrite_cache=False, overwrite_output_dir=False, pool_skip_special_token=False, pool_type='mean', predict_dir=None, specific_layer=7, split='training', src_embed_file=None, src_file=None, src_id_file=None, src_language='ar', src_text_file=None, src_tok_file=None, task_name='tatoeba', tgt_embed_file=None, tgt_file=None, tgt_id_file=None, tgt_language='en', tgt_text_file=None, tgt_tok_file=None, threshold=-1, tokenizer_name='', unify=False, use_shift_embeds=False)

But then it gives me this error message:

Traceback (most recent call last):
  File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 204, in get_config_dict
    raise EnvironmentError
OSError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 748, in <module>
    main()
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 733, in main
    all_src_embeds = extract_embeddings(args, src_text_file, src_tok_file, None, lang=src_lang2, pool_type=args.pool_type)
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 173, in extract_embeddings
    config, model, tokenizer, langid = load_model(args, lang,
  File "/Users/marcellfekete/PycharmProjects/xtreme/third_party/evaluate_retrieval.py", line 150, in load_model
    config = config_class.from_pretrained(args.model_name_or_path)
  File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 160, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/Users/marcellfekete/miniforge3/envs/rosetta/lib/python3.8/site-packages/transformers/configuration_utils.py", line 220, in get_config_dict
    raise EnvironmentError(msg)
OSError: Model name '/mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16' was not found in model name list. We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert//mnt/disk-1/models/squad/xlm-roberta-large_LR3e-5_EPOCH2.0_maxlen384_batchsize2_gradacc16/config.json' was a path, a model identifier, or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

I'm not even sure why it is trying to use XLM-RoBERTa when I explicitly specified multilingual BERT.

sebastianruder (Collaborator) commented Mar 16, 2022

I assume you're running the run_tatoeba.sh script? We now recommend using a model fine-tuned on SQuAD for retrieval rather than using the representations of the pre-trained model directly.
In the run_tatoeba.sh script, you can replace the path to the fine-tuned model here. If you prefer not to use a fine-tuned model, you can simply uncomment that line and things should run as expected.
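
For illustration, a sketch of why the error appears (my own example, not code from the script): from_pretrained accepts either a known model identifier or a local directory containing config.json, and the hard-coded /mnt/disk-1/... path is neither on your machine:

# Sketch: from_pretrained resolves a model identifier or a local directory;
# the OSError above means the hard-coded path does not exist locally.
from transformers import BertConfig
config = BertConfig.from_pretrained("bert-base-multilingual-cased")  # works: known identifier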

Edit: Running scripts/train.sh "bert-base-multilingual-cased" tatoeba calls the run_tatoeba.sh script.


vrmer commented Mar 16, 2022

Oh thank you, that was actually really helpful! Now the code seems to be running without issues.

I am closing the issue because it has been sorted.

vrmer closed this as completed Mar 16, 2022