
[Bug]: Training a Model Results in an OSError Related to Model Loading #3167

Closed
ynusinovich opened this issue Mar 28, 2023 · 7 comments · Fixed by #3212
Labels
bug Something isn't working

Comments

@ynusinovich

Describe the bug

When I create a few-shot learning model by fine-tuning tars-base, the run crashes after training without saving the model to my local drive as it is supposed to.

To Reproduce

from flair.models import TARSClassifier
from flair.trainers import ModelTrainer

# 'corpus' is assumed to be a flair Corpus with labeled sentences, created earlier (omitted here)

# 1. what label do you want to predict?
label_type = 'label'

# 2. make a label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 3. start from our existing TARS base model for English
tars = TARSClassifier.load("tars-base")

# 4. switch to a new task (TARS can do multiple tasks, so you must define one)
tars.add_and_switch_to_new_task(task_name="classification",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )

# 5. initialize the text classifier trainer
trainer = ModelTrainer(tars, corpus)

# 6. start the training
trainer.train(base_path='../example_data/models/few_shot_model_flair',  # path to store the model artifacts
              learning_rate=0.02,  # use very small learning rate
              mini_batch_size=1,
              max_epochs=20,  # terminate after 20 epochs
              patience=1
              )

Expected behavior

I would expect the model to be saved to the specified folder.

Logs and Stack traces

HTTPError                                 Traceback (most recent call last)
File ~/Documents/env/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:213, in hf_raise_for_status(response, endpoint_name)
    212 try:
--> 213     response.raise_for_status()
    214 except HTTPError as e:

File ~/Documents/env/lib/python3.9/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
File ~/Documents/env/lib/python3.9/site-packages/transformers/utils/hub.py:409, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
    407 try:
    408     # Load from URL or cache if already cached
--> 409     resolved_file = hf_hub_download(
    410         path_or_repo_id,
    411         filename,
    412         subfolder=None if len(subfolder) == 0 else subfolder,
    413         revision=revision,
    414         cache_dir=cache_dir,
...
    434         f"'https://huggingface.co/{path_or_repo_id}' for available revisions."
    435     )

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Screenshots

No response

Additional Context

The training completed all epochs before crashing.
The code I used is from your tutorial page, and it has worked in the past.

Environment

Versions:
Flair: 0.12.1
Pytorch: 1.13.1
Transformers: 4.25.1
GPU: False

@ynusinovich ynusinovich added the bug Something isn't working label Mar 28, 2023
@ynusinovich ynusinovich reopened this Mar 30, 2023
@ynusinovich
Author

It worked once when I moved the code from a Jupyter notebook to a .py file, but it has since stopped working again; it fails with the same error in the .py file.

@maylad31

Same here. Did you find any workaround?

@alanakbik
Collaborator

Hello, when is the error thrown? At the end of training?

@ynusinovich
Author

@maylad31 The only thing I was able to do to get it working was to downgrade to flair==0.11.3, torch==1.11.0, and transformers==4.27.4. The downgrade also required uninstalling transformer-smaller-training-vocab, since it was no longer compatible with the downgraded packages.
@alanakbik The error is thrown at the end of training, when the trained model is being saved. It seems like it was trying to reach the Hugging Face Hub, even though I specified a local folder (which exists).

@helpmefindaname
Collaborator

Hi @ynusinovich, can you share the full stack trace?
The one you shared is cut off and doesn't show any calls within the flair library, so we can't see which call raises the error.

@helpmefindaname
Collaborator

Okay, I think I've run into the exact same bug. The full stack trace is the following:

File "C:\Users\bened\PycharmProjects\project\project\classifier\tars.py", line 32, in __init__
    super().__init__(models_path / "tars-base" / "tars-base.pt")
  File "C:\Users\bened\PycharmProjects\project\project\classifier\tars.py", line 14, in __init__
    self.model = TARSClassifier.load(model_name)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\models\tars_model.py", line 919, in load
    return cast("TARSClassifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\models\tars_model.py", line 315, in load
    return cast("FewshotClassifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\nn\model.py", line 538, in load
    return cast("Classifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\nn\model.py", line 164, in load
    state = load_torch_state(model_file)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\file_utils.py", line 360, in load_torch_state
    return torch.load(f, map_location="cpu")
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\transformer.py", line 1163, in __setstate__
    embedding = self.create_from_state(saved_config=config, **state)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\document.py", line 61, in create_from_state
    return cls(**state)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\document.py", line 47, in __init__
    TransformerEmbeddings.__init__(
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\transformer.py", line 967, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(model, add_prefix_space=True, **kwargs)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 486, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\utils\hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Having found that, the error is simple to reproduce with the following code:

from flair.models import TARSClassifier
model = TARSClassifier.load("tars-base")
model.save("local-tars-base.pt")
new_model = TARSClassifier.load("local-tars-base.pt")  # here the error happens.

Setting a breakpoint in the __setstate__ method at flair/embeddings/transformer.py:1163 while loading "tars-base" shows that the TARSClassifier stores the following Hugging Face config:

BertConfig {
  "_name_or_path": "None",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

Notice that the "_name_or_path": "None" entry is wrongly stored (it should be "bert-base-uncased"). This is something we can fix in the loading function.
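
As an illustration of what such a fix could look like, here is a minimal sketch, not the actual change (the real fix landed in #3212); repair_config and the hardcoded fallback name are hypothetical:

# sketch: repair a config whose _name_or_path was serialized as the string "None",
# so that AutoTokenizer.from_pretrained can resolve a real model identifier
def repair_config(config, fallback_name="bert-base-uncased"):
    if getattr(config, "_name_or_path", None) in (None, "None"):
        config._name_or_path = fallback_name
    return config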

To keep using the model in the meantime, you can hotfix it by manually setting the following attributes directly after loading:

model.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
model.tars_embeddings.base_model_name = "bert-base-uncased"
model.tars_embeddings.name = "transformer-bert-base-uncased"
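
Put together with the reproduction snippet above, the full workaround cycle would look like this (a sketch assembled from the snippets in this thread):

from flair.models import TARSClassifier

# load the pretrained model and patch the attributes that were stored as "None"
model = TARSClassifier.load("tars-base")
model.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
model.tars_embeddings.base_model_name = "bert-base-uncased"
model.tars_embeddings.name = "transformer-bert-base-uncased"

# with the patched attributes, the save/load round trip no longer raises the OSError
model.save("local-tars-base.pt")
new_model = TARSClassifier.load("local-tars-base.pt")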

@maylad31

@ynusinovich thanks for the temporary workaround. The stack trace shared above is exactly what I get.
