
[Bug]: Training a Model Results in an OSError Related to Model Loading #3167

Closed
ynusinovich opened this issue Mar 28, 2023 · 7 comments · Fixed by #3212
Labels
bug Something isn't working

Comments

@ynusinovich

Describe the bug

When I create a few-shot learning model by fine-tuning tars-base, the run crashes after training without saving the model to my local drive as it is supposed to.

To Reproduce

from flair.models import TARSClassifier
from flair.trainers import ModelTrainer

# 'corpus' is assumed to be a flair Corpus with labeled sentences, created earlier (omitted here)

# 1. what label do you want to predict?
label_type = 'label'

# 2. make a label dictionary
label_dict = corpus.make_label_dictionary(label_type=label_type)

# 3. start from our existing TARS base model for English
tars = TARSClassifier.load("tars-base")

# 4. switch to a new task (TARS can do multiple tasks, so you must define one)
tars.add_and_switch_to_new_task(task_name="classification",
                                label_dictionary=label_dict,
                                label_type=label_type,
                                )

# 5. initialize the text classifier trainer
trainer = ModelTrainer(tars, corpus)

# 6. start the training
trainer.train(base_path='../example_data/models/few_shot_model_flair',  # path to store the model artifacts
              learning_rate=0.02,  # use very small learning rate
              mini_batch_size=1,
              max_epochs=20,  # terminate after 20 epochs
              patience=1
              )

Expected behavior

I would expect the model to be saved to the specified folder.

Logs and Stack traces

HTTPError                                 Traceback (most recent call last)
File ~/Documents/env/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py:213, in hf_raise_for_status(response, endpoint_name)
    212 try:
--> 213     response.raise_for_status()
    214 except HTTPError as e:

File ~/Documents/env/lib/python3.9/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/None/resolve/main/tokenizer_config.json

The above exception was the direct cause of the following exception:

RepositoryNotFoundError                   Traceback (most recent call last)
File ~/Documents/env/lib/python3.9/site-packages/transformers/utils/hub.py:409, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, use_auth_token, revision, local_files_only, subfolder, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash)
    407 try:
    408     # Load from URL or cache if already cached
--> 409     resolved_file = hf_hub_download(
    410         path_or_repo_id,
    411         filename,
    412         subfolder=None if len(subfolder) == 0 else subfolder,
    413         revision=revision,
    414         cache_dir=cache_dir,
...
    434         f"'https://huggingface.co/{path_or_repo_id}' for available revisions."
    435     )

OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Screenshots

No response

Additional Context

The training completed all epochs before crashing.
The code I used is from your tutorial page, and it has worked in the past.

Environment

Versions:
Flair: 0.12.1
Pytorch: 1.13.1
Transformers: 4.25.1
GPU: False

@ynusinovich ynusinovich added the bug Something isn't working label Mar 28, 2023
@ynusinovich ynusinovich reopened this Mar 30, 2023
@ynusinovich
Author

It worked once when I moved the code from a Jupyter notebook to a .py file, but it has since stopped working again; it fails with the same error in the .py file.

@maylad31

Same here. Did you find any workaround?

@alanakbik
Collaborator

Hello, when is the error thrown? At the end of training?

@ynusinovich
Author

@maylad31 The only thing I was able to do to get it working was to downgrade to flair==0.11.3, torch==1.11.0, and transformers==4.27.4. The downgrade also required uninstalling transformer-smaller-training-vocab, since it was no longer compatible with the downgraded packages.
@alanakbik The error is thrown at the end of training, when the trained model is being saved. It seems like it was trying to reach the Hugging Face Hub, even though I specified a local folder (which exists).

@helpmefindaname
Collaborator

Hi @ynusinovich, can you share the full stack trace?
The one you shared is cut off and doesn't show any calls within the flair library, so we can't see which call raises the error.

@helpmefindaname
Collaborator

Okay, I think I've run into the exact same bug. The full stack trace is the following:

File "C:\Users\bened\PycharmProjects\project\project\classifier\tars.py", line 32, in __init__
    super().__init__(models_path / "tars-base" / "tars-base.pt")
  File "C:\Users\bened\PycharmProjects\project\project\classifier\tars.py", line 14, in __init__
    self.model = TARSClassifier.load(model_name)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\models\tars_model.py", line 919, in load
    return cast("TARSClassifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\models\tars_model.py", line 315, in load
    return cast("FewshotClassifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\nn\model.py", line 538, in load
    return cast("Classifier", super().load(model_path=model_path))
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\nn\model.py", line 164, in load
    state = load_torch_state(model_file)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\file_utils.py", line 360, in load_torch_state
    return torch.load(f, map_location="cpu")
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\torch\serialization.py", line 809, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\torch\serialization.py", line 1172, in _load
    result = unpickler.load()
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\transformer.py", line 1163, in __setstate__
    embedding = self.create_from_state(saved_config=config, **state)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\document.py", line 61, in create_from_state
    return cls(**state)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\document.py", line 47, in __init__
    TransformerEmbeddings.__init__(
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\flair\embeddings\transformer.py", line 967, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(model, add_prefix_space=True, **kwargs)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 642, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 486, in get_tokenizer_config
    resolved_config_file = cached_file(
  File "C:\Users\bened\anaconda3\envs\project\lib\site-packages\transformers\utils\hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

Having found that, the error is simple to reproduce with the following code:

from flair.models import TARSClassifier
model = TARSClassifier.load("tars-base")
model.save("local-tars-base.pt")
new_model = TARSClassifier.load("local-tars-base.pt")  # here the error happens.

Setting a breakpoint in the __setstate__ method at flair/embeddings/transformer.py:1163 while loading "tars-base" shows that the TARSClassifier stores the following Hugging Face config:

BertConfig {
  "_name_or_path": "None",
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_hidden_states": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "return_dict": false,
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

Notice that the "_name_or_path": "None" entry is wrongly stored (it should be "bert-base-uncased"). This is something we can fix in the loading function.
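
As an illustration of what such a fix could look like, here is a minimal sketch, not the actual change (the real fix landed in #3212); repair_config and the hardcoded fallback name are hypothetical:

# sketch: repair a config whose _name_or_path was serialized as the string "None",
# so that AutoTokenizer.from_pretrained can resolve a real model identifier
def repair_config(config, fallback_name="bert-base-uncased"):
    if getattr(config, "_name_or_path", None) in (None, "None"):
        config._name_or_path = fallback_name
    return config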

To keep using the model in the meantime, you can hotfix it by manually setting the following attributes directly after loading:

model.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
model.tars_embeddings.base_model_name = "bert-base-uncased"
model.tars_embeddings.name = "transformer-bert-base-uncased"
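
Put together with the reproduction snippet above, the full workaround cycle would look like this (a sketch assembled from the snippets in this thread):

from flair.models import TARSClassifier

# load the pretrained model and patch the attributes that were stored as "None"
model = TARSClassifier.load("tars-base")
model.tars_embeddings.model.config._name_or_path = "bert-base-uncased"
model.tars_embeddings.base_model_name = "bert-base-uncased"
model.tars_embeddings.name = "transformer-bert-base-uncased"

# with the patched attributes, the save/load round trip no longer raises the OSError
model.save("local-tars-base.pt")
new_model = TARSClassifier.load("local-tars-base.pt")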

@maylad31

@ynusinovich thanks for the temporary workaround. The stack trace shared above is exactly what I get.
