
[Bug]: Can't load older models using Byte-Pair embeddings since flair 0.14 #3513

Closed
mauryaland opened this issue Jul 26, 2024 · 1 comment
Labels: bug (Something isn't working)

mauryaland (Contributor) commented:
Describe the bug

Commit f1a4d96 introduced an error when loading a model that uses byte-pair embeddings and was trained with an older flair version.

To Reproduce

```python
from flair.models import SequenceTagger

model = SequenceTagger.load("path/to/your/model")  # model trained using byte-pair embeddings
```

Expected behavior

The model should load properly.

Logs and Stack traces

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[44], line 1
----> 1 model2 = SequenceTagger.load("new_categories_model.pt")

File ~\AppData\Local\anaconda3\lib\site-packages\flair\models\sequence_tagger_model.py:925, in SequenceTagger.load(cls, model_path)
    921 @classmethod
    922 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "SequenceTagger":
    923     from typing import cast
--> 925     return cast("SequenceTagger", super().load(model_path=model_path))

File ~\AppData\Local\anaconda3\lib\site-packages\flair\nn\model.py:564, in Classifier.load(cls, model_path)
    560 @classmethod
    561 def load(cls, model_path: Union[str, Path, Dict[str, Any]]) -> "Classifier":
    562     from typing import cast
--> 564     return cast("Classifier", super().load(model_path=model_path))

File ~\AppData\Local\anaconda3\lib\site-packages\flair\nn\model.py:190, in Model.load(cls, model_path)
    188 if not isinstance(model_path, dict):
    189     model_file = cls._fetch_model(str(model_path))
--> 190     state = load_torch_state(model_file)
    191 else:
    192     state = model_path

File ~\AppData\Local\anaconda3\lib\site-packages\flair\file_utils.py:384, in load_torch_state(model_file)
    380 # load_big_file is a workaround by https://github.com/highway11git
    381 # to load models on some Mac/Windows setups
    382 # see https://github.com/zalandoresearch/flair/issues/351
    383 f = load_big_file(model_file)
--> 384 return torch.load(f, map_location="cpu")

File ~\AppData\Local\anaconda3\lib\site-packages\torch\serialization.py:1026, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
   1024             except RuntimeError as e:
   1025                 raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
-> 1026         return _load(opened_zipfile,
   1027                      map_location,
   1028                      pickle_module,
   1029                      overall_storage=overall_storage,
   1030                      **pickle_load_args)
   1031 if mmap:
   1032     raise RuntimeError("mmap can only be used with files saved with "
   1033                        "`torch.save(_use_new_zipfile_serialization=True), "
   1034                        "please torch.save your checkpoint with this option in order to use mmap.")

File ~\AppData\Local\anaconda3\lib\site-packages\torch\serialization.py:1438, in _load(zip_file, map_location, pickle_module, pickle_file, overall_storage, **pickle_load_args)
   1436 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
   1437 unpickler.persistent_load = persistent_load
-> 1438 result = unpickler.load()
   1440 torch._utils._validate_loaded_sparse_tensors()
   1441 torch._C._log_api_usage_metadata(
   1442     "torch.load.metadata", {"serialization_id": zip_file.serialization_id()}
   1443 )

File ~\AppData\Local\anaconda3\lib\site-packages\torch\serialization.py:1431, in _load.<locals>.UnpicklerWrapper.find_class(self, mod_name, name)
   1429         pass
   1430 mod_name = load_module_mapping.get(mod_name, mod_name)
-> 1431 return super().find_class(mod_name, name)

AttributeError: Can't get attribute 'BPEmbSerializable' on <module 'flair.embeddings.token' from 'C:\\Users\\amaury.fouret\\AppData\\Local\\anaconda3\\lib\\site-packages\\flair\\embeddings\\token.py'>

Screenshots

No response

Additional Context

Maybe we could start adding the flair version used to train a model to the model's metadata, to make this kind of issue easier to diagnose and fix in the future?
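As a rough illustration of that idea (every name here is hypothetical, not part of flair's actual API), the checkpoint dictionary could carry the training-time version so that the loader can warn when a model was saved by a different release:

```python
import io
import pickle

# Hypothetical sketch: stash the training-time flair version in the
# checkpoint dict so the loader can detect old serialisation formats.
# The "__flair_version__" key and both helpers are made up for illustration.

def save_with_version(state: dict, version: str) -> bytes:
    tagged = dict(state)
    tagged["__flair_version__"] = version
    buf = io.BytesIO()
    pickle.dump(tagged, buf)
    return buf.getvalue()

def load_and_check(raw: bytes, current: str) -> dict:
    state = pickle.loads(raw)
    trained_with = state.pop("__flair_version__", "unknown")
    if trained_with != current:
        print(f"warning: model trained with flair {trained_with}, "
              f"but flair {current} is running")
    return state

blob = save_with_version({"weights": [0.1, 0.2]}, version="0.13.1")
state = load_and_check(blob, current="0.14.0")
```

A version mismatch would then produce an explicit warning instead of an opaque `AttributeError` deep inside unpickling.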

Environment

Versions:

Flair

0.14.0

Pytorch

2.2.0+cpu

Transformers

4.37.2

GPU

False

@mauryaland mauryaland added the bug Something isn't working label Jul 26, 2024
helpmefindaname (Collaborator) commented:

Hi @mauryaland,
you seem to be using the old serialisation format.
You have two options:

  1. install the optional dependencies with pip install flair[word-embeddings], or
  2. load the model with flair==0.13.* and save it again, so it is stored in the new format.
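A minimal sketch of both options as shell commands (the model paths and the exact 0.13.x patch version are placeholders; `SequenceTagger.load` and `.save` are flair's standard model load/save methods):

```shell
# Option 1: install the optional word-embedding dependencies
pip install "flair[word-embeddings]"

# Option 2: re-save the model under flair 0.13.x, then return to 0.14
pip install "flair==0.13.1"
python -c "from flair.models import SequenceTagger; \
           m = SequenceTagger.load('path/to/your/model'); \
           m.save('path/to/your/model.resaved.pt')"
pip install "flair==0.14.0"
```

With option 2, the re-saved checkpoint no longer references the removed `BPEmbSerializable` class, so it should load under 0.14.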
