
Uploading trained model to HF/saving in HF format locally #34

Closed · NtaylorOX opened this issue Sep 26, 2023 · 8 comments

NtaylorOX commented Sep 26, 2023

Great work and lovely repo. However, I am failing to push to HF using the provided load_local_model.py script.

I have a private dataset and ran the pre-training script successfully via:

```bash
python pretrain.py name=amp_b8192_cb_o4_final arch=crammed-bert train=bert-o4 data={my_dataset}
```

Training and saving both worked fine.

But when I run the following (just trying to push to the hub, for instance):

```bash
python load_local_model.py name=amp_b8192_cb_o4_mimic_final wandb=none impl.push_to_huggingface_hub=True arch=crammed-bert train=bert-o4 dryrun=False +eval=GLUE_sane
```

I get a whole lot of missing keys when trying to load the state dict:

```
RuntimeError: Error(s) in loading state_dict for OptimizedModule:
Missing key(s) in state_dict: "_orig_mod.encoder.embedding.word_embedding.weight", "_orig_mod.encoder.embedding.pos_embedding.scale_factor", "_orig_mod.encoder.embedding.norm.weight", "_orig_mod.encoder.embedding.norm.bias", "_orig_mod.encoder.layers.0.norm1.weight",....
```

and so on.

Is there anything obvious I am missing when trying to re-load the model?


Another question: is there a straightforward way to convert the current model files into a format compatible with the HF transformers library, but locally rather than via the hub?

Any help would be much appreciated. Package info below. Python 3.10.


```
Package                  Version
------------------------ ------------
aiohttp                  3.8.5
aiosignal                1.3.1
antlr4-python3-runtime   4.9.3
asttokens                2.4.0
async-timeout            4.0.3
attrs                    23.1.0
backcall                 0.2.0
certifi                  2023.7.22
charset-normalizer       3.2.0
cmake                    3.27.4.1
comm                     0.1.4
cramming                 0.1.0
datasets                 2.14.5
debugpy                  1.8.0
decorator                5.1.1
dill                     0.3.7
einops                   0.6.1
evaluate                 0.4.0
exceptiongroup           1.1.3
executing                1.2.0
filelock                 3.12.4
frozenlist               1.4.0
fsspec                   2023.6.0
huggingface-hub          0.16.4
hydra-core               1.3.2
idna                     3.4
ipykernel                6.25.2
ipython                  8.15.0
jedi                     0.19.0
Jinja2                   3.1.2
joblib                   1.3.2
jupyter_client           8.3.1
jupyter_core             5.3.1
lit                      16.0.6
MarkupSafe               2.1.3
matplotlib-inline        0.1.6
mpmath                   1.3.0
multidict                6.0.4
multiprocess             0.70.15
nest-asyncio             1.5.7
networkx                 3.1
numpy                    1.25.2
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-cupti-cu11   11.7.101
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
nvidia-cufft-cu11        10.9.0.58
nvidia-curand-cu11       10.2.10.91
nvidia-cusolver-cu11     11.4.0.1
nvidia-cusparse-cu11     11.7.4.91
nvidia-nccl-cu11         2.14.3
nvidia-nvtx-cu11         11.7.91
omegaconf                2.3.0
packaging                23.1
pandas                   2.1.0
parso                    0.8.3
pexpect                  4.8.0
pickleshare              0.7.5
pip                      22.3.1
platformdirs             3.10.0
prompt-toolkit           3.0.39
psutil                   5.9.5
ptyprocess               0.7.0
pure-eval                0.2.2
pyarrow                  13.0.0
Pygments                 2.16.1
pynvml                   11.5.0
python-dateutil          2.8.2
pytz                     2023.3.post1
PyYAML                   6.0.1
pyzmq                    25.1.1
regex                    2023.8.8
requests                 2.31.0
responses                0.18.0
safetensors              0.3.3
scikit-learn             1.3.0
scipy                    1.11.2
setuptools               65.5.0
six                      1.16.0
stack-data               0.6.2
sympy                    1.12
threadpoolctl            3.2.0
tokenizers               0.13.3
torch                    2.0.1
tornado                  6.3.3
tqdm                     4.66.1
traitlets                5.10.0
transformers             4.33.2
triton                   2.0.0
typing_extensions        4.7.1
tzdata                   2023.3
urllib3                  2.0.4
wcwidth                  0.2.6
wheel                    0.41.2
xxhash                   3.3.0
yarl                     1.9.2
zstandard                0.21.0
```
NtaylorOX reopened this Sep 29, 2023

JonasGeiping (Owner) commented Sep 29, 2023

Ah, sorry about that - the code is a bit dumb here. You see these error messages because the converter expects the checkpoint to contain identifiers for compiled model weights, but the checkpoint was saved without them.

If you run the load_local_model.py script with impl.compile_torch=False, it should work.
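
Alternatively, a rough manual workaround along the same lines (a sketch only; the path and model object below are placeholders, and this is not an API the repo ships) would be to re-prefix the plain keys so they match what the compiled wrapper expects:

```python
import torch

def add_compile_prefix(state_dict: dict) -> dict:
    """Prefix plain-module keys with "_orig_mod." so they match the
    state-dict layout of a torch.compile'd wrapper (OptimizedModule)."""
    return {f"_orig_mod.{k}": v for k, v in state_dict.items()}

# Usage sketch ("model.pth" is a placeholder path, `model` a compiled module):
# state_dict = torch.load("model.pth", map_location="cpu")
# model.load_state_dict(add_compile_prefix(state_dict))
```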

NtaylorOX (Author)

No problem - and I think you had already included those instructions somewhere, I just overlooked them. So that is on me.

I can close this issue for now, but one quick question: I presume that, with the modified architecture, it will not currently work with the Auto classes from the HF library? I mainly ask because the model card for your model (https://huggingface.co/JonasGeiping/crammed-bert) implies that it does.

Guessing part of the model card was autogenerated?

For now, to reload a model using the crammed-bert arch, do we need to use the codebase provided here?

Thanks again for the help and great repo

JonasGeiping (Owner)

You can (if everything works correctly), provided you import the cramming package first, as shown in the documentation. It registers the model as an additional AutoModelForMaskedLM.
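
For example (a minimal sketch following that pattern; the hub ID is the model linked above):

```python
import cramming  # importing the package registers crammed-bert with the HF Auto classes
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JonasGeiping/crammed-bert")
model = AutoModelForMaskedLM.from_pretrained("JonasGeiping/crammed-bert")

# Re the earlier local-format question: once loaded this way, the standard
# HF serialization should apply ("local_dir" is a placeholder directory):
# model.save_pretrained("local_dir")
```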

NtaylorOX (Author)

You are absolutely correct - and apologies for not noticing that and probably wasting your time :). Thanks again

NtaylorOX reopened this Oct 3, 2023
JonasGeiping (Owner)

No problem

NtaylorOX (Author) commented Oct 3, 2023

Sorry to come back again, but I have now run into one (I presume final) issue. When you want to use AutoModelForSequenceClassification as defined by the cramming library, but load a model that was pre-trained with the MLM objective, it does not seem to allow adjusting num_labels via the normal argument passing.

e.g.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoModelForMaskedLM, AutoConfig

classifier_model = AutoModelForSequenceClassification.from_pretrained("JonasGeiping/crammed-bert", num_labels=2)
```

It ends up passing None to torch.nn.Linear, because the model actually looks for num_labels in its config rather than in the passed arguments:

```
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
```

Any thoughts? My main desire is to use the model in a more straightforward fashion.

**UPDATE**

For now, my crude fix is to replace the num_labels derivation inside crammed_bert.py. From:

```python
self.num_labels = self.cfg.num_labels
```

which uses the config created by the OmegaConf class,

to:

```python
self.num_labels = self.config.num_labels
```

which uses the config provided by the Auto classes.

It works, but doesn't seem ideal.
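
A slightly more defensive variant of the same idea (purely hypothetical, not the repo's actual code) would fall back to the OmegaConf value instead of replacing it outright:

```python
# Hypothetical variant: prefer the HF config's num_labels when it is set,
# and fall back to the OmegaConf cfg otherwise.
self.num_labels = getattr(self.config, "num_labels", None) or self.cfg.num_labels
```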

JonasGeiping (Owner)

Hm, that seems like a reasonable fix for now. Really though, the whole translation between the hydra config that the model was originally trained with and the config that huggingface expects is not so ideal in the long run.

NtaylorOX (Author)

Sure, it's no problem really; my use case is quite specific and I need to move away from the hydra config, is all. It's great work, and generally it meshes fine with huggingface.
