AssertionError: Input embedding matrix must match size: 250000 x 300, found torch.Size([100000, 300]) #1285

zhaolinlee · 2023-09-19T08:45:37Z

Describe the bug
A clear and concise description of what the bug is.

To Reproduce

2023-09-19 16:26:47 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.5.1.json: 328kB [00:00, 1.08MB/s]
2023-09-19 16:26:48 INFO: "zh" is an alias for "zh-hans"
2023-09-19 16:26:49 INFO: Loading these models for language: zh-hans (Simplified_Chinese):
===================================
| Processor    | Package          |
-----------------------------------
| tokenize     | gsdsimp          |
| pos          | gsdsimp_charlm   |
| lemma        | gsdsimp_nocharlm |
| constituency | ctb-51_charlm    |
| depparse     | gsdsimp_charlm   |
| sentiment    | ren              |
| ner          | ontonotes        |
===================================

2023-09-19 16:26:49 INFO: Using device: cpu
2023-09-19 16:26:49 INFO: Loading: tokenize
2023-09-19 16:26:49 INFO: Loading: pos
2023-09-19 16:26:49 INFO: Loading: lemma
2023-09-19 16:26:50 INFO: Loading: constituency
2023-09-19 16:26:50 INFO: Loading: depparse
2023-09-19 16:26:50 INFO: Loading: sentiment
2023-09-19 16:26:50 INFO: Loading: ner
Traceback (most recent call last):
  File "run_700_Stanford.py", line 8, in <module>
    nlp = stanza.Pipeline('zh', use_gpu=False, ner_pretrain_path=ner_pretrain_path)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\pipeline\core.py", line 296, in __init__
    self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\pipeline\processor.py", line 193, in __init__
    self._set_up_model(config, pipeline, device)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\pipeline\ner_processor.py", line 52, in _set_up_model
    trainer = Trainer(args=args, model_file=model_path, pretrain=pretrain, device=device, foundation_cache=pipeline.foundation_cache)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\models\ner\trainer.py", line 66, in __init__
    self.load(model_file, pretrain, args, foundation_cache)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\models\ner\trainer.py", line 161, in load
    self.model = NERTagger(self.args, self.vocab, emb_matrix=emb_matrix, foundation_cache=foundation_cache)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\models\ner\model.py", line 52, in __init__
    self.init_emb(emb_matrix)
  File "C:\Users\lizhaolin\AppData\Roaming\Python\Python38\site-packages\stanza\models\ner\model.py", line 123, in init_emb
    assert emb_matrix.size() == (vocab_size, dim), \
AssertionError: Input embedding matrix must match size: 250000 x 300, found torch.Size([100000, 300])
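
The assertion above compares the shape of the pretrained embedding matrix with the `(vocab_size, dim)` shape the saved NER model expects. Here the model expects 250000 rows but the supplied embedding has 100000, so the dimension (300) agrees and only the vocabulary size differs. The following is a minimal sketch of that kind of check in plain Python, not stanza's actual implementation; `check_embedding_shape` and `describe_mismatch` are hypothetical helper names introduced for illustration.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part of stanza.

def check_embedding_shape(emb_shape, vocab_size, dim):
    """Raise AssertionError when the embedding shape differs from (vocab_size, dim)."""
    expected = (vocab_size, dim)
    assert tuple(emb_shape) == expected, (
        "Input embedding matrix must match size: "
        f"{vocab_size} x {dim}, found {tuple(emb_shape)}"
    )


def describe_mismatch(emb_shape, vocab_size, dim):
    """Return a list of human-readable reasons why the shapes disagree."""
    rows, cols = emb_shape
    problems = []
    if rows != vocab_size:
        problems.append(f"vocabulary rows: expected {vocab_size}, got {rows}")
    if cols != dim:
        problems.append(f"embedding dim: expected {dim}, got {cols}")
    return problems


# Shapes taken from the traceback: the model expects 250000 x 300,
# while the loaded embedding is 100000 x 300.
for line in describe_mismatch((100000, 300), 250000, 300):
    print(line)
```

A row-count mismatch like this usually means the embedding file being loaded is not the one the model was trained with, rather than that the file is corrupted.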

Expected behavior

"stanza_resources" had already been downloaded when the code was run.

Environment (please complete the following information):

OS: Windows 11
Python version: 3.8.6
Stanza version: 1.5.1

Additional context
Thank you very much for your help and look forward to your reply.

AngledLuffa · 2023-09-19T12:37:56Z

Sorry for the inconvenience. That should now be fixed.

zhaolinlee added the bug label Sep 19, 2023

AngledLuffa added a commit that referenced this issue Sep 19, 2023

Resources were built pointing to the wrong ZH embedding for the NER model: #1285

82a0215

AngledLuffa closed this as completed Sep 20, 2023
