Some Errors... #4

Open

Aniforka opened this issue May 11, 2024 · 6 comments

Comments
@Aniforka

Aniforka commented May 11, 2024

My notebook:
Windows 11 Pro 23H2
Intel i7-8750H
GeForce GTX 1050Ti (Mobile)
32GB RAM (2666MHz)

After I removed the mention of flash_attn in gemma.py, I got the following error:
TypeError: GemmaModel.forward() got an unexpected keyword argument 'cache_position'
(and with other models as well)

After adding *args and **kwargs to all the forward() methods, another error appeared:
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

Traceback (most recent call last):
   File "d:\Programming\Python\MyGemma2B\1.py", line 42, in <module>
     generated_text = generate(
   File "d:\Programming\Python\MyGemma2B\1.py", line 17, in generate
     outputs = model(input_ids=input_segment.to(model.device), memory=memory, norm_term=norm_term)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 960, in forward
     outputs = self.model(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 783, in forward
     layer_outputs = decoder_layer(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 617, in forward
     _attended = self.self_attn(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 532, in forward
     attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

All errors occurred after "Loading checkpoint shards".
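
For the first TypeError, the problem seems to be that newer transformers versions pass extra keyword arguments (such as cache_position) into every forward() during generation. A minimal, generic sketch of a forward() that tolerates them (illustrative names, not the repo's actual code):

```python
import torch
import torch.nn as nn


class TolerantBlock(nn.Module):
    """Toy module whose forward() accepts and ignores unknown keyword arguments."""

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, cache_position=None, **kwargs):
        # cache_position (and anything else caught by **kwargs) is accepted but
        # unused, so callers that pass it no longer trigger the TypeError.
        return self.proj(hidden_states)


block = TolerantBlock()
out = block(torch.randn(1, 4, 8), cache_position=torch.arange(4))
print(out.shape)  # torch.Size([1, 4, 8])
```

This only silences the TypeError, though; the size mismatch inside scaled_dot_product_attention is a separate problem.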

@drdsgvo

drdsgvo commented May 15, 2024

I got the same error with transformers 4.40.1

@mindkrypted

Using a 3090, I left flash attention enabled and was getting the same 'cache_position' error.
As you tried, adding *args and **kwargs results in the same error message.

Kinda hard to believe that the solution under ./src was tested before release. Even the import in main.py has a typo in it: from .gemma import GemmaForCausalLM
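
For what it's worth, the RuntimeError looks like an attention mask that is one position longer than the key/value sequence (6 vs 5 at the last dimension). A generic PyTorch reproduction, with made-up shapes chosen only to mirror the message (not the repo's actual tensors), plus the kind of trimming that avoids it:

```python
import torch
import torch.nn.functional as F

B, H, q_len, kv_len, head_dim = 1, 2, 5, 5, 16
q = torch.randn(B, H, q_len, head_dim)
k = torch.randn(B, H, kv_len, head_dim)
v = torch.randn(B, H, kv_len, head_dim)

# A mask built for kv_len + 1 keys cannot broadcast against the
# [B, H, q_len, kv_len] attention scores, so SDPA fails with a size
# mismatch at dimension 3 (exact wording varies by backend/version).
too_long_mask = torch.ones(B, 1, q_len, kv_len + 1, dtype=torch.bool)
try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=too_long_mask)
except RuntimeError as e:
    print("mismatch:", e)

# Trimming the mask to the actual key length makes the call succeed.
trimmed_mask = too_long_mask[..., : k.shape[-2]]
out = F.scaled_dot_product_attention(q, k, v, attn_mask=trimmed_mask)
print(out.shape)  # torch.Size([1, 2, 5, 16])
```

In this repo the mask is built upstream of that SDPA call, so the real fix probably belongs there, but the shape relationship is the same.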

@drdsgvo

drdsgvo commented May 16, 2024

I can confirm all of the above: after fixing the parameter issues, the tensor size mismatch error appeared.
The parameter issues seem to be explained by a change in the transformers API.
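
That would match the versions mentioned here: recent transformers releases route a cache_position argument through the Gemma forward path, which older copies of the modeling code don't declare. A quick way to check what your installed version expects (assuming a transformers release that ships Gemma, i.e. >= 4.38):

```python
import inspect

import transformers
from transformers.models.gemma.modeling_gemma import GemmaModel

print(transformers.__version__)
# Should print True on the 4.40.x releases mentioned above, meaning
# GemmaModel.forward() declares (and will receive) cache_position.
print("cache_position" in inspect.signature(GemmaModel.forward).parameters)
```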

@Aniforka
Author

It feels like the code was either generated by a neural network or it wasn't tested at all before being uploaded to GitHub.

@web199195

In fact, it can't run. A lot of errors happen when running the code; the parameters and data dimensions don't match.

@mindkrypted

Might be a scam project to get some attention, either for a grant or for investors' money... Have a look at another project where this guy is being targeted for using research and work from others: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/23 -- llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!
