Some Errors... #4

Open

Aniforka opened this issue May 11, 2024 · 6 comments

Comments
@Aniforka

Aniforka commented May 11, 2024

My notebook:
Windows 11 Pro 23H2
Intel i7-8750H
GeForce GTX 1050Ti (Mobile)
32GB RAM (2666MHz)

After I removed the mention of flash_attn in gemma.py, I got the following error:
TypeError: GemmaModel.forward() got an unexpected keyword argument 'cache_position'
(and with other models as well)

After adding *args and **kwargs to all the forward() methods, another error appeared:
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

Traceback (most recent call last):
   File "d:\Programming\Python\MyGemma2B\1.py", line 42, in <module>
     generated_text = generate(
   File "d:\Programming\Python\MyGemma2B\1.py", line 17, in generate
     outputs = model(input_ids=input_segment.to(model.device), memory=memory, norm_term=norm_term)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 960, in forward
     outputs = self.model(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 783, in forward
     layer_outputs = decoder_layer(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 617, in forward
     _attended = self.self_attn(
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
     return self._call_impl(*args, **kwargs)
   File "C:\Users\Anime\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
     return forward_call(*args, **kwargs)
   File "d:\Programming\Python\MyGemma2B\gemma_modified.py", line 532, in forward
     attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3

All errors occurred after "Loading checkpoint shards".
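
For the first TypeError, the problem seems to be that newer transformers versions pass extra keyword arguments (such as cache_position) into every forward() during generation. A minimal, generic sketch of a forward() that tolerates them (illustrative names, not the repo's actual code):

```python
import torch
import torch.nn as nn


class TolerantBlock(nn.Module):
    """Toy module whose forward() accepts and ignores unknown keyword arguments."""

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, cache_position=None, **kwargs):
        # cache_position (and anything else caught by **kwargs) is accepted but
        # unused, so callers that pass it no longer trigger the TypeError.
        return self.proj(hidden_states)


block = TolerantBlock()
out = block(torch.randn(1, 4, 8), cache_position=torch.arange(4))
print(out.shape)  # torch.Size([1, 4, 8])
```

This only silences the TypeError, though; the size mismatch inside scaled_dot_product_attention is a separate problem.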

@drdsgvo

drdsgvo commented May 15, 2024

I got the same error with transformers 4.40.1

@mindkrypted

Using a 3090, I left flash attention enabled and was getting the same 'cache_position' error.
As you tried, adding *args and **kwargs results in the same error message.

Kinda hard to believe that the solution under ./src was tested before release. Even the import in main.py has a typo in it: from .gemma import GemmaForCausalLM
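
For what it's worth, the RuntimeError looks like an attention mask that is one position longer than the key/value sequence (6 vs 5 at the last dimension). A generic PyTorch reproduction, with made-up shapes chosen only to mirror the message (not the repo's actual tensors), plus the kind of trimming that avoids it:

```python
import torch
import torch.nn.functional as F

B, H, q_len, kv_len, head_dim = 1, 2, 5, 5, 16
q = torch.randn(B, H, q_len, head_dim)
k = torch.randn(B, H, kv_len, head_dim)
v = torch.randn(B, H, kv_len, head_dim)

# A mask built for kv_len + 1 keys cannot broadcast against the
# [B, H, q_len, kv_len] attention scores, so SDPA fails with a size
# mismatch at dimension 3 (exact wording varies by backend/version).
too_long_mask = torch.ones(B, 1, q_len, kv_len + 1, dtype=torch.bool)
try:
    F.scaled_dot_product_attention(q, k, v, attn_mask=too_long_mask)
except RuntimeError as e:
    print("mismatch:", e)

# Trimming the mask to the actual key length makes the call succeed.
trimmed_mask = too_long_mask[..., : k.shape[-2]]
out = F.scaled_dot_product_attention(q, k, v, attn_mask=trimmed_mask)
print(out.shape)  # torch.Size([1, 2, 5, 16])
```

In this repo the mask is built upstream of that SDPA call, so the real fix probably belongs there, but the shape relationship is the same.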

@drdsgvo

drdsgvo commented May 16, 2024

I can confirm all of the above: after fixing the parameter issues, the tensor size mismatch error appeared.
The parameter issues seem to be explained by a change in the transformers API.
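
That would match the versions mentioned here: recent transformers releases route a cache_position argument through the Gemma forward path, which older copies of the modeling code don't declare. A quick way to check what your installed version expects (assuming a transformers release that ships Gemma, i.e. >= 4.38):

```python
import inspect

import transformers
from transformers.models.gemma.modeling_gemma import GemmaModel

print(transformers.__version__)
# Should print True on the 4.40.x releases mentioned above, meaning
# GemmaModel.forward() declares (and will receive) cache_position.
print("cache_position" in inspect.signature(GemmaModel.forward).parameters)
```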

@Aniforka
Author

It feels like the code was either generated by a neural network or it wasn't tested at all before being uploaded to GitHub.

@web199195

In fact, it can't run. A lot of errors happen when running the code; the parameters and data dimensions don't match.

@mindkrypted

Might be a scam project to get some attention, either for a grant or for investors' money... Have a look at another project where this guy is being targeted for using research and work from others: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/23 -- llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!
