Some Errors... #4
Comments
I got the same error with transformers 4.40.1
Using a 3090, I left flash attention enabled and was getting the same 'cache_position' error. Kinda hard to believe that the solution under ./src was tested before release. Even the import in main.py has a typo in it.
I can confirm all of the above: after fixing the parameter issues, the tensor size mismatch error appeared.
It feels like the code was either generated by a neural network or not tested at all before being uploaded to GitHub.
In fact, it can't run. A lot of errors happened when running the code. The parameters and data dimensions don't match.
Might be a scam project to get some attention, either for a grant or investors' money... Have a look at another project where this guy is being targeted for using research and work from others: https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5/discussions/23 -- "llama3-V project is stealing a lot of academic work from MiniCPM-Llama3-V 2.5!"
My notebook:
Windows 11 Pro 23H2
Intel i7-8750H
GeForce GTX 1050Ti (Mobile)
32GB RAM (2666MHz)
After I removed the references to flash_attn in gemma.py, I got the following errors:
TypeError: GemmaModel.forward() got an unexpected keyword argument 'cache_position'
(and with other models also)
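For context, this TypeError usually means a newer transformers release is passing extra keyword arguments (here cache_position) into a custom forward() that does not declare them. Below is a minimal, hypothetical sketch of the kind of signature change that absorbs the extra arguments; the class and argument names are illustrative, not the repo's actual code:

```python
# Hypothetical illustration only -- not the repo's actual gemma.py.
# Newer transformers versions pass extra arguments such as `cache_position`
# into model forwards; a forward() that does not declare them raises the
# TypeError above. Accepting **kwargs swallows them:
import torch
import torch.nn as nn

class GemmaModelStub(nn.Module):
    def forward(self, input_ids: torch.Tensor, attention_mask=None, **kwargs):
        # `kwargs` silently absorbs `cache_position` and any other
        # arguments added by newer transformers releases.
        ...
```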
After adding *args and **kwargs to all the forward() methods, another error appeared:
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 3
All errors occurred after "Loading checkpoint shards".
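For what it's worth, flash attention isn't supported on Pascal cards like the GTX 1050Ti anyway, so instead of editing gemma.py it may be possible to request a different attention backend when the model is loaded. A hedged sketch, assuming the repo loads the model through the standard transformers API (the model id below is just a placeholder):

```python
# Hypothetical sketch: request a non-flash attention implementation at
# load time instead of stripping flash_attn references from gemma.py.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",            # placeholder checkpoint, not the repo's actual model
    attn_implementation="eager",  # avoid flash_attn entirely
    torch_dtype="auto",
)
```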