Implement DeepSeek3B-MoE-A570M (LM component) #2
@sfallah I've got DeepSeek3B-MoE-A570M running with llama-cli. It generates responses, but sometimes it outputs nonsense, which doesn't happen with the original model on text-only prompts. Something is still off, even though I have double-checked the configuration and architecture. It's probably the tokenizer.

@sfallah Fixed the bug. The LM is ready now, so I'm back to working on the vision model. Let me know if you need me to focus on any particular part.

@bluebread The original PyTorch implementation is actually simple, and the same is true of get_rel_pos. FYI: I have fixed some issues that I will push before merging.

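For reference, a minimal pure-Python sketch of the index arithmetic in get_rel_pos, assuming it follows the SAM image encoder's definition and that the relative-position table already has `2 * max(q_size, k_size) - 1` rows (the PyTorch interpolation branch for other table sizes is omitted). The helper name `rel_pos_indices` is hypothetical. It shows which table row each (query, key) pair reads, which is the part that needs care when porting to ggml.

```python
def rel_pos_indices(q_size: int, k_size: int) -> list[list[int]]:
    # Scale the shorter axis so query and key coordinates share one grid.
    q_scale = max(k_size / q_size, 1.0)
    k_scale = max(q_size / k_size, 1.0)
    # Shift so the most negative offset (q=0, k=k_size-1) maps to row 0.
    offset = (k_size - 1) * k_scale
    return [
        [int(q * q_scale - k * k_scale + offset) for k in range(k_size)]
        for q in range(q_size)
    ]


def get_rel_pos(q_size, k_size, rel_pos):
    # rel_pos holds one embedding (any object) per relative distance.
    max_rel_dist = 2 * max(q_size, k_size) - 1
    if len(rel_pos) != max_rel_dist:
        # The PyTorch version linearly interpolates here; this sketch does not.
        raise ValueError(f"expected {max_rel_dist} rows, got {len(rel_pos)}")
    # Gather one table row per (query, key) pair.
    return [[rel_pos[i] for i in row] for row in rel_pos_indices(q_size, k_size)]


print(rel_pos_indices(3, 3))  # [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
```

With equal query and key sizes the index is simply `q - k + (k_size - 1)`; the scale factors only matter when the two sizes differ.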
@sfallah I've implemented these operations in the CUDA backend and opened PR ggml-org#17383 against the main repository. You can get this feature from the op-dsocr-clean branch.
|
@bluebread I am still investigating/experimenting with this. |
|
@sfallah Nice! Good idea to work around it. Where are we with the vision model? Does it run yet? FYI, you can copy-paste the code from examples/eval-callback.cpp and set the cb_eval parameter to verify that the model runs as expected.
|
@bluebread FYI: I have been using https://github.com/ggml-org/ggml/blob/master/examples/sam/sam.cpp to test a replacement for ggml_win_part, so I can test the SAM changes in isolation.
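ggml_win_part splits a feature map into non-overlapping windows, padding the bottom and right edges when the size is not a multiple of the window size, in the same spirit as SAM's window partitioning. Below is a minimal pure-Python sketch over a single-channel 2-D grid, assuming zero padding and row-major window order; `window_partition` and `window_unpartition` here are illustrative helpers, not ggml APIs. A round trip like this is a cheap way to sanity-check a replacement against the reference behavior.

```python
def window_partition(grid, window, pad_value=0):
    h = len(grid)
    w = len(grid[0]) if h else 0
    # Pad bottom/right up to the next multiple of the window size.
    pad_h = (window - h % window) % window
    pad_w = (window - w % window) % window
    hp, wp = h + pad_h, w + pad_w
    padded = [
        [grid[r][c] if r < h and c < w else pad_value for c in range(wp)]
        for r in range(hp)
    ]
    # Emit windows in row-major order over the grid of windows.
    windows = [
        [row[c0:c0 + window] for row in padded[r0:r0 + window]]
        for r0 in range(0, hp, window)
        for c0 in range(0, wp, window)
    ]
    return windows, (hp, wp)


def window_unpartition(windows, window, padded_hw, hw):
    hp, wp = padded_hw
    h, w = hw
    per_row = wp // window
    out = [[None] * wp for _ in range(hp)]
    # Write each window back to its position in the padded grid.
    for i, win in enumerate(windows):
        r0, c0 = (i // per_row) * window, (i % per_row) * window
        for dr, row in enumerate(win):
            for dc, value in enumerate(row):
                out[r0 + dr][c0 + dc] = value
    # Crop the padding back off.
    return [row[:w] for row in out[:h]]


grid = [[r * 5 + c for c in range(5)] for r in range(5)]
windows, padded = window_partition(grid, 2)
print(padded, len(windows))  # (6, 6) 9
print(windows[2])  # [[4, 0], [9, 0]]
assert window_unpartition(windows, 2, padded, (5, 5)) == grid
```

For a 64×64 map with window size 14, the grid is padded to 70×70 and split into 25 windows.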
|
@bluebread https://github.com/sfallah/llama.cpp/blob/sf/deepseek-ocr/tools/mtmd/clip.cpp#L2473 The functions are working, but clip.cpp still has an issue, so the latest commit still doesn't run.
|
@sfallah No problem. I'll take a look. |
|
@bluebread |
@sfallah It looks like the image encoder is still failing. Am I missing anything? The .gguf files were generated by the current conversion script. I would appreciate it if you could share your test settings. Edit: I am testing on the sfallah/sf/deepseek-ocr branch.
Implemented DeepSeek3B-MoE-A570M (the LM component of DeepSeek-OCR), but haven't tested it thoroughly yet.
Todo