One million tokens prompt club #24622
fairydreaming
started this conversation in
General
Replies: 0 comments
Since more and more models support a 1M-token context length (DeepSeek V4, MiMo-V2.5, MiniMax M3), let's try to get llama.cpp into better shape by running 1M-token prompts through various models and backends and reporting or fixing any errors we encounter. I attached a prompt file containing 1048572 dot characters separated by spaces. This should tokenize to 1048572 tokens (checked with DeepSeek V4, to be confirmed with others).
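For anyone who wants to regenerate the prompt or make shorter variants for bisecting, here is a minimal sketch. It assumes the file is nothing more than the dots joined by single spaces with no trailing newline; I haven't confirmed that against the attached file byte for byte. The output name prompt-1m.txt and the helper make_prompt are just placeholders I picked.

```python
from pathlib import Path

N_DOTS = 1_048_572  # dot count stated in the post above


def make_prompt(n_dots: int) -> str:
    """Return n_dots '.' characters separated by single spaces.

    Assumption: no leading/trailing whitespace or newline.
    """
    return " ".join(["."] * n_dots)


if __name__ == "__main__":
    text = make_prompt(N_DOTS)
    # Hypothetical output file name; adjust as needed.
    Path("prompt-1m.txt").write_text(text, encoding="ascii")
    print(f"wrote {len(text.split())} dots, {len(text)} bytes")
```

Passing a smaller value to make_prompt gives shorter prompts, which may help narrow down the length at which a given backend starts failing.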
I will start (DeepSeek V4 Flash, CUDA backend, CPU MoE offloading, [WIP] DeepSeek V4 branch):
Let's run it with CUDA compute-sanitizer to identify the source of the problem. I replaced the model with my 4-layer DeepSeek V4 so I won't have to wait hours for the result. I also compiled llama.cpp with
cmake .. -DGGML_CUDA=1 -DCMAKE_BUILD_TYPE=Debug -DGGML_CUDA_DEBUG=1
to have CUDA debug line info.
Now I have a starting point to investigate further and create a bug report.
Inviting @AesSedai to try this with MiMo-V2.5.
prompt-1m.zip