model : add LLADA 2.0 diffusion support #17454
base: master
Conversation
Added LLaDA2.0 support
Okay, I looked over the code a bit again: the weird tokens seem to be an issue with the diffusion CLI's rendering on Windows, and the n_ubatch behavior also looks like an issue with the diffusion CLI itself. I'll take a closer look tomorrow (;
am17an
left a comment
I didn't review all of this, but preferably there should be no LLaDA 2.0 specific stuff in diffusion-cli; it should all be covered by diffusion-cli parameters and maybe GGUF parameters. Since it already supports Dream and LLaDA 1.0 without any special-case handling, it should be possible to do this for LLaDA 2.0 as well. As it is, I don't find the sampling to be very different from the block-based sampling used in LLaDA 1.0.
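For context, the model-independent design the reviewer is describing could look roughly like the sketch below. All names here (diffusion_params, load_diffusion_params, the diffusion.* keys) are hypothetical, not llama.cpp's actual API; the point is that the CLI only ever reads generic metadata instead of branching on the model architecture, so a new model only needs to ship different parameters.

```cpp
// Minimal sketch, assuming invented GGUF-style key names: the CLI fills a
// generic parameter struct from metadata and never checks the architecture.
#include <cstdint>
#include <map>
#include <string>

struct diffusion_params {
    int32_t block_size     = 32;    // tokens generated per diffusion block
    float   conf_threshold = 0.0f;  // 0 = fixed transfer count, >0 = threshold-based
    bool    block_based    = false; // LLaDA-style block scheduling vs full-sequence
};

// Hypothetical: in practice these values would come from GGUF key/value pairs.
diffusion_params load_diffusion_params(const std::map<std::string, std::string> & kv) {
    diffusion_params p;
    if (auto it = kv.find("diffusion.block_size");     it != kv.end()) p.block_size     = std::stoi(it->second);
    if (auto it = kv.find("diffusion.conf_threshold"); it != kv.end()) p.conf_threshold = std::stof(it->second);
    if (auto it = kv.find("diffusion.block_based");    it != kv.end()) p.block_based    = (it->second == "true");
    return p;
}
```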
Is there a chance we can use LLaDA in a masking mode with multiple masks within the input context, rather than just as a standard auto-regressive model?
Thanks for the feedback! I'll give it a try (;
Also, I'll try to implement a general, model-independent EOS stop for block-based models like this, since I couldn't find any existing functionality that does the same. It should help future models too, and I'll use a GGUF parameter to mark LLaDA 2.0 for it (;
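A minimal sketch of what such a generic EOS stop could look like, with invented names (the llama_token alias and eos_early_stop are not the PR's actual code): once every position in a finished block is unmasked, scan it for EOS and truncate the sequence there, so all remaining blocks can be skipped for any block-based diffusion model.

```cpp
#include <cstdint>
#include <vector>

using llama_token = int32_t;

// Returns true (and shrinks `tokens`) if EOS appears in the fully-decoded
// range [block_start, block_end), i.e. generation can stop early.
bool eos_early_stop(std::vector<llama_token> & tokens,
                    size_t block_start, size_t block_end,
                    llama_token eos_id, llama_token mask_id) {
    for (size_t i = block_start; i < block_end && i < tokens.size(); ++i) {
        if (tokens[i] == mask_id) {
            return false; // block not fully denoised yet, can't decide
        }
        if (tokens[i] == eos_id) {
            tokens.resize(i + 1); // keep EOS, drop the rest of the context
            return true;
        }
    }
    return false;
}
```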
Okay, I think I'll need to do the same for the threshold feature, as well as for the way it truncates blocks to save time. So I'll create GGUF parameters that enable threshold-based transfer and truncation instead of the full standard pass. That way it should be fully backwards compatible and model-independent (;
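As a rough illustration of the threshold feature being discussed (the names and exact scheduling here are assumptions, not the PR's implementation): each denoising step unmasks a fixed quota of the highest-confidence positions, and when a threshold is configured, additionally unmasks everything above it, which can finish a block in fewer steps.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct candidate {
    size_t pos;  // masked position in the current block
    float  conf; // model confidence (e.g. top-1 probability) at that position
};

// Returns the positions to unmask this step.
std::vector<size_t> select_transfers(std::vector<candidate> cands,
                                     size_t n_transfer, float threshold) {
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.conf > b.conf; });

    std::vector<size_t> out;
    for (size_t i = 0; i < cands.size(); ++i) {
        const bool in_quota    = i < n_transfer;                       // fixed-count schedule
        const bool above_thres = threshold > 0.0f && cands[i].conf >= threshold;
        if (in_quota || above_thres) {
            out.push_back(cands[i].pos);
        }
    }
    return out;
}
```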
Added diffusion parameters for LLaDA 2.0 in the GGUF writer.
Removed the LLM_ARCH_LLADA2 case from the switch statement.
Added hybrid diffusion optimization support to the diffusion parameters and processing logic.
This allows the model to use the KV cache outside of the diffusion block, speeding the model up as context builds up.
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Refactor confidence calculation and transfer logic for clarity and efficiency.
Add assertion for EOS token early stopping.
int32_t batch_start_pos;

// Hybrid Diffusion: Commit previous block to KV cache
if (params.hybrid_diffusion && block_num > 0 && step == 0) {
Please keep all perf improvements like hybrid diffusion (which IMO should be called use_kv_cache or something that can be toggled via hparams) for a separate PR. This PR should just focus on getting the model in correctly.
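To make the use_kv_cache suggestion concrete, here is a schematic sketch of the idea behind the hunk above, with stubbed-out helpers (decode_range and commit_to_kv_cache are placeholders, not llama.cpp functions): once a block is fully denoised, its tokens are committed to the cache a single time, so later steps only recompute the active block instead of the whole context.

```cpp
#include <cstdint>
#include <vector>

using llama_token = int32_t;

void decode_range(const std::vector<llama_token> &, int32_t, int32_t) { /* run the model */ }
void commit_to_kv_cache(int32_t /*up_to_pos*/)                        { /* freeze prefix  */ }

void run_block(std::vector<llama_token> & tokens,
               int32_t block_start, int32_t block_end,
               int32_t n_steps, bool use_kv_cache) {
    for (int32_t step = 0; step < n_steps; ++step) {
        if (use_kv_cache && block_start > 0 && step == 0) {
            // All blocks before `block_start` are final: cache them once so the
            // per-step decode below only covers [block_start, block_end).
            commit_to_kv_cache(block_start);
        }
        const int32_t from = use_kv_cache ? block_start : 0;
        decode_range(tokens, from, block_end);
        // ... sample, select transfers, unmask positions, repeat ...
    }
}
```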
Okay, then I'll probably use that flag from the other PR as a toggle for the truncate stuff as well, so this would solve that problem too (;
Since that one is also a performance improvement, as it avoids calculating the whole block.
One question, though: should that also be a CLI arg in the other PR?
Hmm, I think there was an issue when not using truncate: for some reason the output becomes gibberish? With the --diffusion-hybrid flag (which I now use to toggle the truncate as well) it works fine, but without it, it stops with a "ΓÇ£" token 🤔
@wsbagnsv1 as I said earlier, you should keep any kv-cache related stuff for a subsequent PR. Also I would suggest formatting your comments to keep them short and precise.
Well, the issue with that part is that I don't see a way to get rid of the truncate stuff without changing that part or adding a custom flag to toggle truncation. I might be missing something, but I have no idea what /:
I'm not entirely sure what you're saying, but you can ask a question in Q&A as I'm not too familiar with the kv cache. Nevertheless, your original implementation without any "hybrid diffusion" stuff was closer to being in a state where it could be merged. So my suggestion to you would be to get that in first, before tackling optimisations.
Leave a comment explaining why you need it, and it should be fine to add truncate.
Well, in that case I would need to do the masking stuff in the diffusion CLI and use a truncate CLI arg, which would also suck /:
I'll ask in Q&A then. Thanks for your feedback though! Maybe they have an idea how to get this done the right way (;
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
I recently created feature request #16973, but decided to implement it myself.
Status: Work in Progress (WIP)
This is my first PR here. I aimed for backwards compatibility, but the implementation currently contains some temporary workarounds (such as the ubatch one in the diffusion CLI), which I will remove once the underlying issues are resolved.

Current State:
Any feedback or assistance would be appreciated!