Conversation
|
In general, it would be great if we could dump the raw latents into a GGUF file or something similar. Implementation-wise, it would work like an image input/output where the VAE is skipped. |
Or embed them as metadata in a preview image. Would be useful for i2i steps, too. |
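As a rough illustration of what dumping raw latents could look like, here is a minimal Python sketch that packs a latent tensor's shape and float32 values into bytes and reads them back, with no VAE involved. The `LATN` tag, the header layout, and the `dump_latent` / `load_latent` names are all hypothetical; this is not GGUF and not anything the project implements.

```python
import math
import struct

# Hypothetical 4-byte tag for this sketch only; not a real file format.
MAGIC = b"LATN"


def dump_latent(shape, values):
    """Pack a latent tensor (shape + flat float32 values) into bytes."""
    count = math.prod(shape)
    if count != len(values):
        raise ValueError("shape does not match the number of values")
    # Header: tag, number of dimensions, then each dimension (little-endian uint32).
    header = MAGIC + struct.pack("<I", len(shape)) + struct.pack(f"<{len(shape)}I", *shape)
    # Payload: the values as little-endian float32, in flat order.
    payload = struct.pack(f"<{count}f", *values)
    return header + payload


def load_latent(blob):
    """Inverse of dump_latent: return (shape, values) from packed bytes."""
    if blob[:4] != MAGIC:
        raise ValueError("not a latent dump produced by dump_latent")
    (ndim,) = struct.unpack_from("<I", blob, 4)
    shape = struct.unpack_from(f"<{ndim}I", blob, 8)
    offset = 8 + 4 * ndim
    count = math.prod(shape)
    values = struct.unpack_from(f"<{count}f", blob, offset)
    return tuple(shape), list(values)


if __name__ == "__main__":
    # Toy latent with 4 channels and a 2x2 spatial grid; real latents are much larger.
    latent_shape = (1, 4, 2, 2)
    latent_values = [0.5 * i for i in range(16)]
    blob = dump_latent(latent_shape, latent_values)
    shape, values = load_latent(blob)
    print(shape, len(values), len(blob))
```

The same bytes could in principle be base64-encoded into a preview image's metadata field, which would cover the i2i case mentioned above; that part is left out of the sketch.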
|
I once considered implementing a similar mechanism, but I felt it would expose too many internals. The detail lost when reconstructing the latent through VAE encoding is also within an acceptable range. |
|
For anyone else trying: even with @leejet , would it be possible to split the compute graph even further? It's unfortunately out of reach for Vulkan right now. |
|
Did you not use |
@leejet , I did. On ROCm I get:
full log
Vulkan is similar. This is with
full log
Edit: running with
|
The paper showcases the advantages of decoding directly from the latent:
Upscaling from pixels is certainly useful, but the original focus is still on replacing the VAE. I am looking forward to having it implemented! |
|
Update: I was missing Vulkan is crashing with a ggml assert, seemingly independent of memory usage. I'll need to debug it further.
|
Summary
Related Issue / Discussion
N/A
Additional Information
Examples
before:

after:

Checklist