ComfyUI-Krea2TexTEncoder

Text Encode (Krea2) — vision-aware text conditioning for the Krea2 / K2 model (kreaturbo.safetensors), whose text encoder is Qwen3-VL-4B (12-layer tap).

Why this node exists

Krea2's text encoder is a vision-language model, so a reference image can be pushed through its vision path to make the conditioning visually aware of that image — a "prompt from a picture" effect. People were reaching for the core TextEncodeQwenImageEdit node to do this, which has two problems for Krea2:

The VAE input does nothing. Krea2's DiT (comfy/ldm/krea2/model.py, SingleStreamDiT) builds its token sequence as [text_tokens, noisy_image_patches] — there is no slot for a reference latent, and Krea2.extra_conds (comfy/model_base.py) never reads reference_latents. So a connected VAE produces a reference_latents entry the model silently discards. (Real pixel-faithful editing would require training a reference-latent pathway into the DiT — it can't be done from a node alone.)
Wrong template with images. With an image attached, the core node falls back to Qwen3-VL's plain image template instead of the Krea2 descriptor template the model was conditioned with.

This node fixes both: it forces the Krea2 descriptor template even with images, and it omits the VAE input entirely. It also accepts an unbounded, auto-growing set of image+mask pairs (connect image1, a fresh image2/mask2 pair appears, and so on).

Masks

Each reference image has an optional companion mask. When connected, the image is cropped to the mask's bounding box before the vision encoder, so the VLM only sees the masked region. Use mask_padding to keep surrounding context: it grows the crop box by that fraction of the image size on each side (0 = tight crop, 0.1 ≈ 10% margin, high values ≈ the whole image). The mask is used only to compute the crop — it is not itself sent to the VLM. This is reference-image masking — not inpainting: Krea2 has no concat/inpaint pathway to regenerate a masked region of the output. (To spatially restrict where the prompt applies in the output, use ComfyUI's standard ConditioningSetMask downstream of this node — that works generically at the sampler level.)

Inputs

Input	Type	Notes
`clip`	CLIP	Load with CLIPLoader → type `krea2`.
`prompt`	STRING	Your text prompt.
`image1…N`	IMAGE	Optional reference images; slots grow as you connect them.
`mask1…N`	MASK	Optional per-image mask; crops `imageN` to the masked bounding box.
`mask_padding`	FLOAT	Context kept around the mask, as a fraction of image size per side (`0` = tight, default).
`system_prompt`	STRING input	Optional. Wire a text node to override the system instruction; unconnected = Krea2's default descriptor. See below.
`vision_position`	CHOICE	Image tokens `before prompt` (default) or `after prompt` in the user turn.
`print_prompt`	BOOLEAN	Print the assembled Qwen3-VL prompt to the ComfyUI console (debug).
`vision_megapixels`	FLOAT	Max size before the vision encoder; references are downscaled to this cap, never upscaled (default `1.0`).

Output: conditioning for the Krea2 sampler. (print_prompt dumps the assembled Qwen3-VL prompt to the console for debugging.)

System prompt (making the prompt interact with the image)

system_prompt is a connectable text input, not a widget — leave it unconnected to use Krea2's trained descriptor (the in-distribution default), or wire a text node to override it. Provide just the instruction text; the node wraps it in the chat-template scaffolding.

By default the VLM is only told to describe the image, so your prompt and the reference sit side by side. To make them interact (à la Qwen-Image-Edit), feed an instruct-style instruction. The package ships a Krea2 System Prompt node preloaded with that instruction — just drop it and wire its output into system_prompt (edit the text if you like). Its default text is:

Describe the key features of the reference image (color, shape, size, texture, objects, background), then explain how the user's instruction should combine with or alter it, and generate a new image meeting the instruction while staying consistent with the reference where appropriate:

This is experimental / out-of-distribution — Krea2 was trained on the fixed descriptor, so results may drift. A/B it against the default.

Output is a standard CONDITIONING for the Krea2 sampler. With no images connected it works as a plain Krea2 text encoder.

Install

Symlink (or copy) this repo into ComfyUI's custom_nodes/, matching the existing setup:

ln -s /media/p5/ComfyUI-Krea2TexTEncoder /media/p5/Comfyui/custom_nodes/ComfyUI-Krea2TexTEncoder

Then restart ComfyUI.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
assets		assets
web		web
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
nodes.py		nodes.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ComfyUI-Krea2TexTEncoder

Why this node exists

Masks

Inputs

System prompt (making the prompt interact with the image)

Install

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ComfyUI-Krea2TexTEncoder

Why this node exists

Masks

Inputs

System prompt (making the prompt interact with the image)

Install

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages