feat(mtmd): add Eagle2-VL multimodal support (mmproj + SigLIP pipeline) #17224
Conversation
…branch) Co-authored-by: YaelLogic <y0548591250@gmail.com>
Co-authored-by: YaelGitAccount <ya0504124870@gmail.com>
…rker logs for Eagle2-VL
…ing, and shape validation
…ing, and shape validation

- Scope strictly to Eagle2-VL with config/arch guards
- Remove temporary debug logs; preserve upstream logging semantics
- Keep tokenizer/encode path clean; no behavior change for non-vision models
…L logic isolated in clip.cpp
I don't feel comfortable with patching this much existing code just to make this model work. This PR will probably break many existing models.
Instead, you should add a dedicated conversion class, `PROJECTOR_TYPE`, and cgraph `build_...` function for it.
```py
            if fullatt_block_indexes[i] - fullatt_block_indexes[i - 1] != n_wa_pattern:
                raise ValueError(f"Invalid fullatt_block_indexes: {fullatt_block_indexes}")
            self.gguf_writer.add_vision_n_wa_pattern(n_wa_pattern)
        elif model_type in ['eagle_2_5_vl', 'eagle2_vl', 'eagle2_5_vl']:
```
Please add a dedicated conversion class and a dedicated projector type.
```cpp
@@ -12,6 +9,7 @@
#include <cassert>
#include <cmath>
#include <cstdio>
```
Unused header.
```cpp
@@ -185,6 +183,11 @@ struct clip_hparams {
    patch_merge_type mm_patch_merge_type = PATCH_MERGE_FLAT;

    int32_t patch_merge_factor = 1;
    std::string patch_merge_mode = "flat";
```
Use the `patch_merge_type` enum instead.
```cpp
@@ -185,6 +183,11 @@ struct clip_hparams {
    patch_merge_type mm_patch_merge_type = PATCH_MERGE_FLAT;

    int32_t patch_merge_factor = 1;
```
Use `n_merge` instead.
```cpp
const size_t plane_sz = (size_t) dst.nx * (size_t) dst.ny;
dst.buf.resize(3 * plane_sz); // planar RGB

for (int y = 0; y < dst.ny; ++y) {
    for (int x = 0; x < dst.nx; ++x) {
        size_t base = (size_t) y * (size_t) dst.nx + (size_t) x;
        for (int c = 0; c < 3; ++c) {
            size_t src_idx = 3ull * base + (size_t) c; // interleaved in src
            float raw = static_cast<float>(src.buf[src_idx]) / 255.0f;
            float v = (raw - mean[c]) / std[c];
            size_t dst_idx = (size_t) c * plane_sz + base; // planar in dst
            dst.buf[dst_idx] = v;
        }
    }
}
```
What happens here?
This PR adds initial support for NVIDIA's Eagle2-VL vision-language models in llama.cpp, addressing #16704. The goal is to enable GGUF conversion of the Eagle2-VL mmproj and run basic multimodal inference via `llama-mtmd-cli`, while keeping the changes fully isolated to Eagle2-VL and leaving all other models unaffected.

**What this PR does**

1. GGUF conversion: Eagle2-VL mmproj
   - Extends `convert_hf_to_gguf.py` to recognize the `Eagle2_5_VLForConditionalGeneration` architecture when `--mmproj` is used.
   - Updates `gguf/tensor_mapping.py` with the tensor mappings for the Eagle2-VL mmproj.
2. Runtime: SigLIP → mmproj → text integration in `mtmd`
   - Extends the CLIP graph in `tools/mtmd/clip.cpp` with an Eagle2-VL-specific branch that produces the `[hidden, tokens]` layout for the downstream text model.
3. Scope and safety
   - No changes to the `llama` model, kv-cache, sampling, or quantization logic.
   - Files touched: `convert_hf_to_gguf.py` (mmproj conversion), `gguf/tensor_mapping.py` (mmproj tensor mapping), `tools/mtmd/clip.cpp` (vision → projector → text graph).

**Tested models**

Conversion and inference were tested end-to-end on:

- `nvidia/Eagle2-1B`
- `nvidia/Eagle2-2B`

For both models, `convert_hf_to_gguf.py --mmproj --outtype f16` produces a GGUF mmproj that loads successfully, and `llama-mtmd-cli` can run basic multimodal inference.

At this stage, 9B support is intentionally left out of scope to keep the diff small and focused.
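For reference, the tested flow looks roughly like the following; the paths and prompt are placeholders, and the `llama-mtmd-cli` flag names should be verified against the current CLI help:

```sh
# Convert the checkpoint's vision tower + projector to a GGUF mmproj (F16).
python convert_hf_to_gguf.py /path/to/Eagle2-2B --mmproj --outtype f16

# Run basic multimodal inference (placeholder file names).
llama-mtmd-cli -m eagle2-2b-text.gguf \
    --mmproj mmproj-eagle2-2b-f16.gguf \
    --image example.jpg \
    -p "Describe this image."
```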
**Notes**

- The implementation follows the existing `mtmd` style.

**Co-authors**

This work was done together with @YaelLogic as part of a focused effort to add Eagle2-VL multimodal support to llama.cpp.