
feat(workflows): add Roboflow text-image-pairs model block#2261

Open
joaomarcoscrs wants to merge 5 commits into main from feat/roboflow-text-image-pairs-block

Conversation

@joaomarcoscrs
Contributor

@joaomarcoscrs joaomarcoscrs commented Apr 23, 2026

What

New Workflow block roboflow_core/roboflow_text_image_pairs_model@v1 ("Multimodal Model") — the missing sibling to the other Roboflow project-type blocks. Dispatches hosted (or local) inference for fine-tuned text-image-pairs models by model_id. Supported architectures server-side: PaliGemma 2, Florence 2, Qwen 2.5 VL, Qwen 3 VL, Qwen 3.5, SmolVLM2, SmolVLM 256M.


Why

/infer/lmm/{model_id} already serves these models end-to-end. The only gap was a Workflow block wrapper — these project types couldn't be used in workflows despite the backend being ready.

Shape

  • Inputs: images, model_id, optional prompt, disable_active_learning, active_learning_target_dataset
  • Output: single response of kind LANGUAGE_MODEL_OUTPUT_KIND — raw pass-through, no parsing
  • Local path: LMMInferenceRequest (AL controls honored)
  • Remote path: InferenceHTTPClient.infer_lmm(model_id_in_path=True) with InferenceConfiguration(source="workflow-execution", ...)
  • Composes with vlm_as_detector / vlm_as_classifier downstream for Florence-2 fine-tunes today
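The local/remote dispatch described above can be sketched as follows. The manager and client classes here are simplified stand-ins, not the real `ModelManager` / `InferenceHTTPClient` APIs; only the control flow (per-image dispatch by execution mode, raw pass-through of `response`) is taken from the PR description:

```python
# Simplified stand-ins for the real ModelManager and InferenceHTTPClient;
# class and method bodies are illustrative, not the inference SDK API.
class LocalModelManager:
    def infer_from_request_sync(self, model_id, request):
        return {"response": f"local:{model_id}:{request['prompt']}"}


class RemoteClient:
    def infer_lmm(self, inference_input, model_id):
        return {"response": f"remote:{model_id}"}


def run_block(step_execution_mode, images, model_id, prompt,
              manager=None, client=None):
    """Mirror of the block's dispatch: one raw response per image,
    passed through unparsed as LANGUAGE_MODEL_OUTPUT_KIND."""
    predictions = []
    for image in images:
        if step_execution_mode == "local":
            result = manager.infer_from_request_sync(
                model_id=model_id,
                request={"image": image, "prompt": prompt or ""},
            )
        else:
            result = client.infer_lmm(inference_input=image, model_id=model_id)
        predictions.append({"response": result["response"]})
    return predictions


preds = run_block("local", ["img-0"], "project/3", "describe",
                  manager=LocalModelManager())
```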

Known limitations (documented in code)

inference_sdk.infer_lmm doesn't yet accept AL kwargs or honor max_batch_size / max_concurrent_requests. AL controls therefore propagate only on the local path; the remote path still calls client.configure(...) so that the source field tags telemetry. SDK-side follow-up planned.

Tested

Follow-ups (separate PRs)

  • Extend inference_sdk.infer_lmm to accept AL kwargs + forward InferenceConfiguration.source
  • Add paligemma / qwen / smolvlm strategies to vlm_as_detector / vlm_as_classifier so non-Florence-2 fine-tunes get structured parsing

Note

Medium Risk
Adds a new workflow model block that issues local or remote LMM/VLM inference requests and exposes active-learning controls, which could affect runtime behavior and external API usage if misconfigured.

Overview
Adds a new Workflow block, roboflow_core/roboflow_text_image_pairs_model@v1, to run Roboflow text-image-pairs (multimodal/VLM) models by model_id and return the raw response as LANGUAGE_MODEL_OUTPUT_KIND.

The block supports both local execution (via ModelManager + LMMInferenceRequest, including active-learning flags) and remote execution (via InferenceHTTPClient.infer_lmm with workflow-specific InferenceConfiguration), and is registered in the workflow loader so it becomes available to workflows. Unit tests were added to validate the new block manifest and output contract.

Reviewed by Cursor Bugbot for commit e5010b2.

@joaomarcoscrs joaomarcoscrs self-assigned this Apr 23, 2026
prediction = self._model_manager.infer_from_request_sync(
    model_id=model_id, request=request
)
predictions.append({"response": prediction.response})

Response may be dict, breaking downstream string-expecting blocks

Medium Severity

LMMInferenceResponse.response is typed Union[str, dict], and the block passes it through raw as the response output. However, the declared output kind LANGUAGE_MODEL_OUTPUT_KIND has internal_data_type="str", and downstream blocks like vlm_as_detector call string2json(raw_json=vlm_output) which runs a regex .findall() on the value — this will fail if response is a dict (e.g. Florence-2 fine-tunes return structured dicts). The Florence-2 block avoids this by wrapping with json.dumps(). Both the local path (prediction.response) and remote path (result.get("response")) are affected.
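A minimal sketch of the fix this review implies, mirroring the json.dumps() wrapping the Florence-2 block already uses (the helper name is hypothetical):

```python
import json


def normalize_lmm_response(response):
    """Coerce a Union[str, dict] LMM response into the str form that
    LANGUAGE_MODEL_OUTPUT_KIND declares (internal_data_type="str"),
    so downstream string2json-style consumers can regex-parse it."""
    if isinstance(response, str):
        return response
    # Florence-2 fine-tunes return structured dicts; serialize them
    # the same way the Florence-2 block does.
    return json.dumps(response)
```

Applying this on both the local path (`prediction.response`) and the remote path (`result.get("response")`) would close the gap.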


Reviewed by Cursor Bugbot for commit af1d82f.

@hansent
Collaborator

hansent commented Apr 23, 2026

don't we have individual blocks for most of those?

@joaomarcoscrs
Contributor Author

@hansent we have them for the base models, not fine-tunes


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).



Reviewed by Cursor Bugbot for commit e5010b2.

    prompt=prompt or "",
    disable_active_learning=disable_active_learning,
    active_learning_target_dataset=active_learning_target_dataset,
)

AL controls silently dropped by LMMInferenceRequest

Medium Severity

LMMInferenceRequest inherits from CVInferenceRequest, which does not define disable_active_learning or active_learning_target_dataset fields. Those fields live on ObjectDetectionInferenceRequest and ClassificationInferenceRequest instead. Because Pydantic v2 defaults to extra='ignore', the two AL kwargs passed to the constructor are silently discarded. The local path therefore does not actually honor active learning controls, despite the code, comments, and PR description claiming it does.
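The silent drop can be reproduced with a minimal Pydantic v2 model. The classes below are simplified stand-ins for the real request types, not the inference codebase itself:

```python
from pydantic import BaseModel, ConfigDict


class CVInferenceRequest(BaseModel):
    # protected_namespaces=() silences the "model_" field-name warning;
    # extra="ignore" is the Pydantic v2 default, made explicit here.
    model_config = ConfigDict(extra="ignore", protected_namespaces=())
    model_id: str


class LMMInferenceRequest(CVInferenceRequest):
    prompt: str = ""
    # Note: no disable_active_learning / active_learning_target_dataset fields.


request = LMMInferenceRequest(
    model_id="project/1",
    prompt="describe",
    disable_active_learning=True,              # silently discarded
    active_learning_target_dataset="dataset",  # silently discarded
)
```

`request.model_dump()` contains only `model_id` and `prompt`; the two AL kwargs leave no trace, which is exactly the behavior the review describes.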


Reviewed by Cursor Bugbot for commit e5010b2.

@hansent
Collaborator

hansent commented Apr 23, 2026

> @hansent we have them for the base models, not fine-tunes

I think on those blocks you can set the model id to your fine-tuned model.

not opposed to having a general block like we do for object detection / classification, but I think the parameter / config on the LMM models is pretty different from model to model sometimes?
