feat(workflows): add Roboflow text-image-pairs model block #2261
joaomarcoscrs wants to merge 5 commits into main from
Conversation
…age-pairs-block # Conflicts: # inference/core/workflows/core_steps/loader.py
```python
prediction = self._model_manager.infer_from_request_sync(
    model_id=model_id, request=request
)
predictions.append({"response": prediction.response})
```
Response may be dict, breaking downstream string-expecting blocks
Medium Severity
LMMInferenceResponse.response is typed Union[str, dict], and the block passes it through raw as the response output. However, the declared output kind LANGUAGE_MODEL_OUTPUT_KIND has internal_data_type="str", and downstream blocks like vlm_as_detector call string2json(raw_json=vlm_output) which runs a regex .findall() on the value — this will fail if response is a dict (e.g. Florence-2 fine-tunes return structured dicts). The Florence-2 block avoids this by wrapping with json.dumps(). Both the local path (prediction.response) and remote path (result.get("response")) are affected.
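The failure mode and the Florence-2-style fix can be sketched in isolation. Everything below is a hedged stand-in: `parse_vlm_output` imitates the downstream regex-based extraction (the real `string2json` pattern differs), and `to_language_model_output` is a hypothetical helper showing the `json.dumps()` wrapping the Florence-2 block uses.

```python
import json
import re


def parse_vlm_output(raw_json):
    # Sketch of the downstream string2json-style extraction: it runs a
    # regex .findall() on the value, which requires str/bytes — so a dict
    # response raises TypeError before any parsing happens.
    matches = re.findall(r"\{[\s\S]*\}", raw_json)
    return json.loads(matches[0]) if matches else None


def to_language_model_output(response):
    # The fix used by the Florence-2 block: serialize structured dict
    # responses to a JSON string before passing them downstream.
    return json.dumps(response) if isinstance(response, dict) else response
```

With the wrapper in place, dict responses survive the round trip; without it, the regex step raises `TypeError` on non-string input.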
Reviewed by Cursor Bugbot for commit af1d82f.
don't we have individual blocks for most of those?
@hansent we have for the base models, not fine-tunes |
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit e5010b2.
```python
prompt=prompt or "",
disable_active_learning=disable_active_learning,
active_learning_target_dataset=active_learning_target_dataset,
)
```
AL controls silently dropped by LMMInferenceRequest
Medium Severity
LMMInferenceRequest inherits from CVInferenceRequest, which does not define disable_active_learning or active_learning_target_dataset fields. Those fields live on ObjectDetectionInferenceRequest and ClassificationInferenceRequest instead. Because Pydantic v2 defaults to extra='ignore', the two AL kwargs passed to the constructor are silently discarded. The local path therefore does not actually honor active learning controls, despite the code, comments, and PR description claiming it does.
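The silent drop is easy to reproduce with minimal stand-in models. The classes below are hypothetical sketches of the real `CVInferenceRequest`/`LMMInferenceRequest` hierarchy (the real classes have many more fields); the point is only Pydantic v2's default `extra='ignore'` behavior.

```python
from pydantic import BaseModel


class CVRequestSketch(BaseModel):
    # Stand-in for CVInferenceRequest: note there are no AL fields here.
    api_key: str = ""


class LMMRequestSketch(CVRequestSketch):
    # Stand-in for LMMInferenceRequest, which also defines no AL fields.
    prompt: str = ""


# Pydantic v2 defaults to extra='ignore', so unknown constructor kwargs
# are discarded without any error — the AL controls never reach the
# request object.
request = LMMRequestSketch(
    prompt="describe the image",
    disable_active_learning=True,            # silently dropped
    active_learning_target_dataset="my-ds",  # silently dropped
)
print(hasattr(request, "disable_active_learning"))  # False
```

Setting `model_config = ConfigDict(extra='forbid')` on the base class would turn this into a loud `ValidationError` instead of a silent no-op, which is one way to catch this class of bug.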
I think on those blocks you can set the model id to your fine-tuned model. Not opposed to having a general block like we do for object detection / classification, but I think the parameter / config on the LMM models is pretty different from model to model sometimes?


What

New Workflow block: `roboflow_core/roboflow_text_image_pairs_model@v1` ("Multimodal Model") — the missing sibling to the other Roboflow project-type blocks. Dispatches hosted (or local) inference for fine-tuned text-image-pairs models by `model_id`. Supported architectures server-side: PaliGemma 2, Florence 2, Qwen 2.5 VL, Qwen 3 VL, Qwen 3.5, SmolVLM2, SmolVLM 256M.

Why

`/infer/lmm/{model_id}` already serves these models end-to-end. The only gap was a Workflow block wrapper — these project types couldn't be used in workflows despite the backend being ready.

Shape

- Inputs: `images`, `model_id`, optional `prompt`, `disable_active_learning`, `active_learning_target_dataset`
- Output: `response` of kind `LANGUAGE_MODEL_OUTPUT_KIND` — raw pass-through, no parsing
- Local path: `LMMInferenceRequest` (AL controls honored)
- Remote path: `InferenceHTTPClient.infer_lmm(model_id_in_path=True)` with `InferenceConfiguration(source="workflow-execution", ...)`
- Pairs with `vlm_as_detector` / `vlm_as_classifier` downstream for Florence-2 fine-tunes today

Known limitations (documented in code)

`inference_sdk.infer_lmm` doesn't yet accept AL kwargs or honor `max_batch_size` / `max_concurrent_requests`. AL controls propagate only on the local path; the remote path still calls `client.configure(...)` so `source` tags telemetry. SDK-side follow-up.

Tested

Follow-ups (separate PRs)

- Extend `inference_sdk.infer_lmm` to accept AL kwargs + forward `InferenceConfiguration.source`
- Add `paligemma` / `qwen` / `smolvlm` strategies to `vlm_as_detector` / `vlm_as_classifier` so non-Florence-2 fine-tunes get structured parsing

Note
Medium Risk
Adds a new workflow model block that issues local or remote LMM/VLM inference requests and exposes active-learning controls, which could affect runtime behavior and external API usage if misconfigured.
Overview

Adds a new Workflow block, `roboflow_core/roboflow_text_image_pairs_model@v1`, to run Roboflow text-image-pairs (multimodal/VLM) models by `model_id` and return the raw `response` as `LANGUAGE_MODEL_OUTPUT_KIND`.

The block supports both local execution (via `ModelManager` + `LMMInferenceRequest`, including active-learning flags) and remote execution (via `InferenceHTTPClient.infer_lmm` with workflow-specific `InferenceConfiguration`), and is registered in the workflow `loader` so it becomes available to workflows. Unit tests were added to validate the new block manifest and output contract.