Add Action Recognition Block by Matvezy · Pull Request #2397 · roboflow/inference

Matvezy · 2026-06-02T08:56:54Z

What does this PR do?

Adds an Action Recognition workflow block that classifies actions in a video stream via an OpenAI-compatible VLM. The block sends a rolling window of recent frames and gets back a single letter from caller-defined choices. Key inputs: video image, prompt, choices, timing (window_seconds/stride_seconds/sample_fps), and provider settings (base_url/model_name/api_key). Outputs: letter, output, and error_status.
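To make the timing inputs concrete, here is a minimal sketch of how a rolling window driven by window_seconds, stride_seconds and sample_fps could behave. It is not the block's actual code: the class name `RollingFrameWindow`, the `push` method, the default values, and the choice to fire on the very first frame are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class RollingFrameWindow:
    """Hypothetical helper, not the PR's implementation."""

    window_seconds: float = 4.0   # how far back the window reaches
    stride_seconds: float = 1.0   # minimum gap between two model queries
    sample_fps: float = 2.0       # how densely incoming frames are kept
    _frames: Deque[Tuple[float, object]] = field(default_factory=deque, init=False)
    _last_sampled: float = field(default=float("-inf"), init=False)
    _last_fired: float = field(default=float("-inf"), init=False)

    def push(self, timestamp: float, frame: object) -> List[object]:
        """Add a frame; return the buffered frames when a query is due, else []."""
        # Downsample: keep a frame only if enough time passed since the last kept one.
        if timestamp - self._last_sampled >= 1.0 / self.sample_fps:
            self._frames.append((timestamp, frame))
            self._last_sampled = timestamp
        # Evict frames that fell out of the time window.
        while self._frames and timestamp - self._frames[0][0] > self.window_seconds:
            self._frames.popleft()
        # Fire at most once per stride (assumption: the first frame fires immediately).
        if self._frames and timestamp - self._last_fired >= self.stride_seconds:
            self._last_fired = timestamp
            return [f for _, f in self._frames]
        return []
```

In this sketch the caller would send the returned frames to the VLM whenever the list is non-empty and skip the request otherwise, so the request rate is bounded by stride_seconds regardless of the stream's frame rate.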

Type of Change

New feature (non-breaking change that adds functionality)

Testing

Tested locally

I have tested this change locally
I have added/updated tests for this change

Test details:
Unit tests cover manifest validation, the buffering/firing logic (mocked client), and the response parser across the answer shapes providers actually return. Also verified end-to-end locally in workflows against different API endpoints, using football and tennis videos for action recognition.

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

N/A

…er/action-rec Resolve loader.py import conflict (keep both action_recognition and openrouter imports). Collapse Action Recognition v1->v2: drop the old frame-count v1, make the time-domain block v1, and expand its docs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Providers behind an OpenAI-compatible API (e.g. DeepInfra-proxied Gemini) do not always honor the response_format json-schema, returning the answer as a bare JSON string ("A"), under a different key ({"answer": "A"}), or in a short sentence. _parse_letter only accepted {"letter": "X"} and used raw[0] as a fallback, so every such answer parsed to None and the block emitted an empty letter on every frame. Harden the parser to accept these shapes (case-insensitive, word-boundary matched to avoid false hits like the B in "Based") and add regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
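The answer shapes listed in this commit message can be handled along the following lines. This is a rough sketch of the idea only, not the hardened `_parse_letter` from the PR: the function name `parse_letter_sketch`, its signature, and the order in which shapes are tried are assumptions.

```python
import json
import re
from typing import Optional, Sequence


def parse_letter_sketch(raw: object, choices: Sequence[str]) -> Optional[str]:
    """Hypothetical tolerant parser; returns an upper-case choice letter or None."""
    allowed = {c.upper() for c in choices}
    if not allowed or not isinstance(raw, str):
        return None
    alternation = "|".join(re.escape(c) for c in sorted(allowed))
    pattern = r"\b(" + alternation + r")\b"

    def match_text(text: str) -> Optional[str]:
        text = text.strip()
        if text.upper() in allowed:          # the whole answer is just the letter
            return text.upper()
        # Prefer a standalone upper-case letter, so the "B" in "Based" and the
        # lower-case article "a" are not picked before a real answer.
        found = re.search(pattern, text) or re.search(pattern, text, re.IGNORECASE)
        return found.group(1).upper() if found else None

    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return match_text(raw)               # plain sentence or bare letter

    if isinstance(payload, str):             # bare JSON string: "A"
        return match_text(payload)
    if isinstance(payload, dict):            # {"letter": "A"} or {"answer": "A"}
        ordered = [payload.get("letter")]
        ordered += [v for k, v in payload.items() if k != "letter"]
        for value in ordered:
            if isinstance(value, str):
                hit = match_text(value)
                if hit:
                    return hit
    return None
```

One known limitation of this sketch: a sentence that opens with the article "A" (as in "A player shoots") would still be read as choice A, so a real parser may need stricter rules for free-text answers.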

Matvezy · 2026-06-02T09:08:55Z

@grzegorz-roboflow @PawelPeczek-Roboflow this PR gets the action recognition block across the line: it takes the latest working state that @probicheaux and I had and prepares it for merging. Would love your review whenever you have a chance 🙏

Matvezy and others added 4 commits May 14, 2026 01:09

cleanup

c221e8e

check in changes

dad35ce

Matvezy requested review from PawelPeczek-Roboflow, dkosowski87, grzegorz-roboflow, hansent, probicheaux, rafel-roboflow and yeldarby as code owners June 2, 2026 08:56

style

72bb77f

Matvezy changed the title ~~Peter/action rec~~ Add Action Recognition Block Jun 2, 2026

Matvezy wants to merge 5 commits into main from peter/action-rec

1 participant
