Fix/video classification pipeline video processor #46256

Merged
zucchini-nlp merged 12 commits into huggingface:main from J3r3myPerera:fix/video-classification-pipeline-video-processor
Jun 5, 2026
@J3r3myPerera
Contributor

Fixes the video classification pipeline, which fails for models that have only a video processor. This is achieved by adding VideoProcessor classes to the two remaining video classification models that did not have one.

There are four models in MODEL_FOR_VIDEO_CLASSIFICATION_MAPPING_NAMES. V-JEPA 2 and VideoMAE already had video processors listed in VIDEO_PROCESSOR_MAPPING_NAMES.
This PR adds the two missing ones:

  • TimesformerVideoProcessor, which uses standard 224×224 ImageNet preprocessing (1/255 rescale, IMAGENET_DEFAULT mean/std).
  • VivitVideoProcessor, which resizes the shortest edge to 256, then center-crops to 224. It includes a rescale_and_normalize override that applies image * (1/127.5) - 1 to match the offset=True behavior of the existing VivitImageProcessor.
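
For reviewers who want to see the arithmetic, here is a minimal pure-Python sketch of the two preprocessing schemes described above. It is not the code in this PR: the function names are hypothetical, the rounding in the resize helper is an assumption, and only the offset rescale step of the ViViT path is shown (any normalization applied after it is left out).

```python
# ImageNet default statistics (the values behind IMAGENET_DEFAULT mean/std).
IMAGENET_DEFAULT_MEAN = (0.485, 0.456, 0.406)
IMAGENET_DEFAULT_STD = (0.229, 0.224, 0.225)


def imagenet_normalize(pixel, mean=IMAGENET_DEFAULT_MEAN, std=IMAGENET_DEFAULT_STD):
    """Rescale one RGB pixel by 1/255, then normalize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(pixel, mean, std))


def offset_rescale(pixel):
    """Map values in [0, 255] to roughly [-1, 1] via image * (1/127.5) - 1."""
    return tuple(c * (1 / 127.5) - 1 for c in pixel)


def shortest_edge_size(height, width, shortest_edge=256):
    """Scale (height, width) so the shorter side equals shortest_edge.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    short = min(height, width)
    return (round(height * shortest_edge / short),
            round(width * shortest_edge / short))


def center_crop_box(height, width, crop=224):
    """Return (top, left, bottom, right) of a centered crop x crop window."""
    top = (height - crop) // 2
    left = (width - crop) // 2
    return top, left, top + crop, left + crop


if __name__ == "__main__":
    print(imagenet_normalize((255, 255, 255)))  # TimeSformer-style normalization
    print(offset_rescale((0, 255, 255)))        # ViViT-style offset rescale
    resized = shortest_edge_size(480, 640)      # e.g. a 480x640 frame
    print(resized, center_crop_box(*resized))
```

The point of the offset variant is that black maps to -1 and white to about 1, whereas the 1/255 rescale followed by mean/std normalization produces a different range per channel.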

Both are registered in VIDEO_PROCESSOR_MAPPING_NAMES and exported from their respective __init__.py files.

Fixes #41950

  • I confirm that this is not a pure code agent PR.

Before submitting

Who can review?

@zucchini-nlp

…eline

VideoClassificationPipeline previously hardcoded _load_image_processor=True,
causing an OSError when loading models that only have a video processor
(e.g. facebook/vjepa2-vitl-fpc64-256).

- Add _load_video_processor flag support to pipeline base classes
- Add _resolve_video_processor helper in pipelines/__init__.py
- Update VideoClassificationPipeline to load both processors optionally,
  preferring video_processor when available, falling back to image_processor
  for legacy models (VideoMAE, ViViT, TimeSformer)

Fixes huggingface#41950
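
The "prefer video_processor, fall back to image_processor" behavior described in this commit message can be sketched roughly as follows. This is an illustration only: select_processor is a hypothetical helper, not a function in transformers, and the error type is an assumption.

```python
def select_processor(video_processor=None, image_processor=None):
    """Pick the processor a video pipeline should use.

    Hypothetical helper: prefers the video processor when one was loaded,
    and falls back to the image processor for legacy checkpoints.
    """
    if video_processor is not None:
        return video_processor
    if image_processor is not None:
        return image_processor
    # Assumption of this sketch: fail loudly when neither is available.
    raise ValueError("Expected a video processor or an image processor, got neither.")


if __name__ == "__main__":
    # A model that ships only a video processor no longer needs an image processor.
    print(select_processor(video_processor="video-proc"))
    # A legacy model with only an image processor still works.
    print(select_processor(image_processor="image-proc"))
```

Making both inputs optional is what lets a checkpoint with only one kind of processor load without the pipeline insisting on the other.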
@zucchini-nlp zucchini-nlp self-requested a review May 28, 2026 11:36
Member

@zucchini-nlp zucchini-nlp left a comment

Great job, thanks a lot!

Left a few comments on deprecating the image processor and using the video processor to its full capacity.

Comment thread src/transformers/pipelines/__init__.py Outdated
Comment thread src/transformers/pipelines/__init__.py Outdated
Comment thread src/transformers/pipelines/__init__.py Outdated
Comment thread src/transformers/pipelines/video_classification.py
Comment thread src/transformers/pipelines/video_classification.py Outdated
@J3r3myPerera
Contributor Author

Great job, thanks a lot!

Left a few comments on deprecating the image processor and using the video processor to its full capacity.

Sure, I'll fix them and update the PR.

@J3r3myPerera
Contributor Author

@zucchini-nlp I have updated the PR to address the review comments.

Member

@zucchini-nlp zucchini-nlp left a comment

Last comments

Comment thread src/transformers/pipelines/video_classification.py Outdated
Comment thread src/transformers/pipelines/video_classification.py Outdated
@J3r3myPerera
Contributor Author

Last comments

The PR is updated to address the comments.

Member

@zucchini-nlp zucchini-nlp left a comment

I took the liberty of addressing the last comment, which had not been addressed. Will merge after testing.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp enabled auto-merge June 1, 2026 08:34
@zucchini-nlp zucchini-nlp disabled auto-merge June 5, 2026 12:27
@zucchini-nlp zucchini-nlp enabled auto-merge June 5, 2026 12:28
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, timesformer, vivit

@zucchini-nlp zucchini-nlp added this pull request to the merge queue Jun 5, 2026
Merged via the queue into huggingface:main with commit 98eea72 Jun 5, 2026
29 checks passed
@J3r3myPerera J3r3myPerera deleted the fix/video-classification-pipeline-video-processor branch June 6, 2026 04:44
Development

Successfully merging this pull request may close these issues.

video-classification pipeline looks for image processors

3 participants