Fix audio processors defaulting to 16kHz in apply_chat_template#43660
Closed
jonathan-fulton wants to merge 3 commits into huggingface:main from
Fixes huggingface#43262

The apply_chat_template() method always used a hardcoded default of 16kHz for sampling_rate, even when the processor's feature extractor specifies a different sampling rate (e.g., 24kHz for CSM). This caused audio to be incorrectly resampled when using processors such as CsmProcessor:

processor = AutoProcessor.from_pretrained('sesame/csm-1b')
# processor.feature_extractor.sampling_rate = 24000
batch = processor.apply_chat_template(chat_messages, ...)
# Audio was resampled to 16kHz instead of 24kHz

The fix checks whether the user explicitly provided sampling_rate in kwargs. If not, and if the processor has a feature_extractor with a sampling_rate attribute, that value is used as the default instead of the hardcoded 16000.
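To make the size of the mismatch concrete, here is some illustrative arithmetic. It assumes a 2-second clip and, for the "apparent duration" line, assumes the 16kHz array is later treated as if it were 24kHz audio; neither value comes from the PR.

```python
# Assumed example values: a 2-second clip, a 24kHz feature extractor (as for
# CSM) and the hard-coded 16kHz default used by apply_chat_template.
CLIP_SECONDS = 2
EXPECTED_RATE = 24_000
DEFAULT_RATE = 16_000


def num_samples(duration_s: float, rate_hz: int) -> int:
    """Number of samples in a clip of `duration_s` seconds sampled at `rate_hz`."""
    return round(duration_s * rate_hz)


expected = num_samples(CLIP_SECONDS, EXPECTED_RATE)  # what a 24kHz model expects
loaded = num_samples(CLIP_SECONDS, DEFAULT_RATE)     # what loading at 16kHz yields

# If the 16kHz array is later interpreted as 24kHz audio, it covers less time,
# so the clip would sound sped up and higher-pitched.
apparent_seconds = loaded / EXPECTED_RATE

print(expected, loaded, round(apparent_seconds, 3))  # 48000 32000 1.333
```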
Collaborator: Hey, thanks for the PR! It would be great to add a simple test for this case 🙏
Adds test_apply_chat_template_audio_uses_processor_sampling_rate to verify that apply_chat_template uses the processor's feature_extractor sampling rate by default rather than the hardcoded 16kHz. Regression test for huggingface#43262.
Member: It was fixed on
What does this PR do?
Fixes #43262
Problem
The apply_chat_template() method always defaults to a 16kHz sampling rate, even when the processor's feature extractor specifies a different rate (e.g., 24kHz for CsmProcessor).
Root Cause
The default sampling rate (16000) is hardcoded in ChatTemplateLoadKwargs. This default is used regardless of what the processor's feature extractor actually expects.
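The pre-fix behaviour can be modelled with a small sketch. HARDCODED_SAMPLING_RATE, load_kwargs_before_fix and the SimpleNamespace stub processor are hypothetical illustrations, not the actual transformers code; the point is only that the processor's feature extractor never influences the result.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical constant standing in for the 16000 default that is
# hard-coded in ChatTemplateLoadKwargs.
HARDCODED_SAMPLING_RATE = 16_000


def load_kwargs_before_fix(processor: Any, **kwargs: Any) -> dict:
    """Sketch of the pre-fix behaviour.

    `processor` is accepted but never consulted, which is exactly the bug:
    the feature extractor's own sampling rate has no influence on the result.
    """
    return {"sampling_rate": kwargs.get("sampling_rate", HARDCODED_SAMPLING_RATE)}


# A stub processor whose feature extractor expects 24kHz audio, like CSM.
csm_like = SimpleNamespace(feature_extractor=SimpleNamespace(sampling_rate=24_000))

print(load_kwargs_before_fix(csm_like))  # {'sampling_rate': 16000}
```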
Solution
After setting up mm_load_kwargs, check whether:
- the user did not explicitly provide sampling_rate in kwargs
- the processor has a feature_extractor with a sampling_rate attribute
If both conditions are met, use the feature extractor's sampling rate as the default.
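The decision logic can be sketched as a standalone function. resolve_sampling_rate and the stub processors are hypothetical names chosen for illustration; this mirrors the conditions described above rather than reproducing the PR's actual diff.

```python
from types import SimpleNamespace
from typing import Any

FALLBACK_SAMPLING_RATE = 16_000  # the previous hard-coded default


def resolve_sampling_rate(processor: Any, kwargs: dict) -> int:
    """Pick the sampling rate used to load audio for a chat template.

    Hypothetical helper following the PR description:
    1. an explicit ``sampling_rate`` in kwargs always wins;
    2. otherwise use ``processor.feature_extractor.sampling_rate`` if present;
    3. otherwise fall back to 16kHz.
    """
    if "sampling_rate" in kwargs:
        return kwargs["sampling_rate"]
    # getattr(None, ..., None) returns None, so a missing feature_extractor
    # and a feature_extractor without sampling_rate are handled the same way.
    feature_extractor = getattr(processor, "feature_extractor", None)
    rate = getattr(feature_extractor, "sampling_rate", None)
    return rate if rate is not None else FALLBACK_SAMPLING_RATE


csm_like = SimpleNamespace(feature_extractor=SimpleNamespace(sampling_rate=24_000))
no_feature_extractor = SimpleNamespace()

print(resolve_sampling_rate(csm_like, {}))                         # 24000
print(resolve_sampling_rate(csm_like, {"sampling_rate": 16_000}))  # 16000
print(resolve_sampling_rate(no_feature_extractor, {}))             # 16000
```

The three calls correspond to the cases a regression test would want to cover: the processor default is picked up, an explicit value still wins, and processors without a feature extractor keep the old 16kHz behaviour.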
This is backwards compatible:
- Users who explicitly pass sampling_rate still get their specified value
- Processors without a feature_extractor, or whose feature_extractor has no sampling_rate attribute, fall back to 16kHz
Who can review?
@ArthurZucker @itazap