Fix audio processors defaulting to 16kHz in apply_chat_template #43660

Closed
jonathan-fulton wants to merge 3 commits into huggingface:main from
jonathan-fulton:fix/43262-audio-sampling-rate

@jonathan-fulton
Contributor

What does this PR do?

Fixes #43262

Problem

The apply_chat_template() method always defaults to a 16 kHz sampling rate when loading audio, even when the processor's feature extractor specifies a different rate:

processor = AutoProcessor.from_pretrained('sesame/csm-1b')
print(processor.feature_extractor.sampling_rate)  # 24000

batch = processor.apply_chat_template(
    chat_messages,
    return_dict=True,
    return_tensors='pt',
    tokenize=True,
)
print(batch['input_values'].shape)  # Length matches audio resampled to 16 kHz, not the expected 24 kHz (wrong!)

# Workaround: explicitly pass sampling_rate
batch = processor.apply_chat_template(
    chat_messages,
    sampling_rate=processor.feature_extractor.sampling_rate,  # Have to manually specify!
    ...
)
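
To make the symptom concrete, here is a minimal sketch of how the sample count changes when a clip is resampled. The helper resampled_length is hypothetical and not part of transformers, and a real resampler may differ from this estimate by a sample or so at the edges.

```python
def resampled_length(num_samples: int, source_rate: int, target_rate: int) -> int:
    """Estimate the sample count after resampling (clip duration is preserved)."""
    return round(num_samples * target_rate / source_rate)


# A 2-second clip recorded at 24 kHz (illustrative numbers, not from the PR).
two_seconds_at_24k = 2 * 24_000

# Loaded with the hardcoded 16 kHz default, the clip shrinks to 2/3 of its length.
print(resampled_length(two_seconds_at_24k, 24_000, 16_000))  # 32000

# Loaded at the feature extractor's own rate, the length is unchanged.
print(resampled_length(two_seconds_at_24k, 24_000, 24_000))  # 48000
```

A model that expects 24 kHz input would then interpret the shorter 16 kHz array as if it had been sampled at 24 kHz, which is why the mismatch matters beyond the shape alone.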

Root Cause

The default sampling rate (16000) is hardcoded in ChatTemplateLoadKwargs:

class ChatTemplateLoadKwargs(TypedDict, total=False):
    sampling_rate: int | None = 16_000  # Hardcoded default
    ...

This default is used regardless of what the processor's feature extractor actually expects.

Solution

After setting up mm_load_kwargs, check if:

  1. The user didn't explicitly provide sampling_rate in kwargs
  2. The processor has a feature_extractor with a sampling_rate attribute

If both conditions are met, use the feature extractor's sampling rate as the default:

if 'sampling_rate' not in kwargs:
    processor_sampling_rate = getattr(getattr(self, 'feature_extractor', None), 'sampling_rate', None)
    if processor_sampling_rate is not None:
        mm_load_kwargs['sampling_rate'] = processor_sampling_rate

This is backwards compatible:

  • Users who explicitly pass sampling_rate still get their specified value
  • Processors without a feature_extractor or without a sampling_rate attribute fall back to 16kHz
  • Processors with proper audio support now automatically use the correct sampling rate

Who can review?

@ArthurZucker @itazap

Fixes huggingface#43262

The apply_chat_template() method was always using a hardcoded default
of 16kHz for the sampling_rate, even when the processor's feature
extractor specifies a different sampling rate (e.g., 24kHz for CSM).

This caused audio to be incorrectly resampled when using processors
like CsmProcessor:

  processor = AutoProcessor.from_pretrained('sesame/csm-1b')
  # processor.feature_extractor.sampling_rate = 24000
  batch = processor.apply_chat_template(chat_messages, ...)
  # Audio was resampled to 16kHz instead of 24kHz

The fix checks if the user explicitly provided sampling_rate in kwargs.
If not, and if the processor has a feature_extractor with a
sampling_rate attribute, that value is used as the default instead
of the hardcoded 16000.
@itazap
Collaborator

itazap commented Feb 2, 2026

hey thanks for the PR! It would be great to add a simple test for this case 🙏

Adds test_apply_chat_template_audio_uses_processor_sampling_rate to verify
that apply_chat_template uses the processor's feature_extractor sampling rate
by default, rather than hardcoded 16kHz.

Regression test for huggingface#43262
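
The test added in the PR is not shown in this thread. As a rough idea of the shape such a regression test could take, here is a sketch against a stub: StubProcessor and its loaded_at attribute are hypothetical, and the real test would exercise an actual processor class rather than this stand-in.

```python
import unittest
from types import SimpleNamespace


class StubProcessor:
    """Hypothetical stand-in for a processor; not the transformers class."""

    def __init__(self, sampling_rate):
        self.feature_extractor = SimpleNamespace(sampling_rate=sampling_rate)
        self.loaded_at = None  # records the rate the audio would be loaded at

    def apply_chat_template(self, messages, **kwargs):
        # Start from the old hardcoded default, then apply any explicit value.
        mm_load_kwargs = {"sampling_rate": 16_000}
        if "sampling_rate" in kwargs:
            mm_load_kwargs["sampling_rate"] = kwargs["sampling_rate"]
        else:
            # Same check as in the PR description: prefer the extractor's rate.
            rate = getattr(getattr(self, "feature_extractor", None), "sampling_rate", None)
            if rate is not None:
                mm_load_kwargs["sampling_rate"] = rate
        self.loaded_at = mm_load_kwargs["sampling_rate"]
        return mm_load_kwargs


class SamplingRateDefaultTest(unittest.TestCase):
    def test_uses_feature_extractor_rate_by_default(self):
        processor = StubProcessor(sampling_rate=24_000)
        processor.apply_chat_template([])
        self.assertEqual(processor.loaded_at, 24_000)

    def test_explicit_rate_overrides_feature_extractor(self):
        processor = StubProcessor(sampling_rate=24_000)
        processor.apply_chat_template([], sampling_rate=16_000)
        self.assertEqual(processor.loaded_at, 16_000)
```

Saved to a file, this can be run with python -m unittest. Choosing a rate other than 16 kHz for the stub matters: a 16 kHz extractor would pass even with the old hardcoded default.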
@zucchini-nlp
Member

zucchini-nlp commented Feb 2, 2026

It was fixed on main btw, sorry. The first PR by contributor got stale so we fixed it in #43674

Development

Successfully merging this pull request may close these issues.

Audio processors: apply_chat_template() defaults to 16kHz sampling rate, even if the processor config sets a different value

3 participants