[multimodal] Let Audio take float data blob #14971

pytorchbot · 2025-10-09T22:17:21Z

If the processed audio went through Mel transform, the spectrogram are float values. We should allow Audio class to be able to take this, since multimodal runner pybind API will have to be able to take processed input. Once we have the pybind API we can do something like:

model_id = "mistralai/Voxtral-Mini-3B-2507"
processor = AutoProcessor.from_pretrained(model_id)
audio_url = "https://huggingface.co/datasets/eustlb/audio-samples/resolve/main/dude_where_is_my_car.wav"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "url": audio_url},
            {
                "type": "text",
                "text": "What can you tell me about this audio?",
            },
        ],
    },
]
inputs = processor.apply_chat_template(conversation,
    tokenize=True,
    return_dict=True,
    return_tensors="pt")

inputs_combined = [
    make_text_input("<s>[INST][BEGIN_AUDIO]"),
    make_audio_input(inputs["input_features"]),
    make_text_input("\nWhat can you tell me about this audio?[/INST]"),
]
runner = MultimodalRunner("voxtral.pte", "tekken.json", None)
config = GenerationConfig()
config.max_new_tokens = 100
runner.generate(inputs_combined, config)

Summary

[PLEASE REMOVE] See CONTRIBUTING.md's Pull Requests for ExecuTorch PR guidelines.

[PLEASE REMOVE] If this PR closes an issue, please add a Fixes #<issue-id> line.

[PLEASE REMOVE] If this PR introduces a fix or feature that should be the upcoming release notes, please add a "Release notes: " label. For a list of available release notes labels, check out CONTRIBUTING.md's Pull Requests.

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.

If the processed audio went through Mel transform, the spectrogram are float values. We should allow `Audio` class to be able to take this, since multimodal runner pybind API will have to be able to take processed input. Once we have the pybind API we can do something like: ```python model_id = "mistralai/Voxtral-Mini-3B-2507" processor = AutoProcessor.from_pretrained(model_id) audio_url = "https://huggingface.co/datasets/eustlb/audio-samples/resolve/main/dude_where_is_my_car.wav" conversation = [ { "role": "user", "content": [ {"type": "audio", "url": audio_url}, { "type": "text", "text": "What can you tell me about this audio?", }, ], }, ] inputs = processor.apply_chat_template(conversation, tokenize=True, return_dict=True, return_tensors="pt") inputs_combined = [ make_text_input("<s>[INST][BEGIN_AUDIO]"), make_audio_input(inputs["input_features"]), make_text_input("\nWhat can you tell me about this audio?[/INST]"), ] runner = MultimodalRunner("voxtral.pte", "tekken.json", None) config = GenerationConfig() config.max_new_tokens = 100 runner.generate(inputs_combined, config) ``` ### Summary [PLEASE REMOVE] See [CONTRIBUTING.md's Pull Requests](https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md#pull-requests) for ExecuTorch PR guidelines. [PLEASE REMOVE] If this PR closes an issue, please add a `Fixes #<issue-id>` line. [PLEASE REMOVE] If this PR introduces a fix or feature that should be the upcoming release notes, please add a "Release notes: <area>" label. For a list of available release notes labels, check out [CONTRIBUTING.md's Pull Requests](https://github.com/pytorch/executorch/blob/main/CONTRIBUTING.md#pull-requests). ### Test plan [PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable. (cherry picked from commit 8b11418)

pytorch-bot · 2025-10-09T22:17:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14971

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job

As of commit af94fa5 with merge base e0dda90 ():

NEW FAILURES - The following jobs have failed:

pull / unittest / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
pull / unittest-buck / macos / macos-job (gh)
RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1

CANCELLED JOB - The following job was cancelled. Please retry:

pull / unittest-editable / macos / macos-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchbot requested review from jackzhxng, larryliu0820, lucylq, mergennachin and swolchok as code owners October 9, 2025 22:17

pytorchbot mentioned this pull request Oct 9, 2025

[v1.0.0] Release Tracker #14288

Open

pytorchbot mentioned this pull request Oct 9, 2025

[multimodal] Let Audio take float data blob #14427

Merged

meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 9, 2025

larryliu0820 approved these changes Oct 9, 2025

View reviewed changes

larryliu0820 merged commit 85efbc2 into release/1.0 Oct 10, 2025
120 of 124 checks passed

larryliu0820 deleted the cherry-pick-14427-by-pytorch_bot_bot_ branch October 10, 2025 18:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[multimodal] Let Audio take float data blob #14971

[multimodal] Let Audio take float data blob #14971

Uh oh!

pytorchbot commented Oct 9, 2025

Uh oh!

pytorch-bot bot commented Oct 9, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[multimodal] Let Audio take float data blob #14971

[multimodal] Let Audio take float data blob #14971

Uh oh!

Conversation

pytorchbot commented Oct 9, 2025

Summary

Test plan

Uh oh!

pytorch-bot bot commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14971

❌ 2 New Failures, 1 Cancelled Job

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pytorch-bot bot commented Oct 9, 2025 •

edited

Loading