pytorchbot
Collaborator

If the processed audio has gone through a Mel transform, the spectrogram
values are floats. The `Audio` class should be able to hold them, since
the multimodal runner pybind API will need to accept processed input.
Once the pybind API exists, usage could look like this:

```python
from transformers import AutoProcessor

# NOTE: make_text_input, make_audio_input, MultimodalRunner and
# GenerationConfig belong to the pybind API that does not exist yet,
# so their import is intentionally left out.

model_id = "mistralai/Voxtral-Mini-3B-2507"
processor = AutoProcessor.from_pretrained(model_id)
audio_url = "https://huggingface.co/datasets/eustlb/audio-samples/resolve/main/dude_where_is_my_car.wav"
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "url": audio_url},
            {
                "type": "text",
                "text": "What can you tell me about this audio?",
            },
        ],
    },
]
# Run the Hugging Face processor; the processed audio is read back
# from "input_features" below.
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)

# Interleave the prompt text with the processed audio features.
inputs_combined = [
    make_text_input("<s>[INST][BEGIN_AUDIO]"),
    make_audio_input(inputs["input_features"]),
    make_text_input("\nWhat can you tell me about this audio?[/INST]"),
]
runner = MultimodalRunner("voxtral.pte", "tekken.json", None)
config = GenerationConfig()
config.max_new_tokens = 100
runner.generate(inputs_combined, config)
```
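
To make the first point of the description concrete, here is a minimal sketch of a container that accepts either raw `uint8` samples or float spectrogram values. It is not the ExecuTorch `Audio` class: the name `AudioSketch`, the `(batch_size, n_bins, n_frames)` layout, and the validation rules are all assumptions made for illustration.

```python
from array import array
from dataclasses import dataclass


@dataclass
class AudioSketch:
    """Hypothetical stand-in, NOT the real ExecuTorch `Audio` class."""

    data: array      # typecode "B" (raw uint8) or "f" (float features)
    batch_size: int  # assumed layout: batch_size x n_bins x n_frames
    n_bins: int
    n_frames: int

    def __post_init__(self) -> None:
        # Accept both storage types instead of uint8 only.
        if self.data.typecode not in ("B", "f"):
            raise TypeError("data must be uint8 ('B') or float ('f')")
        expected = self.batch_size * self.n_bins * self.n_frames
        if len(self.data) != expected:
            raise ValueError(f"expected {expected} values, got {len(self.data)}")

    @property
    def is_float(self) -> bool:
        # True when the payload is already-processed float features.
        return self.data.typecode == "f"


# A Mel-spectrogram-like payload (floats) and a raw payload (uint8)
# share one container type.
mel = AudioSketch(array("f", [0.0] * (1 * 128 * 4)), batch_size=1, n_bins=128, n_frames=4)
raw = AudioSketch(array("B", bytes(1 * 128 * 4)), batch_size=1, n_bins=128, n_frames=4)
```

A consumer such as a runner could then branch on `is_float` to choose the element type of the tensor it builds, which is one way to let the same input path carry processed features.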

(cherry picked from commit 8b11418)
pytorch-bot bot commented Oct 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14971

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Cancelled Job

As of commit af94fa5 with merge base e0dda90:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Oct 9, 2025
@larryliu0820 larryliu0820 merged commit 85efbc2 into release/1.0 Oct 10, 2025
120 of 124 checks passed
@larryliu0820 larryliu0820 deleted the cherry-pick-14427-by-pytorch_bot_bot_ branch October 10, 2025 18:57