Add moonshine streaming #43702

Merged
LysandreJik merged 49 commits into main from add_moonshine on Feb 4, 2026

Conversation

Contributor

@eustlb commented Feb 3, 2026

What does this PR do?

Adds UsefulSensors' new ASR model.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker left a comment

Very nice!

        )
        self._output_attentions = value

        # Set it recursively on the subconfigs
Collaborator

OK, we should document this under the output_attentions doc IMO!
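For context, the pattern being discussed is a setter that mirrors the flag onto the nested configs, roughly like this (a minimal sketch; the subconfig attribute names are assumptions for illustration):

class CompositeConfig:
    def __init__(self, encoder_config, decoder_config):
        # Hypothetical subconfig names, for illustration only
        self.encoder_config = encoder_config
        self.decoder_config = decoder_config
        self._output_attentions = False

    @property
    def output_attentions(self):
        return self._output_attentions

    @output_attentions.setter
    def output_attentions(self, value):
        self._output_attentions = value
        # Set it recursively on the subconfigs
        for sub in (self.encoder_config, self.decoder_config):
            sub.output_attentions = value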

Comment on lines +345 to +348
if config.encoder_config.hidden_size != self.config.hidden_size:
    self.proj = nn.Linear(config.encoder_config.hidden_size, self.config.hidden_size, bias=False)
else:
    self.proj = nn.Identity()
Collaborator

Argh, would be nice to be able to avoid that!

Contributor Author

Totally agree, but we cannot... (we could have an all-ones linear though, but this is minimal)
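If one did want to drop the branch, the alternative floated above would look something like this (a sketch; reading "all ones" as a weight that preserves the input, i.e. identity-initialized, and using a hypothetical width for illustration):

import torch.nn as nn

hidden_size = 288  # hypothetical hidden size, for illustration only

# Always use a Linear, initialized to the identity when no real projection
# is needed; uniform module structure at the cost of a redundant matmul.
proj = nn.Linear(hidden_size, hidden_size, bias=False)
nn.init.eye_(proj.weight)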

Comment on lines +49 to +59
class MoonshineStreamingProcessorKwargs(ProcessingKwargs, total=False):
    _defaults = {
        "audio_kwargs": {
            "pad_to_multiple_of": 80,
            "padding": True,
        },
        "common_kwargs": {"return_tensors": "pt"},
    }


class MoonshineStreamingProcessor(Wav2Vec2Processor): ...
Collaborator

We don't need that, no? We should just map to the wav2vec2 processor.

Collaborator

Otherwise we are creating a new processor just for a change in default kwargs.

Contributor Author

The issue is that the input needs to be padded, so the behavior should be enforced in the processor, meaning we need to set this. I do agree that having to create a new processor just for that is inconvenient, but I don't see another way to do it.
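To make the constraint concrete: with the _defaults above, a plain call to the processor should pad raw audio up to the next multiple of 80 samples (a sketch; the checkpoint name is taken from the docs example below, and the output key follows the Wav2Vec2 feature extractor):

import numpy as np
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-streaming-tiny")

audio = np.random.randn(16_123).astype(np.float32)  # arbitrary length at 16 kHz
inputs = processor(audio, sampling_rate=16_000)

# pad_to_multiple_of=80 rounds 16123 up to 16160 (= 202 * 80)
print(inputs["input_values"].shape)  # expected: torch.Size([1, 16160])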

@eustlb marked this pull request as ready for review February 3, 2026 12:58
rendered properly in your Markdown viewer.

-->
*This model was released on 2024-10-21 and added to Hugging Face Transformers on 2026-02-03.*
Contributor

This model was released on 2024-10-21

👀? I assume this should be updated haha (unless you're a time traveller)

Contributor Author

yes ahah 😅

@eustlb changed the title from Add moonshine to Add moonshine streaming Feb 4, 2026
Contributor Author

@eustlb left a comment

Last points to address @keveman, and we should be good to go.

Comment on lines +63 to +69
processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-streaming-tiny")
model = MoonshineStreamingForConditionalGeneration.from_pretrained(
    "UsefulSensors/moonshine-streaming-tiny",
    dtype=torch.float16,
    device_map="auto",
    attn_implementation="sdpa",
)
Contributor Author

Then can you please add it to the hub configs, cf. https://huggingface.co/zai-org/GLM-ASR-Nano-2512/blob/main/config.json for example.
Also cc @Deep-unlearning for the ASR leaderboard evals.
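As a side note, the docs snippet above would typically be exercised end to end along these lines (a sketch; the dataset choice and preprocessing call are illustrative, not from the PR):

import torch
from datasets import load_dataset

# Small dummy ASR dataset commonly used in transformers doc examples
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio = ds[0]["audio"]["array"]

# processor and model come from the docs snippet above
inputs = processor(audio, sampling_rate=16_000).to(model.device, dtype=model.dtype)
generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])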

Comment on lines 238 to 240
class MoonshineStreamingPreTrainedModel(MoonshinePreTrainedModel):
    supports_gradient_checkpointing = False  # TODO: check

Contributor Author

@keveman this was in your original PR. Is this necessary? (It is set to True for Moonshine.)

Contributor

Just tested with this set to True, and tests pass, so yes, it can be True.
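So the class can simply follow Moonshine here, i.e. a one-line change:

class MoonshineStreamingPreTrainedModel(MoonshinePreTrainedModel):
    supports_gradient_checkpointing = True  # matches Moonshine; tests pass with this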

Contributor

Re: dtype=torch.float16, may I request removing it from the doc, getting this merged, and then, after the leaderboard evals run, testing float16 separately in a different pull request?

@eustlb enabled auto-merge (squash) February 4, 2026 15:57
github-actions bot commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, moonshine, moonshine_streaming, musicgen

@LysandreJik disabled auto-merge February 4, 2026 18:29
@LysandreJik merged commit ace7c37 into main Feb 4, 2026
24 of 26 checks passed
@LysandreJik deleted the add_moonshine branch February 4, 2026 18:29
@LysandreJik (Member)

Thank you @eustlb
