Conversation
* sliding window attention in the encoder (first 2 and last 2 layers; see the sketch below)
* different dims for encoder and decoder
* projection and position embedding addition in the adapter
Not auto-generated anymore.
Mostly fixing the config such that the encoder and decoder dimensions can be different.
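As a rough illustration of the sliding-window layout in the first bullet above (a sketch only; the layer count and the `layer_types` naming are assumptions, not the PR's actual config fields):

```python
# Illustrative sketch: with e.g. 6 encoder layers, sliding-window attention is
# used in the first 2 and last 2 layers, full attention in the middle ones.
num_encoder_layers = 6  # assumed value, purely for illustration
layer_types = [
    "sliding_attention" if i < 2 or i >= num_encoder_layers - 2 else "full_attention"
    for i in range(num_encoder_layers)
]
print(layer_types)
# ['sliding_attention', 'sliding_attention', 'full_attention',
#  'full_attention', 'sliding_attention', 'sliding_attention']
```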
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
    )
    self._output_attentions = value

    # Set it recursively on the subconfigs
OK, we should document this under the `output_attentions` doc IMO!
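Something like the following could back that doc entry (a minimal, self-contained sketch of the propagation behavior, not the PR's actual classes; `_CompositeConfig` and `_SubConfig` are made-up names):

```python
# Standalone sketch: setting `output_attentions` on the composite config
# propagates the value to the encoder/decoder subconfigs.
class _SubConfig:
    def __init__(self):
        self.output_attentions = False


class _CompositeConfig:
    def __init__(self):
        self.encoder_config = _SubConfig()
        self.decoder_config = _SubConfig()
        self._output_attentions = False

    @property
    def output_attentions(self):
        return self._output_attentions

    @output_attentions.setter
    def output_attentions(self, value):
        self._output_attentions = value
        # Set it recursively on the subconfigs
        for sub in (self.encoder_config, self.decoder_config):
            sub.output_attentions = value


cfg = _CompositeConfig()
cfg.output_attentions = True
assert cfg.encoder_config.output_attentions and cfg.decoder_config.output_attentions
```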
*(Outdated review threads on `src/transformers/models/moonshine_streaming/configuration_moonshine_streaming.py` and `src/transformers/models/moonshine_streaming/modular_moonshine_streaming.py`, since resolved.)*
    if config.encoder_config.hidden_size != self.config.hidden_size:
        self.proj = nn.Linear(config.encoder_config.hidden_size, self.config.hidden_size, bias=False)
    else:
        self.proj = nn.Identity()
Arf, it would be nice to be able to avoid that!
Totally agree, but we cannot... (we could have an all-ones linear though, but this is minimal).
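To make the shapes concrete, the branch above behaves like this (hedged sketch; the dims are made up, only the Linear-vs-Identity choice matters):

```python
import torch
from torch import nn

# Made-up dims for illustration: project encoder states into the decoder width
# only when the two differ; otherwise nn.Identity() is a parameter-free no-op.
encoder_hidden_size, decoder_hidden_size = 288, 416

proj = (
    nn.Linear(encoder_hidden_size, decoder_hidden_size, bias=False)
    if encoder_hidden_size != decoder_hidden_size
    else nn.Identity()
)

encoder_states = torch.randn(2, 50, encoder_hidden_size)
print(proj(encoder_states).shape)  # torch.Size([2, 50, 416])
```

An always-present Linear would unify the two branches, but it adds a redundant matmul (and extra parameters) when the dims already match, which is what the Identity avoids.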
    class MoonshineStreamingProcessorKwargs(ProcessingKwargs, total=False):
        _defaults = {
            "audio_kwargs": {
                "pad_to_multiple_of": 80,
                "padding": True,
            },
            "common_kwargs": {"return_tensors": "pt"},
        }


    class MoonshineStreamingProcessor(Wav2Vec2Processor): ...
We don't need that, no? We should just map to the Wav2Vec2 processor.
Otherwise we are creating a new processor just for a change in default kwargs.
The issue is that we need the input to be padded, so the behavior should be enforced in the processor, meaning we need to set this. I do agree that creating a new processor just for that is inconvenient, but I don't see another way to do it.
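To make the trade-off concrete, this is roughly what the baked-in defaults buy (illustrative sketch; the checkpoint name comes from the docs snippet further down, and the dummy audio / 16 kHz sampling rate are assumptions):

```python
import numpy as np
from transformers import AutoProcessor

# Checkpoint name taken from the docs snippet in this PR.
processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-streaming-tiny")

audio = np.random.randn(12_345).astype(np.float32)  # dummy mono audio, assumed 16 kHz

# With the subclassed processor, padding=True and pad_to_multiple_of=80 are
# applied by default, so the model always sees correctly padded inputs.
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Mapping straight to Wav2Vec2Processor would instead require every caller to
# remember to pass padding=True and pad_to_multiple_of=80 themselves.
```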
    rendered properly in your Markdown viewer.

    -->

    *This model was released on 2024-10-21 and added to Hugging Face Transformers on 2026-02-03.*
> This model was released on 2024-10-21

👀? I assume this should be updated haha (unless you're a time traveller).
    processor = AutoProcessor.from_pretrained("UsefulSensors/moonshine-streaming-tiny")
    model = MoonshineStreamingForConditionalGeneration.from_pretrained(
        "UsefulSensors/moonshine-streaming-tiny",
        dtype=torch.float16,
        device_map="auto",
        attn_implementation="sdpa"
    )
Then can you please add it to the Hub configs? Cf. https://huggingface.co/zai-org/GLM-ASR-Nano-2512/blob/main/config.json for an example.
Also cc @Deep-unlearning for the ASR leaderboard evals.
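For reference, an end-to-end transcription call with the snippet above would look roughly like this (sketch; the dataset-based audio loading is just an example, and exact argument handling may differ in the final API):

```python
from datasets import load_dataset

# Example audio from a small public test split (illustrative only).
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
audio = ds[0]["audio"]["array"]

inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
inputs = inputs.to(model.device, dtype=model.dtype)

generated_ids = model.generate(**inputs)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```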
    class MoonshineStreamingPreTrainedModel(MoonshinePreTrainedModel):
        supports_gradient_checkpointing = False  # TODO: check
@keveman this was in your original PR. Is this necessary? (It is set to True for Moonshine.)
Just tested with this set to True, and the tests pass, so yes, it can be True.
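For reference, the kind of smoke test that implies (rough sketch, not the actual test-suite code; the argument names follow the existing Moonshine API, and the dummy shapes / label ids are made up):

```python
import torch
from transformers import MoonshineStreamingForConditionalGeneration

# Rough sketch of a gradient-checkpointing smoke test (not the real test code).
model = MoonshineStreamingForConditionalGeneration.from_pretrained(
    "UsefulSensors/moonshine-streaming-tiny"  # checkpoint name from the docs snippet
)
model.gradient_checkpointing_enable()
model.train()

input_values = torch.randn(1, 16000)   # ~1 s of dummy 16 kHz audio
labels = torch.tensor([[1, 2, 3, 4]])  # dummy target token ids
loss = model(input_values=input_values, labels=labels).loss
loss.backward()  # should complete without errors when checkpointing is supported
```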
Re: `dtype=torch.float16`, may I request removing it from the doc, getting this merged, and then, after the leaderboard evals run, testing float16 separately in a different pull request?

[For maintainers] Suggested jobs to run (before merge): `run-slow: auto, moonshine, moonshine_streaming, musicgen`
Thank you @eustlb |
What does this PR do?
Adds UsefulSensors' new ASR model.