The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
run-slow: xcodec2 |
This comment contains models: ["models/xcodec2"] |
[For maintainers] Suggested jobs to run (before merge) run-slow: auto, dac, higgs_audio_v2_tokenizer, pe_audio, qwen2_5_omni, seamless_m4t, wav2vec2_bert, xcodec, xcodec2 |
ebezzam left a comment
@eustlb a self-review for X-Codec2!
Main things:
- Unique feature extraction for DAC-like and SeamlessM4T-like input processing, as the model needs both padded audio and spectrogram inputs.
- New type of components in modular: `Xcodec2FiniteScalarQuantization` and `Xcodec2ISTFTHead` (similar to what we saw in the Vocos PR)
- Small tweaks/fixes for models that Xcodec2 depended on for modular
Draft model page: https://huggingface.co/bezzam/xcodec2
```python
main_input_name = "input_features"
input_modalities = "audio"
supports_gradient_checkpointing = True
_no_split_modules = ["Wav2Vec2BertEncoderLayer"]
```
To allow loading with `device_map="auto"`.
```python
@torch.no_grad()
def _init_weights(self, module):
    """Initialize the weights"""
    super()._init_weights(module)
```
XCodec2 uses a pretrained checkpoint of Wav2Vec2-BERT, but Xcodec2's test `test_can_init_all_missing_weights` was failing because `Embedding` wasn't initialized. We can rely on the base `_init_weights` and also remove some initialization below.
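The pattern above can be sketched with a toy hierarchy (class names here are illustrative, not the actual Xcodec2 classes): the subclass defers to the parent's generic initializer, which already covers modules like `nn.Embedding`, so per-module branches don't need to be duplicated.

```python
import torch
import torch.nn as nn

# Toy sketch of delegating weight init to a base class (hypothetical names).
class BaseModel:
    def _init_weights(self, module):
        # Generic handling in the base class, e.g. embeddings.
        if isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=0.02)

class Xcodec2Like(BaseModel):
    @torch.no_grad()
    def _init_weights(self, module):
        super()._init_weights(module)  # covers nn.Embedding too
        # only model-specific extras would go here

emb = nn.Embedding(10, 4)
before = emb.weight.clone()
Xcodec2Like()._init_weights(emb)
```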
```python
self.norm1 = nn.GroupNorm(num_groups=32, num_channels=config.hidden_size, eps=1e-6, affine=True)
self.activation1 = nn.SiLU()
self.conv1 = nn.Conv1d(config.hidden_size, config.hidden_size, kernel_size=3, stride=1, padding=1)
```
Similar to `PeAudioVideoConvBlock1d`, but with slight differences that prevent a direct modular mapping here?
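For reference, a self-contained sketch of the block these layers form (hypothetical class name, `hidden_size` fixed for the example): GroupNorm, then SiLU, then a `kernel_size=3` conv whose `padding=1` keeps the time dimension unchanged.

```python
import torch
import torch.nn as nn

# Minimal sketch of the norm -> activation -> conv block (hypothetical name).
class ConvBlock1d(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        # hidden_size must be divisible by num_groups=32
        self.norm1 = nn.GroupNorm(num_groups=32, num_channels=hidden_size, eps=1e-6, affine=True)
        self.activation1 = nn.SiLU()
        # padding=1 with kernel_size=3, stride=1 preserves sequence length
        self.conv1 = nn.Conv1d(hidden_size, hidden_size, kernel_size=3, stride=1, padding=1)

    def forward(self, hidden_states):
        hidden_states = self.norm1(hidden_states)
        hidden_states = self.activation1(hidden_states)
        return self.conv1(hidden_states)

block = ConvBlock1d(hidden_size=64)
out = block(torch.randn(2, 64, 50))  # (batch, channels, time)
```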
```python
class SnakeBeta(SnakeBeta):
    pass


class AntiAliasedActivation1d(AntiAliasedActivation1d):
    pass
```
I thought just importing above would have been enough, but it wasn't generating the classes without this 🤔
```python
# Back to audio (ISTFT with "same" padding)
time_frames = torch.fft.irfft(spectrogram_complex, self.n_fft, dim=1, norm="backward")
time_frames = time_frames * self.window[None, :, None]
num_frames = spectrogram_complex.shape[-1]
output_size = (num_frames - 1) * self.hop_length + self.win_length
audio = F.fold(
    time_frames,
    output_size=(1, output_size),
    kernel_size=(1, self.win_length),
    stride=(1, self.hop_length),
)[:, 0, 0, self.padding : -self.padding]
```
`torch.istft` doesn't support the custom padding needed here for the integration tests to match the expected output.
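A toy sketch of why `F.fold` performs overlap-add here (toy sizes, windowing and normalization omitted): each column of the input is one time-domain frame, and `fold` sums the frames back at `hop_length` offsets, so overlapping regions accumulate.

```python
import torch
import torch.nn.functional as F

def overlap_add_fold(frames, win_length, hop_length):
    # frames: (batch, win_length, num_frames) -- windowed time-domain frames
    num_frames = frames.shape[-1]
    output_size = (num_frames - 1) * hop_length + win_length
    # fold treats each column as a (1, win_length) patch and sums overlaps
    return F.fold(
        frames,
        output_size=(1, output_size),
        kernel_size=(1, win_length),
        stride=(1, hop_length),
    )[:, 0, 0, :]  # (batch, output_size)

# All-ones frames with 50% overlap: interior samples are covered by two
# frames and sum to 2, the edges by one frame only.
frames = torch.ones(2, 8, 5)
audio = overlap_add_fold(frames, win_length=8, hop_length=4)
```

A real ISTFT additionally divides by the summed squared window envelope; the sketch only shows the overlap-add step that `F.fold` provides.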
```python
hidden_states = self.finite_scalar_quantization.bound(
    hidden_states
)  # For consistency with original checkpoint
quantized_out, indices = self.finite_scalar_quantization(hidden_states)
```
Calling `self.finite_scalar_quantization.bound` is a bit redundant, as it's also called within `self.finite_scalar_quantization(hidden_states)`, but the original modeling did it and it is needed to match the expected outputs.
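A toy sketch of that double call (simplified FSQ with illustrative names, ignoring the half-level offset real FSQ uses for even level counts): `quantize` applies `bound` internally, so the explicit `bound` beforehand squashes the values twice, which is what the original checkpoint's outputs depend on.

```python
import torch

# Simplified FSQ sketch -- names are illustrative, not the actual Xcodec2 API.
def bound(z, levels):
    # Squash each dim into [-(L-1)/2, (L-1)/2] via tanh
    half = (torch.tensor(levels, dtype=z.dtype) - 1) / 2
    return torch.tanh(z) * half

def quantize(z, levels):
    z = bound(z, levels)   # bound is applied again inside quantization
    codes = torch.round(z) # snap to the integer grid
    indices = (codes + (torch.tensor(levels) - 1) / 2).long()  # per-dim index
    return codes, indices

z = torch.tensor([[3.0, -3.0, 0.0]])
levels = [5, 5, 5]
# Calling bound first, then quantize, mirrors the order in the modeling code.
codes, indices = quantize(bound(z, levels), levels)
```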
```python
        return hidden_states + residual


class Xcodec2FiniteScalarQuantization(nn.Module):
```
```python
        return codes, indices


class Xcodec2ISTFTHead(nn.Module):
```
Similar to what we saw in the Vocos PR
What does this PR do?
Re-opening #37868
TODO
Original checkpoint: https://huggingface.co/HKUSTAudio/xcodec2
Original modeling code: https://huggingface.co/HKUSTAudio/xcodec2/blob/main/modeling_xcodec2.py