SoundStream for PyTorch

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint.

The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on an NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.

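import torchaudio
import torch

# Load the pretrained 16 kHz model through torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")

# Read the input file and resample it to the 16 kHz rate of the checkpoint.
x, sr = torchaudio.load('input.wav')
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000

# Encode the waveform to codes, then decode the codes back to a waveform.
with torch.no_grad():
    y = model.encode(x)
    # y = y[:, :, :4]  # if you want to reduce code size.
    z = model.decode(y)

# Write the reconstructed audio.
torchaudio.save('output.wav', z, sr)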
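The commented-out slice `y[:, :, :4]` keeps fewer codes per frame, which lowers the bitrate of the encoded stream. The sketch below shows the arithmetic only. It assumes the last axis of `y` indexes the residual quantizers, and the frame rate, codebook size, and quantizer counts are hypothetical placeholders, not values read from this checkpoint; `bitrate_bps` is an illustrative helper, not part of the repository.

```python
import math


def bitrate_bps(frames_per_second: float, num_quantizers: int, codebook_size: int) -> float:
    """Bits per second needed to store one code per quantizer per frame."""
    bits_per_code = math.log2(codebook_size)
    return frames_per_second * num_quantizers * bits_per_code


# Hypothetical values for illustration; check the model for the real ones.
FRAMES_PER_SECOND = 50.0  # assumed number of encoder frames per second
CODEBOOK_SIZE = 1024      # assumed entries per codebook (10 bits per code)

full = bitrate_bps(FRAMES_PER_SECOND, 8, CODEBOOK_SIZE)     # assumed 8 quantizers
reduced = bitrate_bps(FRAMES_PER_SECOND, 4, CODEBOOK_SIZE)  # after keeping 4

print(f"full: {full:.0f} bps, reduced: {reduced:.0f} bps")
```

Under these assumptions, halving the number of quantizers kept halves the bitrate, at the cost of reconstruction quality.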

Sample audio

Audio references are sampled from LibriSpeech test-clean.

Reference	SoundStream
audio link	audio link
audio link	audio link
audio link	audio link
