SoundStream for PyTorch

Unofficial PyTorch implementation of SoundStream, with training code and a 16 kHz pretrained checkpoint.

The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on an NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.
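As a rough illustration of what "not causal" means in general: in a non-causal layer, the output at time t may depend on input samples that arrive after t, which is why such a model is usually run on a whole clip rather than sample by sample. The toy below is not code from this repository; `conv1d` and its `causal` flag are hypothetical names used only to contrast left-only padding with symmetric padding.

```python
def conv1d(signal, kernel, causal):
    """Length-preserving 1-D correlation with zero padding.

    causal=True pads only on the left, so output[t] depends on signal[:t + 1].
    causal=False pads on both sides, so output[t] also depends on later samples.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(signal))
    ]


# An impulse at t = 3 shows which output positions are affected by that sample.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]

causal_out = conv1d(impulse, kernel, causal=True)
noncausal_out = conv1d(impulse, kernel, causal=False)

print(causal_out)     # non-zero only at t >= 3
print(noncausal_out)  # already non-zero at t = 2, before the impulse arrives
```

With symmetric padding the output at t = 2 already reacts to the sample at t = 3, which is the look-ahead a causal layer avoids.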

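import torch
import torchaudio

# Download the pretrained 16 kHz model through torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")

# Load the input and resample it to the 16 kHz rate the checkpoint was trained on.
x, sr = torchaudio.load('input.wav')
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000

with torch.no_grad():
    y = model.encode(x)  # audio -> codes
    # y = y[:, :, :4]  # if you want to reduce code size.
    z = model.decode(y)  # codes -> reconstructed audio
torchaudio.save('output.wav', z, sr)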
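The commented-out slice `y[:, :, :4]` appears to keep only the first four codes per frame. SoundStream, as described in the original paper, uses residual vector quantization, where each quantizer stage encodes what the earlier stages left over, so dropping later stages trades quality for bitrate. The sketch below shows that idea on a single scalar; it is a toy under stated assumptions, not this repository's quantizer. The helper names, the toy codebooks, and the numbers passed to `bitrate_bps` (50 frames per second, 8 quantizers, 1024 entries per codebook) are all hypothetical placeholders, not values taken from the checkpoint.

```python
import math


def rvq_encode(value, codebooks):
    """Residual quantization of a scalar: each stage codes what earlier stages left over."""
    codes, residual = [], value
    for codebook in codebooks:
        index = min(range(len(codebook)), key=lambda i: abs(residual - codebook[i]))
        codes.append(index)
        residual -= codebook[index]
    return codes


def rvq_decode(codes, codebooks):
    """Sum the selected codewords; passing fewer codes uses only the first stages."""
    return sum(codebook[i] for i, codebook in zip(codes, codebooks))


def bitrate_bps(frames_per_second, num_quantizers, codebook_size):
    """Bits per second when every frame stores one index per quantizer."""
    return frames_per_second * num_quantizers * math.log2(codebook_size)


# Hypothetical toy codebooks: each stage is 4x finer than the previous one.
codebooks = [[s * c for c in (-1.5, -0.5, 0.5, 1.5)] for s in (1.0, 0.25, 0.0625)]

codes = rvq_encode(0.8, codebooks)
full = rvq_decode(codes, codebooks)         # all three stages
coarse = rvq_decode(codes[:1], codebooks)   # first stage only, like slicing the codes

print(codes, full, coarse)
# Placeholder numbers only: halving the quantizer count halves the bitrate.
print(bitrate_bps(50, 8, 1024), bitrate_bps(50, 4, 1024))
```

Decoding from the truncated code list still gives a usable but coarser value, which is the trade-off the slice in the usage example exposes.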

Sample audio

Audio references are sampled from LibriSpeech test-clean.

| Reference | SoundStream |
| --- | --- |
| audio link | audio link |
| audio link | audio link |
| audio link | audio link |
