Sopro TTS

📰 News

2026.02.04 - SoproTTS v1.5 is out: more stable, faster, and smaller (135M parameters). Trained for just $100 on a single GPU, it reaches 250 ms TTFA streaming and 0.05 RTF (~20× realtime) on CPU.

Sopro (from the Portuguese word for “breath/blow”) is a lightweight English text-to-speech model I trained as a side project. Sopro is built from dilated convolutions (à la WaveNet) and lightweight cross-attention layers rather than the usual Transformer architecture. Although Sopro is not SOTA across most voices and situations, I still think it’s a cool project for a very low budget (trained on a single L40S GPU), and it can be improved with better data.
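
For intuition on what dilated convolutions buy you, here is a minimal sketch that computes the receptive field of a stack of stride-1 dilated 1-D convolutions. The kernel size and dilation schedule are illustrative assumptions, not Sopro's actual configuration, and `receptive_field` is a hypothetical helper.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 dilated 1-D convolutions."""
    # Each layer adds (kernel_size - 1) * dilation frames of context.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical WaveNet-style schedule: the dilation doubles at every layer.
dilations = [2 ** i for i in range(8)]  # 1, 2, 4, ..., 128
print(receptive_field(kernel_size=3, dilations=dilations))  # 511
```

With doubling dilations, the context grows exponentially with depth while the parameter count grows only linearly, which is one reason this family of layers is attractive for small models.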

Some of the main features are:

135M parameters
Streaming
Zero-shot voice cloning
~0.05 RTF on CPU (measured on a base-model M3): 32 seconds of audio are generated in 1.77 seconds
3-12 seconds of reference audio for voice cloning
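
The RTF figure follows directly from the two timings quoted in the list. A small sketch of the arithmetic (the function name is hypothetical):

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    return generation_seconds / audio_seconds


# Figures quoted above: 32 s of audio generated in 1.77 s on an M3 CPU.
rtf = real_time_factor(generation_seconds=1.77, audio_seconds=32.0)
speedup = 1.0 / rtf
print(f"RTF = {rtf:.3f}, about {speedup:.0f}x faster than real time")
# RTF = 0.055, about 18x faster than real time
```

An RTF below 1 means audio is produced faster than it plays back; the quoted 0.05 (~20× realtime) is this value rounded.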

Instructions

I only pinned the minimum dependency versions, so you can install the package without creating a separate environment. However, some Torch versions perform better than others. For example, on my M3 CPU, torch==2.10.0 (without torchvision) achieves ~600 it/s during AR generation.

(Optional) Create a conda environment:

conda create -n soprotts python=3.10
conda activate soprotts

From PyPI

pip install -U sopro

From the repo

git clone https://github.com/samuel-vitorino/sopro
cd sopro
pip install -e .

Examples

CLI

soprotts \
  --text "Sopro is a lightweight 135 million parameter text-to-speech model. Some of the main features are streaming, zero-shot voice cloning, and 0.05 real-time factor on the CPU." \
  --ref_audio ref.wav \
  --out out.wav

You have the expected temperature and top_p parameters, alongside:

--style_strength (controls the FiLM strength; increasing it can improve or reduce voice similarity; default 1.2)

Python

Non-streaming

from sopro import SoproTTS

tts = SoproTTS.from_pretrained("samuel-vitorino/sopro", device="cpu")

wav = tts.synthesize(
    "Hello! This is a non-streaming Sopro TTS example.",
    ref_audio_path="ref.wav",
)

tts.save_wav("out.wav", wav)

Streaming

import torch
from sopro import SoproTTS

tts = SoproTTS.from_pretrained("samuel-vitorino/sopro", device="cpu")

chunks = []
for chunk in tts.stream(
    "Hello! This is a streaming Sopro TTS example.",
    ref_audio_path="ref.mp3",
):
    chunks.append(chunk.cpu())

wav = torch.cat(chunks, dim=-1)
tts.save_wav("out_stream.wav", wav)

You can also precalculate the reference to reduce TTFA:

import torch
from sopro import SoproTTS

tts = SoproTTS.from_pretrained("samuel-vitorino/sopro", device="cpu")

ref = tts.prepare_reference(ref_audio_path="ref.mp3")

chunks = []
for chunk in tts.stream(
    "Hello! This is a streaming Sopro TTS example.",
    ref=ref,
):
    chunks.append(chunk.cpu())

wav = torch.cat(chunks, dim=-1)
tts.save_wav("out_stream.wav", wav)
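
To check how much precalculating the reference helps on your machine, you can time the first chunk yourself. The sketch below is a generic helper, not part of the sopro package: `time_to_first_chunk` and `fake_stream` are hypothetical names, and `fake_stream` only stands in for `tts.stream(...)` so the example runs without a model.

```python
import time


def time_to_first_chunk(make_stream):
    """Return (seconds until the first chunk, first chunk, remaining iterator).

    `make_stream` is a zero-argument callable, so any setup work done when
    the stream is created is included in the measurement.
    """
    start = time.perf_counter()
    stream = iter(make_stream())
    first = next(stream)
    ttfa = time.perf_counter() - start
    return ttfa, first, stream


def fake_stream(n_chunks: int = 3, delay: float = 0.01):
    """Stand-in for a streaming TTS call: sleeps, then yields a dummy chunk."""
    for i in range(n_chunks):
        time.sleep(delay)
        yield i


ttfa, first, rest = time_to_first_chunk(lambda: fake_stream())
remaining = list(rest)  # drain the rest of the stream
print(f"TTFA: {ttfa * 1000:.0f} ms, chunks received: {1 + len(remaining)}")

# With Sopro, this would presumably be called as:
# time_to_first_chunk(lambda: tts.stream("Hello!", ref=ref))
```

Running it once with `ref_audio_path=...` and once with a prepared `ref=...` gives a direct comparison of the two approaches.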

Interactive streaming demo

After you install the sopro package:

pip install -r demo/requirements.txt
uvicorn demo.server:app --host 0.0.0.0 --port 8000

Or with docker:

docker build -t sopro-demo .
docker run --rm -p 8000:8000 sopro-demo

Navigate to http://localhost:8000 in your browser.

Disclaimers

Sopro can be inconsistent, so mess around with the parameters until you get a decent sample.
Voice cloning is highly dependent on mic quality, ambient noise, etc. On more out-of-distribution (OOD) voices, it may fail to match the voice well.
Prefer words over abbreviations and symbols. For example, “1 + 2” → “1 plus 2”. That said, Sopro can generally read abbreviations like “CPU”, “TTS”, etc.
The streaming version is not bit-exact compared to the non-streaming version. For best quality, prioritize the non-streaming version.
If you use torchaudio to read or write audio, ffmpeg may be required. I recommend just using soundfile.
I will publish the training code once I have time to organize it.
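
The advice about spelling out symbols can be automated with a small preprocessing step before the text reaches the model. This is a minimal sketch, not something shipped with sopro: the `SYMBOL_WORDS` table and `spell_out_symbols` are hypothetical and only cover a handful of symbols.

```python
import re

# Hypothetical symbol-to-word table; extend it for your own inputs.
SYMBOL_WORDS = {
    "+": "plus",
    "=": "equals",
    "%": "percent",
    "&": "and",
}


def spell_out_symbols(text: str) -> str:
    """Replace each known symbol with its spoken word, then tidy whitespace."""
    pattern = re.compile("|".join(re.escape(s) for s in SYMBOL_WORDS))
    spaced = pattern.sub(lambda m: f" {SYMBOL_WORDS[m.group(0)]} ", text)
    return re.sub(r"\s+", " ", spaced).strip()


print(spell_out_symbols("1 + 2"))  # 1 plus 2
```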

Currently, generation is limited to ~32 seconds (400 frames). You can increase it, but the model generally hallucinates beyond that.
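
The two numbers in that limit imply a frame rate of about 12.5 frames per second. The sketch below derives it and estimates a frame budget for shorter clips; the rate is inferred from the approximate figures above rather than taken from the model's configuration, and `frames_for` is a hypothetical helper.

```python
MAX_FRAMES = 400    # default limit quoted above
MAX_SECONDS = 32.0  # approximate duration quoted above

# Derived from the two figures above, not an official specification.
frames_per_second = MAX_FRAMES / MAX_SECONDS


def frames_for(seconds: float) -> int:
    """Approximate number of frames needed for a clip of the given length."""
    return round(seconds * frames_per_second)


print(frames_per_second)  # 12.5
print(frames_for(10.0))   # 125
```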

AI was used mainly for creating the web demo, organizing my messy code into this repo, ablations and brainstorming.

I would love to support more languages and continue improving the model. If you like this project, consider buying me a coffee so I can buy more compute: https://buymeacoffee.com/samuelvitorino
