Language: English · Русский · 中文 · 日本語 · Español · Français · Deutsch · Português · 한국어
By Oscar Lumiere
- Register at https://huggingface.co (skip if you already have an account)
- Open https://huggingface.co/stabilityai/stable-audio-open-1.0 → click Agree and access repository
- At https://huggingface.co/settings/tokens → New token → Type: Read → Create
- Run install.bat → pick UI language → paste token → pick translator. Done.
Then run.bat opens http://127.0.0.1:7860.
Self-contained Windows app for Stable Audio Open 1.0 by Stability AI. Built for sound design, game audio, video editing and music sketches.
- 9-language UI (English, Russian, Chinese, Japanese, Spanish, French, German, Portuguese, Korean): the switcher in the Gradio footer changes everything on the fly
- Multilingual prompts: write in Cyrillic / CJK / Arabic / Hebrew / Greek / Thai / Devanagari and the prompt is auto-translated to English before generation. Two translator options at install: opus-mt-mul-en (light, ~300 MB) or nllb-200-distilled-600M (heavy, ~2.4 GB)
- ~217 ready presets in 15 categories: Footsteps, Impacts, Movement, UI, Weapons, Ambience, Vehicles, Nature, Music, Cinematic, Magic, Sci-Fi, Horror, Animals, Crowd & Voice
- Multi-variation generation: 1 to 4 audio outputs per click (num_waveforms_per_prompt), scheduler picker (Default / DPM++ 2M / Euler), random seed button, re-roll button
- Session history: last 10 generations as collapsible mini-players
- Batch mode: paste many prompts (one per line), each saved straight to outputs/Saved/
- ZIP export of outputs/Saved/ in one click
- Game-ready output: slugified filenames (wood_floor_footsteps_01.wav) with auto-incremented per-folder counters; WAV INFO chunk metadata carries title (full prompt), seed, duration, steps, cfg, sample rate and negative prompt, and is read by Reaper, Audition, Ableton, ffprobe
- GPU/CPU live toggle plus RunOnCPU.bat for low-VRAM systems
- Per-session log file in logs/app-YYYYMMDD-HHMMSS.log with full tracebacks on errors
- One-file installer (install.bat): silently sets up Python 3.11.9 inside the project folder, picks torch+cu128 or CPU torch via nvidia-smi auto-detect, downloads model and translator. Truly portable: deleting the project leaves zero traces in the system
- Update path: update.bat refreshes deps and re-syncs models
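The "game-ready output" naming can be pictured with a small sketch. This is not the code from app.py: the function names slugify and next_filename, the six-word cap and the two-digit counter format are assumptions chosen to reproduce the wood_floor_footsteps_01.wav example.

```python
import re
from pathlib import Path


def slugify(prompt: str, max_words: int = 6) -> str:
    """Lowercase the prompt, keep ASCII letters/digits, join words with underscores."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())
    # max_words is an assumed cap so long prompts do not produce huge filenames.
    return "_".join(words[:max_words]) or "untitled"


def next_filename(folder: Path, prompt: str) -> Path:
    """Return folder/<slug>_NN.wav using the next free counter for that slug."""
    slug = slugify(prompt)
    pattern = re.compile(rf"^{re.escape(slug)}_(\d+)\.wav$")
    # Collect counters already used by files with exactly this slug.
    used = [
        int(match.group(1))
        for path in folder.glob(f"{slug}_*.wav")
        if (match := pattern.match(path.name))
    ]
    return folder / f"{slug}_{max(used, default=0) + 1:02d}.wav"
```

Calling next_filename(Path("outputs/Saved"), "Wood floor footsteps") on an empty folder would give wood_floor_footsteps_01.wav, and the counter advances only for files sharing the same slug in that folder.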
- Windows 10 / 11 (tested on Windows 11 Pro)
- Python 3.11.x — https://www.python.org/downloads/release/python-3119/ (during install, check Add Python to PATH)
- NVIDIA GPU with CUDA 12.8 (tested on RTX 5080, 16 GB VRAM). For other GPUs, change the PyTorch index in install.bat / setup.bat (e.g. cu121 for CUDA 12.1).
- 15–20 GB free disk during install (after install: ~10–13 GB resident):
  - ~5 GB: Stable Audio Open weights
  - ~4–5 GB: Python + PyTorch CUDA + dependencies
  - 0.3–2.4 GB: translator (your choice)
  - ~5 GB: HuggingFace + pip caches + install temp files (mostly recoverable later)
- HuggingFace account + read token (only required for the initial download)
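As a rough check of the disk figures above, the per-component ranges can be summed. The dictionary below only restates the numbers from the list; the names COMPONENTS_GB and total_range are illustrative.

```python
# (low, high) estimates in GB, copied from the requirements list.
COMPONENTS_GB = {
    "weights": (5.0, 5.0),       # Stable Audio Open weights
    "python_env": (4.0, 5.0),    # Python + PyTorch CUDA + dependencies
    "translator": (0.3, 2.4),    # light or heavy translator
    "caches": (5.0, 5.0),        # HF + pip caches, install temp files
}


def total_range(names):
    """Sum the low and high estimates for the given component names."""
    low = sum(COMPONENTS_GB[name][0] for name in names)
    high = sum(COMPONENTS_GB[name][1] for name in names)
    return low, high


peak = total_range(COMPONENTS_GB)                                   # during install
resident = total_range(["weights", "python_env", "translator"])     # after cleanup
print(f"peak: {peak[0]:.1f}-{peak[1]:.1f} GB, resident: {resident[0]:.1f}-{resident[1]:.1f} GB")
```

The sums come out slightly below the quoted 15–20 GB and ~10–13 GB, so the quoted figures appear to include some headroom.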
The HuggingFace token is needed once, only to download the Stable Audio weights. After that the app runs offline from the local hf-cache/.
Stable Audio Open 1.0 is a gated model — Stability AI requires a registered account and acceptance of the model license before download. So:
- Register on https://huggingface.co (if you don't already have an account)
- Open the model page and accept the license: https://huggingface.co/stabilityai/stable-audio-open-1.0 (click Agree and access repository — without this the download returns 403)
- Create a read token: https://huggingface.co/settings/tokens
- install.bat will ask you to paste the token at the right moment
The translator models (opus-mt-mul-en and nllb-200-distilled-600M) are
not gated and need no token.
A single file does everything in one go:
install.bat
What it does:
- Picks the prompts language (1–9: English, Russian, Chinese, Japanese, Spanish, French, German, Portuguese, Korean)
- Verifies that Python 3.11 is on PATH (py -3.11)
- Creates .venv and upgrades pip / wheel
- Installs PyTorch (CUDA 12.8) and the rest of requirements.txt
- Asks for the HF token (or skips if you're already logged in)
- Asks which translator to install:
  - 1. LIGHT: Helsinki-NLP/opus-mt-mul-en
    - ~300 MB, fast, Apache 2.0 (commercial use OK)
    - Medium quality, sometimes mistranslates music terms
  - 2. HEAVY: facebook/nllb-200-distilled-600M
    - ~2.4 GB, slower, CC-BY-NC 4.0 (non-commercial only)
    - Higher quality, recommended for non-English
- Saves the choice to hf-cache/translator.cfg
- Downloads Stable Audio (~5 GB) and the chosen translator into hf-cache/
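The PyTorch step above depends on whether an NVIDIA driver is present. A minimal sketch of that decision in Python follows; the installer itself is a batch file, so this only mirrors the idea. The helper names are hypothetical, detection here is reduced to "is nvidia-smi on PATH", and installing only the torch package is an assumption. The two index URLs are the public PyTorch wheel indexes.

```python
import shutil

# Public PyTorch wheel indexes for CUDA 12.8 builds and CPU-only builds.
PYTORCH_INDEX = {
    "cu128": "https://download.pytorch.org/whl/cu128",
    "cpu": "https://download.pytorch.org/whl/cpu",
}


def pick_torch_index(has_nvidia_smi=None):
    """Choose the CUDA index when nvidia-smi is found, otherwise the CPU index."""
    if has_nvidia_smi is None:
        # Assumption: presence of nvidia-smi on PATH stands in for "NVIDIA GPU available".
        has_nvidia_smi = shutil.which("nvidia-smi") is not None
    return PYTORCH_INDEX["cu128" if has_nvidia_smi else "cpu"]


def pip_command(index_url):
    """Build (but do not run) a pip command for the chosen index."""
    return ["python", "-m", "pip", "install", "torch", "--index-url", index_url]


print(" ".join(pip_command(pick_torch_index())))
```

For a GPU that needs a different CUDA build, swapping the cu128 entry for another index (for example cu121) is the equivalent of editing the index in install.bat / setup.bat.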
run.bat
A browser tab opens at http://127.0.0.1:7860.
Quickest path to switch translators later:
- Open hf-cache/translator.cfg and change the value to light or heavy
- Run download.bat: it will fetch the chosen translator if it's not in the cache
- Run run.bat
Or delete hf-cache/translator.cfg and run install.bat again — it will ask anew and
will not re-download dependencies or Stable Audio (already in cache).
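How the saved choice could map to a model is sketched below. It assumes translator.cfg holds a single word, light or heavy; the real file format and the function name read_translator_choice are not confirmed by this README. The repository IDs are the ones listed in the installer description.

```python
from pathlib import Path

# Repository IDs as named in the installer description.
TRANSLATOR_REPOS = {
    "light": "Helsinki-NLP/opus-mt-mul-en",
    "heavy": "facebook/nllb-200-distilled-600M",
}


def read_translator_choice(cfg_path):
    """Return 'light' or 'heavy', or None when no choice has been saved yet."""
    cfg_path = Path(cfg_path)
    if not cfg_path.exists():
        return None  # e.g. the file was deleted so the installer asks again
    choice = cfg_path.read_text(encoding="utf-8").strip().lower()
    if choice not in TRANSLATOR_REPOS:
        raise ValueError(f"Unknown translator {choice!r}; expected light or heavy")
    return choice


choice = read_translator_choice("hf-cache/translator.cfg")
if choice is not None:
    print(f"{choice} -> {TRANSLATOR_REPOS[choice]}")
```

Returning None for a missing file matches the behaviour described above, where deleting the config makes the installer ask for the translator again.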
install.bat is a wrapper around four smaller scripts; each can be run alone:
| Script | What it does |
|---|---|
| setup.bat | Creates .venv and installs dependencies |
| login.bat | HuggingFace authentication (hf auth login) |
| download.bat | Downloads Stable Audio + the translator from hf-cache/translator.cfg |
| run.bat | Launches the Gradio web UI |
.
├── app.py # Gradio UI + generation
├── translator.py # Translator module (light / heavy)
├── download_model.py # Model downloads from HuggingFace
├── requirements.txt # Python dependencies
├── install.bat # Single-file installer (multilingual)
├── setup.bat # Just venv + dependencies
├── login.bat # Just HF auth
├── download.bat # Just downloads
├── run.bat # Launch the web UI
├── hf-cache/translator.cfg # light / heavy (created by install.bat)
├── README.md # main docs (English)
├── docs/i18n/ # localized README in 8 other languages
├── LICENSE # MIT — for the source code in this repo
├── NOTICE.md # Licenses of the models used
├── hf-cache/ # Local HF cache (NOT in git)
└── outputs/ # Saved audio (NOT in git)
├── intermediate/ # When the "Save all" checkbox is on
└── Saved/ # When you press the "Save" button
You can override these before running run.bat:
| Variable | Default | Purpose |
|---|---|---|
| HOST | 127.0.0.1 | Gradio host |
| PORT | 7860 | Gradio port |
| SHARE | 0 | 1 = create a public gradio.live URL |
| STABLE_AUDIO_MODEL | stabilityai/stable-audio-open-1.0 | Override the model |
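One way an app could read these variables with the documented defaults is sketched here. LaunchConfig and load_config are hypothetical names, not taken from app.py, and the parsing rules (SHARE is true only for the exact string 1) are an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    host: str
    port: int
    share: bool
    model: str


def load_config(env=None):
    """Read HOST, PORT, SHARE and STABLE_AUDIO_MODEL, falling back to the table defaults."""
    env = os.environ if env is None else env
    return LaunchConfig(
        host=env.get("HOST", "127.0.0.1"),
        port=int(env.get("PORT", "7860")),
        share=env.get("SHARE", "0") == "1",  # assumed: only "1" enables sharing
        model=env.get("STABLE_AUDIO_MODEL", "stabilityai/stable-audio-open-1.0"),
    )


print(load_config())
```

In a Windows command prompt, a line such as set PORT=7861 entered before run.bat would then change the port for that session only.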
This source code is MIT (see LICENSE).
The models each have their own licenses (see NOTICE.md for full text):
| Model | License | Commercial use |
|---|---|---|
| Stable Audio Open 1.0 | Stability AI Community | Up to $1M annual revenue: yes; above: Enterprise license required |
| opus-mt-mul-en (light) | Apache 2.0 | Yes |
| NLLB-200-distilled-600M (heavy) | CC-BY-NC 4.0 | No, non-commercial only |
If you intend to use this commercially, pick the light translator at install.
- Windows 11 Pro
- Python 3.11.9
- NVIDIA RTX 5080 (16 GB VRAM)
- CUDA 12.8
- PyTorch 2.11.0+cu128
- Stable Audio Open 1.0 — Stability AI
- opus-mt-mul-en — Helsinki-NLP / Tatoeba Translation Challenge
- NLLB-200 — Meta AI / FAIR
- Gradio, diffusers, transformers, accelerate — HuggingFace
- Wrapper development — Oscar Lumiere
- Assistance — Claude Code (Anthropic)