navercm418/VoiceForge

🎙️ VoiceForge Studio

AI-powered voice line generator for Skyrim SE mods — fully local, completely free.

"Not batch mode. Best of X."


What Is It?

VoiceForge Studio is a local Gradio web app that takes you from raw dialogue text all the way to a finished .fuz file ready to drop into your Skyrim SE mod — no cloud services, no subscriptions, no per-line costs.

It uses Qwen3-TTS for voice cloning and the vanilla Skyrim voice library as a casting reference database. The result is generated dialogue that actually sounds like it belongs in the game.


Why Does It Sound So Good?

Most TTS tools work in batch mode:

  • Dump 100+ WAV files in
  • The tool builds an "average" of that voice
  • Generate your lines
  • Hope it sounds right

VoiceForge Studio works differently. For every line you need to record:

  1. You write the line — e.g. "They passed on a few years back. Peacefully. I'm grateful for that."
  2. The tool semantically searches thousands of vanilla Skyrim voice lines to find ones that feel similar in tone, length, and emotional weight
  3. Qwen3-TTS clones that specific reference for that specific line
  4. You get multiple variants — pick the best performance
  5. One click: LIP file generated, FUZ file packaged, ready for your mod

The vanilla game already contains thousands of perfectly acted reference performances. VoiceForge Studio uses those as a casting library, matched intelligently to each individual line you need.

You're not building a voice model. You're directing a performance, one line at a time.


Features

  • Semantic + fuzzy reference matching — finds the best vanilla voice clip for each line using sentence-transformer embeddings and fuzzy scoring
  • Three search modes:
    • Auto — semantic search on your dialogue line
    • Hybrid — semantic search on a custom reference text
    • Manual — SQL LIKE wildcard search on voice text (* supported)
  • Tag system — tag reference clips (e.g. happy, aggressive, calm, whisper) and filter by include/exclude tags per search
  • Min text length filter — avoid short throwaway reference clips
  • Multiple variants — generate N candidates per line, one per reference WAV
  • LIP & Fuz pipeline — batch select generated files, auto-generate .lip via LipGenerator, package to .fuz via LIPFuzer, drop flat into your destination folder
  • Archive — move generated output to data/archive between sessions
  • Voice type support — all vanilla Skyrim SE voice types
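The semantic + fuzzy matching idea can be sketched with the standard library alone. This is only an illustration, not the project's actual code: the bag-of-words `embed` function stands in for a sentence-transformer embedding, `difflib.SequenceMatcher` stands in for a rapidfuzz score, and the names `rank_references` and `semantic_weight` as well as the 0.7 weighting are assumptions made for the example.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def embed(text):
    """Toy bag-of-words vector; a stand-in for a sentence-transformer embedding."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a rapidfuzz ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_references(line, candidates, top_k=3, semantic_weight=0.7):
    """Return the top_k candidate texts, best blended score first."""
    query_vec = embed(line)
    scored = []
    for text in candidates:
        score = (semantic_weight * cosine(query_vec, embed(text))
                 + (1.0 - semantic_weight) * fuzzy(line, text))
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Placeholder reference texts, made up for this example.
references = ["I'm grateful for that.", "Get out of my sight!", "Need something?"]
best = rank_references("I'm grateful for that, truly.", references, top_k=1)
```

Blending the two scores lets a reference win either because it means something similar or because it is worded similarly; a real embedding model would capture tone far better than word overlap does.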

Prerequisites

Before running VoiceForge Studio you will need:

1. Skyrim Special Edition (Steam)

The LipGenerator and LIPFuzer tools ship with Skyrim SE. You'll find them at:

<SteamLibrary>\steamapps\common\Skyrim Special Edition\Tools\LipGen\

Copy both executables into tools\fuzer\ inside your VoiceForge Studio folder. Relative to the LipGen folder, they are at:

  • LipGenerator\LipGenerator.exe
  • LipFuzer\LIPFuzer.exe
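If you prefer to script the copy, a minimal sketch could look like the following. The helper name `copy_lip_tools` is hypothetical, and the sketch assumes both executables should sit directly inside tools\fuzer\, as shown in the folder structure later in this README.

```python
import shutil
from pathlib import Path

# Locations of the two tools relative to the LipGen folder, as listed above.
TOOL_FILES = [
    Path("LipGenerator") / "LipGenerator.exe",
    Path("LipFuzer") / "LIPFuzer.exe",
]


def copy_lip_tools(lipgen_dir, fuzer_dir):
    """Copy both executables flat into fuzer_dir and return the new paths."""
    lipgen_dir, fuzer_dir = Path(lipgen_dir), Path(fuzer_dir)
    missing = [str(rel) for rel in TOOL_FILES if not (lipgen_dir / rel).is_file()]
    if missing:
        raise FileNotFoundError(f"Not found under {lipgen_dir}: {', '.join(missing)}")
    fuzer_dir.mkdir(parents=True, exist_ok=True)
    # rel.name drops the subfolder so the files land directly in fuzer_dir.
    return [Path(shutil.copy2(lipgen_dir / rel, fuzer_dir / rel.name)) for rel in TOOL_FILES]
```

Call it with the full path to your LipGen folder and the path to tools\fuzer\; it raises `FileNotFoundError` if either executable is missing rather than copying only one.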

2. Voice Database (voiceText.db)

VoiceForge Studio searches vanilla Skyrim voice lines for reference matching. This database must be built from your own game files and is not included in this repository (redistributing Bethesda assets is not permitted).

📋 Setup instructions for building voiceText.db coming soon.
You will need a BSA extractor and the Skyrim SE voice BSAs.

3. Python 3.10+

Recommended: Python 3.10 or 3.11. A CUDA-capable GPU is strongly recommended for Qwen3-TTS generation speed.


Installation

# 1. Clone the repo
git clone https://github.com/yourusername/VoiceForgeStudio.git
cd VoiceForgeStudio

# 2. Create virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/Mac

# 3. Install dependencies
pip install -r requirements.txt

Folder Structure

VoiceForgeStudio/
├── VoiceForgeStudio.py          # Main app
├── requirements.txt
├── data/
│   ├── system.db                # App database (tags, history, temp)
│   ├── voiceText.db             # Voice line database (you build this)
│   ├── extractedwavs/           # Temp reference WAVs
│   └── archive/                 # Archived generated output
├── generated/                   # Generated voice lines
├── tools/
│   ├── scripts/
│   │   ├── sqlHelper.py
│   │   ├── textMatcher.py
│   │   └── train_custom_voice.py
│   ├── BsaBrowser/
│   │   └── bsaExtract.py
│   └── fuzer/
│       ├── LipGenerator.exe     # From Skyrim SE Tools (not included)
│       └── LIPFuzer.exe         # From Skyrim SE Tools (not included)
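
Since three of the files in this tree are not included in the repository, a quick preflight check can save a confusing first run. The sketch below is an assumption about what such a check could look like; `missing_files` is a hypothetical helper, not part of VoiceForge Studio.

```python
from pathlib import Path

# Files from the tree above that you must supply yourself.
REQUIRED = [
    "data/voiceText.db",
    "tools/fuzer/LipGenerator.exe",
    "tools/fuzer/LIPFuzer.exe",
]


def missing_files(root):
    """Return the required relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("."):
        print(f"Missing: {rel}")
```

Run it from the VoiceForgeStudio folder; no output means all three files are in place.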

Running the App

python VoiceForgeStudio.py

Then open your browser at http://localhost:7860


Workflow

Step 1 — Set Up Your Search

  • Select your Voice Type (e.g. FemaleEvenToned)
  • Set Include/Exclude Tags to filter reference clips by tone
  • Set a Min Text Length to avoid short reference clips
  • Choose your Search Mode
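
To make the Manual search mode and the Min Text Length filter concrete, here is a rough sketch of how a `*` wildcard could be translated into a SQL LIKE query. The table name `voice_lines`, its columns, and both function names are assumptions for illustration; the real voiceText.db schema is not documented here.

```python
import sqlite3


def to_like_pattern(user_pattern):
    """Translate '*' wildcards to LIKE's '%', escaping literal % and _ first."""
    escaped = (user_pattern.replace("\\", "\\\\")
                           .replace("%", "\\%")
                           .replace("_", "\\_"))
    return escaped.replace("*", "%")


def manual_search(conn, voice_type, pattern, min_length=0, limit=10):
    """Return (text, wav_path) rows for one voice type matching the pattern."""
    sql = (
        "SELECT text, wav_path FROM voice_lines "
        "WHERE voice_type = ? AND text LIKE ? ESCAPE '\\' "
        "AND length(text) >= ? ORDER BY text LIMIT ?"
    )
    params = (voice_type, to_like_pattern(pattern), min_length, limit)
    return conn.execute(sql, params).fetchall()


# Hypothetical schema and placeholder rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voice_lines (voice_type TEXT, text TEXT, wav_path TEXT)")
conn.executemany("INSERT INTO voice_lines VALUES (?, ?, ?)", [
    ("FemaleEvenToned", "Take care of yourself out there.", "a.wav"),
    ("FemaleEvenToned", "Take care.", "b.wav"),
])
rows = manual_search(conn, "FemaleEvenToned", "take care*", min_length=15)
```

Passing the pattern as a bound parameter keeps user input out of the SQL string, and the minimum length filter drops the short "Take care." clip from the results.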

Step 2 — Find Reference WAVs

  • Type your dialogue line in Dialogue Line
  • Set the number of reference matches (1–10)
  • Click Find Reference WAVs
  • Listen to the matched clips — these become your cloning references

Step 3 — Generate

  • Click Generate Voice Line
  • VoiceForge Studio generates one variant per reference WAV
  • Listen to the generated output and pick the best performance

Step 4 — Tag Your References (Optional)

  • Select a reference match from the Ref Match dropdown
  • Assign tags to help future searches find similar clips
  • Click Update Tags

Step 5 — LIP & Fuz

  • Select one or more generated files from the Generated Files dropdown
  • Enter your Destination Folder (your mod's voice directory)
  • Click LIP & Fuz
  • .fuz files land flat in your destination folder, ready to use

Dialogue Line Format

Lines in the Dialogue Line box use a pipe-separated format, one entry per line:

LineName|The text you want spoken

Multiple lines:

Greeting01|Good morning. I didn't expect to see you here.
Farewell01|Take care of yourself out there.
Idle01|They passed on a few years back. Peacefully. I'm grateful for that.
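
For anyone preparing lines in a script before pasting them in, the format can be parsed along these lines. This is a sketch under stated assumptions: `parse_dialogue_lines` is a hypothetical name, and splitting only on the first pipe (so any later pipes stay in the spoken text) is a guess about the intended behavior.

```python
def parse_dialogue_lines(raw):
    """Parse 'LineName|text' entries into (name, text) tuples, skipping blank lines."""
    parsed = []
    for number, line in enumerate(raw.splitlines(), start=1):
        if not line.strip():
            continue
        # Split on the first pipe only; the rest of the line is the spoken text.
        name, sep, text = line.partition("|")
        name, text = name.strip(), text.strip()
        if not sep or not name or not text:
            raise ValueError(f"Line {number}: expected 'LineName|text', got {line!r}")
        parsed.append((name, text))
    return parsed


entries = parse_dialogue_lines(
    "Greeting01|Good morning. I didn't expect to see you here.\n"
    "Farewell01|Take care of yourself out there.\n"
)
```

Each tuple pairs the line name with the text to be spoken; a malformed entry raises `ValueError` with its line number.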

💡 Tip: Punctuation is your stage direction. Periods create natural pauses. A sentence like "Peacefully." on its own will land with appropriate weight. Qwen3-TTS reads the rhythm of your text.


Tags

Tags let you categorize reference clips for smarter future searches. Suggested tags:

| Tag | Meaning |
| --- | --- |
| happy | Upbeat, warm delivery |
| sad | Somber, low energy |
| aggressive | Intense, confrontational |
| calm | Steady, measured |
| whisper | Quiet, hushed |
| neverUse | Permanently excluded from all searches |
| doNotUse | Excluded from searches (soft block) |
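
The include/exclude logic can be sketched roughly as below. Several details are assumptions rather than documented behavior: that a clip must carry every include tag, that the doNotUse soft block can be lifted by an explicit override, and the names `clip_allowed` and `allow_soft_blocked`.

```python
ALWAYS_EXCLUDED = {"neverUse"}    # hard block, per the table above
DEFAULT_EXCLUDED = {"doNotUse"}   # soft block; assumed here to be overridable


def clip_allowed(clip_tags, include=(), exclude=(), allow_soft_blocked=False):
    """Decide whether a reference clip passes the tag filters for one search."""
    tags = set(clip_tags)
    if tags & ALWAYS_EXCLUDED:
        return False
    if not allow_soft_blocked and tags & DEFAULT_EXCLUDED:
        return False
    if tags & set(exclude):
        return False
    # Assumption: the clip must carry every include tag, not just one of them.
    return set(include) <= tags
```

For example, `clip_allowed(["calm", "whisper"], include=["calm"], exclude=["aggressive"])` passes, while any clip tagged neverUse is rejected regardless of the other arguments.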

Credits & Dependencies

  • Qwen3-TTS — voice cloning model by Alibaba
  • Gradio — UI framework
  • sentence-transformers — semantic search (all-MiniLM-L6-v2)
  • rapidfuzz — fuzzy text matching
  • LipGenerator / LIPFuzer — included with Skyrim Special Edition (Bethesda)

License

MIT License. See LICENSE for details.

⚠️ VoiceForge Studio does not include or redistribute any Bethesda game assets. You must own a legitimate copy of Skyrim Special Edition to use this tool.


Contributing

Pull requests welcome. If you're a Skyrim modder and have ideas for improvements — better reference matching, new search modes, UI enhancements — open an issue or drop a PR.


Built for the Skyrim modding community. Because your characters deserve a real voice.
