# Kokoro TTS Experiments
Goal for now: run Kokoro out-of-the-box and convert statements to speech.
Later: prepare for Nepali + Nepaliglish adaptation and fine-tuning.

## 1) Install dependencies (run once per environment)

In [None]:
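# Assumption: the PyPI package names match the imports used below (kokoro, soundfile).
%pip install -q kokoro soundfile
# Linux note: some Kokoro language paths need espeak-ng for fallback.
# If missing, install it in terminal: sudo apt-get install -y espeak-ng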

## 2) Imports and configuration

In [None]:
from pathlib import Path
import numpy as np
import soundfile as sf
from IPython.display import Audio, display
from kokoro import KPipeline

LANG_CODE = 'a'   # 'a' American English, 'b' British English, 'h' Hindi, etc.
VOICE = 'af_heart'
SPEED = 1.0
SAMPLE_RATE = 24000

OUT_DIR = Path('outputs/kokoro')
OUT_DIR.mkdir(parents=True, exist_ok=True)

pipeline = KPipeline(lang_code=LANG_CODE)
print('Ready. Output dir:', OUT_DIR.resolve())

## 3) Helper to synthesize + save WAV

In [None]:
def synthesize_to_wav(text: str, basename: str, voice: str = VOICE, speed: float = SPEED):
    """Synthesize text, save each chunk and the merged audio as WAV, return the merged path."""
    parts = []
    # Split the input on runs of newlines; each chunk is synthesized separately.
    generator = pipeline(text, voice=voice, speed=speed, split_pattern=r'\n+')

    for i, (graphemes, phonemes, audio) in enumerate(generator):
        # Save every chunk on its own so individual segments can be inspected.
        part_path = OUT_DIR / f'{basename}_part{i:02d}.wav'
        sf.write(part_path, audio, SAMPLE_RATE)
        parts.append(audio)
        print(f'[{i}] {part_path.name}')
        print('   text:', graphemes[:120].replace('\n', ' '))
        print('   phonemes:', phonemes[:120])

    if not parts:
        raise RuntimeError('No audio generated.')

    # Join all chunks into one waveform, save it, and play it inline.
    merged = np.concatenate(parts)
    full_path = OUT_DIR / f'{basename}.wav'
    sf.write(full_path, merged, SAMPLE_RATE)
    display(Audio(merged, rate=SAMPLE_RATE))
    print('Saved:', full_path)
    return full_path

## 4) Single statement test

In [None]:
text = "Namaste! This is an out-of-the-box Kokoro TTS test from a Jupyter notebook."
synthesize_to_wav(text, basename='single_test')
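
After the single test it can help to confirm that the saved file has the expected sample rate and a plausible duration. The sketch below uses only the standard-library `wave` module; `wav_info` is a hypothetical helper name, and it assumes the file is plain PCM WAV (which is what `soundfile` normally writes for `.wav` by default, though that is worth verifying in your environment).

```python
import wave
from pathlib import Path


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), 'rb') as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
    return rate, frames / rate


# Hypothetical usage in this notebook (assumes the single test above was run):
# rate, seconds = wav_info(Path('outputs/kokoro/single_test.wav'))
# print(rate, round(seconds, 2))
```

If the reported rate differs from `SAMPLE_RATE`, playback will sound too fast or too slow, so this is a cheap sanity check before listening to a whole batch.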

## 5) Batch: convert multiple statements

In [None]:
statements = [
    "Namaste, tapailai kasto cha?",
    "Aaja hami Kokoro TTS ko out-of-the-box demo gariraheka chhau.",
    "Tomorrow we can start dataset planning for Nepali and Nepaliglish fine-tuning.",
]

for idx, line in enumerate(statements, start=1):
    print(f'\n=== Statement {idx} ===')
    synthesize_to_wav(line, basename=f'statement_{idx:02d}')
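
For a batch it is useful to record which text produced which file, since the same pairing is needed later for any `(text, wav)` dataset. A minimal sketch, assuming a two-column CSV layout; the `write_manifest` name, the column names, and the `manifest.csv` path are all hypothetical choices, not part of Kokoro.

```python
import csv
from pathlib import Path


def write_manifest(rows, manifest_path):
    """Write (wav_name, text) pairs to a UTF-8 CSV file with a header row."""
    manifest_path = Path(manifest_path)
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    # newline='' lets the csv module control line endings itself.
    with manifest_path.open('w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['wav', 'text'])
        writer.writerows(rows)
    return manifest_path


# Hypothetical usage with the batch above:
# rows = [(f'statement_{i:02d}.wav', s) for i, s in enumerate(statements, start=1)]
# write_manifest(rows, 'outputs/kokoro/manifest.csv')
```

Using the `csv` module rather than joining strings by hand keeps sentences that contain commas, such as the first statement, intact when the file is read back.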

## 6) Quick knobs to try

- Change `VOICE` (example: `af_heart`, `am_adam`)
- Change `LANG_CODE` (`a`, `b`, `h`, etc.), then re-run the configuration cell so `pipeline` is rebuilt for the new language
- Tune `SPEED` (for example `0.9` or `1.1`)
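
The knobs above can also be swept in one go rather than edited by hand. The sketch below is one way to do that: `sweep` is a hypothetical helper that takes the synthesizer as an argument (in this notebook that would be `synthesize_to_wav`) and calls it once per voice and speed combination, encoding both in the output basename. The `sweep_` naming scheme is an assumption made for illustration.

```python
from itertools import product


def sweep(synthesize, text, voices, speeds):
    """Call synthesize(text, basename=..., voice=..., speed=...) for every combination."""
    results = []
    for voice, speed in product(voices, speeds):
        # Speed is encoded as a percentage, e.g. 0.9 becomes 's090'.
        basename = f'sweep_{voice}_s{round(speed * 100):03d}'
        results.append(synthesize(text, basename=basename, voice=voice, speed=speed))
    return results


# Hypothetical usage in this notebook:
# sweep(synthesize_to_wav, 'Namaste! Quick voice check.', ['af_heart', 'am_adam'], [0.9, 1.0, 1.1])
```

Passing the synthesizer in as a parameter keeps the helper independent of the notebook's globals, so it can be tried with a stub function before spending time on real synthesis.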

## 7) Next step for Nepali + Nepaliglish (prep checklist)

1. Build a clean paired dataset: `(text, wav)` with speaker consistency.
2. Keep two text tracks if possible: native Nepali script and romanized Nepaliglish.
3. Normalize text before training (numbers, punctuation, abbreviations).
4. Start with adaptation/fine-tuning from Kokoro base instead of training from scratch.
5. Evaluate with held-out sentences in both Nepali script and Nepaliglish.
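
As a starting point for the normalization step in the checklist, here is a minimal sketch of a text normalizer. It is an illustration under stated assumptions, not a complete pipeline: `normalize_text` and the `ABBREVIATIONS` table are hypothetical, the abbreviation entries are placeholders, and numbers are only converted from Devanagari digits to ASCII digits, not expanded into spoken words.

```python
import re
import unicodedata

# Devanagari digits U+0966..U+096F mapped to ASCII '0'..'9'.
DEVANAGARI_DIGITS = {0x0966 + i: str(i) for i in range(10)}

# Hypothetical abbreviation table; a real project would curate its own.
ABBREVIATIONS = {'Dr.': 'Doctor', 'Rs.': 'rupees'}


def normalize_text(text):
    """Apply Unicode NFC, unify digits, expand abbreviations, collapse whitespace."""
    text = unicodedata.normalize('NFC', text)
    text = text.translate(DEVANAGARI_DIGITS)
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


# Example call with mixed romanized text and Devanagari digits:
# normalize_text('  Dr.  Sharma   le Rs. \u096b\u0966\u0966 tiryo. ')
```

Applying the same function to both the native-script and the romanized track keeps the two text tracks consistent, which matters when held-out sentences from each are compared.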