# Vietnamese ASR (ZipFormer-30M RNNT) — Colab

This notebook runs the Vietnamese ZipFormer-30M RNNT model from this repo and exposes a public HTTP API for testing.

What you get:
- CPU-optimized inference (int8 ONNX ZipFormer)
- Quick local test using sample audio in `test_wavs/vietnamese`
- FastAPI server with a public URL via Cloudflared (for external API calls)

If you launched this notebook directly from the repository on Colab, you can run the cells top to bottom. Otherwise, make sure the repo contents (including the ONNX models, `config.json`, and `requirements.txt`) are in the working directory before running the first cell.

In [None]:
#@title 1) Environment setup (ffmpeg + Python deps)
%%bash
set -euxo pipefail

# Ensure apt cache is fresh and install ffmpeg for format conversions
apt-get update -y
apt-get install -y --no-install-recommends ffmpeg

python -V
pip install --upgrade pip wheel setuptools

# Install repo-pinned wheels (CPU torch + sherpa/k2 + sherpa-onnx + FastAPI stack)
if [ -f requirements.txt ]; then
  pip install -r requirements.txt
else
  echo 'requirements.txt not found in CWD. If you are not in the repo root, please cd into the repo directory or clone the repo first.'
  exit 1
fi

# Cloudflared for public URL tunneling
# -f makes curl fail on an HTTP error instead of saving the error page as the binary
curl -fsSL -o /usr/local/bin/cloudflared https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x /usr/local/bin/cloudflared

echo '✅ Environment ready'


In [None]:
# 2) Verify the environment and list key files
import os, sys, glob, platform
print('Python:', sys.version)
print('Platform:', platform.platform())
print('CWD:', os.getcwd())
print('Has model.py?', os.path.exists('model.py'))
# Each model part may be present as an int8 or a full-precision ONNX file
for part in ('encoder', 'decoder', 'joiner'):
    stem = f'{part}-epoch-20-avg-10'
    print(f'Found ONNX {part}?', os.path.exists(stem + '.int8.onnx') or os.path.exists(stem + '.onnx'))
print('Found config.json?', os.path.exists('config.json'))
print('Vietnamese test wavs:', glob.glob('test_wavs/vietnamese/*.wav'))


In [None]:
# 3) Quick local inference on a sample wav
from model import get_pretrained_model, decode
REPO_ID = 'hynt/sherpa-onnx-zipformer-vi-int8-2025-10-16'
DECODING_METHOD = 'modified_beam_search'
NUM_ACTIVE_PATHS = 15
TEST_WAV = 'test_wavs/vietnamese/0.wav'
recognizer = get_pretrained_model(REPO_ID, decoding_method=DECODING_METHOD, num_active_paths=NUM_ACTIVE_PATHS)
print('Recognizer ready')
text = decode(recognizer, TEST_WAV)
print('Recognized text:', text)


## Public API server

This starts the FastAPI server (`api_server.py`) on port 8000 and exposes it via a public URL using Cloudflared.

Endpoints:
- `GET /healthz` — liveness
- `GET /readyz` — model readiness
- `POST /v1/transcribe` — Accepts `multipart/form-data` (key `file`), or JSON with `audio_url` or `audio_base64`.
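
The example cell in step 6 covers the multipart and base64 forms; the `audio_url` form can be built the same way. The following is a minimal sketch rather than the repo's own client: `build_json_payload` is a hypothetical helper, the field names `audio_url` and `audio_base64` come from the endpoint list above, and the example URL is a placeholder.

```python
import base64
from typing import Optional


def build_json_payload(
    audio_url: Optional[str] = None,
    audio_bytes: Optional[bytes] = None,
) -> dict:
    """Build a JSON body for POST /v1/transcribe (hypothetical helper).

    Exactly one of audio_url or audio_bytes must be given.
    """
    if (audio_url is None) == (audio_bytes is None):
        raise ValueError("pass exactly one of audio_url or audio_bytes")
    if audio_url is not None:
        return {"audio_url": audio_url}
    # Raw bytes are not JSON-safe, so they travel as base64 text.
    return {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}


# Placeholder URL, for illustration only.
payload = build_json_payload(audio_url="https://example.com/sample.wav")
print(payload)
```

Once the server is running, the resulting dict can be sent with `requests.post(base_url + '/v1/transcribe', json=payload, timeout=60)`, where `base_url` is the local or public address.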

Auth (optional): set `REQUIRE_API_KEY=true` and provide `Authorization: Bearer <API_KEY>`.
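
One way to turn this on from the notebook is to generate a random key before running step 4. This sketch assumes `api_server.py` reads `REQUIRE_API_KEY` and `API_KEY` from the environment when it starts, which the notebook implies but does not show; `auth_headers` is a hypothetical helper name.

```python
import os
import secrets

# Random key with 32 bytes of entropy, encoded as URL-safe text.
api_key = secrets.token_urlsafe(32)

# Set these before the server starts. Step 4 uses os.environ.setdefault,
# so values already present in the environment are kept.
os.environ["REQUIRE_API_KEY"] = "true"
os.environ["API_KEY"] = api_key


def auth_headers(key: str) -> dict:
    """Return the bearer header in the format described above."""
    return {"Authorization": f"Bearer {key}"}


headers = auth_headers(api_key)
```

Because the public URL is reachable by anyone who learns it, enabling the key is a reasonable default even for short tests.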

In [None]:
#@title 4) Start FastAPI (background)
import os, threading, time, requests
import uvicorn
os.environ.setdefault('UVICORN_PORT', '8000')
os.environ.setdefault('REQUIRE_API_KEY', 'false')
os.environ.setdefault('API_KEY', '')
os.environ.setdefault('MODEL_REPO_ID', 'hynt/sherpa-onnx-zipformer-vi-int8-2025-10-16')
os.environ.setdefault('DECODING_METHOD', 'modified_beam_search')
os.environ.setdefault('NUM_ACTIVE_PATHS', '15')
os.environ.setdefault('MAX_DURATION_SEC', '60')
def _run():
    uvicorn.run('api_server:app', host='0.0.0.0', port=int(os.environ['UVICORN_PORT']), workers=1)
server_thread = threading.Thread(target=_run, daemon=True)
server_thread.start()
base = f"http://127.0.0.1:{os.environ['UVICORN_PORT']}"
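# Poll /healthz for up to ~10 s; fail loudly if the server never answers
for _ in range(50):
    try:
        if requests.get(base + '/healthz', timeout=1).ok:
            print('✅ Local server up at', base)
            break
    except requests.RequestException:
        pass  # server is not accepting connections yet
    time.sleep(0.2)
else:
    raise RuntimeError(f'Server did not answer {base}/healthz in time; check the uvicorn output')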


In [None]:
#@title 5) Expose a public URL via Cloudflared
import subprocess, re, time, requests, os
port = os.environ.get('UVICORN_PORT', '8000')
proc = subprocess.Popen(['/usr/local/bin/cloudflared', 'tunnel', '--url', f'http://localhost:{port}', '--no-autoupdate'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
public_url = None
deadline = time.time() + 60
while time.time() < deadline:
    line = proc.stdout.readline()
    if not line:
        time.sleep(0.2)
        continue
    # Escape the dots so they match only a literal '.'
    m = re.search(r'(https://[-a-z0-9.]+\.trycloudflare\.com)', line)
    if m:
        public_url = m.group(1)
        break
if not public_url:
    raise RuntimeError('Failed to obtain public URL from cloudflared logs')
print('🌐 Public API base URL:', public_url)
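# A fresh tunnel hostname can take a few seconds to become reachable, so retry briefly
for _ in range(10):
    try:
        print('Health:', requests.get(public_url + '/healthz', timeout=5).json())
        break
    except requests.RequestException:
        time.sleep(2)
print('Ready :', requests.get(public_url + '/readyz', timeout=10).json())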


In [None]:
# 6) Call the API (examples)
import requests, base64, os
API_BASE = public_url
API_KEY = os.environ.get('API_KEY', '')
use_auth = os.environ.get('REQUIRE_API_KEY','false').lower()=='true'
headers = ({'Authorization': f'Bearer {API_KEY}'} if use_auth else {})
# a) Multipart upload
with open('test_wavs/vietnamese/0.wav', 'rb') as f:
    files = {'file': ('0.wav', f, 'audio/wav')}
    r = requests.post(API_BASE + '/v1/transcribe', files=files, headers=headers, timeout=60)
print('Multipart result:', r.status_code, r.json())
# b) JSON: base64-encoded
with open('test_wavs/vietnamese/1.wav', 'rb') as f:
    b64 = base64.b64encode(f.read()).decode('utf-8')
payload = {'audio_base64': b64, 'decoding_method': 'modified_beam_search', 'num_active_paths': 15}
r = requests.post(API_BASE + '/v1/transcribe', json=payload, headers=headers, timeout=60)
print('Base64 result  :', r.status_code, r.json())
print('Use this from your machine:')
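# Include the auth header in the hint when the server requires a key
auth_flag = " -H 'Authorization: Bearer <API_KEY>'" if use_auth else ''
print(f"curl -X POST '{API_BASE}/v1/transcribe'{auth_flag} -F file=@test_wavs/vietnamese/0.wav")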


### Notes
- Default model: `hynt/sherpa-onnx-zipformer-vi-int8-2025-10-16` (int8 ONNX for fast CPU inference).
- You can set `MODEL_REPO_ID`, `DECODING_METHOD`, `NUM_ACTIVE_PATHS`, and `MAX_DURATION_SEC` as env vars before launching the server.
- To enable auth, set `REQUIRE_API_KEY=true` and provide `API_KEY`; then use header `Authorization: Bearer <API_KEY>`.
- Cloudflared public URLs are ephemeral and intended for testing. For production, deploy on a server (see `DEPLOYMENT.md`).
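
The env var override can be sketched as follows. Step 4 sets its defaults with `os.environ.setdefault`, so anything written to the environment beforehand is kept. `apply_overrides` is a hypothetical helper and the values are illustrative; whether the server accepts a given value is not shown in this notebook.

```python
import os


def apply_overrides(overrides: dict) -> dict:
    """Write overrides into os.environ and return the values written."""
    applied = {}
    for name, value in overrides.items():
        os.environ[name] = str(value)  # environment values are always strings
        applied[name] = os.environ[name]
    return applied


# Run this before the server cell (step 4).
applied = apply_overrides({"NUM_ACTIVE_PATHS": 4, "MAX_DURATION_SEC": 30})

# A later setdefault, as in step 4, leaves the override in place.
os.environ.setdefault("NUM_ACTIVE_PATHS", "15")
print(os.environ["NUM_ACTIVE_PATHS"])  # prints 4
```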