## 1. Setup

### a) Install Deps  
Create a conda environment named `gai-stt-svr` and install the gai-stt-svr package into it. Then switch the notebook kernel to `gai-stt-svr` before continuing.

```bash
conda create -n gai-stt-svr python=3.10.10 -y
eval "$(conda shell.bash hook)" && conda activate gai-stt-svr
cd ../..
poetry install
```

### b) Download Model  

```bash
huggingface-cli download openai/whisper-large-v3 \
        --local-dir ~/gai/models/whisper-large-v3 \
        --local-dir-use-symlinks False
```
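Before loading the model, it can help to confirm that the download landed where expected. The helper below is a hypothetical sketch (`missing_model_files` is not part of gai), and the file names it checks are assumptions based on the usual layout of a Hugging Face Whisper checkpoint, so adjust them to match what is actually in the directory.

```python
from pathlib import Path

# Assumed file names for a Hugging Face Whisper checkpoint; adjust as needed.
EXPECTED_FILES = ("config.json", "preprocessor_config.json")


def missing_model_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    return [name for name in expected if not (root / name).is_file()]


# The directory matches the --local-dir used in the download command.
missing = missing_model_files("~/gai/models/whisper-large-v3")
print("missing files:", missing or "none")
```

An empty result suggests the checkpoint is in place; any listed names point to an incomplete or misplaced download.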

## 2. Load Test Configuration

In [3]:
from gai.lib.server.singleton_host import SingletonHost
from gai.lib.common.utils import free_mem
from rich.console import Console
console=Console()

config = {
    "type": "stt",
    "generator_name": "whisperv3-huggingface",
    "engine": "LocalWhisper_STT",
    "model_name": "OpenAI Whisper v3",
    "model_path": "models/whisper-large-v3",
    "model_basename": "",
    "max_seq_len": 128,
    "stopping_words": [],
    "hyperparameters": {
        "chunk_length_s": 30,
        "batch_size": 16,
        "max_new_tokens": 128
    },
    "module_name": "gai.stt.server.gai_stt",
    "class_name": "GaiSTT",
    "init_args": [],
    "init_kwargs": {}
    }

from gai.stt.server.gai_stt import GaiSTT
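
The `module_name`, `class_name`, `init_args`, and `init_kwargs` entries suggest that the host resolves the generator class dynamically. How `SingletonHost` actually does this is not shown here; the following is a minimal sketch of the general pattern using `importlib`, with hypothetical helpers `resolve_class` and `build_instance` and a standard-library class standing in for `GaiSTT`.

```python
import importlib


def resolve_class(config):
    """Import config['module_name'] and return the attribute named config['class_name']."""
    module = importlib.import_module(config["module_name"])
    return getattr(module, config["class_name"])


def build_instance(config):
    """Instantiate the resolved class with the configured positional and keyword args."""
    cls = resolve_class(config)
    return cls(*config.get("init_args", []), **config.get("init_kwargs", {}))


# Stand-in config: a standard-library class replaces GaiSTT so the sketch runs anywhere.
demo_config = {
    "module_name": "collections",
    "class_name": "OrderedDict",
    "init_args": [],
    "init_kwargs": {"lang": "en"},
}
instance = build_instance(demo_config)
print(type(instance).__name__, dict(instance))
```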

## 3. Load Model Test

In [5]:
# before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        # after loading
        free_mem()
finally:
    # after disposal (runs even if loading fails; exceptions still propagate)
    free_mem()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
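
The `with` block is what ensures the model is released even if the body raises. The sketch below illustrates that load/dispose lifecycle with a hypothetical `DemoHost` class; it is not the `SingletonHost` implementation, only the general context-manager pattern this test appears to rely on.

```python
class DemoHost:
    """Hypothetical stand-in for a model host; records lifecycle events."""

    def __init__(self):
        self.events = []
        self.generator = None

    def __enter__(self):
        # Load on entry (a real host would load model weights here).
        self.generator = object()
        self.events.append("load")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release on exit, whether or not the body raised.
        self.generator = None
        self.events.append("unload")
        return False  # do not swallow exceptions


host = DemoHost()
try:
    with host:
        raise RuntimeError("simulated failure while the model is loaded")
except RuntimeError:
    pass

print(host.events)
```

Because `__exit__` runs on both the success and failure paths, the unload step is recorded even though the body raised.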


## 4. Test

### a) Using Path

In [8]:
from pathlib import Path

# before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        # Pass the audio file as a pathlib.Path
        response = host.generator.create(
            file=Path("./today-is-a-wonderful-day.wav")
            )
        print(response)
        # after loading
        free_mem()
finally:
    # after disposal
    free_mem()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


{'text': ' Today is a wonderful day to build something people love.', 'chunks': [{'timestamp': (0.0, 3.14), 'text': ' Today is a wonderful day to build something people love.'}]}
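
The response printed above is a dictionary with the full `text` and a list of `chunks`, each carrying a `(start, end)` timestamp in seconds. As a small illustration of consuming that shape, here is a sketch with a hypothetical helper `format_chunks`; the sample dictionary is copied from the output above.

```python
def format_chunks(response):
    """Render each chunk as '[start-end] text', one chunk per line."""
    lines = []
    for chunk in response.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.2f}s-{end:.2f}s] {chunk['text'].strip()}")
    return "\n".join(lines)


# Sample response copied from the cell output above.
response = {
    "text": " Today is a wonderful day to build something people love.",
    "chunks": [
        {
            "timestamp": (0.0, 3.14),
            "text": " Today is a wonderful day to build something people love.",
        }
    ],
}
print(format_chunks(response))
```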


### b) Using File

In [12]:
# before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        # Open in binary mode; the context manager closes the handle afterwards
        with open("./today-is-a-wonderful-day.wav", "rb") as file:
            response = host.generator.create(file=file)
        print(response)

        # after loading
        free_mem()
finally:
    # after disposal
    free_mem()


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


{'text': ' Today is a wonderful day to build something people love.', 'chunks': [{'timestamp': (0.0, 3.14), 'text': ' Today is a wonderful day to build something people love.'}]}


### c) Using Bytes

In [None]:
# before loading
free_mem()
try:
    with SingletonHost.GetInstanceFromConfig(config) as host:
        # Read the whole file into memory and pass the raw bytes
        with open("./today-is-a-wonderful-day.wav", "rb") as file:
            data = file.read()
        response = host.generator.create(file=data)
        print(response)

        # after loading
        free_mem()
finally:
    # after disposal
    free_mem()
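
The three tests pass a `Path`, an open binary file, and raw `bytes` to the same `file` parameter. How `GaiSTT.create` handles these internally is not shown here; a minimal sketch of one way to normalise such inputs, using a hypothetical helper `read_audio_bytes`, might look like this:

```python
import io
from pathlib import Path


def read_audio_bytes(file):
    """Return raw bytes from a path, a binary file-like object, or bytes."""
    if isinstance(file, (bytes, bytearray)):
        return bytes(file)
    if isinstance(file, (str, Path)):
        return Path(file).read_bytes()
    if hasattr(file, "read"):
        data = file.read()
        if not isinstance(data, bytes):
            raise TypeError("file object must be opened in binary mode")
        return data
    raise TypeError(f"unsupported input type: {type(file).__name__}")


payload = b"RIFF....WAVE"  # placeholder bytes, not a valid WAV file
# Raw bytes and a binary stream over the same data normalise to the same value.
print(read_audio_bytes(payload) == read_audio_bytes(io.BytesIO(payload)))
```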


## 5. API

In [2]:
%%bash
curl -X 'POST' \
'http://localhost:12033/gen/v1/audio/transcriptions' \
    -s \
    -H 'accept: application/json' \
    -H 'Content-Type: multipart/form-data' \
    -F 'file=@today-is-a-wonderful-day.wav' \
    -F 'model=openai-whisper'

{"text":" Today is a wonderful day to build something people love.","chunks":[{"timestamp":[0.0,3.14],"text":" Today is a wonderful day to build something people love."}]}
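
For callers who prefer Python over `curl`, the same request can be sketched with only the standard library. The URL, the `file` and `model` field names, and the `openai-whisper` value come from the command above; `build_multipart` and `transcribe` are hypothetical helpers, and the server is assumed to be running on port 12033.

```python
import json
import os
import uuid
from urllib import request

URL = "http://localhost:12033/gen/v1/audio/transcriptions"  # from the curl call


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one 'file' part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, model="openai-whisper", url=URL):
    """POST the audio file to the endpoint and return the decoded JSON response."""
    with open(wav_path, "rb") as f:
        body, content_type = build_multipart(
            {"model": model}, os.path.basename(wav_path), f.read()
        )
    req = request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Accept": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the server running, `transcribe("today-is-a-wonderful-day.wav")` should return a dictionary of the same shape as the JSON shown above.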