whisper: CrisperWhisper results in grpc: error while marshaling: string field contains invalid UTF-8 #5038

@markuman

Description

LocalAI version:
localai/localai:v2.26.0-aio-gpu-nvidia-cuda-12

Environment, CPU architecture, OS, and Version:

uname -a
Linux gpu2 6.13.7-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 13 Mar 2025 18:12:00 +0000 x86_64 GNU/Linux

Describe the bug

Using https://huggingface.co/nyrahealth/CrisperWhisper with LocalAI results in the following error on transcription:

Whisper-Error: 500 - {"error":{"code":500,"message":"rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8","type":""}}
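
For context, gRPC (protobuf) rejects any `string` field whose bytes are not valid UTF-8. The snippet below is only a minimal sketch of that condition, not LocalAI or whisper.cpp code. It assumes, without confirmation from this report, that the backend returns text cut in the middle of a multi-byte character. The helper names `is_valid_utf8` and `to_safe_text` are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_safe_text(data: bytes) -> str:
    """Decode bytes, replacing undecodable sequences with U+FFFD."""
    return data.decode("utf-8", errors="replace")


# "ü" is two bytes in UTF-8 (0xC3 0xBC); cutting after the first byte
# leaves a sequence that a protobuf string field would refuse.
full = "für".encode("utf-8")
truncated = full[:2]  # b"f\xc3"
```

Here `is_valid_utf8(full)` is true and `is_valid_utf8(truncated)` is false. Replacing invalid sequences as in `to_safe_text` is one possible way to make such text marshalable, at the cost of losing the damaged character.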

To Reproduce

  1. Create directories and install dependencies
mkdir CrisperWhisper
mkdir CrisperWhisper-out
pip install huggingface_hub  torch numpy transformers
git clone https://github.com/openai/whisper
git clone https://github.com/ggerganov/whisper.cpp
  2. Download the model
from huggingface_hub import snapshot_download, login

HUGGINGFACE_TOKEN = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

login(token=HUGGINGFACE_TOKEN)

model_id = "nyrahealth/CrisperWhisper"  # Replace with the ID of the model you want to download
snapshot_download(repo_id=model_id, local_dir="CrisperWhisper")
  3. Convert the model to a single ggml file
python whisper.cpp/models/convert-h5-to-ggml.py CrisperWhisper/ whisper/ CrisperWhisper-out/
  4. Move the ggml file from CrisperWhisper-out/ to your LocalAI model path, then send a transcription request.
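
The request that triggers the error is not shown above. As a rough sketch, a transcription request to the endpoint seen in the logs (`/v1/audio/transcriptions`) could be built as follows. The base URL `http://localhost:8080`, the file name `sample.wav`, and the helper `build_transcription_request` are assumptions for illustration. The model name `CrisperWhisper.bin` is taken from the log output.

```python
import urllib.request
import uuid


def build_transcription_request(
    audio: bytes,
    filename: str,
    model: str,
    base_url: str = "http://localhost:8080",  # assumed LocalAI address
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for the transcription endpoint."""
    boundary = uuid.uuid4().hex
    # Form field carrying the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    # Form field carrying the raw audio bytes.
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio + b"\r\n"
    closing = f"--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=model_part + file_part + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, read an audio file into `audio`, build the request with `model="CrisperWhisper.bin"`, and pass the result to `urllib.request.urlopen`. A 500 response like the one reported here surfaces as `urllib.error.HTTPError`.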

Expected behavior

Transcription succeeds without any errors.

Logs

7:53AM INF BackendLoader starting backend=whisper modelID=CrisperWhisper.bin o.model=CrisperWhisper.bin
7:54AM INF Success ip=127.0.0.1 latency="28.449µs" method=GET status=200 url=/readyz
7:55AM INF Success ip=127.0.0.1 latency="14.826µs" method=GET status=200 url=/readyz
7:56AM ERR Server error error="rpc error: code = Internal desc = grpc: error while marshaling: string field contains invalid UTF-8" ip=172.17.0.1 latency=2m39.930538614s method=POST status=500 url=/v1/audio/transcriptions

Additional context
