
Use Cases Vision Audio Documents

Mike edited this page May 28, 2026 · 1 revision

Cases: Vision, Audio, Documents

Well suited to local multimodal pipelines:

  • OCR and document layout;
  • image captioning / VQA;
  • object detection / segmentation;
  • speech-to-text;
  • TTS.

For production, it is simplest to start with native; for a browser demo, start with webgpu or web.
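The choice of backend can be sketched as a small helper. This is only an illustration: the backend names native, webgpu and web come from the text above, while the `Target` type, the `chooseBackend` function and the rule of falling back from webgpu to web when WebGPU is unavailable are assumptions and not part of any documented API.

```typescript
// Backend names are taken from the page text; everything else is hypothetical.
type Backend = "native" | "webgpu" | "web";

interface Target {
  kind: "production" | "browser-demo";
  // Whether the browser exposes WebGPU; only meaningful for browser demos.
  hasWebGpu?: boolean;
}

// Assumed rule: production starts with native; a browser demo prefers
// webgpu and falls back to web when WebGPU is not available.
function chooseBackend(target: Target): Backend {
  if (target.kind === "production") {
    return "native";
  }
  return target.hasWebGpu ? "webgpu" : "web";
}
```

In a browser, `hasWebGpu` could be filled in with a check such as `"gpu" in navigator`, which tests whether the WebGPU entry point is present.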
