
Tech Stack ‐ Whisper

mingming-ma edited this page Jan 20, 2024 · 1 revision

What is Whisper?

OpenAI's Whisper is a neural network-based automatic speech recognition (ASR) system designed to transcribe speech from audio into text. It is built using machine learning techniques and has been trained on a diverse dataset to handle a variety of languages and accents.

Whisper examples


Whisper stands out for its impressive speed and adept handling of diverse accents. This automatic speech recognition model processes and transcribes spoken language swiftly, making it ideal for real-time applications like transcription services and voice assistants. Additionally, Whisper showcases robustness in its ability to understand and interpret a wide range of accents, ensuring accurate and inclusive performance across various regional and cultural contexts.

Advantages

  • Multilingual Support: Whisper handles a wide range of languages and dialects, and can both transcribe speech in its original language and translate it into English. This versatility makes it well suited to projects with an international scope.
  • Robustness: Whisper performs well even in noisy environments and across varied accents. This resilience ensures reliable functionality in real-world scenarios, where environmental factors and diverse speech patterns may trip up other models.
  • Open Source: OpenAI has released both the Whisper model and its code to the developer community. This open approach encourages innovation and lets developers inspect, adapt, and extend the model to better suit the specific needs of their applications.
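The multilingual support above is exposed through optional parameters on OpenAI's transcription endpoint. A minimal sketch of assembling those request options (the helper name is hypothetical; `language` and `prompt` are real optional parameters of the API):

```typescript
// Hypothetical helper: builds the non-file options for a Whisper request.
// "language" is an ISO-639-1 hint for the spoken language; "prompt" can
// bias the spelling of uncommon words. Both are optional API parameters.
interface TranscriptionOptions {
  model: string;
  language?: string;
  prompt?: string;
}

function buildTranscriptionOptions(language?: string, prompt?: string): TranscriptionOptions {
  const options: TranscriptionOptions = { model: "whisper-1" };
  if (language) options.language = language;
  if (prompt) options.prompt = prompt;
  return options;
}

// Example: hint that the recording is in German
const opts = buildTranscriptionOptions("de");
console.log(opts); // { model: "whisper-1", language: "de" }
```

These options would be spread into the `transcriptions.create({ file, ... })` call alongside the audio file.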

API Call in ChatCraft

src/lib/ai.ts

export const transcribe = async (audio: File) => {
  // Read the user's OpenAI credentials from ChatCraft's settings
  const { apiKey, apiUrl } = getSettings();
  if (!apiKey) {
    throw new Error("Missing OpenAI API Key");
  }

  // Build an OpenAI client and send the audio file to the Whisper endpoint
  const { openai } = createClient(apiKey, apiUrl);
  const transcriptions = new OpenAI.Audio.Transcriptions(openai);
  const transcription = await transcriptions.create({
    file: audio,
    model: "whisper-1",
  });
  // The response contains the transcribed speech as plain text
  return transcription.text;
};

Whisper.cpp

Whisper.cpp provides a C++ implementation for running the Whisper models released by OpenAI for speech recognition. This project allows developers to use the Whisper models in environments where C++ is preferred or required, such as in performance-critical applications or on platforms where Python is not available or ideal. Supported platforms:

  •  Mac OS (Intel and Arm)
  •  iOS
  •  Android
  •  Java
  •  Linux / FreeBSD
  •  WebAssembly
  •  Windows (MSVC and MinGW)
  •  Raspberry Pi
  •  Docker
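On desktop platforms, whisper.cpp ships a command-line tool that can be driven from other programs. One hedged way to invoke it from a Node/TypeScript app (the binary and file paths are assumptions; -m, -f, and -otxt are standard whisper.cpp CLI flags for model, input file, and text output):

```typescript
import { execFile } from "node:child_process";

// Hypothetical helper: assemble argv for the whisper.cpp CLI.
// -m selects the ggml model file, -f the input WAV,
// -otxt writes a .txt transcript next to the input file.
function buildWhisperArgs(modelPath: string, audioPath: string): string[] {
  return ["-m", modelPath, "-f", audioPath, "-otxt"];
}

// Illustrative invocation (binary and paths are assumptions):
//
//   execFile("./main", buildWhisperArgs("models/ggml-base.en.bin", "samples/jfk.wav"),
//     (err, stdout) => {
//       if (err) throw err;
//       console.log(stdout);
//     });
```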