Whisper Self-Hosted LLM

This repo contains artifacts for building and running the Whisper LLM (Large Language Model) service locally on your laptop in containers. These containerized LLM services help developers quickly prototype new LLM-based applications without relying on externally hosted services.

Introduction to LLMs

A Large Language Model is a type of Artificial Intelligence that is trained on a massive dataset of text and code. This allows the model to learn statistical relationships between words and phrases, which in turn allows it to generate text, translate languages, write creative content, and answer questions informatively.

Some common LLMs:

GPT-3.5, GPT-4
Gemini
Llama, Llama 2

Setting up LLMs via Self-Hosting

The discussion surrounding LLMs has shifted from "Should we use LLMs?" to "Should we self-host, or rely on a proprietary off-the-shelf alternative?" Whether to self-host your LLM depends on your use case, your computational needs, and the engineering infrastructure available to you.

Some benefits of self-hosting your LLM are:

Greater security, privacy, and compliance
Customization
Avoid vendor lock-in
Lower computational costs
Easy to get started for those new to LLMs

LLM Use Cases

There are various applications for LLMs such as text generation, speech recognition and RAG applications. For the purpose of this demo, we are considering a speech recognition application:

Speech Recognition

Speech Recognition

We are using OpenAI's Whisper model for speech recognition. Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. There are five model sizes, four of which also have English-only versions, offering tradeoffs between speed and accuracy:

tiny
base
small
medium
large

We are using the compressed .ggml model binaries from Hugging Face, which can be found here: https://huggingface.co/ggerganov/whisper.cpp.
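
To make the size and English-only options above concrete, here is a minimal Python sketch that builds a download URL for one of the model binaries. The file naming pattern (`ggml-<size>[.en].bin`) and the `resolve/main` download path are assumptions, not something this README specifies, and the helper names `model_filename` and `model_url` are hypothetical. Check the file listing on the Hugging Face page before relying on them; in particular, the large model may be published under versioned names.

```python
# Base URL of the Hugging Face repository linked above.
REPO_URL = "https://huggingface.co/ggerganov/whisper.cpp"

# The four sizes that also have an English-only version.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")
ALL_SIZES = ENGLISH_ONLY_SIZES + ("large",)


def model_filename(size, english_only=False):
    """Return the assumed file name for a model size (hypothetical helper)."""
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only and size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"no English-only version for size {size!r}")
    suffix = ".en" if english_only else ""
    # Assumed naming pattern; verify against the repository's file list.
    return f"ggml-{size}{suffix}.bin"


def model_url(size, english_only=False):
    """Return the assumed direct-download URL for a model binary."""
    # "resolve/main" is assumed to be the direct-download path on the hub.
    return f"{REPO_URL}/resolve/main/{model_filename(size, english_only)}"


if __name__ == "__main__":
    print(model_url("base", english_only=True))
```

Smaller sizes are the natural choice for quick prototyping on a laptop, and the English-only versions are an option when the audio is known to be in English.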

