A super simple tool for a chatbot with voice control
The code contains almost no logic of its own; it is mostly glue code between:
- getUserMedia and MediaRecorder to record the user's audio
- OpenAI's Whisper to convert the user audio into a question text
- Google's Gemma as an LLM to compute an answer text
- Hugging Face's Transformers Python lib to wrap around the LLM, or any model you want to use (just replace the checkpoint string)
- SpeechSynthesis to convert the answer text into audio
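The server side of that chain can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the checkpoint names, the function names (`load_models`, `answer_audio`), and the choice to load Whisper through Transformers as well are all assumptions. The browser side (getUserMedia, MediaRecorder, SpeechSynthesis) is not shown.

```python
from typing import Callable, Dict, Tuple

# Assumed checkpoint strings; replace them with any model you want to use.
ASR_CHECKPOINT = "openai/whisper-small"
LLM_CHECKPOINT = "google/gemma-2b-it"


def load_models(
    asr_checkpoint: str = ASR_CHECKPOINT,
    llm_checkpoint: str = LLM_CHECKPOINT,
) -> Tuple[Callable[[str], str], Callable[[str], str]]:
    """Build the speech-to-text and text-generation callables.

    Weights are downloaded on first use, so this call can be slow.
    """
    # Imported lazily so the rest of the module works without Transformers.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=asr_checkpoint)
    llm = pipeline("text-generation", model=llm_checkpoint)

    def transcribe(audio_path: str) -> str:
        # The ASR pipeline returns a dict with the transcript under "text".
        return asr(audio_path)["text"].strip()

    def generate(question: str) -> str:
        # return_full_text=False keeps only the newly generated answer.
        out = llm(question, max_new_tokens=128, return_full_text=False)
        return out[0]["generated_text"].strip()

    return transcribe, generate


def answer_audio(
    audio_path: str,
    transcribe: Callable[[str], str],
    generate: Callable[[str], str],
) -> Dict[str, str]:
    """Glue step: recorded audio file -> question text -> answer text."""
    question = transcribe(audio_path)
    if not question:
        # Nothing was understood, so skip the LLM call entirely.
        return {"question": "", "answer": ""}
    return {"question": question, "answer": generate(question)}
```

With the models loaded once at startup (`transcribe, generate = load_models()`), a request handler would only need to save the uploaded recording and call `answer_audio(path, transcribe, generate)`. Passing the two callables in explicitly keeps the glue function independent of any particular checkpoint.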
As of now it's way too basic to be used in practice on a daily basis, but it serves as a POC for future applications (e.g. LLM-powered local voice chat in video games). It's also a surprisingly small repository: 85 lines for the Python server and 58 lines for the web app.
Install torch with GPU support
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Install dependencies
    pip install -r requirements.txt
Run the server
    python server.py
Navigate to localhost:8080 when ready
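If the server turns out to be slow, it may be worth checking that the GPU build of torch was actually picked up. The helper below is a hypothetical sketch (`gpu_status` is not part of the repository) that reports what torch can see:

```python
import importlib.util


def gpu_status() -> str:
    """Describe whether torch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"

    import torch

    if torch.cuda.is_available():
        # Index 0 is the first visible CUDA device.
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is visible (CPU only)"


if __name__ == "__main__":
    print(gpu_status())
```

A "CPU only" result after following the steps above usually means the plain torch wheel was installed instead of the cu118 one, or that the NVIDIA driver is missing.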