
OneReality

Bridging the real and virtual worlds

Demo Video (Lipsynced to Megumin in VTube Studio) · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Roadmap
  4. License

About The Project

Demo: click the image for the demo video

A virtual waifu / assistant that you can speak to through your mic, and she'll speak back to you! Features include:

  • You can speak to her with a mic
  • She can speak back to you in Japanese or English
  • Has short-term memory: she can remember things from the current conversation (multi-conversation memory would make responses too slow and cost too much on the OpenAI API)
  • Can open apps, as long as you specify the app path in the code (see the sketch after this list)
  • Smarter than you
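
The app-opening feature is a hard-coded map from phrases to executable paths, as the list above says. Here's a minimal sketch of that idea; the app names and paths are hypothetical examples, and you'd replace them with your own:

```python
# A hedged sketch of the "open apps" feature: a hard-coded phrase-to-path
# map. The phrases and paths below are examples, not the project's own.
import subprocess

APP_PATHS = {
    "notepad": r"C:\Windows\System32\notepad.exe",
    "browser": r"C:\Program Files\Mozilla Firefox\firefox.exe",
}

def maybe_open_app(user_text: str) -> bool:
    """Launch the first app whose name appears in the sentence."""
    lowered = user_text.lower()
    for name, path in APP_PATHS.items():
        if name in lowered:
            subprocess.Popen([path])  # fire and forget; don't block the chat loop
            return True
    return False
```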

More planned features are listed in the roadmap. Also, here's a summary of how it works, for those of you who want to know:

First, the Python package SpeechRecognition captures what you say into your mic and writes that speech to an audio (.wav) file, which is sent to OpenAI's Whisper speech-to-text transcription AI. The transcribed result is printed in the terminal and written to conversation.jsonl, over which the vector database hyperdb runs cosine similarity to find the 2 closest matches to what you said; those matches are appended to the prompt to give Megumin context. The transcription is also passed through multiple NLI/RTE and other checks to see if you want to open an app or do something with your smart home. The prompt is then sent to llama.cpp, Megumin's response is printed to the terminal and appended to conversation.jsonl, and finally, the response is spoken by VITS TTS.
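
To make that flow concrete, here's a minimal sketch of a single turn of the loop. It assumes the pre-1.0 openai SDK, llama-cpp-python, and the hyperdb package; the prompt format, file names, and speak_with_vits() are hypothetical stand-ins rather than the project's actual code, and the NLI/RTE intent checks are omitted:

```python
# Minimal sketch of one conversation turn -- an illustration, not the
# project's actual code.
import json

import openai                      # pre-1.0 openai SDK
import speech_recognition as sr    # SpeechRecognition package
from hyperdb import HyperDB        # jdagdelen's hyperdb vector database
from llama_cpp import Llama        # llama-cpp-python bindings for llama.cpp

MODEL_PATH = "wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_S.bin"

recognizer = sr.Recognizer()
llm = Llama(model_path=MODEL_PATH)
db = HyperDB()  # the real project would reload past lines from conversation.jsonl

def listen_and_transcribe() -> str:
    """Record one utterance from the mic, save it as .wav, send it to Whisper."""
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    with open("speech.wav", "wb") as f:
        f.write(audio.get_wav_data())
    with open("speech.wav", "rb") as f:
        return openai.Audio.transcribe("whisper-1", f)["text"]

def build_prompt(user_text: str) -> str:
    """Prepend the two closest past lines (cosine similarity) for context."""
    matches = db.query(user_text, top_k=2)
    context = "\n".join(str(m) for m in matches)
    return f"{context}\nUser: {user_text}\nMegumin:"

user_text = listen_and_transcribe()
print("You:", user_text)
reply = llm(build_prompt(user_text), max_tokens=128)["choices"][0]["text"].strip()
print("Megumin:", reply)
with open("conversation.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps({"user": user_text, "megumin": reply}) + "\n")
# speak_with_vits(reply)  # hypothetical: hand the reply to the VITS TTS step
```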

(back to top)


Getting Started

Video tutorial: here's how you can set it up on Windows (the steps are probably similar on Mac and Linux, but I haven't tested them).

Prerequisites

  1. Purchase an OpenAI API key. It's extremely affordable since it's pay-as-you-go, and you only need it for Whisper STT, which costs about $0.36 per hour of transcribed audio. Anyways, if you're talking to AI Megumin for more than an hour a month, that might be a you problem
  2. Install Python and add it to your PATH environment variable
  3. Download the prerelease source code
  4. Install WSL2 by opening cmd as administrator and running wsl --install
  5. Set the default distro version to WSL2 with wsl --set-default-version 2
  6. Install Ubuntu 22.04.2 WSL2 from Microsoft Store
  7. Create a Tuya cloud project if you want to control your smart devices with the AI; for example, you can say 'Hey Megumin, can you turn on my LEDs'. It's a bit complicated, though, and I'll probably make a video on it later because it's hard to explain through text, but here's a guide that should help you out: https://developer.tuya.com/en/docs/iot/device-control-practice?id=Kat1jdeul4uf8 (a rough sketch of the device call follows this list)
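
For the smart-home piece, here's a hedged sketch of sending a command through the Tuya cloud using the tinytuya library. This is an illustration under the assumption of a tinytuya-style setup, not necessarily how the project wires it up; every key, region, and device ID below is a placeholder from your own Tuya project (step 7):

```python
# A hedged sketch of flipping a Tuya device from Python via tinytuya's
# Cloud API. All credentials and IDs are placeholders from your own
# Tuya cloud project; the project itself may do this differently.
import tinytuya

cloud = tinytuya.Cloud(
    apiRegion="us",                # your Tuya data-center region
    apiKey="YOUR_API_KEY",         # from the Tuya IoT project
    apiSecret="YOUR_API_SECRET",
    apiDeviceID="YOUR_DEVICE_ID",  # any device bound to the project
)

# Turn the LEDs on: 'switch_led' is a common function code for Tuya light
# strips, but check your device's actual codes in the Tuya console.
commands = {"commands": [{"code": "switch_led", "value": True}]}
print(cloud.sendcommand("YOUR_DEVICE_ID", commands))
```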

Installation

  1. Check that Ubuntu is WSL2 and not WSL1 by running wsl -l -v in cmd. If it says 1, run wsl --set-version Ubuntu-22.04 2
  2. In the start menu, find the app Ubuntu 22.04.2 LTS and open it
  3. In the terminal that pops up, run git clone https://github.com/Plachtaa/VITS-fast-fine-tuning
  4. Install python3.8 on WSL2 with this guide
  5. Run python3.8 -m pip install cmake
  6. Run cd VITS-fast-fine-tuning, then run python3.8 -m pip install -r requirements.txt. If you get a building-wheels error for pyopenjtalk, try python3.8 -m pip install pyopenjtalk==0.1.3 --no-build-isolation --no-cache-dir. This is a huge problem right now and may or may not work, which is a big part of why this is a prerelease; I'm trying to get this working without WSL2 or pyopenjtalk, but it's not easy
  7. Download an LLM and put it in your OneReality folder. Personally I used wizardlm-1.0-uncensored-llama2-13b.ggmlv3.q3_K_S.bin but it really depends on your hardware
  8. Extract the OneReality-main folder from prerequisites step 3 so that it is only one folder deep, and rename it to just OneReality
  9. Install the Python dependencies by cd-ing into the folder and running pip install -r requirements.txt in cmd or PowerShell
  10. Download G_latest.pth and finetune_speaker.json from Huggingface, create a folder called model in the OneReality folder, and put the two files in it
  11. Edit the variables in .env
  12. Run OneReality.bat and you're good to go! If you run into any issues, let me know on Discord and I might be able to help you: https://discord.gg/PN48PZEXJS
  13. When you want to stop, say goodbye, bye, or see you somewhere in your sentence; that automatically ends the program. Otherwise you can just Ctrl+C or close the window. (The check is as simple as the sketch below.)
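
For reference, the stop-phrase check in step 13 can be as simple as a substring scan. A minimal sketch, not necessarily the project's exact logic:

```python
# Detect the stop phrases from step 13 anywhere in the sentence.
STOP_PHRASES = ("goodbye", "bye", "see you")

def should_exit(user_text: str) -> bool:
    """True if any stop phrase appears anywhere in what the user said."""
    lowered = user_text.lower()
    return any(phrase in lowered for phrase in STOP_PHRASES)

assert should_exit("Okay, see you tomorrow Megumin!")
assert not should_exit("Tell me about explosions")
```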

(back to top)

Roadmap

  • Long-term memory
  • Virtual reality / augmented reality / mixed reality integration
  • Gatebox-style hologram
  • Animatronic body
  • Alexa-like smart home control
  • More languages for the AI's voice
    • Japanese
    • English
    • Korean
    • Chinese
    • Spanish
    • Indonesian
  • Mobile version
  • Easier setup
  • Compiling into one exe
  • Localization

(back to top)

License

Distributed under the GNU General Public License v3.0. See LICENSE.txt for more information.

(back to top)

Contact and Socials

E-mail: danu0518@gmail.com

YouTube: https://www.youtube.com/@OneReality-tb4ut

Discord: https://discord.gg/PN48PZEXJS

Project Link: https://github.com/DogeLord081/OneReality

(back to top)

