ReVoice is an end-to-end voice conversion system that aims to improve accessibility and usability by chaining speech-to-text transcription with text-to-speech synthesis. It transcribes audio input to text with a pretrained speech recognition model, then synthesizes natural-sounding speech from that text with a pretrained acoustic model and vocoder.
- Speech-to-Text Transcription: Uses OpenAI's Whisper model to transcribe audio files into text.
- Text-to-Speech Synthesis: Uses Tacotron2 to generate mel-spectrograms from text and HiFi-GAN as the vocoder that converts those spectrograms into speech waveforms.
- Language: Python
- Libraries and Tools: Transformers, Librosa, SpeechBrain, PyTorch
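
The two stages listed above can be sketched as a small pipeline. This is only an illustration of how the pieces might fit together, not ReVoice's actual code: the function names (`build_transcriber`, `build_synthesizer`, `revoice`), the checkpoint identifiers (`openai/whisper-small`, `speechbrain/tts-tacotron2-ljspeech`, `speechbrain/tts-hifigan-ljspeech`) and the `pretrained/` cache directories are all assumptions. The imports assume SpeechBrain 1.0 or later, where the pretrained interfaces live under `speechbrain.inference`; passing a file path to the Transformers pipeline also assumes `ffmpeg` is installed.

```python
from typing import Any, Callable, Tuple


def build_transcriber(model_name: str = "openai/whisper-small") -> Callable[[str], str]:
    """Return a function mapping an audio file path to its transcript.

    The checkpoint name is an assumption; any Whisper checkpoint should work.
    """
    from transformers import pipeline  # imported lazily: loading is slow

    asr = pipeline("automatic-speech-recognition", model=model_name)

    def transcribe(audio_path: str) -> str:
        # The ASR pipeline returns a dict with the transcript under "text".
        return asr(audio_path)["text"].strip()

    return transcribe


def build_synthesizer() -> Callable[[str], Any]:
    """Return a function mapping text to a waveform tensor.

    Assumes the LJSpeech checkpoints published by SpeechBrain.
    """
    from speechbrain.inference.TTS import Tacotron2
    from speechbrain.inference.vocoders import HIFIGAN

    tacotron2 = Tacotron2.from_hparams(
        source="speechbrain/tts-tacotron2-ljspeech",
        savedir="pretrained/tacotron2",  # hypothetical cache directory
    )
    hifi_gan = HIFIGAN.from_hparams(
        source="speechbrain/tts-hifigan-ljspeech",
        savedir="pretrained/hifigan",  # hypothetical cache directory
    )

    def synthesize(text: str) -> Any:
        # Stage 1: text -> mel-spectrogram (Tacotron2).
        mel_output, _mel_length, _alignment = tacotron2.encode_text(text)
        # Stage 2: mel-spectrogram -> waveform (HiFi-GAN vocoder).
        waveforms = hifi_gan.decode_batch(mel_output)
        # Drop the channel dimension: [batch, 1, time] -> [batch, time].
        return waveforms.squeeze(1)

    return synthesize


def revoice(
    audio_path: str,
    transcribe: Callable[[str], str],
    synthesize: Callable[[str], Any],
) -> Tuple[str, Any]:
    """Run transcription followed by synthesis and return (text, waveform)."""
    text = transcribe(audio_path)
    if not text:
        raise ValueError(f"No speech could be transcribed from {audio_path!r}")
    return text, synthesize(text)
```

A call might look like `text, wav = revoice("input.wav", build_transcriber(), build_synthesizer())`, where `input.wav` is a placeholder path. Passing the two stages in as callables keeps `revoice` independent of the model libraries, so either stage can be swapped or stubbed out in tests. The LJSpeech checkpoints assumed here produce audio at 22,050 Hz, so the waveform should be saved with that sample rate.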