Introducing Vedant: an AI-powered podcast guest that changes the way we consume and engage with audio content. Listeners can pose questions and receive insightful, human-like responses from an advanced language model, delivered as natural speech.

## Features

- **Response Generation**: Google's Gemini large language model (LLM) generates contextually relevant, informative answers to listener questions.
- **Speech Recognition**: OpenAI's Whisper speech-to-text model accurately transcribes audio input, enabling seamless voice interaction.
- **Contextual Awareness**: A FAISS vector database retrieves relevant information from past interactions and external knowledge sources, grounding Vedant's responses.
- **Query Processing**: spaCy, a robust NLP library, helps parse and understand listener queries.
- **Expressive Speech Synthesis**: Suno's Bark text-to-speech model converts Vedant's responses into natural-sounding speech for an immersive listening experience.
- **User-Friendly Interface**: A Flask-based web application lets listeners interact with Vedant through voice or text.
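End to end, a listener turn flows through transcription, retrieval, generation, and synthesis. Here is a minimal sketch of that loop with each stage stubbed out; the function names and return values are illustrative placeholders, not the project's actual API (real code would call Whisper, FAISS, Gemini, and Bark respectively):

```python
def transcribe(audio_bytes: bytes) -> str:
    # Stand-in for Whisper speech-to-text: audio in, listener's question out.
    return "What is retrieval-augmented generation?"

def retrieve_context(question: str) -> list[str]:
    # Stand-in for a FAISS similarity search over past interactions.
    return ["RAG pairs an LLM with a vector store of reference passages."]

def generate_answer(question: str, context: list[str]) -> str:
    # Stand-in for a Gemini call; real code would build a prompt from
    # the question plus the retrieved passages.
    return f"Drawing on {len(context)} passage(s), here is an answer to: {question}"

def synthesize_speech(text: str) -> bytes:
    # Stand-in for Bark text-to-speech; real code would return audio samples.
    return text.encode("utf-8")

def handle_turn(audio_bytes: bytes) -> bytes:
    """One full listener interaction: speech in, spoken answer out."""
    question = transcribe(audio_bytes)
    context = retrieve_context(question)
    answer = generate_answer(question, context)
    return synthesize_speech(answer)
```

A Flask route would wrap `handle_turn`, accepting uploaded audio and streaming the synthesized reply back to the browser.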
## Tech Stack

- Google Gemini: Large Language Model for generating human-like responses.
- OpenAI Whisper: Speech-to-text model for accurate audio transcription.
- FAISS: Vector database for efficient information retrieval.
- spaCy: Natural Language Processing library for text processing.
- SunoAI BARK: Text-to-speech model for natural-sounding speech synthesis.
- Flask: Web framework for developing the user interface.
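FAISS's role in this stack is nearest-neighbour search over embedding vectors: the question is embedded, and the closest stored passages are returned as context. A toy illustration of that idea in plain Python (real code would use a `faiss` index over actual text embeddings; the three-dimensional vectors below are made up for the example):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy embedding store: (text, vector) pairs standing in for a FAISS index.
store = [
    ("episode 1: intro to LLMs",    [0.9, 0.1, 0.0]),
    ("episode 2: speech synthesis", [0.1, 0.8, 0.3]),
    ("episode 3: vector databases", [0.0, 0.2, 0.9]),
]

def search(query_vec, k=1):
    # Rank stored passages by similarity to the query and return the top k.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.05, 0.1, 0.95]))  # nearest neighbour is episode 3
```

FAISS does the same ranking, but with optimized index structures that stay fast over millions of vectors.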