A framework to create a RAG assistant for videos.
- Get text from video
- YouTube API (unofficial)
- GCP Speech-to-Text (StT), automatic
- Local StT models (abstract class)
- Function to perform embedding (multilingual gecko)
- Set up a vector DB (Vector Search? or Chroma?)
- RAG prompt (query answer)
- Generate citations to document/chunk (Check grounding? Citation?)
- Get starting time for a chunk
- Wrap it all together
- Implement semantic chunking to create chunks (default is a fixed number of words)
- Optimize StT by adding punctuation (LLM calling)
- Dynamically select the prompt based on the language of the video (worth it? what if the query is in a different language?)
- Enable RAG fusion
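
A minimal sketch of how the "abstract class", "default is len words" and "starting time for a chunk" items above could fit together. All names here (`Segment`, `Chunk`, `Transcriber`, `chunk_by_words`, `max_words`) are hypothetical and not taken from the project; the sketch assumes every text source can return timestamped segments.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    start: float  # seconds from the start of the video


@dataclass
class Chunk:
    text: str
    start: float  # start time of the first segment in the chunk


class Transcriber(ABC):
    """Hypothetical common interface for the text sources (YouTube, GCP, local models)."""

    @abstractmethod
    def transcribe(self, video_id: str) -> list[Segment]:
        """Return the transcript of a video as timestamped segments."""


def chunk_by_words(segments: list[Segment], max_words: int = 200) -> list[Chunk]:
    """Group whole segments into chunks of at most max_words words.

    Segments are never split, so each chunk's start time is exact.
    A single segment longer than max_words becomes its own oversized chunk.
    """
    chunks: list[Chunk] = []
    words: list[str] = []
    start = None
    for seg in segments:
        seg_words = seg.text.split()
        if not seg_words:
            continue  # skip empty segments
        # Close the current chunk if adding this segment would overflow it.
        if words and len(words) + len(seg_words) > max_words:
            chunks.append(Chunk(" ".join(words), start))
            words, start = [], None
        if start is None:
            start = seg.start
        words.extend(seg_words)
    if words:
        chunks.append(Chunk(" ".join(words), start))
    return chunks
```

Keeping segments whole trades exact chunk sizes for exact start times; semantic chunking could later replace `chunk_by_words` without changing the `Chunk` shape.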
1 - El ORIGEN de los distintos ACENTOS de Argentina, id: NgbEL2HbXWw
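
Once a chunk has a start time, a citation can link straight to that moment in the video. A small sketch, assuming the usual YouTube watch URL with a `t` parameter in seconds; `youtube_url_at` is a hypothetical helper name, not part of the project.

```python
from urllib.parse import urlencode


def youtube_url_at(video_id: str, start_seconds: float) -> str:
    """Build a watch URL that starts playback at the given offset."""
    # Assumption: the `t` query parameter takes whole seconds, so fractions are truncated.
    query = urlencode({"v": video_id, "t": f"{int(start_seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


print(youtube_url_at("NgbEL2HbXWw", 83.7))
```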