<a target="_blank" href="https://colab.research.google.com/github/sergiopaniego/RAG_local_tutorial/blob/main/whisper_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# RAG example with audio

In this example, we use an audio file as the information source for the conversation.

**Please complete example_rag.ipynb first to get more insight.**

Imagine a situation where you communicate with the LLM directly through voice.

In this case, we use Whisper, a general-purpose speech recognition model by OpenAI, to convert audio to text.

From that text, we can hold a conversation with the LLM to extract the required information.

<p align="center">
  <img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/07/langchain3.png" alt="Langchain Logo" width="20%">
  <img src="https://bookface-images.s3.amazonaws.com/logos/ee60f430e8cb6ae769306860a9c03b2672e0eaf2.png" alt="Ollama Logo" width="20%">
  <img src="https://static-00.iconduck.com/assets.00/openai-icon-2021x2048-4rpe5x7n.png" alt="OpenAI Logo" width="20%">
</p>


Sources:

* Whisper details: https://github.com/openai/whisper
* https://github.com/svpino/llm



## First, we install Whisper and its dependencies

In [None]:
!pip3 install openai-whisper

### We install ffmpeg with the command for the platform where we'll be running the example

In [None]:
# on Ubuntu or Debian
#sudo apt update && sudo apt install ffmpeg

# on Arch Linux
#sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
!brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
#choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
#scoop install ffmpeg
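
Whisper relies on the `ffmpeg` executable to decode audio, so it can help to confirm the installation worked before transcribing. The following is a minimal sketch that only uses the Python standard library; the helper name `ffmpeg_available` is an illustrative assumption, not part of Whisper.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


if ffmpeg_available():
    print("ffmpeg found at:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; install it with one of the commands above.")
```

If the check reports that ffmpeg is missing right after installing it, restarting the notebook kernel may be needed so the updated PATH is picked up.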

## Given an example audio file, we transcribe it to text so we can use it

Download the example audio locally first. To use your own audio, change the file name.

In [None]:
import whisper

whisper_model = whisper.load_model("base")

def transcribe_audio(audio_path):
    result = whisper_model.transcribe(audio_path)
    return result['text']

# Example audio: https://commons.wikimedia.org/wiki/File:Audio_Kevin_Folta.wav
audio_path = "./files/Audio_Kevin_Folta.wav" # CHANGE THIS FILE

# Transcribe the audio
transcribed_text = transcribe_audio(audio_path)

print(transcribed_text)
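
Besides the full `'text'`, the dictionary returned by `whisper_model.transcribe` also carries a `'segments'` list with `start`, `end` and `text` entries, which can be handy for checking where in the audio a sentence was spoken. The sketch below formats such segments; the helper `format_segments` and the hand-written `sample_result` are illustrative assumptions, so the snippet runs without loading a model or an audio file.

```python
def format_segments(result: dict) -> list:
    """Turn the 'segments' of a Whisper-style result into timestamped lines."""
    lines = []
    for segment in result.get("segments", []):
        start = segment["start"]  # segment start, in seconds
        end = segment["end"]      # segment end, in seconds
        text = segment["text"].strip()
        lines.append(f"[{start:.2f}s - {end:.2f}s] {text}")
    return lines


# Hypothetical, hand-written result that mimics the shape of Whisper's output
sample_result = {
    "text": " Hello there. This is a test.",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.25, "text": " This is a test."},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

With a real transcription, the dictionary returned inside `transcribe_audio` could be passed to the helper instead of `sample_result`.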


## The rest of the code is essentially the same as in the first RAG example

We instantiate the model, then build the PromptTemplate in which the transcribed audio is used as context, and finally we can ask questions.

In [None]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

MODEL = "llama3"

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

In [None]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
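
To see what the model will actually receive, it can be useful to render the template with sample values. The sketch below uses plain `str.format` as a stand-in, on the assumption that the template only contains simple `{context}` and `{question}` placeholders; `sample_context` and `sample_question` are made-up values, not part of the example audio.

```python
template = """
Answer the question based on the context below. If you can't
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

# Hypothetical stand-ins for the transcription and the user's question
sample_context = "The speaker talks about plant genetics."
sample_question = "What does the speaker talk about?"

# Fill both placeholders and inspect the final prompt text
rendered = template.format(context=sample_context, question=sample_question)
print(rendered)
```

In the notebook itself, `prompt.format(context=..., question=...)` plays the same role, with the transcription as the context.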

In [None]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

In [None]:
while True:
    print("Say 'exit' or 'quit' to exit the loop")
    question = input('User question: ')
    print(f"Question: {question}")
    # Leave the loop when the user asks to stop
    if question.lower() in ["exit", "quit"]:
        print("Exiting the conversation. Goodbye!")
        break
    # Use the whole transcription as the context for the question
    formatted_prompt = prompt.format(context=transcribed_text, question=question)
    response_from_model = model.invoke(formatted_prompt)
    # The model returns plain text, which the parser passes through as a string
    parsed_response = parser.parse(response_from_model)
    print(f"Answer: {parsed_response}")
    print()
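
The loop above mixes user input with the question-answering logic. As a rough sketch, the answering step could be pulled into a function that takes the language model as a plain callable, which makes it possible to try it without Ollama running. The names `answer_question`, `SKETCH_TEMPLATE` and `fake_llm` are illustrative assumptions, and `fake_llm` is a stub that always returns the same reply.

```python
from typing import Callable

# Shortened stand-in for the notebook's template (assumption for this sketch)
SKETCH_TEMPLATE = "Context: {context}\n\nQuestion: {question}\n"


def answer_question(question: str, context: str, llm: Callable[[str], str]) -> str:
    """Format the prompt, send it to `llm`, and return the cleaned-up reply."""
    formatted_prompt = SKETCH_TEMPLATE.format(context=context, question=question)
    return llm(formatted_prompt).strip()


def fake_llm(prompt_text: str) -> str:
    """Stub model used only for illustration; it ignores the prompt."""
    return " I don't know "


print(answer_question("Who is speaking?", "Some transcript text.", fake_llm))
```

In the notebook, `model.invoke` could presumably be passed as `llm`, with `transcribed_text` as the context, since it takes the prompt string and returns the model's text.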