<a target="_blank" href="https://colab.research.google.com/github/sergiopaniego/RAG_local_tutorial/blob/main/whisper_rag.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# RAG example with audio

In this example, we use an audio file as the information source for the conversation.

**Please complete example_rag.ipynb first for more background.**

Imagine a situation where you communicate with the LLM directly by voice.

In this case, we use Whisper, a general-purpose speech recognition model by OpenAI, to convert the audio to text.

From that text, we can hold a conversation with the LLM to extract the required information.

<p align="center">
  <img src="https://cdn.analyticsvidhya.com/wp-content/uploads/2023/07/langchain3.png" alt="Langchain Logo" width="20%">
  <img src="https://bookface-images.s3.amazonaws.com/logos/ee60f430e8cb6ae769306860a9c03b2672e0eaf2.png" alt="Ollama Logo" width="20%">
  <img src="https://static-00.iconduck.com/assets.00/openai-icon-2021x2048-4rpe5x7n.png" alt="OpenAI Logo" width="20%">
</p>


Sources:

* Whisper details: https://github.com/openai/whisper
* https://github.com/svpino/llm

# Requirements

* Ollama installed locally


## First, we install LangChain, Whisper, and their dependencies

In [None]:
!pip3 install langchain
!pip3 install langchain-community
!pip3 install langchain_pinecone
!pip3 install langchain[docarray]
!pip3 install docarray
!pip3 install pypdf

In [None]:
!pip3 install openai-whisper

### We install ffmpeg, which Whisper needs to decode audio files

Uncomment the command that matches your platform. The macOS (Homebrew) command is active by default.

In [None]:
# on Ubuntu or Debian
# !sudo apt update && sudo apt install -y ffmpeg

# on Arch Linux
# !sudo pacman -S --noconfirm ffmpeg

# on macOS using Homebrew (https://brew.sh/)
!brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
# !choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
# !scoop install ffmpeg

# Select the LLM to use

The model must be downloaded locally before it can be used. For example, to run llama3:

```
ollama pull llama3
```

Check the list of models available for Ollama here: https://ollama.com/library

## Transcribe an example audio file to text

The example audio file must be available locally. Replace the file path below with your own example.

In [21]:
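import whisper

# Load the "base" Whisper model; larger models ("small", "medium", "large") are more accurate but slower
whisper_model = whisper.load_model("base")

def transcribe_audio(audio_path):
    """Transcribe an audio file with Whisper and return the recognized text."""
    result = whisper_model.transcribe(audio_path)
    return result['text']

# Example audio: https://commons.wikimedia.org/wiki/File:Audio_Kevin_Folta.wav
audio_path = "./files/Audio_Kevin_Folta.wav" # CHANGE THIS FILE

# Transcribe the audio
transcribed_text = transcribe_audio(audio_path)

print(transcribed_text)

# Save the transcription for the loader below; "w" overwrites the file,
# so re-running this cell does not append duplicate text
with open("./files/whisper_transcription.txt", "w", encoding="utf-8") as file:
    file.write(transcribed_text)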




 Hi, my name is Kevin Fowler and I'm very grateful to have the opportunity to work in public science. I'm very fortunate to have been able to be trained by excellent mentors early in my career to give me a toolbox to be able to unravel important questions that can help us better understand our physical universe. I'm really excited about the opportunity to participate in the science that works around agriculture and find out ways that we can farm more sustainably, working in ways that we can help limit environmental impacts of farming, but also providing profitable ways for farmers to stay in business. We're really excited about the technologies that we create creating better foods for the American consumer and industrialized world consumer, but also getting to people in need so that they can have more selection of better fruits and vegetables. Going forward, I think that the future will depend upon the integration of all technologies at the table and it's really important that we pay a

# We instantiate the model and the embeddings

In [17]:
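# The cell below uses Ollama, so pick a model you have pulled locally
# ("gpt-3.5-turbo" would need OpenAI's API instead)
#MODEL = "gpt-3.5-turbo"
#MODEL = "mixtral:8x7b"
#MODEL = "gemma:7b"
#MODEL = "llama2"
MODEL = "llama3" # https://ollama.com/library/llama3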

In [18]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings


model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

# We load the previously saved transcription using TextLoader

In [22]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("./files/whisper_transcription.txt")
text_documents = loader.load()
text_documents

[Document(page_content=" Hi, my name is Kevin Fowler and I'm very grateful to have the opportunity to work in public science. I'm very fortunate to have been able to be trained by excellent mentors early in my career to give me a toolbox to be able to unravel important questions that can help us better understand our physical universe. I'm really excited about the opportunity to participate in the science that works around agriculture and find out ways that we can farm more sustainably, working in ways that we can help limit environmental impacts of farming, but also providing profitable ways for farmers to stay in business. We're really excited about the technologies that we create creating better foods for the American consumer and industrialized world consumer, but also getting to people in need so that they can have more selection of better fruits and vegetables. Going forward, I think that the future will depend upon the integration of all technologies at the table and it's really

# We split the document into chunks

In [24]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_documents = text_splitter.split_documents(text_documents)

[Document(page_content="Hi, my name is Kevin Fowler and I'm very grateful to have the opportunity to work in public", metadata={'source': './files/whisper_transcription.txt'}),
 Document(page_content="to work in public science. I'm very fortunate to have been able to be trained by excellent mentors", metadata={'source': './files/whisper_transcription.txt'}),
 Document(page_content='excellent mentors early in my career to give me a toolbox to be able to unravel important questions', metadata={'source': './files/whisper_transcription.txt'}),
 Document(page_content="important questions that can help us better understand our physical universe. I'm really excited", metadata={'source': './files/whisper_transcription.txt'}),
 Document(page_content="I'm really excited about the opportunity to participate in the science that works around", metadata={'source': './files/whisper_transcription.txt'})]
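`RecursiveCharacterTextSplitter` measures `chunk_size` in characters (using `len` by default) and tries its separators in turn (`"\n\n"`, `"\n"`, `" "`, then single characters), so a one-paragraph transcription like this one ends up split on spaces. The sketch below is a simplified, word-only greedy splitter written for illustration, not LangChain's algorithm; `split_with_overlap` is a hypothetical helper. It shows what the two parameters control: each chunk stays within `chunk_size` characters, and the last words of one chunk (up to `chunk_overlap` characters) are repeated at the start of the next, which is why neighbouring chunks in the output above share phrases such as "to work in public".

```python
from typing import List

def split_with_overlap(text: str, chunk_size: int = 100, chunk_overlap: int = 20) -> List[str]:
    """Greedily pack words into chunks of at most chunk_size characters.

    When a chunk is full, its trailing words (up to chunk_overlap characters)
    are carried over to start the next chunk. A single word longer than
    chunk_size still becomes its own (oversized) chunk.
    """
    words = text.split()
    chunks: List[str] = []
    current: List[str] = []  # words of the chunk being built
    for word in words:
        if current and len(" ".join(current + [word])) > chunk_size:
            chunks.append(" ".join(current))
            # Carry over the longest tail of words that fits in the overlap budget.
            overlap: List[str] = []
            while current and len(" ".join([current[-1]] + overlap)) <= chunk_overlap:
                overlap.insert(0, current.pop())
            # Drop the overlap if it would leave no room for the next word.
            current = overlap if len(" ".join(overlap + [word])) <= chunk_size else []
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks

text = ("Hi, my name is Kevin Fowler and I'm very grateful to have the opportunity "
        "to work in public science. I'm very fortunate to have been able to be "
        "trained by excellent mentors early in my career.")
for chunk in split_with_overlap(text):
    print(repr(chunk))
```

Larger overlaps preserve more context across chunk boundaries at the cost of storing and embedding more repeated text.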

In [None]:
text_documents

# Store the transcription in a vector store

From the LangChain docs:

> DocArrayInMemorySearch is a document index provided by Docarray that stores documents in memory. It is a great starting point for small datasets, where you may not want to launch a database server.

The execution time of the following block depends on the length of the transcription, since every chunk has to be embedded by the model. Keep the audio short and simple for this example.

In [25]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(text_documents, embedding=embeddings)



In [26]:
retriever = vectorstore.as_retriever()
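Conceptually, `DocArrayInMemorySearch` embeds every chunk once with `embeddings` and keeps the vectors in memory; the retriever then embeds the question and returns the most similar chunks (LangChain's default is the top 4). The following sketch mimics that flow in plain Python under a loud assumption: a toy bag-of-words `embed` function stands in for `OllamaEmbeddings`, and `ToyVectorStore` is a hypothetical class, not DocArray's implementation. Ranking uses cosine similarity.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A stand-in for OllamaEmbeddings only."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """Hypothetical in-memory store: embeds each chunk once, ranks by cosine similarity."""

    def __init__(self, texts: List[str]):
        # Embedding happens up front, which is why building the store is the slow step.
        self.entries: List[Tuple[str, Counter]] = [(t, embed(t)) for t in texts]

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        query_vector = embed(query)
        ranked = sorted(self.entries, key=lambda entry: cosine(query_vector, entry[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore([
    "Hi, my name is Kevin Fowler and I'm very grateful",
    "to work in public science",
    "find out ways that we can farm more sustainably",
])
print(store.similarity_search("What's his name?", k=1))
```

A real embedding model captures meaning rather than shared words, so it can match questions that use different vocabulary from the transcription.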

# Generate the prompt template

In [27]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
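`PromptTemplate.from_template` uses Python f-string style placeholders by default, so filling the template is equivalent to `str.format`. One detail matters here: the retriever returns `Document` objects, and formatting that list directly puts each object's repr (including metadata such as the source path) into `{context}`. A minimal sketch, assuming a hypothetical `Doc` dataclass as a stand-in for LangChain's `Document`, that joins only the chunk text:

```python
from dataclasses import dataclass, field

TEMPLATE = """
Answer the question based on the context below. If you can't
answer the question, answer with "I don't know".

Context: {context}

Question: {question}
"""

@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs) -> str:
    # Keep only the text of each chunk, dropping metadata such as the source path.
    return "\n\n".join(doc.page_content for doc in docs)

def build_prompt(docs, question: str) -> str:
    # str.format fills the same {context} and {question} placeholders as the template above.
    return TEMPLATE.format(context=format_docs(docs), question=question)

print(build_prompt([Doc("Hi, my name is Kevin Fowler", {"source": "transcript.txt"})],
                   "What's his name?"))
```

In the notebook, the same join works on the real `Document` objects returned by `retriever.invoke`, since they expose `page_content`.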

# We instantiate the parser

In [28]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

# We can now extract the information from the audio!

In [30]:
while True:
    print("Say 'exit' or 'quit' to exit the loop")
    question = input('User question: ')
    print(f"Question: {question}")
    if question.lower() in ["exit", "quit"]:
        print("Exiting the conversation. Goodbye!")
        break
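    retrieved_context = retriever.invoke(question)
    # Pass only the chunk text to the prompt, not the Document objects' repr
    context_text = "\n\n".join(doc.page_content for doc in retrieved_context)
    formatted_prompt = prompt.format(context=context_text, question=question)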
    response_from_model = model.invoke(formatted_prompt)
    parsed_response = parser.parse(response_from_model)
    print(f"Answer: {parsed_response}")
    print()

Say 'exit' or 'quit' to exit the loop
Question: What's his name?
Answer: Kevin Fowler.

Say 'exit' or 'quit' to exit the loop
Question: exit
Exiting the conversation. Goodbye!
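The loop above chains four steps: retrieve, format the prompt, call the model, and parse the output. The sketch below factors that chain into a function and feeds the chat loop from a list of questions instead of `input()`, so the logic can be exercised without Ollama. `answer_question`, `chat`, and the fake components in the usage example are hypothetical names for illustration; in the notebook the four steps would be `retriever.invoke`, `prompt.format`, `model.invoke`, and `parser.parse`.

```python
from typing import Callable, Iterable, List, Sequence

def answer_question(
    question: str,
    retrieve: Callable[[str], Sequence[str]],
    format_prompt: Callable[[Sequence[str], str], str],
    generate: Callable[[str], str],
    parse: Callable[[str], str] = lambda text: text,
) -> str:
    """Run one RAG turn: retrieve context, build the prompt, call the model, parse."""
    context = retrieve(question)                    # e.g. retriever.invoke(question)
    prompt_text = format_prompt(context, question)  # e.g. prompt.format(...)
    raw_answer = generate(prompt_text)              # e.g. model.invoke(prompt_text)
    return parse(raw_answer)                        # e.g. parser.parse(raw_answer)

def chat(questions: Iterable[str], answer: Callable[[str], str],
         exit_words: Sequence[str] = ("exit", "quit")) -> List[str]:
    """Answer questions in order until an exit word is seen."""
    answers = []
    for question in questions:
        if question.strip().lower() in exit_words:
            break
        answers.append(answer(question))
    return answers

# Usage with fake components (no LLM involved): the "model" just inspects the prompt.
chunks = ["Hi, my name is Kevin Fowler", "I work in public science"]
fake_retrieve = lambda q: chunks[:1]
fake_format = lambda ctx, q: "Context: " + "\n\n".join(ctx) + "\nQuestion: " + q
fake_generate = lambda p: "Kevin Fowler." if "Kevin Fowler" in p else "I don't know"

answers = chat(
    ["What's his name?", "exit", "never asked"],
    lambda q: answer_question(q, fake_retrieve, fake_format, fake_generate),
)
print(answers)  # ['Kevin Fowler.']
```

Keeping input handling separate from the RAG steps makes it easy to swap `input()` for a speech source later, or to test each step in isolation.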
