## Audio to RAG 
**Retrieval-Augmented Generation over audio files transcribed with Whisper**

This Jupyter notebook walks through the following steps:
* Convert audio to text using OpenAI's open-source `whisper` package, run locally
* Split the text into chunks using the LangChain `RecursiveCharacterTextSplitter`
* Create embeddings from the chunks using `OllamaEmbeddings`
* Run a similarity search between the query and the vectorstore with `docsearch.similarity_search`
* Create a LangChain question-answering chain that takes in context and a query and returns a result (RAG)

### Transcribe the Audio

In [1]:
# Uses OpenAI's open-source whisper package to generate text from audio
import whisper 

In [2]:
# Load the base model from whisper
model = whisper.load_model("base")

**Add your Audio File**

In [None]:
audio = "BryanThe_Ideal_Republic.ogg"

**Transcribe the audio file**

In [3]:
# Run the transcription and save it to "result"
# Note: audio file is available here: https://commons.wikimedia.org/wiki/File:A_J_Cook_Speech_from_Lansbury%27s_Labour_Weekly.ogg 
result = model.transcribe(audio, fp16=False)
print(result["text"])

 I can conceive of a national detonation which meets the responsibilities of the days and measures up to the possibilities of tomorrow. Before they republic resting securely upon the mountain of eternal truth, a republic applying in practice and propaming to the world the self-evident propositions that all men are created equal, that they are endowed with an aid in a right, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the government, behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand of blipst its foreign neighbors in direct. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. The whole of the republic standing erects file empires all around or bow beneath the weight of their own armaments. A republic whose flag is loved by other flags are only tears. The whole of the republic, in
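
Besides `"text"`, the dictionary returned by `model.transcribe` also carries a `"segments"` list whose entries include `start`, `end`, and `text` fields. A minimal sketch of printing timestamped segments follows; it uses a hand-written stand-in dictionary with made-up timestamps rather than real Whisper output, and `format_segments` is a hypothetical helper, not part of the `whisper` package.

```python
def format_segments(result):
    """Turn a Whisper-style result dict into timestamped lines."""
    lines = []
    for seg in result["segments"]:
        lines.append(
            f"[{seg['start']:7.2f}s -> {seg['end']:7.2f}s] {seg['text'].strip()}"
        )
    return lines


# Stand-in for the real `result`; the timestamps below are invented for illustration.
fake_result = {
    "text": " I can conceive of a national detonation",
    "segments": [
        {"start": 0.0, "end": 6.5, "text": " I can conceive"},
        {"start": 6.5, "end": 12.0, "text": " of a national detonation"},
    ],
}

for line in format_segments(fake_result):
    print(line)
```

With the real notebook state, passing `result` instead of `fake_result` should work the same way, assuming the segment fields are as described.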

### Tokenize & Embed the text 
We split the transcription into smaller chunks and create an embedding for each chunk, so that we can search the vectorstore for the chunks most similar to a query. This is a crucial step in RAG (retrieval-augmented generation): the chunks most similar to the query are retrieved and then passed to the LLM as context for generating the answer.

We use LangChain to split the text with the `RecursiveCharacterTextSplitter`, which recursively splits the text on a list of separators until each chunk fits within `chunk_size` characters, with `chunk_overlap` characters shared between neighboring chunks. We then create embeddings for each chunk using `OllamaEmbeddings`. You can swap these individual components out for whatever text splitter or embeddings you'd prefer.
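
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified word-based chunker. It is a sketch for illustration only, not LangChain's algorithm: `chunk_words` is a hypothetical helper, it splits only on whitespace, and its chunk boundaries may differ from what `RecursiveCharacterTextSplitter` produces.

```python
def chunk_words(text, chunk_size=100, chunk_overlap=20):
    """Greedy word-based chunker (simplified stand-in, not LangChain's algorithm)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    current = []  # words in the chunk currently being built
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and len(candidate) > chunk_size:
            # The chunk is full: emit it, then seed the next chunk with
            # trailing words whose joined length fits within chunk_overlap.
            chunks.append(" ".join(current))
            tail = []
            for w in reversed(current):
                if len(" ".join([w] + tail)) > chunk_overlap:
                    break
                tail.insert(0, w)
            current = tail
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "I can conceive of a national detonation which meets the responsibilities "
    "of the days and measures up to the possibilities of tomorrow."
)
for chunk in chunk_words(sample, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears with some of its surrounding words in at least one chunk.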

In [47]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OllamaEmbeddings
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains import LLMChain
from langchain.llms import Ollama

In [26]:
# Define the text to split
transcription = result["text"]

In [28]:
transcription

" I can conceive of a national detonation which meets the responsibilities of the days and measures up to the possibilities of tomorrow. Before they republic resting securely upon the mountain of eternal truth, a republic applying in practice and propaming to the world the self-evident propositions that all men are created equal, that they are endowed with an aid in a right, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the government, behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand of blipst its foreign neighbors in direct. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. The whole of the republic standing erects file empires all around or bow beneath the weight of their own armaments. A republic whose flag is loved by other flags are only tears. The whole of the republic, i

In [29]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
)

texts = splitter.split_text(transcription)

In [30]:
# Print the texts to get an initial look
texts

['I can conceive of a national detonation which meets the responsibilities of the days and measures',
 'days and measures up to the possibilities of tomorrow. Before they republic resting securely upon',
 'securely upon the mountain of eternal truth, a republic applying in practice and propaming to the',
 'propaming to the world the self-evident propositions that all men are created equal, that they are',
 'that they are endowed with an aid in a right, that governments are instituted among men to secure',
 'among men to secure these rights, and that governments derive their just powers from the consent of',
 'from the consent of the government, behold a republic in which civil and religious liberty',
 'religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand of',
 'every hand of blipst its foreign neighbors in direct. A republic in which every citizen is a',
 'every citizen is a sovereign, but in which no one cares to wear a crown. The whole of the

In [31]:
len(texts)

20

**Embeddings**

In [32]:
# define the embeddings 

embeddings = OllamaEmbeddings()

**Add our text to a Vectorstore**

Here, I use FAISS as the vectorstore. Faiss is a library for efficient similarity search and clustering of dense vectors. You can also use alternatives like Pinecone, Qdrant, or Chroma.

In [48]:
# Create a FAISS vector store from the texts and embeddings, tagging each chunk with its index as "source"

docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])

**Set the LLM Model & Prompt our LLM using Context from the FAISS Vectorstore**

In [68]:
# Define the local LLM Model we will use
llm = Ollama(model='llama2')

**Create the Prompt**

We'll use a QA chain, since we'd like to ask questions, get answers, and continue asking questions.

In [69]:
# Prompt
from langchain import hub
from langchain.chains.question_answering import load_qa_chain

# Pull a RAG prompt template from the LangChain Hub
rag_prompt = hub.pull("rlm/rag-prompt")


In [70]:
# Chain: "stuff" places all retrieved documents into a single prompt
chain = load_qa_chain(llm, chain_type="stuff", prompt=rag_prompt)
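
Conceptually, a "stuff" chain concatenates every retrieved chunk into one context string and fills a prompt template with that context and the question. The sketch below shows the idea in plain Python; `build_stuff_prompt` and the template wording are hypothetical and are not the actual text of `rlm/rag-prompt` or LangChain's internal implementation.

```python
def build_stuff_prompt(question, chunks):
    """Join all chunks into one context block and fill a simple prompt template."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example_chunks = [
    "propaming to the world the self-evident propositions that all men are created equal, that they are",
    "among men to secure these rights, and that governments derive their just powers from the consent of",
]
prompt = build_stuff_prompt(
    "What are the self-evident propositions in this speech?", example_chunks
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window, which is one reason the chunks here are kept small.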

**Set a Query**

In [81]:
# Define a query
query = "What are the self-evident propositions in this speech?"

In [82]:
# Find similar documents to the search query 
docs = docsearch.similarity_search(query)

In [83]:
# Display the docs determined to be semantically similar to the query
docs

[Document(page_content='among men to secure these rights, and that governments derive their just powers from the consent of', metadata={'source': '5'}),
 Document(page_content='weight of their own armaments. A republic whose flag is loved by other flags are only tears. The', metadata={'source': '11'}),
 Document(page_content='propaming to the world the self-evident propositions that all men are created equal, that they are', metadata={'source': '3'}),
 Document(page_content='the coming of a universal brotherhood. A republic which shakes roads and dissolves their', metadata={'source': '14'})]
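
`similarity_search` returns the `k` nearest chunks, and `k` defaults to 4, which is why four documents come back above. The sketch below illustrates nearest-neighbor retrieval with Euclidean (L2) distance, which I believe is the default metric for LangChain's FAISS wrapper; treat that as an assumption. The three-dimensional vectors are invented for illustration (real embeddings have far more dimensions), and `toy_similarity_search` is a hypothetical helper.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query_vec, index, k=4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up vectors standing in for real chunk embeddings.
toy_index = [
    ("all men are created equal", [0.9, 0.1, 0.0]),
    ("consent of the governed", [0.8, 0.3, 0.1]),
    ("weight of their own armaments", [0.1, 0.9, 0.2]),
    ("universal brotherhood", [0.2, 0.2, 0.9]),
    ("wear a crown", [0.0, 0.5, 0.5]),
]
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded query

top = toy_similarity_search(query_vec, toy_index, k=2)
for text, dist in top:
    print(f"{dist:.3f}  {text}")
```

A brute-force scan like this is fine for 20 chunks; FAISS exists to make the same kind of lookup efficient over much larger collections.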

**Generate a Response Using a Chain Completion Request**

In [84]:
# Set a response variable to the output of the chain
response = chain({"input_documents": docs, "question": query}, return_only_outputs=True)

**Display the response**

In [85]:
print(response["output_text"])

 Based on the provided context, the self-evident propositions in the speech are:

1. All men are created equal.
2. Governments derive their just powers from the consent of the governed.
3. The republic is a symbol of the universal brotherhood of mankind.
