# Uncovering Insights in Audio 

We will compare two approaches:
1. 'Trad' RAG, aka Traditional Retrieval Augmented Generation (splitting the transcript into chunks, embedding each chunk, and retrieving the most relevant chunks with a similarity search)
2. Prompt Stuffing (placing the entire transcript in the prompt and having the model answer based on it)

---

## Setup: Transcription

Since both approaches require having a transcription, we'll go ahead and do that first.

### Transcribe the Audio

In [3]:
# Uses OpenAI's open-source Whisper model to transcribe audio to text
import whisper 

**Load the model**

In [4]:
# Load the Whisper model (I chose the "medium" model because it is more accurate than the "base" model, though slower)
model = whisper.load_model("medium")

**Add your Audio File**

In [5]:
audio = "BryanThe_Ideal_Republic.ogg"

**Transcribe the audio file**

Note: this may take some time (about a minute)...

In [6]:
# Run the transcription and save it to "result"
# Note: the original audio file is available on Wikipedia at this link: https://commons.wikimedia.org/wiki/File:A_J_Cook_Speech_from_Lansbury%27s_Labour_Weekly.ogg 
result = model.transcribe(audio, fp16=False)

# Print the transcription
print(result["text"])

 I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold a republic increasing in population, in wealt

----

## Load in the Imports for the Entire Notebook

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OllamaEmbeddings
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains import LLMChain
from langchain.llms import Ollama

---

## Approach 1: Audio to RAG 
**Retrieval Augmented Generation over audio files transcribed with Whisper**

Now that we've transcribed the audio with Whisper, we can use the transcript as the knowledge source for a RAG prompt.

This section of the notebook leads you step by step through the process of:
* Splitting the text into chunks using LangChain's `RecursiveCharacterTextSplitter`
* Creating embeddings for the chunks using `OllamaEmbeddings`
* Performing a similarity search between the query and the vectorstore with `docsearch.similarity_search`
* Creating a LangChain chain that takes in the context and the query and returns an answer that considers both together

### 1.1: Split & Embed the Text
We split the transcription into smaller chunks and create an embedding for each chunk, which lets us search the vectorstore for the chunks most similar to a query. This is the crucial step in RAG (retrieval augmented generation): we retrieve the chunks most relevant to the query and then generate an answer from the query together with those chunks as context.

We use LangChain's `RecursiveCharacterTextSplitter`, which splits the text on a hierarchy of separators (paragraphs, then lines, then words) until each chunk fits the chunk size. We then create an embedding for each chunk using `OllamaEmbeddings`. You can swap either component for whichever text splitter or embedding model you prefer.

In [8]:
# Define the text to split
transcription = result["text"]

# Display the text
transcription

" I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold a republic increasing in population, in weal

**Split the text**

The chunk size (measured in characters) and the chunk overlap determine how many chunks are created. Both are hyperparameters you can experiment with to see what works best for your use case.

In [12]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)

texts = splitter.split_text(transcription)

In [13]:
# Let's take a look at the chunks; notice the small overlap between consecutive chunks
print(texts)

# So that we know how many texts we have:
print(f"\nThe length of the texts is {len(texts)}")

['I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent', "derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold
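To get a feel for how `chunk_size` and `chunk_overlap` drive the number of chunks, here is a minimal sketch of a plain sliding-window character splitter. It is a simplified stand-in, not LangChain's `RecursiveCharacterTextSplitter` (which also tries to break on paragraph, line, and word boundaries); `window_split` is a hypothetical helper, and the 1,700-character sample text is a hypothetical length, not the real transcript.

```python
def window_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # each chunk starts `step` characters after the previous one
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


sample_text = "x" * 1700  # hypothetical ~1,700-character transcript
for size, overlap in [(500, 50), (300, 50), (1000, 100)]:
    n = len(window_split(sample_text, size, overlap))
    print(f"chunk_size={size}, chunk_overlap={overlap}: {n} chunks")
# chunk_size=500 gives 4 chunks, 300 gives 7, 1000 gives 2
```

Smaller chunks give more, finer-grained pieces to retrieve from, while a larger overlap repeats more text across neighboring chunks so that a sentence cut at a boundary still appears whole in one of them.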

**Create the Embeddings**

In [14]:
# Define the embedding model (served locally by Ollama)

embeddings = OllamaEmbeddings()

**Add Text to the Vectorstore**

Here, I use FAISS as the vectorstore. FAISS is a library for efficient similarity search and clustering of dense vectors. You can also use alternatives such as Pinecone, Qdrant, or Chroma.

In [15]:
# Embed each chunk and store it in a FAISS index, recording the file and chunk index as metadata
docsearch = FAISS.from_texts(
    texts,
    embeddings,
    metadatas=[{"file": audio, "source": str(i)} for i in range(len(texts))],
)
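Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are nearest to it. The sketch below does an exact nearest-neighbour search with Euclidean (L2) distance, which, as far as I know, is what LangChain's FAISS wrapper uses by default (a flat L2 index, returning `k=4` results). The three-dimensional vectors are made-up placeholders for real `OllamaEmbeddings` output, and `l2_distance` and `top_k` are hypothetical helper names.

```python
import math

# Made-up 3-dimensional vectors standing in for OllamaEmbeddings output,
# keyed by the same "source" chunk index used in the metadata above.
chunk_vectors = {
    "0": [0.9, 0.1, 0.0],
    "1": [0.7, 0.3, 0.1],
    "2": [0.1, 0.8, 0.3],
    "3": [0.0, 0.2, 0.9],
}


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vector: list[float], vectors: dict[str, list[float]],
          k: int = 4) -> list[tuple[str, float]]:
    """Exact search: score every chunk and return the k closest (smallest distance first)."""
    scored = [(source, l2_distance(query_vector, vec)) for source, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


query_vector = [0.8, 0.2, 0.0]  # made-up embedding of the query
for source, distance in top_k(query_vector, chunk_vectors, k=2):
    print(source, round(distance, 3))
```

A flat FAISS index performs this same brute-force comparison, only much faster; approximate FAISS indexes trade some accuracy for speed on large collections.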

**Set the LLM Model & Prompt our LLM using Context from the FAISS Vectorstore**

In [16]:
# Define the local LLM we will use (Llama 2 via Ollama); temperature=0 makes the output as deterministic as possible
llm = Ollama(model='llama2', temperature=0)

**Create the Prompt**

We'll use a QA chain, a LangChain chain for answering questions over a set of documents, since we want to ask questions about the transcript, get answers, and keep asking follow-up questions.

In [17]:
# Import the prompt template classes
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

In [18]:
# Create the RAG prompt
rag_prompt = ChatPromptTemplate(
    input_variables=['context', 'question'], 
    messages=[
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(
                input_variables=['context', 'question'], 
                template="""You answer questions about the contents of a transcribed audio file. 
                Use only the provided audio file transcription as context to answer the question. 
                Do not use any additional information.
                If you don't know the answer, just say that you don't know. Do not use external knowledge. 
                Use three sentences maximum and keep the answer concise. 
                Make sure to reference your sources with quotes of the provided context as citations.
                \nQuestion: {question} \nContext: {context} \nAnswer:"""
                )
        )
    ]
)

In [19]:
from langchain.chains.question_answering import load_qa_chain

# Build a "stuff" QA chain that inserts the documents it receives into the {context} slot of rag_prompt
chain = load_qa_chain(llm, chain_type="stuff", prompt=rag_prompt)
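The `"stuff"` chain type is the simplest way to combine documents: every document's text is concatenated into one string and placed in the prompt's `{context}` variable. The sketch below imitates that with a hypothetical `Doc` dataclass standing in for LangChain's `Document`; to my knowledge LangChain's stuff chain also joins documents with `"\n\n"` by default. Unlike passing a list of `Document` objects straight into a template's `format`, joining only `page_content` keeps metadata out of the prompt. `stuff_context` and the sample names are illustrative, not LangChain APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff_context(docs: list[Doc], separator: str = "\n\n") -> str:
    """'Stuff' strategy: concatenate every document's text into one context string."""
    return separator.join(doc.page_content for doc in docs)


sample_docs = [
    Doc("Behold a republic resting securely upon the mountain of eternal truth.", {"source": "0"}),
    Doc("A republic whose flag is love while other flags are only fears.", {"source": "1"}),
]
sample_template = "Question: {question}\nContext: {context}\nAnswer:"
sample_prompt = sample_template.format(
    question="What is the idea of the republic?",
    context=stuff_context(sample_docs),  # only page_content ends up in the prompt
)
print(sample_prompt)
```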

### 1.2: Create a "Trad RAG" (Traditional RAG) Prompt

In traditional RAG, we take the query, retrieve the top *n* chunks most similar to it from the vectorstore, and use those chunks as the context for the LLM. This is the most common way of doing RAG.

**Set a Query**

In [20]:
# Define a query
query = "What is the idea of the republic?"

In [21]:
# Find the chunks most similar to the search query (similarity_search returns k=4 results by default)
docs = docsearch.similarity_search(query)

In [22]:
# Display the docs determined to be semantically similar to the query
docs

[Document(page_content="progress and the accepted arbiter of the world's dispute. A republic whose history, like the path of the just, is as the shining light that shineth more and more unto the perfect day.", metadata={'file': 'BryanThe_Ideal_Republic.ogg', 'source': '3'}),
 Document(page_content='I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent', metadata={'file': 'BryanThe_Ideal_Republic.ogg', 'source': '0'}),
 Document(page_content="derive their just powers from the consent of the governed. Behold a republic in which civil and religious lib

### 1.3: Generate a response using "Trad RAG"

In [23]:
# Basic "stuff" RAG prompt (written manually)

trad_rag_template = """You answer questions about the contents of a transcribed audio file. 
                Use only the provided audio file transcription as context to answer the question. 
                Do not use any additional information.
                If you don't know the answer, just say that you don't know. Do not use external knowledge. 
                Use three sentences maximum and keep the answer concise. 
                Make sure to cite references by referencing quotes of the provided context. Do not use any other knowledge.

                \nQuestion: {question} \nContext: {context} \nAnswer:"""

trad_prompt = PromptTemplate.from_template(trad_rag_template)

query = "What is the idea of the republic?"

# Note: passing `docs` directly inserts each Document's repr (text and metadata) into the prompt
trad_rag_prompt = trad_prompt.format(context=docs, question=query)

trad_answer = llm(trad_rag_prompt)

print(trad_answer)

 Based on the provided audio transcription, the idea of the republic is a nation that rests securely upon the mountain of eternal truth and applies in practice the self-evident propositions that all men are created equal, endowed with inalienable rights, and that governments derive their just powers from the consent of the governed. (Bryan, 0:00-0:15)

In this context, a republic is described as a nation where civil and religious liberty stimulate citizens to work hard and strive for excellence, while the law restrains those who would harm their neighbors. The republic is also characterized by the belief that every citizen is a sovereign, but no one desires to wear a crown. (Bryan, 0:15-0:30)

Furthermore, the transcription highlights the idea that a republic stands erect while empires crumble under the weight of their own armaments, and its flag is love instead of fear. The republic is also depicted as increasing in population, wealth, strength, and influence, solving civilization's p

### 1.4: Evaluate the Traditional Response

In [24]:
from langchain.prompts import PromptTemplate

evaluation_template = """
    Rate the answer: " {answer_trad} " to the question "{question} " given only the context provided by an audio file: "{context}". 
    The Rating should be between 1 (lowest score) and 10 (highest score), and contain a max-1 sentence explanation of the rating.
    The rating should be based on the quality of the answer considering that the answer was ONLY based on the context, and nothing else.
    Format the answer as starting with the rating, followed by a newline, followed by the explanation.
    "x/10 
    The question asked about xxx, and the context provided xxxx, and the answer was .... .
    In order to receive a full score, the answer should be ...." """

prompt = PromptTemplate.from_template(evaluation_template)

my_prompt = prompt.format(answer_trad=trad_answer, context=docs, question=query)

print(llm(my_prompt))


 Based on the provided audio transcription, I would rate the answer as an 8 out of 10. The answer provides a clear and detailed description of the idea of the republic according to the provided context, highlighting key aspects such as eternal truth, self-evident principles, civil and religious liberty, sovereignty, love, and progress. The answer also uses appropriate adjectives and phrasing to convey its ideas effectively.

However, to receive a perfect score, the answer could provide more specific examples or quotes from Bryan's speech to support its claims, as well as offer more insight into how these ideas are supposed to be applied in practice. Additionally, the answer could benefit from more detailed explanations of how these aspects of the republic are meant to contribute to its success and influence on the world stage.
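The evaluation prompt asks for an `x/10` rating, but the model replied with "8 out of 10" here (and with "7/10" in the second evaluation). If you want to compare scores programmatically, a tolerant parser helps. A minimal sketch, assuming ratings appear as either `N/10` or `N out of 10`; `parse_rating` is a hypothetical helper, not part of LangChain.

```python
import re


def parse_rating(evaluation: str):
    """Extract an integer 1-10 rating from free-form evaluator output.

    Accepts both "x/10" and "x out of 10"; returns None if no rating is found.
    """
    match = re.search(r"\b(10|[1-9])\s*(?:/|out of)\s*10\b", evaluation)
    return int(match.group(1)) if match else None


print(parse_rating("I would rate the answer as an 8 out of 10."))  # 8
print(parse_rating("Sure! Here is my rating and explanation:\n\n7/10\nThe answer ..."))  # 7
```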


---

## Approach 2: Prompt Stuffing (Abandoning 'Trad' RAG)

#### Skip the Embeddings & Vectorstore and Place the Entire Transcription into the LLM Completion Request

For this test, I skip the standard RAG approach with its embeddings and vectorstore, and instead place the entire transcription into the LLM completion request. This lets us see how the LLM performs without the vectorstore and embeddings.

Note that this approach won't work for every use case: transcriptions are often long enough that they no longer fit in the model's context window. Since our transcript is short, it is enough for some initial tests.
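Before stuffing a transcript into the prompt, it is worth a rough check that it fits in the model's context window. The sketch below assumes about 4 characters per token for English text (a rule of thumb, not a real tokenizer count) and reserves a hypothetical 500 tokens for the instructions, question, and answer. Llama 2 was trained with a 4,096-token context window, though the window configured at runtime (for example Ollama's `num_ctx` option) may be smaller. `fits_in_context` is a hypothetical helper, and both sample transcripts are made up.

```python
def fits_in_context(text: str, context_tokens: int, reserved_tokens: int = 500,
                    chars_per_token: float = 4.0) -> bool:
    """Roughly estimate whether `text` fits in a prompt alongside the other parts.

    Assumes ~chars_per_token characters per token (a heuristic, not a tokenizer).
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens + reserved_tokens <= context_tokens


short_transcript = "x" * 1700     # hypothetical ~1,700-character transcript
long_transcript = "word " * 9000  # hypothetical 9,000-word transcript (45,000 characters)

print(fits_in_context(short_transcript, context_tokens=4096))  # True  (~425 + 500 tokens)
print(fits_in_context(long_transcript, context_tokens=4096))   # False (~11,250 + 500 tokens)
```

When the check fails, you can fall back to chunking and retrieval (Approach 1) or switch to a model with a longer context window.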

### 2.1: Create a Single Document Containing the Entire Transcription

In [29]:
from langchain.docstore.document import Document

transcript_doc = Document(
    page_content=transcription,
    metadata={"source": audio},
)


In [30]:
transcript_doc

Document(page_content=" I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold a republic increasing 

### 2.2: Create a Prompt which contains the entire transcription

In [31]:
# Same manually written prompt as in Trad RAG; only the context differs

alt_rag_template = """You answer questions about the contents of a transcribed audio file. 
                Use only the provided audio file transcription as context to answer the question. 
                Do not use any additional information.
                If you don't know the answer, just say that you don't know. Do not use external knowledge. 
                Use three sentences maximum and keep the answer concise. 
                Make sure to cite references by referencing quotes of the provided context. Do not use any other knowledge.
                
                \nQuestion: {question} \nContext: {context} \nAnswer:"""

alt_prompt = PromptTemplate.from_template(alt_rag_template)

query = "What is the idea of the republic?"

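# Note: passing the Document itself inserts its string form (text and metadata), not just page_content
alt_rag_prompt = alt_prompt.format(context=transcript_doc, question=query)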

answer_alt = llm(alt_rag_prompt)

print(answer_alt)

 Based on the provided transcription, the idea of the republic is a nation that is grounded in eternal truth and applies self-evident propositions to govern its people. The republic is characterized by civil and religious liberty, which stimulate citizens to work hard and lawfully restrain any harmful actions towards others. Additionally, every citizen is considered a sovereign, but no one desires to wear a crown. The republic is also distinct in that it does not rely on fear or armaments for its strength, but rather on love and moral example-making. Furthermore, the republic's history is like a shining light that continues to illuminate the path towards perfection. (Bryan, 1899)


### 2.3: Evaluate the Alternative Response

In [32]:

evaluation_template = """
    Rate the answer: " {answer_alt} " to the question "{question} " given only the context provided by an audio file: "{context}". 
    The Rating should be between 1 (lowest score) and 10 (highest score), and contain a max-1 sentence explanation of the rating.
    The rating should be based on the quality of the answer considering that the answer was ONLY based on the context, and nothing else.
    Format the answer as starting with the rating, followed by a newline, followed by the explanation.
    "x/10 
    The question asked about xxx, and the context provided xxxx, and the answer was .... .
    In order to receive a full score, the answer should be ...." """

prompt = PromptTemplate.from_template(evaluation_template)

my_prompt = prompt.format(answer_alt=answer_alt, context=transcript_doc, question=query)

print(llm(my_prompt))

 Sure! Here is my rating and explanation:

7/10
The answer provides some general information about the idea of a republic based on the provided context. However, it does not go into depth or provide specific details that could help further illuminate the concept of a republic. Additionally, the answer does not provide any evidence or examples to support its claims, which limits its usefulness as an accurate representation of the idea of a republic. To receive a higher score, the answer should provide more specific and detailed information about the concept of a republic, as well as evidence to support its claims.


---

## Traditional RAG vs. Alternative RAG: Compare the Answers to Each Other

In [33]:

compare_answers_evaluation_template = """
    Compare the following two answers: 
    Answer "Trad": "{answer_trad}" 
    Answer "Alt": "{answer_alt}"
    which were provided in response to the question: {question}.
    
    Start by saying which answer you think is better, and then explain why you think so.
    Explain the reasoning of your comparison in a max-1 sentence explanation.
    """

compare_prompt = PromptTemplate.from_template(compare_answers_evaluation_template)

my_prompt = compare_prompt.format(answer_trad=trad_answer, answer_alt=answer_alt, question=query)

print(llm(my_prompt))

 I think Answer "Alt" is better than Answer "Trad". The reason for this is that Answer "Alt" provides more nuanced and detailed explanations of the republic's character, while Answer "Trad" is somewhat vague and broad in its descriptions.

In particular, Answer "Alt" highlights the importance of civil and religious liberty in the republic, as well as the sovereignty of each citizen. It also emphasizes the distinction between the republic's reliance on love and moral example-making versus fear or armaments for its strength. Additionally, it provides a more detailed historical context for the idea of the republic, noting that its history is like a shining light that continues to illuminate the path towards perfection.

In contrast, Answer "Trad" is somewhat generic and does not provide as much depth or detail in its descriptions of the republic. While it highlights the importance of eternal truth and self-evident propositions, it does not clarify how these principles are applied in pract

## Conclusion

My conclusion is that, for this short transcript, the alternative approach (prompt stuffing) works better than traditional RAG: in the head-to-head comparison, the LLM judged its answer to be more detailed and nuanced. This is likely because traditional RAG only gives the model the chunks returned by the similarity search, which may miss parts of the transcript relevant to the query, whereas the alternative approach gives the model the entire transcription as context.