# Uncovering Insights in Audio 

We will use two approaches:
1. 'Trad' RAG, aka traditional Retrieval Augmented Generation (splitting the transcript, embedding the chunks, and running a similarity search over them)
2. Prompt Stuffing (placing the entire transcript in the prompt and having the model answer from it)

---

## Setup: Transcription

Since both approaches require having a transcription, we'll go ahead and do that first.

### Transcribe the Audio

In [2]:
# Uses OpenAI's open-source Whisper package (runs locally) to generate text from audio
import whisper 

**Load the model**

In [3]:
# Load the Whisper model (I chose "medium" because it is more accurate than "base", though slower)
model = whisper.load_model("medium")

**Add your Audio File**

In [4]:
audio = "BryanThe_Ideal_Republic.ogg"

**Transcribe the audio file**

Note: this may take some time (about a minute)...

In [5]:
# Run the transcription and save it to "result"
# Note: the original audio file is available on Wikipedia at this link: https://commons.wikimedia.org/wiki/File:A_J_Cook_Speech_from_Lansbury%27s_Labour_Weekly.ogg 
result = model.transcribe(audio, fp16=False)

# Print the transcription
print(result["text"])

 I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold a republic increasing in population, in wealt
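Besides `result["text"]`, the dictionary returned by `model.transcribe` also carries a `"segments"` list whose entries have `start`, `end`, and `text` keys. The sketch below shows one way to print a timestamped view of those segments. The helper names (`format_timestamp`, `format_segments`) and the `sample_result` dictionary are hypothetical stand-ins so the example runs without the audio file, and the timestamps in it are made up.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_segments(result: Dict) -> List[str]:
    """Return one '[start - end] text' line per segment of a Whisper-style result."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} - {end}] {segment['text'].strip()}")
    return lines


# Hand-written stand-in for `result`; the timestamps are invented for illustration.
sample_result = {
    "text": " Behold a republic. A republic whose flag is love.",
    "segments": [
        {"start": 0.0, "end": 3.2, "text": " Behold a republic."},
        {"start": 3.2, "end": 65.9, "text": " A republic whose flag is love."},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

In the notebook itself you would call `format_segments(result)` on the real transcription instead of the sample dictionary.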

----

## Load in the Imports for the Entire Notebook

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OllamaEmbeddings
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains import LLMChain
from langchain.llms import Ollama

---

## Approach 1: Audio to RAG 
**Retrieval Augmented Generation over audio files transcribed with Whisper**

Now that we've transcribed the audio with Whisper, we can use the transcript to build a RAG prompt.

This section of the notebook leads you step-by-step through the process of:
* Splitting the text into chunks using the LangChain `RecursiveCharacterTextSplitter`
* Creating embeddings from the chunks using `OllamaEmbeddings`
* Performing a similarity search between the query and the vectorstore with `docsearch.similarity_search`
* Creating a LangChain question-answering chain (`load_qa_chain`) that takes the context and the query and returns an answer that considers both together

### 1.1: Split, Tokenize & Embed the text 
We split the transcription into smaller chunks and create an embedding for each chunk, which lets us search the vectorstore for the chunks most similar to a query. This is a crucial step in RAG (retrieval augmented generation): the most similar chunks are retrieved and passed to the model as context, and the model generates its answer from the query and that context.

We use LangChain's `RecursiveCharacterTextSplitter`, which splits the text into smaller chunks recursively, and then create embeddings for each chunk using Ollama Embeddings. You can swap these individual components out for whatever text splitter or embeddings you'd prefer.

In [11]:
# Define the text to split
transcription = result["text"]

# Display the text
transcription

" I can conceive of a national destiny which meets the responsibilities of today and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the world the self-evident propositions that all men are created equal, that they are endowed with inalienable rights, that governments are instituted among men to secure these rights, and that governments derive their just powers from the consent of the governed. Behold a republic in which civil and religious liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect while empires all around or bow beneath the weight of their own armaments. A republic whose flag is love while other flags are only fears. Behold a republic increasing in population, in weal

**Split the text**

Here, the text is split into chunks of at most 100 characters, with an overlap of up to 20 characters between consecutive chunks.

In [12]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
)

texts = splitter.split_text(transcription)

In [17]:
# Let's take a look at the texts, notice how there is a bit of overlap between them
print(texts)

# So that we know how many texts we have:
print(f"\nThe length of the texts is {len(texts)}")

['I can conceive of a national destiny which meets the responsibilities of today and measures up to', 'and measures up to the possibilities of tomorrow. Behold a republic resting securely upon the', 'securely upon the mountain of eternal truth. A republic applying in practice and proclaiming to the', 'proclaiming to the world the self-evident propositions that all men are created equal, that they', 'equal, that they are endowed with inalienable rights, that governments are instituted among men to', 'among men to secure these rights, and that governments derive their just powers from the consent of', 'from the consent of the governed. Behold a republic in which civil and religious liberty stimulate', 'liberty stimulate all to earnest endeavor, and in which the law restrains every hand uplifted for a', "hand uplifted for a neighbor's injury. A republic in which every citizen is a sovereign, but in", 'a sovereign, but in which no one cares to wear a crown. Behold a republic standing erect
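To see what `chunk_size` and `chunk_overlap` mean in isolation, here is a deliberately simplified sliding-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as newlines and spaces, which is why the chunks above end on word boundaries). The function name `chunk_with_overlap` is hypothetical, and it only illustrates how consecutive chunks share characters.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-width windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "Behold a republic resting securely upon the mountain of eternal truth."
for chunk in chunk_with_overlap(sample, chunk_size=30, chunk_overlap=10):
    print(repr(chunk))
```

The overlap means a sentence cut at a chunk boundary still appears (at least partly) in the next chunk, which gives the similarity search a better chance of retrieving it.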

**Create the Embeddings**

In [19]:
# define the embeddings 

embeddings = OllamaEmbeddings()

**Add Text to the Vectorstore**

Here, I use FAISS as the vectorstore. FAISS is a library for efficient similarity search and clustering of dense vectors. You can also use alternatives like Pinecone, Qdrant, or Chroma.

In [28]:
# Create the vector store from the texts and embeddings; each chunk's index is stored as its "source" metadata

docsearch = FAISS.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])

**Set the LLM Model & Prompt our LLM using Context from the FAISS Vectorstore**

In [27]:
# Define the local LLM Model we will use
llm = Ollama(model='llama2', temperature=0)

**Create the Prompt**

We'll use a QA chain (a chain for performing question-answering tasks over retrieved documents), since we'd like to ask a question, get an answer, and keep asking further questions.

In [37]:
# Import the prompt template classes
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

In [38]:
# Create the RAG prompt
rag_prompt = ChatPromptTemplate(
    input_variables=['context', 'question'], 
    messages=[
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(
                input_variables=['context', 'question'], 
                template="""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. 
                If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
                \nQuestion: {question} \nContext: {context} \nAnswer:"""
                )
        )
    ]
)

In [31]:
from langchain.chains.question_answering import load_qa_chain

# Build a "stuff" QA chain: all retrieved documents are placed into the prompt's {context}
chain = load_qa_chain(llm, chain_type="stuff", prompt=rag_prompt)
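To make `chain_type="stuff"` concrete: conceptually, the chain joins the text of the input documents into one context string and fills the prompt template with it. The sketch below imitates that in plain Python. The `stuff_prompt` helper, the shortened `TEMPLATE`, and the `"\n\n"` separator are assumptions for illustration, not LangChain's actual implementation.

```python
from typing import List

# Shortened version of the RAG prompt, using the same {question} and {context} slots.
TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question."
    "\nQuestion: {question}\nContext: {context}\nAnswer:"
)


def stuff_prompt(chunks: List[str], question: str, separator: str = "\n\n") -> str:
    """Join the retrieved chunks into one context string and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


example_chunks = [
    "among men to secure these rights, and that governments derive their just powers from the consent of",
    "proclaiming to the world the self-evident propositions that all men are created equal, that they",
]
print(stuff_prompt(example_chunks, "What is the idea of the republic?"))
```

Because every retrieved chunk ends up in a single prompt, this style only works while the combined chunks fit in the model's context window.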

**Set a Query**

In [32]:
# Define a query
query = "What is the idea of the republic?"

In [33]:
# Find the documents most similar to the search query (returns the top 4 by default)
docs = docsearch.similarity_search(query)

In [34]:
# Display the docs determined to be semantically similar to the query
docs

[Document(page_content='among men to secure these rights, and that governments derive their just powers from the consent of', metadata={'source': '5'}),
 Document(page_content='proclaiming to the world the self-evident propositions that all men are created equal, that they', metadata={'source': '3'}),
 Document(page_content='I can conceive of a national destiny which meets the responsibilities of today and measures up to', metadata={'source': '0'}),
 Document(page_content='increasing in population, in wealth, in strength, and in influence, solving the problems of', metadata={'source': '12'})]
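Conceptually, the search embeds the query, compares that vector against every stored chunk vector, and returns the closest few. The sketch below does this by brute force with squared Euclidean (L2) distance, which, to my understanding, is the default metric for the LangChain FAISS wrapper. The 3-dimensional vectors are toy values rather than real embeddings, and `squared_l2` and `top_k` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 4,
) -> List[Tuple[int, float]]:
    """Return (chunk index, distance) for the k chunks closest to the query."""
    scored = [(i, squared_l2(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
toy_chunks = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query = [1.0, 0.2, 0.0]

for index, distance in top_k(toy_query, toy_chunks, k=2):
    print(index, round(distance, 3))
```

The returned indices play the same role as the `source` metadata in the output above: they tell you which chunks of the transcript were judged closest to the query.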

**Generate a Response Using a Chain Completion Request**

In [35]:
# Set a response variable to the output of the chain
response = chain({"input_documents": docs, "question": query}, return_only_outputs=True)

**Display the response**

In [36]:
print(response["output_text"])

 Based on the context provided, the idea of the republic is to establish a system of government where power is held by the people, rather than by a monarch or elite group. The Declaration of Independence proclaims that governments derive their just powers from the consent of the governed, and that all men are created equal with inherent rights to life, liberty, and the pursuit of happiness. This idea is central to the concept of democracy and the notion of self-government, where citizens have a voice in how they are governed and can participate in shaping their country's destiny.


In [None]:
# Use the LLM to evaluate the response



## Approach 2: Prompt Stuffing (Abandoning 'Trad' RAG)

#### Skip the Embeddings & Vectorstore and Place Entire Transcription into LLM Completion Request

For this test, I will set aside the standard RAG approach with its embeddings and vectorstore, and instead place the entire transcription into the LLM completion request. This lets us see how the LLM performs without the vectorstore and embeddings.

Note that this approach will not work for all use cases: transcriptions are often long enough to exceed the model's context window. However, since we have a short one to play with, it should be enough for some initial tests.
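A quick sanity check before stuffing a transcript into the prompt is to estimate whether it fits. The sketch below uses the rough rule of thumb of about four characters per token for English text. The helper names, the 4096-token window (Llama 2's advertised context length; the effective limit depends on how the model is served), and the 512 tokens reserved for the instructions and answer are all assumptions to adjust for your setup.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    text: str,
    context_window: int = 4096,   # assumed model context length, in tokens
    reserved_tokens: int = 512,   # assumed room for instructions and the answer
    chars_per_token: float = 4.0,
) -> bool:
    """Return True if the text's estimated token count fits in the remaining budget."""
    budget = context_window - reserved_tokens
    return estimate_tokens(text, chars_per_token) <= budget


short_transcript = "Behold a republic resting securely upon the mountain of eternal truth. " * 10
print(estimate_tokens(short_transcript), fits_in_context(short_transcript))
```

For an exact count you would use the model's own tokenizer; when the check fails, falling back to the retrieval approach above is the natural alternative.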