<a href="https://colab.research.google.com/github/jsg4hw/ML_fraud_classifier/blob/main/RAG_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install feedparser


In [10]:
import feedparser
podcast_atom_link = "https://api.substack.com/feed/podcast/1084089.rss" # latent space podcast RSS feed and atom link
parsed = feedparser.parse(podcast_atom_link) # fetches and parses transcripts from ATOM link into something usable and structured
episode = [ep for ep in parsed.entries if ep['title'] == "RAG Is A Hack - with Jerry Liu from LlamaIndex"][0]
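
The lookup above raises an `IndexError` if no entry has that exact title, for example after the feed renames or drops the episode. A minimal sketch of a more forgiving lookup follows. The helper name `find_episode` and the sample entries are hypothetical stand-ins for illustration; real `feedparser` entries are dict-like, so the same call should work on `parsed.entries`.

```python
def find_episode(entries, title):
    """Return the first entry whose 'title' matches exactly, or None if absent."""
    return next((ep for ep in entries if ep.get("title") == title), None)


# Hypothetical stand-in data shaped like feedparser entries (not real feed content)
sample_entries = [
    {"title": "Episode A", "summary": "<p>first</p>"},
    {"title": "RAG Is A Hack - with Jerry Liu from LlamaIndex", "summary": "<p>second</p>"},
]

match = find_episode(sample_entries, "RAG Is A Hack - with Jerry Liu from LlamaIndex")
missing = find_episode(sample_entries, "No such episode")
```

Returning `None` lets the notebook print a clear message instead of failing with a bare index error.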

In [11]:
episode_summary = episode['summary']
print(episode_summary[:100])

<p><em>Want to help define </em><em>the AI Engineer stack</em><em>? >800 folks have weighed in on th


Parse the HTML summary and locate the transcript

In [None]:
!brew install libxml2
!brew install libxslt

In [None]:
%pip install unstructured

In [12]:
# Parse the episode summary, which is written in HTML, into a list of elements.
from unstructured.partition.html import partition_html
parsed_summary = partition_html(text=episode_summary)

In [13]:
start_of_transcript = [x.text for x in parsed_summary].index("Transcript") + 1
print(f"First line of the transcript: {start_of_transcript}")

First line of the transcript: 75



In [None]:
%pip install llama_index

In [8]:
from llama_index import Document

In [14]:
# Wrap each transcript element in a LlamaIndex Document object
documents = [Document(text=t.text) for t in parsed_summary[start_of_transcript:]]

In [16]:
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/conda-forge/miniforge/releases/download/23.1.0-1/Mambaforge-23.1.0-1-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:17
🔁 Restarting kernel...


In [2]:
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
!conda install -c pytorch faiss-cpu=1.7.4 mkl=2021 blas=1.0=mkl

In [4]:
# Allows us to search our text data effectively
import faiss
d = 1536 # dimensions of text-embedding-ada-002, the embedding model that we're going to use
faiss_index = faiss.IndexFlatL2(d)
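
`IndexFlatL2` performs exact (brute-force) nearest-neighbour search and reports squared Euclidean distances, so smaller values mean closer matches. The pure-Python sketch below shows the same idea on toy 2-dimensional vectors. The function `l2_search` and the toy data are hypothetical and for illustration only; this is not how faiss is implemented internally.

```python
def l2_search(vectors, query, k):
    """Return the k nearest (squared_distance, position) pairs, closest first."""
    scored = []
    for pos, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, pos))
    return sorted(scored)[:k]


# Toy vectors standing in for 1536-dimensional embeddings
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
nearest = l2_search(toy_vectors, [0.9, 0.1], k=2)
nearest_positions = [pos for _, pos in nearest]
```

A flat index scans every stored vector on each query. That is fine for the few hundred transcript chunks used here, but an approximate index may be worth considering for much larger collections.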

Specifying the embedding model and query model

In [28]:
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
# Don't forget to specify the OpenAI API Key for both the LLM and the embedding model
from google.colab import userdata
openai_api_key=userdata.get("OPENAI_API_KEY")

embed_model = OpenAIEmbedding(api_key=openai_api_key)
llm = OpenAI(model="gpt-4", api_key=openai_api_key)

In [30]:
from llama_index import ServiceContext, set_global_service_context
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)

Embedding and Querying the Data

In [31]:
from llama_index import VectorStoreIndex, StorageContext
from llama_index.vector_stores.faiss import FaissVectorStore
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)

Parsing nodes:   0%|          | 0/300 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/300 [00:00<?, ?it/s]

In [32]:
query = "What does Jerry think about RAG?"
response = index.as_query_engine(similarity_top_k=3).query(query) # includes only top 3 results

In [33]:
response.response

"Jerry believes that RAG increases transparency and visibility into documents. He also thinks that in the long term, fine-tuning might memorize some high-level concepts of knowledge, with RAG supplementing aspects that it doesn't know. He suggests that improvements to RAG could involve aspects like chunking and metadata."

In [34]:
# seeing the sources for the response and their scores
for node in response.source_nodes:
    print(f"{node.get_score()} 👉 {node.text}")

0.24235212802886963 👉 Jerry: So, so I think what RAG does is it increases like transparency, visibility into the actual documents, right. [00:26:19]
0.2501530945301056 👉 Jerry: I mean, I think in the longterm, like if like, this is kind of how fine tuning, like RAG evolves. Like I do think there'll be some aspect where fine tuning will probably memorize some high level concepts of knowledge, but then like RAG will just be there to supplement like aspects of that, that aren't work that don't, that, that it doesn't know.
0.28081560134887695 👉 Jerry: To improve rag, like everything that we talked about, like chunking, like metadata, like. [00:57:24]


To tune the retrieval further, consider:


*   Increasing the number of sources while applying a similarity threshold (score) of 0.3
*   Making the chunks larger
*   Including the surrounding chunks
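
Two of these ideas can be sketched without any library. The helpers below are hypothetical names, not LlamaIndex APIs. The sketch assumes that lower scores mean closer matches, which is how L2 distances behave; the `lower_is_better` flag is there in case your scores are similarities instead.

```python
def filter_by_score(results, threshold, lower_is_better=True):
    """Keep only the (score, text) pairs that pass the threshold."""
    if lower_is_better:
        return [(s, t) for s, t in results if s <= threshold]
    return [(s, t) for s, t in results if s >= threshold]


def with_neighbors(chunks, position, window=1):
    """Return the chunk at `position` plus up to `window` chunks on each side."""
    start = max(0, position - window)
    return chunks[start:position + window + 1]


# Toy scores and chunks, for illustration only
toy_results = [(0.24, "a"), (0.28, "b"), (0.35, "c")]
kept = filter_by_score(toy_results, threshold=0.3)

toy_chunks = ["c0", "c1", "c2", "c3"]
expanded = with_neighbors(toy_chunks, position=2)
```

Retrieving more candidates and then filtering by score keeps weak matches out of the prompt, while the neighbour window restores context that single-sentence chunks lose.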




**Querying the data with chat and memory**

We've now built a query interface, but it doesn't keep the history of our conversation, so we can't ask follow-up questions and get more information.
For that, we need a chat engine, whose chat_mode specifies the behavior of the chat application. Here are the currently available chat modes (as of the time of this writing):
*    **best** - Turn the query engine into a tool, for use with a ReAct data agent or an OpenAI data agent, depending on what your LLM supports. OpenAI data agents require gpt-3.5-turbo or gpt-4 as they use the function calling API from OpenAI.
*    **openai** - Same as best, but forces an OpenAI data agent.
*    **context** - Retrieve nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.
*    **condense_question** - Look at the chat history and re-write the user message to be a query for the index. Return the response after reading the response from the query engine.
*    **simple** - A simple chat with the LLM directly, no query engine involved.
*    **react** - Same as best, but forces a ReAct data agent.
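
To make the **context** mode concrete, here is a toy sketch of the loop it describes: retrieve on every user message, place the retrieved text in the system prompt, and carry the history forward. The class `ToyContextChat` and the stub `retrieve` and `llm` callables are hypothetical; this is an illustration of the idea under those assumptions, not LlamaIndex's implementation.

```python
class ToyContextChat:
    """Toy chat loop: retrieve per message, prepend context, remember history."""

    def __init__(self, retrieve, llm):
        self.retrieve = retrieve  # callable: message -> list of text snippets
        self.llm = llm            # callable: list of (role, content) -> answer
        self.history = []         # alternating user/assistant turns

    def chat(self, message):
        context = "\n".join(self.retrieve(message))
        # System prompt with retrieved text, then prior turns, then the new message
        prompt = [("system", f"Context:\n{context}")] + self.history + [("user", message)]
        answer = self.llm(prompt)
        self.history += [("user", message), ("assistant", answer)]
        return answer


# Stubs standing in for a real retriever and a real LLM
def stub_retrieve(message):
    return ["snippet about RAG"]


def stub_llm(prompt):
    return f"seen {len(prompt)} messages"


toy_chat = ToyContextChat(stub_retrieve, stub_llm)
first = toy_chat.chat("What does Jerry think about RAG?")
second = toy_chat.chat("How does he think that it will evolve over time?")
```

Because earlier turns are included in every prompt, a follow-up such as "How does he think that it will evolve over time?" can be resolved against the previous question.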


In [35]:
query = "What does Jerry think about RAG?"
chat_eng = index.as_chat_engine(similarity_top_k=3, chat_mode='context')
response = chat_eng.chat(query)

In [37]:
response.response

"Jerry believes that RAG (Retrieval-Augmented Generation) increases transparency and visibility into documents. He also thinks that in the long term, fine-tuning will likely memorize some high-level concepts of knowledge, with RAG supplementing aspects that it doesn't know. Jerry also suggests that to improve RAG, strategies like chunking and metadata could be considered."

In [38]:
chat_eng.chat_history # displays chat history that's stored as memory for chat

[ChatMessage(role=<MessageRole.USER: 'user'>, content='What does Jerry think about RAG?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="Jerry believes that RAG (Retrieval-Augmented Generation) increases transparency and visibility into documents. He also thinks that in the long term, fine-tuning will likely memorize some high-level concepts of knowledge, with RAG supplementing aspects that it doesn't know. Jerry also suggests that to improve RAG, strategies like chunking and metadata could be considered.", additional_kwargs={})]

In [39]:
query_2 = "How does he think that it will evolve over time?"
response_2 = chat_eng.chat(query_2)

In [41]:
response_2.response

"Jerry thinks that over time, fine-tuning will probably memorize some high-level concepts of knowledge, and RAG will be there to supplement aspects that it doesn't know. This suggests that he believes RAG will evolve to fill in the gaps in knowledge that fine-tuning can't cover."

In [42]:
chat_eng.chat_history

[ChatMessage(role=<MessageRole.USER: 'user'>, content='What does Jerry think about RAG?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="Jerry believes that RAG (Retrieval-Augmented Generation) increases transparency and visibility into documents. He also thinks that in the long term, fine-tuning will likely memorize some high-level concepts of knowledge, with RAG supplementing aspects that it doesn't know. Jerry also suggests that to improve RAG, strategies like chunking and metadata could be considered.", additional_kwargs={}),
 ChatMessage(role=<MessageRole.USER: 'user'>, content='How does he think that it will evolve over time?', additional_kwargs={}),
 ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content="Jerry thinks that over time, fine-tuning will probably memorize some high-level concepts of knowledge, and RAG will be there to supplement aspects that it doesn't know. This suggests that he believes RAG will evolve to fill in th