# Retrieval augmented generation (RAG)

## Loading Documents
A first step in RAG is to load document. You need a loader that supports the document type you are interested in. We use in this example Langchain, because it includes a collection of 60+ libraries for multiple types of documents and formats.

A first example with the `PyPDFLoader` library. Pdf support is direct and a single command is enough.

In [1]:
# For this loading Documents part, you may need these packages installed

#!pip install langchain
#!pip install -U langchain-community

In [2]:
import warnings # optional, disabling warnings about versions and others
warnings.filterwarnings('ignore') # optional, disabling warnings about versions and others

#!pip install pypdf 

from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader("docs/War-of-the-Worlds.pdf")
book = loader.load()

In [3]:
# How long is the document we loaded?
len(book)

128

In [4]:
#Looking at a small extract, one page, and a few hundred characters in that page
page = book[5]
print(page.page_content[0:500])

darkness were Ottershaw and Chertsey and a ll their hundreds of people, sleeping in 
peace.  
   He was full of speculation that night a bout the condition of Mars, and scoffed at the 
vulgar idea of its having in- habitants w ho were signalling us. His idea was that 
meteorites might be falling in a heavy shower upon the planet, or that a huge volcanic 
explosion was in progress. He pointed out to me how unlikely it was that organic 
evolution had taken the same direction in the two adjacent pl


In [5]:
#Which page is it, from which document?
page.metadata

{'source': 'docs/War-of-the-Worlds.pdf', 'page': 5}

A second example with a Youtube video. There is a little more work here. The yt_dlp library will need options to know what audio format to download (we won't care much about the video part). Here we use m4a, at 192 kbps. Then the ffmpeg and ffprobe programs will isolate and stream the audio part. We will then use the OpenAI whisper library to covnert the audio into text (speech-to-text).

In [6]:
#! pip install yt_dlp
#! pip install pydub
#!pip install ffmpeg
#!pip install ffprobe
#!pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

import os
import whisper
from yt_dlp import YoutubeDL

# Step 1: Set up the download options
url = "https://www.youtube.com/watch?v=2vkJ7v0x-Fs"
save_dir = "docs/youtube/"
output_template = os.path.join(save_dir, '%(title)s.%(ext)s')

ydl_opts = {
    'format': 'bestaudio/best',
    'outtmpl': output_template,  # Save the file to the specified directory with a title-based name
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',
        'preferredcodec': 'm4a',  # You can change this to mp3 if you prefer
        'preferredquality': '192',
    }],
    'ffmpeg_location': '/opt/homebrew/bin/ffmpeg',  # Specify the location of ffmpeg
}

# Step 2: Download the audio from the YouTube video
with YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])

# Step 3: Find the downloaded file
downloaded_file = [f for f in os.listdir(save_dir) if f.endswith('.m4a')][0]  # Assuming m4a, adjust if using mp3
downloaded_file_path = os.path.join(save_dir, downloaded_file)

# Step 4: Load the Whisper model
model = whisper.load_model("base")  # You can choose 'tiny', 'base', 'small', 'medium', or 'large'

# Step 5: Transcribe the audio file
result = model.transcribe(downloaded_file_path)


[youtube] Extracting URL: https://www.youtube.com/watch?v=2vkJ7v0x-Fs
[youtube] 2vkJ7v0x-Fs: Downloading webpage
[youtube] 2vkJ7v0x-Fs: Downloading ios player API JSON
[youtube] 2vkJ7v0x-Fs: Downloading web creator player API JSON
[youtube] 2vkJ7v0x-Fs: Downloading m3u8 information
[info] 2vkJ7v0x-Fs: Downloading 1 format(s): 251
[download] Destination: docs/youtube/Big Data Architectures.webm
[download] 100% of   22.03MiB in 00:00:02 at 10.86MiB/s    
[ExtractAudio] Destination: docs/youtube/Big Data Architectures.m4a
Deleting original file docs/youtube/Big Data Architectures.webm (pass -k to keep)


In [7]:
# Adding metadata to the transcript, and saving the transcript to a file so we can use it outside of this program.
class Document:
    def __init__(self, source, text, metadata=None):
        self.source = source
        self.page_content = text
        self.metadata = metadata or {}

# Wrap the transcription result in the Document class with metadata
document = Document(
    source=downloaded_file_path,
    text=result['text'], 
    metadata={"source": "youtube", "file_path": downloaded_file_path}
)
#Save the transcript to a text file
transcript_file_path = os.path.join(save_dir, 'transcript.txt')
with open(transcript_file_path, 'w') as f:
    f.write(result['text'])

print(f"Transcript saved to {transcript_file_path}")


Transcript saved to docs/youtube/transcript.txt


In [8]:
# how many characters in this transcript file?
len(document.page_content)

32857

In [9]:
# Print the first 500 characters of the transcript
print(document.page_content[:500])


 In lesson four, we will go deeper into architectures for big data, and we will take a closer look at some of the most popular big data management systems. First, we're going to look at how the big data management system framework looks, and explore the commonalities that pretty much all the big data systems have, as well as some of the key differences between no SQL, MPP, and Hadoop. Next, we're going to take a deep dive into the Hadoop data management system. You will see how we both store dat


## Splitting our documents in chunks
A second step is to split our documents (a 128-page book and 32K-character trasncript file) into smaller chunks. We use Langchain libraries here again.

In [10]:
# We will first use 2 libraries, the first one is the most important, but we'll look at both
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

In [11]:
# Chunks have a character length, and an overlap values. For example (in real life, you are probably closer to 500 to 1000 and 50 to 100 respectively):
chunk_size =20
chunk_overlap = 5

In [12]:
# Let's compare the character splitter and the recursive character splitter to give you an intuition on how splitting works. Let's create a function for each:
rsplit = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)
csplit = CharacterTextSplitter(
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

In [13]:
# Let's take an example string
text1 = 'abcdefghijklmnopqrstuvwxyz1234567890'

In [14]:
rsplit.split_text(text1)

['abcdefghijklmnopqrst', 'pqrstuvwxyz123456789', '567890']

In [15]:
csplit.split_text(text1)

['abcdefghijklmnopqrstuvwxyz1234567890']

In [16]:
# Character splitter does not do anything, because it considers by default the end of paragraph as the separator. You can specify additional separators. THis is true for any splitter, including REcursive Character. Let's add more separators to rsplit:
# The separateors are by descending order of preference (try the first, if you can't get a chunk within the right size, try the second, etc.)
rsplit = RecursiveCharacterTextSplitter(
    chunk_size=20,
    chunk_overlap=5,
    separators=["\n\n", "\n", "(?<=\. )", " ", ""]
)

In [17]:
text2 = 'a b c d e f g h i j k l m n o p q r s t u v w x y z 1 2 3 4 5 6 7 8 9 0'

In [18]:
rsplit.split_text(text1)

['abcdefghijklmnopqrst', 'pqrstuvwxyz123456789', '567890']

In [19]:
rsplit.split_text(text2)

['a b c d e f g h i j',
 'i j k l m n o p q r',
 'q r s t u v w x y z',
 'y z 1 2 3 4 5 6 7 8',
 '7 8 9 0']

In [20]:
Hamlet = """Truly to speak, and with no addition, \
We go to gain a little patch of ground \
That hath in it no profit but the name. \
To pay five ducats, five, I would not farm it; \
Nor will it yield to Norway or the Pole \
A ranker rate, should it be sold in fee."""

In [21]:
rsplit.split_text(Hamlet)

['Truly to speak, and',
 'and with no',
 'no addition, We go',
 'go to gain a little',
 'patch of ground',
 'That hath in it no',
 'no profit but the',
 'the name. To pay',
 'pay five ducats,',
 'five, I would not',
 'not farm it; Nor',
 'Nor will it yield',
 'to Norway or the',
 'the Pole A ranker',
 'rate, should it be',
 'be sold in fee.']

In [22]:
# Let's go for a more realistic chunk size
rsplit = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n\n", "\n", "(?<=\. )", " ", ""]
)

In [23]:
# Looking at the files, first the pdf
rdoc1 = rsplit.split_documents(book)

In [24]:
len(rdoc1)

956

In [25]:
# the splitted version has more documents (pages) than the original pdf source, 
len(book)

128

In [26]:
#Printing a few splits
for i, doc in enumerate(rdoc1[30:33]):  # Adjust the number 3 to print more or fewer splits
    print(f"--- Split {i + 1} ---")
    print(doc.page_content)
    print()  # Print an empty line for better readability


--- Split 1 ---
small and still, faintly marked with transver se stripes, and slightly flattened from the 
perfect round. But so little it was, so silvery warm--a pin's-head of li ght! It was as if it 
quivered, but really this was the telescope vi brating with the activity of the clockwork 
that kept the planet in view.     As I watched, the planet seemed to grow larger and smaller and to advance and recede, 
but that was simply that my eye was tired. Forty millions of miles it was from us--more

--- Split 2 ---
but that was simply that my eye was tired. Forty millions of miles it was from us--more 
than forty millions of miles of void. Few people realise the im- mensity of vacancy in 
which the dust of the material universe swims.  
   Near it in the field, I re member, were three faint points of  light, three telescopic stars 
infinitely remote, and all around it was th e unfathomable darkness of empty space. You

--- Split 3 ---
infinitely remote, and all around it was th e unfatho

In [27]:
# Splitting the trasncript of the audio file
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

# Step 1: Load the transcript text
transcript_file_path = "docs/youtube/transcript.txt"
with open(transcript_file_path, 'r') as f:
    transcript_text = f.read()

# Step 2: Create a Document object
document = Document(page_content=transcript_text)

# Step 3: Split the transcript into chunks
rdoc2 = rsplit.split_documents([document])

# Step 4 manually assigning the metadata to each split
save_dir = "docs/youtube/"
downloaded_file = [f for f in os.listdir(save_dir) if f.endswith('.m4a')][0]  # Assuming m4a, adjust if using mp3
downloaded_file_path = os.path.join(save_dir, downloaded_file)
for doc in rdoc2:
    doc.metadata = {"source": "youtube", "file_path": downloaded_file_path}


# Step 5: Print the first few splits
for i, doc in enumerate(rdoc2[30:33]):  # Adjust the number 3 to print more or fewer splits
    print(f"--- Split {i + 1} ---")
    print(doc.page_content)
    print()  # Print an empty line for better readability



--- Split 1 ---
how we actually execute analytics jobs on that data that's sitting in HDFS. So on the master node we have a new function, a new demon called the job tracker, and on the slave nodes we have a new one called the task tracker. Now let's say we have an application job that needs to communicate and analyze some data set that's sitting on the slave nodes down below. So the application job executes a Java command on the API, communicating with the name node, and then it tries to communicate down to

--- Split 2 ---
Java command on the API, communicating with the name node, and then it tries to communicate down to the task trackers below. Now one of the big differences between big data architectures and traditional data processing is that we don't try to bring all the data to one place and analyze it. What we do is we send the processing job down to the data and distribute it. You can think of it like having a lot of minions doing the work for you. One analogy might be if you h

In [28]:
# Checking the metadata

# Viewing metadata of the first few splits from rdoc1 (the pdf text)
print("Metadata for rdoc1:")
for i, doc in enumerate(rdoc1[:3]):  # Adjust the number to view more or fewer splits
    print(f"--- Metadata for Split {i + 1} ---")
    print(doc.metadata)  # Print the metadata
    print()  # Print an empty line for better readability

# Viewing metadata of the first few splits from rdoc2 (the video transcript)
print("Metadata for rdoc2:")
for i, doc in enumerate(rdoc2[:3]):  # Adjust the number to view more or fewer splits
    print(f"--- Metadata for Split {i + 1} ---")
    print(doc.metadata)  # Print the metadata
    print()  # Print an empty line for better readability


Metadata for rdoc1:
--- Metadata for Split 1 ---
{'source': 'docs/War-of-the-Worlds.pdf', 'page': 1}

--- Metadata for Split 2 ---
{'source': 'docs/War-of-the-Worlds.pdf', 'page': 1}

--- Metadata for Split 3 ---
{'source': 'docs/War-of-the-Worlds.pdf', 'page': 1}

Metadata for rdoc2:
--- Metadata for Split 1 ---
{'source': 'youtube', 'file_path': 'docs/youtube/Big Data Architectures.m4a'}

--- Metadata for Split 2 ---
{'source': 'youtube', 'file_path': 'docs/youtube/Big Data Architectures.m4a'}

--- Metadata for Split 3 ---
{'source': 'youtube', 'file_path': 'docs/youtube/Big Data Architectures.m4a'}



Recursive character splitting is a very common technique. But if you use an LLM that severly limits the number of input token (or charges you b y the token), you may want to split based on tokens instead of character sequences. This is how to do it.

In [29]:
from langchain.text_splitter import TokenTextSplitter

In [30]:
# Let's define a very small chunk and no overlap, so you can see what a chunk looks like with this method
token_split = TokenTextSplitter(chunk_size=1, chunk_overlap=0)

In [31]:
print(token_split.split_text(Hamlet))

['T', 'ruly', ' to', ' speak', ',', ' and', ' with', ' no', ' addition', ',', ' We', ' go', ' to', ' gain', ' a', ' little', ' patch', ' of', ' ground', ' That', ' hath', ' in', ' it', ' no', ' profit', ' but', ' the', ' name', '.', ' To', ' pay', ' five', ' d', 'uc', 'ats', ',', ' five', ',', ' I', ' would', ' not', ' farm', ' it', ';', ' Nor', ' will', ' it', ' yield', ' to', ' Norway', ' or', ' the', ' Pole', ' A', ' rank', 'er', ' rate', ',', ' should', ' it', ' be', ' sold', ' in', ' fee', '.']


## Storing in Vector Store
The third step is to store your splits in a vector database. There are dozens of solutions. Very popular solutions for local storage include Mongodb, Chroma, Weaviate and Milvus. All large Cloud vendors (Azure, AWS etc.) offer a Cloud vectordb solution. Here we use Chroma, a locally stored, flexible popular choice. 

Before storing our data into the vectordb, we need to convert the text strings into vectors (embedding). We use a tokenizer compatible with the BERT model to first tokenize the text, then embed (convert to vectors).

In [32]:
# Create Ollama embeddings and vector store
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
all_splits = rdoc1 + rdoc2
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

What do these vectors look like? Let's play with a few examples.

In [33]:
text1 = "i like hotdogs"
text2 = "i like sandwiches"
text3 = "this is a large building"

In [34]:
embedding1 = embeddings.embed_query(text1)

In [35]:
embedding1 = embeddings.embed_query(text1)
embedding2 = embeddings.embed_query(text2)
embedding3 = embeddings.embed_query(text3)

In [36]:
# looking at the first values of the first embedding
print("embedding1 includes", len(embedding1), "values")
print("First few values:", embedding1[:10])

embedding1 includes 768 values
First few values: [-0.19900505244731903, -0.022423911839723587, -3.72220778465271, -0.7225621342658997, 0.05477889999747276, 0.9443159699440002, -1.1486680507659912, 0.5535013675689697, -0.9903378486633301, -0.840915322303772]


How closes are these vectors from one another? There are many ways to compare them, here we use the dot product and the cosine similarity methods, which are common techniques for such comparison.

In [37]:
#Dot product method, comparing text1 to text2 vectors
# Step 1 : creating the normalized vectors (so the product is between 0 and 1)
import numpy as np
norm_a = np.linalg.norm(embedding1)
norm_b = np.linalg.norm(embedding2)
norm_c = np.linalg.norm(embedding3)
normalized_a = embedding1 / norm_a
normalized_b = embedding2 / norm_b
normalized_c = embedding3 / norm_c

#Step 2: comparing text1 and text 2 embeddings, then text1 and text 3 embeddings:
sim_1_2 = np.dot(normalized_a, normalized_b)
sim_1_3 = np.dot(normalized_a, normalized_c)

print("Similarity (with dot product)between sentence 1 and 2:", sim_1_2)
print("Similarity (with dot product) between sentence 1 and 3:", sim_1_3)

Similarity (with dot product)between sentence 1 and 2: 0.7179325440433726
Similarity (with dot product) between sentence 1 and 3: 0.3768511618446678


In [38]:
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

similarity_1_2 = cosine_similarity(embedding1, embedding2)
similarity_1_3 = cosine_similarity(embedding1, embedding3)

print("Similarity (with cos similarity) between sentence 1 and 2:", similarity_1_2)
print("Similarity (with cos similarity) between sentence 1 and 3:", similarity_1_3)

Similarity (with cos similarity) between sentence 1 and 2: 0.7179325440433726
Similarity (with cos similarity) between sentence 1 and 3: 0.37685116184466794


Now that we have embeddings, let's store them into a Chroma database.

In [39]:
#!pip install --upgrade langchain chromadb
from langchain.vectorstores import Chroma

# Set the environment variable to disable tokenizers parallelism and avoid warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Let's define a directory where we'll store the database beyond this notebook execution (and let's make sure it is emtpy, as I run this notebook often :))
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma  # remove old database files if any

In [40]:
vectordb = Chroma.from_documents(
    documents=all_splits,
    embedding=embeddings,
    persist_directory=persist_directory
)

Now let's see if we can perform some similarity search with this database. keep in mind that we are just comparing vectors here, there is no LLM yet to smartly correlate deeper.

In [41]:
question = "Did the spaceship come from the planet Mars?"

In [42]:
docs = vectordb.similarity_search(question,k=5)

In [43]:
len(docs)

5

In [44]:
docs[0].page_content

'for the inhabitants of Mars. The immediate pr essure of necessity has brightened their \nintellects, enlarged their pow ers, and hardened their h earts. And looking across space \nwith instruments, and intelligences such as we have scarcely dreamed of, they see, at its \nnearest distance only 35,000,000 of miles sunward of them, a morning star of hope, our \nown warmer planet, green with vegetation and grey with water, w ith a cloudy atmosphere'

In [45]:
# Let's save the vectordb so we can use it outside of this notebook - note, this is FYI as it is automatically done with Chroma, but not with all other vectordbs!
vectordb.persist()

  vectordb.persist()


## More on Similarity search

The goal of the retrieval phase is to select the most relevant documents. But 'relevant' may mean 'repeating the same most relevant segment', which is suboptimal. 

In [46]:
# deleting leftovers from previous instances, as I run this codebook often
#tempdb.delete_collection()

In [47]:
text3 = [
    """The alien spaceships looked like flying saucers.""",
    """The alien spaceships were round in shape.""",
    """The spaceships were destroying everything.""",
]

In [48]:
tempdb = Chroma.from_texts(text3, embedding=embeddings)

In [49]:
question = "What can you tell me about the alien spaceships?"

In [50]:
tempdb.similarity_search(question, k=2)

[Document(metadata={}, page_content='The alien spaceships were round in shape.'),
 Document(metadata={}, page_content='The alien spaceships looked like flying saucers.')]

Similarity search points to the documents that are closest semantically to the question, which may include a lot of redundant information, and miss some key points, for example that the alien spaceships were destroying everything. Max Marginal Relevance (MMR) search improves Similarity Search by picking the top k as Similarity Search does, but returning the vectors that are farthest from each other (in this top k list), so as to maximize the diversity of information returned.

In [51]:
tempdb.max_marginal_relevance_search(question,k=2, fetch_k=3)

[Document(metadata={}, page_content='The alien spaceships were round in shape.'),
 Document(metadata={}, page_content='The spaceships were destroying everything.')]

You may also want to filter the source where the information should be retrived from. You can achive such filtering manually, or automatically

In [52]:
# Manual filtering method
docs = vectordb.similarity_search(
    question,
    k=3,
    filter={"source":"docs/War-of-the-Worlds.pdf"}
)

In [53]:
docs[0].page_content

'point of view of an observer on Venus. Subsequently a peculiar luminous and sinuous \nmark- ing appeared on the unillumined half of the inner planet, and almost simultaneously a faint dark mark of a si milar sinuous character was detected upon a \nphotograph of the Martian disk. One needs to see the drawings of these ap- pearances in \norder to appreciate fully their remark able resemblance in character.  \nAt any rate, whether we expect another inva sion or not, our views of the human future'

## Retrieving with the LLM in action
The full process consists of asking a question, retrieving the relevant information, then passing the information and the question to the LLM.

In [54]:
#We still need these bricks, so do not run this part of the notebook in isolation
persist_directory = 'docs/chroma/'
embedding = embeddings
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)

  vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)


In [55]:
print(vectordb._collection.count())

1038


In [56]:
question = "Did the spaceship come from the planet Mars?"
docs = vectordb.similarity_search(question,k=3)
len(docs)

3

In [57]:
#!pip install ollama
#!ollama serve & ollama pull llama3 & ollama pull nomic-embed-text

In [58]:
#Using Llama3 as the LLM, and Ollama as the wrapper to interact with Llama3
from langchain_community.llms import Ollama
llm = Ollama(model = "llama3")
llm.invoke("Are there aliens on Mars?")

"There is currently no conclusive evidence of the existence of aliens on Mars. However, there are ongoing efforts to search for signs of life, past or present, on the planet.\n\nNASA's Curiosity rover has been exploring Mars since 2012 and has found evidence of ancient lakes, rivers, and even an ocean on Mars in the distant past. This suggests that conditions may have been suitable for life to exist on Mars billions of years ago.\n\nIn 2020, NASA's Perseverance rover discovered methane in the Martian atmosphere, which could be a sign of microbial life. However, it's also possible that the methane is geological in origin, meaning it's not related to living organisms.\n\nThe European Space Agency's Schiaparelli lander and NASA's InSight lander have also been searching for signs of life on Mars. The InSight mission has provided valuable insights into the Martian interior and surface, but so far, there is no conclusive evidence of alien life.\n\nFuture missions, such as the NASA Mars 2020 

In [59]:
#!pip install ollama langchain beautifulsoup4 chromadb gradio -q

In [60]:
# This is "almost" the final code.
import gradio as gr
import ollama
from bs4 import BeautifulSoup as bs
from langchain_community.embeddings import OllamaEmbeddings

# Create Ollama embeddings and vector store
#embeddings = OllamaEmbeddings(model="nomic-embed-text")
#vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

# Define the function to call the Ollama Llama3 model
def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    response = ollama.chat(model='llama3', messages=[{'role': 'user', 'content': formatted_prompt}])
    return response['message']['content']

# Define the RAG setup
retriever = vectordb.as_retriever()

def rag_chain(question):
    retrieved_docs = retriever.invoke(question)
    formatted_context = "\n\n".join(doc.page_content for doc in retrieved_docs)
    return ollama_llm(question, formatted_context)

# Define the Gradio interface
def get_important_facts(question):
    return rag_chain(question)

# Create a Gradio app interface
iface = gr.Interface(
  fn=get_important_facts,
  inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
  outputs="text",
  title="RAG with Llama3",
  description="Ask questions about the proveded context",
)

# Launch the Gradio app
iface.launch()
# example q: did the aliens eventually go on to land on Venus?

Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.




In [1]:
# Now, Ollama re-injects the previouys question and asnwer into the model with the next question. But other LLMs would forget the previous question (remember Dolly?) You can add memory with a memory module.
import gradio as gr
import ollama
from bs4 import BeautifulSoup as bs
from langchain_community.embeddings import OllamaEmbeddings
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_core.runnables import Runnable

# Define the prompt template
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum. Keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Define memory to store the previous exchanges
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Custom Runnable LLM class with debugging
class OllamaLLM(Runnable):
    def __init__(self, llm_fn):
        self.llm_fn = llm_fn

    def invoke(self, input, config=None, **kwargs):
        # If the input is a StringPromptValue or similar object, treat it as a string
        question = str(input)  # Convert the input to a string if necessary
        context = kwargs.get("context", "")  # Retrieve context from kwargs if available

        # Print what was passed to the LLM
        print(f"Question passed to LLM: {question}")
        print(f"Context passed to LLM: {context}")

        # Handle additional kwargs such as stop, if needed
        stop = kwargs.get("stop", None)

        # If 'stop' or other arguments need to be passed to the LLM function, handle them here
        response = self.llm_fn(question, context)
        
        # Print the response from the LLM
        print(f"Response from LLM: {response}")

        return response

    def predict(self, input, **kwargs):
        return self.invoke(input, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.invoke(*args, **kwargs)

# Instantiate the custom LLM class
ollama_llm_instance = OllamaLLM(ollama_llm)

# Define the conversational retrieval chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=ollama_llm_instance,
    retriever=retriever,
    memory=memory # add the memory module, to pass the previous exchange to the LLM as well
)

# Define the function to get important facts with debugging
def get_important_facts(question):
    # Print what is passed to the retriever
    print(f"Question passed to retriever: {question}")
    
    # Run the chain and capture the memory state
    response = qa_chain.run({"question": question})
    
    # Print what is in memory after the retrieval
    print(f"Memory state: {memory.buffer}")

    return response

# Create a Gradio app interface
iface = gr.Interface(
    fn=get_important_facts,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="text",
    title="RAG with Llama3",
    description="Ask questions about the provided context",
)

# Launch the Gradio app
iface.launch()





NameError: name 'ollama_llm' is not defined

In [62]:
# test

In [64]:
#get_current_temperature.name

In [65]:
#get_current_temperature.description

In [66]:
#get_current_temperature.args

In [67]:
#get_current_temperature({"latitude": 13, "longitude": 14})

In [68]:
from langchain.agents import tool
import requests
from pydantic import BaseModel, Field

# Define the input schema
class CityInput(BaseModel):
    city: str = Field(..., description="City name to fetch weather data for")

# Tool to get the current weather
@tool(args_schema=CityInput)
def get_current_weather(city: str) -> dict:
    """Fetch current weather for a given city."""
    
    API_KEY = '56e413d9ee2c3d177f933b2b57ff835e'  # Replace with your OpenWeatherMap API key
    BASE_URL = "http://api.openweathermap.org/data/2.5/weather"
    
    # Parameters for the weather request
    params = {
        'q': city,
        'appid': API_KEY,
        'units': 'metric',  # To get the temperature in Celsius
    }

    # Make the request
    response = requests.get(BASE_URL, params=params)
    
    if response.status_code == 200:
        results = response.json()
        temperature = results['main']['temp']
        weather_description = results['weather'][0]['description']
        humidity = results['main']['humidity']
        wind_speed = results['wind']['speed']
        pressure = results['main']['pressure']
    else:
        raise Exception(f"Weather API Request failed with status code: {response.status_code}")
    
    return (
        f"The current temperature in {city} is {temperature}°C with {weather_description}.\n"
        f"Humidity: {humidity}%, Wind Speed: {wind_speed} m/s, Pressure: {pressure} hPa."
    )

# Example usage
# weather = get_current_weather("San Francisco")
# print(weather)


In [69]:
get_current_weather("Richmond")

  get_current_weather("Richmond")


'The current temperature in Richmond is 26.89°C with scattered clouds.\nHumidity: 57%, Wind Speed: 5.14 m/s, Pressure: 1022 hPa.'

In [70]:
#Integrating the weather in the full code

In [73]:
import gradio as gr
import ollama
from bs4 import BeautifulSoup as bs
from langchain_community.embeddings import OllamaEmbeddings
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_core.runnables import Runnable
import requests
from pydantic import BaseModel, Field

# Weather tool function
def get_current_weather(city: str) -> str:
    """Fetch current weather for a given city."""
    
    API_KEY = '56e413d9ee2c3d177f933b2b57ff835e'  # Replace with your OpenWeatherMap API key
    BASE_URL = "http://api.openweathermap.org/data/2.5/weather"
    
    params = {
        'q': city,
        'appid': API_KEY,
        'units': 'metric',  # To get the temperature in Celsius
    }

    response = requests.get(BASE_URL, params=params)
    
    if response.status_code == 200:
        results = response.json()
        temperature = results['main']['temp']
        weather_description = results['weather'][0]['description']
        humidity = results['main']['humidity']
        wind_speed = results['wind']['speed']
        pressure = results['main']['pressure']
    else:
        raise Exception(f"Weather API Request failed with status code: {response.status_code}")
    
    return (
        f"The current temperature in {city} is {temperature}°C with {weather_description}.\n"
        f"Humidity: {humidity}%, Wind Speed: {wind_speed} m/s, Pressure: {pressure} hPa."
    )

# Define the prompt template
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum. Keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Define memory to store the previous exchanges
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Custom Runnable LLM class with debugging
class OllamaLLM(Runnable):
    def __init__(self, llm_fn):
        self.llm_fn = llm_fn

    def invoke(self, input, config=None, **kwargs):
        question = str(input)
        context = kwargs.get("context", "")
        print(f"Question passed to LLM: {question}")
        print(f"Context passed to LLM: {context}")
        stop = kwargs.get("stop", None)
        response = self.llm_fn(question, context)
        print(f"Response from LLM: {response}")
        return response

    def predict(self, input, **kwargs):
        return self.invoke(input, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.invoke(*args, **kwargs)

# Instantiate the custom LLM class
ollama_llm_instance = OllamaLLM(ollama_llm)

# Define the conversational retrieval chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=ollama_llm_instance,
    retriever=retriever,
    memory=memory  # add the memory module to pass the previous exchange to the LLM as well
)

# Function to detect if the question is about the weather in a city
def is_weather_question(question: str) -> bool:
    return "weather" in question.lower() and "in" in question.lower()

# Extract city name from the weather question
def extract_city_from_question(question: str) -> str:
    # Simple heuristic to extract city name
    if "weather in" in question.lower():
        return question.lower().split("weather in")[1].strip().split()[0].capitalize()
    return ""

# Define the function to get important facts with debugging
def get_important_facts(question):
    # Check if the question is about the weather in a city
    if is_weather_question(question):
        city = extract_city_from_question(question)
        if city:
            return get_current_weather(city)
        else:
            return "I couldn't determine the city you're asking about. Please specify the city."
    
    # Otherwise, use the LLM-based chain
    print(f"Question passed to retriever: {question}")
    response = qa_chain.run({"question": question})
    print(f"Memory state: {memory.buffer}")
    return response

# Create a Gradio app interface
iface = gr.Interface(
    fn=get_important_facts,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="text",
    title="RAG with Llama3",
    description="Ask questions about the provided context",
)

# Launch the Gradio app
iface.launch()


Running on local URL:  http://127.0.0.1:7864

To create a public link, set `share=True` in `launch()`.




Question passed to retriever: Did the aliens eventually land on Venus?
Question passed to LLM: text="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\nLessing has advanced excellent reasons fo r supposing that the Martians have actually \nsucceeded in effecting a landing on the planet  Venus. Seven months ago now, Venus and \nMars were in alignment with the sun; that  is to say, Mars was in opposition from the \npoint of view of an observer on Venus. Subsequently a peculiar luminous and sinuous\n\npoint of view of an observer on Venus. Subsequently a peculiar luminous and sinuous \nmark- ing appeared on the unillumined half of the inner planet, and almost simultaneously a faint dark mark of a si milar sinuous character was detected upon a \nphotograph of the Martian disk. One needs to see the drawings of these ap- pearances in \norder to appreciate fully their remark able

Traceback (most recent call last):
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/anyio/to_thread.

Question passed to retriever: Does part of the novel take place in England?
Question passed to LLM: text='Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.\n\nChat History:\n\nHuman: Did the aliens eventually land on Venus?\nAssistant: The context is a passage from a text about the possibility of extraterrestrial life landing on Earth or other planets. The passage discusses the idea that Martians may have landed on Venus, and mentions the alignment of Mars and Venus with the sun, as well as observations of markings on both planets\' surfaces.\nHuman: Where does the action take place?\nAssistant: The context is a passage of text that describes events and observations related to the approach of Mars (Planet Mars) and its impact on human life. The narrator provides details about daily routines, military actions, and personal experiences during this time period.\nHuman: What is the main city in 

In [75]:
import gradio as gr
import ollama
from bs4 import BeautifulSoup as bs
from langchain_community.embeddings import OllamaEmbeddings
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_core.runnables import Runnable
import requests
from pydantic import BaseModel, Field

# Weather tool function
def get_current_weather(city: str) -> str:
    """Fetch current weather for a given city."""
    
    API_KEY = 'YOUR_OPENWEATHERMAP_API_KEY'  # Replace with your OpenWeatherMap API key
    BASE_URL = "http://api.openweathermap.org/data/2.5/weather"
    
    params = {
        'q': city,
        'appid': API_KEY,
        'units': 'metric',  # To get the temperature in Celsius
    }

    response = requests.get(BASE_URL, params=params)
    
    if response.status_code == 200:
        results = response.json()
        temperature = results['main']['temp']
        weather_description = results['weather'][0]['description']
        humidity = results['main']['humidity']
        wind_speed = results['wind']['speed']
        pressure = results['main']['pressure']
    else:
        raise Exception(f"Weather API Request failed with status code: {response.status_code}")
    
    return (
        f"The current temperature in {city} is {temperature}°C with {weather_description}.\n"
        f"Humidity: {humidity}%, Wind Speed: {wind_speed} m/s, Pressure: {pressure} hPa.\n"
        f"Thanks for asking!"
    )

# Define the prompt template
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum. Keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer.

Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Define memory to store the previous exchanges
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# Custom Runnable LLM class with debugging
class OllamaLLM(Runnable):
    def __init__(self, llm_fn):
        self.llm_fn = llm_fn

    def invoke(self, input, config=None, **kwargs):
        question = str(input)
        context = kwargs.get("context", "")
        print(f"Question passed to LLM: {question}")
        print(f"Context passed to LLM: {context}")
        stop = kwargs.get("stop", None)
        response = self.llm_fn(question, context)
        print(f"Response from LLM: {response}")
        return response + " Thanks for asking!"

    def predict(self, input, **kwargs):
        return self.invoke(input, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.invoke(*args, **kwargs)

# Instantiate the custom LLM class
ollama_llm_instance = OllamaLLM(ollama_llm)

# Define the conversational retrieval chain
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=ollama_llm_instance,
    retriever=retriever,
    memory=memory  # add the memory module to pass the previous exchange to the LLM as well
)

# Function to detect if the question is about the weather in a city
def is_weather_question(question: str) -> bool:
    return "weather" in question.lower() and "in" in question.lower()

# Extract city name from the weather question
def extract_city_from_question(question: str) -> str:
    # Simple heuristic to extract city name
    if "weather in" in question.lower():
        return question.lower().split("weather in")[1].strip().split()[0].capitalize()
    return ""

# Define the function to get important facts with debugging
def get_important_facts(question):
    # Check if the question is about the weather in a city
    if is_weather_question(question):
        city = extract_city_from_question(question)
        if city:
            return get_current_weather(city)
        else:
            return "I couldn't determine the city you're asking about. Please specify the city."
    
    # Otherwise, use the LLM-based chain to answer the question directly
    print(f"Question passed to retriever: {question}")
    response = qa_chain.run({"question": question})
    print(f"Memory state: {memory.buffer}")
    return response

# Create a Gradio app interface
iface = gr.Interface(
    fn=get_important_facts,
    inputs=gr.Textbox(lines=2, placeholder="Enter your question here..."),
    outputs="text",
    title="RAG with Llama3",
    description="Ask questions about the provided context",
)

# Launch the Gradio app
iface.launch()


Running on local URL:  http://127.0.0.1:7865

To create a public link, set `share=True` in `launch()`.




Question passed to retriever: did the aliens eventually make it to Venus?
Question passed to LLM: text="Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\npoint of view of an observer on Venus. Subsequently a peculiar luminous and sinuous \nmark- ing appeared on the unillumined half of the inner planet, and almost simultaneously a faint dark mark of a si milar sinuous character was detected upon a \nphotograph of the Martian disk. One needs to see the drawings of these ap- pearances in \norder to appreciate fully their remark able resemblance in character.  \nAt any rate, whether we expect another inva sion or not, our views of the human future\n\nLessing has advanced excellent reasons fo r supposing that the Martians have actually \nsucceeded in effecting a landing on the planet  Venus. Seven months ago now, Venus and \nMars were in alignment with the sun; that  is to say

Traceback (most recent call last):
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/Users/jerhenry/Documents/Workdoc/f-Virtual_Machines/Environments/Pytorch/lib/python3.9/site-packages/anyio/to_thread.