# A basic RAG pipeline using open-source LLMs via Ollama and LangChain

Requirements:
langchain
langchain_community
langchain[docarray]
docarray
python-dotenv

## Load Model

In [30]:
import os
from dotenv import load_dotenv

load_dotenv()

MODEL = "mistral"  # Using the Mistral model, but you can use any model that you load via Ollama


## Test that it works

In [47]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

model.invoke("Who are you?")

' I am a large language model trained by Mistral AI.'

## Include parser to format the response correctly

In [50]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
chain.invoke("What is the color of the sky?")

" The color of the sky can appear to be different at various times and in different locations due to factors such as sunlight, pollution, and weather conditions. However, under clear conditions on a sunny day, people typically describe the sky as blue. This is because during the day, the sky appears blue primarily due to a process called Rayleigh scattering, where shorter wavelengths of light (blue and violet) are scattered more than other colors due to the size of the molecules in Earth's atmosphere. The human eye is more sensitive to blue light, and we perceive the sky as blue rather than violet because our vision is slightly color-deficient to violet."

## Create the prompt that tells the model to use the retrieved information first

In [33]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
prompt.format(context="Here is some context", question="Here is a question")

'\nAnswer the question based on the context below. If you can\'t \nanswer the question, reply "I don\'t know".\n\nContext: Here is some context\n\nQuestion: Here is a question\n'
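
The formatted string above is the template with its two placeholders filled in. As a rough illustration of that substitution, here is a minimal sketch using only Python's built-in `str.format`. It is not how `PromptTemplate` is implemented, only the same idea without LangChain.

```python
# Same template text as in the cell above
template = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

# Fill both placeholders, as prompt.format(...) does in the notebook
filled = template.format(context="Here is some context", question="Here is a question")
print(filled)

# Leaving a placeholder out fails loudly instead of producing a broken prompt
try:
    template.format(context="Here is some context")  # question is missing
except KeyError as missing:
    print(f"Missing template variable: {missing}")
```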

## Create the chain so the prompt goes into the model and the model response goes into the parser

In [35]:
chain = prompt | model | parser 

chain.invoke({"context": "Jose is 23 years old", "question": "How old is James?"})

" I don't know. The text provided does not give any information about James' age."

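To make the `prompt | model | parser` syntax less magical, here is a minimal sketch of how a pipe operator can chain steps. The `Step` class and the three toy stages are hypothetical stand-ins written for this illustration; they are not LangChain's actual classes, and the fake model only echoes its input.

```python
class Step:
    """Toy stand-in for a chain component; not LangChain's Runnable."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring prompt | model | parser
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
toy_model = Step(lambda text: {"content": f" echo: {text}"})  # fake model, just echoes
toy_parser = Step(lambda message: message["content"].strip())

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke(
    {"context": "Jose is 23 years old", "question": "How old is James?"}
)
print(result)
```

Each stage only needs to accept the previous stage's output, which is why the prompt takes a dict, the model takes text, and the parser takes the model's message.
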
## Option A: Load a PDF document

In [49]:
# from langchain_community.document_loaders import PyPDFLoader

# loader = PyPDFLoader("text.pdf")  # Path to the PDF file on your computer
# pages = loader.load_and_split()
# pages

[Document(metadata={'source': '/Users/josemarimon/Desktop/RAG/RAG 2/Scripts/constitution.pdf', 'page': 0}, page_content='NATIONAL  CONSTITUTION  CENTER   \n   \n \n \n \n \n  \n \nTHE  \nCONSTITUTION  \nof the United  States'),
 Document(metadata={'source': '/Users/josemarimon/Desktop/RAG/RAG 2/Scripts/constitution.pdf', 'page': 1}, page_content='C O N S T I T U T I O N O F T H E U N I T E D S T A T E S   \n \n \n \nWe the People of the United States, in Order to form a \nmore perfect Union, establish Justice, insure domestic \nTranquility, provide for the common defence, promote \nthe general  Welfare, and secure the Blessings of Liberty to \nourselves  and our Posterity,  do ordain  and establish  this \nConstitution for the United States of America  \n \n \nArticle.   I. \nSECTION.  1 \nAll legislative Powers herein granted shall be vested in a \nCongress of the United States, which shall consist of a Sen-  \nate and House of Representatives. \nSECTI ON. 2 \nThe House of Representat

## Option B: Scrape a Website

In [36]:
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)

loader = WebBaseLoader("https://www.imdb.com/title/tt15239678/reviews")  # URL of the website you want to scrape
pages = loader.load_and_split(text_splitter)
pages

[Document(metadata={'source': 'https://www.imdb.com/title/tt15239678/reviews', 'title': 'Dune: Part Two (2024) - Dune: Part Two (2024) - User Reviews - IMDb', 'description': 'Dune: Part Two (2024) on IMDb: Movies, TV, Celebs, and more...', 'language': 'No language found.'}, page_content='Dune: Part Two (2024) - Dune: Part Two (2024) - User Reviews - IMDb'),
 Document(metadata={'source': 'https://www.imdb.com/title/tt15239678/reviews', 'title': 'Dune: Part Two (2024) - Dune: Part Two (2024) - User Reviews - IMDb', 'description': 'Dune: Part Two (2024) on IMDb: Movies, TV, Celebs, and more...', 'language': 'No language found.'}, page_content="MenuMoviesRelease CalendarTop 250 MoviesMost Popular MoviesBrowse Movies by GenreTop Box OfficeShowtimes & TicketsMovie NewsIndia Movie SpotlightTV ShowsWhat's on TV & StreamingTop 250 TV ShowsMost Popular TV ShowsBrowse TV Shows by GenreTV NewsWatchWhat to WatchLatest TrailersIMDb OriginalsIMDb PicksIMDb SpotlightIMDb PodcastsAwards & EventsOscarsE
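
To get a feel for what `chunk_size=1000` and `chunk_overlap=20` mean, here is a simplified sketch of fixed-size chunking with overlap. It is a plain sliding window written for illustration: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and lines, which this sketch does not do. The function name and the sample text are made up.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window starts after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # synthetic 2,500-character text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
print(chunks[0][-20:] == chunks[1][:20])  # neighbouring chunks share 20 characters
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbours, at the cost of storing a few characters twice.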

## Create a vector store from the information retrieved from the documents

In [42]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)
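
Conceptually, an in-memory vector store keeps one embedding per chunk and returns the chunks whose embeddings are closest to the query embedding. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the metric. The three-dimensional vectors and the texts are invented for the example; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the texts of the k entries most similar to query_vector."""
    scored = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Hypothetical (text, embedding) pairs standing in for embedded document chunks
toy_store = [
    ("review of the film", [0.9, 0.1, 0.0]),
    ("text of an amendment", [0.0, 0.2, 0.9]),
    ("cast and crew list", [0.7, 0.6, 0.1]),
]

hits = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print(hits)
```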

## Turn the vector store into a retriever that LangChain can use

In [43]:
retriever = vectorstore.as_retriever()
retriever.invoke("Text")

[Document(metadata={'source': '/Users/josemarimon/Desktop/RAG/RAG 2/Scripts/constitution.pdf', 'page': 0}, page_content='NATIONAL  CONSTITUTION  CENTER   \n   \n \n \n \n \n  \n \nTHE  \nCONSTITUTION  \nof the United  States'),
 Document(metadata={'source': '/Users/josemarimon/Desktop/RAG/RAG 2/Scripts/constitution.pdf', 'page': 10}, page_content='C O N S T I T U T I O N O F T H E U N I T E D S T A T E S  DELAWARE  \nGeo:  Read \nGunning Bedford  jun \nJohn Dickinson \nRichard Bassett  \nJaco: Broom  \nMARYLAND  \nJames  McHenry  \nDan of St. Thos.  Jenifer \nDanl Carroll  \nVIRGINIA  \nJohn Blair - \nJames  Madison  Jr. \nNORTH  CAROLINA  \nWm.  Blount  \nRichd.  Dobbs  Spaight \nHu Williamson  \nSOUTH  CAROLINA  \nJ. Rutledge  \nCharles  Cotesworth  Pinckney \nCharles Pinckney  \nPierce Butler  \nGEORGIA  \nWilliam  Few \nAbr Baldwin  \n \nAttest  William  Jackson Secretary  In Convention Monday \nSeptember 17th, 1787. \nPresent  \nThe States  of \nNew  Hampshire,  Massachusetts,  Co

## Create the chain with the retriever

In [44]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
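
The dictionary at the start of the chain takes the single input `{"question": ...}` and produces the two values the prompt needs. Here is a minimal sketch of that step in plain Python. `toy_retriever` and `build_prompt_inputs` are hypothetical names, and the retriever returns a canned snippet instead of searching a vector store.

```python
from operator import itemgetter


def toy_retriever(query):
    # Hypothetical retriever: returns a canned snippet instead of searching a vector store
    return [f"snippet about: {query}"]


def build_prompt_inputs(inputs):
    """Mimic the dict step: both keys are computed from the same input dict."""
    question = itemgetter("question")(inputs)
    return {
        "context": toy_retriever(question),  # itemgetter("question") | retriever
        "question": question,                # itemgetter("question")
    }


prompt_inputs = build_prompt_inputs({"question": "Did Ellimof like the movie?"})
print(prompt_inputs)
```

The question is used twice: once as the search query for the retriever and once as the question text in the prompt.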

## Test some questions

In [45]:
questions = [
    "What is the summary of Dune 2 reviews?",
    "Did Ellimof like the movie?",
    "What is the best time to travel to Japan?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: What is the summary of Dune 2 reviews?
Answer:  I'm sorry for any confusion, but it seems there might be a mistake in your question as you've provided a summary of U.S. Constitution Amendments instead of a summary of Dune 2 reviews. The provided text details the 16th to 19th amendments of the United States Constitution, their passage dates, ratification dates, and some notes about modifications made to various sections of the Constitution.

If you'd like more information about these amendments, feel free to ask! If you were looking for a summary of Dune 2 reviews, I would need access to those reviews first to be able to summarize them.

Question: Did Ellimof like the movie?
Answer:  The text provided does not contain any information about Ellimof or a movie he might have watched, so it is impossible to determine whether he liked the movie or not based on this data.

Question: What is the best time to travel to Japan?
Answer:  The provided text does not contain information abo

In [46]:
questions = [
    "I have throat pain, what ilness I may have?",
    "What is the best plan for 5 days in Kyoto?",
    "When was Amendment XXVI done?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: I have throat pain, what ilness I may have?
Answer:  Based on the information you've provided, I don't have enough context to determine the cause of your throat pain. It could be due to a variety of reasons such as viral infections (common cold or flu), bacterial infections like strep throat, allergies, acid reflux, or even more serious conditions. Please consult a healthcare professional for an accurate diagnosis and treatment options.

Question: What is the best plan for 5 days in Kyoto?
Answer:  To create a 5-day itinerary for Kyoto, I'll suggest visiting key historical sites, temples, gardens, and traditional neighborhoods to immerse yourself in Japan's rich culture. Here's a sample itinerary:

Day 1:
- Fushimi Inari Shrine (Open 24 hours) - Explore the iconic Senbon Torii Gates.
- Kiyomizu-dera Temple (8:30 AM - 6 PM) - Enjoy the stunning view of Kyoto from this historic temple.

Day 2:
- Arashiyama Bamboo Grove (Open 24 hours) - Stroll through the enchanting bamboo fore