## Ingesting PDF

In [None]:
#%pip install --q unstructured langchain
#%pip install --q "unstructured[all-docs]"

In [1]:
from langchain_community.document_loaders import PyPDFLoader
#%pip install PyPDF
#%pip install pdfminer.six
#%pip install pillow_heif
#%pip install opencv-python
#%pip install pdf2image
#%pip install unstructured_inference
#%pip install pytesseract

In [2]:
local_path = "WEF_The_Global_Cooperation_Barometer_2024.pdf"

# Load the local PDF and split it into pages
if local_path:
    loader = PyPDFLoader(local_path)
    pages = loader.load_and_split()
else:
    print("Upload a PDF file")

In [3]:
# Preview first page
pages[0].page_content

'The Global Cooperation \nBarometer 2024\nINSIGHT REPORT\nJANUARY 2024In collaboration with \nMcKinsey & Company'

## Vector Embeddings

In [4]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% 274 MB
pulling c71d239df917... 100%  11 KB
pulling ce4a164fc046... 100%   17 B
pulling 31df23ea7daa... 100%  420 B
verifying sha256 digest

In [5]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED      
nomic-embed-text:latest	0a109f422b47	274 MB	1 second ago 	
mistral:instruct       	2ae6f6dd7a3d	4.1 GB	7 minutes ago	


In [6]:
#%pip install --q chromadb
#%pip install --q langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [7]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [8]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(pages)
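
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap on a toy string. It is a simplified illustration only: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and newlines, which this sketch does not do. The names `chunk_text`, `sample` and `toy_chunks` are hypothetical and not part of the notebook.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between window starts
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows


sample = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(toy_chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, which is the role the 100-character overlap plays in the cell above.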

In [9]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|████████████████████████████████████████████████████████████████| 30/30 [01:11<00:00,  2.37s/it]


## Retrieval

In [10]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [36]:
#!ollama pull mistral:instruct

In [11]:
# LLM from Ollama
local_model = "mistral:instruct"
llm = ChatOllama(model=local_model)

In [12]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
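
The prompt above asks the model to return its alternative questions separated by newlines, so whatever consumes the reply has to split it back into individual queries. The sketch below is a simplified stand-in for that step, not LangChain's own parser. `split_queries` and `fake_reply` are hypothetical names, and the reply text is invented for illustration.

```python
def split_queries(llm_reply: str) -> list[str]:
    """Split a newline-separated reply into clean, non-empty query strings."""
    lines = (line.strip() for line in llm_reply.splitlines())
    return [line for line in lines if line]


# Invented reply, including a blank line and stray indentation.
fake_reply = """What is the main topic of this document?

What subject does the report cover?
  Which themes does the document address?
"""

variants = split_queries(fake_reply)
for query in variants:
    print(query)
```

Blank lines and leading whitespace are dropped, so each remaining string can be sent to the vector store as its own search.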

In [13]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
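
Because the multi-query retriever runs one search per rephrased question, the same chunk can be returned several times. `MultiQueryRetriever` is documented as returning the unique union of the results. The sketch below shows that idea on plain strings rather than `Document` objects. `unique_union` and `per_query_hits` are hypothetical names, and the data is made up.

```python
def unique_union(result_lists: list[list[str]]) -> list[str]:
    """Merge several result lists, keeping the first occurrence of each item."""
    seen = set()
    merged = []
    for results in result_lists:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# One result list per generated query (invented data).
per_query_hits = [
    ["chunk A", "chunk B"],
    ["chunk B", "chunk C"],
    ["chunk A"],
]
merged_hits = unique_union(per_query_hits)
print(merged_hits)
# ['chunk A', 'chunk B', 'chunk C']
```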

In [14]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
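
In the chain above, whatever the retriever returns is substituted into `{context}`, so the prompt receives a list of document objects rather than plain text. A common pattern is to join the chunks' text before it reaches the prompt. The sketch below shows such a helper using a stand-in class so that it runs without Ollama. `FakeDoc`, `format_docs` and `demo_docs` are hypothetical names. Only the `page_content` attribute mirrors what the notebook uses.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document; only page_content matters here."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


demo_docs = [FakeDoc("First chunk."), FakeDoc("Second chunk.")]
context_text = format_docs(demo_docs)
print(context_text)
```

In the real chain, one option (an assumption, not run here) would be to use `retriever | format_docs` as the `"context"` entry so the prompt sees only the joined text.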

In [15]:
chain.invoke(input(""))

what is this document about ?


OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.71s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.05s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.09s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.09s/it]


' The document appears to be a report titled "The Global Cooperation Barometer 2024" published by the World Economic Forum (WEF). It discusses the importance of global cooperation in light of changing trade dynamics, increasing geopolitical competition, and disparities in global integration. The report offers suggestions for both public and private sector leaders on how they can contribute to fostering global cooperation. These suggestions include practicing "coopetition," using cooperation to build trust, strengthening management capabilities, evaluating board expertise and engagement, building dynamic strategic options, and considering diversification instead of decoupling. The aim is to promote growth, diversity, resilience, improve domestic economies, and ensure that vulnerable people are not left behind.'

In [16]:
chain.invoke("What are the 5 pillars of global cooperation?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.56s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.07s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.07s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.10s/it]


' The five pillars of global cooperation as mentioned in the document are trade and capital, innovation and technology, climate and natural capital, health and wellness, and peace and security.'

In [17]:
# Delete the "local-rag" collection from the vector database
vector_db.delete_collection()

# MONOPOLY

In [18]:
local_path = "monopoly.pdf"

# Load the local PDF and split it into pages
if local_path:
    loader = PyPDFLoader(local_path)
    pages = loader.load_and_split()
else:
    print("Upload a PDF file")

In [19]:
# Preview first page
pages[0].page_content

'MONOPOLY \nProperty Trading Game from Parker Brothers" \nAGES 8+ \n2 to 8 Players \nContents: Gameboard, 3 dice, tokens, 32 houses, I2 hotels, Chance \nand Community Chest cards, Title Deed cards, play money and a Banker\'s tray. \nNow there\'s a faster way to play MONOPOLY. Choose to play by \nthe classic rules for buying, renting and selling properties or use the \nSpeed Die to get into the action faster. If you\'ve never played the classic \nMONOPOLY game, refer to the Classic Rules beginning on the next page. \nIf you already know how to play and want to use the Speed Die, just \nread the section below for the additional Speed Die rules. \nSPEED DIE RULES \nLearnins how to Play with the S~eed Die IS as \n/ \nfast as playing with i\'t. \n1. When starting the game, hand out an extra $1,000 to each player \n(two $5005 should work). The game moves fast and you\'ll need \nthe extra cash to buy and build. \n2. Do not use the Speed Die until you\'ve landed on or passed over \nGO for the 

In [20]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(pages)

vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 8/8 [00:21<00:00,  2.63s/it]


In [22]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [23]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [24]:
chain.invoke("How many players can play this game ?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.79s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.06s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.09s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.12s/it]


' The Monopoly game can be played by 2 to 8 players.'

In [26]:
chain.invoke("How do i build a hotel in MONOPOLY ?")

OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:05<00:00,  5.94s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.07s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.05s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.10s/it]
OllamaEmbeddings: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.12s/it]


" In Monopoly, you can build hotels on properties if you meet certain conditions. Here are the steps to build a hotel:\n\n1. First, you need to own all of the properties of a color group (e.g., all Atlantic Avenue properties or all St. Charles Place properties).\n\n2. Next, you must have at least one house on each property of that color group. The houses should be fully occupied, meaning no player can stay there for free because they landed on a space with a hotel but don't have enough money to pay the rent.\n\n3. Once these conditions are met, you can sell the houses on that color group back to the bank at half their purchase price.\n\n4. Now, you have the opportunity to buy a hotel from the Bank. Each hotel costs double the price of the last house you bought on that same color group. If there were no houses on the color group, the hotel costs $200.\n\n5. After buying the hotel(s), place them on your properties in any order you choose. Hotels provide higher rent than houses, so other 

In [27]:
# Delete the "local-rag" collection from the vector database
vector_db.delete_collection()