## Ingesting PDF

In [1]:
%pip install -q unstructured langchain langchain-community
%pip install -q "unstructured[all-docs]"

Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [5]:
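from pathlib import Path

local_path = "repair_manual.pdf"

# Load the local PDF if it exists; otherwise ask for an upload
if Path(local_path).is_file():
  loader = UnstructuredPDFLoader(file_path=local_path)
  data = loader.load()
else:
  print(f"PDF not found at {local_path!r}; upload a PDF file first")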

In [6]:
# Preview the extracted text (by default the loader returns the whole PDF as a single document)
data[0].page_content

"MRC Factory Equipment Repair Manual\n\nEquipment 1: Laptop Problem 1: Slow Performance\n\nSteps to fix this problem:\n\n1. Close Unnecessary Programs: Open Task Manager (Ctrl + Shift + Esc) and close\n\nprograms that are consuming high resources.\n\n2. Disk Cleanup: Run Disk Cleanup (search for it in the Start menu) to remove temporary\n\nfiles and system cache.\n\n3. Update System and Drivers: Go to Settings > Update & Security > Windows Update to\n\ncheck for updates, and update all drivers through Device Manager.\n\nProblem 2: Overheating\n\nSteps to fix this problem:\n\n1. Clean the Vents: Use compressed air to clean the dust from the laptop's vents and fans. 2. Check for Background Processes: Close unnecessary background processes using Task Manager.\n\n3. Use a Cooling Pad: Place the laptop on a cooling pad to help dissipate heat.\n\nProblem 3: Battery Draining Quickly\n\nSteps to fix this problem:\n\n1. Reduce Screen Brightness: Lower the screen brightness through the settings 

## Vector Embeddings

In [7]:
!ollama pull nomic-embed-text

pulling manifest
pulling 970aa74c0a90... 100% ▕████████████████▏ 274 MB                         
pulling c71d239df917... 100% ▕████████████████▏  11 KB                         
pulling ce4a164fc046... 100% ▕████████████████▏   17 B                         
pulling 31df23ea7daa... 100% ▕████████████████▏  420 B                   

In [8]:
!ollama list

NAME                   	ID          	SIZE  	MODIFIED      
mistral:latest         	2ae6f6dd7a3d	4.1 GB	36 hours ago 	
nomic-embed-text:latest	0a109f422b47	274 MB	3 seconds ago	
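
Before embedding, it can be useful to confirm programmatically that the embedding model was pulled. A minimal sketch, assuming the tabular layout shown above (a header row, then one model per line with the name in the first column). `installed_models` is a hypothetical helper, and the sample text is copied from the listing rather than read from the CLI:

```python
def installed_models(listing: str) -> list[str]:
    """Return model names from `ollama list`-style text (first column, header skipped)."""
    names = []
    for line in listing.splitlines()[1:]:  # skip the NAME/ID/SIZE/MODIFIED header
        fields = line.split()
        if fields:
            names.append(fields[0])
    return names


# Sample text copied from the listing above
sample = (
    "NAME                   \tID          \tSIZE  \tMODIFIED      \n"
    "mistral:latest         \t2ae6f6dd7a3d\t4.1 GB\t36 hours ago \t\n"
    "nomic-embed-text:latest\t0a109f422b47\t274 MB\t3 seconds ago\t\n"
)

models = installed_models(sample)
print("nomic-embed-text:latest" in models)
```

In a notebook, the text could instead come from `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`.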


In [9]:
%pip install -q chromadb
%pip install -q langchain-text-splitters

Note: you may need to restart the kernel to use updated packages.


In [10]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [11]:
# Split and chunk 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
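
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraph and line boundaries, which this sketch does not attempt. `sliding_chunks` is a hypothetical name.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print(len(sliding_chunks("short manual text", chunk_size=7500, chunk_overlap=100)))
# 1
```

A text shorter than `chunk_size` comes back as one chunk, which is likely why a short manual split with `chunk_size=7500` yields a single chunk. A smaller `chunk_size` would give the retriever more, finer-grained pieces to choose from.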

In [12]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
)

OllamaEmbeddings: 100%|██████████| 1/1 [00:03<00:00,  3.66s/it]
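
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the stored chunks whose embeddings are closest to the query embedding. The toy sketch below uses hand-made 3-dimensional vectors and cosine similarity purely for illustration. Real `nomic-embed-text` embeddings have many more dimensions, and the distance metric Chroma actually uses depends on its configuration. `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return up to k (text, score) pairs, best match first."""
    k = min(k, len(store))  # cannot return more results than stored chunks
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up embeddings for three chunks (assumption, not real model output)
store = [
    ("Slow performance: close unnecessary programs", [0.9, 0.1, 0.0]),
    ("Overheating: clean the vents", [0.1, 0.9, 0.0]),
    ("Battery draining: reduce screen brightness", [0.0, 0.2, 0.9]),
]

query = [0.8, 0.2, 0.1]  # pretend embedding of "laptop is slow"
for text, score in top_k(query, store, k=2):
    print(f"{score:.3f}  {text}")
```

The `min(k, len(store))` clamp mirrors the situation where a store holds fewer chunks than the number of results requested.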


## Retrieval

In [13]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [14]:
# LLM from Ollama
local_model = "mistral"
llm = ChatOllama(model=local_model)

In [15]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)
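
The prompt asks for the alternatives "separated by newlines", so the model's reply has to be split back into individual queries before each one is sent to the vector store. A minimal sketch of that step, assuming the reply is plain text with one question per line. `parse_questions` and the sample reply are hypothetical, not output from `mistral`:

```python
def parse_questions(reply: str) -> list[str]:
    """Split a newline-separated reply into non-empty, stripped questions."""
    return [line.strip() for line in reply.splitlines() if line.strip()]


# Invented reply for illustration
reply = """How can I speed up a slow laptop?

What steps fix poor laptop performance?
  Why is my laptop running slowly and how do I repair it?
"""

for question in parse_questions(reply):
    print(question)
```

Dropping blank lines and stray indentation keeps empty strings from being embedded as queries.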

In [16]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
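
The idea behind the multi-query retriever built above can be sketched without any library: run every query variant against the store and keep the union of the results with duplicates removed. This illustrates the concept and is not `MultiQueryRetriever`'s source. `fake_search` stands in for the vector store, and its contents are invented.

```python
def unique_union(result_lists):
    """Merge several result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def fake_search(query):
    # Hypothetical stand-in for a vector-store lookup
    index = {
        "slow laptop": ["chunk-A", "chunk-B"],
        "laptop performance fix": ["chunk-B", "chunk-C"],
        "speed up computer": ["chunk-A"],
    }
    return index.get(query, [])


queries = ["slow laptop", "laptop performance fix", "speed up computer"]
docs = unique_union(fake_search(q) for q in queries)
print(docs)  # ['chunk-A', 'chunk-B', 'chunk-C']
```

Different phrasings can surface different chunks, and the union gives the answering step a broader context than a single query would.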

In [17]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
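
In the chain above, the dictionary fans the input question out to two places: the retriever fills `context`, and `RunnablePassthrough()` forwards the question unchanged. The prompt template then combines both before the model is called. Below is a library-free sketch of that data flow, where `retrieve` and `fake_llm` are hypothetical stand-ins for the retriever and `ChatOllama`, and the way documents are rendered into the prompt is simplified:

```python
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def retrieve(question):
    # Hypothetical stand-in for the retriever
    return ["Close Unnecessary Programs: Open Task Manager and close heavy programs."]


def fake_llm(prompt_text):
    # Hypothetical stand-in for the model; it only reports what it received
    return f"(model saw {len(prompt_text)} characters)"


def run_chain(question):
    # Step 1: build the input mapping (context from retrieval, question passed through)
    inputs = {"context": "\n\n".join(retrieve(question)), "question": question}
    # Step 2: fill the prompt template
    prompt_text = TEMPLATE.format(**inputs)
    # Step 3: call the model and return its text
    return fake_llm(prompt_text)


print(run_chain("My laptop is slow, how do I fix it?"))
```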

In [26]:
# Ask a question interactively
chain.invoke(input("Question: "))

OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  2.36it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 13.56it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.18it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  9.79it/s]
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  6.38it/s]


' This document appears to be a report titled "Navigating Global Divergences" by the McKinsey Global Institute (MGI), a business and economics research arm of the consulting firm McKinsey & Company. The report discusses challenges faced by the global economy, including economic fragmentation, rising costs associated with it, and potential solutions to these issues. It highlights the role of generative artificial intelligence (AI) in driving future productivity growth as well as the need for sustainable, inclusive growth. The document also mentions various other topics such as climate change, violent conflicts, trade, geopolitics, and the impact of global value chains on economies. The report does not provide specific recommendations or solutions to address these challenges but rather presents data, insights, and perspectives on the current state of the global economy.'

In [18]:
chain.invoke("My laptop is having slow performance how to fix?")

OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  1.29it/s]
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00, 10.28it/s]
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  4.63it/s]
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  8.11it/s]
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1
OllamaEmbeddings: 100%|██████████| 1/1 [00:00<00:00,  5.61it/s]
Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


' To fix slow performance on your laptop, follow these steps based on the provided repair manual:\n\n1. Close Unnecessary Programs: Open Task Manager (Ctrl + Shift + Esc) and close programs that are consuming high resources.\n2. Disk Cleanup: Run Disk Cleanup (search for it in the Start menu) to remove temporary files and free up space on your system.\n3. Update Software: Make sure all software is up-to-date, including your operating system and any installed applications. This can help improve performance and stability.\n4. Consider Additional Solutions: If the issue persists after trying these steps, you may need to consider additional solutions such as adding more RAM or using a solid-state drive (SSD) instead of a traditional hard drive (HDD). In some cases, it might be necessary to reinstall your operating system or even consider upgrading your laptop hardware.'

In [None]:
# Delete the "local-rag" collection from the vector store
vector_db.delete_collection()