**From PDFs to Powerful Answers**

What if a language model could not only read a dense research paper but also answer your questions about it, intelligently and in context? This project demonstrates that by combining IBM Watsonx LLMs with LangChain to build a retrieval-augmented generation (RAG) system that splits complex documents into chunks, indexes them, and lets you search and explore them in natural language.


**MAIN STRUCTURE**

In [None]:
# Suppress library warnings so the notebook output does not expose details of my Watsonx setup
import warnings
warnings.filterwarnings("ignore")

**1.0 | PDF Ingestion**

This task fetches and loads a PDF into LangChain for downstream text analysis.

**Ingesting Academic Papers from the Web using LangChain**

*Download and load a PDF into a LangChain document object for structured processing.*

In [3]:
import requests
from langchain.document_loaders import PyPDFLoader

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WgM1DaUn2SYPcCg_It57tA/A-Comprehensive-Review-of-Low-Rank-Adaptation-in-Large-Language-Models-for-Efficient-Parameter-Tuning-1.pdf"
pdf_path = "lora_review.pdf"

response = requests.get(url, timeout=60)
response.raise_for_status()  # fail early instead of saving an HTTP error page as a PDF
with open(pdf_path, "wb") as f:
    f.write(response.content)

loader = PyPDFLoader(pdf_path)
documents = loader.load()

print(documents[0].page_content[:1000])


A Comprehensive Review of Low-Rank
Adaptation in Large Language Models for
Efficient Parameter Tuning
September 10, 2024
Abstract
Natural Language Processing (NLP) often involves pre-training large
models on extensive datasets and then adapting them for specific tasks
through fine-tuning. However, as these models grow larger, like GPT-3
with 175 billion parameters, fully fine-tuning them becomes computa-
tionally expensive. We propose a novel method called LoRA (Low-Rank
Adaptation) that significantly reduces the overhead by freezing the orig-
inal model weights and only training small rank decomposition matrices.
This leads to up to 10,000 times fewer trainable parameters and reduces
GPU memory usage by three times. LoRA not only maintains but some-
times surpasses fine-tuning performance on models like RoBERTa, De-
BERTa, GPT-2, and GPT-3. Unlike other methods, LoRA introduces
no extra latency during inference, making it more efficient for practical
applications. All relevant code an
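
The download step above writes whatever bytes the server returns straight to disk. As a minimal sketch, a check like the following could confirm that the payload looks like a PDF before it reaches the loader. The helper name `looks_like_pdf` is hypothetical (it is not part of LangChain or `requests`), and the sample payloads are made up for illustration; the check relies only on the fact that well-formed PDF files begin with the `%PDF-` signature.

```python
PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF file signature."""
    return data.startswith(PDF_SIGNATURE)


# Illustrative payloads (invented for this sketch, not taken from the real download)
good_payload = b"%PDF-1.7\n..."
bad_payload = b"<html><body>404 Not Found</body></html>"

print(looks_like_pdf(good_payload))
print(looks_like_pdf(bad_payload))
```

In the notebook, such a check would sit between `requests.get(...)` and the `open(..., "wb")` call, applied to `response.content`.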

**2.0 | Chunking Raw LaTeX with LangChain**

This task simulates real-world preprocessing by chunking LaTeX content into manageable sections.

**Chunking Raw LaTeX Content for LLM Readability**

*Transforming LaTeX text into clean document chunks using recursive splitting.*

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.schema import Document

latex_text = """
\\documentclass{article}
\\begin{document}
\\maketitle
\\section{Introduction}
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in various natural language processing tasks, including language translation, text generation, and sentiment analysis.

\\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.

\\subsection{Applications of LLMs}
LLMs have many applications in the industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, psychology, and computational linguistics.
\\end{document}
"""

# Wrap LaTeX as a LangChain Document
doc = Document(page_content=latex_text)

# Initialize splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)

# Perform the split
split_docs = splitter.split_documents([doc])

# Display results
for i, d in enumerate(split_docs):
    print(f"\n--- Chunk {i+1} ---\n{d.page_content}")



--- Chunk 1 ---
\documentclass{article}
\begin{document}
\maketitle
\section{Introduction}
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in various natural language processing tasks, including language translation, text generation, and sentiment analysis.

--- Chunk 2 ---
\subsection{History of LLMs}
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.

--- Chunk 3 ---
\subsection{Applications of LLMs}
LLMs have many applications in the industry, including chatbots, content creation, and virtual assistants. They can also be used in academia fo
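
In the output above each chunk starts at a section boundary, most likely because the recursive splitter tries paragraph breaks before falling back to smaller separators, so the effect of `chunk_overlap` is not visible. To show what `chunk_size` and `chunk_overlap` mean in isolation, here is a simplified fixed-window splitter. It is a sketch under stated assumptions, not LangChain's algorithm: the function name `split_with_overlap` is hypothetical, and it ignores separators entirely.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
for i, chunk in enumerate(split_with_overlap(sample, chunk_size=4, chunk_overlap=1), 1):
    print(i, chunk)
```

Each window repeats the last `chunk_overlap` characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.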

**3.0 | Generating Embeddings with HuggingFace**

This task demonstrates how to convert queries into dense vector embeddings.

**Embedding Natural Language Queries Using HuggingFace Transformers**

*Uses the all-MiniLM-L6-v2 model to convert input text into a 384-dimensional dense embedding.*

In [5]:
from langchain.embeddings import HuggingFaceEmbeddings

# Load pre-trained embedding model (can replace with any local or HF model you prefer)
embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

query = "How are you?"

embedding_result = embedder.embed_query(query)

print("First 5 embedding values:", embedding_result[:5])


First 5 embedding values: [0.007003897335380316, 0.010914130136370659, 0.08746254444122314, 0.086799256503582, 0.02664852701127529]
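
Embedding values like those above only become useful when two vectors are compared. A common measure is cosine similarity; the sketch below implements it in plain Python. The function name and the toy three-dimensional vectors are assumptions made for illustration, standing in for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings
query_vec = [0.1, 0.3, 0.5]
similar_vec = [0.2, 0.6, 1.0]    # same direction as query_vec, different length
unrelated_vec = [0.5, -0.5, 0.2]

print(cosine_similarity(query_vec, similar_vec))
print(cosine_similarity(query_vec, unrelated_vec))
```

With real data, the two arguments would be the lists returned by `embedder.embed_query(...)` for two different texts.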


**3.5 | Secure Credential Handling with `getpass()`**

This task securely captures sensitive IBM Watsonx credentials and loads them into environment variables for API access.

**Protecting API Credentials with getpass() for IBM Watsonx**

*Use Python’s `getpass()` to safely manage API keys and project configs without exposing them in source code.*

In [14]:
import os
from getpass import getpass

api_key = getpass("Enter your Watsonx API Key: ")
project_id = getpass("Enter your Watsonx Project ID: ")
w_url = getpass("Enter your Watsonx URL: ")

os.environ["WATSONX_APIKEY"] = api_key
os.environ["WATSONX_PROJECT_ID"] = project_id
os.environ["WATSONX_URL"] = w_url
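
Re-running the cell above prompts for all three values every time. One possible refinement, sketched here under assumptions, is to prompt only when a variable is not already set. The helper name `ensure_env_secret`, its `prompt_fn` parameter, and the `DEMO_SECRET` variable are hypothetical; the demonstration passes a stand-in prompt so it runs without user input.

```python
import os
from getpass import getpass


def ensure_env_secret(name, prompt_fn=getpass):
    """Return os.environ[name], prompting for it only when it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        value = prompt_fn(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Demonstration with a stand-in prompt; in a notebook, omit prompt_fn to use getpass.
os.environ.pop("DEMO_SECRET", None)
first = ensure_env_secret("DEMO_SECRET", prompt_fn=lambda _: "example-value")
second = ensure_env_secret("DEMO_SECRET", prompt_fn=lambda _: "ignored")
print(first == second)
```

In the notebook this would be called once per variable, for example `ensure_env_secret("WATSONX_APIKEY")`.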


**4.0 | Vectorizing Policies with Watsonx Embeddings**

This task downloads a text policy file, chunks it, embeds it using IBM Watsonx, and enables semantic search with LangChain and Chroma.

**Embedding and Searching Policy Texts with IBM Watsonx + Chroma**

*Download policy documents, split them for retrieval, embed with Watsonx, and perform similarity search queries.*

In [None]:
import os
import requests
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ibm.embeddings import WatsonxEmbeddings
from langchain.vectorstores import Chroma

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/Ec5f3KYU1CpbKRp1whFLZw/new-Policies.txt"
local_path = "new-Policies.txt"
# Download the policy file only if it is not already cached locally
if not os.path.exists(local_path):
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # fail early on an HTTP error
    with open(local_path, "w", encoding="utf-8") as f:
        f.write(response.text)

loader = TextLoader(local_path)
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = splitter.split_documents(docs)

embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=os.environ["WATSONX_URL"],
    apikey=os.environ["WATSONX_APIKEY"],
    project_id=os.environ["WATSONX_PROJECT_ID"]
)

vector_db = Chroma.from_documents(split_docs, embedding, persist_directory="./vectorstore")

query = "Smoking policy"
results = vector_db.similarity_search(query, k=5)

for i, doc in enumerate(results, 1):
    print(f"\nResult {i}:\n{doc.page_content[:500]}")  # Print only first 500 chars per result



Result 1:
This policy encourages the responsible use of mobile devices in line with legal and ethical standards. Employees are expected to understand and follow these guidelines. The policy is regularly reviewed to stay current with evolving technology and security best practices.

Result 2:
This policy encourages the responsible use of mobile devices in line with legal and ethical standards. Employees are expected to understand and follow these guidelines. The policy is regularly reviewed to stay current with evolving technology and security best practices.

Result 3:
Consequences: Violations of this policy may lead to disciplinary action, including potential termination.

This policy promotes the safe and responsible use of digital communication tools in line with our values and legal obligations. Employees must understand and comply with this policy. Regular reviews will ensure it remains relevant with changing technology and security standards.

4. Mobile Phone Policy

Result 4:
C
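
Results 1 and 2 above are identical. One plausible cause, offered as an assumption rather than a diagnosis, is that the cell was run more than once against the same `persist_directory`, so the same chunks were stored repeatedly. Whatever the cause, duplicates can be filtered after retrieval. The sketch below does this on plain strings; the function name `dedupe_preserving_order` is hypothetical, and in the notebook it would be applied to `doc.page_content` values.

```python
def dedupe_preserving_order(texts):
    """Drop repeated texts (ignoring surrounding whitespace) while keeping first-seen order."""
    seen = set()
    unique = []
    for text in texts:
        key = text.strip()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


# Shortened stand-ins for retrieved chunk texts
retrieved = [
    "This policy encourages the responsible use of mobile devices.",
    "This policy encourages the responsible use of mobile devices.",
    "Consequences: Violations of this policy may lead to disciplinary action.",
]

for i, text in enumerate(dedupe_preserving_order(retrieved), 1):
    print(f"Result {i}: {text}")
```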

**5.0 | Creating a Retriever for Targeted Search**

This task builds a retriever from embedded policy documents using IBM Watsonx and Chroma, enabling precise semantic lookup.

**Building a Custom Retriever with IBM Watsonx & Chroma**

*Convert text into searchable vector representations and query for specific policies using a retriever interface.*


In [12]:
query = "Email policy"

In [None]:
from langchain.vectorstores import Chroma
from langchain_ibm import WatsonxEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
import os

# Load document
loader = TextLoader("new-Policies.txt")
docs = loader.load()

# Split document
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
split_docs = splitter.split_documents(docs)

# Embed
embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url=os.environ["WATSONX_URL"],
    apikey=os.environ["WATSONX_APIKEY"],
    project_id=os.environ["WATSONX_PROJECT_ID"]
)

# Use Chroma to create a retriever
vector_db = Chroma.from_documents(split_docs, embedding, persist_directory="./vectorstore_task5")
retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": 2})

# Run the retriever
query = "Email policy"
results = retriever.get_relevant_documents(query)

# Display the results
for i, res in enumerate(results):
    print(f"\nResult {i+1}:\n{res.page_content[:500]}")




Result 1:
3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need for compliance with security, productivity, and legal standards.

Acceptable Use: Company-provided internet and email are primarily for job-related tasks. Limited personal use is permitted during non-work hours as long as it does not interfere with work duties.

Result 2:
3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need for compliance with security, productivity, and legal standards.

Acceptable Use: Company-provided internet and email are primarily for job-related tasks. Limited personal use is permitted during non-work hours as long as it does not interfere with work duties.
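
Conceptually, a similarity retriever with `k=2` scores every stored vector against the query vector and returns the two best matches. The brute-force sketch below illustrates that idea only; it is not how Chroma is implemented, the function name `top_k` is hypothetical, and it assumes the vectors are already unit-normalized so that a dot product can serve as the similarity score.

```python
import heapq


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors with the highest dot product."""
    def score(idx):
        return sum(q * d for q, d in zip(query_vec, doc_vecs[idx]))

    # nlargest returns indices ordered from best to worst score
    return heapq.nlargest(k, range(len(doc_vecs)), key=score)


# Toy unit-length vectors standing in for chunk embeddings
doc_vecs = [
    [1.0, 0.0],  # chunk 0
    [0.0, 1.0],  # chunk 1
    [0.6, 0.8],  # chunk 2
]
query_vec = [1.0, 0.0]

print(top_k(query_vec, doc_vecs, k=2))
```

A vector store replaces this linear scan with an index, but the contract is the same: a query vector goes in and the positions of the nearest chunks come out.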


**6.0 | Retrieval-Augmented Generation with IBM Watsonx**

This task demonstrates a full RAG pipeline using Watsonx embeddings and Granite LLMs to answer natural language queries grounded in a research paper.

**RAG Pipeline with IBM Watsonx: Grounded QA Over Research PDFs**

*Combines document embeddings and Granite LLM to answer user queries with evidence-based responses using LangChain’s RetrievalQA.*


In [None]:
import os
import requests
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_ibm.embeddings import WatsonxEmbeddings
from langchain_ibm.llms import WatsonxLLM
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA  # required by RetrievalQA.from_chain_type below

pdf_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/WgM1DaUn2SYPcCg_It57tA/A-Comprehensive-Review-of-Low-Rank-Adaptation-in-Large-Language-Models-for-Efficient-Parameter-Tuning-1.pdf"
pdf_path = "review-paper.pdf"

if not os.path.exists(pdf_path):
    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()  # fail early instead of saving an HTTP error page as a PDF
    with open(pdf_path, "wb") as f:
        f.write(response.content)

loader = PyPDFLoader(pdf_path)
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
split_docs = splitter.split_documents(docs)

embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",  
    url=os.environ["WATSONX_URL"],
    apikey=os.environ["WATSONX_APIKEY"],
    project_id=os.environ["WATSONX_PROJECT_ID"]
)

vector_db = Chroma.from_documents(split_docs, embedding, persist_directory="./rag_bot_vectorstore")

llm = WatsonxLLM(
    model_id="ibm/granite-13b-instruct-v2", 
    url=os.environ["WATSONX_URL"],
    apikey=os.environ["WATSONX_APIKEY"],
    project_id=os.environ["WATSONX_PROJECT_ID"]
)

retriever = vector_db.as_retriever(search_kwargs={"k": 2})
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=False)

query = "Is eating cheese healthy?"
answer = qa_chain.run(query)

print("\n🔎 Query:", query)
print("🧠 Answer:", answer)




🔎 Query: Is eating cheese healthy?
🧠 Answer:  Yes, cheese is a great source of calcium and protein. It can also help you maintain a healthy
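
The answer above is about cheese even though the indexed document is a paper on LoRA, which suggests the model answered from its own knowledge rather than from the retrieved chunks. A common way to keep answers grounded is to place the retrieved chunks in the prompt and instruct the model to stay within them. The sketch below builds such a prompt as a plain string. The function name `build_grounded_prompt` and the prompt wording are assumptions; this is not the template that `RetrievalQA` uses internally.

```python
def build_grounded_prompt(question, chunks):
    """Combine retrieved chunks and a question into a prompt that restricts the answer to the context."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Chunk text paraphrased from the abstract printed in section 1.0
chunks = [
    "LoRA reduces the overhead by freezing the original model weights "
    "and only training small rank decomposition matrices.",
]
prompt = build_grounded_prompt("What does LoRA freeze?", chunks)
print(prompt)
```

The resulting string could be passed directly to the LLM; with an instruction of this kind, an off-topic question would be expected to produce "I don't know." rather than an answer unrelated to the paper, although actual behaviour depends on the model.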
