In [25]:

# ! uv pip install langchain-openai tiktoken rapidocr-onnxruntime python-dotenv langchain-community langchain-text-splitters pypdf faiss-cpu

In [26]:
import sys
print(sys.executable)


d:\AI RAG Projects\MultiDocRAG-LLMOPS\.venv\Scripts\python.exe


In [27]:
! uv pip install langchain-ollama 

Using Python 3.12.12 environment at: D:\AI RAG Projects\MultiDocRAG-LLMOPS\.venv
Audited 1 package in 25ms


In [28]:
from dotenv import load_dotenv
import os
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama,OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

In [29]:
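load_dotenv()

# OpenRouter credentials and settings, read from the .env file
MODEL_API = os.getenv("OPENROUTER_MODEL_KEY")
MODEL_NAME = os.getenv("OPENROUTER_MODEL_NAME")
MODEL_URL = os.getenv("base_url")

# OpenRouter exposes an OpenAI-compatible API, so ChatOpenAI only needs its base_url
model = ChatOpenAI(
    model=MODEL_NAME,
    api_key=MODEL_API,
    base_url=MODEL_URL,
)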

# A local Ollama model can be used instead if needed:
# model = ChatOllama(model="qwen3:1.7b", validate_model_on_init=True)
# print(model.invoke("explain NPU in 5 lines").content)
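`os.getenv()` silently returns `None` when a key is missing from `.env`, and the mistake only surfaces later as an authentication or URL error from the API call. A minimal fail-fast check could look like the sketch below; `require_env` is a hypothetical helper, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: return the named environment variables,
    raising immediately if any of them is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values  # every value is a non-empty string at this point


# Possible usage right after load_dotenv():
# cfg = require_env("OPENROUTER_MODEL_KEY", "OPENROUTER_MODEL_NAME", "base_url")
```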

In [30]:

messages = [
    (
        "system",
        "You are a QA assistant helping people grasp the concepts they ask about.",
    ),
    ("human", "explain NPU in 5 lines"),
]
msg=model.invoke(messages)
print(msg.content)

**Neural Processing Unit (NPU)** – a specialized hardware accelerator designed to run deep‑learning inference and training workloads far more efficiently than general‑purpose CPUs or GPUs.  
It contains a massive array of MAC (multiply‑accumulate) units and on‑chip memory that execute tensor operations (e.g., convolutions, matrix multiplies) in parallel with minimal data movement.  
NPUs use low‑precision arithmetic (often 8‑bit or mixed‑precision) to boost throughput while keeping power consumption low, making them ideal for edge devices and mobile AI.  
Typical applications include image/video recognition, natural‑language processing, recommendation systems, and autonomous‑driving perception pipelines.  
By offloading AI tasks to an NPU, systems achieve higher performance per watt, reduced latency, and longer battery life compared with CPU‑only solutions.


### DATA INGESTION

#### Parsing

In [31]:
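# Load the PDF whose path is stored as PDF_PATH in the .env file and parse it with PyPDFLoader
# (extract_images=True also runs OCR on embedded images, which is why rapidocr-onnxruntime is installed)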
pdf_file_path =os.getenv("PDF_PATH")
loader=PyPDFLoader(file_path=pdf_file_path,extract_images=True)
documents=loader.load()

In [32]:
# PyPDFLoader returns one Document per page, so the 135 Documents match the book's 135 pages
len(documents)

135

In [33]:
# Drop the first two pages (intro/preface) and the last page (acknowledgements), since they add nothing useful
documents=documents[2:134]


In [34]:
len(documents)

132
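Python slices include the start index and exclude the stop index, so `documents[2:134]` keeps indices 2 through 133. The sketch below uses plain integers as stand-ins for the 135 page Documents to confirm exactly which pages the slice drops.

```python
# Integers 0..134 stand in for the 135 per-page Documents
pages = list(range(135))
kept = pages[2:134]  # start index 2 is included, stop index 134 is excluded

dropped = sorted(set(pages) - set(kept))
print(len(kept), dropped)  # 132 [0, 1, 134]
```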

#### Chunking

In [35]:
splitter=RecursiveCharacterTextSplitter(chunk_size=900,chunk_overlap=90)
text_chunks=splitter.split_documents(documents)

In [36]:
len(text_chunks)

392
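To see what `chunk_size=900` and `chunk_overlap=90` mean, here is a simplified fixed-window splitter. It is only an approximation: `RecursiveCharacterTextSplitter` first tries to break on its separators (`"\n\n"`, `"\n"`, `" "`, `""`) so chunks end at natural boundaries, whereas this hypothetical `sliding_window_chunks` cuts at raw character positions. Both measure length in characters.

```python
def sliding_window_chunks(text: str, chunk_size: int = 900, chunk_overlap: int = 90) -> list[str]:
    """Hypothetical fixed-window splitter: each chunk has at most chunk_size
    characters and repeats the last chunk_overlap characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2000))  # 2000 characters of a-z
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])  # [900, 900, 380]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, at the cost of some duplicated text in the index.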

In [37]:
print(text_chunks[0].page_content)

1 Introduction
1.1 What is Machine Learning
Machine learning is a subﬁeld of computer science that is concerned with building algorithms
which, to be useful, rely on a collection of examples of some phenomenon. These examples
can come from nature, be handcrafted by humans or generated by another algorithm.
Machine learning can also be deﬁned as the process of solving a practical problem by 1)
gathering a dataset, and 2) algorithmically building a statistical model based on that dataset.
That statistical model is assumed to be used somehow to solve the practical problem.
To save keystrokes, I use the terms “learning” and “machine learning” interchangeably.
1.2 Types of Learning
Learning can be supervised, semi-supervised, unsupervised and reinforcement.
1.2.1 Supervised Learning
In supervised learning1, thedataset is the collection oflabeled examples{(xi, yi)}N
i=1.


#### Embedding


In [40]:
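# Embedding model served locally by Ollama; validate_model_on_init=True checks on creation that the model is available locally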
embeddings=OllamaEmbeddings(model="embeddinggemma:latest",
                           validate_model_on_init=True,
                           )

#### Vectorstore

In [41]:
# Embed every chunk with the embedding model and build a FAISS index over the resulting vectors
vectorstore=FAISS.from_documents(text_chunks,embeddings)

# Perform similarity search
query = "explain gradient descent?"
docs = vectorstore.similarity_search(query, k=4)

# Display the results
for i, doc in enumerate(docs):
    print(f"Document {i+1}:")
    print(doc.page_content)
    print("-" * 50)
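With default settings, LangChain's `FAISS.from_documents` builds an exhaustive flat index that ranks by Euclidean (L2) distance. The pure-Python sketch below shows the same idea on toy 2-D vectors (assumed values, not real embeddings): compute the squared L2 distance from the query to every stored vector and keep the `k` closest.

```python
def top_k_l2(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Exhaustive k-nearest-neighbour search by squared L2 distance.
    Returns (index, squared_distance) pairs, closest first."""
    scored = [
        (i, sum((q - v) ** 2 for q, v in zip(query, vec)))
        for i, vec in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy "embeddings" for five chunks and one query
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
query = [1.0, 0.0]
print([i for i, _ in top_k_l2(query, vectors, k=2)])  # [0, 2]
```

A flat index compares the query against every vector, which is exact and fast enough for a few hundred chunks like the 392 here; approximate FAISS index types trade some accuracy for speed on much larger collections.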

#### Retriever

In [42]:
retriver=vectorstore.as_retriever()

### RAG flow: user query → embedding → retriever fetches context → prompt (query + retrieved context) → LLM → response

- Input: the user query
- Output: a plain string, via `StrOutputParser`
- LLM: the chat model defined earlier
- Chain: the components composed with the `|` operator
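The flow above can be sketched without LangChain as plain function composition. Every name here (`toy_retriever`, `toy_prompt`, `toy_llm`, `toy_rag`, the two-entry corpus) is a hypothetical stand-in: keyword matching replaces embedding search, and the "LLM" simply echoes the retrieved context.

```python
from typing import Callable

# Hypothetical toy corpus; a real pipeline would search the FAISS index instead
CORPUS = {
    "normalization": "Normalization rescales a feature to a fixed range such as [0, 1].",
    "gradient": "Gradient descent updates parameters against the gradient of the loss.",
}


def toy_retriever(question: str) -> list[str]:
    # Keyword match stands in for embedding + vector search
    return [text for key, text in CORPUS.items() if key in question.lower()]


def toy_prompt(question: str, context: str) -> str:
    return f"Question: {question}\nContext: {context}\nAnswer:"


def toy_llm(prompt: str) -> str:
    # Stub "model": answers by echoing the context section of the prompt
    return prompt.split("Context: ", 1)[1].split("\nAnswer:", 1)[0]


def toy_rag(question: str,
            retrieve: Callable[[str], list[str]] = toy_retriever,
            llm: Callable[[str], str] = toy_llm) -> str:
    context = "\n\n".join(retrieve(question))  # retriever fetches context
    prompt = toy_prompt(question, context)     # prompt combines query + context
    return llm(prompt).strip()                 # LLM output parsed to a plain string


print(toy_rag("explain normalization"))
```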

In [43]:
template="""You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""

In [44]:
prompt=ChatPromptTemplate.from_template(template)

In [45]:
parser=StrOutputParser()

In [47]:
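# Chain: the retriever's output fills {context}; the raw query passes through unchanged as {question}
rag_chain = (
    {"context": retriver, "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)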
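In the chain as written, the retriever's list of `Document` objects is converted to text with Python's default list formatting, so each chunk's metadata (source path, page) lands in `{context}` alongside its `page_content`. A common alternative is a small function that joins only the page text. The sketch below uses a hypothetical `Doc` dataclass as a stand-in for LangChain's `Document` so it runs on its own; the sample texts and page numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list) -> str:
    """Join retrieved chunks into one context string, dropping metadata."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    Doc("Normalization rescales a feature to [0, 1].", {"page": 1}),
    Doc("Standardization rescales a feature to zero mean and unit variance.", {"page": 2}),
]
print(format_docs(docs))
```

Because `format_docs` only reads `page_content`, it also works on real `Document` objects. In LCEL a plain function piped after a runnable is wrapped as a `RunnableLambda`, so the context entry could become `retriver | format_docs`.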

In [49]:
ans=rag_chain.invoke("explain normalization")

In [52]:
print(ans)

Normalization is a preprocessing step that rescales a numerical feature to a fixed, small interval, typically [0, 1] or [–1, 1].  
For each value x(j) of feature j, the normalized value \(\bar{x}(j)\) is computed as  

\[
\bar{x}(j)=\frac{x(j)-\min(j)}{\max(j)-\min(j)},
\]

where \(\min(j)\) and \(\max(j)\) are the smallest and largest values of that feature in the dataset.  
By mapping all values into the same range, normalization prevents features with large magnitudes from dominating gradient‑based updates.  
It also reduces the risk of numerical overflow or underflow when computers handle very big or very small numbers.  
The technique is especially helpful for algorithms that rely on distance calculations or assume similarly scaled inputs.  
While not strictly required, normalized data often leads to faster convergence during training.  
If a feature’s natural range is, say, 350 to 1450, subtracting 350 and dividing by 1100 produces values in [0, 1].  
Normalization is distinct fr