## Uploading PDF

In [1]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader
import nltk
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/abdullah/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /Users/abdullah/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


True

In [2]:
doc_path = 'Cloud Computing full.pdf'
model = 'deepseek-r1:1.5b'

In [3]:
if doc_path:
    loader = UnstructuredPDFLoader(file_path=doc_path)
    data = loader.load()
    print("done loading......")
else:
    print("upload the pdf file")

done loading......


## Checking your data

In [4]:
content=data[0].page_content
print(content[:100])

Department of CSE

1

JNTU Hyderabad

Cloud Computing

(Professional Elective – IV)

IV Year B.Tech.


## Extract text from PDF and split into chunks

In [5]:
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=300)
chunks = text_splitter.split_documents(data)
print("done splitting...")

done splitting...
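
The effect of `chunk_size=1200` and `chunk_overlap=300` can be illustrated with a simplified sliding window. This is only a sketch: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and lines, so its real chunks will differ. The helper `split_with_overlap` and the sample string are hypothetical and not part of the notebook.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical 3000-character sample standing in for the PDF text.
sample = "".join(str(i % 10) for i in range(3000))
pieces = split_with_overlap(sample, chunk_size=1200, chunk_overlap=300)
print([len(p) for p in pieces])
# The last 300 characters of one chunk reappear at the start of the next.
print(pieces[0][-300:] == pieces[1][:300])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.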


## Adding to vector database

In [7]:
import ollama
ollama.pull("nomic-embed-text")

ProgressResponse(status='success', completed=None, total=None, digest=None)

In [8]:
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    collection_name="simple-rag"
)
print("done adding to vector database....")
    

done adding to vector database....
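
Once the chunks are embedded, a query is answered by comparing its embedding with the stored ones and returning the closest chunks. The sketch below shows that idea with cosine similarity on toy three-dimensional vectors. The vectors, the texts, and the `top_k` helper are made up for illustration, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored items most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store: (chunk text, invented embedding). Real embeddings have many more dimensions.
toy_store = [
    ("cloud service models", [0.9, 0.1, 0.0]),
    ("virtualization basics", [0.6, 0.6, 0.1]),
    ("exam timetable", [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```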


## Retrieval

In [9]:
from langchain.prompts import ChatPromptTemplate , PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

## Setting the model to use

In [10]:
llm = ChatOllama(model=model)

#### Multi-query retrieval: the LLM rewrites a single question into several variants, documents are retrieved
#### for each variant, and the results are combined, so retrieval depends less on the exact wording of the original question.

In [11]:
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

In [12]:
retriever = MultiQueryRetriever.from_llm(
    retriever=vector_db.as_retriever(), llm=llm, prompt=QUERY_PROMPT
)
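
The multi-query idea can be sketched without any model: generate several variants of the question, search once per variant, and keep the unique union of the hits. Everything below (`multi_query_retrieve`, `fake_variants`, `fake_search`, and the chunk labels) is a hypothetical stand-in. In the notebook, the variants come from `llm` via `QUERY_PROMPT` and the search is done by the Chroma retriever.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_variants: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Search once per query variant and return the de-duplicated union of hits."""
    seen = set()
    merged = []
    for query in generate_variants(question):
        for hit in search(query):
            if hit not in seen:  # keep the first occurrence, drop repeats
                seen.add(hit)
                merged.append(hit)
    return merged


# Stand-in for the LLM: returns fixed rewordings of the question.
def fake_variants(question: str) -> list[str]:
    return [f"what is {question}?", f"define {question}", f"{question} benefits"]


# Stand-in for the vector store: maps a query string to chunk labels.
fake_index = {
    "what is cloud computing?": ["chunk A", "chunk B"],
    "define cloud computing": ["chunk B", "chunk C"],
    "cloud computing benefits": ["chunk D", "chunk A"],
}


def fake_search(query: str) -> list[str]:
    return fake_index.get(query, [])


print(multi_query_retrieve("cloud computing", fake_variants, fake_search))
```

Each variant can surface chunks the others miss, and the union step prevents the same chunk from being passed to the model twice.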

## RAG prompt

In [13]:
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt= ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
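
To see what the model receives, it can help to fill the template by hand. The sketch below uses plain string formatting and a hypothetical `format_docs` helper that joins the chunk texts before they are placed in `{context}`. The `Doc` class and the two sample sentences are invented stand-ins, not output from the notebook's retriever.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def format_docs(docs: list[Doc]) -> str:
    """Join chunk texts with blank lines so the context reads as plain text."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs: list[Doc], question: str) -> str:
    """Fill the RAG template with the joined context and the user question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


# Invented sample chunks for illustration only.
docs = [
    Doc("Cloud computing delivers computing resources over a network."),
    Doc("Resources are provisioned on demand."),
]
prompt_text = build_prompt(docs, "what is cloud computing")
print(prompt_text)
```

Printing the filled prompt like this is a quick way to check that the retrieved context is relevant before looking at the model's answer.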


In [15]:
res = chain.invoke("what is cloud computing")
print(res)

<think>
Okay, I need to figure out the answer to the question "What is cloud computing?" based on the provided text. Let me read through the text carefully.

First, I'll go through each section to identify where cloud computing is mentioned. The text starts with an introduction about computing power and how it's essential for business growth. It then talks about traditional computing systems like mainframe computers, servers, and databases.

Next, in the "History" section, it mentions that the first computers were built by British and German companies. Then, it shifts to the rise of cloud computing as a solution for these old systems. The text describes how companies started using virtualized machines on virtual servers across different clouds, which became more efficient compared to buying physical hardware each time.

There's also a section about the "Motivation" part. It explains that before cloud computing, businesses had to pay for hardware and software upfront. This leads to high