In [1]:
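# Loaders, embeddings and vector stores live in langchain_community in recent
# LangChain releases; the older langchain.* paths are deprecated aliases.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS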

In [2]:
path = r'C:\Users\sakth\OneDrive\Desktop\rag2\1900-2025.pdf'

In [3]:
loader = PyPDFLoader(path)
documents = loader.load()
documents[0].page_content

'1900–1914: The Era of Imperialism, Nationalism, and Alliance Building \n1904 – The Russo-Japanese War \n1904–1905 – Russo-Japanese War \nMarked the first time an Asian power defeated a European empire, shifting power dynamics in East Asia and shaking \nEuropean confidence. In 1904, the Russo-Japanese War emerged as a pivotal development in global politics. It was the first \nmajor military victory of an Asian power over a European one in modern times. The war revealed the weaknesses of the \nRussian Empire and contributed to internal unrest that would later culminate in revolution. Japan’s victory elevated its \nstatus as a global power and shifted the balance in East Asia. It also inspired other colonial territories to consider resistance \nagainst European domination. The conflict signaled the decline of European dominance in Asia and marked Japan’s arrival \non the world stage. \nThe early 20th century was marked by intense competition among the great powers of Europe, the United S

In [55]:
# Split each page into chunks of at most 1000 characters with up to 10 characters of overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=10)
docs = splitter.split_documents(documents)
docs[0].page_content

'1900–1914: The Era of Imperialism, Nationalism, and Alliance Building \n1904 – The Russo-Japanese War \n1904–1905 – Russo-Japanese War \nMarked the first time an Asian power defeated a European empire, shifting power dynamics in East Asia and shaking \nEuropean confidence. In 1904, the Russo-Japanese War emerged as a pivotal development in global politics. It was the first \nmajor military victory of an Asian power over a European one in modern times. The war revealed the weaknesses of the \nRussian Empire and contributed to internal unrest that would later culminate in revolution. Japan’s victory elevated its \nstatus as a global power and shifted the balance in East Asia. It also inspired other colonial territories to consider resistance \nagainst European domination. The conflict signaled the decline of European dominance in Asia and marked Japan’s arrival \non the world stage. \nThe early 20th century was marked by intense competition among the great powers of Europe, the United S
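
The splitter call above can be pictured as a sliding window over the text. The following is a minimal sketch of that idea only: `chunk_text` is a hypothetical helper, and it cuts at fixed character offsets, whereas RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, so its chunks will not match these exactly.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=10):
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper
    for illustration; not the algorithm LangChain uses.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Toy input: 2500 characters cycling through the alphabet.
sample = "".join(chr(97 + i % 26) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])
```

With an overlap of only 10 characters, a sentence that straddles a chunk boundary is mostly lost to one side; a larger overlap trades index size for better continuity.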

In [56]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

In [57]:
embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [58]:
db = FAISS.from_documents(docs, embeddings)

In [59]:
query = "world war 1"
# similarity_search returns the closest chunks (4 by default); print the top hit.
doc = db.similarity_search(query)
print(doc[0].page_content)

In 1914, the assassination of Archduke Franz Ferdinand of Austria-Hungary triggered the outbreak of World War I. This 
single event activated a complex web of alliances, plunging all of Europe into a devastating war. It symbolized the volatile 
mix of nationalism, militarism, and imperial ambitions in early 20th-century Europe. The subsequent war led to 
unprecedented destruction, millions of casualties, and the collapse of major empires. The political landscape of the world 
was irrevocably altered, and it set the stage for future global conflicts. The consequences of the assassination echoed for 
decades. 
World War I was unprecedented in scale and destructiveness, mobilizing millions of soldiers and resulting in approximately 
17 million deaths. Industrialized warfare introduced new technologies such as machine guns, tanks, and chemical weapons. 
The war saw brutal trench fighting on the Western Front and sweeping maneuvers on the Eastern Front. Beyond Europe,
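
Conceptually, the search above embeds the query and returns the stored chunks whose vectors lie closest to it. The following is a brute-force sketch of that ranking, assuming Euclidean distance; `l2_distance`, `top_k` and the 3-dimensional toy vectors are invented for illustration and are not real embeddings. Because the embedding model ends in a Normalize() layer, ranking by Euclidean distance and by cosine similarity give the same order.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec, nearest first."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


# Toy unit-length vectors standing in for chunk embeddings (assumption).
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
]
toy_query = [0.9, 0.1, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

A real index avoids scanning every vector for large collections, but for a single PDF the result is equivalent to this exhaustive comparison.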


In [60]:
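# k=10 chunks of ~1000 characters overflows flan-t5-base's 512-token input (see the warning below); a smaller k avoids this.
retriever = db.as_retriever(search_kwargs={"k": 10})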

In [61]:
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

In [62]:
model_id = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

In [63]:
# do_sample=True draws tokens at random, so answers can differ between runs.
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.8,
    top_p=0.98,
    top_k=5,
)

Device set to use cpu


In [64]:
local_llm = HuggingFacePipeline(pipeline=pipe)

In [65]:
qa_chain = RetrievalQA.from_chain_type(
    llm=local_llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
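
chain_type="stuff" means every retrieved chunk is pasted into one prompt together with the question. The following is a rough sketch of that assembly; `TEMPLATE` and `build_stuff_prompt` are hypothetical names, and the wording of LangChain's own prompt differs.

```python
# Hypothetical template; LangChain ships its own wording.
TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(chunks, question):
    """Join all chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["First chunk.", "Second chunk."],
    "What happened in 1914?",
)
print(prompt)
```

Because the prompt grows with every retrieved chunk, its length is roughly k times the chunk size plus the template and the question.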

In [66]:
# Simple chat loop: type "exit" or "quit" to stop.
while True:
    query = input("User: ")
    if query.strip().lower() in ("exit", "quit"):
        break
    result = qa_chain.invoke(query)
    print("\nAssistant:", result["result"])
    print("-" * 50)

Token indices sequence length is longer than the specified maximum sequence length for this model (2053 > 512). Running this sequence through the model will result in indexing errors



Assistant: conflict
--------------------------------------------------

Assistant: The United States entered the war in 1941 after the Japanese attack on Pearl Harbor
--------------------------------------------------
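
The warning above (2053 > 512) shows that the stuffed prompt is about four times longer than the model accepts, which may explain the one-word first answer. One possible mitigation is to keep only as many retrieved chunks as fit a token budget. The following is a sketch under assumptions: `fit_chunks` is a hypothetical helper, the `reserved` allowance for the template and question is a guess, and a whitespace word count stands in for the real tokenizer.

```python
def fit_chunks(chunks, count_tokens, budget=512, reserved=64):
    """Keep leading chunks while their total token count fits the budget.

    reserved leaves room for the prompt template and the question
    (the value 64 is an assumption, not a measured figure).
    """
    limit = budget - reserved
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > limit:
            break  # adding this chunk would overflow the model input
        kept.append(chunk)
        used += cost
    return kept


def approx_tokens(text):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


# Three toy chunks of 200 "tokens" each; only two fit into 512 - 64 = 448.
toy_chunks = ["w " * 200, "w " * 200, "w " * 200]
print(len(fit_chunks(toy_chunks, approx_tokens)))
```

In the notebook, a counter based on the loaded tokenizer, for example `lambda s: len(tokenizer(s)["input_ids"])`, would give exact counts in place of `approx_tokens`.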
