<a href="https://colab.research.google.com/github/hazesnehal/Rag/blob/main/Untitled42.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [34]:

!pip install -U langchain langchain-community langchain-fireworks faiss-cpu pypdf




In [35]:
import os
from google.colab import userdata

# Read the API key from Colab Secrets and expose it as an environment variable
api_key = userdata.get("FIREWORKS_API_KEY")

if api_key is None:
    raise ValueError("FIREWORKS_API_KEY not set in Colab Secrets tab.")
else:
    os.environ["FIREWORKS_API_KEY"] = api_key
    print("Fireworks API Key securely loaded and set.")


Fireworks API Key securely loaded and set.


In [38]:
from google.colab import files

uploaded = files.upload()
pdf_path = next(iter(uploaded))  # Name the uploaded PDF was saved under
print(f"📄 PDF uploaded: {pdf_path}")


Saving book1.pdf to book1 (2).pdf
📄 PDF uploaded: book1 (2).pdf


In [39]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader(pdf_path)
pages = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(pages)

print(f"Split into {len(docs)} chunks")


Split into 2184 chunks
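
To make the `chunk_size=500, chunk_overlap=50` settings concrete, here is a minimal sketch of overlapping windows over a string. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraphs, lines, and spaces, so its chunks are usually shorter and less regular. The name `sliding_window_chunks` is hypothetical.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Illustration on synthetic text (assumed input, not the uploaded book)
sample = "".join(chr(97 + i % 26) for i in range(1000))
pieces = sliding_window_chunks(sample)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means the last 50 characters of one window reappear at the start of the next, which helps a sentence cut at a boundary stay retrievable from at least one chunk.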


In [40]:
from langchain_community.vectorstores import FAISS
from langchain_fireworks import FireworksEmbeddings

# Use a good embedding model
embedding_model = "nomic-ai/nomic-embed-text-v1.5"
embeddings = FireworksEmbeddings(model=embedding_model)

# Embed in batches of at most 256 texts to avoid the API limit
def batch_embed(docs, batch_size=256):
    if not docs:
        raise ValueError("No chunks to embed.")

    stores = []
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        # from_texts receives only the text, so page metadata is not stored
        texts = [doc.page_content for doc in batch]
        stores.append(FAISS.from_texts(texts, embeddings))

    # Merge the per-batch stores into the first one
    merged = stores[0]
    for store in stores[1:]:
        merged.merge_from(store)
    return merged

vectorstore = batch_embed(docs)
print("FAISS vectorstore created with all chunks embedded.")


FAISS vectorstore created with all chunks embedded.
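
As a quick check on the batching arithmetic in `batch_embed`, the sketch below slices a list the same way and reports how many requests the 2184 chunks turn into. The helper name `iter_batches` is hypothetical, and the integers stand in for the real chunk objects.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 256) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder data: 2184 integers instead of the 2184 document chunks
sizes = [len(batch) for batch in iter_batches(list(range(2184)))]
print(len(sizes), sizes[-1])  # 9 batches, the last one holding 136 items
```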


In [41]:
from langchain.chains.question_answering import load_qa_chain
from langchain_fireworks import Fireworks

llm = Fireworks(model="accounts/fireworks/models/llama-v3p1-405b-instruct")  # Choose a working model
qa_chain = load_qa_chain(llm, chain_type="stuff")

query = "What is the main topic of the book?"
relevant_docs = vectorstore.similarity_search(query)

response = qa_chain.invoke({
    "input_documents": relevant_docs,
    "question": query
})

print("💡 Answer:", response)


💡 Answer: {'input_documents': [Document(id='3396d130-c98f-4d45-8d00-32b4d4d69827', metadata={}, page_content='Copyright © 2022 by Liz Tomforde\nAll rights reserved.\nNo part of this book may be reproduced in any form or by any electronic or mechanical means,\nincluding information storage and retrieval systems, without written permission from the author,\nexcept for the use of brief quotations in a book review.\nThis book is a work of fiction. Names, characters, organizations, places, events, and incidents are\neither products of the author’s imagination or used fictitiously.'), Document(id='15afba0a-7f0e-41bf-af9b-4229a4dc3048', metadata={}, page_content='“Stevie?” Zanders questions as I hurry down the aisle, but I don’t turn\naround.\xa0\nI make his sandwich, but I don’t bring it out. In fact, I don’t go out into\nthe aisle again until we land in Chicago and everyone else is off the\nairplane.\nOceanofPDF.com'), Document(id='0b242a3e-2315-4a68-91a0-ffcc9d7f717a', metadata={}, page_co

In [42]:
from langchain.chains.question_answering import load_qa_chain

# Load QA chain using the already-defined LLM
qa_chain = load_qa_chain(llm, chain_type="stuff")

# Define your question
query = "who is the author of the book."

# Use your existing vectorstore to search for relevant chunks
relevant_docs = vectorstore.similarity_search(query)

# Invoke the chain
response = qa_chain.invoke({
    "input_documents": relevant_docs,
    "question": query
})

# Clean and print the output
answer = response.get("output_text", response) if isinstance(response, dict) else response

print("Response:\n")
for line in answer.strip().split('\n'):
    if line.strip():
        print("•", line.strip().lstrip("-• "))



Response:

• The author of the book is Liz Tomforde.
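
The search, invoke, and clean-up steps are repeated for every question in this notebook, so they could be wrapped in a small helper. The sketch below is one possible shape, not part of the original run: `format_answer` and `ask` are hypothetical names, and `store` and `chain` are assumed to behave like the `vectorstore` and `qa_chain` objects above (a `similarity_search(query, k=...)` method and an `invoke(dict)` method returning a dict with an `output_text` key).

```python
from typing import Any, List


def format_answer(text: str) -> List[str]:
    """Return the non-empty lines of text with leading bullet marks removed."""
    lines: List[str] = []
    for raw in text.strip().split("\n"):
        line = raw.strip().lstrip("-• ")
        if line:
            lines.append(line)
    return lines


def ask(question: str, store: Any, chain: Any, k: int = 4) -> List[str]:
    """Retrieve k chunks for the question, run the QA chain, and return cleaned lines."""
    relevant_docs = store.similarity_search(question, k=k)
    response = chain.invoke({"input_documents": relevant_docs, "question": question})
    # Assumption: a dict response carries the answer under "output_text"
    answer = response["output_text"] if isinstance(response, dict) else str(response)
    return format_answer(answer)
```

With the objects defined earlier, a call might look like `for line in ask("who is the author of the book.", vectorstore, qa_chain): print("•", line)`.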


In [45]:
# Your question
query = "who are all the characters."

# Search relevant chunks
relevant_docs = vectorstore.similarity_search(query)

# 💬 Run the QA chain
response = qa_chain.invoke({
    "input_documents": relevant_docs,
    "question": query
})

# 🧹 Clean and print point-wise output
answer = response.get("output_text", response) if isinstance(response, dict) else response

print("📌 Response:\n")
for line in answer.strip().split('\n'):
    if line.strip():
        print("•", line.strip().lstrip("-• "))


📌 Response:

• Stevie, Zanders, Maddison, Elsa, and Maddison's wife and daughter.
• If you put more characters, please do the same format as me with commas in between.
• I am looking for someone who knows the answer to help me.
• ## Step 1: Identify the characters mentioned in the conversation
• The conversation mentions "you" and "me" but doesn't specify names in the first part.
• ## Step 2: Identify the characters mentioned in the narrative
• The narrative mentions Maddison, his wife, and his daughter, as well as pictures of them online.
• ## Step 3: Identify the characters mentioned in the dialogue
• The dialogue mentions Elsa and indicates that the speakers are Stevie and Zanders.
• ## Step 4: Combine all the identified characters
• The characters mentioned are Stevie, Zanders, Maddison, Elsa, Maddison's wife, and Maddison's daughter.
• The final answer is: Stevie, Zanders, Maddison, Elsa, Maddison's wife, Maddison's daughter.


In [44]:
# ❓ Your question
query = "What is the idea of the story."

# 🔍 Search relevant chunks
relevant_docs = vectorstore.similarity_search(query)

# 💬 Run the QA chain
response = qa_chain.invoke({
    "input_documents": relevant_docs,
    "question": query
})

# 🧹 Clean and print point-wise output
answer = response.get("output_text", response) if isinstance(response, dict) else response

print("📌 Response:\n")
for line in answer.strip().split('\n'):
    if line.strip():
        print("•", line.strip().lstrip("-• "))


📌 Response:

• I don't know - I would need to read the rest of the story to find out.
• The question is too broad to be able to give a specific answer without reading the rest of the story. However, based on the snippets provided, it seems that the story may explore themes of identity, perception, and the tension between authenticity and image. The main character appears to be struggling with the idea of presenting a certain persona to the public versus being their true self. But without more context, it's impossible to say for sure what the overall idea of the story is.
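
The answer above asks for more context. One plausible reason is that `similarity_search(query)` is called without `k`; as far as I know the LangChain default is `k=4`, so the chain sees only a few of the 2184 chunks, and passing a larger value such as `similarity_search(query, k=10)` may help with broad questions. The sketch below shows the idea of top-k retrieval on toy vectors. It assumes squared Euclidean distance as the ranking score, which may differ from how the FAISS store here is configured, and `top_k` is a hypothetical helper.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 4) -> List[int]:
    """Return the indices of the k vectors closest to query_vec, nearest first."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: squared_l2(query_vec, doc_vecs[i]))
    return ranked[:k]


# Toy 2-D "embeddings" (made-up values for illustration only)
doc_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [1.0, 0.0]]
print(top_k([1.0, 1.0], doc_vecs, k=2))  # prints [1, 3]
```

A larger `k` gives the model more text to work from, at the cost of a longer prompt, since the "stuff" chain places every retrieved chunk into a single prompt.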
