<a href="https://colab.research.google.com/github/shivamdhumal77/LLM_Projects/blob/main/Rag_for_multiple_pdfs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [47]:
pip install faiss-cpu



In [49]:
# Loaders, vector stores, and embeddings live in langchain_community;
# the old langchain.* import paths are deprecated shims.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import FakeEmbeddings
from langchain_openai import ChatOpenAI

In [50]:
# Initialize the primary (llm_1) and fallback (llm_2) chat models.
# Both point at the same OpenAI-compatible endpoint.
llm_1 = ChatOpenAI(
    api_key="ollama",
    base_url="https://sunny-gerri-finsocialdigitalsystem-d9b385fa.koyeb.app/v1",
    model="athene-v2"
)

llm_2 = ChatOpenAI(
    api_key="ollama",
    base_url="https://sunny-gerri-finsocialdigitalsystem-d9b385fa.koyeb.app/v1",
    model="text-davinci-003"
)

In [51]:
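# Generate text with llm_1, falling back to llm_2 if the first call fails.
# Returns the generated string, or None if both models fail.
def generate_text(prompt):
    try:
        # invoke() returns an AIMessage; .content holds the generated string
        response = llm_1.invoke(prompt).content
        # NOTE: the label says nemotron-mini, but llm_1 is configured with model="athene-v2"
        print("Generated Text from nemotron-mini:")
        return response
    except Exception as e:
        print(f"Error with first model: {e}")
        print("Switching to the second model...")
        try:
            response = llm_2.invoke(prompt).content
            print("Generated Text from text-davinci-003:")
            return response
        except Exception as e:
            print(f"Error with second model: {e}")
            return None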
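
The nested try/except above generalizes to any number of models. A minimal sketch, assuming each model is wrapped as a plain callable that takes a prompt and returns a string. `generate_with_fallback`, `failing_model`, and `echo_model` are hypothetical names used for illustration and are not part of LangChain:

```python
from typing import Callable, Optional, Sequence, Tuple


# Hypothetical helper: try each (name, callable) pair in order and return the
# name of the first model that succeeds together with its output.
def generate_with_fallback(
    models: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
) -> Tuple[Optional[str], Optional[str]]:
    for name, call in models:
        try:
            return name, call(prompt)
        except Exception as e:
            print(f"Error with {name}: {e}")
    # Every model failed
    return None, None


# Stand-in callables so the sketch runs without a network connection.
def failing_model(prompt: str) -> str:
    raise RuntimeError("endpoint unavailable")


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = generate_with_fallback(
    [("primary", failing_model), ("fallback", echo_model)], "hello"
)
print(used, text)
```

Returning the model name alongside the text keeps the printed label tied to the model that actually answered. With the notebook's clients, a pair might look like `("athene-v2", lambda p: llm_1.invoke(p).content)`.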

In [52]:
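def load_and_chunk_pdfs(pdf_paths, chunk_size=5000, chunk_overlap=200):
    """Load each PDF and split its pages into overlapping chunks.

    chunk_size and chunk_overlap are measured in characters.
    Returns one flat list of chunks across all PDFs.
    """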
    all_documents = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    for pdf_path in pdf_paths:
        loader = PyPDFLoader(pdf_path)
        documents = loader.load()
        chunks = text_splitter.split_documents(documents)
        all_documents.extend(chunks)

    return all_documents
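
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width sliding-window splitter. It is an illustration only, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraph and line breaks). `sliding_window_chunks` is a hypothetical name:

```python
from typing import List


# Simplified illustration: fixed-width windows, where each window starts
# (chunk_size - chunk_overlap) characters after the previous one.
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the overlap. In the notebook's setting, 200 of every 5000 characters are shared between neighboring chunks so that text cut at a boundary still appears with some surrounding context.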

In [53]:
pip install pypdf



In [58]:
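# Create a FAISS vector store. FakeEmbeddings returns random vectors, so
# similarity search over this store is effectively random; use a real
# embedding model for meaningful retrieval.
def create_vector_store(docs):
    embeddings = FakeEmbeddings(size=768)  # 768-dimensional random vectors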
    vectorstore = FAISS.from_documents(docs, embeddings)
    return vectorstore

# RAG pipeline with custom LLM
def setup_rag_pipeline(vectorstore):
    retriever = vectorstore.as_retriever()

    # Custom RetrievalQA chain
    class CustomRetrievalQA:
        def __init__(self, retriever, llm):
            self.retriever = retriever
            self.llm = llm

        def run(self, query):
            # invoke() replaces the deprecated get_relevant_documents()
            relevant_docs = self.retriever.invoke(query)
            context = "\n".join([doc.page_content for doc in relevant_docs])
            prompt = f"Context:\n{context}\n\nQuery: {query}\nAnswer:"
            return self.llm(prompt)

    return CustomRetrievalQA(retriever, generate_text)

# Main function to process the query
def query_multiple_pdfs(pdf_paths, query):
    # Step 1: Load and chunk the PDFs
    docs = load_and_chunk_pdfs(pdf_paths)

    # Step 2: Create vector store
    vectorstore = create_vector_store(docs)

    # Step 3: Setup RAG pipeline
    qa_chain = setup_rag_pipeline(vectorstore)

    # Step 4: Query the RAG pipeline
    print(f"Processing query: {query}")
    response = qa_chain.run(query)
    return response

# Example usage
# Add paths to your PDFs
pdf_paths = [
    "/content/21 laws of leadership.pdf",
    "/content/Rich Dad Poor Dad ( PDFDrive ).pdf",
    "/content/youre-too-good-to-feel-this-bad-an-orthodox-approach-to-living-an-unorthodox-lif-pr_c3dad2f610c46141c398a386253e2a19.pdf",
    "/content/sun tzu the art of war.pdf",
]
query = "summarise everything and give a single approach in 5000 words"
response = query_multiple_pdfs(pdf_paths, query)

# Print the output
if response:
    print("Final Answer:")
    print(response)
else:
    print("Failed to generate an answer.")

Processing query: summarise everything and give a single approach in 5000 words
Generated Text from nemotron-mini:
Final Answer:
### Summary of Context and Core Principles

The provided context spans diverse topics, from ancient Chinese military strategy to complex textual analyses. However, underlying these varied themes is a consistent emphasis on strategic intelligence gathering, execution of plans with precision, maintaining secrecy, and leveraging human resources effectively, whether in the form of spies or key functionaries. The core principles can be distilled into several key areas: the importance of information, the necessity of trust, the role of loyalty, and the critical need for discretion.

### Strategic Intelligence Gathering

#### Ancient Military Strategy
In ancient Chinese military strategy, as exemplified by Sun Tzu’s "The Art of War," the gathering of intelligence is paramount. Sun Tzu emphasizes the use of spies to gather information about the enemy's movements, st