In [None]:
%pip install pymupdf faiss-cpu sentence-transformers langchain groq openai


Collecting pymupdf
  Downloading pymupdf-1.26.0-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting groq
  Downloading groq-0.25.0-py3-none-any.whl.metadata (15 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch>=1.11.0->sentence-transformers)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-non

In [None]:
import fitz  # for reading PDFs
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import os
from groq import Groq  # For calling Groq API


**Uploading the file**

In [None]:
from google.colab import files

uploaded = files.upload()

pdf_path = list(uploaded.keys())[0]
print(f"📄 PDF uploaded: {pdf_path}")


Saving leph101.pdf to leph101.pdf
📄 PDF uploaded: leph101.pdf


The function extract_text_from_pdf opens a PDF with PyMuPDF (imported as fitz), reads the text of each page in order, and returns everything as a single string.

In [None]:
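def extract_text_from_pdf(pdf_path):
    # Open the PDF and close it again once all pages have been read
    with fitz.open(pdf_path) as doc:
        # get_text() returns the plain text of one page
        return "".join(page.get_text() for page in doc)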


This function breaks a long text into smaller chunks using a RecursiveCharacterTextSplitter. Each chunk is at most chunk_size characters long and can share up to chunk_overlap characters with the next one, which helps preserve context across chunk boundaries.

In [None]:
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_text(text)


With the default arguments, each chunk is at most 500 characters long and overlaps its neighbor by up to 50 characters, so text near a boundary appears in both chunks.
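To make the roles of chunk_size and chunk_overlap concrete, here is a minimal fixed-window sketch in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as blank lines, newlines, and spaces); the helper name sliding_window_chunks is hypothetical and exists only for illustration.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Hypothetical helper: cut text into fixed-size windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(demo)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each window repeats the last three characters of the previous one, which is the same idea the splitter applies at its own break points.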

In [None]:
def build_faiss_index(chunks, embedding_model):
    # Encode every chunk; FAISS expects a 2-D float32 array
    vectors = np.asarray(embedding_model.encode(chunks), dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])  # exact L2 index sized to the embedding dimension
    index.add(vectors)  # row i of the index corresponds to chunks[i]
    return index, {i: chunk for i, chunk in enumerate(chunks)}


This function turns the text chunks into numeric vectors using an embedding model and adds them to a FAISS index so that similar chunks can be found quickly. It returns the index together with a dictionary that maps each vector's position in the index to its chunk text.
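IndexFlatL2 performs an exhaustive search: it compares the query against every stored vector and ranks them by squared Euclidean distance. The pure-Python sketch below illustrates that idea on a tiny example. The function flat_l2_search is a hypothetical stand-in, not part of the FAISS API, and it makes no attempt to match FAISS performance.

```python
def flat_l2_search(vectors, query, k):
    """Hypothetical stand-in for an exhaustive L2 search over a small list of vectors."""
    scored = []
    for position, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and this stored vector
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, position))
    scored.sort()  # smallest distance first; ties broken by position
    top = scored[:k]
    return [d for d, _ in top], [p for _, p in top]


stored = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
distances, positions = flat_l2_search(stored, query=[0.9, 0.1], k=2)
print(positions)  # [1, 0]
```

The returned positions play the same role as the integer keys of the chunk dictionary: position 1 is the nearest stored vector, followed by position 0.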

In [None]:
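def retrieve_top_k_chunks(query, k, embedding_model, index, chunk_map):
    query_vec = np.asarray(embedding_model.encode([query]), dtype="float32")
    k = min(k, index.ntotal)  # never ask for more neighbors than the index holds
    distances, indices = index.search(query_vec, k)
    # indices[0] holds the matches for the single query, nearest first
    return [chunk_map[int(i)] for i in indices[0] if i != -1]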


This function finds the top k most similar text chunks to a given query. It turns the query into a vector, searches the FAISS index for the closest matches, and returns those matching chunks using the chunk map.
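One detail worth guarding against: a FAISS search always returns k slots per query, and when the index holds fewer than k vectors the unused slots are filled with -1. Looking up -1 in the chunk map would raise a KeyError. The sketch below shows one way to skip the padding; lookup_chunks is a hypothetical helper, and a plain list stands in for the array that FAISS returns.

```python
def lookup_chunks(index_row, chunk_map):
    """Hypothetical helper: map one row of search results back to chunk text."""
    results = []
    for i in index_row:
        i = int(i)  # FAISS returns NumPy integers; plain ints are used here
        if i == -1:
            continue  # padding slot: fewer than k vectors were available
        results.append(chunk_map[i])
    return results


chunk_map = {0: "Coulomb's law", 1: "Electric field", 2: "Superposition"}
print(lookup_chunks([2, 0, -1, -1], chunk_map))  # ['Superposition', "Coulomb's law"]
```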

In [None]:
from openai import OpenAI

def generate_answer_groq(context, query, api_key):
    client = OpenAI(
        api_key=api_key,
        base_url="https://api.groq.com/openai/v1"
    )

    response = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}
        ],
        temperature=0.3,
        max_tokens=512
    )

    return response.choices[0].message.content



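This function calls the LLaMA 3 model on Groq through Groq's OpenAI-compatible endpoint, which is why it uses the openai client rather than the groq package imported earlier. It sends the retrieved context and the question as a single user message, alongside a generic system prompt, and returns the generated answer. Two settings shape the response: temperature (lower values make the output more deterministic) and max_tokens (an upper bound on the length of the answer).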
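Because the system prompt is generic, nothing tells the model to stay within the retrieved text. One common option, shown here only as a sketch, is to number the chunks and ask the model to answer from them alone. The helper build_messages and the wording of its instruction are assumptions made for illustration; they have not been evaluated against this model.

```python
def build_messages(chunks, query):
    """Hypothetical helper: assemble chat messages that keep the model close to the retrieved text."""
    # Number each chunk so the answer can refer back to its source
    context = "\n\n".join(f"[{n}] {chunk}" for n, chunk in enumerate(chunks, start=1))
    system = (
        "Answer using only the numbered context passages. "
        "If the context does not contain the answer, say so."
    )
    user = f"Context:\n{context}\n\nQuestion: {query}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_messages(
    ["Like charges repel.", "Unlike charges attract."],
    "Do like charges attract?",
)
print(messages[1]["content"])
```

The resulting list has the same shape as the messages argument already passed to client.chat.completions.create, so it could be dropped in there without other changes.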

In [None]:
query = "What are the main findings of the paper?"
# Read the key from the environment instead of hard-coding it in the notebook
groq_api_key = os.environ["GROQ_API_KEY"]


print("📄 Extracting text...")
text = extract_text_from_pdf(pdf_path)

print("🔗 Chunking text...")
chunks = chunk_text(text)

print("🔍 Embedding chunks...")
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
index, chunk_map = build_faiss_index(chunks, embedding_model)

print("🔎 Retrieving relevant chunks...")
relevant_chunks = retrieve_top_k_chunks(query, k=5, embedding_model=embedding_model, index=index, chunk_map=chunk_map)

print("🤖 Generating answer...")
context = "\n\n".join(relevant_chunks)
answer = generate_answer_groq(context, query, groq_api_key)

print("\n💬 Answer:")
print(answer)


📄 Extracting text...
🔗 Chunking text...
🔍 Embedding chunks...
🔎 Retrieving relevant chunks...
🤖 Generating answer...

💬 Answer:
The main findings of the paper are:

1. Coulomb's law: The electric force (F) between two charged objects is proportional to the product of the magnitudes of the charges (q1 and q2) and inversely proportional to the square of the distance (r) between them. Mathematically, this is expressed as F = k * (q1 * q2) / r^2, where k is Coulomb's constant.
2. The electric field (E) due to a charged object is independent of the test charge (q) used to measure it. This is because the force (F) is proportional to q, so the ratio F/q does not depend on q.
3. Coulomb used a torsion balance to measure the force between two charged metallic spheres and discovered the inverse square law relation, now known as Coulomb's law.
4. The paper also mentions that Coulomb found the inverse square law of force between unlike and like magnetic poles.

The paper provides a brief overvie

In [None]:
def ask_question(query):
    relevant_chunks = retrieve_top_k_chunks(query, k=5, embedding_model=embedding_model, index=index, chunk_map=chunk_map)
    context = "\n\n".join(relevant_chunks)
    answer = generate_answer_groq(context, query, groq_api_key)
    print(f"\n❓ Question: {query}")
    print("💬 Answer:", answer)



In [None]:
ask_question("Derive the formula for force between two point charges.")
ask_question("How does electric force compare with gravitational force between two protons?")
ask_question("Explain the principle of superposition of electric forces.")



❓ Question: Derive the formula for force between two point charges.
💬 Answer: Based on the given context, we can derive the formula for the force between two point charges using Coulomb's law.

From the given equation:

F12 = F21 = ε0q1q2/r21

We can write the force on q1 due to q2 as:

F12 = ε0q1q2/r21

And the force on q2 due to q1 as:

F21 = ε0q2q1/r12

Since r12 = -r21, we can rewrite F21 as:

F21 = ε0q2q1/(-r21) = -ε0q1q2/r21

The negative sign indicates that the force on q2 due to q1 is in the opposite direction to the force on q1 due to q2.

Now, we can combine the two forces to get the net force on q1:

F1 = F12 - F21 = ε0q1q2/r21 + ε0q1q2/r21 = 2ε0q1q2/r21

Similarly, we can combine the two forces to get the net force on q2:

F2 = F21 - F12 = -ε0q1q2/r21 + ε0q1q2/r21 = -2ε0q1q2/r21

The net force on q1 is in the direction of r21, and the net force on q2 is in the direction of r12.

The magnitude of the force can be written as:

F = |F1| = |F2| = 2ε0q1q2/r21

This is the formu