#  PDF Question Answering using Shivaay API (RAG-like)

This notebook lets you **ask questions about a PDF document** using a pipeline that:
1. Extracts and chunks text from a PDF (local or URL).
2. Selects the chunks most relevant to your query (via keyword overlap).
3. Sends the query and context to the **Shivaay LLM API**.
4. Returns a grounded, context-based answer.

API used: `https://api.futurixai.com/api/shivaay/v1/chat/completions`  
You can ask any question about the PDF (e.g., a report, textbook, or whitepaper).  
Make sure to install `PyPDF2` before running this notebook:  
```bash
pip install PyPDF2
```

In [1]:
pip install PyPDF2



###  Step 1: Imports and Config
We import necessary libraries:
- `requests`: to download files or call APIs.
- `PyPDF2`: for extracting text from PDFs.
- `io`, `re`: for handling file content and text cleaning.

We also define our **API key** and the **Shivaay API endpoint**.


In [2]:
import requests
import PyPDF2
import io
import re

API_KEY = "YOUR-API-KEY"  # Replace with your Shivaay API key
API_URL = "https://api.futurixai.com/api/shivaay/v1/chat/completions"
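
Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `SHIVAAY_API_KEY` and the helper `load_api_key` are hypothetical choices for illustration, not something the Shivaay API requires:

```python
import os

def load_api_key(env_var="SHIVAAY_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing."""
    # `env_var` is a hypothetical variable name; use whatever your setup defines.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, `API_KEY = load_api_key()` could replace the literal string in the cell above.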


###  Step 2: PDF Text Extraction (Generator)
This function:
- Accepts a **local PDF file** or a **URL**.
- Uses PyPDF2 to extract text **page by page**.
- Cleans whitespace using regex.
- Yields the text **one page at a time** (as a generator), so the full document text is never held in memory at once.


In [3]:
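def extract_text_generator(pdf_path_or_url, is_url=False):
    """Yield whitespace-normalized text for each page of a PDF (local path or URL)."""
    file = None
    try:
        if is_url:
            # Download the PDF into memory; fail fast on HTTP errors or a stalled connection.
            response = requests.get(pdf_path_or_url, timeout=30)
            response.raise_for_status()
            file = io.BytesIO(response.content)
        else:
            file = open(pdf_path_or_url, 'rb')

        reader = PyPDF2.PdfReader(file)

        for page in reader.pages:
            page_text = page.extract_text()
            if page_text:
                # Collapse runs of whitespace (including newlines) into single spaces.
                cleaned = re.sub(r'\s+', ' ', page_text).strip()
                yield cleaned

    except Exception as e:
        print(f"❌ Error: {e}")
        return
    finally:
        # Always release the file handle or in-memory buffer.
        if file is not None:
            file.close()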
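
The whitespace clean-up step can be checked without a PDF. This small sketch applies the same regular expression to a made-up string; the helper name `clean_page_text` is hypothetical and only wraps the `re.sub` call used above:

```python
import re

def clean_page_text(page_text):
    """Collapse every run of whitespace into one space and trim the ends."""
    return re.sub(r'\s+', ' ', page_text).strip()

sample = "  Rank   structure\nof the\t\tarmy \n"
print(clean_page_text(sample))  # Rank structure of the army
```

Newlines and tabs are treated like spaces, so line breaks inside a page do not survive into the chunks.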


###  Step 3: Chunking the PDF text
This function takes the page-wise generator and:
- Combines text into chunks of `chunk_size` (default 1000 characters).
- Uses **overlap** (default 200 chars) to preserve context across chunks.
- Yields each chunk as a string.


In [4]:
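def chunk_generator(text_gen, chunk_size=1000, overlap=200):
    # Guard against a window that never advances, which would loop forever.
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    buffer = ""
    for page_text in text_gen:
        buffer += page_text + " "
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            # Keep the last `overlap` characters so the next chunk repeats them.
            buffer = buffer[chunk_size - overlap:]
    if buffer:
        yield buffer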
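
To see how the overlap behaves, here is a small sketch that runs the same sliding-window logic on two made-up "pages" with tiny sizes. `chunk_generator_demo` is a hypothetical copy of the function above, renamed so the example runs on its own without replacing the real one:

```python
def chunk_generator_demo(text_gen, chunk_size=10, overlap=3):
    """Same sliding-window logic as chunk_generator, with tiny defaults."""
    buffer = ""
    for page_text in text_gen:
        buffer += page_text + " "
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            # Keep the last `overlap` characters so the next chunk repeats them.
            buffer = buffer[chunk_size - overlap:]
    if buffer:
        yield buffer

pages = ["abcdefghij", "klmnopqrst"]  # made-up "pages"
demo_chunks = list(chunk_generator_demo(pages))
for chunk in demo_chunks:
    print(repr(chunk))
# 'abcdefghij'
# 'hij klmnop'
# 'nopqrst '
```

Each chunk starts with the last three characters of the previous one, and the final chunk is whatever remains in the buffer, so it can be shorter than `chunk_size`.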


###  Step 4: Shivaay API Call with Context
This function sends a user query and the selected document chunks to the Shivaay API:
- Prompts it to answer **only using the document context**.
- Sets a low temperature (`0.2`) to favor consistent, factual answers.
- Caps the reply at `max_tokens=300` by default (pass a different value to change it).


In [5]:
def query_shivaay_with_context(query, context, max_tokens=300):
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }

    payload = {
        "model": "shivaay",
        "messages": [
            {
                "role": "system",
                "content": f"""Answer questions using ONLY this PDF content:
                {context}

                Respond "Not mentioned in document" for unrelated questions."""
            },
            {
                "role": "user",
                "content": query
            }
        ],
        "temperature": 0.2,
        "max_tokens": max_tokens
    }

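    try:
        response = requests.post(API_URL, headers=headers, json=payload, timeout=15)
        response.raise_for_status()
        return response.json()['choices'][0]['message']['content']
    except requests.exceptions.RequestException as e:
        return f"❌ API Error: {str(e)}"
    except (KeyError, IndexError, TypeError, ValueError) as e:
        # The response body was not in the expected chat-completions shape.
        return f"❌ Unexpected API response: {e!r}"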
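
Before spending an API call, it can help to look at the request body. The sketch below rebuilds the same payload in a hypothetical helper, `build_payload`, and prints it as JSON. It makes no network request, and the field names are simply those used in the function above rather than a statement of the full Shivaay API:

```python
import json

def build_payload(query, context, max_tokens=300):
    """Assemble the chat-completions request body without sending it."""
    system_prompt = (
        "Answer questions using ONLY this PDF content:\n"
        f"{context}\n\n"
        'Respond "Not mentioned in document" for unrelated questions.'
    )
    return {
        "model": "shivaay",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        "temperature": 0.2,
        "max_tokens": max_tokens,
    }

payload = build_payload("Who wrote the report?", "Sample context text.")
print(json.dumps(payload, indent=2))
```

Printing the payload makes it easy to confirm that the document context lands in the system message and the question in the user message.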


###  Step 5: Extract and Chunk the PDF
We:
- Set the PDF path (either a local file or URL).
- Use our extract function to stream the PDF text.
- Chunk the text using the function above.
- Print how many chunks were created.


In [6]:
pdf_source = "/content/indian army.pdf"  # Path or URL of your PDF file
is_url = pdf_source.startswith(('http://', 'https://'))

print("🔃 Extracting as stream...")
text_gen = extract_text_generator(pdf_source, is_url)

print("🔄 Creating chunks...")
chunks = list(chunk_generator(text_gen))
print(f"✅ Created {len(chunks)} chunks.")


🔃 Extracting as stream...
🔄 Creating chunks...
✅ Created 23 chunks.


###  Step 6: Relevance Filtering
This helper function scores each chunk by the number of distinct words it shares with the query.
It returns the `top_k` highest-scoring chunks and skips any chunk with no overlap.


In [7]:
def find_relevant_chunks(query, chunks, top_k=3):
    # Distinct lowercase words in the query.
    query_words = set(re.findall(r'\w+', query.lower()))
    scored = []
    for chunk in chunks:
        chunk_words = set(re.findall(r'\w+', chunk.lower()))
        # Score = number of distinct words shared with the query.
        score = len(query_words & chunk_words)
        scored.append((score, chunk))
    # Sort by score only; the stable sort keeps document order among ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score > 0]
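
The scoring step can be tried on a few made-up sentences. In this sketch, `overlap_score` is a hypothetical helper that isolates the per-chunk score computed inside `find_relevant_chunks`:

```python
import re

def overlap_score(query, chunk):
    """Count the distinct words shared by the query and the chunk."""
    query_words = set(re.findall(r'\w+', query.lower()))
    chunk_words = set(re.findall(r'\w+', chunk.lower()))
    return len(query_words & chunk_words)

query = "Explain the rank system"
toy_chunks = [
    "The rank system has several tiers.",
    "Helicopters were introduced later.",
    "Explain how the system of rank works.",
]
for chunk in toy_chunks:
    print(overlap_score(query, chunk), chunk)
# 3 The rank system has several tiers.
# 0 Helicopters were introduced later.
# 4 Explain how the system of rank works.
```

Common words such as "the" count toward the score, so a chunk can rank highly without being on topic. Keyword overlap is a cheap filter, not a semantic search.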


###  Step 7: User Question and Context Filtering
We:
- Take a user question as input.
- Use the helper defined in Step 6 to find the most **relevant chunks**.
- Join those chunks into a single context string for the API.


In [8]:
user_question = input("💬 Your question about the document: ")
relevant_chunks = find_relevant_chunks(user_question, chunks)
context = "\n\n".join(relevant_chunks)


💬 Your question about the document: Explain the rank system in Indian Army
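
If no chunk shares a word with the question, `find_relevant_chunks` returns an empty list and `context` becomes an empty string. A minimal sketch of one way to handle that case, using a hypothetical helper `build_context` that also caps the context length; the 4000-character limit is an arbitrary assumption, not a documented Shivaay limit:

```python
def build_context(relevant_chunks, max_chars=4000):
    """Join chunks into one context string, or return None if there are none."""
    if not relevant_chunks:
        return None
    context = "\n\n".join(relevant_chunks)
    # Truncate so the prompt stays within an assumed size budget.
    return context[:max_chars]

print(build_context([]))                          # None
print(repr(build_context(["first", "second"])))   # 'first\n\nsecond'
```

When the helper returns `None`, the notebook could skip the API call and report that no matching passage was found.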


###  Step 8: Generate Answer from Shivaay
- Sends the user question and context to the API.
- Prints the final answer.
- Optionally shows part of the context that was sent.


In [9]:
print("🤖 Consulting Shivaay...")
answer = query_shivaay_with_context(user_question, context)

print("\n📝 Answer:")
print(answer)

print("\n🔗 Context Preview:")
print(context[:500] + "...")


🤖 Consulting Shivaay...

📝 Answer:
The rank structure in the Indian Army is divided into two main categories: Commissioned Officers and Junior Commissioned Officers/Non-Commissioned Officers (JCOs/NCOs).

For Commissioned Officers, the ranks from highest to lowest are:
- Field Marshal
- General
- Lieutenant General
- Major General
- Brigadier
- Colonel
- Lieutenant Colonel
- Major
- Captain
- Lieutenant

For Junior Commissioned Officers/Non-Commissioned Officers, the ranks from highest to lowest are:
- Subedar Major
- Subedar
- Naib Subedar
- Havildar Major
- Havildar
- Naik
- Lance Naik
- Jawan

These ranks form the hierarchical structure that ensures command and control within the Indian Army.

🔗 Context Preview:
t, major changes made were with the introduction of helicopters, BOFORS Guns and track vehicles such as BMP and T72 tanks. The changes in the Armed Forces were to make it a modern war-fighting machine and to win the wars. Intext Questions 13.1 1. Where did Indians attend mil