<a href="https://colab.research.google.com/github/navneetkrc/Open_LLM_Apps/blob/main/QA_pair_generation_using_GEMINI_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q PyMuPDF

In [None]:
import google.generativeai as genai
from google.colab import userdata
apikey=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=apikey)
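The cell above reads the key from Colab's Secrets panel through `userdata.get`, which only works inside Colab. Below is a minimal sketch of a fallback. It assumes you export `GOOGLE_API_KEY` as an environment variable when you run the notebook elsewhere. `get_api_key` is a hypothetical helper and is not part of either library.

```python
import os

def get_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from Colab Secrets when available, otherwise from an environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing / not granted to this notebook
        pass
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set {name} as a Colab secret or as an environment variable.")
    return key
```

With this helper, the configuration line becomes `genai.configure(api_key=get_api_key())`.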

In [36]:
generated_text = """## Navigating Samsung Microwave Support: A Comprehensive Guide

Samsung microwaves are ubiquitous in modern kitchens, offering convenience and speed for cooking and reheating. However, like any appliance, they can occasionally encounter issues. Fortunately, Samsung provides extensive support documentation to help users troubleshoot problems and keep their microwaves running smoothly. This guide explores the various resources available and outlines common troubleshooting steps.

**Understanding the Importance of Support Documentation**

Before contacting customer support, consulting the available documentation can often resolve common issues quickly and efficiently. Samsung's support resources typically include:

*   **User Manuals:** These are essential guides provided with every new microwave. They contain detailed instructions on operation, safety precautions, cleaning procedures, and basic troubleshooting.
*   **Online Support Pages:** Samsung's website offers dedicated support pages for each microwave model. These pages often include FAQs, troubleshooting guides, how-to videos, software updates (if applicable), and downloadable manuals.
*   **Samsung Community Forums:** These forums provide a platform for users to connect with each other, share experiences, and find solutions to common problems. Samsung representatives may also participate in these forums.

**Common Microwave Issues and Troubleshooting Steps**

Here are some common problems encountered with Samsung microwaves and the steps users can take to resolve them, often found within the support documentation:

**1. Microwave Not Heating:**

*   **Power Supply:** The most basic check is to ensure the microwave is properly plugged into a functioning power outlet. Check the circuit breaker or fuse box to rule out power supply issues. Inspect the power cord for any visible damage.
*   **Door Interlock System:** Microwaves have safety interlocks that prevent operation when the door is open. Ensure the door is fully closed and that there are no obstructions preventing it from latching correctly. Check the door seals for cleanliness and integrity.
*   **Magnetron:** The magnetron generates the microwaves that heat food. If the above steps don't resolve the issue, the magnetron may be faulty. This usually requires professional repair.
*   **High-Voltage Diode and Capacitor:** These components are part of the high-voltage circuit that powers the magnetron. A faulty diode or capacitor can prevent heating. These components should only be inspected and repaired by qualified technicians due to the high voltages involved.
*   **Control Panel Issues:** A malfunctioning control panel might prevent the microwave from starting or heating. This could be due to a software glitch or a hardware problem.

**2. Microwave Not Turning On:**

*   **Power Supply:** As above, check the power outlet, circuit breaker, and power cord.
*   **Control Panel Lock:** Some microwaves have a control panel lock feature to prevent accidental operation. Check the user manual to see if this feature is enabled and how to disable it.
*   **Internal Fuse:** Some microwaves have internal fuses that protect the electronic components. If the microwave is completely unresponsive, an internal fuse may have blown. This requires professional repair.

**3. Arcing or Sparks Inside the Microwave:**

*   **Metal Objects:** Never place metal objects inside a microwave. This can cause arcing and sparks, which can damage the microwave and pose a fire hazard.
*   **Damaged Waveguide Cover:** The waveguide cover protects the magnetron from food splatters. If this cover is damaged, it can cause arcing. Replace the damaged cover.
*   **Food Debris:** Food debris inside the microwave can also cause arcing. Clean the interior thoroughly.

**4. Turntable Not Rotating:**

*   **Turntable Track:** Ensure the turntable is properly placed on the track. Check for any obstructions that might be preventing it from rotating.
*   **Turntable Motor:** If the turntable is properly placed and there are no obstructions, the turntable motor may be faulty.

**5. Error Codes Displayed:**

*   **Consult the User Manual:** When an error code is displayed, consult the user manual. It will provide a description of the error and suggested troubleshooting steps.
*   **Online Support:** Samsung's online support pages often have a database of error codes and their meanings.

**Utilizing Online Resources Effectively**

When using Samsung's online support resources, having the model number of your microwave is crucial. This information is usually found on a sticker inside the microwave door or on the back of the unit. With the model number, you can access specific information for your microwave, including:

*   **Downloadable User Manuals:** If you've misplaced your physical manual, you can download a digital copy from the website.
*   **Troubleshooting Guides:** These guides provide step-by-step instructions for resolving common problems.
*   **FAQs:** Frequently asked questions can often provide quick answers to common queries.
*   **How-to Videos:** Visual guides can be particularly helpful for tasks like cleaning or replacing parts.

**Contacting Samsung Support**

If the available documentation doesn't resolve the issue, contacting Samsung support is the next step. They can provide further assistance and, if necessary, arrange for repairs. When contacting support, be prepared to provide the following information:

*   **Model Number:** This allows the support representative to access specific information about your microwave.
*   **Serial Number:** This helps track your microwave's history.
*   **Description of the Problem:** Be as specific as possible when describing the issue you are experiencing.

By utilizing Samsung's comprehensive support documentation and following the troubleshooting steps outlined above, users can often resolve common microwave issues quickly and efficiently, minimizing downtime and maximizing the lifespan of their appliance. Remember to always prioritize safety and consult a qualified technician for any repairs involving high-voltage components or internal mechanisms.
"""

In [37]:
import google.generativeai as genai
content = generated_text

# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

model = genai.GenerativeModel('gemini-1.5-flash')
response = model.generate_content(f"generate question-answer pairs from content {content}")
print(response.text)

## Samsung Microwave Support: Question-Answer Pairs

Here are some question-answer pairs based on the provided text:

**General Support:**

* **Q: Where can I find support documentation for my Samsung microwave?**
* **A:** You can find support documentation in your user manual, on Samsung's online support pages (specific to your model number), and in Samsung community forums.

* **Q: Why is it important to consult support documentation before contacting customer support?**
* **A:** Consulting the documentation often resolves common issues quickly and efficiently, saving you time and potentially avoiding the need for a service call.

* **Q: What information is crucial when using Samsung's online support resources?**
* **A:**  Your microwave's model number is crucial to access specific information and troubleshooting guides.


**Troubleshooting Specific Issues:**

* **Q: My Samsung microwave isn't heating. What should I check?**
* **A:** First, check the power supply (outlet, circuit bre
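The raw output printed above wraps questions and answers in Markdown bullets and bold markers (`* **Q: ...**`, `* **A:** ...`), and other runs number them (`**Q1:`). Splitting on a bare `Q:` prefix therefore misses these lines. The sketch below uses a regex that tolerates those markers. It assumes each question and each answer sits on a single line, so an answer that spans several lines keeps only its first line. `parse_markdown_qa` is a hypothetical helper.

```python
import re

# Matches "* **Q: question?**", "* **A:** answer", "**Q1: ...**" or plain "Q: ...":
# optional bullet, optional bold, Q/A tag with optional number, then the text.
QA_LINE = re.compile(r"^\s*(?:[*-]\s*)?(?:\*\*)?([QA])\d*:(?:\*\*)?\s*(.*?)\s*(?:\*\*)?\s*$")

def parse_markdown_qa(text: str) -> list[dict]:
    """Pair each Q line with the next A line in Markdown-styled model output."""
    pairs, question = [], None
    for line in text.splitlines():
        match = QA_LINE.match(line)
        if not match:
            continue  # headings, blank lines and prose are skipped
        tag, body = match.groups()
        if tag == "Q":
            question = body
        elif question is not None:
            pairs.append({"question": question, "answer": body})
            question = None
    return pairs
```

In the notebook, this would be called as `parse_markdown_qa(response.text)`.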

In [38]:
import google.generativeai as genai

# apikey=userdata.get('GOOGLE_API_KEY')  # You'll need to set your API key
# genai.configure(api_key=apikey)

content = generated_text

def generate_qa_pairs(text):
    """
    Generates question-answer pairs from a given text using a generative AI model.

    This method utilizes Google's Gemini model to extract potential questions and their corresponding
    answers directly from the provided text content. The goal is to create a set of informative
    Q&A pairs that accurately reflect the information within the text.

    Args:
        text: The input string containing the content from which to generate questions and answers.

    Returns:
        str: The generated question-answer pairs formatted as a string.
    """
    model = genai.GenerativeModel('gemini-1.5-flash')
    prompt = f"Generate question-answer pairs based on the following text:\n\n{text}"
    response = model.generate_content(prompt)
    return response.text

def generate_structured_qa(text):
    """
    Generates more structured and detailed question-answer pairs from a given text.

    This method aims to create question-answer pairs that are more focused and insightful.
    It instructs the AI model to specifically identify problems and their corresponding solutions
    within the text, presenting them in a clear and structured format. This approach leads to
    more targeted and helpful Q&A compared to a general extraction.

    Args:
        text: The input string containing the content to analyze.

    Returns:
        str: The generated question-answer pairs, often including more detailed answers and
             potential follow-up information, formatted as a string.
    """
    model = genai.GenerativeModel('gemini-1.5-flash')
    prompt = f"""
    Based on the following text, generate question-answer pairs. Focus on identifying
    potential problems and the corresponding solutions mentioned in the text. Format
    the output with clear questions and answers.

    Text:
    {text}
    """
    response = model.generate_content(prompt)
    return response.text

def generate_step_by_step_qa(text):
    """
    Generates question-answer pairs focusing on extracting procedural steps from the text.

    This function is designed to identify and present sequential steps outlined in the input text.
    It prompts the AI model to understand the order of operations and frame questions around
    these steps. Note that the prompt below is written specifically for heating issues in a
    Samsung microwave; adjust it before using this function on other documents.

    Args:
        text: The input string containing the content with sequential steps.

    Returns:
        str: The generated question-answer pairs focusing on steps, formatted as a string.
    """
    model = genai.GenerativeModel('gemini-1.5-flash')
    prompt = f"""
    From the following text, extract the step-by-step instructions for fixing a
    heating issue in a Samsung microwave. Generate question-answer pairs where
    the question asks what the next step is at a particular point, or what a specific
    step involves.

    Text:
    {text}
    """
    response = model.generate_content(prompt)
    return response.text

# Example Usage:
print("--- Simple Question-Answer Pairs ---")
simple_qa = generate_qa_pairs(content)
print(simple_qa)

print("\n--- Structured Question-Answer Pairs ---")
structured_qa = generate_structured_qa(content)
print(structured_qa)

print("\n--- Step-by-Step Question-Answer Pairs ---")
step_qa = generate_step_by_step_qa(content)
print(step_qa)

--- Simple Question-Answer Pairs ---
## Samsung Microwave Support: Question-Answer Pairs

**I. Understanding Support Documentation:**

**Q1: What are the primary resources Samsung provides for microwave support?**
**A1:** Samsung offers user manuals, online support pages (including FAQs, troubleshooting guides, how-to videos, and software updates), and community forums.

**Q2: Why is it important to consult support documentation before contacting customer support?**
**A2:** Consulting documentation often resolves common issues quickly and efficiently, saving time and potentially avoiding the need for a service call.

**II. Common Microwave Issues and Troubleshooting:**

**Q3: My Samsung microwave isn't heating. What are some possible causes?**
**A3:**  Possible causes include power supply issues (outlet, circuit breaker, cord damage), a faulty door interlock system, a malfunctioning magnetron, problems with the high-voltage diode or capacitor, or control panel issues.

**Q4: My microwa
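The three functions above differ only in their prompt text, and each one creates its own model. The sketch below is one way to fold them into a single function. It takes the model call as a parameter, so you can test it without an API key. `PROMPT_TEMPLATES` and `generate_qa` are hypothetical names. The `step_by_step` template is a generalized version of the original prompt, which is hard-coded to Samsung heating issues.

```python
from typing import Callable

# Hypothetical templates mirroring the three prompts above; {text} is the only placeholder.
PROMPT_TEMPLATES = {
    "simple": "Generate question-answer pairs based on the following text:\n\n{text}",
    "structured": (
        "Based on the following text, generate question-answer pairs. Focus on identifying "
        "potential problems and the corresponding solutions mentioned in the text. Format "
        "the output with clear questions and answers.\n\nText:\n{text}"
    ),
    "step_by_step": (
        "From the following text, extract the step-by-step instructions. Generate "
        "question-answer pairs where the question asks what the next step is at a particular "
        "point, or what a specific step involves.\n\nText:\n{text}"
    ),
}

def generate_qa(text: str, style: str, generate: Callable[[str], str]) -> str:
    """Fill the chosen template and pass the prompt to any text-generation callable."""
    if style not in PROMPT_TEMPLATES:
        raise ValueError(f"Unknown style {style!r}; choose from {sorted(PROMPT_TEMPLATES)}")
    prompt = PROMPT_TEMPLATES[style].format(text=text)
    return generate(prompt)
```

In the notebook, `generate` could be `lambda p: model.generate_content(p).text`, with `model = genai.GenerativeModel('gemini-1.5-flash')` created once and reused.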

In [39]:
import google.generativeai as genai

# Configure API key for Google Generative AI
# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

# Content to generate question-answer pairs from
content=generated_text
# Specify the number of QA pairs you want
num_pairs = 3

# Create the GenerativeModel instance
model = genai.GenerativeModel('gemini-1.5-flash')

# Generate content with the specified prompt
response = model.generate_content(
    f"Generate {num_pairs} question-answer pairs from the following content. Provide the output in the form Q: question text\n A: answer text\n\nContent: {content}"
)

# Print the generated content after removing Markdown formatting
output = response.text.replace('*', '').replace('#', '')
print(output)


Q: What are the key resources provided by Samsung to support its microwave users?
A: Samsung offers user manuals, online support pages (including FAQs, troubleshooting guides, how-to videos, and software updates), and community forums where users can connect and share solutions.

Q: What are the troubleshooting steps for a Samsung microwave that is not heating?
A: First, check the power supply (outlet, breaker, cord). Then, ensure the door is fully closed and the door seals are clean.  If these steps fail, the magnetron, high-voltage diode, capacitor, or control panel may be faulty, requiring professional repair.

Q: How can I effectively use Samsung's online support resources?
A:  Knowing your microwave's model number is crucial. This allows you to access specific downloadable manuals, troubleshooting guides, FAQs, and how-to videos for your model on the Samsung website.
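The cell above strips every `*` and `#` from the response. That also deletes characters that belong to the content, such as `C#`, `#1` or `2 * 3`. The sketch below takes a more targeted approach and removes only bold markers, leading headings and leading bullets. It covers just the Markdown constructs seen in these outputs and is not a full Markdown parser. `strip_markdown` is a hypothetical helper.

```python
import re

def strip_markdown(text: str) -> str:
    """Remove **bold**, leading '#' headings and leading '*'/'-' bullets; keep other '*' and '#'."""
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                      # **bold** -> bold
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.M)   # '## Heading' -> 'Heading'
    text = re.sub(r"^([ \t]*)[*-][ \t]+", r"\1", text, flags=re.M)    # '* item' -> 'item'
    return text
```

In the notebook, this would replace the chained `.replace()` calls: `output = strip_markdown(response.text)`.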



In [40]:
import google.generativeai as genai
import json

# Configure API key for Google Generative AI (ensure this is set up correctly)
# apikey = userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

def generate_qa_pairs(content: str, num_pairs: int = 3, format_output: str = 'plain') -> str | list[dict]:
    """
    Generates question-answer pairs from the given text content using a generative AI model.

    This function leverages Google's Gemini model to analyze the provided text and extract
    meaningful question-answer pairs. It offers flexibility in terms of the number of pairs
    generated and the output format (plain text or JSON).

    Args:
        content: The string containing the text from which to generate question-answer pairs.
        num_pairs: An integer specifying the desired number of question-answer pairs to generate.
                   Defaults to 3.
        format_output: A string indicating the desired output format.
                       - 'plain': Outputs the question-answer pairs in a human-readable plain text format.
                       - 'json': Outputs the question-answer pairs as a JSON array of objects.
                       Defaults to 'plain'.

    Returns:
        str | list[dict]: The generated question-answer pairs. Returns a string if `format_output` is 'plain',
                    and a list of dictionaries if `format_output` is 'json'.

    Raises:
        TypeError: If `content` is not a string or `num_pairs` is not an integer.
        ValueError: If `num_pairs` is not a positive integer or `format_output` is not 'plain' or 'json'.
        RuntimeError: If an error occurs during the interaction with the generative AI model.

    Example Usage:
        # Generate 5 QA pairs in plain text format
        plain_output = generate_qa_pairs("The capital of France is Paris.", num_pairs=5, format_output='plain')
        print(plain_output)

        # Generate 2 QA pairs in JSON format
        json_output = generate_qa_pairs("The Earth revolves around the Sun.", num_pairs=2, format_output='json')
        print(json_output)
    """

    # Input validation
    if not isinstance(content, str):
        raise TypeError("content must be a string.")
    if not isinstance(num_pairs, int):
        raise TypeError("num_pairs must be an integer.")
    if num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")
    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be either 'plain' or 'json'.")

    try:
        # Initialize the generative model
        model = genai.GenerativeModel('gemini-1.5-flash')

        # Construct the prompt for plain text output
        prompt = f"Generate {num_pairs} question-answer pairs from the following content.\n\nContent:\n{content}\n\nFormat the output as follows:\nQ: [question]\nA: [answer]"

        response = model.generate_content(prompt)
        output_text = response.text.replace('*', '').replace('#', '')  # Remove potential markdown artifacts

        if format_output == 'plain':
            return output_text
        elif format_output == 'json':
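            # Split into non-blank, stripped lines so a blank line between
            # "Q:" and "A:" does not break the pairing
            lines = [line.strip() for line in output_text.splitlines() if line.strip()]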
            qa_pairs = []
            i = 0
            while i < len(lines):
                if lines[i].startswith('Q:'):
                    question = lines[i][2:].strip()
                    if i + 1 < len(lines) and lines[i + 1].startswith('A:'):
                        answer = lines[i + 1][2:].strip()
                        qa_pairs.append({"question": question, "answer": answer})
                        i += 2
                    else:
                        # No corresponding answer found
                        i += 1
                else:
                    i += 1
            return qa_pairs

    except Exception as e:
        raise RuntimeError(f"An error occurred during QA pair generation: {e}") from e

# Example usage with explanations
content = generated_text

# Generate QA pairs in plain text format
print("--- Generating QA pairs in plain text format ---")
plain_qa_pairs = generate_qa_pairs(content, num_pairs=3, format_output='plain')
print(plain_qa_pairs)
# Explanation:
# The 'generate_qa_pairs' function is called with 'content', requesting 3 QA pairs
# and specifying 'plain' as the output format. The function returns a single string
# with each question and answer in a human-readable "Q: ... / A: ..." layout.

# Generate QA pairs in JSON format
print("\n--- Generating QA pairs in JSON format ---")
json_qa_pairs = generate_qa_pairs(content, num_pairs=5, format_output='json')
# json_qa_pairs is a list of dictionaries; serialize it as indented JSON for display
formatted_json = json.dumps(json_qa_pairs, indent=4)

# Print the formatted JSON string
print(formatted_json)
# Explanation:
# Here the function is called again with the same 'content', but this time requesting 5 QA pairs
# and setting 'format_output' to 'json'. The function generates plain-text QA pairs and then
# parses them into a list of dictionaries, which json.dumps renders as a JSON array.

--- Generating QA pairs in plain text format ---
Q: What are the primary resources Samsung provides for microwave troubleshooting?
A: Samsung offers user manuals, online support pages (including FAQs, troubleshooting guides, and how-to videos), and community forums for users to find solutions and connect with others experiencing similar issues.

Q: My Samsung microwave isn't heating. What are some initial troubleshooting steps I should take?
A: First, check the power supply (outlet, circuit breaker, power cord).  Then, ensure the microwave door is fully closed and the door seals are clean and intact.  If these checks don't resolve the problem, the magnetron or other high-voltage components may be faulty, requiring professional repair.

Q:  What information is crucial when seeking online support from Samsung for my microwave?
A:  You need your microwave's model number and serial number.  Providing a clear and detailed description of the problem is also essential for efficient support.
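The `'json'` branch above pairs a `Q:` line only with the line immediately after it, so an answer that wraps onto a second line gets truncated. The sketch below is a more tolerant parser. It assumes the same `Q:`/`A:` prefixes and appends any unprefixed line to the question or answer currently being read. `parse_plain_qa` is a hypothetical helper.

```python
def parse_plain_qa(text: str) -> list[dict]:
    """Parse 'Q: ...' / 'A: ...' blocks; continuation lines extend the current field."""
    pairs = []
    current = None  # the pair being built
    field = None    # "question" or "answer": where continuation lines go
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between or inside pairs
        if line.startswith("Q:"):
            if current and current["answer"]:
                pairs.append(current)
            current, field = {"question": line[2:].strip(), "answer": ""}, "question"
        elif line.startswith("A:") and current is not None:
            current["answer"], field = line[2:].strip(), "answer"
        elif current is not None and field:
            current[field] = (current[field] + " " + line).strip()
    if current and current["answer"]:
        pairs.append(current)  # a trailing question without an answer is dropped
    return pairs
```

The result can be printed exactly like the cell above: `print(json.dumps(parse_plain_qa(output_text), indent=4))`.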



In [41]:
import google.generativeai as genai
import json
import re

def clean_json_response(response_text: str) -> str:
    """
    Clean and extract valid JSON from the model's response text.

    Args:
        response_text: Raw response text from the model

    Returns:
        Cleaned JSON string
    """
    # Find JSON array pattern (anything between [ and ])
    json_match = re.search(r'\[[\s\S]*\]', response_text)
    if json_match:
        json_str = json_match.group(0)
        # Remove any markdown code block markers
        json_str = re.sub(r'```json|```', '', json_str)
        # Clean up any remaining whitespace
        json_str = json_str.strip()
        return json_str
    return response_text

def generate_qa_pairs(content: str, num_pairs: int = 3, format_output: str = 'plain') -> str | list[dict]:
    """
    Generates question-answer pairs from the given text content using a generative AI model.

    Args:
        content: The string containing the text from which to generate question-answer pairs.
        num_pairs: An integer specifying the desired number of question-answer pairs to generate.
                   Defaults to 3.
        format_output: Output format ('plain' or 'json'). Defaults to 'plain'.

    Returns:
        str | list[dict]: The generated question-answer pairs.

    Raises:
        TypeError: If input types are invalid
        ValueError: If input values are invalid
        RuntimeError: If an error occurs during generation
    """
    # Input validation
    if not isinstance(content, str):
        raise TypeError("content must be a string.")
    if not isinstance(num_pairs, int):
        raise TypeError("num_pairs must be an integer.")
    if num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")
    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be either 'plain' or 'json'.")

    try:
        # Initialize the generative model
        model = genai.GenerativeModel('gemini-1.5-flash')

        if format_output == 'plain':
            prompt = f"""Generate {num_pairs} question-answer pairs from the following content.
            Format each pair exactly as shown:
            Q: [question]
            A: [answer]

            Content:
            {content}"""

            response = model.generate_content(prompt)
            output_text = response.text.strip()
            # Clean up potential markdown artifacts while preserving Q: and A: format
            output_text = re.sub(r'[*#`]', '', output_text)
            return output_text

        else:  # json format
            prompt = f"""Generate exactly {num_pairs} question-answer pairs from this content and return them in valid JSON format.
            The response must be a JSON array containing objects with 'question' and 'answer' keys.
            Do not include any other text, explanations, or markdown formatting.

            Content:
            {content}

            Required JSON structure:
            [
              {{
                "question": "Sample question?",
                "answer": "Sample answer."
              }}
            ]"""

            max_retries = 3
            for attempt in range(max_retries):
                try:
                    response = model.generate_content(prompt)

                    if not response.text or not response.text.strip():
                        if attempt == max_retries - 1:
                            return []
                        continue

                    # Clean and extract JSON from response
                    cleaned_json = clean_json_response(response.text)

                    # Parse and validate JSON structure
                    qa_json = json.loads(cleaned_json)
                    if not isinstance(qa_json, list):
                        raise ValueError("Expected JSON array")

                    # Validate each QA pair
                    validated_pairs = []
                    for item in qa_json:
                        if isinstance(item, dict) and "question" in item and "answer" in item:
                            validated_pairs.append({
                                "question": str(item["question"]).strip(),
                                "answer": str(item["answer"]).strip()
                            })

                    # If we got valid pairs, return them
                    if validated_pairs:
                        return validated_pairs[:num_pairs]  # Ensure we don't return more than requested

                except (json.JSONDecodeError, ValueError) as e:
                    if attempt == max_retries - 1:
                        print(f"Failed to generate valid JSON after {max_retries} attempts. Last error: {e}")
                        print(f"Last response received:\n{response.text}")
                        return []
                    continue

            # Every attempt returned JSON without a usable question/answer pair
            return []

    except Exception as e:
        raise RuntimeError(f"An error occurred during QA pair generation: {e}") from e


# Example usage
# content = """
# Title: Fixing Heating Issues in Samsung Microwave Ovens
# [your content here]
# """

# Generate plain text QA pairs
plain_qa = generate_qa_pairs(content, num_pairs=3, format_output='plain')
print(plain_qa)

# Generate JSON QA pairs
json_qa = generate_qa_pairs(content, num_pairs=4, format_output='json')
print(json.dumps(json_qa, indent=2))

Q: What are the key resources available within Samsung's microwave support documentation?
A: Samsung's support resources include user manuals, online support pages (with FAQs, troubleshooting guides, how-to videos, and software updates), and community forums where users can share experiences and find solutions.

Q: What are some common problems with Samsung microwaves and their basic troubleshooting steps?
A: Common issues include the microwave not heating (check power supply, door interlock, magnetron, diode, capacitor, and control panel), not turning on (check power, control panel lock, and internal fuse), arcing/sparks (check for metal objects, damaged waveguide cover, and food debris), turntable not rotating (check track and motor), and error codes (consult the user manual or online support).

Q:  What information is crucial when using Samsung's online support resources or contacting customer support?
A:  Having your microwave's model number and serial number is crucial for accessi
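The retry loop above returns an empty list when parsing keeps failing. The sketch below is an alternative that raises instead and backs off exponentially between attempts. It also takes the model call as a parameter, so you can exercise it without an API key. `generate_validated_json` and its `generate` argument are hypothetical. In the notebook you would pass something like `lambda p: model.generate_content(p).text`.

```python
import json
import time
from typing import Callable

def generate_validated_json(prompt: str, generate: Callable[[str], str],
                            max_retries: int = 3, base_delay: float = 1.0) -> list[dict]:
    """Call `generate` until it yields a JSON array of {"question", "answer"} objects."""
    last_error = None
    for attempt in range(max_retries):
        try:
            raw = generate(prompt)
            # Take the outermost [...] span, which also skips ```json fences around it
            start, end = raw.find("["), raw.rfind("]")
            if start == -1 or end < start:
                raise ValueError("no JSON array found in the response")
            data = json.loads(raw[start:end + 1])
            pairs = [
                {"question": str(item["question"]).strip(), "answer": str(item["answer"]).strip()}
                for item in data
                if isinstance(item, dict) and "question" in item and "answer" in item
            ]
            if pairs:
                return pairs
            raise ValueError("the array held no question/answer objects")
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            last_error = err
            if attempt < max_retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # waits base_delay, then 2x, 4x, ...
    raise RuntimeError(f"No valid JSON after {max_retries} attempts: {last_error}")
```

Raising instead of returning `[]` lets the caller tell "the model failed" apart from "the content had nothing to ask about". Which behavior is better depends on how the results are consumed.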

In [10]:
!mkdir -p /content/results
!wget https://github.com/navneetkrc/Open_LLM_Apps/raw/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf

--2025-01-18 19:49:06--  https://github.com/navneetkrc/Open_LLM_Apps/raw/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/navneetkrc/Open_LLM_Apps/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf [following]
--2025-01-18 19:49:06--  https://raw.githubusercontent.com/navneetkrc/Open_LLM_Apps/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14826884 (14M) [application/octet-stream]
Saving to: ‘Agentic rag Survey paper.pdf’


2025-01-18 19:49:07 (125 MB/s) - ‘Agentic r
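`!mkdir` and `!wget` are shell escapes, so they only run inside IPython. The sketch below does the same job in plain Python with the standard library. It creates the destination folder and skips the download when the PDF is already present, which makes the cell safe to re-run. `ensure_download` is a hypothetical helper.

```python
import os
import urllib.request

def ensure_download(url: str, dest: str) -> str:
    """Download `url` to `dest` unless the file already exists; create parent folders first."""
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    if not os.path.exists(dest):
        # urllib follows HTTP redirects such as GitHub's 302 to raw.githubusercontent.com
        urllib.request.urlretrieve(url, dest)
    return dest
```

For this notebook, that would be `os.makedirs("/content/results", exist_ok=True)` followed by `ensure_download("https://github.com/navneetkrc/Open_LLM_Apps/raw/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf", "/content/Agentic rag Survey paper.pdf")`.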

In [22]:
# Generate QA pairs from a PDF and save them to a text file
import google.generativeai as genai
import fitz  # PyMuPDF for PDF text extraction
import json
import os
import time

## Configure API key for Google Generative AI
# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file."""
    text = ""
    try:
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)

        # Extract text from each page
        for page_num in range(len(pdf_document)):
            page = pdf_document.load_page(page_num)
            text += page.get_text()

        pdf_document.close()
        return text
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""

def generate_qa_pairs(content, num_pairs=3, format_output='plain', retry_attempts=3, save_to_file=None):
    """
    Generate question-answer pairs from the provided content.

    Parameters:
    - content (str): The text content to generate QA pairs from.
    - num_pairs (int): The number of QA pairs to generate.
    - format_output (str): The format for output ('plain' or 'json').
    - retry_attempts (int): Number of retry attempts in case of API failure.
    - save_to_file (str): Path to file where output will be saved.

    Returns:
    - str: The generated QA pairs (a JSON-formatted string when format_output is 'json'),
      or a failure message if every attempt fails.
    """
    if not isinstance(num_pairs, int) or num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")

    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be 'plain' or 'json'.")

    attempts = 0
    while attempts < retry_attempts:
        try:
            # Create the GenerativeModel instance
            model = genai.GenerativeModel('gemini-1.5-flash')

            # Generate content with the specified prompt
            prompt = (f"Generate {num_pairs} question-answer pairs from the following content. "
                      f"Provide the output in the form Q: question text\n A: answer text\n\nContent: {content}")
            response = model.generate_content(prompt)

            # Clean up the generated content
            output = response.text.replace('*', '').replace('#', '')

            # Format the output based on user preference
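            if format_output == 'json':
                # Pair each "Q:" line with the "A:" line that follows it and strip the prefixes;
                # this avoids an IndexError when the model returns an odd number of lines
                lines = [line.strip() for line in output.splitlines() if line.strip()]
                qa_json = []
                for q_line, a_line in zip(lines, lines[1:]):
                    if q_line.startswith('Q:') and a_line.startswith('A:'):
                        qa_json.append({'question': q_line[2:].strip(), 'answer': a_line[2:].strip()})
                output = json.dumps(qa_json, indent=2)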

            if save_to_file:
                with open(save_to_file, 'w', encoding='utf-8') as file:
                    file.write(output)
                print(f"Output saved to {save_to_file}")

            return output

        except Exception as e:
            attempts += 1
            print(f"Attempt {attempts} failed: {e}")
            if attempts < retry_attempts:
                time.sleep(2)  # Wait before retrying

    return "Failed to generate QA pairs after several attempts."

# Example usage
# Path of the PDF downloaded earlier
pdf_path = r"/content/Agentic rag Survey paper.pdf"
num_pairs = 3  # number of QA pairs to generate
format_output = 'plain'  # Change to 'json' for JSON output
save_to_file = r"/content/results/text.txt"  # Specify the file path to save output

# Extract text from PDF
content = extract_text_from_pdf(pdf_path)

# Generate and print QA pairs
if content:
    qa_pairs_output = generate_qa_pairs(content, num_pairs, format_output, save_to_file=save_to_file)
    print(qa_pairs_output)
else:
    print("No content extracted from PDF.")


Output saved to /content/results/text.txt
Q: What is Agentic Retrieval-Augmented Generation (Agentic RAG), and how does it improve upon traditional RAG systems?
A: Agentic RAG integrates autonomous AI agents into the Retrieval-Augmented Generation (RAG) pipeline.  Unlike traditional RAG, which has static workflows, Agentic RAG uses agents that leverage agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, refine contextual understanding, and adapt workflows for complex tasks. This results in unparalleled flexibility, scalability, and context-awareness.


Q: What are the key agentic patterns used in Agentic RAG systems?
A: The four key agentic patterns are Reflection (iteratively evaluating and refining outputs), Planning (decomposing complex tasks into smaller subtasks), Tool Use (interacting with external tools and resources), and Multi-Agent Collaboration (distributing tasks among specialized agents).  Thes
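Pairing non-empty lines purely by position (question at index `i`, answer at `i + 1`) misaligns as soon as the model emits one extra line, and raises `IndexError` on an odd line count. Below is a minimal sketch of prefix-based pairing. It assumes the model marks lines with `Q:` and `A:` as the prompt requests. `pair_by_prefix` is a hypothetical helper, not part of the notebook:

```python
def pair_by_prefix(raw: str) -> list[dict[str, str]]:
    """Pair each 'Q:' line with the next 'A:' line; questions without an answer are dropped."""
    pairs = []
    question = None
    for line in raw.splitlines():
        # Tolerate markdown emphasis (e.g. '**Q:**') and indentation around the prefixes
        line = line.replace('*', '').strip()
        if line.startswith('Q:'):
            question = line[2:].strip()  # a new question replaces an unanswered one
        elif line.startswith('A:') and question is not None:
            pairs.append({'question': question, 'answer': line[2:].strip()})
            question = None  # each question is used at most once

    return pairs

sample = ("Q: What is RAG?\n\nA: Retrieval plus generation.\n"
          "Q: Orphan question?\n**Q:** What is Agentic RAG?\nA: RAG with autonomous agents.")
print(pair_by_prefix(sample))
```

Blank lines and the unanswered "Orphan question?" no longer shift every later answer onto the wrong question.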

In [24]:
import google.generativeai as genai
import fitz  # PyMuPDF for PDF text extraction
import json
import os
import time

# !wget https://github.com/navneetkrc/Open_LLM_Apps/raw/main/sample%20txt%20files/Agentic%20rag%20Survey%20paper.pdf
## Configure API key for Google Generative AI
# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file."""
    text = ""
    try:
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)

        # Extract text from each page
        for page_num in range(len(pdf_document)):
            page = pdf_document.load_page(page_num)
            text += page.get_text()

        pdf_document.close()
        return text
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""

def generate_qa_pairs(content, num_pairs=3, format_output='plain', retry_attempts=3, save_to_file=None):
    """
    Generate question-answer pairs from the provided content.

    Parameters:
    - content (str): The text content to generate QA pairs from.
    - num_pairs (int): The number of QA pairs to generate.
    - format_output (str): The format for output ('plain' or 'json').
    - retry_attempts (int): Number of retry attempts in case of API failure.
    - save_to_file (str): Path to file where output will be saved.

    Returns:
    - str: The QA pairs as plain text or a JSON string, or an error message if every attempt fails.
    """
    if not isinstance(num_pairs, int) or num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")

    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be 'plain' or 'json'.")

    attempts = 0
    while attempts < retry_attempts:
        try:
            # Create the GenerativeModel instance
            model = genai.GenerativeModel('gemini-1.5-flash')

            # Generate content with the specified prompt
            prompt = (f"Generate {num_pairs} question-answer pairs from the following content. "
                      f"Provide the output in the form Q: question text\nA: answer text\n\nContent: {content}")
            response = model.generate_content(prompt)

            # Clean up the generated content
            output = response.text.replace('*', '').replace('#', '')

            # Split the output into questions and answers (tolerating leading whitespace)
            questions = []
            answers = []
            for line in output.split('\n'):
                line = line.strip()
                if line.startswith('Q:'):
                    questions.append(line[2:].strip())
                elif line.startswith('A:'):
                    answers.append(line[2:].strip())

            # Keep only complete pairs, capped at num_pairs (zip stops at the shorter list)
            pairs = list(zip(questions, answers))[:num_pairs]
            if not pairs:
                raise ValueError("No Q:/A: pairs found in the model response.")

            # Format the output based on user preference
            if format_output == 'json':
                qa_json = [{'question': q, 'answer': a} for q, a in pairs]
                output = json.dumps(qa_json, indent=2)
            else:  # plain text
                output = "\n".join(f"Q{i}: {q}\nA{i}: {a}" for i, (q, a) in enumerate(pairs, start=1))

            if save_to_file:
                with open(save_to_file, 'w') as file:
                    file.write(output)
                print(f"Output saved to {save_to_file}")

            return output

        except Exception as e:
            attempts += 1
            print(f"Attempt {attempts} failed: {e}")
            if attempts < retry_attempts:
                time.sleep(2)  # Wait before retrying

    return "Failed to generate QA pairs after several attempts."

# Example usage
pdf_path = r"/content/Agentic rag Survey paper.pdf"
num_pairs = 3
format_output = 'plain'  # Change to 'json' for JSON output
save_to_file = r"/content/results/text.txt"  # Specify the file path to save output

# Extract text from PDF
content = extract_text_from_pdf(pdf_path)

# Generate and print QA pairs
if content:
    qa_pairs_output = generate_qa_pairs(content, num_pairs, format_output, save_to_file=save_to_file)
    print(qa_pairs_output)
else:
    print("No content extracted from PDF.")


Output saved to /content/results/text.txt
Q1: What are the limitations of Large Language Models (LLMs) that led to the development of Retrieval-Augmented Generation (RAG)?
A1: LLMs rely on static training data, resulting in outdated information, hallucinated responses, and an inability to adapt to dynamic, real-world scenarios.  This limitation necessitates systems that integrate real-time data and dynamically refine responses to maintain accuracy and relevance.
Q2: What is Agentic Retrieval-Augmented Generation (Agentic RAG), and how does it improve upon traditional RAG?
A2: Agentic RAG integrates autonomous AI agents into the RAG pipeline. These agents use agentic design patterns (reflection, planning, tool use, multi-agent collaboration) to dynamically manage retrieval strategies, refine contextual understanding, and adapt workflows for complex tasks.  This surpasses traditional RAG's limitations of static workflows and inadaptability to multi-step reasoning.
Q3: What are some key a
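A plain `open(save_to_file, 'w')` raises `FileNotFoundError` when `/content/results/` does not exist yet. Because the save happens inside the retry loop, that error is counted as a failed generation attempt and triggers another API call. The sketch below creates the parent directory first, using the same `pathlib` approach as the class-based version further down. `save_output` is a hypothetical name:

```python
import tempfile
from pathlib import Path

def save_output(path, text: str) -> Path:
    """Write text to path as UTF-8, creating any missing parent directories first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding='utf-8')
    return path

# Usage: write into a nested directory that does not exist yet
with tempfile.TemporaryDirectory() as tmp:
    saved = save_output(Path(tmp) / "results" / "text.txt", "Q1: ...\nA1: ...")
    print(saved.read_text(encoding='utf-8'))
```

Keeping file I/O outside the retry loop, or at least making it unable to fail on a missing folder, means a retry is only spent on an actual API or parsing failure.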

In [12]:
import google.generativeai as genai
import fitz  # PyMuPDF
import json
import os
import time
from typing import Union, List, Dict, Optional
from pathlib import Path

class PDFQAGenerator:
    """
    A class to extract text from PDFs and generate question-answer pairs using Google's Gemini model.
    """

    def __init__(self, model_name: str = 'gemini-1.5-flash'):
        """
        Initialize the QA Generator with specified model.

        Args:
            model_name: Name of the Gemini model to use
        """
        self.model = genai.GenerativeModel(model_name)

    def extract_text_from_pdf(self, pdf_path: Union[str, Path]) -> str:
        """
        Extract text content from a PDF file.

        Args:
            pdf_path: Path to the PDF file

        Returns:
            Extracted text from the PDF

        Raises:
            FileNotFoundError: If PDF file doesn't exist
            ValueError: If PDF is empty or corrupted
        """
        pdf_path = Path(pdf_path)
        if not pdf_path.exists():
            raise FileNotFoundError(f"PDF file not found: {pdf_path}")

        try:
            with fitz.open(pdf_path) as pdf:
                if len(pdf) == 0:
                    raise ValueError("PDF file is empty")

                # Extract text from all pages with proper spacing
                text = " ".join(
                    page.get_text().replace('\n', ' ').strip()
                    for page in pdf
                )
                return text.strip()

        except Exception as e:
            raise ValueError(f"Error processing PDF: {str(e)}")

    def _parse_qa_response(self, response_text: str, num_pairs: int) -> List[Dict[str, str]]:
        """
        Parse the model's response into structured QA pairs.

        Args:
            response_text: Raw text response from the model
            num_pairs: Number of desired QA pairs

        Returns:
            List of dictionaries containing questions and answers
        """
        qa_pairs = []
        current_pair = {}

        for line in response_text.split('\n'):
            line = line.strip()
            if not line:
                continue

            if line.startswith('Q:'):
                # Save the previous pair only if it was answered; drop unanswered questions
                if 'question' in current_pair and 'answer' in current_pair:
                    qa_pairs.append(current_pair)
                current_pair = {'question': line[2:].strip()}
            elif line.startswith('A:') and 'question' in current_pair:
                current_pair['answer'] = line[2:].strip()

        # Add the last pair if complete
        if len(current_pair) == 2:
            qa_pairs.append(current_pair)

        return qa_pairs[:num_pairs]  # Ensure we don't exceed requested pairs

    def generate_qa_pairs(
        self,
        content: str,
        num_pairs: int = 3,
        format_output: str = 'plain',
        retry_attempts: int = 3,
        save_path: Optional[Union[str, Path]] = None
    ) -> Union[str, List[Dict[str, str]]]:
        """
        Generate question-answer pairs from text content.

        Args:
            content: Source text to generate QA pairs from
            num_pairs: Number of QA pairs to generate
            format_output: Output format ('plain' or 'json')
            retry_attempts: Number of retries on failure
            save_path: Optional path to save results

        Returns:
            Generated QA pairs in requested format

        Raises:
            ValueError: For invalid arguments
            RuntimeError: For generation failures
        """
        if not content.strip():
            raise ValueError("Content cannot be empty")
        if not isinstance(num_pairs, int) or num_pairs < 1:
            raise ValueError("num_pairs must be a positive integer")
        if format_output not in {'plain', 'json'}:
            raise ValueError("format_output must be 'plain' or 'json'")

        # Only the first 8,000 characters of the content are sent, to keep the prompt small
        prompt = f"""Generate {num_pairs} question-answer pairs from this content.
        Make questions diverse and cover different aspects of the content.
        Format each pair exactly as:
        Q: [Question]
        A: [Answer]

        Content: {content[:8000]}
        """

        for attempt in range(retry_attempts):
            try:
                response = self.model.generate_content(prompt)
                qa_pairs = self._parse_qa_response(response.text, num_pairs)

                if not qa_pairs:
                    continue  # Retry if no valid pairs were parsed

                # Format output
                if format_output == 'json':
                    output = qa_pairs
                else:
                    output = '\n\n'.join(
                        f"Q: {pair['question']}\nA: {pair['answer']}"
                        for pair in qa_pairs
                    )

                # Save if requested
                if save_path:
                    save_path = Path(save_path)
                    save_path.parent.mkdir(parents=True, exist_ok=True)
                    with save_path.open('w', encoding='utf-8') as f:
                        if format_output == 'json':
                            json.dump(output, f, indent=2)
                        else:
                            f.write(output)

                return output

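            except Exception as e:
                if attempt == retry_attempts - 1:
                    raise RuntimeError(f"Failed to generate QA pairs: {e}") from e
                time.sleep(2)  # Wait before retry

        # Reached only when every attempt returned a response with no parseable pairs
        raise RuntimeError("Failed to generate QA pairs: no valid Q/A pairs in the model response")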

def main():
    """Example usage of the PDFQAGenerator."""
    # Initialize generator
    generator = PDFQAGenerator()

    # Example configuration
    pdf_path = "/content/Agentic rag Survey paper.pdf"
    output_path = "qa_output.txt"

    try:
        # Extract text and generate QA pairs
        content = generator.extract_text_from_pdf(pdf_path)
        qa_pairs = generator.generate_qa_pairs(
            content=content,
            num_pairs=5,
            format_output='json',
            save_path=output_path
        )
        print(json.dumps(qa_pairs, indent=2))

    except Exception as e:
        print(f"Error: {str(e)}")

if __name__ == "__main__":
    main()

[
  {
    "question": "What is the main problem that Agentic RAG solves, and how does it address it?",
    "answer": "Agentic RAG solves the problem of Large Language Models (LLMs) relying on static training data, leading to outdated or inaccurate responses to dynamic, real-time queries.  It addresses this by integrating autonomous AI agents into the Retrieval-Augmented Generation (RAG) pipeline. These agents dynamically manage retrieval strategies, refine contextual understanding, and adapt workflows to handle complex tasks, resulting in more accurate and up-to-date responses."
  },
  {
    "question": "What are the key components of a traditional Retrieval-Augmented Generation (RAG) system?",
    "answer": "Traditional RAG systems have three core components: Retrieval (querying external data sources), Augmentation (processing and summarizing retrieved data), and Generation (combining retrieved information with LLM generation)."
  },
  {
    "question": "How does Agentic RAG improve u

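A line-by-line parser has to guard against a question that never receives an answer. If that question is saved anyway, the result contains a pair without an `'answer'` key, and the plain-text formatter later fails with `KeyError`. The standalone sketch below follows the same `Q:`/`A:` line scanning as `_parse_qa_response` but keeps only complete pairs, so it can be tested without calling the API. `parse_qa_response` mirrors the method's name but is a hypothetical free function:

```python
def parse_qa_response(response_text: str, num_pairs: int) -> list[dict[str, str]]:
    """Parse 'Q:'/'A:' lines into complete question-answer pairs, capped at num_pairs."""
    qa_pairs = []
    current = {}
    for line in response_text.split('\n'):
        line = line.strip()
        if line.startswith('Q:'):
            # Start a new pair; an unanswered previous question is silently discarded
            current = {'question': line[2:].strip()}
        elif line.startswith('A:') and 'question' in current:
            current['answer'] = line[2:].strip()
            qa_pairs.append(current)  # the pair is complete as soon as it has an answer
            current = {}  # a second 'A:' line for the same question is ignored
    return qa_pairs[:num_pairs]

response = """Q: Unanswered question?
Q: What are the core RAG components?
A: Retrieval, augmentation and generation.
Q: Name one agentic pattern.
A: Reflection."""

for pair in parse_qa_response(response, num_pairs=5):
    print(f"Q: {pair['question']}\nA: {pair['answer']}")
```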
In [13]:
# Initialize the generator
generator = PDFQAGenerator()

# Generate QA pairs from PDF
try:
    # Extract text
    content = generator.extract_text_from_pdf("/content/Agentic rag Survey paper.pdf")

    # Generate QA pairs
    qa_pairs = generator.generate_qa_pairs(
        content=content,
        num_pairs=5,
        format_output='json',
        save_path='output.json'
    )
    print(json.dumps(qa_pairs, indent=2))
except Exception as e:
    print(f"Error: {str(e)}")

[
  {
    "question": "What is the main limitation of Large Language Models (LLMs) addressed by Retrieval-Augmented Generation (RAG)?",
    "answer": "LLMs rely on static training data, resulting in outdated information, hallucinated responses, and an inability to adapt to dynamic, real-world scenarios. RAG addresses this by integrating real-time data retrieval."
  },
  {
    "question": "How does Agentic RAG improve upon traditional RAG systems?",
    "answer": "Agentic RAG embeds autonomous AI agents into the RAG pipeline, enabling dynamic retrieval strategies, iterative refinement of contextual understanding, and adaptable workflows for complex tasks.  Traditional RAG systems have static workflows and lack the adaptability for multi-step reasoning."
  },
  {
    "question": "What are some key applications of Agentic RAG mentioned in the survey?",
    "answer": "The survey highlights applications in healthcare, finance, and education."
  },
  {
    "question": "What are the core comp

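`PDFQAGenerator.generate_qa_pairs` sends only `content[:8000]` to the model, so the questions can only come from the opening part of a long paper. One possible alternative is to split the text into overlapping character windows and request a few pairs per window. This is sketched here as an assumption, not as the notebook's approach. `chunk_text` and its `size`/`overlap` defaults are hypothetical:

```python
def chunk_text(text: str, size: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

text = "x" * 20000
print([len(c) for c in chunk_text(text)])
```

Each chunk could then be passed to `generator.generate_qa_pairs(chunk, num_pairs=2)`, at the cost of one API call per chunk. The overlap reduces the chance that a sentence split at a window boundary is lost to both windows.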
In [26]:
# Variant: list all questions first, then all answers
import google.generativeai as genai
import fitz  # PyMuPDF for PDF text extraction
import json
import os
import time

## Configure API key for Google Generative AI
# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

def extract_text_from_pdf(pdf_path):
    """Extract text from a PDF file."""
    text = ""
    try:
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)

        # Extract text from each page
        for page_num in range(len(pdf_document)):
            page = pdf_document.load_page(page_num)
            text += page.get_text()

        pdf_document.close()
        return text
    except Exception as e:
        print(f"Error extracting text from PDF: {e}")
        return ""

def generate_qa_pairs(content, num_pairs=3, format_output='plain', retry_attempts=3, save_to_file=None):
    """
    Generate question-answer pairs from the provided content.

    Parameters:
    - content (str): The text content to generate QA pairs from.
    - num_pairs (int): The number of QA pairs to generate.
    - format_output (str): The format for output ('plain' or 'json').
    - retry_attempts (int): Number of retry attempts in case of API failure.
    - save_to_file (str): Path to file where output will be saved.

    Returns:
    - str: The QA pairs as plain text or a JSON string, or an error message if every attempt fails.
    """
    if not isinstance(num_pairs, int) or num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")

    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be 'plain' or 'json'.")

    attempts = 0
    while attempts < retry_attempts:
        try:
            # Create the GenerativeModel instance
            model = genai.GenerativeModel('gemini-1.5-flash')

            # Generate content with the specified prompt
            prompt = (f"Generate {num_pairs} question-answer pairs from the following content. "
                      f"List all questions first, then all answers in the same order, one per line. "
                      f"Start each question with 'Q:' and each answer with 'A:'.\n\nContent: {content}")
            response = model.generate_content(prompt)

            # Clean up the generated content
            output = response.text.replace('*', '').replace('#', '')

            # Split the output into questions and answers (tolerating leading whitespace)
            lines = [line.strip() for line in output.split('\n')]
            questions = [line[2:].strip() for line in lines if line.startswith('Q:')]
            answers = [line[2:].strip() for line in lines if line.startswith('A:')]

            # Keep only complete pairs, capped at num_pairs (zip stops at the shorter list)
            pairs = list(zip(questions, answers))[:num_pairs]
            if not pairs:
                raise ValueError("No Q:/A: pairs found in the model response.")

            # Format the output based on user preference
            if format_output == 'json':
                qa_json = [{'question': q, 'answer': a} for q, a in pairs]
                output = json.dumps(qa_json, indent=2)
            else:  # plain text: all questions, a blank line, then all answers
                qa_text = "\n".join(f"Q{i}: {q}" for i, (q, _) in enumerate(pairs, start=1)) + "\n\n"
                qa_text += "\n".join(f"A{i}: {a}" for i, (_, a) in enumerate(pairs, start=1))
                output = qa_text

            if save_to_file:
                with open(save_to_file, 'w') as file:
                    file.write(output)
                print(f"Output saved to {save_to_file}")

            return output

        except Exception as e:
            attempts += 1
            print(f"Attempt {attempts} failed: {e}")
            if attempts < retry_attempts:
                time.sleep(2)  # Wait before retrying

    return "Failed to generate QA pairs after several attempts."

# Example usage
pdf_path = r"/content/Agentic rag Survey paper.pdf"
num_pairs = 5
format_output = 'json'  # Change to 'plain' for numbered plain-text output
save_to_file = r"/content/results/Results_text_with_separate_QA.txt"  # Specify the file path to save output

# Extract text from PDF
content = extract_text_from_pdf(pdf_path)

# Generate and print QA pairs
if content:
    qa_pairs_output = generate_qa_pairs(content, num_pairs, format_output, save_to_file=save_to_file)
    print(qa_pairs_output)
else:
    print("No content extracted from PDF.")


Output saved to /content/results/Results_text_with_separate_QA.txt
[
  {
    "question": "What is Retrieval-Augmented Generation (RAG), and what are its limitations?",
    "answer": "Retrieval-Augmented Generation (RAG) combines the generative capabilities of LLMs with real-time data retrieval to provide contextually relevant and up-to-date responses.  However, traditional RAG systems are limited by static workflows, a lack of adaptability for multi-step reasoning, and complex task management."
  },
  {
    "question": "How does Agentic RAG address the limitations of traditional RAG systems?",
    "answer": "Agentic RAG overcomes these limitations by embedding autonomous AI agents into the RAG pipeline. These agents use agentic design patterns (reflection, planning, tool use, and multi-agent collaboration) to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows to meet complex task requirements."
  },
  {
    "question": "What are th

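When the model lists all questions before all answers, the only link between a question and its answer is their order. The sketch below re-aligns the two lists by index and refuses to guess when the counts fall short. It assumes each item sits on its own `Q:` or `A:` line. `align_separated` is a hypothetical helper:

```python
def align_separated(text: str, num_pairs: int) -> list[dict[str, str]]:
    """Zip the i-th 'Q:' line with the i-th 'A:' line, keeping exactly num_pairs pairs."""
    lines = [line.strip() for line in text.splitlines()]
    questions = [line[2:].strip() for line in lines if line.startswith('Q:')]
    answers = [line[2:].strip() for line in lines if line.startswith('A:')]
    if len(questions) < num_pairs or len(answers) < num_pairs:
        raise ValueError(
            f"expected {num_pairs} pairs, got {len(questions)} questions and {len(answers)} answers"
        )
    return [{'question': q, 'answer': a}
            for q, a in zip(questions[:num_pairs], answers[:num_pairs])]

raw = ("Q: What is RAG?\nQ: What is Agentic RAG?\n\n"
       "A: Retrieval plus generation.\nA: RAG with autonomous agents.")
print(align_separated(raw, num_pairs=2))
```

Raising on a short count, rather than silently zipping fewer pairs, lets a caller's retry loop request a fresh response instead of returning an incomplete set.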
In [42]:
import google.generativeai as genai
import fitz
import json
import time
from pathlib import Path
from typing import Union, List, Dict, Optional, Tuple

class QAGenerator:
    """Generates separate questions and answers from PDF content using Gemini model."""

    def __init__(self, model_name: str = 'gemini-1.5-flash'):
        self.model = genai.GenerativeModel(model_name)

    def extract_pdf_text(self, pdf_path: Union[str, Path]) -> str:
        """
        Extracts text from PDF file.

        Args:
            pdf_path: Path to PDF file

        Returns:
            Extracted text content
        """
        try:
            with fitz.open(pdf_path) as doc:
                return " ".join(page.get_text() for page in doc)
        except Exception as e:
            raise IOError(f"PDF extraction failed: {e}")

    def _split_qa(self, text: str) -> Tuple[List[str], List[str]]:
        """
        Splits response into separate questions and answers.

        Args:
            text: Raw response text

        Returns:
            Tuple of question and answer lists
        """
        questions = []
        answers = []

        for line in text.splitlines():
            line = line.strip()
            if line.startswith('Q:'):
                questions.append(line[2:].strip())
            elif line.startswith('A:'):
                answers.append(line[2:].strip())

        return questions, answers

    def generate_qa(
        self,
        content: str,
        num_pairs: int = 3,
        format_output: str = 'plain',
        max_retries: int = 3,
        output_file: Optional[Union[str, Path]] = None
    ) -> Union[str, List[Dict[str, str]]]:
        """
        Generates separated questions and answers from content.

        Args:
            content: Source text
            num_pairs: Number of QA pairs to generate
            format_output: Output format ('plain'/'json')
            max_retries: Maximum retry attempts
            output_file: Optional save path

        Returns:
            QA pairs in requested format
        """
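        if not isinstance(num_pairs, int) or num_pairs < 1:
            raise ValueError("num_pairs must be a positive integer")
        if format_output not in {'plain', 'json'}:
            raise ValueError("format_output must be 'plain' or 'json'")

        prompt = f"""Generate {num_pairs} question-answer pairs from this content.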
        List all questions first, then list all answers in matching order.
        Format as:
        Q: [Question text]
        Q: [Question text]
        ...
        A: [Answer text]
        A: [Answer text]
        ...

        Content: {content[:8000]}"""  # Limit content length

        for attempt in range(max_retries):
            try:
                # Generate and parse response
                response = self.model.generate_content(prompt)
                questions, answers = self._split_qa(response.text)

                # Validate and trim to requested number
                if len(questions) >= num_pairs and len(answers) >= num_pairs:
                    questions = questions[:num_pairs]
                    answers = answers[:num_pairs]

                    # Format output
                    if format_output == 'json':
                        output = [
                            {'question': q, 'answer': a}
                            for q, a in zip(questions, answers)
                        ]
                    else:
                        q_text = "\n".join(f"Q{i+1}: {q}" for i, q in enumerate(questions))
                        a_text = "\n".join(f"A{i+1}: {a}" for i, a in enumerate(answers))
                        output = f"{q_text}\n\n{a_text}"

                    # Save if requested
                    if output_file:
                        Path(output_file).write_text(
                            json.dumps(output, indent=2) if format_output == 'json' else output
                        )

                    return output

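            except Exception as e:
                if attempt == max_retries - 1:
                    raise RuntimeError(f"QA generation failed: {e}") from e
                time.sleep(2)  # Wait before retry

        # Reached only when no attempt produced enough questions and answers
        raise RuntimeError(f"QA generation failed: fewer than {num_pairs} questions or answers parsed")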

def main():
    """
    Example usage of QAGenerator with formatted output handling.
    Automatically detects JSON format and prints it with proper indentation.
    """
    generator = QAGenerator()

    try:
        # Extract text from PDF
        content = generator.extract_pdf_text("/content/Agentic rag Survey paper.pdf")

        # Generate QA pairs
        result = generator.generate_qa(
            content=content,
            num_pairs=5,
            format_output='json',
            output_file='qa_output.txt'
        )

        # Format output based on type
        if isinstance(result, (list, dict)):  # JSON output
            print(json.dumps(result, indent=4, ensure_ascii=False))
        else:  # Plain text output
            print(result)

    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

[
    {
        "question": "What are the key resources within Samsung's microwave support documentation?",
        "answer": "Samsung's support resources include user manuals, online support pages with FAQs, troubleshooting guides, how-to videos, and software updates;  as well as community forums."
    },
    {
        "question": "What are the troubleshooting steps for a Samsung microwave that is not heating?",
        "answer": "Troubleshooting steps for a microwave not heating include checking the power supply, the door interlock system, and potentially the magnetron, high-voltage diode, and capacitor (requiring professional help for the latter three)."
    },
    {
        "question": "What are some reasons why a Samsung microwave might not turn on?",
        "answer": "A Samsung microwave might not turn on due to power supply issues, an enabled control panel lock, or a blown internal fuse (requiring professional repair)."
    },
    {
        "question": "How can arcing or sparks

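Every version in this notebook retries a failed call after a fixed `time.sleep(2)`. A common alternative for rate-limited APIs is exponential backoff. The sketch below is a generic, hypothetical `call_with_retries` wrapper, not part of the `google.generativeai` SDK. It doubles the delay after each failure and re-raises the last error once the attempts run out:

```python
import time

def call_with_retries(fn, max_retries: int = 3, base_delay: float = 2.0):
    """Call fn() up to max_retries times, sleeping base_delay * 2**attempt between failures."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: re-raise the last error unchanged
            time.sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ... with the default base_delay

# Usage with a stand-in function that fails twice before succeeding
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(call_with_retries(flaky, max_retries=3, base_delay=0.01))
```

Inside `QAGenerator.generate_qa`, the API call could be wrapped as `call_with_retries(lambda: self.model.generate_content(prompt))`, leaving the parsing and formatting outside the retry logic.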
In [44]:
# Initialize generator
qa_gen = QAGenerator()

# Generate QA pairs
try:
    content = qa_gen.extract_pdf_text("/content/Agentic rag Survey paper.pdf")
    result = qa_gen.generate_qa(
        content=content,
        num_pairs=3,
        format_output='json',
        output_file='output.txt'
    )
    # Format output based on type
    if isinstance(result, (list, dict)):  # JSON output
        print(json.dumps(result, indent=4, ensure_ascii=False))
    else:  # Plain text output
        print(result)

except Exception as e:
    print(f"Error: {e}")

[
    {
        "question": "What is the main limitation of Large Language Models (LLMs) according to the provided text?",
        "answer": "LLMs rely on static training data, resulting in outdated information, hallucinated responses, and an inability to adapt to dynamic, real-world scenarios."
    },
    {
        "question": "What is Retrieval-Augmented Generation (RAG), and how does it address the limitations of LLMs?",
        "answer": "RAG combines the generative capabilities of LLMs with real-time data retrieval from external sources, enhancing the relevance and timeliness of responses to address the limitations of LLMs."
    },
    {
        "question": "What is Agentic RAG, and how does it improve upon traditional RAG systems?",
        "answer": "Agentic RAG integrates autonomous AI agents into the RAG pipeline, enabling dynamic retrieval strategies, contextual understanding, and iterative refinement for improved flexibility, scalability, and context-awareness compared to tr

In [17]:
# Initialize generator
qa_gen = QAGenerator()

# Generate QA pairs
try:
    content = qa_gen.extract_pdf_text("/content/Agentic rag Survey paper.pdf")
    result = qa_gen.generate_qa(
        content=content,
        num_pairs=3,
        format_output='plain',
        output_file='output.txt'
    )
    # Format output based on type
    if isinstance(result, (list, dict)):  # JSON output
        print(json.dumps(result, indent=4, ensure_ascii=False))
    else:  # Plain text output
        print(result)

except Exception as e:
    print(f"Error: {e}")

Q1: What is the main limitation of Large Language Models (LLMs) according to the provided text?
Q2: What is Retrieval-Augmented Generation (RAG), and how does it address the limitations of LLMs?
Q3: What is Agentic Retrieval-Augmented Generation (Agentic RAG), and how does it improve upon traditional RAG?

A1: LLMs rely on static training data, resulting in outdated information, hallucinated responses, and an inability to adapt to dynamic, real-world scenarios.
A2: RAG combines the generative capabilities of LLMs with real-time data retrieval from external sources, enhancing the relevance and timeliness of responses.  It bridges the gap between static training data and dynamic applications.
A3: Agentic RAG integrates autonomous AI agents into the RAG pipeline, enabling dynamic retrieval strategies, contextual understanding, and iterative refinement. It uses agentic design patterns (reflection, planning, tool use, multi-agent collaboration) for more adaptability and efficiency than trad

In [45]:
# Generate QA pairs directly from a text string (no PDF extraction)
import google.generativeai as genai
import json
import time

## Configure API key for Google Generative AI
# apikey=userdata.get('GOOGLE_API_KEY')
# genai.configure(api_key=apikey)

def generate_qa_pairs(text, num_pairs=3, format_output='plain', retry_attempts=3, save_to_file=None):
    """
    Generate question-answer pairs from a text input.

    Parameters:
    - text (str): The text to generate QA pairs from.
    - num_pairs (int): The number of QA pairs to generate.
    - format_output (str): The format for output ('plain' or 'json').
    - retry_attempts (int): Number of retry attempts in case of API failure.
    - save_to_file (str): Path to file where output will be saved.

    Returns:
    - str: The QA pairs as plain text or a JSON string, or an error message if every attempt fails.
    """
    if not isinstance(num_pairs, int) or num_pairs <= 0:
        raise ValueError("num_pairs must be a positive integer.")

    if format_output not in ['plain', 'json']:
        raise ValueError("format_output must be 'plain' or 'json'.")

    attempts = 0
    while attempts < retry_attempts:
        try:
            # Create the GenerativeModel instance
            model = genai.GenerativeModel('gemini-1.5-flash')

            # Generate content with the specified prompt
            prompt = (f"Generate {num_pairs} question-answer pairs from the following text. "
                      f"List all questions first, then all answers in the same order, one per line. "
                      f"Start each question with 'Q:' and each answer with 'A:'.\n\nText: {text}")
            response = model.generate_content(prompt)

            # Clean up the generated content
            output = response.text.replace('*', '').replace('#', '')

            # Split the output into questions and answers (tolerating leading whitespace)
            lines = [line.strip() for line in output.split('\n')]
            questions = [line[2:].strip() for line in lines if line.startswith('Q:')]
            answers = [line[2:].strip() for line in lines if line.startswith('A:')]

            # Keep only complete pairs, capped at num_pairs (zip stops at the shorter list)
            pairs = list(zip(questions, answers))[:num_pairs]
            if not pairs:
                raise ValueError("No Q:/A: pairs found in the model response.")

            # Format the output based on user preference
            if format_output == 'json':
                qa_json = [{'question': q, 'answer': a} for q, a in pairs]
                output = json.dumps(qa_json, indent=2)
            else:  # plain text: all questions, a blank line, then all answers
                qa_text = "\n".join(f"Q{i}: {q}" for i, (q, _) in enumerate(pairs, start=1)) + "\n\n"
                qa_text += "\n".join(f"A{i}: {a}" for i, (_, a) in enumerate(pairs, start=1))
                output = qa_text

            if save_to_file:
                with open(save_to_file, 'w') as file:
                    file.write(output)
                print(f"Output saved to {save_to_file}")

            return output

        except Exception as e:
            attempts += 1
            print(f"Attempt {attempts} failed: {e}")
            if attempts < retry_attempts:
                time.sleep(2)  # Wait before retrying

    return "Failed to generate QA pairs after several attempts."

# Example usage
text_input = generated_text
num_pairs = 3
format_output = 'plain'  # Change to 'json' for JSON output
save_to_file = r"/content/results/qa_output.txt"  # Specify the file path to save output

# Generate and print QA pairs
qa_pairs_output = generate_qa_pairs(text_input, num_pairs, format_output, save_to_file=save_to_file)
print(qa_pairs_output)


Output saved to /content/results/qa_output.txt
Q1: What are the key resources available in Samsung's microwave support documentation?
Q2: What are some common problems encountered with Samsung microwaves and their basic troubleshooting steps?
Q3: What information is crucial when using Samsung's online support resources and contacting Samsung support directly?

A1: Samsung's support resources include user manuals, online support pages with FAQs, troubleshooting guides, how-to videos, and software updates, as well as community forums.
A2: Common problems include the microwave not heating (check power supply, door interlock, magnetron, diode, capacitor, and control panel), not turning on (check power supply, control panel lock, and internal fuse), arcing or sparks (check for metal objects, damaged waveguide cover, and food debris), turntable not rotating (check track and motor), and error codes (consult user manual or online support).
A3: When using online resources or contacting support,
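The plain-text output above ends mid-sentence at A3. A likely cause is that the model wrapped that answer across several lines: the parser keeps only lines that start with `Q:` or `A:`, so continuation lines are silently dropped. The sketch below is a standalone, hypothetical helper (`extract_qa_pairs` is illustrative, not part of any library) that appends unlabeled lines to the previous question or answer. It assumes every new item begins with a `Q:` or `A:` label; stray commentary after the last answer would be glued onto that answer, so treat it as a starting point rather than a robust parser.

```python
from typing import List, Optional, Tuple


def extract_qa_pairs(text: str) -> List[Tuple[str, str]]:
    """Parse 'Q:'/'A:' lines, joining wrapped continuation lines onto the previous item.

    Assumption: every question starts with 'Q:' and every answer with 'A:';
    unlabeled non-empty lines continue the most recent labeled item.
    """
    questions: List[str] = []
    answers: List[str] = []
    current: Optional[List[str]] = None  # list that continuation lines belong to

    for raw in text.splitlines():
        # Drop markdown bold markers such as '**Q:**' before matching labels
        line = raw.replace('*', '').strip()
        if line.startswith('Q:'):
            questions.append(line[2:].strip())
            current = questions
        elif line.startswith('A:'):
            answers.append(line[2:].strip())
            current = answers
        elif line and current:
            # Unlabeled line: continuation of the previous question or answer
            current[-1] = f"{current[-1]} {line}"

    # zip pairs questions with answers in order and drops unmatched extras
    return list(zip(questions, answers))


sample = """Q: What does the guide cover?
Q: What should you check first?
A: Samsung microwave support resources
and troubleshooting steps.
A: The power supply."""

for q, a in extract_qa_pairs(sample):
    print(f"Q: {q}\nA: {a}")
```

Here the wrapped first answer is rejoined into one string instead of being cut off after its first line.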

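Both generators in this notebook retry with a fixed 2-second pause. If failures come from transient API errors such as rate limiting, waiting progressively longer between attempts is a common alternative. The helper below is a minimal sketch of exponential backoff with jitter; the name `retry_with_backoff` and its parameters are illustrative assumptions, not part of the `google.generativeai` package. The `sleep` parameter is injectable so the logic can be exercised without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, doubling the wait (plus jitter) after each failure.

    Hypothetical helper: re-raises the last exception once max_retries attempts fail.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            # Waits base_delay, 2*base_delay, 4*base_delay, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise AssertionError("unreachable")


# Hypothetical usage with the objects from the cells in this notebook:
# response = retry_with_backoff(lambda: model.generate_content(prompt))
```

Wrapping only the API call keeps parsing and formatting outside the retried function; if malformed model output should also trigger a retry, the parsing step would need to go inside `fn` as well.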
In [46]:
import google.generativeai as genai
import json
import time
from typing import Union, List, Dict, Optional
from pathlib import Path

class TextQAGenerator:
    """Generates question-answer pairs from text using Gemini model."""

    def __init__(self, model_name: str = 'gemini-1.5-flash'):
        """Initialize with specified model."""
        self.model = genai.GenerativeModel(model_name)

    def _extract_qa_pairs(self, text: str) -> tuple[List[str], List[str]]:
        """
        Extract questions and answers from generated text.

        Args:
            text: Raw generated text

        Returns:
            Tuple of (questions, answers) lists
        """
        questions = []
        answers = []

        for line in text.split('\n'):
            line = line.strip()
            if line.startswith('Q:'):
                questions.append(line[2:].strip())
            elif line.startswith('A:'):
                answers.append(line[2:].strip())

        return questions, answers

    def generate_qa(
        self,
        text: str,
        num_pairs: int = 3,
        format_output: str = 'plain',
        max_retries: int = 3,
        save_path: Optional[Union[str, Path]] = None
    ) -> Union[str, List[Dict[str, str]]]:
        """
        Generate QA pairs from input text.

        Args:
            text: Source text
            num_pairs: Number of QA pairs to generate
            format_output: Output format ('plain'/'json')
            max_retries: Maximum retry attempts
            save_path: Optional path to save results

        Returns:
            Generated QA pairs in requested format

        Raises:
            ValueError: For invalid inputs
            RuntimeError: For generation failures
        """
        # Validate inputs
        if not text.strip():
            raise ValueError("Input text cannot be empty")
        if not isinstance(num_pairs, int) or num_pairs < 1:
            raise ValueError("num_pairs must be positive integer")
        if format_output not in {'plain', 'json'}:
            raise ValueError("format_output must be 'plain' or 'json'")

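        prompt = (
            f"Generate {num_pairs} question-answer pairs from this text.\n"
            "List all questions first, then all corresponding answers in the same order.\n"
            "Keep each question and each answer on a single line.\n"
            "Format as:\n"
            "Q: [Question]\n"
            "Q: [Question]\n"
            "...\n"
            "A: [Answer to first question]\n"
            "A: [Answer to second question]\n"
            "...\n\n"
            f"Text: {text[:8000]}"  # Truncate input to 8,000 characters (arbitrary cap on prompt size)
        )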

        last_error: Optional[Exception] = None
        for attempt in range(max_retries):
            try:
                # Generate and parse content
                response = self.model.generate_content(prompt)
                questions, answers = self._extract_qa_pairs(response.text)

                # Retry if the model returned fewer pairs than requested
                if len(questions) < num_pairs or len(answers) < num_pairs:
                    raise ValueError(
                        f"expected {num_pairs} pairs, got {len(questions)} questions "
                        f"and {len(answers)} answers"
                    )
                questions = questions[:num_pairs]
                answers = answers[:num_pairs]

                # Format output
                if format_output == 'json':
                    output = [
                        {'question': q, 'answer': a}
                        for q, a in zip(questions, answers)
                    ]
                else:
                    q_section = "\n".join(f"Q{i+1}: {q}" for i, q in enumerate(questions))
                    a_section = "\n".join(f"A{i+1}: {a}" for i, a in enumerate(answers))
                    output = f"{q_section}\n\n{a_section}"

                # Save if path provided
                if save_path:
                    save_path = Path(save_path)
                    save_path.parent.mkdir(parents=True, exist_ok=True)
                    save_path.write_text(
                        json.dumps(output, indent=2, ensure_ascii=False) if format_output == 'json'
                        else output,
                        encoding='utf-8'
                    )

                return output

            except Exception as e:
                last_error = e
                if attempt < max_retries - 1:
                    time.sleep(2)  # Wait before retry

        # Every attempt failed (or returned too few pairs): fail loudly instead of returning None
        raise RuntimeError(f"Failed to generate QA pairs after {max_retries} attempts: {last_error}")

def main():
    """Example usage of TextQAGenerator."""
    # Sample text: reuses `generated_text` from an earlier cell; replace with your own
    text = generated_text

    generator = TextQAGenerator()

    try:
        # Generate QA pairs
        result = generator.generate_qa(
            text=text,
            num_pairs=5,
            format_output='json',
            save_path='qa_output.txt'
        )

        # Print formatted result
        if isinstance(result, (list, dict)):
            print(json.dumps(result, indent=4, ensure_ascii=False))
        else:
            print(result)

    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    main()

[
    {
        "question": "What are the key support resources offered by Samsung for its microwaves?",
        "answer": "Samsung offers user manuals, online support pages (including FAQs, troubleshooting guides, how-to videos, and software updates), and community forums."
    },
    {
        "question": "What are the troubleshooting steps for a Samsung microwave that's not heating?",
        "answer": "Check the power supply, door interlock system, magnetron, high-voltage diode and capacitor, and control panel.  Professional repair may be needed for magnetron, diode, capacitor, or control panel issues."
    },
    {
        "question": "How can I resolve a Samsung microwave that won't turn on?",
        "answer": "Check the power supply, ensure the control panel lock isn't enabled, and check for a blown internal fuse (requiring professional repair)."
    },
    {
        "question": "What are the potential causes of arcing or sparks inside a Samsung microwave?",
        "answer": "