As a busy parent, I'm always looking for ways to nurture my 8-year-old's budding curiosity, especially when it comes to science. Their enthusiastic participation in the North South Junior Science Bee sparked an idea: what if we could create a dynamic and engaging way for them to explore even more scientific concepts, tailored to their interests and learning pace?

That's where the magic of AI comes in! Inspired by the fascinating world of Generative AI and leveraging concepts I picked up from Kaggle's insightful 5-Day Gen AI course, I embarked on a project to build a science quiz platform using a powerful technique called Retrieval-Augmented Generation (RAG).

# The Challenge: Beyond the Bee

The North South Junior Science Bee is fantastic, but learning shouldn't stop there. Finding relevant and engaging science quiz questions across diverse topics for an 8-year-old can be a real time sink. Existing resources can be too generic, too simplistic, or might not align with the specific areas my child is currently exploring. Plus, just getting the answer right isn't always the best way to learn. What truly helps is understanding why the answer is correct and having the opportunity to delve deeper.

# Gen AI to the Rescue: Creating a Dynamic Learning Tool

This project harnesses the power of several key Gen AI technologies to create a personalized and interactive science learning experience:

- Document Understanding: Imagine being able to feed the AI various science materials – maybe interesting articles, textbook snippets, or even notes from the Science Bee itself! The AI can understand the content, identify the core ideas, and pull out the important stuff. In our case, we can even leverage the topics provided by organizations like northsouth.org as our knowledge base.
- Embedding: This is where the AI truly starts to "understand" science. By converting the text from our documents into numerical representations called embeddings, the AI can grasp the relationships between different scientific concepts. Think of it like creating a map of knowledge where related ideas are close together. This allows the AI to find connections and identify questions relevant to specific topics.
- Retrieval-Augmented Generation (RAG): This is the heart of our quiz generator. When my child wants to test their knowledge on a specific topic (like "the solar system" or "animal habitats"), the AI uses the embeddings to find the most relevant information from the documents we've provided. This retrieved information then becomes the foundation for creating new and targeted quiz questions.
- Prompting: Just like guiding a helpful assistant, carefully crafted prompts tell the AI what kind of questions to create (multiple choice, perhaps?), how difficult they should be for an 8-year-old, and – crucially – to generate brief, easy-to-understand explanations for the correct answers. This is key for fostering genuine learning and encouraging further exploration.
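
To make the embedding and retrieval ideas a little more concrete, here is a minimal sketch in plain Python. It assumes tiny hand-written vectors in place of real embeddings, and the names `cosine_similarity`, `retrieve`, and `toy_store` are hypothetical, chosen only for this illustration. The notebook itself relies on Google's embedding model and a vector database rather than anything like this.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float],
             store: Dict[str, List[float]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are closest to the query."""
    scored = [(text, cosine_similarity(query_vec, vec))
              for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up "embeddings": the three axes loosely mean (space, animals, weather).
toy_store = {
    "Planets orbit the Sun.": [0.9, 0.1, 0.0],
    "Polar bears live in the Arctic.": [0.0, 0.9, 0.2],
    "Clouds form from water vapor.": [0.1, 0.0, 0.9],
}

query = [0.8, 0.0, 0.1]  # pretend embedding of "the solar system"
top = retrieve(query, toy_store, k=2)
for text, score in top:
    print(f"{score:.2f}  {text}")
```

The sentence about planets ranks first because its vector points in nearly the same direction as the query. In a RAG setup, the top-ranked passages are what get pasted into the prompt as context.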
# Putting it All Together: An Interactive Science Adventure

Essentially, Gen AI transforms static learning materials into an interactive adventure. Instead of relying on pre-made quizzes, we can dynamically generate questions tailored to my child's specific interests and learning needs. The immediate feedback and concise explanations provide a much richer learning experience, sparking curiosity and encouraging them to ask more questions.

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/science-trivia-topics/2025_SLPGuide_08R_SC_JSC.pdf
/kaggle/input/science-topics/2025_SLPGuide_08R_SC_JSC.pdf


# Setup
First, install the required libraries for this project.

In [2]:
# Remove conflicting packages from the Kaggle base environment.
!pip uninstall -qqy google-generativeai
# Quote version specifiers so the shell does not treat ">" as an output redirect.
!pip install -qU google-genai langchain chromadb pypdf python-dotenv "langchain-google-genai>=0.0.3" "google-generativeai>=0.3.0" gtts langgraph langchain-community

In [3]:
from google import genai
from google.genai import types

from IPython.display import Markdown

genai.__version__

'1.11.0'

# Set up your API key
To run the following cell, your API key must be stored in a Kaggle secret named GOOGLE_API_KEY.

If you don't already have an API key, you can grab one from AI Studio. You can find detailed instructions in the docs.

To make the key available through Kaggle secrets, choose Secrets from the Add-ons menu and follow the instructions to add your key or enable it for this notebook.

In [4]:
import os
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

If you received an error response along the lines of No user secrets exist for kernel id ..., then you need to add your API key via Add-ons, Secrets and enable it.

![Screenshot of the checkbox to enable GOOGLE_API_KEY secret](https://storage.googleapis.com/kaggle-media/Images/5gdai_sc_3.png)
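
If you also want to run the notebook outside Kaggle, one option is to fall back to an environment variable when the secrets client is unavailable. This is only a sketch: `load_api_key` is a hypothetical helper, not part of Kaggle or Google's libraries, and it assumes the key has been exported under the same name.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from Kaggle secrets if possible, else from the environment."""
    try:
        # kaggle_secrets is only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # Outside Kaggle, or the secret is not attached to this notebook.
        key = os.environ.get(name)
        if not key:
            raise RuntimeError(
                f"{name} was not found in Kaggle secrets or the environment."
            )
        return key
```

Calling `load_api_key()` would then take the place of the direct `UserSecretsClient().get_secret(...)` call, with the rest of the setup unchanged.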

## Define Data Types

Define data types for the question and quiz state.

In [5]:
from typing import Dict, List, Any, TypedDict, Optional

class QuestionData(TypedDict):
    question: str
    options: List[str]
    correct_answer: str
    explanation: str
    topic: str

class QuizState(TypedDict):
    score: int
    total_questions: int
    current_question: Optional[QuestionData]
    missed_questions: List[QuestionData]
    status: str

## Initialize Models and Database

In [6]:
from langchain_google_genai import GoogleGenerativeAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
import google.generativeai as genai

genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))

# Initialize Gemini model and embeddings
llm = GoogleGenerativeAI(model="gemini-2.0-flash", 
                        google_api_key=os.getenv("GOOGLE_API_KEY"))

# Use Google Generative AI Embeddings
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key=os.getenv("GOOGLE_API_KEY")
)

# Create prompt template for generating questions
question_prompt = PromptTemplate(
    input_variables=["context", "topic"],
    template="""
    Based on the following context about {topic}, create a science quiz question for an 8-year-old.
    
    Context:
    {context}
    
    Format your response as a JSON object with the following structure:
    {{
        "question": "A clear, concise question about the topic",
        "options": ["Option 1", "Option 2", "Option 3", "Option 4"],
        "correct_answer": "The correct option (must be exactly one of the options)",
        "explanation": "A brief explanation of why the answer is correct",
        "topic": "{topic}"
    }}
    
    Make sure the question tests understanding of the concept, not just memorization.
    The options should be distinct and plausible.
    """
)

question_chain = LLMChain(llm=llm, prompt=question_prompt)

# Create prompt template for topic extraction
topic_extraction_prompt = PromptTemplate.from_template(
    "Extract the main science topics from this text as a comma-separated list: {text}"
)

# Create chain for topic extraction
topic_extraction_chain = LLMChain(llm=llm, prompt=topic_extraction_prompt)



## Initialize Vector Database

Document Processing: The application loads a science guide PDF and splits it into chunks.

Vector Database: It creates a vector database using Google Gemini embeddings.
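
To show what "splitting into chunks" with an overlap means, here is a simplified sketch. It assumes a fixed-size sliding window over characters, and `chunk_text` is a hypothetical function written for this illustration. The `RecursiveCharacterTextSplitter` used in the cell below is smarter: it prefers to break at paragraph and sentence boundaries, so its chunks will not match this output exactly.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

Each chunk repeats the last three characters of the previous one. With the notebook's settings (1000 and 200), that shared margin helps keep a sentence that straddles a boundary retrievable from at least one chunk.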

In [7]:
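# Loaders and vector stores now live in langchain-community (installed above);
# the older langchain.* import paths are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma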

# Constants
DB_DIR = "googlecardb"
PDF_PATH = "/kaggle/input/science-trivia-topics/2025_SLPGuide_08R_SC_JSC.pdf"

def initialize_database() -> Chroma:
    """Initialize the vector database with science topics from the PDF"""
    try:
        # Load PDF document
        loader = PyPDFLoader(PDF_PATH)
        documents = loader.load()
        
        # Split text into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200
        )
        splits = text_splitter.split_documents(documents)
        
        # Create and return ChromaDB instance
        return Chroma.from_documents(
            documents=splits,
            embedding=embeddings,
            persist_directory=DB_DIR
        )
    except Exception as e:
        print(f"Error initializing database: {e}")
        # If PDF loading fails, create an empty database
        return Chroma(
            embedding_function=embeddings,
            persist_directory=DB_DIR
        )

# Initialize the database
db = initialize_database()

## Extract Topics from the Database

Topic Extraction: It extracts main science topics from the document.

In [8]:
def extract_topics() -> List[str]:
    """Extract main topics from the database"""
    try:
        # Query the database for main topics
        results = db.similarity_search("What are the main science topics covered in this document?", k=5)
        
        # Extract topics from the results
        all_topics = []
        for doc in results:
            # Use Gemini to extract topics from the document
            response = topic_extraction_chain.invoke({"text": doc.page_content})
            extracted_topics = [topic.strip() for topic in response["text"].split(",")]
            all_topics.extend(extracted_topics)
        
        # Remove duplicates (and empty strings) while keeping first-seen order
        return list(dict.fromkeys(topic for topic in all_topics if topic))
        
    except Exception as e:
        print(f"Error extracting topics: {e}")
        # Return default topics if extraction fails
        return [
            "Physics", "Chemistry", "Biology", "Earth Science", "Astronomy",
            "Ecology", "Genetics", "Human Body", "Energy", "Matter"
        ]

# Extract topics
topics = extract_topics()

# Display available topics
print("Available science topics:")
for i, topic in enumerate(topics, 1):
    print(f"{i}. {topic}")

Available science topics:
1. Rocks
2. wind
3. Speed
4. Kinetic Energy
5. precipitation
6. water vapor
7. Force
8. Physical Sciences
9. Landforms
10. Seasons
11. Gravity
12. water cycle
13. condensation
14. soil
15. Climate
16. weather instruments
17. clouds
18. evaporation
19. storms
20. Simple Machines
21. Plant Adaptations
22. Animal Adaptations
23. Habitats
24. Light Energy
25. drought
26. Life Sciences
27. Food Webs
28. Earth Sciences
29. Human Adaptations
30. Food Chains
31. Scientific Inquiry
32. Sound Energy
33. Soil types
34. Erosion
35. temperature
36. Thermal Energy
37. Earthquakes
38. paleontology
39. Minerals
40. Volcanoes
41. Magnetic Energy
42. Health
43. Electrical Energy
44. Weather
45. space
46. Human Body
47. Fuels and fossils
48. atmosphere
49. Ecosystems
50. Weathering
51. Changes of State
52. meteorology
53. Physical Changes
54. Position
55. Motion
56. Potential Energy
57. tsunamis
58. Work
59. weather
60. Chemical Changes
61. Topography
62. air and water resources
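
The list above contains both "Weather" and "weather", because the topics are compared as exact strings. A possible refinement is to compare them case-insensitively. The sketch below assumes that keeping the first spelling encountered is acceptable, and `dedupe_topics` is a hypothetical helper, not something the notebook defines.

```python
from typing import Iterable, List


def dedupe_topics(raw_topics: Iterable[str]) -> List[str]:
    """Drop empty entries and case-insensitive duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for topic in raw_topics:
        cleaned = " ".join(topic.split())  # trim and collapse inner whitespace
        key = cleaned.casefold()           # case-insensitive comparison key
        if cleaned and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


print(dedupe_topics(["Weather", "weather", " Climate ", "", "Rocks"]))
# ['Weather', 'Climate', 'Rocks']
```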

# Generate Questions
Question Generation: When generating a question, it:

1. Selects a topic (randomly or specified)
2. Retrieves relevant context from the database
3. Uses Google Gemini to generate a question based on the context
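
Because the last step depends on the model replying with well-formed JSON, it can help to validate the reply before showing it to a child. The sketch below is one way to do that under stated assumptions: `parse_question` and `REQUIRED_KEYS` are hypothetical names, the check for exactly four options mirrors the prompt template, and the function returns `None` rather than raising so that a caller can fall back or retry.

```python
import json
from typing import Any, Dict, Optional

REQUIRED_KEYS = ("question", "options", "correct_answer", "explanation", "topic")


def parse_question(raw: str) -> Optional[Dict[str, Any]]:
    """Extract and validate the JSON object in a model reply; None if unusable."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the reply
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        return None
    options = data["options"]
    if not isinstance(options, list) or len(options) != 4:
        return None
    if data["correct_answer"] not in options:
        return None  # the stated answer must be one of the choices
    return data


raw_reply = (
    'Sure! Here is your question: '
    '{"question": "Which of these uses magnetic energy to work?", '
    '"options": ["A light bulb turning on", "A swing moving back and forth", '
    '"A magnet holding pictures on a fridge", "A bicycle moving"], '
    '"correct_answer": "A magnet holding pictures on a fridge", '
    '"explanation": "Magnets attract and hold things.", '
    '"topic": "Magnetic Energy"}'
)
parsed = parse_question(raw_reply)
print(parsed["correct_answer"] if parsed else "rejected")
```

Checking that `correct_answer` appears among the options matters for the quiz: grading compares the selected option with that string, so a mismatch would make the question impossible to answer correctly.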

In [9]:
import random
import json

def get_question(topic: Optional[str] = None) -> QuestionData:
    """Get a question based on a specific topic or a random topic"""
    try:
        # Select a topic if none provided
        if not topic:
            topic = random.choice(topics)
        
        # Query the database for relevant context
        results = db.similarity_search(f"Information about {topic}", k=2)
        context = "\n".join([doc.page_content for doc in results])
        
        # Generate a question using the LLM
        # invoke() replaces the deprecated run(); the generated text is under "text"
        question_data = question_chain.invoke(
            {"context": context, "topic": topic}
        )["text"]
        
        # Parse the JSON response
        try:
            # Extract JSON from the response
            json_str = question_data[question_data.find('{'):question_data.rfind('}')+1]
            parsed_data = json.loads(json_str)
            return QuestionData(
                question=parsed_data["question"],
                options=parsed_data["options"],
                correct_answer=parsed_data["correct_answer"],
                explanation=parsed_data["explanation"],
                topic=parsed_data["topic"]
            )
        except json.JSONDecodeError:
            # Fallback if JSON parsing fails
            print("Error parsing question data. Using fallback question.")
            return QuestionData(
                question=f"What is the main component of {topic}?",
                options=["Option 1", "Option 2", "Option 3", "Option 4"],
                correct_answer="Option 1",
                explanation=f"This is a fallback question about {topic}.",
                topic=topic
            )
    except Exception as e:
        print(f"Error generating question: {e}")
        # Return a fallback question
        return QuestionData(
            question="What is the chemical symbol for water?",
            options=["H2O", "CO2", "O2", "NaCl"],
            correct_answer="H2O",
            explanation="H2O is the chemical formula for water, representing two hydrogen atoms and one oxygen atom.",
            topic="Chemistry"
        )

# Generate a sample question
sample_question = get_question()

# Display the question
print(f"\nSample Question - Topic: {sample_question['topic']}")
print(f"Question: {sample_question['question']}")
for i, option in enumerate(sample_question['options'], 1):
    print(f"{i}. {option}")

# Display the correct answer and explanation
print(f"\nCorrect answer: {sample_question['correct_answer']}")
print(f"Explanation: {sample_question['explanation']}")




Sample Question - Topic: Magnetic Energy
Question: Which of these uses magnetic energy to work?
1. A light bulb turning on
2. A swing moving back and forth
3. A magnet holding pictures on a fridge
4. A bicycle moving

Correct answer: A magnet holding pictures on a fridge
Explanation: Magnets use magnetic energy to attract and hold things. The magnet on the fridge uses this energy to stick.


# Run a Quiz session

Interactive Quiz: The user answers questions and receives immediate feedback.

Score Tracking: The application keeps track of correct and incorrect answers.

Review System: At the end, it shows missed questions with explanations.
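
The scoring and review logic can be sketched separately from the `input()` calls, which makes it easier to try out without typing answers. This is an illustrative sketch only: `grade_answer` and `run_answers` are hypothetical helpers, and the choice to skip invalid input (neither scored nor marked as missed) is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional, Tuple


def grade_answer(question: Dict[str, Any], user_input: str) -> Optional[bool]:
    """True or False for a valid option number, None for anything else."""
    try:
        index = int(user_input.strip()) - 1  # options are shown starting at 1
    except ValueError:
        return None
    if not 0 <= index < len(question["options"]):
        return None
    return question["options"][index] == question["correct_answer"]


def run_answers(questions: List[Dict[str, Any]],
                answers: List[str]) -> Tuple[int, List[Dict[str, Any]]]:
    """Score a list of typed answers and collect the missed questions for review."""
    score, missed = 0, []
    for question, answer in zip(questions, answers):
        result = grade_answer(question, answer)
        if result is True:
            score += 1
        elif result is False:
            missed.append(question)
    return score, missed


rocks = {
    "question": "What type of rock is made of pebbles and sand stuck together?",
    "options": ["Metamorphic", "Igneous", "Sedimentary", "A mineral"],
    "correct_answer": "Sedimentary",
}
score, missed = run_answers([rocks, rocks], ["3", "1"])
print(score, len(missed))
# 1 1
```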

In [11]:
def start_quiz(num_questions: int = 5) -> None:
    """Start a quiz session"""
    # Initialize quiz state
    state = QuizState(
        score=0,
        total_questions=num_questions,
        current_question=None,
        missed_questions=[],
        status="PLAYING"
    )
    
    print(f"\nWelcome to Science Trivia!")
    print(f"You'll be asked {num_questions} questions about various science topics.")
    print("Type 'hint' to see the explanation for the current question.")
    print("Press Enter after typing your answer.\n")

    # Ask questions
    for i in range(num_questions):
        # Get a random question
        question = get_question()
        state["current_question"] = question
        
        # Display the question
        print(f"\nQuestion {i+1}/{num_questions} - Topic: {question['topic']}")
        print(f"Question: {question['question']}")
        for j, option in enumerate(question["options"], 1):
            print(f"{j}. {option}")
        
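        # Get user input
        user_input = input("\nEnter your answer (1-4) or 'hint' for explanation: ").strip().lower()
        
        if user_input == 'hint':
            # Show the hint, then ask again so the question is still answered
            print(f"Explanation: {question['explanation']}")
            user_input = input("\nEnter your answer (1-4): ").strip().lower()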
        
        try:
            answer_index = int(user_input) - 1
            if 0 <= answer_index < len(question["options"]):
                selected_answer = question["options"][answer_index]
                
                if selected_answer == question["correct_answer"]:
                    print("Correct! ✓")
                    print(f"Explanation: {question['explanation']}")
                    state["score"] += 1
                else:
                    print(f"Incorrect. The correct answer is: {question['correct_answer']}")
                    print(f"Explanation: {question['explanation']}")
                    state["missed_questions"].append(question)
            else:
                print("Invalid choice. Please enter a number between 1 and 4.")
        except ValueError:
            print("Please enter a valid number or 'hint'.")
    
    # Show final score
    print(f"\nQuiz complete! Your score: {state['score']}/{state['total_questions']}")
    
    # Review missed questions
    if state["missed_questions"]:
        print("\nQuestions to review:")
        for question in state["missed_questions"]:
            print(f"\nTopic: {question['topic']}")
            print(f"Question: {question['question']}")
            print(f"Correct answer: {question['correct_answer']}")
            print(f"Explanation: {question['explanation']}")
            input("Press Enter to continue...")

# Run a quiz with 5 questions
start_quiz(5)


Welcome to Science Trivia!
You'll be asked 5 questions about various science topics.
Type 'hint' to see the explanation for the current question.
Press Enter after typing your answer.


Question 1/5 - Topic: Force
Question: Which of these is an example of a force making something move?
1. A book sitting on a table.
2. A ball rolling down a hill.
3. A lightbulb glowing.
4. Ice melting in the sun.



Enter your answer (1-4) or 'hint' for explanation:  2


Correct! ✓
Explanation: A force, like gravity, is causing the ball to move down the hill. The book is not moving, the lightbulb is producing light, and the ice is changing state due to heat, not a direct force causing motion.

Question 2/5 - Topic: Animal Adaptations
Question: A cactus lives in the desert where it doesn't rain very often. Which adaptation helps the cactus survive in this dry environment?
1. Brightly colored flowers to attract insects.
2. Sharp thorns to protect it from animals eating it.
3. A long, deep root system to absorb water from the ground.
4. Large, wide leaves to catch as much sunlight as possible.



Enter your answer (1-4) or 'hint' for explanation:  3


Correct! ✓
Explanation: A cactus needs to find water in a dry place. A long, deep root system helps it reach water far underground. The other options are not the primary ways a cactus adapts to a desert environment.

Question 3/5 - Topic: Magnetic Energy
Question: Which of these uses magnetic energy to work?
1. A light bulb turning on
2. A swing moving back and forth
3. A refrigerator magnet sticking to the fridge
4. A bicycle rolling down a hill



Enter your answer (1-4) or 'hint' for explanation:  3


Correct! ✓
Explanation: Refrigerator magnets use magnetic energy to stick to metal surfaces like a fridge. The other options use different types of energy like electrical (light bulb), kinetic (swing and bicycle).

Question 4/5 - Topic: Rocks
Question: Imagine you find a rock with lots of different pebbles and sand grains stuck together. What type of rock is it most likely to be?
1. Metamorphic
2. Igneous
3. Sedimentary
4. A mineral



Enter your answer (1-4) or 'hint' for explanation:  3


Correct! ✓
Explanation: Sedimentary rocks are formed from small pieces of other rocks and materials that get pressed and cemented together over time. The pebbles and sand grains are clues that it's sedimentary.

Question 5/5 - Topic: Climate
Question: Imagine you live in a place that is hot and sunny almost every day of the year. What is this an example of?
1. Weather changing quickly
2. Climate being consistent
3. A storm happening
4. The water cycle starting



Enter your answer (1-4) or 'hint' for explanation:  2


Correct! ✓
Explanation: Climate describes the usual weather pattern of a place over a long time. If a place is hot and sunny almost every day, that's its climate.

Quiz complete! Your score: 5/5


## Generate a Question for a Specific Topic

Learn by selecting a specific topic.

In [12]:
def generate_topic_question(topic_name: str) -> QuestionData:
    """Generate and display a question for a specific topic"""
    question = get_question(topic_name)
    
    print(f"Question - Topic: {question['topic']}")
    print(f"Question: {question['question']}")
    for i, option in enumerate(question['options'], 1):
        print(f"{i}. {option}")
    
    print(f"\nCorrect answer: {question['correct_answer']}")
    print(f"Explanation: {question['explanation']}")
    
    return question

# Example: Generate a question for a specific topic
print("Available topics:")
for i, topic in enumerate(topics, 1):
    print(f"{i}. {topic}")

topic_index = int(input("\nEnter the number of the topic (1-{}): ".format(len(topics)))) - 1
if 0 <= topic_index < len(topics):
    generate_topic_question(topics[topic_index])
else:
    print("Invalid topic number.")

Available topics:
1. Rocks
2. wind
3. Speed
4. Kinetic Energy
5. precipitation
6. water vapor
7. Force
8. Physical Sciences
9. Landforms
10. Seasons
11. Gravity
12. water cycle
13. condensation
14. soil
15. Climate
16. weather instruments
17. clouds
18. evaporation
19. storms
20. Simple Machines
21. Plant Adaptations
22. Animal Adaptations
23. Habitats
24. Light Energy
25. drought
26. Life Sciences
27. Food Webs
28. Earth Sciences
29. Human Adaptations
30. Food Chains
31. Scientific Inquiry
32. Sound Energy
33. Soil types
34. Erosion
35. temperature
36. Thermal Energy
37. Earthquakes
38. paleontology
39. Minerals
40. Volcanoes
41. Magnetic Energy
42. Health
43. Electrical Energy
44. Weather
45. space
46. Human Body
47. Fuels and fossils
48. atmosphere
49. Ecosystems
50. Weathering
51. Changes of State
52. meteorology
53. Physical Changes
54. Position
55. Motion
56. Potential Energy
57. tsunamis
58. Work
59. weather
60. Chemical Changes
61. Topography
62. air and water resources



Enter the number of the topic (1-62):  59


Question - Topic: weather
Question: Imagine you see puffy, white clouds that look like cotton balls in the sky. What kind of weather might you expect?
1. A big thunderstorm with lots of rain
2. A sunny day with fair weather
3. A long period of very dry weather
4. A snowstorm with strong winds

Correct answer: A sunny day with fair weather
Explanation: Puffy, white clouds called cumulus clouds often mean good weather. They form when warm air rises, but they usually don't bring rain unless they get very big.


# The Future is Bright (and Full of Science!)

Looking ahead, I'm excited to evolve this project further. My future plans include leveraging agentic AI. Imagine converting the current RAG-based notebook into a more sophisticated agentic AI application. This would allow for more dynamic interactions, where the AI could understand the child's learning progress, proactively suggest relevant topics, and even guide them through mini-science explorations.

Furthermore, the goal is to develop a user-friendly interface designed specifically for kids. It would make it easy for my 8-year-old (and others!) to select topics, engage with the quizzes, and learn in a fun and accessible way. By combining the power of RAG with the proactive capabilities of agentic AI and a child-friendly interface, we can create an even more personalized and engaging science learning journey.

This project is more than just a quiz generator; it's a step towards creating a personalized and engaging science learning environment for my child. By harnessing the power of AI, we can transform static information into a dynamic tool that fuels their curiosity and empowers them to explore the wonders of science at their own pace. And who knows, maybe it will inspire other busy parents to do the same!