This project demonstrates how to use Google's Generative AI (Gemini) to build an intelligent assistant that helps students prepare for the TOEFL exam, specifically the reading and listening comprehension sections.

The assistant performs the following tasks:

🧾 Document Understanding: It takes a reading passage (e.g., from a PDF or transcript) and understands the academic content.

🧠 Question Generation with Structured Output: It generates multiple-choice TOEFL-style questions in JSON format, including distractors, the correct answer, and explanations.

🎧 Interactive Quiz Loop: It simulates a TOEFL test experience by asking the user to input answers and provides real-time feedback and score evaluation.

🪄 Few-shot Prompting: It guides the model with examples to generate high-quality questions and varied answer distributions.

The entire process is integrated in a Kaggle Notebook, with structured prompts and response parsing to automate test generation and evaluation — offering a scalable way to practice TOEFL reading/listening/speaking skills with minimal supervision.

# Installing Libraries

In [1]:
!pip uninstall -qy jupyterlab  # Remove unused packages from Kaggle's base image that conflict
!pip install -U -q "google-genai==1.7.0"
!pip install google-generativeai python-dotenv youtube-transcript-api

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyterlab-lsp 3.10.2 requires jupyterlab<4.0.0a0,>=3.1.0, which is not installed.
Collecting python-dotenv
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Collecting youtube-transcript-api
  Downloading youtube_transcript_api-1.0.3-py3-none-any.whl.metadata (23 kB)
Downloading python_dotenv-1.1.0-py3-none-any.whl (20 kB)
Downloading youtube_transcript_api-1.0.3-py3-none-any.whl (2.2 MB)
Installing collected

# Setting API Keys

In [2]:
# Libraries
from google import genai
from google.genai import types
from google.api_core import retry
import json
from IPython.display import HTML, Markdown, display
import re
import random

In [3]:
# Get API Key
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

In [4]:
# Setup retry
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)

In [5]:
# Setting client
client = genai.Client(api_key=GOOGLE_API_KEY)

# Capability 1: Document Understanding & Structured output/JSON mode/controlled generation & Few Shot

In this section, we demonstrate the ability of the Generative AI model (Gemini) to understand an academic passage and generate structured TOEFL-style multiple-choice questions.

This task showcases three key Gen AI capabilities:

- **🧠 Document Understanding:** The model processes a full passage (from a PDF) and extracts the key ideas needed to formulate relevant questions.
- **🧾 Structured Output (JSON):** The generated questions are formatted as JSON objects, including the question, options, correct answer, and explanation. This format allows for automated grading and further processing.
- **🎯 Few-Shot Prompting:** The prompt includes examples of properly formatted questions to guide the model’s output style and quality, ensuring it adheres to the TOEFL format.

This structured generation simulates the kind of test material seen in real TOEFL exams, and sets the foundation for the interactive quiz evaluation built later in the notebook.
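
Because the quiz code indexes each question by key, it can help to check the parsed JSON before using it. The following is a minimal sketch and not part of the original notebook: `validate_questions` is a hypothetical helper, and the rules it enforces (four options, an answer letter from A to D) are assumptions taken from the example format used in the prompts.

```python
REQUIRED_KEYS = {"question", "options", "answer", "explanation"}
VALID_LETTERS = ("A", "B", "C", "D")

def validate_questions(questions):
    """Return a list of problems found; an empty list means the data looks usable."""
    if not isinstance(questions, list):
        return ["top-level JSON value is not a list"]
    problems = []
    for i, q in enumerate(questions, start=1):
        if not isinstance(q, dict):
            problems.append(f"question {i}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - q.keys()
        if missing:
            problems.append(f"question {i}: missing keys {sorted(missing)}")
            continue
        if not isinstance(q["options"], list) or len(q["options"]) != 4:
            problems.append(f"question {i}: expected exactly 4 options")
        if q["answer"] not in VALID_LETTERS:
            problems.append(f"question {i}: answer {q['answer']!r} is not one of A-D")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to see everything that is wrong with a single model response.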

## 🧠 Interactive Quiz Function: `GenerateQuiz(response, inputEnable)`

This function creates and runs an interactive TOEFL-style quiz based on questions generated by the Gemini model.

### 🔍 Function Parameters:
- `response`: The output object from Gemini containing a JSON-formatted list of questions.
- `inputEnable` (`bool`): 
  - If `True`, the quiz will prompt the user to manually input answers.
  - If `False`, the system will randomly select answers (useful for simulation or testing).

### ⚙️ What It Does:
1. **Parses the Gemini Response**:
   - Extracts and cleans the generated JSON string.
   - Converts it into a list of questions using `json.loads()`.

2. **Runs a Quiz Loop**:
   - Displays each question and its options.
   - Accepts user input (or simulates an answer randomly).
   - Compares the user’s answer with the correct one.
   - Prints feedback and the explanation for each question.

3. **Tracks Score**:
   - Keeps a running score.
   - Outputs the final result at the end of the quiz.

### 📦 Capabilities Demonstrated:
- Interactive evaluation of Gen AI output
- Automated grading of structured responses
- Simulation-ready mode (for batch testing without user input)

This function is key to turning static question generation into a dynamic learning experience, simulating the TOEFL environment and offering immediate feedback.
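
The parsing step can be sketched in isolation. This is an illustrative variant rather than the notebook's exact code: `extract_question_list` is a hypothetical helper that assumes the model's reply contains a single JSON array, possibly wrapped in a Markdown code fence or surrounded by a short sentence of prose.

```python
import json

def extract_question_list(raw_text):
    """Parse the JSON array spanning the first '[' to the last ']' in raw_text.

    Slicing this way skips code fences and prose around the array.
    Raises ValueError if no array is found or the slice is not valid JSON.
    """
    start = raw_text.find("[")
    end = raw_text.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("no JSON array found in model output")
    try:
        return json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
```

Raising a single exception type lets the caller decide whether to retry the generation or stop, instead of failing deep inside the quiz loop.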

In [6]:
def GenerateQuiz(response, inputEnable):

    raw_text = response.candidates[0].content.parts[0].text
    cleaned_text = re.sub(r"```json|```", "", raw_text).strip()

    # Step 1: Parse the JSON
    questions = json.loads(cleaned_text)
    
    # Step 2: Quiz loop
    score = 0
    
    print("🧠 TOEFL Reading Quiz - Powered by Gemini\n")
    
    for i, q in enumerate(questions):
        print(f"Question {i+1}: {q['question']}")
        for opt in q['options']:
            print(opt)
        
        if inputEnable:
            user_answer = input("Your answer (A/B/C/D): ").strip().upper()
        else:
            options = ["A", "B", "C", "D"]

            # Select one randomly
            user_answer = random.choice(options)
        
        if user_answer == q['answer']:
            print("✅ Correct!")
            score += 1
        else:
            print(f"❌ Incorrect. The correct answer was {q['answer']}.")
        
        print("Explanation:", q['explanation'])
        print("-" * 60)
    
    # Final Score
    print(f"\n🎯 Your final score: {score} / {len(questions)}")
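
Separating scoring from printing and `input()` makes the grading logic easier to test. The sketch below shows one way that split could look; `grade_answers` is a hypothetical helper and is not defined in the notebook.

```python
def grade_answers(questions, user_answers):
    """Compare answers to the key and return (score, per-question results).

    questions: list of dicts with an "answer" key holding a letter such as "B".
    user_answers: list of letters, one per question, in the same order.
    """
    if len(questions) != len(user_answers):
        raise ValueError("need exactly one answer per question")
    results = []
    for q, given in zip(questions, user_answers):
        # Normalize the same way GenerateQuiz does for typed input.
        results.append(given.strip().upper() == q["answer"])
    return sum(results), results
```

With random guessing over four options, the expected score is about a quarter of the questions, which is worth keeping in mind when reading the simulated runs.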

## 📄 Question Generation from PDF: `GenerateQuestionsFromDocumentFile(document_filepath, number_of_questions=3, inputEnable=True)`

This function allows users to generate TOEFL-style multiple-choice questions directly from an **academic passage stored in a PDF document**.

### 🔍 Function Parameters:
- `document_filepath` (`str`): Path to the PDF file containing the passage.
- `number_of_questions` (`int`, default = 3): The number of multiple-choice questions to generate.
- `inputEnable` (`bool`, default = `True`): 
  - If `True`, the user is prompted to manually answer the questions.
  - If `False`, the assistant answers randomly (for testing or simulation).

### ⚙️ What It Does:
1. **Uploads the Document**:
   - The file is uploaded using the Gen AI file API to make its contents available to the model.

2. **Constructs a Prompt**:
   - Sends a well-defined prompt to Gemini including:
     - The academic passage.
     - A sample JSON structure for output.
     - Constraints to ensure randomness in correct answer placement.
     - A request for structured TOEFL-style questions with answers and explanations.

3. **Calls Gemini API**:
   - Sends the prompt and document to the model.
   - Receives a structured JSON response with questions.

4. **Runs Interactive Quiz**:
   - Passes the model's response to the `GenerateQuiz()` function.
   - Delivers an interactive quiz to the user or simulates one.

### 🎯 Capabilities Demonstrated:
- ✅ Document Understanding (from PDF)
- ✅ Structured Output / JSON
- ✅ Controlled Generation (with formatting and behavior constraints)
- ✅ Few-Shot Prompting (via sample question format)

This function forms the **core user interface** of the project, turning any academic PDF into an intelligent and evaluable TOEFL practice quiz.
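
The prompt is built with `str.format`, so literal JSON braces have to be doubled while `{0}` marks the placeholder for the question count. A reduced sketch of that mechanism follows; `PROMPT_TEMPLATE` and `build_prompt` are illustrative names, and the template text is shortened rather than the notebook's full prompt.

```python
PROMPT_TEMPLATE = """
Create {0} multiple choice questions.

Each question should follow this structure in JSON format:
[
{{
    "question": "What is the main idea of the passage?",
    "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
    "answer": "B",
    "explanation": "Option B is correct because ..."
}}
]
"""

def build_prompt(number_of_questions):
    """Fill in the question count; '{{' and '}}' come out as single braces."""
    if number_of_questions < 1:
        raise ValueError("number_of_questions must be at least 1")
    return PROMPT_TEMPLATE.format(number_of_questions)

prompt = build_prompt(3)
```

Forgetting to double a brace raises a `KeyError` or `IndexError` from `format`, so a quick call like the last line is a cheap check after editing the template.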

In [7]:
def GenerateQuestionsFromDocumentFile(document_filepath, number_of_questions = 3, inputEnable = True):

    document_file = client.files.upload(file=document_filepath)
    
    # Ask the model to generate TOEFL-style questions from the uploaded document
    prompt_for_document = """
    You are a TOEFL reading assistant. Based on the following academic passage extracted from a PDF document, create {0} multiple choice questions. 
    
    Each question should follow this structure in JSON format:
    [
    {{
        "question": "What is the main idea of the passage?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "B",
        "explanation": "Option B is correct because it reflects the main argument of the passage."
    }},
    {{
        "question": "According to the passage, what is one reason that best supports the main argument of the passage?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "D",
        "explanation": "Option D is correct because it supports the main argument with facts."
    }}
      ...
    ]

    Important restrictions for the answers:
    - Do not place the correct answer in the same position each time.
    - Randomly shuffle the order of options per question.
    - Avoid repeating the correct answer in the same letter position across multiple questions.
    
    The passage is provided in the attached PDF document.
    """.format(number_of_questions)

    # Run the prompt and document through the model
    response = client.models.generate_content(
      model='gemini-2.0-flash-thinking-exp',
      config=types.GenerateContentConfig(temperature=2.0),
      contents=[prompt_for_document, document_file]
    )

    GenerateQuiz(response, inputEnable)

# Exercises for Capability 1: Document Understanding

To answer the questions yourself based on the PDF files, set `inputEnable = True`. For simulation purposes, and so the notebook runs end to end without waiting for input, the cells below use `inputEnable = False`, which selects an answer at random.

In [8]:
GenerateQuestionsFromDocumentFile(document_filepath = '/kaggle/input/Sample_1_Text.pdf',
                                  number_of_questions = 3, inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the primary subject explored in this passage?
A. The geological timescale and its divisions.
B. The climate and geography of the Mesozoic Era.
C. Theories surrounding the extinction of dinosaurs.
D. The properties and distribution of the element iridium.
❌ Incorrect. The correct answer was C.
Explanation: Option C is correct because the passage primarily discusses different theories explaining the extinction of dinosaurs, evaluating both climatic change and asteroid impact hypotheses.
------------------------------------------------------------
Question 2: According to the passage, which of the following observations weakens the theory that simple climatic change due to sea-level retreat caused dinosaur extinction?
A.  The relatively constant temperature maintained by shallow seas during the Cretaceous.
B. The discovery of iridium in a clay layer marking the end of the Cretaceous period.
C. The survival of cold-blooded anima

In [9]:
GenerateQuestionsFromDocumentFile(document_filepath = '/kaggle/input/Sample_2_Text.pdf',
                                  number_of_questions = 3,  inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the central theme of the passage regarding the rise of Teotihuacan?
A. The primary factor in Teotihuacan's growth was its superior military might that overshadowed neighboring cities.
B. Teotihuacan's strategic geographic location and abundant natural resources played a crucial role in its development.
C. Religious reforms and the construction of massive religious edifices were the sole reasons for Teotihuacan's expansion.
D.  Centralized urban planning and advanced administrative techniques independently led to Teotihuacan's dominance.
❌ Incorrect. The correct answer was B.
Explanation: Option B is correct because the passage emphasizes the significance of Teotihuacan's geographic location, natural trade route, and obsidian resources as key factors contributing to its rise and prosperity.  These elements are highlighted as fundamental advantages that propelled its development.
-----------------------------------------------

In [10]:
GenerateQuestionsFromDocumentFile(document_filepath = '/kaggle/input/Sample_3_Text.pdf',
                                  number_of_questions = 3,  inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the main topic explored in the passage?
A. The impact of the Glomar Challenger on oceanographic research.
B. The reasons for invertebrate fauna changes in the Mediterranean Sea.
C. The methodology for dating geological events using sediment samples.
D. The hypothesis that the Mediterranean Sea was once a desert.
❌ Incorrect. The correct answer was D.
Explanation: Option D is correct because the passage primarily discusses the investigation into the geological history of the Mediterranean, culminating in the theory that it was once a desert based on collected evidence.
------------------------------------------------------------
Question 2: According to the passage, which finding directly supported the hypothesis of a desert environment in the Mediterranean basin?
A. The discovery of volcanic rock fragments.
B. The presence of tiny marine fossils in sediment layers.
C.  The peculiar composition and structure of the gypsum.
D.

# Capability 2: Context Catching YouTube Transcript Understanding & Structured output/JSON mode/controlled generation & Few Shot

In [11]:
# Youtube Transcript API
from youtube_transcript_api import YouTubeTranscriptApi
from youtube_transcript_api.formatters import TextFormatter

## 🎥 YouTube Transcript Extractor: `get_youtube_transcript(youtube_url)`

This function retrieves the **transcript (closed captions)** from a given YouTube video and returns the full text as a single clean string.

### 🔍 Function Parameters:
- `youtube_url` (`str`): The full URL of a YouTube video (e.g., `https://www.youtube.com/watch?v=abc123XYZ`).

### 🔁 What It Does:
1. **Extracts the Video ID**:
   - Parses the YouTube video URL to obtain the video ID required by the transcript API.

2. **Fetches the Transcript**:
   - Uses the `YouTubeTranscriptApi` to get the list of subtitle segments.

3. **Concatenates All Text**:
   - Joins the subtitle segments into a single string.
   - Removes line breaks and leading/trailing spaces.

4. **Handles Errors Gracefully**:
   - If no transcript is found or an error occurs, it prints the error and returns `None`.

### 🧠 Use Case:
This function is especially useful for feeding **spoken academic content** (e.g., TED Talks, lectures) into the **TOEFL question generation pipeline**. It can serve as an alternative input source to PDFs or plain text.

### ⚠️ Notes:
- The video must have manually uploaded or auto-generated **subtitles enabled**.
- This function does not support videos without transcripts.
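
The notebook's function obtains the video ID by splitting the URL on `v=`, which assumes a standard watch link. A slightly more defensive sketch using only the standard library is shown here; `extract_video_id` is a hypothetical helper, and the set of hosts it recognizes is an assumption rather than a complete list.

```python
from urllib.parse import urlparse, parse_qs

# Assumed host names for standard watch links; extend as needed.
WATCH_HOSTS = ("youtube.com", "www.youtube.com", "m.youtube.com")

def extract_video_id(youtube_url):
    """Return the video ID from common YouTube URL shapes, or None if not found."""
    parsed = urlparse(youtube_url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in WATCH_HOSTS:
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        if values:
            return values[0]
    return None
```

Returning `None` for unrecognized URLs keeps the behavior in line with `get_youtube_transcript`, which also signals failure with `None`.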

In [12]:
def get_youtube_transcript(youtube_url):
    """
    Retrieves the transcript from a YouTube video.

    Args:
        youtube_url (str): The URL of the YouTube video.

    Returns:
        str: The transcript of the video, or None if an error occurs.
    """
    try:
        video_id = youtube_url.split("v=")[1].split("&")[0]  # Extract video ID
        transcript_list = YouTubeTranscriptApi.get_transcript(video_id)
        concatenated_text = " ".join([segment["text"] for segment in transcript_list])
        clean_text = concatenated_text.replace("\n", " ").strip()
        return clean_text
    except Exception as e:
        print(f"Error fetching transcript: {e}")
        return None

## 🎧 Generate TOEFL Listening Questions from YouTube: `GenerateQuestions(youtube_url, number_of_questions=3, inputEnable=True)`

This function generates a short set of **TOEFL-style listening comprehension questions** based on the transcript of a YouTube video.

### 🔍 Function Parameters:
- `youtube_url` (`str`): The URL of a YouTube video with captions enabled.
- `number_of_questions` (`int`, default = 3): Number of multiple-choice questions to generate (currently fixed to 3 in the prompt).
- `inputEnable` (`bool`, default = `True`): If `True`, prompts the user for answers; if `False`, randomly selects answers (simulation/testing mode).

### ⚙️ What It Does:
1. **Extracts the Transcript**:
   - Calls `get_youtube_transcript()` to retrieve the full spoken content from the YouTube video.
   - Returns an empty list if no transcript is found.

2. **Builds a Listening Comprehension Prompt**:
   - Sends the transcript to Gemini with a prompt that asks for questions focusing on:
     - Main idea
     - Supporting details
     - Inference
     - Speaker attitude or purpose
   - Ensures structured output in JSON format with randomized answer positions.

3. **Generates and Evaluates Questions**:
   - Calls the Gemini model to generate the questions.
   - Passes the response to `GenerateQuiz()` for interactive answering or automated evaluation.

### 🎯 Capabilities Demonstrated:
- ✅ Audio Understanding (via transcript proxy)
- ✅ Structured Output (JSON)
- ✅ Few-Shot Prompting with format examples
- ✅ Controlled Generation and Answer Randomization

This function turns any **educational or lecture-style YouTube video** into a dynamic TOEFL listening practice tool, enabling customized comprehension quizzes in seconds.
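
Prompt instructions alone cannot guarantee that correct answers are spread across the letters. One possible complement is to reshuffle the options after generation. The sketch below is an assumption about how that could be done, not something the notebook does: `shuffle_options` is a hypothetical helper that expects every option to look like `"A. text"`.

```python
import random

LETTERS = ["A", "B", "C", "D"]

def shuffle_options(question, rng=random):
    """Return a copy of question with its options reordered and relabeled.

    Assumes each option looks like "A. text" and "answer" is a letter in LETTERS.
    The explanation is left untouched, so a letter it mentions may become stale.
    """
    # Strip the "X." prefix to get the bare option texts.
    texts = [opt.split(".", 1)[1].strip() for opt in question["options"]]
    correct_text = texts[LETTERS.index(question["answer"])]
    rng.shuffle(texts)
    shuffled = dict(question)
    shuffled["options"] = [f"{letter}. {text}" for letter, text in zip(LETTERS, texts)]
    shuffled["answer"] = LETTERS[texts.index(correct_text)]
    return shuffled
```

Passing a seeded `random.Random` instance as `rng` makes a shuffle reproducible. Because explanations often name the correct letter, they would need to be rewritten or phrased without letters for this approach to stay consistent.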

In [13]:

def GenerateQuestions(youtube_url, number_of_questions = 3, inputEnable = True):
    """
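    Generates English TOEFL-style listening comprehension questions from a
    YouTube video transcript and runs them as a quiz.

    Args:
        youtube_url (str): The URL of the YouTube video.
        number_of_questions (int): Intended number of questions (the prompt below currently asks for 3).
        inputEnable (bool): If True, prompt the user for answers; if False, answer randomly.

    Returns:
        list | None: An empty list if the transcript could not be fetched;
        otherwise None (the quiz is printed rather than returned).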
    """

    transcript = get_youtube_transcript(youtube_url)
    if not transcript:
        return []
    
    prompt_text = f"""
    Generate 3 English TOEFL listening comprehension questions based on the content of the following transcript:

    {transcript}
    
    Provide questions that assess the test-taker's ability to understand:

    -   Main ideas
    -   Details
    -   Inferences
    -   Purpose
    -   Speaker attitude

    Each question should follow this structure in JSON format:
    [
    {{
        "question": "What is the main idea of the passage?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "B",
        "explanation": "Option B is correct because it reflects the main argument of the video."
    }},
    {{
        "question": "According to the video, what is one reason that best supports the main argument of the video?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "D",
        "explanation": "Option D is correct because it supports the main argument with facts."
    }}
      ...
    ]

    Important restrictions for the answers:
    - Do not place the correct answer in the same position each time.
    - Randomly shuffle the order of options per question.
    - Avoid repeating the correct answer in the same letter position across multiple questions.
    
    """

    response = client.models.generate_content(
              model='models/gemini-2.5-pro-exp-03-25',
              config=types.GenerateContentConfig(temperature=2.0),
              contents=[prompt_text]
            )
    GenerateQuiz(response, inputEnable)

In this exercise you can also enter the URL of your own YouTube video if you have one. For this simulation, the URL is hardcoded to the following video.

In [14]:
#youtube_url = input("Youtube Video URL: ")
youtube_url = "https://www.youtube.com/watch?v=eIho2S0ZahI"
GenerateQuestions(youtube_url, inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the central topic discussed by the speaker?
A. A scientific analysis of voice production and acoustics.
B. Techniques and principles for speaking effectively and avoiding common communication barriers.
C. The historical evolution of public speaking.
D. A comparison of different vocal warm-up routines used by professionals.
❌ Incorrect. The correct answer was B.
Explanation: Option B is correct because the speaker focuses on both negative speaking habits ('deadly sins') to avoid and positive foundations (HAIL) and vocal techniques (register, timbre, prosody, etc.) to adopt for powerful speaking.
------------------------------------------------------------
Question 2: According to the speaker, why do people tend to prefer politicians with lower voices?
A. Lower voices are perceived as warmer and more trustworthy.
B. Politicians with lower voices tend to use less exaggeration.
C. Deeper vocal registers are associated with power

# Capability 3: Image Understanding & Structured output/JSON mode/controlled generation & Few Shot

## 🖼️ Generate TOEFL Questions from Image: `GenerateQuestionsFromImage(image_filepath, number_of_questions=3, inputEnable=True)`

This function enables the generation of **TOEFL-style multiple-choice questions** based on the **content of an image** — useful for infographics, diagrams, or labeled visuals in academic contexts.

### 🔍 Function Parameters:
- `image_filepath` (`str`): Path to the image file (e.g., JPEG, PNG).
- `number_of_questions` (`int`, default = 3): Number of questions to be generated.
- `inputEnable` (`bool`, default = `True`):  
  - If `True`, the user manually answers each question.  
  - If `False`, answers are randomly selected (for automated testing/simulation).

### ⚙️ What It Does:
1. **Uploads the Image**:
   - Uses the Gemini file upload API to send the image for visual analysis.

2. **Builds the Prompt**:
   - Asks the model to interpret the image and generate multiple-choice questions based on its content.
   - Provides a JSON-format example to guide the structure of the output.
   - Includes constraints to randomize the placement of the correct answer across different questions.

3. **Generates & Evaluates Questions**:
   - Calls Gemini to produce questions about image content (e.g., observation, inference, details).
   - Uses `GenerateQuiz()` to evaluate the questions through user input or random simulation.

### 📦 Capabilities Demonstrated:
- ✅ Image Understanding (Visual Analysis)
- ✅ Structured Output in JSON
- ✅ Controlled Generation (Randomized answers, clear format)
- ✅ Few-Shot Prompting (Format guidance via examples)

This function expands the project to support **visual learning materials**, enabling rich TOEFL-style comprehension practice from figures, charts, or academic illustrations.
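
Before uploading, a quick check on the file type can catch obvious mistakes such as passing a PDF path to the image function. This is a minimal sketch, not part of the notebook: `is_supported_image` is a hypothetical helper, the allowed types are an assumption based on the formats named in the parameter description (JPEG, PNG), and the check looks only at the file extension, not at the file contents.

```python
import mimetypes

def is_supported_image(image_filepath, allowed=("image/jpeg", "image/png")):
    """Guess the MIME type from the file extension and check it against allowed."""
    mime_type, _ = mimetypes.guess_type(image_filepath)
    return mime_type in allowed
```

A caller could run this check first and skip the upload with a clear message when it returns `False`.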


In [15]:
def GenerateQuestionsFromImage(image_filepath, number_of_questions = 3, inputEnable = True):

    image_file = client.files.upload(file=image_filepath)
    
    # Ask the model to generate TOEFL-style questions about the uploaded image
    prompt_for_image = """
    You are a TOEFL reading assistant. Based on the following image, create {0} multiple choice questions. 
    
    Each question should follow this structure in JSON format:
    [
    {{
        "question": "What is the best option that describes the image?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "B",
        "explanation": "Option B is correct because it best matches what the image shows."
    }},
    {{
        "question": "According to the image, which option best describes the item on the right?",
        "options": ["A. ...", "B. ...", "C. ...", "D. ..."],
        "answer": "D",
        "explanation": "Option D is correct because it matches the details visible in the image."
    }}
      ...
    ]

    Important restrictions for the answers:
    - Do not place the correct answer in the same position each time.
    - Randomly shuffle the order of options per question.
    - Avoid repeating the correct answer in the same letter position across multiple questions.
    
    The image is provided in the attached file.
    """.format(number_of_questions)

    # Run the prompt and image through the model
    response = client.models.generate_content(
      model='gemini-2.0-flash-thinking-exp',
      config=types.GenerateContentConfig(temperature=2.0),
      contents=[prompt_for_image, image_file]
    )

    GenerateQuiz(response, inputEnable)

In [16]:
GenerateQuestionsFromImage('/kaggle/input/Toefl_Image_Sample_1.jpg', inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the primary activity depicted in the image?
A. Children engaged in a classroom lesson.
B. A group of friends enjoying an outdoor celebration.
C. A family having a quiet picnic in a garden.
D. Students rehearsing for a musical performance.
❌ Incorrect. The correct answer was B.
Explanation: Option B is the most accurate description because the image shows a lively outdoor party scene with children playing and an adult playing music, suggesting a celebration or joyful gathering.
------------------------------------------------------------
Question 2: According to the image, what is the role of the adult figure in the scene?
A. To supervise and ensure the children's safety.
B. To teach the children a new game or activity.
C. To entertain the children with music.
D. To serve refreshments and food to the children.
❌ Incorrect. The correct answer was C.
Explanation: Option C best describes the adult's action as they are playing a 

In [17]:
GenerateQuestionsFromImage('/kaggle/input/Toefl_Image_Sample_2.jpg', inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the primary subject of the image?
A. A construction site
B. An airport scene
C. A parking lot
D. A train station
❌ Incorrect. The correct answer was B.
Explanation: Option B is the most accurate description because the image clearly depicts an airplane connected to a jet bridge at what appears to be an airport terminal. The presence of ground personnel and airport markings further confirms this.
------------------------------------------------------------
Question 2: According to the image, what is the likely purpose of the extended structure connected to the airplane?
A. To refuel the airplane
B. To load cargo containers
C. To allow passengers to board or deboard
D. To provide shade for ground crew
❌ Incorrect. The correct answer was C.
Explanation: Option C is correct because the extended structure is a jet bridge or passenger boarding bridge, designed to connect the terminal building directly to the aircraft door, facilit

In [18]:
GenerateQuestionsFromImage('/kaggle/input/Toefl_Image_Sample_3.jpg', inputEnable = False)

🧠 TOEFL Reading Quiz - Powered by Gemini

Question 1: What is the most appropriate description of the activity shown in the image?
A. A family preparing food in the kitchen.
B. A family enjoying a meal together at home.
C. A group of friends having a picnic outdoors.
D.  People ordering food at a restaurant.
❌ Incorrect. The correct answer was B.
Explanation: Option B accurately describes the image as it depicts a family sitting around a table filled with food in a home setting and engaging in a mealtime.
------------------------------------------------------------
Question 2: Based on the image, which statement is most likely true about the food on the table?
A. The food primarily consists of desserts and pastries.
B. The meal appears to be lacking in vegetables and fruits.
C. The food includes a variety of dishes such as salads and grain-based items.
D. The food seems to be exclusively fast food and fried snacks.
❌ Incorrect. The correct answer was C.
Explanation: Option C is the mos

# Capability 4: Audio Understanding & Structured output/JSON mode/controlled generation & Few Shot

## 🎙️ Speaking Evaluation from Audio: `GenerateQuestionsFromAudio(audio_filepath)`

This function analyzes a user's **spoken response** (e.g., from a TOEFL speaking task) and provides a **concise automated evaluation**, including a score estimate and improvement suggestions.

### 🔍 Function Parameters:
- `audio_filepath` (`str`): Path to an audio file containing the user's spoken response (e.g., `.mp3`, `.wav`).

### ⚙️ What It Does:
1. **Uploads the Audio**:
   - Sends the user’s recorded speaking sample to Gemini using the audio file upload API.

2. **Builds a Prompt for Speaking Feedback**:
   - Instructs Gemini to evaluate the speech sample based on TOEFL criteria (clarity, fluency, coherence).
   - Asks for a concise report with:
     - An estimated score range
     - Key strengths
     - Areas for improvement in bullet-point format

3. **Displays the Feedback**:
   - The feedback is printed to the console or notebook output, simulating a real TOEFL speaking review.

### 🧠 Use Case:
This tool acts like a **TOEFL speaking coach**, giving learners instant feedback after practicing a speaking task — useful for self-assessment and progress tracking.

### ✅ Capabilities Demonstrated:
- ✅ Audio Understanding (Speech evaluation)
- ✅ Controlled Generation (Score + feedback format)
- ✅ Structured Output for actionable self-improvement

This function helps learners reflect on their oral performance, improve pronunciation, pacing, and delivery — all critical for succeeding on the TOEFL Speaking section.
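
For progress tracking, the estimated score range can be pulled out of the feedback text. The sketch below assumes the model follows the requested template and writes the range as `score: 20-25 out of 30` with a plain hyphen; `extract_score_range` is a hypothetical helper that returns `None` when the reply is phrased differently.

```python
import re

# Matches e.g. "score: 20-25 out of 30" (case-insensitive, plain hyphen only).
SCORE_PATTERN = re.compile(r"score:\s*(\d{1,2})\s*-\s*(\d{1,2})\s*out of 30", re.IGNORECASE)

def extract_score_range(feedback_text):
    """Return (low, high) as integers, or None if no score range is found."""
    match = SCORE_PATTERN.search(feedback_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Storing the returned tuple alongside the date of each recording would give a simple history of estimated scores over time.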


In [19]:
def GenerateQuestionsFromAudio(audio_filepath):

    audio_file = client.files.upload(file=audio_filepath)
    
    # Ask the model to score the spoken response and suggest improvements
    prompt_for_audio = """
    You are a TOEFL speaking assistant. Based on the following audio, score my speaking and suggest how I can improve it. Be very concise.

    Follow this structure:
    Based on your audio, here's a concise score and feedback to improve your speaking:

    **Score:**

    Your speaking seems to be in the **Good** range (estimated TOEFL Speaking score: Range of Scores out of 30).
    
    **Strengths:**

    ....

    **Areas for Improvement (Concise Tips):**

    ...
    
    The audio is provided in the attached file.
    """

    # Run the prompt and audio through the model
    response = client.models.generate_content(
      model='gemini-2.0-flash-thinking-exp',
      config=types.GenerateContentConfig(temperature=1.5),
      contents=[prompt_for_audio, audio_file]
    )
    print(response.text)

In [20]:
GenerateQuestionsFromAudio('/kaggle/input/b1_audio.mp3')

Based on your audio, here's a concise score and feedback to improve your speaking:

**Score:**

Your speaking seems to be in the **Good** range (estimated TOEFL Speaking score: 20-25 out of 30).

**Strengths:**:

*   **Clear Pronunciation**: You pronounce words clearly, making it easy to understand you.
*   **Fluent Pace**: Your speaking pace is natural and smooth, without excessive hesitations.
*   **Good Grammar and Vocabulary**: You use correct grammar and appropriate vocabulary for the topic presented in the audio.

**Areas for Improvement (Concise Tips):**

*   **Enhance Intonation**: Vary your intonation more to emphasize key words and phrases. This will make your speaking sound more engaging and less monotone. Try to express emotion and importance through your voice modulation.


In [21]:
GenerateQuestionsFromAudio('/kaggle/input/b2_audio.mp3')

Based on your audio, here's a concise score and feedback to improve your speaking:

**Score:**

Your speaking seems to be in the **Good** range (estimated TOEFL Speaking score: 23-27 out of 30).

**Strengths:**

*   **Clear Pronunciation:** You articulated words clearly and understandably.
*   **Smooth Pacing:**  Your reading speed is well-paced and easy to follow.
*   **Appropriate Intonation:** You demonstrated a good understanding of sentence structure with your intonation.
*   **Vocabulary Handling:** You navigated academic vocabulary related to climate change effectively.

**Areas for Improvement (Concise Tips):**

*   **Enhance Naturalness:** Practice speaking more conversationally rather than just reading aloud. Try to internalize the content to sound less like reading and more like explaining.
*   **Focus on Extempore Speaking:**  Shift practice to responding to prompts without pre-written text to better simulate actual TOEFL speaking tasks.
*   **Vary Intonation Further:**  Wh

In [22]:
GenerateQuestionsFromAudio('/kaggle/input/b3_audio.mp3')

Based on your audio, here's a concise score and feedback to improve your speaking:

**Score:**

Your speaking seems to be in the **Good** range (estimated TOEFL Speaking score: 23-26 out of 30).

**Strengths:**

*   **Pronunciation:** You have clear pronunciation and are easily understandable.
*   **Pacing:** Your pace is generally good and allows listeners to follow.
*   **Fluency:** You speak smoothly without excessive hesitations or pauses in reading this passage.

**Areas for Improvement (Concise Tips):**

*   **Intonation:**  Incorporate more natural intonation. Vary your pitch to emphasize key words and phrases instead of reading in a monotone.
*   **Expression:** Add more expression to your voice. Imagine you are explaining this topic to someone and inject some enthusiasm or engagement.
*   **Pauses:**  Use pauses strategically for emphasis and to create a more conversational flow, rather than just pausing at sentence endings.
