In [1]:
!pip install google-genai tqdm --quiet

In [2]:
import os
from google import genai

# Read the API key from the environment instead of hard-coding it in the notebook
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents='''Design a prompt to generate exactly 5 questions, with a single answer per image.
            The given answer must be of only one word.
            For a given image, I want to generate multiple types of questions
            based on the dataset, covering questions which one can answer just by
            looking at the image. Make sure the answers never contain digits, e.g. "Three" instead of "3".
            Example:
            What is the colour of the fruit?
            These types can include, but are *not limited* to:

            *   **Object Identification:** What is the main/prominent object in the image?
            *   **Color Recognition:** What is the dominant color in the scene?
            *   **Material Identification:** What is the primary material of the object?
            *   **Shape Recognition:** What is the general shape of the object?
            *   **Texture Identification:** What is the primary texture visible?
            *   **Location/Position:** Where is the object placed?
            *   **Quantity Estimation:** What is the number of the object?
            *   **Action/Activity (if applicable):** What is the action happening?
            *   **State (condition of the object):** Is the object broken?
            ''',
)

print(response.text)



Okay, here's the prompt designed to generate 5 single-word answerable questions for an image, covering the specified question types:

```prompt
You are an AI Image Question Generator. Your task is to generate exactly 5 questions about a given image. Each question must be designed to be answerable with a single word response. Answers must not contain digits, use the word instead, for example, "three" instead of "3". Cover a diverse range of question types that can be answered just by looking at the image. These types include, but are not limited to: Object Identification, Color Recognition, Material Identification, Shape Recognition, Texture Identification, Location/Position, Quantity Estimation, Action/Activity, and State. The output should be strictly formatted as a numbered list, where each item is a question ending with a question mark.

Here's an example of the format you should follow:

1. What is the prominent object?
2. What color is the sky?
3. What is the object made of?
4. Wh

In [3]:
prompt_for_qa = '''
Generate exactly 5 questions about a given image. Each question must have a single-word answer.
The questions should cover a range of visual aspects detectable directly from the image, including, but not limited to:

*   **Object Identification:** (e.g., What is the main object?)
*   **Color Recognition:** (e.g., What color is the sky?)
*   **Material Identification:** (e.g., What is the vase made of?)
*   **Shape Recognition:** (e.g., What is the shape of the roof?)
*   **Texture Identification:** (e.g., What is the texture of the sand?)
*   **Location/Position:** (e.g., Where is the cat?)
*   **Quantity Estimation:** (e.g., How many trees?)  (Answer must be a word, e.g., "Three" not "3")
*   **Action/Activity (if applicable):** (e.g., What is the dog doing?)
*   **State (condition of the object):** (e.g., Is the glass full?)

The questions should not be repetitive in subject matter, and the answers should only be one word.
Ensure no answers contain digits; instead, use the word representation of numbers (e.g., "Four" instead of "4").

Here's what the desired output format should look like (but with different, relevant questions):

What is the main object?
What is the dominant color?
What is the object's shape?
Is the container empty?
'''
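The prompt states hard constraints (exactly five questions, one-word answers, no digits) that the model may not always honour, and the JSON `response_schema` alone does not enforce them. Because the results are later stored as `|`-joined strings, a stray `|` inside a question or answer would also corrupt the split. A minimal post-hoc check, assuming the `questions`/`answers` lists of the `QA` schema; `check_qa` is a hypothetical helper, not part of any library:

```python
import re
from typing import List

_DIGIT = re.compile(r"\d")

def check_qa(questions: List[str], answers: List[str], n: int = 5) -> List[str]:
    """Return the prompt constraints a generated QA set violates (empty list = valid)."""
    problems = []
    if len(questions) != n:
        problems.append(f"expected {n} questions, got {len(questions)}")
    if len(answers) != len(questions):
        problems.append(f"{len(questions)} questions but {len(answers)} answers")
    for q in questions:
        # '|' is used later as the join separator, so it must not appear inside a question
        if "|" in q:
            problems.append(f"question contains the '|' separator: {q!r}")
    for a in answers:
        # answers must be exactly one whitespace-delimited word
        if len(a.split()) != 1:
            problems.append(f"answer is not a single word: {a!r}")
        # numbers must be spelled out ("Three", not "3")
        if _DIGIT.search(a):
            problems.append(f"answer contains a digit: {a!r}")
        if "|" in a:
            problems.append(f"answer contains the '|' separator: {a!r}")
    return problems
```

In the generation loop it could be called as `check_qa(data.questions, data.answers)`, treating a non-empty result as a failed attempt so the image is re-queried.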

### Custom Sampling (S) Technique

In [4]:
import os
import time
from typing import List

import pandas as pd
from google import genai
from PIL import Image
from pydantic import BaseModel, Field
from tqdm import tqdm

class QA(BaseModel):
    questions: List[str] = Field(description='Questions about the image as per format')
    answers: List[str] = Field(description='Single-word answers, one per question')

# Read the API key from the environment rather than committing it to the notebook
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Load the sampled image file paths from the VRMP2 curated dataset
filenames = pd.read_csv('/kaggle/input/vrmp2-paths/sampled_paths.csv')['path'].tolist()

MAX_RETRIES = 5  # give up on an image after this many failed attempts
results = []
failed = []

for path in tqdm(filenames, desc="Processing images"):
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            # open the image; the context manager closes the file handle afterwards
            with Image.open(path) as img:
                response = client.models.generate_content(
                    model="gemini-2.0-flash",
                    contents=[img, prompt_for_qa],
                    config={
                        'response_mime_type': 'application/json',
                        'response_schema': QA,
                    },
                )

            # validate the JSON against the schema and store it
            data = QA.model_validate_json(response.text)
            results.append({
                'image_path': path,
                'questions': "|".join(data.questions),
                'answers':   "|".join(data.answers),
            })
            break  # success -> move on to the next image

        except Exception as e:
            # log, then back off exponentially (2, 4, 8, ... s) before retrying the same image
            print(f"[ERROR] processing {path!r}: {e}. Retrying...")
            time.sleep(2 ** attempt)
    else:
        # every attempt failed (e.g. a corrupt file); record it instead of looping forever
        failed.append(path)

# convert to DataFrame and save CSV
df = pd.DataFrame(results)
df.to_csv('S_generated_vqa.csv', index=False)
print("Saved results to S_generated_vqa.csv")
if failed:
    print(f"{len(failed)} images failed after {MAX_RETRIES} attempts")

Processing images:  47%|████▋     | 11668/24958 [4:57:42<4:51:58,  1.32s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/8b/8b80ace5.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  52%|█████▏    | 12893/24958 [5:32:19<5:03:50,  1.51s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/05/057062a7.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  56%|█████▌    | 14033/24958 [6:02:31<5:36:03,  1.85s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/18/18ab372b.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  57%|█████▋    | 14121/24958 [6:06:22<4:12:49,  1.40s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/23/23b0bdd3.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  61%|██████    | 15215/24958 [6:37:21<3:39:53,  1.35s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/3b/3bfebab2.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  61%|██████    | 15242/24958 [6:39:41<3:46:09,  1.40s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/73/736f202c.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  73%|███████▎  | 18277/24958 [7:56:44<7:13:11,  3.89s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/c2/c2185a71.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  74%|███████▍  | 18422/24958 [8:01:50<2:39:15,  1.46s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/ae/aecd54c2.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  78%|███████▊  | 19373/24958 [8:27:50<2:22:56,  1.54s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/01/019ecf7c.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  78%|███████▊  | 19434/24958 [8:30:58<2:13:20,  1.45s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/6b/6bc586b9.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  89%|████████▉ | 22274/24958 [9:41:24<1:05:10,  1.46s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/64/64800774.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images:  98%|█████████▊| 24363/24958 [10:33:16<18:55,  1.91s/it]

[ERROR] processing '/kaggle/input/abo-small/images/small/c1/c1a08003.jpg': 408 Request Timeout. {'message': 'Request Timeout', 'status': 'Request Timeout'}. Retrying...


Processing images: 100%|██████████| 24958/24958 [10:48:42<00:00,  1.56s/it]


Saved results to S_generated_vqa.csv
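The run above took about 10 h 48 min and only wrote the CSV after the final image, so a kernel restart near the end would have discarded every result. A minimal sketch of incremental checkpointing, assuming the same three-column layout; `load_done` and `append_row` are hypothetical helpers built on the standard `csv` module:

```python
import csv
import os
from typing import Dict, Set

FIELDS = ["image_path", "questions", "answers"]

def load_done(csv_path: str) -> Set[str]:
    """Return the image paths already written to csv_path (empty set if the file is missing)."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {row["image_path"] for row in csv.DictReader(f)}

def append_row(csv_path: str, row: Dict[str, str]) -> None:
    """Append one result, writing the header only when the file is new or empty."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(row)  # the file is flushed and closed when the block exits
```

Inside the loop one would call `done = load_done('S_generated_vqa.csv')` once before iterating, `continue` for any `path in done`, and call `append_row(...)` where the original code calls `results.append(...)`. A restarted kernel then resumes from the last saved image instead of starting over.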


In [5]:
df = pd.read_csv('/kaggle/working/S_generated_vqa.csv')
print(df.shape)
df

(24958, 3)


,image_path,questions,answers
0,/kaggle/input/abo-small/images/small/35/35aeaf...,What object is pictured?|What color are object...,Brush|Black|Two|Cylindrical|Box
1,/kaggle/input/abo-small/images/small/5b/5b146f...,What product is displayed?|What color are the ...,Shampoo|Green|Six|Vertical|Leaves
2,/kaggle/input/abo-small/images/small/b4/b4b24d...,What product is shown?|How many packages shown...,Eyelashes|Four|White|Paper|False
3,/kaggle/input/abo-small/images/small/71/71c626...,What object appears?|What color is box?|How ma...,Box|Blue|Two|Square|Paper
4,/kaggle/input/abo-small/images/small/c7/c70482...,How many bottles?|What color are lids?|What is...,Three|Black|Bottles|Clear|Upright
...,...,...,...
24953,/kaggle/input/abo-small/images/small/b6/b6a657...,What is the object?|How many colors?|What colo...,Case|Two|Pink|White|Plastic
24954,/kaggle/input/abo-small/images/small/7a/7abd01...,What object is shown?|What color is it?|How ma...,Cable|White|Two|Straight|Plastic
24955,/kaggle/input/abo-small/images/small/66/6606a2...,What is the main object?|What color is the ban...,Watch|Beige|Square|One|Analog
24956,/kaggle/input/abo-small/images/small/5d/5d29cf...,What is the object?|What color is it?|What sha...,Charger|Black|Rectangle|Five|Plastic
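Because each image's five questions and answers are stored as single `|`-joined strings, most downstream work (VQA evaluation, answer-frequency analysis) is easier with one row per question-answer pair. A sketch using pandas' multi-column `DataFrame.explode` (available from pandas 1.3); `explode_qa` is a hypothetical helper name:

```python
import pandas as pd

def explode_qa(df: pd.DataFrame) -> pd.DataFrame:
    """Turn one row per image into one row per (question, answer) pair."""
    out = df.assign(
        question=df["questions"].str.split("|"),
        answer=df["answers"].str.split("|"),
    )
    # exploding both columns together keeps each question aligned with its answer;
    # pandas raises ValueError if a row has different numbers of questions and answers
    out = out.explode(["question", "answer"], ignore_index=True)
    return out[["image_path", "question", "answer"]]
```

`explode_qa(df)["answer"].value_counts()` then gives the answer distribution, and a `ValueError` from `explode` flags rows where the model returned unequal numbers of questions and answers.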
