# Introduction


We load a puzzle image (an image of scrambled jigsaw pieces).  
Then we use an LLM (Gemini 2.5 Pro in this case) to reconstruct the puzzle from the scrambled pieces. We also ask the LLM to interpret the reconstructed image, pick out a certain object in it, and describe that object.
Note: we cannot export the reconstructed image directly because Gemini image output (image generation) is restricted to certain geographies. As the Kaggle community is global, I wanted to create the Notebook so that it can be used by any Kaggle member.


In [1]:
import base64
import re
from pathlib import Path

import cv2
import numpy as np

from google import genai
from google.genai import types
from kaggle_secrets import UserSecretsClient

from IPython.display import display, Markdown

# Utility functions

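We define a few utility functions here: one loads an image as an OpenCV BGR array, and the other two convert between BGR arrays and base64-encoded PNG strings.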

In [2]:
def load_bgr(path: Path) -> np.ndarray:
    img = cv2.imread(str(path), cv2.IMREAD_COLOR)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    return img

def bgr_to_base64_png(img_bgr: np.ndarray) -> str:
    ok, buf = cv2.imencode(".png", img_bgr)
    if not ok:
        raise ValueError("Failed to encode image as PNG.")
    return base64.b64encode(buf.tobytes()).decode("utf-8")

def base64_png_to_bgr(b64: str) -> np.ndarray:
    raw = base64.b64decode(b64)
    arr = np.frombuffer(raw, dtype=np.uint8)
    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    if img is None:
        raise ValueError("Failed to decode base64 PNG into image.")
    return img
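
A quick sanity check on an encoded payload can catch a wrong file type before anything is sent to the model. The sketch below uses only the standard library. `is_base64_png` is a hypothetical helper that is not part of the notebook, and it only inspects the 8-byte PNG signature rather than decoding the full image.

```python
import base64

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_base64_png(b64: str) -> bool:
    """Return True if `b64` is valid base64 whose payload starts with the PNG signature."""
    try:
        raw = base64.b64decode(b64, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return raw.startswith(PNG_SIGNATURE)


# Synthetic payload: a PNG signature followed by placeholder bytes (not a real image).
sample = base64.b64encode(PNG_SIGNATURE + b"payload").decode("utf-8")
print(is_base64_png(sample))
print(is_base64_png("not base64!"))
```

In the notebook, a call such as `is_base64_png(bgr_to_base64_png(img))` would be expected to return `True`, since `cv2.imencode(".png", ...)` produces a PNG byte stream.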

# Load the puzzle image

In [3]:
DATASET_ROOT = Path("/kaggle/input/jigsaw-puzzles")  
SCRAMBLED_PATH = DATASET_ROOT / "20pieces1.png" 
REFERENCE_SOLVED_PATH = DATASET_ROOT / "20pieces1_solved.png"  

In [4]:
scrambled_bgr = load_bgr(SCRAMBLED_PATH)
reference_bgr = load_bgr(REFERENCE_SOLVED_PATH)

scrambled_b64 = bgr_to_base64_png(scrambled_bgr)
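
If the dataset is not attached to the notebook, `load_bgr` raises on the first file it cannot read. A small pre-check can report every missing file at once. This is a minimal sketch: `missing_files` is a hypothetical helper, and the demo runs against a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [Path(p) for p in paths if not Path(p).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create only the scrambled image as an empty placeholder file.
    (root / "20pieces1.png").write_bytes(b"")
    expected = [root / "20pieces1.png", root / "20pieces1_solved.png"]
    missing_names = [p.name for p in missing_files(expected)]

print(missing_names)
```

In the notebook, the same helper could be called with `[SCRAMBLED_PATH, REFERENCE_SOLVED_PATH]` before the images are loaded.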

# Load the Gemini API Key

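To run this Notebook, you will have to store your own Gemini API key in Kaggle Secrets. The code below reads it from a secret named `GEMINI_API_KEY_SECOND`; change that name if your secret is labeled differently.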

In [5]:
user_secrets = UserSecretsClient()
secret_value_gemini_key = user_secrets.get_secret("GEMINI_API_KEY_SECOND")
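
The `kaggle_secrets` module is only importable inside a Kaggle session. For running the same code elsewhere, one option is to fall back to an environment variable. The sketch below is an assumption, not part of the original notebook: `get_api_key` is a hypothetical helper and the environment variable name `GEMINI_API_KEY` is an arbitrary choice.

```python
import os


def get_api_key(secret_name: str, env_var: str = "GEMINI_API_KEY") -> str:
    """Read the key from Kaggle Secrets when available, else from an environment variable."""
    try:
        from kaggle_secrets import UserSecretsClient  # only importable on Kaggle
    except ImportError:
        # Not on Kaggle: use the (assumed) environment variable instead.
        key = os.environ.get(env_var, "")
        if not key:
            raise RuntimeError(f"Set the {env_var} environment variable.")
        return key
    return UserSecretsClient().get_secret(secret_name)
```

Outside Kaggle, exporting the variable before starting the session and then calling `get_api_key("GEMINI_API_KEY_SECOND")` would return the exported value.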

# Initialize the model

We initialize the model and prepare the system prompt and input prompt.

In [6]:
client = genai.Client(api_key=secret_value_gemini_key)

config = types.GenerateContentConfig(
    response_modalities=["Text"],
)

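# Encode OpenCV image -> PNG bytes (check the success flag instead of discarding it)
ok, buf = cv2.imencode(".png", scrambled_bgr)
if not ok:
    raise ValueError("Failed to encode image as PNG.")
scrambled_bytes = buf.tobytes()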

system_prompt = (
    """
    You are an image reconstruction assistant.
    Given a scrambled jigsaw puzzle image, reconstruct the correctly solved image.
    Then you will use the reconstructed image (not the original one) to extract some
    visual elements, according to the input.
    A frequent error is to refer to the original image, not the reconstructed one.
    """
    
)
input_text = "Describe the traffic signs in the reconstructed image, if any.\
              Describe the location in the reconstructed image of each traffic sign.\
              When describing the traffic signs, explain the signification of it, \
              and the color."

contents = [
    system_prompt,
    input_text,
    types.Part.from_bytes(data=scrambled_bytes, mime_type="image/png"),
]
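
The prompts above carry their source-code indentation with them: the triple-quoted system prompt keeps its leading spaces, and the backslash continuations in `input_text` embed long runs of spaces inside the string. This does not necessarily change the model's answer, but normalizing the text makes prompts easier to log and compare. The sketch below uses shortened copies of the prompts; `normalize_prompt` is a hypothetical helper.

```python
import textwrap


def normalize_prompt(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return " ".join(text.split())


# Backslash continuations keep the next line's indentation inside the string.
raw_input_text = "Describe the traffic signs in the reconstructed image, if any.\
              Describe the location in the reconstructed image of each traffic sign."
clean_input_text = normalize_prompt(raw_input_text)

# For multi-line prompts, dedent removes the common indentation but keeps line breaks.
raw_system_prompt = """
    You are an image reconstruction assistant.
    Given a scrambled jigsaw puzzle image, reconstruct the correctly solved image.
    """
clean_system_prompt = textwrap.dedent(raw_system_prompt).strip()

print(clean_input_text)
print(clean_system_prompt)
```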

# Run the query

Next we run the initialized model to generate content. The model outputs text only.

In [7]:
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=contents,
    config=config
)

text_response = ""
for part in response.candidates[0].content.parts:
    if getattr(part, "text", None):
        text_response += part.text
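
The loop above indexes `response.candidates[0]` directly. A more defensive variant is sketched below, under the assumption that `candidates`, `content`, or `parts` may be empty or `None` in some responses (for example, when no text is returned). `extract_text` is a hypothetical helper, and the demo uses stand-in objects built with `SimpleNamespace` instead of a real API response.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Concatenate the text parts of the first candidate, or return '' if there are none."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return ""
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    return "".join(p.text for p in parts if getattr(p, "text", None))


# Stand-in objects that mimic only the attributes read above.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(text="Two "),
                    SimpleNamespace(text=None),  # a part without text is skipped
                    SimpleNamespace(text="signs."),
                ]
            )
        )
    ]
)
empty_response = SimpleNamespace(candidates=None)

print(repr(extract_text(fake_response)))
print(repr(extract_text(empty_response)))
```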

In [8]:
display(Markdown(text_response))

Here is the reconstructed image from the puzzle pieces:


Based on the reconstructed image, here is a description of the traffic signs:

There are two traffic signs visible in the reconstructed image.

**First Traffic Sign:**

*   **Description:** This is a circular sign with a red border and a white background. It features a black pictogram of two bicycles, one above the other.
*   **Signification:** This is a prohibitory sign that means "No Cycling" or "Bicycles Prohibited." It indicates that this route is not for cyclists.
*   **Color:** The sign is red, white, and black.
*   **Location:** It is located on the upper right side of the image, on a post next to the waterway, near a moored boat.

**Second Traffic Sign:**

*   **Description:** This is a solid blue, circular sign with a white pictogram of a walking person in the center.
*   **Signification:** This is a mandatory sign indicating a "Route for pedestrians only." It specifies that the path is designated for people on foot.
*   **Color:** The sign is blue and white.
*   **Location:** It is located in the middle-right portion of the image, along a paved path below the horizon line.
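
For downstream use, the Markdown answer can be turned into structured records with the `re` module. The sketch below assumes the layout shown in the output above (a bold section title followed by bullets of the form `**Key:** value`); the model is not guaranteed to keep this layout between runs. `parse_signs` is a hypothetical helper, and the sample text is a shortened stand-in for the real response.

```python
import re

# A bold title alone on a line, e.g. "**First Traffic Sign:**".
SECTION_RE = re.compile(r"^\*\*(?P<title>[^*]+?):\*\*[ \t]*$", re.MULTILINE)
# A bullet of the form "*   **Key:** value".
FIELD_RE = re.compile(
    r"^\*[ \t]+\*\*(?P<key>[^*]+?):\*\*[ \t]*(?P<value>.+)$", re.MULTILINE
)


def parse_signs(markdown_text: str) -> list:
    """Split the answer into one dict per section, keyed by the bullet labels."""
    signs = []
    sections = list(SECTION_RE.finditer(markdown_text))
    for i, match in enumerate(sections):
        # A section body runs until the next section title or the end of the text.
        end = sections[i + 1].start() if i + 1 < len(sections) else len(markdown_text)
        body = markdown_text[match.end():end]
        fields = {f["key"]: f["value"].strip() for f in FIELD_RE.finditer(body)}
        signs.append({"title": match["title"], **fields})
    return signs


sample = """**First Traffic Sign:**

*   **Description:** A circular sign.
*   **Color:** Red and white.

**Second Traffic Sign:**

*   **Description:** A blue sign.
"""

signs = parse_signs(sample)
print(signs)
```

Calling `parse_signs(text_response)` on the real answer would yield one dictionary per sign, provided the response keeps the same headings and bullet format.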