[bug] PDF ingestion failure with Gemini 3.5/3.1 Flash models

### Is this a client library issue or a product issue?
This is primarily a backend/model pipeline regression, but it affects the `google-genai` SDK experience due to a complete lack of structured error signaling or failure detection during PDF processing.

### Description
There is a significant regression in how certain text PDFs are ingested when moving from Gemini 2.5 to Gemini 3.x (**specifically affecting `gemini-3.5-flash` and `gemini-3.1-flash-lite`**).

From what I was able to understand, when a PDF contains Type1 fonts using a custom `/Encoding` with a `/Differences` array but lacks a `/ToUnicode` map, Gemini 3.5/3.1 Flash models fail to decode the glyphs and treat the document as entirely empty. Instead of raising an error or returning a structured warning, the API returns a `200 OK` response with a natural-language message asking the user to manually paste the text.
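The suspected trigger condition can be written down as a small predicate. This is only a sketch of my hypothesis, not a confirmed root cause: the function name `is_undecodable_font` is made up for illustration, and it assumes the font dictionary has already been extracted from the PDF into a plain mapping keyed by PDF names, with indirect references resolved (for example by whatever PDF parser you use).

```python
from typing import Any, Mapping


def is_undecodable_font(font: Mapping[str, Any]) -> bool:
    """Return True when a font matches the suspected failure pattern.

    Pattern (hypothesis from this report): a Type1 font whose /Encoding is a
    dictionary carrying a /Differences array, with no /ToUnicode map.
    """
    # Only Type1 fonts are covered by the hypothesis.
    if font.get("/Subtype") != "/Type1":
        return False

    # A /ToUnicode map gives a direct glyph-to-text mapping, so the font
    # should be decodable regardless of its encoding.
    if "/ToUnicode" in font:
        return False

    # /Encoding may be a plain name (e.g. /WinAnsiEncoding) or a dictionary.
    # Only the dictionary form can carry a /Differences array.
    encoding = font.get("/Encoding")
    return isinstance(encoding, Mapping) and "/Differences" in encoding
```

If every text-bearing font in a document matches this predicate, the document is a candidate for the behavior described here, and routing it to a model that is known to handle it may be a reasonable workaround.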

This behavior makes it impossible to programmatically detect ingestion failures in automated enterprise pipelines.

*Note: `gemini-2.5-flash`, `gemini-2.5-pro`, and surprisingly `gemini-3.1-pro-preview` manage to ingest the file correctly, while `gemini-3.5-flash` and `gemini-3.1-flash-lite` do not.*

### Environment details
- **Programming language:** Python
- **SDK Package:** `google-genai` (latest version)

### Steps to reproduce
1. Download this public sample PDF:
   👉 **[Download Sample PDF](https://docfinder.bnpparibas-am.com/api/files/183c4a9a-e238-4dd4-9bc5-7609579d2067/4608)**

2. Run the minimal reproducer script below, passing the path to the downloaded PDF:

```python
"""
Minimal script to reproduce the bug:
Gemini 3.x Flash/Lite models fail to read certain PDFs 
while Gemini 2.5 and Gemini 3.1 Pro process them correctly.

Usage:
    python reproduce_gemini3_pdf_bug.py <path_to_pdf>
"""

import os
import sys
import json
import tempfile
from pathlib import Path

from google import genai
from google.genai.types import Part, GenerateContentConfig


LOCATION = "global"

# Models to compare: the first two fail to ingest the PDF, the remaining three work.
MODELS_TO_TEST = [
    "gemini-3.5-flash",
    "gemini-3.1-flash-lite",
    "gemini-3.1-pro-preview",
    "gemini-2.5-flash",
    "gemini-2.5-pro"
]

PROMPT = "Extract the Manufacturer of the product from this PDF."


def _bootstrap_credentials() -> str:
    """Initialize credentials and return the project ID."""
    if os.getenv("GOOGLE_APPLICATION_CREDENTIALS") and os.getenv("GOOGLE_CLOUD_PROJECT"):
        return os.environ["GOOGLE_CLOUD_PROJECT"]

    creds_json = os.getenv("GOOGLE_CREDENTIALS_JSON")
    if not creds_json:
        raise RuntimeError(
            "Set GOOGLE_CREDENTIALS_JSON (service account JSON) or "
            "GOOGLE_APPLICATION_CREDENTIALS + GOOGLE_CLOUD_PROJECT."
        )

    creds = json.loads(creds_json)
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False)
    json.dump(creds, tmp)
    tmp.close()
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = tmp.name
    return creds["project_id"]


def run(pdf_path: str) -> None:
    pdf_path = Path(pdf_path).expanduser().resolve()
    if not pdf_path.is_file():
        raise FileNotFoundError(pdf_path)

    project_id = _bootstrap_credentials()
    client = genai.Client(vertexai=True, project=project_id, location=LOCATION)

    pdf_bytes = pdf_path.read_bytes()
    pdf_part = Part.from_bytes(data=pdf_bytes, mime_type="application/pdf")

    config = GenerateContentConfig(
        temperature=0,
        max_output_tokens=2048,
    )

    print("=" * 78)
    print("Gemini PDF ingestion reproducer")
    print("=" * 78)
    print(f"PDF path        : {pdf_path}")
    print(f"Models to test  : {MODELS_TO_TEST}")
    print("=" * 78)

    for model in MODELS_TO_TEST:
        print("\n" + "-" * 78)
        print(f">>> MODEL: {model}")
        print("-" * 78)
        try:
            response = client.models.generate_content(
                model=model,
                contents=[PROMPT, pdf_part],
                config=config,
            )

            # Response metadata
            usage = getattr(response, "usage_metadata", None)
            if usage is not None:
                print(f"[usage] prompt_tokens     = {getattr(usage, 'prompt_token_count', None)}")
                print(f"[usage] candidates_tokens = {getattr(usage, 'candidates_token_count', None)}")
                print(f"[usage] total_tokens      = {getattr(usage, 'total_token_count', None)}")

            candidate = response.candidates[0]
            finish_reason = getattr(candidate, "finish_reason", None)
            print(f"[finish_reason] {finish_reason}")

            # Print all parts (text plus any thinking parts)
            parts = candidate.content.parts or []
            print(f"[n_parts] {len(parts)}")
            for i, part in enumerate(parts):
                text = getattr(part, "text", None)
                thought = getattr(part, "thought", None)
                print(f"--- part[{i}] thought={thought} ---")
                print(text if text is not None else f"<non-text part: {part}>")

            # Fallback: also print response.text if available
            try:
                print("\n[response.text]")
                print(response.text)
            except Exception:
                pass

        except Exception as e:  # noqa: BLE001
            print(f"[ERROR calling {model}] {type(e).__name__}: {e}")

    print("\n" + "=" * 78)
    print("DONE")
    print("=" * 78)


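if __name__ == "__main__":
    # Take the PDF path from the command line, as described in the usage note.
    if len(sys.argv) != 2:
        sys.exit(f"Usage: python {sys.argv[0]} <path_to_pdf>")
    run(sys.argv[1])
```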

You will observe that `gemini-3.5-flash` and `gemini-3.1-flash-lite` fail to ingest the PDF. The generated responses and the token counts show that these models do not process the document at all and, above all, that they raise no error or warning. The remaining models ingest the file and answer correctly.
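Until the API reports such failures explicitly, one possible stopgap is to compare prompt token counts across models for the same request. The sketch below is a heuristic, not an official detection mechanism: the helper name `flag_low_prompt_tokens` and the `min_ratio` threshold of 0.5 are arbitrary choices for illustration, and it assumes that a model which failed to ingest the PDF reports a noticeably lower `prompt_token_count` than a model that read it.

```python
from typing import Dict, List


def flag_low_prompt_tokens(
    prompt_tokens: Dict[str, int],
    baseline_model: str,
    min_ratio: float = 0.5,
) -> List[str]:
    """Return models whose prompt token count is suspiciously low.

    prompt_tokens maps a model name to the prompt_token_count it reported for
    the same prompt and PDF. baseline_model is a model known to ingest the
    file correctly. A model is flagged when its count falls below
    min_ratio * baseline count.
    """
    baseline = prompt_tokens[baseline_model]
    if baseline <= 0:
        raise ValueError("baseline prompt token count must be positive")

    threshold = min_ratio * baseline
    return [
        model
        for model, count in prompt_tokens.items()
        if model != baseline_model and count < threshold
    ]
```

The input can be collected from `response.usage_metadata.prompt_token_count` in the reproducer above, using one of the working models (for example `gemini-2.5-flash`) as the baseline. This doubles the number of API calls, so it is better suited to spot checks than to every request.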

### Additional Context / Previous Issue Reference
Please note that I previously opened an issue regarding this behavior (#2482), where @Venkaiahbabuneelam mentioned they did not encounter any errors and asked me to test it using the latest SDK version. 

I have now fully updated the SDK and environment, and the issue definitely persists.

To ensure complete reproducibility, I have provided both the direct link to the specific failing PDF and a clearer script this time. Running it will immediately show that `gemini-3.5-flash` and `gemini-3.1-flash-lite` completely miss the PDF content, while the other models handle it perfectly.


[bug] PDF ingestion failure with Gemini 3.5/3.1 Flash models #2553
