[bug] Silent failure when uploading PDF with Type1 fonts missing /ToUnicode map — no exception raised, gemini-3.x only #2482

@menonmm

Description

Is this a client library issue or a product issue?

Both. The underlying model-level regression is a product issue and is reported separately on the Google AI Dev Forum. The client library, however, has an actionable gap of its own: the SDK raises no exception, warning, or structured signal when the model silently fails to process an uploaded PDF. The call returns 200 OK with a natural-language response asking the user to paste the document manually, which automated pipelines cannot distinguish from a successful response.


Environment details

  • Programming language: Python
  • OS: Linux / macOS (reproduced on both)
  • Language runtime version: Python 3.11+
  • Package version: google-generativeai latest

Steps to reproduce

  1. Take a PDF file whose Type1 fonts use a custom /Encoding with /Differences array but have no /ToUnicode map (e.g. any KID document generated by Neevia docCreator v4.5 — full file analysis in the Dev Forum post linked above).

  2. Upload the file and call generate_content() targeting any Gemini 3.x model:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload one of the affected PDFs (Type1 fonts, no /ToUnicode map).
uploaded_file = genai.upload_file(
    path="LU0089290844_KID.pdf",
    mime_type="application/pdf"
)

model = genai.GenerativeModel(model_name="gemini-3.5-flash")
# also reproduced with: gemini-3.1-pro, gemini-3.1-flash-lite

response = model.generate_content([
    uploaded_file,
    "Extract all data from this PDF document."
])

# No exception is raised here, even though the PDF content was ignored.
print(response.text)
```
  3. Observe that:

    • No exception is raised.
    • response.text contains a message such as "It seems the text of the document was not included — please paste it directly."
    • There is no structured field in the response to detect the failure programmatically.
  4. Switch model_name to "gemini-2.5-flash" with identical code and the same file: the correctly extracted content is returned.


Expected behavior

  • The model extracts the PDF content correctly (as gemini-2.5-flash does), or
  • The SDK raises a warning / structured error signal when the uploaded file is not processed, so callers can detect and handle the failure in automated pipelines.

Actual behavior

The call succeeds with HTTP 200. The model silently ignores the PDF content and returns a natural-language fallback response. No exception, no warning, no detectable signal.
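
Until a structured signal exists, a caller-side guard is one possible stopgap. The sketch below is only a heuristic: `ensure_pdf_was_processed`, `PdfNotProcessedError`, the phrase list, and the `min_chars` threshold are all hypothetical names and values, and only the first pattern is modelled on the message quoted in this issue.

```python
import re

# Hypothetical phrase list. Only the first pattern is taken from the message
# quoted in this issue; the others are guesses to be tuned on real output.
_FALLBACK_PATTERNS = [
    r"text of the document was not included",
    r"please paste (it|the (text|document|content))",
    r"(unable|not able) to (read|access|see) the (pdf|document|file)",
]
_FALLBACK_RE = re.compile("|".join(_FALLBACK_PATTERNS), re.IGNORECASE)


class PdfNotProcessedError(RuntimeError):
    """Raised when a reply looks like the 'please paste the document' fallback."""


def ensure_pdf_was_processed(text: str, min_chars: int = 200) -> str:
    """Return text unchanged, or raise if it looks like a fallback reply.

    min_chars is an arbitrary assumption: only short replies are treated as
    suspicious, so a long extraction that happens to contain one of the
    phrases is not rejected.
    """
    if len(text) < min_chars and _FALLBACK_RE.search(text):
        raise PdfNotProcessedError(text.strip())
    return text
```

In the reproduction script this would wrap the final step, for example `text = ensure_pdf_was_processed(response.text)`. String matching of this kind is brittle and language-dependent, which is why a structured signal from the SDK would be preferable.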


Additional context

I analysed 4 failing files and 2 working references. The pattern is fully reproducible:

| File | Producer | /ToUnicode missing | Result |
| --- | --- | --- | --- |
| LU0089290844_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2533812058_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2314312922_KID.pdf | Neevia docCreator v4.5 | All Type1 fonts | ❌ Empty |
| LU2526007799_KID.pdf | Neevia docCreator v5.0 | /R39 on page 2 only | ⚠️ Partial (page 2 corrupted) |
| PRIIP_KID_F0GBR04BQM_299.pdf | Neevia docCreator v5.0 | None | ✅ OK |

All files produced by Neevia docCreator v4.5 systematically omit /ToUnicode on Type1 fonts with custom encoding. Without /ToUnicode, an extractor that relies on that map alone cannot translate character codes to Unicode and reads the document as empty. ISO 32000 describes the fallback for this case: map each character code to a glyph name through the font's /Encoding (including its /Differences array), then map the glyph name to Unicode via the Adobe Glyph List. gemini-2.5-flash and libraries like pypdf handle these files correctly by applying that fallback. Gemini 3.x does not appear to apply it.
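
The glyph-name fallback can be illustrated with a minimal sketch. It is not how Gemini or pypdf implement it: the function names are hypothetical, `AGL_SUBSET` is a tiny hand-picked excerpt of the Adobe Glyph List, and the base encoding is ignored, so only codes listed in /Differences are decoded.

```python
# Tiny excerpt of the Adobe Glyph List (assumption: a real implementation
# would load the full list and also consult the font's base encoding).
AGL_SUBSET = {
    "space": " ", "A": "A", "B": "B",
    "eacute": "\u00e9", "Euro": "\u20ac", "fi": "\ufb01",
}


def parse_differences(differences):
    """Turn a /Differences array into {character code: glyph name}.

    The array alternates an integer start code with one or more glyph
    names; each name is assigned the next consecutive code.
    """
    code_to_name = {}
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name appears before any start code")
            code_to_name[code] = str(item).lstrip("/")
            code += 1
    return code_to_name


def glyph_name_to_unicode(name):
    """Map a glyph name to a Unicode string, or None if it is unknown."""
    if name in AGL_SUBSET:
        return AGL_SUBSET[name]
    # 'uniXXXX' names carry the code point in four hex digits.
    if name.startswith("uni") and len(name) == 7:
        try:
            return chr(int(name[3:], 16))
        except ValueError:
            return None
    return None


def decode_bytes(data, differences):
    """Decode single-byte character codes; unknown codes become U+FFFD."""
    table = parse_differences(differences)
    return "".join(
        glyph_name_to_unicode(table.get(byte, "")) or "\ufffd" for byte in data
    )
```

For example, with `[1, "/A", "/B", 32, "/space"]` as the /Differences array, the bytes 1, 2, 32 decode to "AB " even though the font carries no /ToUnicode map.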

SDK-level ask: even if the model fix must happen on the product side, the library could optionally add a pre-flight check warning callers when a PDF's fonts lack /ToUnicode maps, preventing silent failures in production pipelines.
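
A rough sketch of what such a pre-flight check could look like on the caller side, assuming pypdf is installed. `fonts_at_risk` and `pdf_fonts_at_risk` are hypothetical helpers, not part of any SDK, and the check only inspects page-level /Resources (fonts used inside form XObjects are not covered).

```python
from collections.abc import Mapping


def _get(mapping, key):
    # Index with [] rather than .get(): to my understanding, pypdf
    # dictionaries resolve indirect references on [] access.
    return mapping[key] if key in mapping else None


def fonts_at_risk(fonts):
    """Names of Type1 fonts with a /Differences encoding but no /ToUnicode."""
    risky = []
    for name, font in fonts.items():
        encoding = _get(font, "/Encoding")
        has_differences = isinstance(encoding, Mapping) and "/Differences" in encoding
        if (
            _get(font, "/Subtype") == "/Type1"
            and has_differences
            and "/ToUnicode" not in font
        ):
            risky.append(name)
    return risky


def pdf_fonts_at_risk(path):
    """Return {page number: [font names]} for pages with at-risk fonts.

    Assumes pypdf is available; it is imported lazily so the pure check
    above stays usable without it.
    """
    from pypdf import PdfReader  # third-party dependency

    report = {}
    for number, page in enumerate(PdfReader(path).pages, start=1):
        resources = _get(page, "/Resources") or {}
        fonts = _get(resources, "/Font") or {}
        # Resolve each font entry before inspecting it.
        resolved = {name: fonts[name] for name in fonts}
        risky = fonts_at_risk(resolved)
        if risky:
            report[number] = risky
    return report
```

A caller could run `pdf_fonts_at_risk("LU0089290844_KID.pdf")` before `genai.upload_file(...)` and log a warning, or route the file to a different model, when the returned report is non-empty.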

Happy to share the PDF files or further details if helpful.

Metadata

Labels

priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
