
3rd Party multi-modal model gemini-2.0-flash-001 responds with MANY '-' and whitespaces in tables causing >1000 chunks for 1 page #614

@JohanBekker

Describe the bug
The 3rd-party multi-modal model gemini-2.0-flash-001 sometimes emits very long runs of '-' characters and whitespace when it writes a markdown table, which inflates the token count for the page. With a chunk size of <=512, this sometimes leads to >1000 chunks for a single page.
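
A possible client-side workaround until this is addressed, given only as a minimal sketch: it assumes the page's markdown is already available as a plain string and collapses long runs of '-' and whitespace on lines that look like table rows before the text is chunked. The helper name `collapse_table_padding` and the run-length thresholds are hypothetical and are not part of any library.

```python
import re

# Assumed thresholds: 4+ dashes or 2+ spaces/tabs in a row count as padding.
_DASH_RUN = re.compile(r"-{4,}")
_SPACE_RUN = re.compile(r"[ \t]{2,}")


def collapse_table_padding(markdown: str) -> str:
    """Shrink long '-' and whitespace runs on lines that look like table rows."""
    cleaned = []
    for line in markdown.splitlines():
        # Only touch lines starting with '|', so ordinary prose is left alone.
        if line.lstrip().startswith("|"):
            line = _DASH_RUN.sub("---", line)  # '---' is still a valid separator cell
            line = _SPACE_RUN.sub(" ", line)
        cleaned.append(line)
    return "\n".join(cleaned)


# Synthetic input that mimics the reported pattern (not the real page 347 response).
sample = (
    "| Item | Value |\n"
    "|" + "-" * 5000 + "|" + "-" * 5000 + "|\n"
    "| Revenue" + " " * 3000 + "| 1 |"
)
fixed = collapse_table_padding(sample)
print(len(sample), "->", len(fixed))
```

Alignment markers such as `:-----:` are preserved as `:---:` because only the dashes are collapsed. Tables whose rows do not start with `|` are not handled by this sketch.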

Files
https://www.asml.com/en/investors/annual-report/2023

Job ID
0d446037-f53b-4cd0-8978-dfd346e50915

Client:
Please remove untested options:

  • Python Library
  • API

Additional context
The response for page 347 of the ASML annual report is attached as a file because it did not fit here.

page_347.json

Labels
bug
