Unable to evaluate datasets that contain inline PDFs using chatlas to_solver() #219

@karangattu

Description

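Running the evaluation below fails with the following error. The "path" in the message is the document's inline text, so chatlas appears to be treating that text as a file path: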

FileNotFoundError: [Errno 2] No such file or directory: '/Users/Documents/GitHub/chatlas/\t 1\t\nSample PDF  Created for testing PDFObject  This PDF is three pages long. Three long pages. Or three short pages if you’re optimistic. Is it the same as saying “three long minutes'
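The path in the error is the extracted PDF text appended to what looks like the working directory. A quick way to check what a dataset's `document` value actually holds is a small classifier such as the sketch below. `classify_document_payload` is a hypothetical helper, not part of chatlas or Inspect, and the rule that real file paths never contain tabs or newlines is an assumption.

```python
from urllib.parse import urlparse


def classify_document_payload(value: str) -> str:
    """Hypothetical helper: guess what a dataset `document` string contains."""
    if value.startswith("data:"):
        return "data_uri"  # e.g. data:application/pdf;base64,...
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Assumption: real file paths do not contain tabs, newlines, or carriage
    # returns, so such strings are treated as text extracted from the PDF.
    if any(ch in value for ch in "\t\n\r"):
        return "inline_text"
    return "path"


# The start of the `document` value from my_eval_dataset.jsonl
sample = "\t 1\t\nSample PDF  Created for testing PDFObject"
print(classify_document_payload(sample))  # inline_text
```

An entry classified as `inline_text` carries the PDF's extracted text rather than a reference to the file, which matches the failing sample in this report.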
my_eval_dataset.jsonl
{"input":[{"id":"gMnWYQj4zRufqn8gXk6CQD","content":"You can inspect the provided image to answer questions accurately","role":"system"},{"id":"abJmLppRSrnVvTy5tyENcH","content":[{"type":"text","text":"Review the attached materials to answer the question."},{"type":"image","image":"https://www.allrecipes.com/thmb/GMjVlmWXRMGPuIK2FRh8MBvIXgA=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/AR-151595-Campbells-green-bean-casserole-DDMFS-4x3-49f408d95f4d40a39cb14ce6fa9544a5.jpg","detail":"auto"},{"type":"document","document":"\t 1\t\nSample PDF  Created for testing PDFObject  This PDF is three pages long. Three long pages. Or three short pages if you’re optimistic. Is it the same as saying “three long minutes","filename":"file_001.pdf","mime_type":"application/pdf"},{"type":"text","text":"What is the recipe shown in the image and what does the document say?"}],"role":"user"}],"target":"The recipe shown in the image is a generic Green Bean Casserole, and the document provides a sample PDF file with meaningless text as a filler."}
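Until `to_solver()` handles this case, one possible workaround (an assumption, not a fix confirmed in this issue) is to rewrite the dataset so that inline-text `document` parts become ordinary `text` parts before Inspect loads it. The sketch uses only the standard library; `inline_documents_as_text` and `rewrite_jsonl` are hypothetical names. Note that the model then sees the PDF's text instead of an attached file, which may change how the sample is answered.

```python
import copy
import json


def _looks_like_inline_text(value: str) -> bool:
    # Assumption: paths, URLs, and data URIs never contain tabs or newlines.
    return not value.startswith("data:") and any(ch in value for ch in "\t\n\r")


def inline_documents_as_text(record: dict) -> dict:
    """Return a copy of one JSONL sample with inline-text documents turned into text parts."""
    fixed = copy.deepcopy(record)  # leave the caller's record untouched
    for message in fixed.get("input", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content (e.g. the system message) needs no change
        for i, part in enumerate(content):
            doc = part.get("document")
            if part.get("type") == "document" and isinstance(doc, str) and _looks_like_inline_text(doc):
                name = part.get("filename", "document")
                content[i] = {"type": "text", "text": f"Contents of {name}:\n{doc}"}
    return fixed


def rewrite_jsonl(src: str, dst: str) -> None:
    """Apply inline_documents_as_text to every non-empty line of a JSONL file."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(json.dumps(inline_documents_as_text(json.loads(line))) + "\n")


# Usage (hypothetical output name):
# rewrite_jsonl("my_eval_dataset.jsonl", "my_eval_dataset_text.jsonl")
```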
test_inline_eval.py
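from chatlas import ChatOpenAI
from inspect_ai import Task, task
from inspect_ai.dataset import json_dataset
from inspect_ai.scorer import model_graded_qa

# chatlas client whose conversation is exposed to Inspect as a solver
chat = ChatOpenAI(model="gpt-5-nano-2025-08-07")

@task
def test_inline_image():
    return Task(
        # The sample pairs an image part with an inline-PDF document part
        dataset=json_dataset("my_eval_dataset.jsonl"),
        # The FileNotFoundError above is raised when this solver handles the document part
        solver=chat.to_solver(),
        # Grade the answer against `target` using a model-graded rubric
        scorer=model_graded_qa(),
        model="openai/gpt-5-nano-2025-08-07",
    )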
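If the goal is to keep the PDF as an attachment rather than flattening it to text, the `document` field could carry the file's bytes instead of its extracted text. The sketch below builds a base64 `data:` URI from PDF bytes. `pdf_bytes_to_data_uri` and `pdf_file_to_data_uri` are hypothetical helpers, and whether Inspect's dataset loader and chatlas `to_solver()` accept data URIs for document parts is an assumption this issue does not confirm.

```python
import base64
from pathlib import Path


def pdf_bytes_to_data_uri(data: bytes, mime_type: str = "application/pdf") -> str:
    """Hypothetical helper: encode raw file bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def pdf_file_to_data_uri(path: str) -> str:
    # Read the original PDF (e.g. file_001.pdf) from disk and encode it
    return pdf_bytes_to_data_uri(Path(path).read_bytes())


print(pdf_bytes_to_data_uri(b"%PDF-1.4"))  # data:application/pdf;base64,JVBERi0xLjQ=
```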
