
Add task_35: PDF to Calendar Import #296

Merged
olearycrew merged 7 commits into pinchbench:main from chad-kiloclaw:task/pdf-to-calendar
Apr 14, 2026


@chad-kiloclaw
Contributor

New Task: PDF to Calendar Import

This task tests an agent's ability to extract structured date/event data from a PDF and produce a valid ICS calendar file.

Task ID: task_35_pdf_to_calendar
Category: calendar
Grading: automated
Timeout: 180s

What it tests

  • PDF text extraction
  • Date parsing from natural language (e.g. "July 28-31", "December 21-January 1")
  • ICS file generation with all-day events and multi-day date ranges
  • Completeness (at least 10 events extracted from a real school calendar PDF)
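The date-parsing and all-day-event items above can be sketched as follows. This is a minimal illustration, not the task's reference solution: the helper names `parse_range` and `to_vevent`, the fixed `DTSTAMP`, and the UID are hypothetical, and the school year is passed in explicitly because strings like "July 28-31" carry no year. The one detail worth getting right is that for all-day events `DTEND` is exclusive in iCalendar, so it is the day after the last day of the range.

```python
import re
from datetime import date, timedelta

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches "July 28", "July 28-31", and "December 21-January 1".
RANGE_RE = re.compile(
    r"^([A-Za-z]+) (\d{1,2})(?:\s*-\s*(?:([A-Za-z]+) )?(\d{1,2}))?$")


def parse_range(text, year):
    """Return (start, end) dates, both inclusive, for a range string."""
    m = RANGE_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized date range: {text!r}")
    start_month, start_day, end_month, end_day = m.groups()
    start = date(year, MONTHS[start_month.capitalize()], int(start_day))
    if end_day is None:
        return start, start  # single-day event
    end_month = (end_month or start_month).capitalize()
    end = date(year, MONTHS[end_month], int(end_day))
    if end < start:
        # The range wraps past New Year, e.g. "December 21-January 1".
        end = end.replace(year=year + 1)
    return start, end


def to_vevent(summary, start, end, uid):
    """Render one all-day VEVENT; DTEND is exclusive, hence the +1 day."""
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"UID:{uid}",
        "DTSTAMP:20260101T000000Z",  # placeholder timestamp (assumption)
        f"SUMMARY:{summary}",
        f"DTSTART;VALUE=DATE:{start:%Y%m%d}",
        f"DTEND;VALUE=DATE:{end + timedelta(days=1):%Y%m%d}",
        "END:VEVENT",
    ])


start, end = parse_range("December 21-January 1", 2026)
print(to_vevent("Christmas Break", start, end, "example-1@example.invalid"))
```

A real agent would also need to wrap the events in a `VCALENDAR` envelope and handle whatever other date formats the PDF contains.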

Grading criteria

  • ICS file created and valid
  • Minimum 10 events
  • Key anchor dates present: First Day of School (Aug 3 2026), Last Day (May 21 2027), Labor Day, Christmas Break, Spring Break
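For illustration, checks of this kind could look roughly like the sketch below. It is not the grader shipped with the task: the function name `grade_ics` and the choice to match anchors against `DTSTART` values are assumptions, and the file-existence check is left out because the sketch works on text that has already been read. The anchor dates are the ones reported in the test results on this PR.

```python
import re

# Anchor dates as YYYYMMDD strings, taken from the PR's grading breakdown.
ANCHORS = {
    "First Day of School": "20260803",
    "Last Day of School": "20270521",
    "Labor Day": "20260907",
    "Christmas Break": "20261221",
    "Spring Break": "20270405",
}


def grade_ics(text, min_events=10):
    """Return a dict of criterion name -> bool for an ICS document."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    begins = lines.count("BEGIN:VEVENT")
    ends = lines.count("END:VEVENT")
    # Capture the date part of every DTSTART, with or without parameters.
    starts = set(re.findall(r"^DTSTART[^:\n]*:(\d{8})", text, flags=re.MULTILINE))

    results = {
        "Valid ICS structure": (
            bool(lines)
            and lines[0] == "BEGIN:VCALENDAR"
            and lines[-1] == "END:VCALENDAR"
            and begins == ends
        ),
        f">={min_events} events extracted": begins >= min_events,
    }
    for name, ymd in ANCHORS.items():
        # Assumption: an anchor passes if some event starts on that date.
        results[name] = ymd in starts
    return results
```

Matching on the date string rather than on event titles keeps the check independent of how an agent words each summary.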

Workspace file

Provides a real school calendar PDF as input — agents must handle a non-trivial extraction task rather than a synthetic prompt.

@kilo-code-bot
Contributor

kilo-code-bot Bot commented Apr 10, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (2 files)
  • tasks/task_35_pdf_to_calendar.md
  • assets/school-calendar.pdf (binary asset)

Reviewed by claude-4.6-sonnet-20260217 · 134,524 tokens

@chad-kiloclaw
Contributor Author

Hey @olearycrew — tagging you for review as the primary maintainer. This adds task_35 (PDF to Calendar Import), a new automated calendar task that tests PDF text extraction and ICS generation. Passes 100% on all 8 grading criteria. 🦀

@olearycrew
Member

@chad-kiloclaw thanks

@ScuttleBot
Contributor

🧪 Test Started

Instance: 155.138.235.46 (Vultr vc2-2c-4gb, Ubuntu 22.04, ATL)
Instance ID: ae1e0f1c-2555-4072-8665-91f75484afc1
Branch: task/pdf-to-calendar
Task: task_35_pdf_to_calendar (PDF to Calendar Import)

Models Under Test

| # | Model |
|---|-------|
| 1 | openrouter/anthropic/claude-opus-4.6 |
| 2 | openrouter/openai/gpt-5.4 |
| 3 | openrouter/google/gemini-3-pro |

Running: All 3 models in parallel
Estimated completion: ~20-30 minutes (3 × 180s timeout + setup overhead)
Started: 2026-04-14 12:22 UTC


Automated PR test by ScuttleBot 🦀

@ScuttleBot
Contributor

✅ Test Results — task_35_pdf_to_calendar

All 3 models scored 100% on the PDF to Calendar Import task. All grading criteria passed.

Scores

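| Model | Score | Time (s) | Tokens | Cost | API Calls |
|-------|-------|----------|--------|------|-----------|
| claude-opus-4.6 | 100% | 61.7 | 133,301 | $0.33 | 7 |
| gpt-5.4 | 100% | 63.0 | 134,737 | $0.12 | 9 |
| gemini-3.1-pro-preview* | 100% | 60.1 | 104,152 | $0.11 | 7 |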

*Note: google/gemini-3-pro does not exist on OpenRouter. Used google/gemini-3.1-pro-preview instead.

Grading Breakdown (all models identical)

| Criterion | Result |
|-----------|--------|
| ICS file created | ✅ |
| Valid ICS structure | ✅ |
| ≥10 events extracted | ✅ |
| First Day of School (Aug 3, 2026) | ✅ |
| Last Day of School (May 21, 2027) | ✅ |
| Labor Day (Sep 7, 2026) | ✅ |
| Christmas Break (Dec 21, 2026) | ✅ |
| Spring Break (Apr 5, 2027) | ✅ |

Efficiency Comparison

| Model | Score/$1 | Score/1K tokens |
|-------|----------|-----------------|
| claude-opus-4.6 | 3.01 | 0.0075 |
| gpt-5.4 | 8.21 | 0.0074 |
| gemini-3.1-pro-preview | 8.97 | 0.0096 |
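
The efficiency figures appear to treat a 100% score as 1.0 and divide it by cost in dollars and by tokens in thousands. The sketch below recomputes them from the Scores table under that assumption; the function name `efficiency` is hypothetical. The per-1K-token values reproduce the table, while the per-dollar values come out slightly higher (for example about 3.03 rather than 3.01 for Claude), presumably because the costs shown are rounded to the cent.

```python
# Figures copied from the Scores table; score 100% is taken as 1.0 (assumption).
runs = {
    "claude-opus-4.6": {"score": 1.0, "cost_usd": 0.33, "tokens": 133_301},
    "gpt-5.4": {"score": 1.0, "cost_usd": 0.12, "tokens": 134_737},
    "gemini-3.1-pro-preview": {"score": 1.0, "cost_usd": 0.11, "tokens": 104_152},
}


def efficiency(score, cost_usd, tokens):
    """Return (score per dollar, score per 1,000 tokens)."""
    return score / cost_usd, score / (tokens / 1000)


for model, run in runs.items():
    per_dollar, per_1k = efficiency(**run)
    print(f"{model}: {per_dollar:.2f} per $1, {per_1k:.4f} per 1K tokens")
```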

Observations

  • Task difficulty: Easy-Medium. All 3 frontier models achieved perfect scores on the first attempt within ~60s each. The PDF extraction → ICS generation pipeline is well within current model capabilities.
  • No errors or timeouts across any model run.
  • Cost-efficiency: Gemini was the most cost-efficient ($0.11, fewest tokens); Claude was the most expensive ($0.33) but completed in comparable time.
  • The grading checks are solid — date-based anchors (specific dates like 20260803) are unambiguous and easy to validate automatically.

Recommendation

✅ Merge — Task is well-constructed, grading criteria are clear and automated, and all 3 test models handle it cleanly. The task adds good coverage for PDF extraction + structured data generation, which wasn't tested before.


Tested on Vultr vc2-2c-4gb (ATL) | PinchBench v1.2.1 | 2026-04-14 12:35 UTC
Automated PR test by ScuttleBot 🦀

@olearycrew merged commit 3fb96cd into pinchbench:main on Apr 14, 2026
2 checks passed
olearycrew added a commit that referenced this pull request Apr 14, 2026
Remove numeric prefix from task file added in #296:
- task_35_pdf_to_calendar → task_pdf_to_calendar

Updates the id field and adds to manifest.yaml.
olearycrew added a commit that referenced this pull request Apr 14, 2026
Resolves manifest.yaml conflict by including all tasks:
- GWS tasks from #308
- PDF to calendar task from #296 (renamed to manifest convention)
- CVE security triage task from this branch

Lint passes with 41 tasks.