
Bug/190-bug: Replace functions following the pypdfium2 version upgrade #192

Merged
inoray merged 1 commit into develop from bug/190-bug-pypdfium2-버전-업그레이드에-따른-함수-교체get_pos---get_bound on May 6, 2026



HeechanKim-Genon commented on May 6, 2026

📌 Summary

  • Fixes the error caused by the pypdfium2 library version upgrade (4.30.0 → 5.6.0) made in PR #163.
  • Because the existing get_pos() method was removed in v5.x, replaces it with the newer API, get_bounds().

⚙️ Key Changes

  • API migration: after the dependency update in [PR #163](https://github.com/genonai/doc_parser/pull/163#issue-4043444798), the call to PdfPageObject.get_pos(), which is no longer supported, is changed to get_bounds().
  • Compatibility: resolves the AttributeError raised when the docling backend could not retrieve the coordinates of PDF objects.

🛠 Code Changes (Diff)

- pos = obj.get_pos()
+ pos = obj.get_bounds()
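The one-line diff swaps a method that pypdfium2 v5.x removed for its v5.x replacement. If a codebase instead had to run against both major versions during a transition, a small duck-typed helper could pick whichever method exists. This is a hypothetical sketch, not code from this PR: `object_bounds` is an illustrative name, and it assumes (as the review comment below states) that both methods return the same (left, bottom, right, top) tuple.

```python
from typing import Tuple

# (left, bottom, right, top) in PDF canvas units, as consumed by
# BoundingBox.from_tuple(..., origin=CoordOrigin.BOTTOMLEFT) in the backend.
Bounds = Tuple[float, float, float, float]


def object_bounds(obj) -> Bounds:
    """Hypothetical helper: return a page object's bounding box.

    Prefers get_bounds() (pypdfium2 v5.x) and falls back to get_pos()
    (pypdfium2 v4.x). If neither method exists, AttributeError is raised
    by the get_pos lookup, just as the original call site would.
    """
    getter = getattr(obj, "get_bounds", None)
    if getter is None:
        # Older pypdfium2 releases only expose the previous method name.
        getter = obj.get_pos
    left, bottom, right, top = getter()
    return (left, bottom, right, top)
```

With such a helper, the call site would read `pos = object_bounds(obj)`. The merged PR takes the simpler route of requiring v5 outright, so the direct `obj.get_bounds()` call is enough.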

Summary by CodeRabbit

  • Bug Fixes
    • Improved image bounds detection in PDF page processing to produce more accurate bitmaps and crops.
  • Chores
    • Upgraded pypdfium2 dependency to the 5.x series for updated PDF handling and extraction behavior.


coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18936466-50d3-4000-81cf-340ed0d3a97c

📥 Commits

Reviewing files that changed from the base of the PR and between 54cccce and 9555fa5.

📒 Files selected for processing (2)
  • docling/backend/pypdfium2_backend.py
  • pyproject.toml

📝 Walkthrough

Walkthrough

A small API alignment and dependency update: the PyPdfium backend now calls obj.get_bounds() (replacing obj.get_pos()) when computing per-image crop boxes, and the pypdfium2 dependency in pyproject.toml is raised to >=5.0.0, <6.0.0.

Changes

PyPdfium Image Bounds API Update

  • API Compatibility (docling/backend/pypdfium2_backend.py): in PyPdfiumPageBackend.get_bitmap_rects, image bounds retrieval is changed from obj.get_pos() to obj.get_bounds() to match the updated pypdfium2 API.

Dependency Bump

  • Manifest (pyproject.toml): the dependency constraint for pypdfium2 is updated from >=4.30.0, !=4.30.1, <6.0.0 to >=5.0.0, <6.0.0.
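The tightened constraint only takes effect when dependencies are re-resolved; an environment built from an older lock could still carry a v4.x pypdfium2 and hit the AttributeError on get_bounds(). A fail-fast startup check is one possible safeguard. The helpers below (`major_version`, `satisfies_v5_constraint`, `installed_pypdfium2_ok`) are hypothetical, not part of docling, and compare only the major version, which approximates the >=5.0.0, <6.0.0 specifier rather than implementing full PEP 440 semantics.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(ver: str) -> int:
    """Return the leading integer of a version string, e.g. "5.6.0" -> 5."""
    head = ver.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return int(digits)


def satisfies_v5_constraint(ver: str) -> bool:
    """Approximate ">=5.0.0, <6.0.0" by checking the major version only.

    Unlike a real PEP 440 specifier, this also accepts pre-releases such
    as 5.0.0b1; use packaging.specifiers.SpecifierSet for exact matching.
    """
    return 5 <= major_version(ver) < 6


def installed_pypdfium2_ok() -> Optional[bool]:
    """True/False for the installed pypdfium2, or None if it is not installed."""
    try:
        return satisfies_v5_constraint(version("pypdfium2"))
    except PackageNotFoundError:
        return None
```

Keeping the version-string check separate from the metadata lookup lets the comparison logic be tested without pypdfium2 installed.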

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Poem

I hopped through code with careful bounds,
Replaced a call and checked the rounds,
Dependencies lifted, footprints neat,
Images align — my task complete! 🐇📐

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): the title uses a non-English string with special characters that doesn't clearly convey the main change to English-speaking reviewers. Resolution: consider using a clear English title such as 'Replace get_pos() with get_bounds() for pypdfium2 v5 compatibility' to improve clarity for all team members.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Linked Issues check (✅ Passed): check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): check skipped because no linked issues were found for this pull request.


HeechanKim-Genon changed the title from "get_pos-->get_bound()" to "Bug/190-bug: Replace functions following the pypdfium2 version upgrade" on May 6, 2026

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docling/backend/pypdfium2_backend.py`:
- Line 259: The code uses pypdfium2's get_bounds() (replacing get_pos()), which
only exists in v5+, so update the pypdfium2 version constraint to require v5 or
newer by changing the constraint string ">=4.30.0, !=4.30.1, <6.0.0" to
">=5.0.0, <6.0.0" in the project dependency declaration; this ensures
installations cannot pull a v4.x pypdfium2 that would raise AttributeError for
get_bounds(), and no other code changes are needed because
BoundingBox.from_tuple(pos, origin=CoordOrigin.BOTTOMLEFT) remains compatible.
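The review's claim that BoundingBox.from_tuple(pos, origin=CoordOrigin.BOTTOMLEFT) needs no change rests on get_bounds() returning the same four-value (left, bottom, right, top) tuple, in PDF canvas units, that get_pos() returned. The sketch below illustrates what a bottom-left-origin tuple means by converting one into top-left (image-style) coordinates. `Rect` and `bottom_left_to_top_left` are hypothetical stand-ins; docling's own BoundingBox handles this through its own methods, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for a top-left-origin box (not docling's BoundingBox)."""
    l: float
    t: float
    r: float
    b: float


def bottom_left_to_top_left(
    bounds: Tuple[float, float, float, float], page_height: float
) -> Rect:
    """Convert PDF (left, bottom, right, top) bounds to a top-left-origin Rect.

    PDF user space places the origin at the bottom-left corner with y growing
    upwards; bitmap space places it at the top-left with y growing downwards,
    so each y value is mirrored against the page height.
    """
    left, bottom, right, top = bounds
    return Rect(l=left, t=page_height - top, r=right, b=page_height - bottom)
```

For example, an image at (10, 20, 110, 70) on an 800-unit-tall page maps to a top-left rect with t=730 and b=780, so t stays smaller than b as expected in bitmap space.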

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a330f5d-25f0-43cd-b280-5bad9925fb90

📥 Commits

Reviewing files that changed from the base of the PR and between cc782c9 and 54cccce.

📒 Files selected for processing (1)
  • docling/backend/pypdfium2_backend.py

Comment thread docling/backend/pypdfium2_backend.py

gemini-code-assist Bot left a comment


Code Review

This pull request updates the docling backend to use get_bounds() instead of get_pos() when retrieving image object positions in the pypdfium2 backend, which likely improves the accuracy of bounding box calculations. I have no feedback to provide as there were no review comments to evaluate.

HeechanKim-Genon force-pushed the bug/190-bug-pypdfium2-버전-업그레이드에-따른-함수-교체get_pos---get_bound branch from 54cccce to 9555fa5 on May 6, 2026 08:40
inoray merged commit 847e1c4 into develop on May 6, 2026
3 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Replace functions following the pypdfium2 version upgrade (get_pos -> get_bound)

2 participants