Version 0.2.0 #315
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -210,19 +210,23 @@ def add_image_orphans(page, blocks):` |
| | | `      """` |
| | | `- def cluster_stripes(boxes, vertical_gap: float = 12):` |
| | | `+ def cluster_stripes(boxes, joined_boxes, vectors, vertical_gap=12):` |
| | | `      """` |
| | | `      Divide page into horizontal stripes based on vertical gaps.` |
| | | `      Args:` |
| | | `-         boxes (list): List of bounding boxes, each defined as (x0, y0, x1, y1).` |
| | | `+         boxes (list): List of bounding boxes.` |

| | Suggested change |
|---|---|
| - | `boxes (list): List of bounding boxes.` |
| + | `boxes (list): List of bounding boxes, each defined as (x0, y0, x1, y1, class).` |

Copilot
AI
Nov 10, 2025
The sorting key changed from b[3] (bottom y-coordinate) to b[1] (top y-coordinate). While this may be intentional for top-to-bottom ordering within columns, this is inconsistent with the comment on line 249 'Sort top to bottom' which uses b[3] (bottom coordinate). Consider adding a comment explaining why columns are sorted by top coordinate while stripes use bottom coordinate.

| | Suggested change |
|---|---|
| - | `if box[0] - prev_right > HORIZONTAL_GAP:` |
| - | `    columns.append(sorted(current_column, key=lambda b: b[1]))` |
| - | `    current_column = [box]` |
| - | `else:` |
| - | `    current_column.append(box)` |
| + | `if box[0] - prev_right > HORIZONTAL_GAP:` |
| + | `    # Note: We sort boxes within each column by their top y-coordinate (b[1]) for top-to-bottom reading order.` |
| + | `    # This differs from stripes, which are sorted by bottom y-coordinate (b[3]).` |
| + | `    # The use of b[1] here is intentional to ensure columns are read from top to bottom.` |
| + | `    columns.append(sorted(current_column, key=lambda b: b[1]))` |
| + | `    current_column = [box]` |
| + | `else:` |
| + | `    current_column.append(box)` |
| + | `# As above, sort the last column by top y-coordinate (b[1]).` |

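
To make the difference between the two sort keys concrete, here is a minimal sketch of the column-splitting loop. It assumes boxes are plain `(x0, y0, x1, y1)` tuples; the function name `split_columns` and the gap value of 20 are illustrative assumptions, not taken from the PR (the real `HORIZONTAL_GAP` value is not shown in the diff).

```python
def split_columns(boxes, horizontal_gap=20.0):
    """Group boxes into columns wherever a horizontal gap exceeds the threshold.

    Hypothetical helper: the name and the default gap are assumptions.
    """
    columns = []
    current_column = []
    prev_right = None
    # Walk the boxes from left to right.
    for box in sorted(boxes, key=lambda b: b[0]):
        if current_column and box[0] - prev_right > horizontal_gap:
            # Close the column, ordering its boxes by top y-coordinate (b[1]).
            columns.append(sorted(current_column, key=lambda b: b[1]))
            current_column = [box]
            prev_right = box[2]
        else:
            current_column.append(box)
            prev_right = box[2] if prev_right is None else max(prev_right, box[2])
    if current_column:
        # The last column is sorted the same way.
        columns.append(sorted(current_column, key=lambda b: b[1]))
    return columns


boxes = [(0, 50, 100, 60), (0, 10, 100, 80), (200, 5, 300, 20)]
by_top = split_columns(boxes)[0]
by_bottom = sorted(by_top, key=lambda b: b[3])
print(by_top)     # tall box that starts higher comes first
print(by_bottom)  # short box that ends higher comes first
```

The two orderings only disagree when one box starts above another but ends below it, which is why the choice of key is worth a comment in the code.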
Copilot
AI
Nov 10, 2025
The function signature added new parameters joined_boxes and vectors, but the docstring's Args section does not document these parameters. Add documentation for these new parameters to explain their purpose.
Copilot
AI
Nov 10, 2025
The function signature changed significantly with new parameters page_rect and blocks, and the default vertical_gap changed from 36 to 12, but the docstring's Args section does not document these new parameters or explain the change in default value. Update the documentation to include page_rect and blocks parameters.
Copilot
AI
Nov 10, 2025
The condition `b["bbox"] in joined_boxes` is checking if a tuple/list is 'in' a Rect object, which will always return False. This should use `joined_boxes.contains(b["bbox"])` or `pymupdf.Rect(b["bbox"]) in joined_boxes` to properly check if the bbox is contained within the joined_boxes rectangle.

| | Suggested change |
|---|---|
| - | `if b["bbox"][3] - b["bbox"][1] >= min_bbox_height and b["bbox"] in joined_boxes` |
| + | `if b["bbox"][3] - b["bbox"][1] >= min_bbox_height and pymupdf.Rect(b["bbox"]) in joined_boxes` |

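
The intent of the filter (keep blocks that are tall enough and lie inside the joined rectangle) can be illustrated without PyMuPDF. This is a minimal sketch, assuming rectangles are plain `(x0, y0, x1, y1)` tuples; `rect_contains` and `tall_blocks_inside` are hypothetical helpers and are not part of the PR or of PyMuPDF. How `in` behaves on an actual `pymupdf.Rect` should be checked against the PyMuPDF documentation.

```python
def rect_contains(outer, inner):
    """Return True if `inner` lies entirely within `outer` (edges included)."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner[:4]
    return ox0 <= ix0 <= ix1 <= ox1 and oy0 <= iy0 <= iy1 <= oy1


def tall_blocks_inside(blocks, joined_boxes, min_bbox_height):
    """Keep blocks that meet the height threshold and sit inside joined_boxes."""
    return [
        b
        for b in blocks
        if b["bbox"][3] - b["bbox"][1] >= min_bbox_height
        and rect_contains(joined_boxes, b["bbox"])
    ]


blocks = [
    {"bbox": (10, 10, 50, 40)},    # tall enough and inside
    {"bbox": (10, 10, 50, 12)},    # inside but too short
    {"bbox": (90, 10, 150, 40)},   # tall enough but sticks out on the right
]
print(tall_blocks_inside(blocks, (0, 0, 100, 100), min_bbox_height=5))
```

Making the containment test an explicit function keeps the comparison independent of how any particular rectangle class implements `in`.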
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -1,3 +1,3 @@` |
| 1 | 1 | `  # Generated file - do not edit.` |
| 2 | 2 | `  MINIMUM_PYMUPDF_VERSION = (1, 26, 6)` |
| 3 | | `- VERSION = '0.1.9'` |
| | 3 | `+ VERSION = '0.2.0'` |
The function signature added new parameters `joined_boxes` and `vectors`, but the docstring's Args section does not document these parameters. Add documentation for `joined_boxes` (the bounding rectangle of all boxes) and `vectors` (list of vector rectangles to consider during stripe division).
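
One way the requested docstring could look is sketched below. The parameter descriptions are taken from the review comments; the function body is a deliberately simplified assumption that only groups boxes by vertical gap and ignores `joined_boxes` and `vectors`, so it should not be read as the PR's actual implementation.

```python
def cluster_stripes(boxes, joined_boxes, vectors, vertical_gap=12):
    """
    Divide page into horizontal stripes based on vertical gaps.

    Args:
        boxes (list): List of bounding boxes, each defined as
            (x0, y0, x1, y1, class).
        joined_boxes: The bounding rectangle of all boxes.
        vectors (list): Vector rectangles to consider during stripe division.
        vertical_gap (float): Minimum vertical distance between two boxes
            that starts a new stripe.

    Returns:
        list: Stripes from top to bottom, each a list of boxes.
    """
    # Assumption: joined_boxes and vectors are accepted only to mirror the
    # signature in the diff; this simplified sketch does not use them.
    if not boxes:
        return []
    # Order boxes by bottom y-coordinate (b[3]), as the review notes for stripes.
    ordered = sorted(boxes, key=lambda b: b[3])
    stripes = [[ordered[0]]]
    bottom = ordered[0][3]
    for box in ordered[1:]:
        if box[1] - bottom > vertical_gap:
            stripes.append([box])  # gap is large enough: start a new stripe
        else:
            stripes[-1].append(box)
        bottom = max(bottom, box[3])
    return stripes


boxes = [(0, 0, 10, 10, "text"), (0, 15, 10, 25, "text"), (0, 60, 10, 70, "text")]
print(cluster_stripes(boxes, joined_boxes=(0, 0, 10, 70), vectors=[]))
```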