Question / Comment: How to crop white margins around the page #617

sant527 · 2020-08-25T17:14:53Z

I often need to crop the white margins on a few sides of a page, and sometimes on all sides.

I am using pdf-crop-margins to crop only the bottom margin, using the following command:

pdf-crop-margins -v -p4 100 0 100 100 test.pdf
Here -p4 sets the percentage of each margin to keep (not to crop):
100 means leave that margin as it is, don't crop
0 means crop to the edge of the text
The -p4 values are given in the order left, bottom, right, top

Can something similar be done using PyMuPDF?

JorjMcKie · 2020-08-25T17:57:53Z

Sure. You can set the CropBox property. This attribute's dimensions initially equal those of the (unrotated) page.
So to set the visible part of the page to some rectangle r, do page.setCropBox(r).

sant527 · 2020-08-26T10:08:34Z

What is r here? I saw that the source code says rect.

Can you give an example?

JorjMcKie · 2020-08-26T11:27:24Z

An object of class fitz.Rect which represents a rectangle defined by its top-left and bottom-right points (i.e. the diagonal). Can therefore be defined as fitz.Rect(x0, y0, x1, y1), where the top-left is fitz.Point(x0, y0) and the bottom-right is fitz.Point(x1, y1).
For a page, there exists the rectangle page.rect. For example, an A4 page has page.rect = fitz.Rect(0, 0, 595, 842).
Omitting e.g. a 50 pixel border around such a rectangle can be achieved by

fitz.Rect(50, 50, 595 - 50, 842 - 50), or
page.rect + (50, 50, -50, -50)

72 pixels equal one inch, so you can calculate in inches and, correspondingly, in centimeters.

You can algebraically add / subtract rectangles: r1 + r2 adds the respective coordinates. Here r2 can also be a 4-tuple, if the left operand r1 is a fitz.Rect (second example above).

So the shortest form to omit that border in this example is executing page.setCropBox(page.rect + (50, 50, -50, -50)).
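
The border arithmetic above can be wrapped in a small helper. This is only a minimal sketch, not part of PyMuPDF: cm_to_pt, inset_rect and crop_border are hypothetical names. It assumes a recent PyMuPDF where the method is spelled page.set_cropbox() (older releases used page.setCropBox()), and a page whose rectangle starts at (0, 0) with no earlier crop applied.

```python
UNITS_PER_INCH = 72.0  # 72 units equal one inch, as stated above
CM_PER_INCH = 2.54


def cm_to_pt(cm):
    """Convert centimeters to page units (72 per inch)."""
    return cm / CM_PER_INCH * UNITS_PER_INCH


def inset_rect(rect, left=0.0, top=0.0, right=0.0, bottom=0.0):
    """Return (x0, y0, x1, y1) shrunk by the given border on each side.

    `rect` is any 4-item sequence, e.g. a fitz.Rect or a plain tuple.
    """
    x0, y0, x1, y1 = rect
    new = (x0 + left, y0 + top, x1 - right, y1 - bottom)
    if new[0] >= new[2] or new[1] >= new[3]:
        raise ValueError("borders leave no visible area")
    return new


def crop_border(page, border):
    """Hide a uniform border (in page units) on all four sides of `page`."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    new = inset_rect(page.rect, border, border, border, border)
    page.set_cropbox(fitz.Rect(*new))
```

For the A4 example, inset_rect((0, 0, 595, 842), 50, 50, 50, 50) gives (50, 50, 545, 792), the same rectangle as the two forms shown earlier. A 1 cm border would be crop_border(page, cm_to_pt(1)).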

sant527 · 2020-08-26T15:54:20Z

Thank you very much for the elaborate answer.

StevenClontz · 2023-09-29T15:23:40Z

We're looking at dropping pdf-crop-margins as we already need PyMuPDF for other functionality. I think I understand that page.setCropBox(r) crops the page to the rectangle r. Is there any way to automatically compute r to be the smallest rectangle containing all the content on a page (e.g. so we automatically detect and crop out margins)?

JorjMcKie · 2023-09-30T04:42:36Z

Yes, page.set_cropbox() (with page being a Page object) sets the visible part of a page.

It does not physically delete the part that becomes invisible. Setting a different rectangle later can make that content visible again.

To compute the smallest rectangle containing everything the page has to show, use page.get_bboxlog() as in the following code snippet:

rect = fitz.EMPTY_RECT()  # start with the standard empty rectangle
for item in page.get_bboxlog():
    rect |= item[1]  # join this bbox into the result
# rect now wraps all page content

The advantage is that no text, image, or anything else needs to be extracted to do this.

An item of page.get_bboxlog() looks like this: (type, (x0, y0, x1, y1)). "type" can be "fill-text", "fill-image" and more, showing the object type. The second element, a tuple, is the bounding box.
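
Tying this back to the original question (crop only some sides), the bounding boxes can be combined and then applied per side. The following is a minimal sketch under assumptions: content_bbox, crop_sides and autocrop are hypothetical helpers, not PyMuPDF functions; the page is assumed to be unrotated, with a rectangle starting at (0, 0) and no earlier crop; and skipping empty boxes is a choice made here, not documented library behavior.

```python
def content_bbox(bboxlog):
    """Union of all non-empty boxes in a page.get_bboxlog()-style list.

    Each item is expected to look like (type, (x0, y0, x1, y1)).
    Returns None when no non-empty box is found.
    """
    result = None
    for item in bboxlog:
        x0, y0, x1, y1 = item[1]
        if x1 <= x0 or y1 <= y0:  # skip empty or inverted boxes
            continue
        if result is None:
            result = (x0, y0, x1, y1)
        else:
            result = (min(result[0], x0), min(result[1], y0),
                      max(result[2], x1), max(result[3], y1))
    return result


def crop_sides(page_rect, content, left=True, top=True, right=True, bottom=True):
    """Move only the selected sides of page_rect in to the content box."""
    px0, py0, px1, py1 = page_rect
    cx0, cy0, cx1, cy1 = content
    return (
        max(px0, cx0) if left else px0,
        max(py0, cy0) if top else py0,
        min(px1, cx1) if right else px1,
        min(py1, cy1) if bottom else py1,  # y1 is the bottom edge
    )


def autocrop(page, **sides):
    """Crop `page` to its content on the selected sides (all by default)."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    content = content_bbox(page.get_bboxlog())
    if content is not None:
        page.set_cropbox(fitz.Rect(*crop_sides(page.rect, content, **sides)))
```

With this, autocrop(page, left=False, top=False, right=False) trims only the bottom margin, which is roughly what the pdf-crop-margins call at the top of this thread does. Clamping to the page rectangle keeps the result inside the page even if some content extends beyond it.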

StevenClontz · 2023-10-01T23:39:18Z

Thanks @JorjMcKie we'll check this out. :-)

sant527 added the question label Aug 25, 2020

sant527 assigned JorjMcKie Aug 25, 2020

JorjMcKie closed this as completed Aug 26, 2020

JorjMcKie added the resolved fixed / implemented / answered label Aug 26, 2020

StevenClontz mentioned this issue Sep 29, 2023

pyPDF2 dependency? PreTeXtBook/pretext-cli#603

Open
