# Manipulating PDF Page Rectangles: MediaBox and Friends
## The MediaBox
In PDF, the size or "dimension" of each page must be explicitly defined. This happens via a PDF dictionary key named `/MediaBox`. A typical page object definition will look like this:

In [None]:
import fitz
assert tuple(map(int, fitz.VersionBind.split("."))) >= (1, 19, 4), "Need PyMuPDF v1.19.4+"
doc = fitz.open()  # make an empty, new PDF
page = doc.new_page()  # give it a new page
print(f"Page {page.number} object at xref {page.xref}:")
print(doc.xref_object(page.xref))  # show the resulting page object definition

The above statements have created a page with the default **_ISO A4_** dimension: width 595 points, height 842 points. For now, imagine the **MediaBox** to be something like the "physical" page size. For a precise definition consult the [PDF reference manual](https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf), page 77.

Please note that the first two numbers of the array (in our case above: "0 0") denote the coordinates of the **_bottom-left_** point of the page ... relative to some abstract coordinate system. They usually are zero, but other (including negative) values are possible and do occur. The y-axis is oriented bottom to top.

> In PyMuPDF, the **MediaBox** is the only rectangle with the first two coordinates pointing to the **_bottom-left_** corner. All other rectangles follow MuPDF's convention: the first two numbers mean the **_top-left_** point and the y-axis is oriented top to bottom.

The position of **_all content shown_** by the page is coded with coordinates that are computed relative to the MediaBox. Changing the MediaBox is possible, but be aware that any existing content will inevitably be **_shown displaced_** accordingly. Only change it if exactly this is your intention.
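
To make the displacement effect concrete, here is a minimal pure-Python sketch. It does not use PyMuPDF; the helper name `shown_position` is hypothetical, and the sketch assumes an unrotated page without a CropBox, so that a point appears at a distance from the page's top-left corner that is measured against the MediaBox edges.

```python
def shown_position(point, mediabox):
    """Return where a PDF-space point appears, measured from the page's top-left.

    point:    (x, y) in PDF coordinates (y-axis oriented bottom to top)
    mediabox: (x0, y0, x1, y1) in PDF coordinates
    Assumption: unrotated page, no CropBox.
    """
    x, y = point
    x0, y0, x1, y1 = mediabox
    # horizontal distance from the left edge, vertical distance from the top edge
    return (x - x0, y1 - y)


anchor = (72, 700)  # some content position, stored in the page's content stream
before = shown_position(anchor, (0, 0, 595, 842))  # original A4 MediaBox
after = shown_position(anchor, (0, 0, 595, 742))   # MediaBox shortened by 100 points
print("before:", before)
print("after: ", after)
```

The content itself is untouched, yet it appears 100 points closer to the top edge after the change.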

## The CropBox and more
The PDF specification mentions four other, optional rectangles that a page may have: **CropBox, ArtBox, BleedBox** and **TrimBox**. All of them define subareas of the MediaBox and are used for various purposes. Again, please consult the reference material for any details.

The most important of these boxes is the CropBox: it defines which part of the physical page should be made visible by PDF consumer software (i.e. PDF readers). If a page has no `/CropBox` definition, the MediaBox will be used. The other three rectangles will default to the CropBox instead.
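
The fallback rules just described can be modelled in a few lines of plain Python. This is only an illustrative sketch: the function `effective_boxes` and the dictionary layout are hypothetical and not part of PyMuPDF, and the inheritance of boxes from parent page tree nodes is ignored.

```python
def effective_boxes(page_dict):
    """Resolve the five page rectangles using the default rules.

    page_dict maps PDF key names (without the slash) to 4-tuples.
    "MediaBox" is required; every other key is optional.
    """
    mediabox = page_dict["MediaBox"]
    # a missing CropBox falls back to the MediaBox
    cropbox = page_dict.get("CropBox", mediabox)
    boxes = {"MediaBox": mediabox, "CropBox": cropbox}
    # the remaining three fall back to the CropBox
    for key in ("ArtBox", "BleedBox", "TrimBox"):
        boxes[key] = page_dict.get(key, cropbox)
    return boxes


plain = effective_boxes({"MediaBox": (0, 0, 595, 842)})
cropped = effective_boxes(
    {"MediaBox": (0, 0, 595, 842), "CropBox": (100, 442, 300, 742)}
)
for name, box in cropped.items():
    print(name.rjust(10), box)
```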

In [None]:
# optional rectangles default to MediaBox or CropBox
print("CropBox".rjust(10), page.cropbox)
print("page.rect".rjust(10), page.rect)
# the remaining default to CropBox and thus to MediaBox in this case
print("ArtBox".rjust(10), page.artbox)
print("BleedBox".rjust(10), page.bleedbox)
print("TrimBox".rjust(10), page.trimbox)

## Changing Page Rectangles
What happens if we change any of the page rectangles?

For each of them, PyMuPDF provides a method to set it to a new value: `page.set_mediabox()`, `page.set_cropbox()` and so on. In each case, the argument must be a **_Python sequence_** of 4 numbers that can be interpreted as a `fitz.Rect` object; this is called "rect-like". The derived rectangle must be *valid*, not *empty* and not *infinite* (see the documentation), and it must be **_contained in the MediaBox._**

As mentioned before, the MediaBox coordinates must adhere to PDF coordinate conventions (the first two numbers specify bottom-left). The other four must be given in MuPDF's coordinate space.
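
The constraints on a new rectangle can be sketched as a small pre-check in plain Python. The helper `check_new_box` is hypothetical and not a PyMuPDF function. It assumes that "valid" means `x0 <= x1` and `y0 <= y1`, that "empty" means zero width or height, and that the MediaBox starts at `(0, 0)`, so its numbers are the same in both coordinate systems. The check for infinite rectangles is left out.

```python
def check_new_box(box, mediabox):
    """Return "ok" or a short description of the first problem found.

    box:      candidate rectangle (x0, y0, x1, y1)
    mediabox: page MediaBox (x0, y0, x1, y1), assumed to start at (0, 0)
    """
    if len(box) != 4:
        return "not a sequence of 4 numbers"
    x0, y0, x1, y1 = map(float, box)
    mx0, my0, mx1, my1 = map(float, mediabox)
    if x0 > x1 or y0 > y1:
        return "invalid"
    if x0 == x1 or y0 == y1:
        return "empty"
    if not (mx0 <= x0 and my0 <= y0 and x1 <= mx1 and y1 <= my1):
        return "not contained in MediaBox"
    return "ok"


a4 = (0, 0, 595, 842)
results = {
    "subrectangle": check_new_box((100, 100, 300, 400), a4),
    "swapped corners": check_new_box((300, 400, 100, 100), a4),
    "zero height": check_new_box((100, 100, 300, 100), a4),
    "too wide": check_new_box((100, 100, 700, 400), a4),
}
for label, verdict in results.items():
    print(f"{label}: {verdict}")
```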

We will now set the CropBox of our page from above to a proper subrectangle of the MediaBox. It will have the dimension 200x300 and its top-left will be positioned at `fitz.Point(100, 100)`.

In [None]:
page.set_cropbox(fitz.Rect(100, 100, 300, 400))  # set the CropBox
print(f"CropBox: {page.cropbox}")
print(f"top-left: {page.cropbox_position}")

See what happened to `page.rect`, the page rectangle that is presented to the application: it shows the new dimension, but its top-left position is `fitz.Point(0, 0)`, as is always the case:

In [None]:
print(f"page rectangle: {page.rect}")

As a result of our change, the page object definition in the PDF has changed in the following way:

In [None]:
print(f"Page {page.number} object at xref {page.xref}:")
print(doc.xref_object(page.xref))

> -----
> The `/CropBox` array above is coded in PDF coordinates, which are derived from MuPDF coordinates in the following way:
> * `442 = mediabox.y1 - cropbox.y1 = 842 - 400`
> * `742 = mediabox.y1 - cropbox.y0 = 842 - 100`
> -----
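
The arithmetic in the note can be reproduced with a short pure-Python sketch. The helper `flip_y` is a hypothetical name and not a PyMuPDF function. It assumes that only the y-values change, mirrored against the top edge `mediabox.y1`, exactly as in the two formulas above.

```python
def flip_y(rect, mediabox_y1):
    """Convert a rectangle between MuPDF and PDF coordinates.

    Only the y-values change: they are mirrored against the MediaBox top edge
    and swapped, so the result again has y0 <= y1.
    """
    x0, y0, x1, y1 = rect
    return (x0, mediabox_y1 - y1, x1, mediabox_y1 - y0)


cropbox_mupdf = (100, 100, 300, 400)      # as passed to page.set_cropbox()
cropbox_pdf = flip_y(cropbox_mupdf, 842)  # as stored in the /CropBox array
round_trip = flip_y(cropbox_pdf, 842)     # back to MuPDF coordinates
print("PDF:  ", cropbox_pdf)
print("MuPDF:", round_trip)
```

Because mirroring twice restores the original values, the same function serves both directions.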

The remaining three optional rectangles will also show the new value of CropBox:

In [None]:
page.cropbox == page.artbox == page.bleedbox == page.trimbox

To **_revert the previous change,_** you can simply set the CropBox to the value of the MediaBox:

In [None]:
page.set_cropbox(page.mediabox)  # revert the previous change
print(f"CropBox: {page.cropbox}")
print(f"page rect: {page.rect}")

Look at how the page definition has changed. It still has a `/CropBox` key, but with the same values as the `/MediaBox`:

In [None]:
print(f"Page {page.number} object at xref {page.xref}:")
print(doc.xref_object(page.xref))

Another way of resetting the CropBox **_avoids the redundant_** leftover `/CropBox` key:

The method `page.set_mediabox()` not only sets the MediaBox, **_but also removes all optional rectangles._** So you might prefer using `page.set_mediabox(page.mediabox)` ...

In [None]:
page.set_mediabox(page.mediabox)
print(f"Page {page.number} object at xref {page.xref}:")
print(doc.xref_object(page.xref))