diff --git a/docs/user/extract-text.md b/docs/user/extract-text.md
index 129e66068..639a715a2 100644
--- a/docs/user/extract-text.md
+++ b/docs/user/extract-text.md
@@ -85,7 +85,7 @@ parts = []
 
 def visitor_body(text, cm, tm, font_dict, font_size):
     y = cm[5]
-    if y > 50 and y < 720:
+    if 50 < y < 720:
         parts.append(text)
 
 
diff --git a/docs/user/forms.md b/docs/user/forms.md
index d8a0fe6e6..747415e0a 100644
--- a/docs/user/forms.md
+++ b/docs/user/forms.md
@@ -32,7 +32,6 @@ writer.update_page_form_field_values(
     auto_regenerate=False,
 )
 
-# write "output" to pypdf-output.pdf
 with open("filled-out.pdf", "wb") as output_stream:
     writer.write(output_stream)
 ```
@@ -94,14 +93,14 @@ for page in reader.pages:
 
 However, while similar, there are some very important differences between the two above blocks of code. Most importantly, the first block will return a list of Field objects, whereas the second will return more generic dictionary-like objects. The objects lists will *mostly* reference the same object in the underlying PDF, meaning you'll find that `obj_taken_fom_first_list.indirect_reference == obj_taken_from _second_list.indirect_reference`. Field objects are generally more ergonomic, as the exposed data can be accessed via clearly named properties. However, the more generic dictionary-like objects will contain data that the Field object does not expose, such as the Rect (the widget's position on the page). Therefore the correct approach depends on your use case.
 
-However, it's also important to note that the two lists do not *always* refer to the same underlying PDF object. For example, if the form contains radio buttons, you will find that `reader.get_fields()` will get the parent object (the group of radio buttons) whereas `page.annotations` will return all the child objects (the individual radio buttons).
+However, it is also important to note that the two lists do not *always* refer to the same underlying PDF object. For example, if the form contains radio buttons, you will find that `reader.get_fields()` returns the parent object (the group of radio buttons), whereas `page.annotations` returns all the child objects (the individual radio buttons).
 
-__Caution: Remember that fields are not stored in pages: If you use `add_page()` the field structure is not copied. It is recommended to use `.append()` with the proper parameters instead.__
+__Caution: Remember that fields are not stored in pages; if you use `add_page()`, the field structure is not copied. It is recommended to use `.append()` with the proper parameters instead.__
 
-In case of missing _field_ objects in `/Fields`, `writer.reattach_fields()` will parse page(s) annotations and will reattach them. This fix can not guess intermediate fields and will not report fields using the same _name_.
+If _field_ objects are missing from `/Fields`, `writer.reattach_fields()` parses the page annotations and reattaches them. This fix cannot guess intermediate fields and will not report fields using the same _name_.
 
 ## Identify pages where fields are used
 
-On order to ease locating page fields you can use `page.get_pages_using_field`. This methods accepts a field object, id est a *PdfObject* that represents a field (as are extracted from `_root_object["/AcroForm"]["/Fields"]`. The method returns a list of pages, because a field can have multiple widgets as mentioned previously (e.g. radio buttons or text displayed on multiple pages).
+To make it easier to locate the pages that display a field, use the `get_pages_showing_field` method of `PdfReader` or `PdfWriter`. This method accepts a field object, that is, a *PdfObject* that represents a field (as extracted from `_root_object["/AcroForm"]["/Fields"]`). It returns a list of pages because a field can have multiple widgets, as mentioned previously (e.g. radio buttons, or text displayed on multiple pages).
 
 The page numbers can then be retrieved as usual by using `page.page_number`.
diff --git a/docs/user/merging-pdfs.md b/docs/user/merging-pdfs.md
index a0f4314da..befc1442f 100644
--- a/docs/user/merging-pdfs.md
+++ b/docs/user/merging-pdfs.md
@@ -67,9 +67,9 @@ If you want to insert pages in the middle of the destination, use `merge` (which
 You can insert the same page multiple times, if necessary even using a list-based syntax:
 
 ```python
+# Insert pages 2 and 3 (indices 1 and 2), with page 1 (index 0) before, between, and after
 writer.append(reader, [0, 1, 0, 2, 0])
 ```
-will insert the pages 1 and 2 with page 0 before, in the middle and after.
 
 ## add_page / insert_page
 
diff --git a/docs/user/viewer-preferences.md b/docs/user/viewer-preferences.md
index bda3970ab..189adef44 100644
--- a/docs/user/viewer-preferences.md
+++ b/docs/user/viewer-preferences.md
@@ -4,7 +4,7 @@ It is possible to set viewer preferences of a PDF file.
 These properties are described in Section 12.2 of the [PDF 1.7 specification](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf).
 
 Note that the `/ViewerPreferences` dictionary does not exist by default.
-If it's not already present, it must be created by calling the `create_viewer_preferences` method
+If it is not already present, it must be created by calling the `create_viewer_preferences` method
 of the `PdfWriter` object.
 
 If viewer preferences exist in a PDF file being read with `PdfReader`,
@@ -79,5 +79,5 @@ with open("output.pdf", "wb") as output_stream:
 ```
 
 The names beginning with a slash character are part of the PDF file format. They are
-included here to aid to anyone searching pypdf documentation
+included here to make it easier to search the pypdf documentation
 for these names from the PDF specification.