Question / Comment: Slight link inconsistency #447

Closed
fortysixandtwo opened this issue Feb 20, 2020 · 11 comments

@fortysixandtwo

Hi,

I have come across an issue with how links are displayed when using PyMuPDF.
The scenario is as follows:
I have a PDF with a cover page that needs to be signed.
The page has been printed, signed, and scanned as a PDF.
I now need to replace the original cover page (the first page) with the signed page from the scanned PDF.

I get two slightly different PDF files (the links look different), depending on exactly how I generate the output PDF.

First way (new empty document, insert everything, save):

  • Create a new (empty) document: new_doc = fitz.open()
  • Open the “source” documents: doc_main = fitz.open('main.pdf') and doc_cover = fitz.open('cover.pdf')
  • Insert pages from the source documents: new_doc.insertPDF(doc_cover) and new_doc.insertPDF(doc_main, from_page=1)
  • Save: new_doc.save('new_file.pdf')
    The output has rectangles around all links, which were not there in the original 'main.pdf' document (see attached image).
    (image: pdf_link_boxes)
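Method 1 can be sketched in PyMuPDF roughly as follows. This is a minimal sketch, not the author's actual script: it assumes a current PyMuPDF release, where the method is spelled `insert_pdf` (the 1.16.x version used in this thread calls it `insertPDF`), and the helper name `merge_into_new` is hypothetical. The documents are passed in already opened, so PyMuPDF is only imported for the type hints.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # PyMuPDF is only imported for the type hints
    import fitz


def merge_into_new(new_doc: fitz.Document, doc_cover: fitz.Document,
                   doc_main: fitz.Document) -> fitz.Document:
    """Method 1 (hypothetical helper): copy all wanted pages into an empty PDF.

    insert_pdf recreates the link annotations of every page it copies.
    """
    new_doc.insert_pdf(doc_cover)              # signed, scanned cover page(s)
    new_doc.insert_pdf(doc_main, from_page=1)  # main.pdf without its old cover
    return new_doc
```

With `import fitz` at the call site, usage would be `merge_into_new(fitz.open(), fitz.open('cover.pdf'), fitz.open('main.pdf')).save('new_file.pdf')`.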

Second way (alter existing document, save under new name):

  • Open the existing main document: doc_main = fitz.open('main.pdf')
  • Open the “cover” document: doc_cover = fitz.open('cover.pdf')
  • Delete the first page: doc_main.deletePage(0)
  • Insert doc_cover into the main document: doc_main.insertPDF(doc_cover, start_at=0)
  • Save: doc_main.save('new_file.pdf')
    With this second workflow the document looks like the original, without the rectangles (see attached image).
    (image: pdf_link_no_boxes)
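Method 2 can be sketched as an in-place edit of the main document. Again this is a minimal sketch assuming a current PyMuPDF release (`delete_page` and `insert_pdf`, which 1.16.x spells `deletePage` and `insertPDF`); the helper name `replace_cover_in_place` is hypothetical, and the caller opens the documents.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # PyMuPDF is only imported for the type hints
    import fitz


def replace_cover_in_place(doc_main: fitz.Document,
                           doc_cover: fitz.Document) -> fitz.Document:
    """Method 2 (hypothetical helper): edit the main document in place.

    Only the first page is removed and re-inserted; all other pages keep
    their original link annotations.
    """
    doc_main.delete_page(0)                     # drop the unsigned cover
    doc_main.insert_pdf(doc_cover, start_at=0)  # signed cover becomes page 0
    return doc_main
```

Typical usage: `doc_main = fitz.open('main.pdf')`, then `replace_cover_in_place(doc_main, fitz.open('cover.pdf'))`, then `doc_main.save('new_file.pdf')`. Saving under a new name leaves main.pdf itself unchanged.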

Do you know what the difference between the two ways might be, or what the problem is?
The way links are rendered seems to be the only difference, as the images above show.

Any idea whether and how this behaviour can be changed or controlled?

Thanks a lot in advance :)

PS: When I open the original document, delete all pages, and insert everything I need again, I get the same result as in the 'new document' case.

@JorjMcKie
Collaborator

Thanks for submitting this. I'll have a look into it.

@JorjMcKie
Collaborator

Weird, I cannot reproduce this. I have a test script for insertPDF that does the same as your method 1; I'm attaching it together with test files. Could you please try it out: insertPDF.zip

Alternatively, you could send me your script and the accompanying files, if confidentiality permits.

@fortysixandtwo
Author

Hey, thanks for looking into this. I wouldn't want to upload the PDF to GitHub because of confidentiality,
but email would be fine (the original files are almost 20 MB).
Can I send the file to you via email?

I will upload my script in the next post.

I've also tried your script with the provided PDF files, and I get the same kind of rectangles around links, as you can see here:
Original file '2.pdf' in qpdfview:
(image: pdf_no_black_rectangles)
Output file in qpdfview (notice the black rectangles):
(image: pdf_black_rectangles)
The red rectangles come from my viewer and can be considered normal.
Output file in Acrobat Reader:
(image: pdf_link_boxes2)

Terminal output from your script:
script_output.txt

@fortysixandtwo
Author

I've uploaded the script as you asked.
script.zip

Some explanation of what it does:
The script is meant to provide PDF insertion capabilities via a GUI.

There are 2 modules: fitz_pdf.py (everything concerning fitz) and pdfmerger.py (the GUI). At the moment, the application is started by running test_wxpanel.py.

fitz_pdf contains a class providing 'easy access' to the fitz functionality.
The workflow looks something like this:

new_pdf = PDFMerge()
new_pdf.addPages(src_fname='file1.pdf', page_from=0, page_to=-1)
new_pdf.addPages(src_fname='file2.pdf', page_from=0, page_to=-1)
new_pdf.save('output.pdf')

Something like this will be called from the GUI code.
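The real fitz_pdf.py is only available in script.zip, so the following is a hypothetical reconstruction of such a wrapper, not the author's code. It assumes that `addPages` only queues page ranges and that `save()` performs method 1 (a new empty document plus `insert_pdf`, the current spelling of `insertPDF`). PyMuPDF is imported inside `save()`, so queuing pages works without it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PDFMerge:
    """Hypothetical wrapper: queue page ranges, then merge them on save()."""

    jobs: list[tuple[str, int, int]] = field(default_factory=list)

    def addPages(self, src_fname: str, page_from: int = 0,
                 page_to: int = -1) -> None:
        # page_to=-1 mirrors insert_pdf's convention for "up to the last page"
        self.jobs.append((src_fname, page_from, page_to))

    def save(self, out_fname: str) -> None:
        import fitz  # PyMuPDF, only needed once pages are actually copied

        new_doc = fitz.open()  # new, empty PDF (method 1 from this issue)
        opened = []
        for src_fname, page_from, page_to in self.jobs:
            src = fitz.open(src_fname)
            opened.append(src)  # keep sources open until the target is saved
            new_doc.insert_pdf(src, from_page=page_from, to_page=page_to)
        new_doc.save(out_fname)
        for doc in [new_doc, *opened]:
            doc.close()
```

Separating "collect" from "write" keeps the GUI side free of PDF handling: the GUI only calls `addPages` as the user picks files, and all PyMuPDF work happens in one place.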

PS: I don't know if it's relevant, but these are the versions of Python and PyMuPDF I'm running:

Python 3.8.1 (default, Jan 22 2020, 06:38:00) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import fitz; fitz.version
('1.16.10', '1.16.0', '20191221073132')

@JorjMcKie
Collaborator

Ha! Found it!
Some more background for you:

  • Your method 2 does not change anything on the pages after page 0, so those pages must look exactly as before.
  • Your method 1 (like my test script) recreates all link (and annotation) definitions on all pages. I forgot to include explicit /Border settings there, which obviously leaves the link appearance up to the default behaviour of the PDF viewer in use. And viewers do behave differently: SumatraPDF, PDF-XChange and MuPDF do not show borders, while Adobe, Nitro and Foxit do, as does your Linux viewer.

This is an easy change. Please let me know how urgent it is for you; you seem to have found a workaround. I can also guide you through a temporary fix directly in your PyMuPDF installation.
My next version will contain the fix anyway.
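To check whether a given output file is affected, one can look for link annotations that carry neither a /Border array nor a /BS dictionary, since those are the ones left to the viewer's default. This sketch assumes a recent PyMuPDF release: `Page.get_links()` (each link dict has an "xref" key for its annotation object) and `Document.xref_get_key()`, which returns `('null', 'null')` for a missing key, are not available under these names in 1.16.x. The function name is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # PyMuPDF is only imported for the type hints
    import fitz


def links_without_border(doc: fitz.Document) -> list[tuple[int, int]]:
    """Return (page number, xref) for every link annotation that defines
    neither /Border nor /BS, i.e. whose border is left to the viewer."""
    found = []
    for page in doc:
        for link in page.get_links():
            xref = link.get("xref", 0)
            if xref <= 0:
                continue  # link has no annotation object of its own
            has_bs = doc.xref_get_key(xref, "BS")[0] != "null"
            has_border = doc.xref_get_key(xref, "Border")[0] != "null"
            if not (has_bs or has_border):
                found.append((page.number, xref))
    return found
```

For example, `links_without_border(fitz.open('new_file.pdf'))`; an empty list means every link carries an explicit border setting.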

@JorjMcKie JorjMcKie added bug and removed question labels Feb 24, 2020
@fortysixandtwo
Author

Hey,
thanks a lot for having a look at this.
It's not extremely urgent, so I can wait,
but nevertheless I would be very interested in the temporary fix - to satisfy my curiosity :)

@JorjMcKie
Collaborator

JorjMcKie commented Feb 24, 2020

> but nevertheless I would be very interested in the temporary fix - to satisfy my curiosity :)

Well then:
Edit fitz/utils.py in the installation folder and replace all occurrences of
/Rect[%s]/Subtype/Link with /Rect[%s]/BS<</W 0>>/Subtype/Link.
That's it.
/BS = border style, a PDF dictionary, and as such enclosed in << and >> brackets
/W = a dictionary key (denoting the border width), set to 0
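The edit above is a plain text substitution. The following sketch only illustrates it on a sample string; the sample line is made up for the example, and the real templates in fitz/utils.py of 1.16.x contain further keys. The helper names are hypothetical.

```python
from __future__ import annotations

# The two fragments quoted above; "%s" later receives the link rectangle.
OLD_FRAGMENT = "/Rect[%s]/Subtype/Link"
NEW_FRAGMENT = "/Rect[%s]/BS<</W 0>>/Subtype/Link"


def patch_link_templates(source: str) -> str:
    """Insert a zero-width border style after every link rectangle."""
    return source.replace(OLD_FRAGMENT, NEW_FRAGMENT)


def fill_rect(template: str, rect: tuple[float, ...]) -> str:
    """Fill the %s placeholder with space-separated rectangle coordinates."""
    return template % " ".join(str(v) for v in rect)


# Made-up sample line in the style of a link-annotation template
sample = 'annot = "<</Type/Annot/Rect[%s]/Subtype/Link>>"'
print(patch_link_templates(sample))
# annot = "<</Type/Annot/Rect[%s]/BS<</W 0>>/Subtype/Link>>"
print(fill_rect(NEW_FRAGMENT, (10, 20, 110, 40)))
# /Rect[10 20 110 40]/BS<</W 0>>/Subtype/Link
```

Because the new fragment no longer contains the old one, running the substitution a second time changes nothing.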

@fortysixandtwo
Author

Hey,
just tested the changes and it works like a charm ;)
Thanks a lot for your assistance, I really appreciate it.

@JorjMcKie
Collaborator

Oops, closed prematurely; I should wait until there is an official version that solves this.

@JorjMcKie JorjMcKie reopened this Feb 25, 2020
@JorjMcKie
Collaborator

The official v1.16.12 has been published.
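To find out whether an installation already includes this fix, one can compare the PyMuPDF version against 1.16.12. The helper names below are hypothetical; the version string can be taken from `fitz.VersionBind`, which is also the first element of the `fitz.version` tuple printed earlier in this thread.

```python
from __future__ import annotations

import re

FIXED_IN = (1, 16, 12)  # release announced in this thread


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.16.10' (or '1.23.5rc1') into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def has_link_border_fix(version: str) -> bool:
    """True if this PyMuPDF version is at least the one announced above."""
    return parse_version(version) >= FIXED_IN
```

For the version reported earlier, `has_link_border_fix('1.16.10')` is False. As the following comment shows, a newer version alone does not guarantee that every file comes out without borders.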

@tanduong

tanduong commented May 5, 2024

I split a PDF file into single-page files like this:

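import sys
from pathlib import Path

import fitz  # PyMuPDF

src_path = Path(sys.argv[1])
out_dir = Path("output")
out_dir.mkdir(exist_ok=True)  # save() fails if the target folder is missing

src = fitz.open(str(src_path))
for i in range(len(src)):
    doc = fitz.open()  # new, empty PDF
    doc.insert_pdf(src, from_page=i, to_page=i)  # copy page i only
    doc.save(str(out_dir / f"{src_path.stem}-{i}.pdf"))  # e.g. output/name-0.pdf
    doc.close()
src.close()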

With this file I still got borders, so maybe the fix has not resolved all cases yet:

01_Introduction to Corporate Finance.pdf

(image)
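One possible workaround, not verified against the attached file: after `insert_pdf`, give every link that has neither /Border nor /BS an explicit zero-width border, mirroring the `/BS<</W 0>>` fix discussed above. The sketch assumes a recent PyMuPDF release (`Page.get_links`, `Document.xref_get_key` and `Document.xref_set_key`); the function name is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # PyMuPDF is only imported for the type hints
    import fitz


def hide_link_borders(doc: fitz.Document) -> int:
    """Add /BS<</W 0>> to links that define neither /Border nor /BS.

    Links that already specify a border are left alone. Returns the
    number of link annotations that were changed.
    """
    changed = 0
    for page in doc:
        for link in page.get_links():
            xref = link.get("xref", 0)
            if xref <= 0:
                continue  # no annotation object to edit
            if (doc.xref_get_key(xref, "BS")[0] == "null"
                    and doc.xref_get_key(xref, "Border")[0] == "null"):
                doc.xref_set_key(xref, "BS", "<</W 0>>")
                changed += 1
    return changed
```

In the splitting loop above, this would be called between `doc.insert_pdf(...)` and `doc.save(...)`. Leaving explicit borders untouched avoids hiding borders that the original author drew on purpose.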
