Question: How to remove a word watermark from a PDF? #468
Thanks for your feedback!
This is probably possible; it depends on how this element is stored on the page(s). If it really is text and does not occupy space that is also covered by other visible elements on the page, then you can use the new PyMuPDF feature "Redaction Annotations". In a nutshell: find the rectangle that contains the text, add a redaction annotation for that rectangle, then apply the redactions on the page.
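As a rough illustration of the redaction approach, here is a minimal sketch. It assumes a recent PyMuPDF release with snake_case method names (`search_for`, `add_redact_annot`, `apply_redactions`); older releases, such as the one discussed in this thread, spell some of these in camelCase (for example `addRedactAnnot`). The names `redact_text` and `remove_text_from_pdf` are made up for this example.

```python
def redact_text(page, needle):
    """Remove every occurrence of `needle` from `page` via redaction annotations.

    `page` is expected to behave like a PyMuPDF page (assumption: snake_case API).
    Returns the number of rectangles that were redacted.
    """
    rects = page.search_for(needle)  # rectangles that enclose each hit
    for rect in rects:
        page.add_redact_annot(rect)  # mark the area for removal
    if rects:
        page.apply_redactions()  # remove the marked content from the page
    return len(rects)


def remove_text_from_pdf(src, dst, needle):
    """Hypothetical driver: redact `needle` on every page of `src`, save as `dst`."""
    import fitz  # PyMuPDF, imported lazily so redact_text has no hard dependency

    doc = fitz.open(src)
    total = sum(redact_text(page, needle) for page in doc)
    doc.save(dst)
    return total
```

A call such as `remove_text_from_pdf("input.pdf", "output.pdf", "www.abc.com")` would then process a whole file; both file names are placeholders.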
This approach also works if the watermark is not text but really an image: determine the image bbox and redact that rectangle instead.
The caveat with this approach: applying redaction annotations removes or overlays everything within the respective rectangles.
If a watermark exists as part of the background (e.g. to prevent unnoticed copies), then we would need to talk again about the details of that implementation.
Thanks for your prompt reply, it works! What's weird is that where "www.abc.com" used to be, it's now empty. That's so cool. But when you click the empty area, it still links to the website "www.abc.com", even if I replace the text with something like "www.mmm.com". Is there any way to remove the link? It would help me a lot. Thanks again.
Ah ok. No problem I guess.
>>> import fitz
>>> doc = fitz.open("PyMuPDF.pdf")
>>> page = doc[8]  # we want to remove the link reference to the MuPDF web site
>>> for link in page.links(kinds=(fitz.LINK_URI,)):  # iterate over internet links only
...     if 'www.mupdf.com' in link["uri"]:  # if found
...         break
...
>>> page.addRedactAnnot(link["from"])  # use the link hot spot area on page
'Redact' annotation on page 8 of PyMuPDF.pdf
>>> page.apply_redactions()  # remove the text
True
>>> page.deleteLink(link)  # and now also the link itself
>>> doc.save("link-deleted.pdf")  # this has no more link at that place ...
>>>
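If only the link should go (and possibly more than one link points to the same site), the deletion step can be wrapped in a small helper. This is a sketch under assumptions: it uses the snake_case name `delete_link` from newer PyMuPDF releases (the session above uses the older `deleteLink`), and `delete_links_matching` is a hypothetical name. The matching links are collected first so that nothing is deleted while the page's links are still being iterated.

```python
def delete_links_matching(page, needle):
    """Delete every link on `page` whose URI contains `needle`.

    `page` is expected to behave like a PyMuPDF page. Links without a "uri"
    entry (for example internal jumps) are left alone.
    Returns the number of deleted links.
    """
    # Collect first, delete afterwards, so we never modify while iterating.
    matches = [link for link in page.links() if needle in (link.get("uri") or "")]
    for link in matches:
        page.delete_link(link)
    return len(matches)
```

Unlike the interactive session, this does nothing when no link matches, so it cannot accidentally act on an unrelated link.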
I think my situation is a little different. I followed your instructions; here is my code:
When I do this:
It's still there, so I guess it is not deleted but covered by the redaction annotation. Did I do it wrong? Many thanks.
This is weird indeed!
No, the text is really gone. But the link still exists; the question is where.
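One way to approach the "where" question is to list every link in the document together with its page number and hot spot rectangle. The sketch below relies on `page.links()`, which is also used earlier in this thread, and on the "uri" and "from" keys of the link dictionaries; `list_links` and `print_links` are hypothetical helper names.

```python
def list_links(doc):
    """Return (page_number, uri, rect) for every link in `doc`.

    `doc` is expected to iterate over PyMuPDF-like pages. `uri` is None for
    links that carry no "uri" entry.
    """
    found = []
    for pno, page in enumerate(doc):
        for link in page.links():  # all link kinds on this page
            found.append((pno, link.get("uri"), link["from"]))
    return found


def print_links(path):
    """Hypothetical driver: print all links of the PDF at `path`."""
    import fitz  # PyMuPDF, imported lazily

    for pno, uri, rect in list_links(fitz.open(path)):
        print(f"page {pno}: {uri!r} at {rect}")
```

Comparing the printed rectangles with the area that was redacted can show whether the remaining link sits somewhere other than expected.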
Sure, is it OK to send it to your email: jorj.x.mckie@outlook.de?
yes
The output of
Yes, I just realized that.
Always welcome - you know where to find the PayPal button 😉?
Here is a brute force script that removes watermarks from pages. Its approach is completely different: it scans through the paint commands of a page (after formatting and cleaning them up), then finds and destroys the commands marked as watermarks, i.e. everything from a line that starts with "/Artifact" and contains "/Watermark" up to the next "EMC" line.
This removes the watermarks, obviously. But the link on page 1 still remains 🤔, because it is not a watermark - it must still be treated via redact annotations.
import fitz
doc = fitz.open("2.pdf")
for page in doc:
    page.cleanContents()  # cleanup page painting commands
    xref = page.getContents()[0]  # get xref of the resulting source
    cont0 = doc.xrefStream(xref).decode().splitlines()  # and read it as lines of strings
    cont1 = []  # will contain reduced cont lines
    found = False  # indicates we are inside watermark instructions
    for line in cont0:
        if line.startswith("/Artifact") and "/Watermark" in line:  # start of watermark
            found = True  # switch on
            continue  # and skip line
        if found and line == "EMC":  # end of watermark
            found = False  # switch off
            continue  # and skip line
        if found is False:  # copy commands while outside watermarks
            cont1.append(line)
    cont = "\n".join(cont1)  # new paint commands source
    doc.updateStream(xref, cont.encode())  # replace old one with 'bytes' version
doc.save("2-no-watermarks.pdf", garbage=4)
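The filtering step of the script can also be isolated into a pure function, which makes it easy to try on a few content lines without opening a PDF. This is only a sketch of the same logic, not a replacement for the script; the name `strip_watermark_lines` and the sample lines are made up for illustration.

```python
def strip_watermark_lines(lines):
    """Drop each block from an '/Artifact ... /Watermark' line through the next 'EMC'."""
    kept = []
    inside = False  # True while we are between a start marker and its 'EMC'
    for line in lines:
        if not inside and line.startswith("/Artifact") and "/Watermark" in line:
            inside = True  # start of a watermark block: skip this line
            continue
        if inside:
            if line == "EMC":
                inside = False  # end of the block: skip this line too
            continue
        kept.append(line)  # outside any watermark block: keep the command
    return kept


# Illustrative content lines (not taken from a real file).
sample = [
    "q",
    "/Artifact <</Subtype /Watermark /Type /Pagination>> BDC",
    "BT",
    "(www.abc.com) Tj",
    "ET",
    "EMC",
    "Q",
]
print(strip_watermark_lines(sample))  # prints ['q', 'Q']
```

Like the script, this stops at the first "EMC" after the start marker, so a watermark block that itself contains nested marked content would presumably need a more careful parser.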
Your code above is much more efficient!
Hi,
Thank you for your great work. This project helps me a lot. I want to remove a word mark that contains some particular words, let's say "www.abc.com".
My question is: is it possible to remove this "www.abc.com" (at the top of the page), or even just delete the link? In that case I can cover it with some white images.
Thanks again.