Replace text #1674
-
Hi, I'm back again. I'm trying to replace text in a PDF. For almost all the PDFs I tried it worked well, but this one has its characters too close together, which seems to generate a white box that is too big for the text and deletes the text below:
Another problem is that if the name starts on one line and ends on the next, it gets replaced three times:
PS: The first PDF file is attached if you need to test something. Thanks in advance.
Replies: 9 comments 19 replies
-
Unexpectedly large replacement rectangles are due to overlapping lines in the PDF. Unfortunately, some PDF creators place lines at a smaller vertical distance than the lineheight (a font property) would require. A first way out is setting a global PyMuPDF parameter. If lines still overlap vertically, the PDF creator made a really ugly-looking document! You can still work around this situation by creating rectangles of very small height for the redactions only (e.g. 20% of the original height, centered on the middle horizontal line of the original rectangle), while still using the original rectangle for inserting the new text.
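The thin-rectangle workaround described above could be sketched roughly as follows. This is a minimal sketch, not the reply's original code: the helper names `thin_redaction_rect` and `replace_text` are hypothetical, and the global parameter is not named in the surviving text, so `fitz.TOOLS.set_small_glyph_heights(True)` is only an assumption about which setting was meant.

```python
def thin_redaction_rect(rect, fraction=0.2):
    """Return (x0, y0, x1, y1) covering `fraction` of the height of `rect`,
    centered on its horizontal middle line."""
    x0, y0, x1, y1 = rect
    mid = (y0 + y1) / 2               # middle horizontal line
    half = (y1 - y0) * fraction / 2   # half the height of the thin strip
    return (x0, mid - half, x1, mid + half)


def replace_text(page, needle, replacement, fontsize=11):
    """Redact thin strips over each hit, then write the new text into the
    original (full-height) rectangle. `page` is a PyMuPDF page object."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    # Assumption: the global parameter meant in the reply is not preserved.
    fitz.TOOLS.set_small_glyph_heights(True)

    hits = page.search_for(needle)    # one rectangle per hit
    for rect in hits:
        # Thin strip only, so neighbouring lines are not wiped out.
        page.add_redact_annot(fitz.Rect(thin_redaction_rect(rect)))
    page.apply_redactions()

    for rect in hits:
        # insert_textbox returns a negative value if the text did not fit;
        # a smaller fontsize may then be needed.
        page.insert_textbox(rect, replacement, fontsize=fontsize)
```

The strip only has to touch each glyph for the redaction to remove the text, which is why a small fraction of the height is enough.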
-
This is something you must react to. Search will return several rectangles if
-
Ok, so maybe your names will never be hyphenated. Just mentioned it in case.
-
A hit rect is the "rect" from my previous example? Because the names can occur more than once haha
-
Just to confuse you a bit more:
-
Hi, I'm back. Sorry for answering late, I was on a short vacation. I'm still working on the replacement across two lines, but I found a problem with a PDF. In that one I can replace everything except one string: if I try to replace a name, in this case "Antonio González", it says NameError: name 'origin' is not defined. But it is defined, and the string is in the PDF. Do you know why this might happen?
-
If you use get_text("words", sort=True) you don't need search!

    import fitz  # PyMuPDF

    nombre = "Emilio Jose Martí Gómez"
    nombre_t = nombre.split(" ")
    lnombre = len(nombre_t)  # number of name components
    # page: a fitz.Page obtained elsewhere, e.g. from fitz.open(...)
    words = page.get_text("words", sort=True)
    i = 0
    while i < len(words):
        word = words[i]
        # found 1st part of name (the following parts are assumed to match)
        if word[4] == nombre_t[0]:
            rects = [fitz.Rect(w[:4]) for w in words[i : i + lnombre]]
            # rects = list of rectangles containing the full name
            # process them adequately, then ...
            i += lnombre
        else:
            i += 1
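The loop above only compares the first name component. A stricter variant, as a minimal sketch, could check every component before accepting a hit. The function name `find_name_runs` is hypothetical; it works on plain word tuples of the shape returned by get_text("words"), that is (x0, y0, x1, y1, text, ...), so it can be tried without a PDF.

```python
def find_name_runs(words, name):
    """Return one list of (x0, y0, x1, y1) boxes per occurrence of `name`.

    `words` is a reading-order list of tuples whose first four items are the
    word box and whose fifth item is the word text.
    """
    parts = name.split()
    n = len(parts)
    runs = []
    i = 0
    while i + n <= len(words):
        window = words[i : i + n]
        if [w[4] for w in window] == parts:   # every component must match
            runs.append([tuple(w[:4]) for w in window])
            i += n                            # skip past the matched name
        else:
            i += 1
    return runs
```

With a real page this might be called as find_name_runs(page.get_text("words", sort=True), nombre). Because it works word by word, a name that continues on the next line still yields exactly one run. Words with attached punctuation (for example a trailing comma) would not match and would need stripping first.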
-
As I said a few posts earlier:
If you use get_text("words", sort=True) you don't need search!