Redact/Replace #344

Singrig · 2019-08-08T14:34:21Z

Hi,
PyMuPDF is really a great package for working with PDFs and other formats.

I have a quick question: I need to redact the sensitivity information from a PDF. Is there any function for redaction, or can we replace words in a PDF the same way we highlight them?

JorjMcKie · 2019-08-08T14:37:25Z

The new MuPDF version 1.16.0 also supports Redact annotations.
I haven't decided yet whether or when to include support for this in PyMuPDF as well.
Currently I am still in v1.16.0 development.
Unfortunately, that is quite a big update from the current v1.14.x.

JorjMcKie · 2019-08-08T14:40:00Z

What do you mean by "sensitivity"? Encryption?
PyMuPDF v1.16.0 will definitely fully support password-based encryption / decryption and permission levels.

Singrig · 2019-08-08T14:58:11Z

Sorry, I'm confused: does MuPDF support redacting specific words in a PDF? Can you please share a link?

For example, if a PDF contains a customer name, I need to either redact that name or replace it with some junk letters.

JorjMcKie · 2019-08-08T15:11:28Z

No, we are talking about different things. Redact annotations are part of the most recent PDF specification.
What you want is not supported at all by MuPDF - sorry.
I wrote an anonymizer going in that direction some time ago, and even that covered Base-14 fonts only. This is a significant effort going far beyond what this repository has to offer.

Singrig · 2019-08-08T15:14:19Z

Sorry for bothering you. If I have a list of words, can't we delete/replace those specific words the same way we highlight them using PyMuPDF?

JorjMcKie · 2019-08-08T15:21:28Z

No, this already goes fairly deep into how text is coded in a PDF. In the absolutely most simple cases you might even treat the PDF as a text file and use some editor.

Of course you can always cover sensitive things with - say - a black rectangle. But that is cosmetics only: the information is still there.
A text like "Mr. Anonymous" might be coded in a plethora of ways in unfortunate circumstances: hexadecimal, each single letter being separated from each other and not in natural reading sequence, the text may be split across several lines, ... and what not.

Singrig · 2019-08-08T15:27:21Z

Can you please share some code or links that would help with what you described, such as text that is not in natural reading sequence or is split across lines?

JorjMcKie · 2019-08-08T15:55:13Z

Read the chapter on text, specifically about textbox text extraction, to see how a natural reading order can be re-established.
And of course you should look at the original PDF spec manual to see examples of how complex the coding of a seemingly simple text may be. For example, the PDF manual for this repo on page 9 looks like:

and the highlighted text is coded like this in the PDF file:

BT\n/F38 9.9626 Tf 72 462.176 Td [(PyMuPDF)]TJ/F52 9.9626 Tf 45.81 0 Td [(is)-270(a)-270(Python)-270(binding)-270(for)]TJ
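
As a rough illustration of why a plain text search on such a stream fails, the sketch below pulls the readable words back out of the `TJ` arrays quoted above using only the standard library. The function name `decode_tj_arrays` and the `gap_threshold` value are assumptions made for this example. It handles only simple literal strings, not hexadecimal strings, custom font encodings, or text in non-reading order.

```python
import re

# The fragment quoted above; the literal "\n" is written as a real newline.
stream = (
    "BT\n/F38 9.9626 Tf 72 462.176 Td [(PyMuPDF)]TJ"
    "/F52 9.9626 Tf 45.81 0 Td "
    "[(is)-270(a)-270(Python)-270(binding)-270(for)]TJ"
)

def decode_tj_arrays(content, gap_threshold=200):
    """Return one string per TJ array found in a content-stream fragment.

    Numbers inside a TJ array are position adjustments in thousandths of a
    text-space unit; a sufficiently negative value widens the gap and is
    treated here as a word space (the threshold is a guess).
    """
    token = re.compile(r"\(((?:\\.|[^\\)])*)\)|(-?\d+(?:\.\d+)?)")
    results = []
    for array in re.findall(r"\[(.*?)\]\s*TJ", content, flags=re.S):
        parts = []
        for literal, number in token.findall(array):
            if number:
                if -float(number) >= gap_threshold:
                    parts.append(" ")
            else:
                parts.append(literal)
        results.append("".join(parts))
    return results

print(decode_tj_arrays(stream))
# -> ['PyMuPDF', 'is a Python binding for']
```

Note that the sentence is spread over two `TJ` operators using different fonts, and the spaces between words do not exist as characters at all. They are only horizontal adjustments of -270.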

I cannot share the code I developed in the above mentioned case, because that was paid work and I thus do not own the copyright.

Singrig · 2019-08-08T16:13:56Z

Thanks, this will help me. Let me check what I can do. The difficulty is that I'm very new to this field.

Singrig · 2019-08-09T05:22:43Z

Hi,
I'm trying to add text to a PDF and I'm getting the error "name 'Py_RETURN_NONE' is not defined".

My code:

import fitz
doc = fitz.open('PyMuPDF.pdf') # new or existing PDF
page = doc[0] # new or existing page via doc[n]
p = fitz.Point(50, 72) # start point of 1st line
text = "Some text,\nspread across\nseveral lines."

JorjMcKie · 2019-08-11T08:14:11Z

You are not using the current version - please switch to 1.14.20.

JorjMcKie · 2020-02-23T12:01:20Z

@Singrig - the new v1.16.11 supports redaction annotations.

Singrig · 2020-02-23T12:13:29Z

Thanks McKie, I hope I can find more details in the PyMuPDF docs.


JorjMcKie · 2020-02-23T12:24:35Z

@Singrig - sure you will. The new documentation is already uploaded. I am about to also populate PyPI with the installation material.

Singrig · 2020-02-23T12:31:37Z

Cool, that will help a lot.


Singrig added the question label Aug 8, 2019

Singrig assigned JorjMcKie Aug 8, 2019

JorjMcKie closed this as completed Aug 11, 2019

juviwhale mentioned this issue Jan 24, 2020

PDF redaction #434

Closed
