# Example: Detect Hidden Text
With v1.19.0, PyMuPDF added ways to detect whether objects cover or hide other objects. In a legal context, for example, it is especially important to know whether there is text that only **_appears_** to be deleted because a black rectangle sits on top of it, which leaves personal data insufficiently protected. Do have a look at [this]() project.

The following script looks for text on a PDF page that is only hidden by some drawing - as opposed to being truly deleted. The page contains three occurrences of the word "estate" covered by black rectangles.

Other drawings and text also occupy the same space, but in these cases the text is **_above_** the drawings and is therefore accepted as harmless.
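
The decision rule behind this distinction can be sketched without PyMuPDF. The snippet below is a minimal illustration, assuming rectangles are plain `(x0, y0, x1, y1)` tuples and that a higher sequence number means "painted later"; the helper names `area`, `intersect` and `is_hidden` are hypothetical and not part of the PyMuPDF API.

```python
def area(rect):
    """Area of an (x0, y0, x1, y1) rectangle; 0 if it is empty or invalid."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersect(a, b):
    """Intersection of two rectangles (may be invalid if they do not overlap)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def is_hidden(text_seqno, text_bbox, draw_seqno, draw_rect, threshold=0.8):
    """True if the drawing is painted after the text and covers enough of it."""
    if draw_seqno < text_seqno:
        return False  # drawing lies below the text: harmless
    text_area = area(text_bbox)
    if text_area == 0:
        return False  # nothing visible to cover
    return area(intersect(text_bbox, draw_rect)) >= threshold * text_area


# Text painted at step 5, rectangle painted at step 9 covering 90% of it.
print(is_hidden(5, (0, 0, 10, 10), 9, (0, 0, 10, 9)))
# Same geometry, but the rectangle was painted before the text.
print(is_hidden(5, (0, 0, 10, 10), 3, (0, 0, 10, 9)))
```

The script that follows applies the same two checks, painting order and coverage ratio, using PyMuPDF's own `Rect` objects and the `seqno` values it reports.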

In [3]:
"""
Demo program: find characters covered by drawings.

We aim to identify text characters that are covered by a "fill" drawing by
at least 80%.
This entails the following steps:
* make a list of "significant" drawings:
  - type "fill" or "fill-stroke"
  - large enough
  - not transparent
* Walk through the characters of text spans (as returned by page
  method "get_texttrace()") and check whether each intersects a "later"
  drawing. For each span character, report whether it is significantly covered.

Dependencies:
PyMuPDF v1.19.0, which introduces sequence numbers for drawings and text spans
as returned by methods 'page.get_drawings()' and 'page.get_texttrace()'.
"""
import fitz

if tuple(map(int, fitz.VersionBind.split("."))) < (1, 19, 0):
    raise ValueError("requires PyMuPDF v1.19.0 or later")

doc = fitz.open("blacked.pdf")
print("Processing file '%s' with %i pages." % (doc.name, doc.page_count))
for page in doc:
    print()
    # make list of relevant drawing rectangles:
    # type "fill", large enough, not transparent
    seq_paths = [
        (p["seqno"], p["rect"].irect)  # just use the IRect: precise enough
        for p in page.get_drawings()
        if p["rect"].width > 3  # exclude lines
        and p["rect"].height > 3  # exclude lines
        and p["type"][0] == "f"  # only fill or fill-stroke
        and p["fill_opacity"] == 1  # not transparent
    ]
    if seq_paths == []:
        print("No solid drawings on page %i." % page.number)
        continue
    print(
        "Page %i has %i solid drawings. Sequence numbers:"
        % (page.number, len(seq_paths))
    )
    print([s[0] for s in seq_paths])
    print()
    textspans = page.get_texttrace()  # get the text spans

    for span in textspans:
        span_seqno = span["seqno"]  # the text painting sequence number
        span_rect = fitz.Rect(span["bbox"])
        paths = [  # restrict to overlapping drawings occurring "later"
            p
            for p in seq_paths
            if p[0] >= span_seqno and not (p[1] & span_rect).is_empty
        ]
        if paths == []:  # this text span is clean
            continue
        for seqno, draw_rect in paths:  # this iterates over only 1 item normally
            problems = []
            for ch in span["chars"]:  # walk through characters in the span
                char = chr(ch[0])  # the character
                bbox = fitz.Rect(ch[3])  # its bbox
                if abs(bbox & draw_rect) >= abs(bbox) * 0.8:
                    problems.append(char)
            if problems != []:
                print("Drawing %i covers %s." % (seqno, ", ".join(problems)))


Processing file 'blacked.pdf' with 1 pages.

Page 0 has 15 solid drawings. Sequence numbers:
[26, 27, 30, 33, 34, 37, 40, 41, 44, 47, 48, 51, 169, 171, 173]

Drawing 169 covers e, s, t, a, t, e.
Drawing 171 covers E, s, t, a, t, e.
Drawing 173 covers E, s, t, a, t, e.
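
For automated checks it can be more convenient to return the findings than to print them. The following is a minimal sketch of the same scan as two functions that work on plain dicts and tuples shaped like the items returned by `page.get_drawings()` and `page.get_texttrace()`. The function names `significant_drawings` and `covered_text`, the `min_size` parameter and the sample data are assumptions made for illustration only.

```python
def _area(r):
    """Area of an (x0, y0, x1, y1) rectangle; 0 if empty or invalid."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def _overlap(a, b):
    """Area of the intersection of two rectangles."""
    return _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def significant_drawings(paths, min_size=3):
    """Keep opaque fill or fill-stroke drawings that are larger than a line."""
    result = []
    for p in paths:
        x0, y0, x1, y1 = p["rect"]
        if (
            x1 - x0 > min_size
            and y1 - y0 > min_size
            and p["type"].startswith("f")  # "f" or "fs"
            and p.get("fill_opacity") == 1  # not transparent
        ):
            result.append((p["seqno"], p["rect"]))
    return result


def covered_text(spans, drawings, threshold=0.8):
    """Map drawing seqno -> characters it covers by at least `threshold`."""
    report = {}
    for span in spans:
        for seqno, rect in drawings:
            if seqno < span["seqno"]:
                continue  # drawing was painted before the text
            hidden = [
                chr(ch[0])  # ch = (unicode, glyph id, origin, bbox)
                for ch in span["chars"]
                if _area(ch[3]) > 0
                and _overlap(ch[3], rect) >= threshold * _area(ch[3])
            ]
            if hidden:
                report[seqno] = report.get(seqno, "") + "".join(hidden)
    return report


# Made-up sample data, not taken from "blacked.pdf".
paths = [
    {"seqno": 2, "rect": (0, 0, 100, 20), "type": "f", "fill_opacity": 1.0},
    {"seqno": 9, "rect": (10, 0, 40, 20), "type": "f", "fill_opacity": 1.0},
    {"seqno": 10, "rect": (0, 10, 100, 11), "type": "s", "fill_opacity": None},
]
spans = [
    {
        "seqno": 5,
        "bbox": (0, 2, 30, 18),
        "chars": [
            (ord("a"), 1, (0, 15), (0, 2, 10, 18)),
            (ord("b"), 2, (10, 15), (10, 2, 20, 18)),
            (ord("c"), 3, (20, 15), (20, 2, 30, 18)),
        ],
    }
]
drawings = significant_drawings(paths)
report = covered_text(spans, drawings)
print(report)
```

In this sample, drawing 2 is painted before the text and drawing 10 is a thin stroke, so only drawing 9 can hide characters. With real data, the dicts from PyMuPDF could be passed in after converting their `Rect` values to tuples.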
