Is it possible to extract the order of elements in the pdf? #1581

Answered by JorjMcKie
ayusonkj asked this question in Q&A
Yes, this is possible, though not entirely straightforward.
For methods under my (PyMuPDF) complete control, I return a "seqno" (sequence number) item. This currently applies to:

  • Page.get_drawings()
  • Page.get_texttrace()

Then there is the method Page.get_bboxes(), which returns a list of all painting actions that a page performs to build its appearance. The order of the list items equals the order of the page's actions. Each item of that list is a tuple (type, rect_like), where "type" is the action type as a string like "fill-text" / "fill-image" / ..., and rect_like is the bbox of the action.
The "seqno" items mentioned above are indices into this list.
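
As a rough illustration of the above, here is a minimal sketch that merges the results of Page.get_drawings() and Page.get_texttrace() into one list ordered by "seqno". The helper names `merge_by_seqno` and `page_paint_order` are made up for this example and are not part of PyMuPDF. One assumption to flag: the answer calls the painting-action method Page.get_bboxes(), while in the PyMuPDF releases I know of it is documented as Page.get_bboxlog(); the sketch looks it up defensively and simply skips the action type if the method is missing.

```python
def merge_by_seqno(drawings, text_spans, bbox_log=None):
    """Merge drawing and text-trace dicts into one list sorted by paint order.

    Each result entry is (seqno, source, action, item). `action` is the type
    string of the painting action at index `seqno` in `bbox_log`, or None if
    no log was supplied or the index is out of range.
    """
    merged = []
    for source, items in (("drawing", drawings), ("text", text_spans)):
        for item in items:
            seqno = item["seqno"]
            action = None
            if bbox_log is not None and 0 <= seqno < len(bbox_log):
                action = bbox_log[seqno][0]  # first tuple element is the type
            merged.append((seqno, source, action, item))
    merged.sort(key=lambda entry: entry[0])
    return merged


def page_paint_order(path, page_number=0):
    """Hypothetical convenience wrapper; requires PyMuPDF to be installed."""
    import fitz  # PyMuPDF, imported here so merge_by_seqno works without it

    doc = fitz.open(path)
    try:
        page = doc[page_number]
        # Assumption: the painting-action list is exposed as get_bboxlog().
        log_fn = getattr(page, "get_bboxlog", None)
        bbox_log = log_fn() if callable(log_fn) else None
        return merge_by_seqno(page.get_drawings(), page.get_texttrace(), bbox_log)
    finally:
        doc.close()
```

Calling `page_paint_order("some.pdf")` (the file name is a placeholder) would then yield drawings and text spans interleaved in the order the page paints them. Images are not covered here, since only the two methods listed above are said to report a "seqno".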

Other methods are mor…

Answer selected by ayusonkj