Question about how to approach bounding box problem #99

klebs6-x · 2022-03-17T02:36:13Z

My PDFs have a lot of math, symbols, figures, etc.

Is there any way you know of to extract text from a page, but only from within one of several bounding boxes?

I basically want to set up a feedback loop where I:

1. iterate through the pages of the PDF
2. set ordered bounding boxes visually on each page
3. automatically extract and concatenate the text from these bounding boxes, in the order indicated in step 2

Is this doable? Is there a simple way to do this? What do you think?

stefan6419846 · 2023-02-16T07:48:57Z

As far as I am aware, this is not currently possible with this Python wrapper. However, if #111 becomes available, you should be able to implement this functionality yourself using the bounding boxes of each word.
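To illustrate the idea of filtering by word bounding boxes, here is a minimal sketch in plain Python. It does not use this wrapper's API: the `Box` and `Word` types and the `extract_regions` function are hypothetical names made up for the example. It assumes that each word comes with a bounding box, that the words are already in reading order, and that words and regions share one coordinate system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type, not part of any library)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class Word:
    """A word plus its bounding box (hypothetical type for this sketch)."""
    text: str
    box: Box


def extract_regions(words: Iterable[Word], regions: List[Box]) -> str:
    """Concatenate the text of the words inside each region, in region order.

    A word belongs to a region if the center of its box lies inside it.
    Within a region, words keep their input order (assumed reading order).
    """
    words = list(words)
    chunks = []
    for region in regions:
        inside = []
        for word in words:
            center_x = (word.box.x_min + word.box.x_max) / 2
            center_y = (word.box.y_min + word.box.y_max) / 2
            if region.contains(center_x, center_y):
                inside.append(word.text)
        chunks.append(" ".join(inside))
    return "\n".join(chunks)


# Made-up page content: a title, a body line, and a formula to be skipped.
page_words = [
    Word("Title", Box(10, 10, 50, 20)),
    Word("Body", Box(10, 50, 50, 60)),
    Word("text", Box(55, 50, 90, 60)),
    Word("x=1", Box(10, 100, 40, 110)),
]
# Regions in the order chosen by the user: body first, then title.
ordered_regions = [Box(0, 40, 100, 70), Box(0, 0, 100, 30)]
result = extract_regions(page_words, ordered_regions)
```

Testing the center of each word rather than requiring full containment keeps words that slightly overlap a region's edge. Whether that is the right rule depends on how tightly the boxes are drawn.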
