page dimensions #3
Comments
In the library, you should already be able to get page dimensions. In the command-line tool, there's currently no way to output them. However, adding a subcommand to do that wouldn't be terribly difficult. I hope to get this into v0.3.0.
Yeah, I think it might be worth adding some more structure to the JSON output, so that it looks something like:
Thoughts on that? In CSV output, maybe there's just a separate command.
Done! Richer JSON representation now in v0.3.0. #4
That's awesome. Thanks again!
Is there a way to return overall page dimensions? Or alternatively, relative positions (i.e., in fractional terms, like 0.56 of the page)?
The use case is comparing or finding words at comparable positions in documents that have different sizes due to different prior processing. (This is also required for accurately displaying word positions as overlays on a PDF.) One could get at this by using relative positions (and I guess doctop would be prior_pages + current relative position). If you captured only relative position, you'd probably also want to add an orientation variable, though I guess that could be determined from the letterbox proportions.
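To make the relative-position idea concrete, here is a minimal sketch in plain Python. It assumes each word is a dict with `x0`, `x1`, `top`, and `bottom` keys in page units; those key names and all three function names are hypothetical, chosen only for illustration, and this is not the library's API.

```python
def relative_position(word, page_width, page_height):
    """Return the word's bounding box as fractions of the page size."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    return {
        "x0": word["x0"] / page_width,
        "x1": word["x1"] / page_width,
        "top": word["top"] / page_height,
        "bottom": word["bottom"] / page_height,
    }


def relative_doctop(prior_pages, word, page_height):
    """Number of prior pages plus the fractional position on the current page."""
    return prior_pages + word["top"] / page_height


def orientation(page_width, page_height):
    """Infer orientation from page proportions (assumption: square counts as portrait)."""
    return "landscape" if page_width > page_height else "portrait"


# Example with assumed values: a word on the third page of a 612 x 792 document.
word = {"x0": 153, "x1": 306, "top": 396, "bottom": 440}
rel = relative_position(word, 612, 792)
doc_pos = relative_doctop(2, word, 792)
```

With fractions like these, the same word should land at roughly the same coordinates in two copies of a document that were scaled differently, which is the comparison described above.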
Having page_width and page_height in every line of the CSV seems awkward, but it would work. In JSON output one could just add them as variables outside the rest of the data. Maybe that's the cleanest approach. Do you have any thoughts?
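For comparison, the two options could look roughly like this. It is only a sketch using Python's standard `json` and `csv` modules; the `rows` key, the row fields, and the function names `to_json` and `to_csv` are hypothetical and do not reflect the tool's actual output format.

```python
import csv
import io
import json


def to_json(page_width, page_height, rows):
    """JSON: page dimensions stored once, outside the per-row data."""
    return json.dumps(
        {"page_width": page_width, "page_height": page_height, "rows": rows}
    )


def to_csv(page_width, page_height, rows):
    """CSV: page dimensions repeated on every line."""
    fieldnames = (list(rows[0]) if rows else []) + ["page_width", "page_height"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {**row, "page_width": page_width, "page_height": page_height}
        )
    return buf.getvalue()


# Example with assumed values.
rows = [{"text": "hello", "x0": 10, "top": 20}]
as_json = to_json(612, 792, rows)
as_csv = to_csv(612, 792, rows)
```

The JSON form states the dimensions once per page, while the CSV form pays for them on every row but stays a flat table.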