Adds XML export method to DocumentBuilder #544
Thanks for the PR! Looks good to me overall. I only think we should move everything into a Page.export_as_xml method, which would then be called by Document.export_as_xml.
Let me know what you think 👌
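The delegation suggested in this review comment could look roughly like the sketch below. It is not the code from this PR: the Page and Document classes are simplified stand-ins, and the element names ("page", "word") and the return type (a bytes string plus an ElementTree per page) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for a page that holds recognized words."""

    page_idx: int
    words: List[str] = field(default_factory=list)

    def export_as_xml(self) -> Tuple[bytes, ET.ElementTree]:
        # All XML construction lives at the page level.
        # Element and attribute names here are illustrative only.
        root = ET.Element("page", attrib={"index": str(self.page_idx)})
        for word in self.words:
            ET.SubElement(root, "word").text = word
        return ET.tostring(root, encoding="utf-8"), ET.ElementTree(root)


@dataclass
class Document:
    """Hypothetical stand-in for a multi-page document."""

    pages: List[Page] = field(default_factory=list)

    def export_as_xml(self) -> List[Tuple[bytes, ET.ElementTree]]:
        # The document only delegates: one XML export per page.
        return [page.export_as_xml() for page in self.pages]
```

Keeping the XML logic in the page class means a single page can be exported on its own, and the document-level method stays a one-line loop.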
Thanks! I added a few suggestions. Would you mind adding more comments to your code, please, so that people understand what it is expected to do? :)
@fg-mindee Have a nice day 😃
Codecov Report
@@            Coverage Diff             @@
##             main     #544      +/-   ##
==========================================
+ Coverage   96.04%   96.08%   +0.03%
==========================================
  Files         109      109
  Lines        4198     4236      +38
==========================================
+ Hits         4032     4070      +38
  Misses        166      166
Thanks again! I added some comments; I think we're almost ready to merge.
Last changes to do and we can merge :)
A few typos left 😅
Thanks for all the edits 🙏
FYI @felixdittrich92 If that's not on purpose, I would suggest sticking to your main account ;)
@fg-mindee
@charlesmindee
feat: adds the option to export the results in XML (hOCR) format (like Tesseract)
This PR also makes it possible to convert documents into PDFs with a text layer.
As with render(), the results depend on the text being divided correctly into blocks and lines and on the boxes being sorted correctly.
Resolves: #512
Note: #512 can be closed after adding an example/tutorial on how to use this output to generate PDF files with a text layer.
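For readers unfamiliar with hOCR, here is a minimal sketch of what such an export involves. It is not the implementation from this PR. The function name words_to_hocr is hypothetical, and the input layout (text, confidence, and a relative box ((xmin, ymin), (xmax, ymax)) per word) is an assumption. The class names ocr_page and ocrx_word and the bbox and x_wconf properties in the title attribute follow the hOCR convention.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed input layout: relative coordinates in [0, 1].
RelBox = Tuple[Tuple[float, float], Tuple[float, float]]
Word = Tuple[str, float, RelBox]  # (text, confidence in [0, 1], box)


def words_to_hocr(words: List[Word], width: int, height: int) -> bytes:
    """Serialize words into a single hOCR page (hypothetical helper)."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    page = ET.SubElement(
        body,
        "div",
        attrib={
            "class": "ocr_page",
            "id": "page_1",
            "title": f"bbox 0 0 {width} {height}; ppageno 0",
        },
    )
    for idx, (text, conf, ((xmin, ymin), (xmax, ymax))) in enumerate(words, start=1):
        # hOCR boxes are absolute pixel coordinates: x0 y0 x1 y1.
        bbox = (
            round(xmin * width),
            round(ymin * height),
            round(xmax * width),
            round(ymax * height),
        )
        span = ET.SubElement(
            page,
            "span",
            attrib={
                "class": "ocrx_word",
                "id": f"word_{idx}",
                # x_wconf is the word confidence as a percentage.
                "title": "bbox {} {} {} {}; x_wconf {}".format(*bbox, round(conf * 100)),
            },
        )
        span.text = text
    return ET.tostring(html, encoding="utf-8")


hocr = words_to_hocr([("Hello", 0.98, ((0.25, 0.5), (0.5, 0.75)))], width=800, height=600)
```

Because each word carries its pixel box, a downstream tool can place invisible text over the page image, which is what makes a PDF with a text layer possible. Reading order in the output simply follows the order of the input list, so correct sorting of the boxes matters here too.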