This repository has been archived by the owner on Jan 17, 2022. It is now read-only.

[Enhancement] Page-XML extractor #30

Closed
EvertonTomalok opened this issue Sep 23, 2019 · 2 comments

Comments

@EvertonTomalok

Adapt the script to export information/coordinates about the page in the annotation formats best known in the industry, such as YOLO and PASCAL VOC.

Note: I could help with this task.

@mrocr

mrocr commented Sep 23, 2019

@EvertonTomalok
Can you clarify? What do you mean by "extract information/coordinates about page"?

  • Are you talking about improving the model backbone?
  • Are you talking about improving the baseline to textline extractor?

@lquirosd
Owner


Hi,
The YOLO and PASCAL VOC formats are narrowly focused on image segmentation data, while PAGE-XML is focused on document image representation. That covers not just the segmentation of the document, but also the relationships between objects (e.g. reading order), the data in the document (transcription, probabilistic index, modernization, notes, ...), and of course the metadata of the document itself.
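To make the difference concrete, here is a small sketch that reads two things a bounding-box format has no place for: the reading order (`ReadingOrder` > `OrderedGroup` > `RegionRefIndexed` with `index`/`regionRef`) and the region-level transcription (`TextEquiv` > `Unicode`). `ordered_transcriptions` and `SAMPLE` are hypothetical names. For simplicity the sketch assumes a flat top-level `OrderedGroup` and ignores nested or unordered groups.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip the '{namespace}' prefix ElementTree adds, so any PAGE schema version works."""
    return tag.rsplit("}", 1)[-1]

def _child(el, name):
    """First direct child of `el` with local tag `name`, or None."""
    return next((c for c in el if _local(c.tag) == name), None)

def ordered_transcriptions(xml_text):
    """Return (region id, region-level text) pairs in the page's reading order.

    Only the top-level OrderedGroup of the ReadingOrder is used; nested
    groups and UnorderedGroup are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    texts, order = {}, []
    for el in root.iter():
        name = _local(el.tag)
        if name == "TextRegion":
            equiv = _child(el, "TextEquiv")
            uni = _child(equiv, "Unicode") if equiv is not None else None
            texts[el.get("id")] = (uni.text or "") if uni is not None else ""
        elif name == "ReadingOrder" and not order:
            group = _child(el, "OrderedGroup")
            if group is not None:
                refs = [c for c in group if _local(c.tag) == "RegionRefIndexed"]
                refs.sort(key=lambda c: int(c.get("index")))
                order = [c.get("regionRef") for c in refs]
    return [(rid, texts.get(rid, "")) for rid in order]

SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="p1.jpg" imageWidth="1000" imageHeight="2000">
    <ReadingOrder>
      <OrderedGroup id="g0">
        <RegionRefIndexed index="1" regionRef="r1"/>
        <RegionRefIndexed index="0" regionRef="r2"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1"><Coords points="100,900 500,900 500,1300 100,1300"/>
      <TextEquiv><Unicode>Body text</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r2"><Coords points="100,200 500,200 500,600 100,600"/>
      <TextEquiv><Unicode>Heading</Unicode></TextEquiv></TextRegion>
  </Page>
</PcGts>"""

print(ordered_transcriptions(SAMPLE))  # [('r2', 'Heading'), ('r1', 'Body text')]
```

The document order of the regions (`r1` before `r2`) differs from the reading order stored in the file. That ordering, like the text itself, has no equivalent in a YOLO or PASCAL VOC annotation.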

If you think a YOLO/PASCAL VOC format converter would be useful to the community, please feel free to contribute by sending a pull request.
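As a starting point for such a contribution, here is a rough sketch of the PASCAL VOC side. It is not an existing tool in this repository. `page_to_voc`, `labels` and `SAMPLE` are hypothetical names. It assumes regions carry a `points` attribute on `Coords`, and it uses the PAGE element name as the VOC class name. Because PAGE-XML does not record the image channel count, `depth` is a caller-supplied assumption defaulting to 3. Optional VOC fields (`folder`, `pose`, `truncated`, `difficult`) are omitted.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip the '{namespace}' prefix ElementTree adds, so any PAGE schema version works."""
    return tag.rsplit("}", 1)[-1]

def _sub(parent, tag, text):
    """Append <tag>text</tag> to parent and return the new element."""
    el = ET.SubElement(parent, tag)
    el.text = str(text)
    return el

def page_to_voc(xml_text, labels=("TextRegion",), depth=3):
    """Build a PASCAL VOC <annotation> element from one PAGE-XML page."""
    page = next(el for el in ET.fromstring(xml_text).iter() if _local(el.tag) == "Page")
    ann = ET.Element("annotation")
    _sub(ann, "filename", page.get("imageFilename"))
    size = ET.SubElement(ann, "size")
    _sub(size, "width", page.get("imageWidth"))
    _sub(size, "height", page.get("imageHeight"))
    _sub(size, "depth", depth)  # assumption: PAGE-XML does not store the channel count
    for region in page.iter():
        name = _local(region.tag)
        if name not in labels:
            continue
        coords = next((c for c in region if _local(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue
        # Reduce the region polygon to its axis-aligned bounding box
        xy = [tuple(int(float(v)) for v in p.split(",")) for p in coords.get("points").split()]
        xs, ys = zip(*xy)
        obj = ET.SubElement(ann, "object")
        _sub(obj, "name", name)
        box = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", min(xs)), ("ymin", min(ys)), ("xmax", max(xs)), ("ymax", max(ys))):
            _sub(box, tag, val)
    return ann

SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="p1.jpg" imageWidth="1000" imageHeight="2000">
    <TextRegion id="r1"><Coords points="100,200 500,200 500,600 100,600"/></TextRegion>
  </Page>
</PcGts>"""

print(ET.tostring(page_to_voc(SAMPLE), encoding="unicode"))
```

Writing one `.xml` file per page image, as VOC datasets usually do, is left to the caller.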
