[Enhancement] Page-XML extractor #30

EvertonTomalok · 2019-09-23T16:12:02Z

Adapt the script to extract page information/coordinates from the annotation formats best known in the industry, such as YOLO and PASCAL/VOC.

Note: I could help with this task.

mrocr · 2019-09-23T19:38:25Z

@EvertonTomalok
Can you clarify? What do you mean by "extract information/coordinates about page"?

Are you talking about improving the model backbone?
Are you talking about improving the baseline-to-textline extractor?

lquirosd · 2019-09-24T07:48:24Z

> Adapt the script to extract page information/coordinates from the annotation formats best known in the industry, such as YOLO and PASCAL/VOC.
>
> Note: I could help with this task.

Hi,
YOLO and PASCAL/VOC formats focus narrowly on image segmentation data, while PAGE-XML focuses on document image representation. That covers not just the segmentation of the document but also the relationships between objects (e.g. reading order), the data in the document (transcription, probabilistic index, modernization, notes, ...) and, of course, the metadata of the document itself.

If you think a YOLO/PASCAL format converter would be useful for the community, please feel free to contribute (send a pull request).
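
For anyone picking this up, a minimal sketch of what such a converter could look like, going from PAGE-XML regions to YOLO-style boxes. This is not code from this repository: the function names `local_name` and `page_regions_to_yolo` are hypothetical, it assumes the `Page` element carries `imageWidth`/`imageHeight` and that each `TextRegion` has a `Coords` child with a `points` attribute, and it assigns every region the same class id. As noted above, reading order, transcriptions and metadata are lost in the conversion.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so any PAGE-XML schema version matches."""
    return tag.rsplit("}", 1)[-1]


def page_regions_to_yolo(xml_text, class_id=0):
    """Return YOLO-style rows (class, x_center, y_center, width, height).

    All box values are normalized by the page size. Each region polygon is
    reduced to its axis-aligned bounding box, so the polygon shape is lost.
    """
    root = ET.fromstring(xml_text)
    page = next(el for el in root.iter() if local_name(el.tag) == "Page")
    width = float(page.get("imageWidth"))
    height = float(page.get("imageHeight"))

    rows = []
    for region in page.iter():
        if local_name(region.tag) != "TextRegion":
            continue
        # Only look at the region's own Coords, not those of nested text lines.
        coords = next((c for c in region if local_name(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue
        # 'points' is a space-separated list of "x,y" pairs.
        pts = [tuple(map(float, p.split(","))) for p in coords.get("points").split()]
        xs, ys = zip(*pts)
        x_min, x_max = min(xs), max(xs)
        y_min, y_max = min(ys), max(ys)
        rows.append((
            class_id,
            (x_min + x_max) / 2 / width,
            (y_min + y_max) / 2 / height,
            (x_max - x_min) / width,
            (y_max - y_min) / height,
        ))
    return rows
```

Each returned row can be written out as one space-separated line of a YOLO label file; a per-region class mapping (for example from the region `type` attribute) would be a natural next step.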

lquirosd closed this as completed Mar 31, 2021
