PDF annotation: tables, figures, formulas should be excluded #22

lfoppiano · 2017-03-16T10:08:43Z

I've noticed that, with the current amount of training data, tables and figures are annotated incorrectly. See, for example, document hal-00643787.

Here is an example with a figure:

Here is an example with a table:

And here is one with a formula:

kermitt2 · 2017-03-16T15:50:06Z

Yes, I was a bit in a hurry, so I didn't process the PDF via GROBID in a fine-grained way. Right now the quantity parser is applied to the title, the abstract, the whole full text, and the annexes (the rest is ignored).

So, to be done: in the body, we should exclude formulas, all the reference markers present in the body (bibliographical references such as [Foppiano et al. 2017], figure references, formula references, etc.) and, for the moment, figures and tables.
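For illustration, the exclusion described above can be sketched as a filter over labelled body segments. This is a minimal Java sketch, assuming the body has already been segmented and labelled upstream (in GROBID that is the job of the full-text model). The `Label` enum, the `Segment` record, and `keepForQuantities` are hypothetical names, not GROBID's `LayoutToken` or `TaggingLabels` API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class BodyFilter {

    // Hypothetical labels for pieces of the body; they only mimic the idea of
    // GROBID's full-text labels and are NOT its TaggingLabels constants.
    public enum Label {
        PARAGRAPH, FORMULA, FIGURE, TABLE,
        CITATION_MARKER, FIGURE_MARKER, FORMULA_MARKER
    }

    // A labelled piece of body text (hypothetical stand-in for labelled LayoutTokens).
    public record Segment(Label label, String text) {}

    // Labels whose text should never reach the quantity parser: formulas,
    // figures, tables, and every kind of reference marker.
    static final Set<Label> EXCLUDED = Set.of(
            Label.FORMULA, Label.FIGURE, Label.TABLE,
            Label.CITATION_MARKER, Label.FIGURE_MARKER, Label.FORMULA_MARKER);

    // Keeps, in document order, only the segments whose label is not excluded.
    public static List<Segment> keepForQuantities(List<Segment> body) {
        return body.stream()
                .filter(s -> !EXCLUDED.contains(s.label()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Segment> body = List.of(
                new Segment(Label.PARAGRAPH, "The sample was heated to 300 K"),
                new Segment(Label.CITATION_MARKER, "[Foppiano et al. 2017]"),
                new Segment(Label.PARAGRAPH, "for 2 hours, following"),
                new Segment(Label.FORMULA, "E = m c^2"),
                new Segment(Label.FIGURE_MARKER, "Fig. 3"));
        // Prints only the two paragraph segments.
        keepForQuantities(body).forEach(s -> System.out.println(s.text()));
    }
}
```

Filtering by label rather than by pattern matching means a marker such as `[Foppiano et al. 2017]` is dropped because of how it was labelled, not because of what its numbers look like.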

lfoppiano · 2017-03-17T15:57:57Z

OK, so I excluded <formula>, <table>, <equation>, and <figure>.

lfoppiano · 2017-03-17T15:58:26Z

Here is how a figure looks now:

kermitt2 · 2017-03-17T16:06:02Z

Nice!
Maybe we could also let the model annotate the captions and the figure/table titles?

That would require applying the figure and table models to the LayoutTokens labelled with TaggingLabels.FIGURE and TaggingLabels.TABLE.
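Processing only the captions could look roughly like the following minimal Java sketch. It assumes a figure/table model has already split each block labelled as a figure or a table into a title, a caption, and the remaining content. `FigureBlock` and `captionsForQuantities` are hypothetical names, not GROBID classes.

```java
import java.util.ArrayList;
import java.util.List;

public class CaptionExtractor {

    // Hypothetical result of running a figure/table model over a block that the
    // full-text model labelled as a figure or a table. Not a GROBID class.
    public record FigureBlock(String title, String caption, String content) {}

    // Returns only the non-blank title and caption text of each block; the body
    // of the figure or table (axis ticks, table cells, ...) is deliberately dropped.
    public static List<String> captionsForQuantities(List<FigureBlock> blocks) {
        List<String> texts = new ArrayList<>();
        for (FigureBlock b : blocks) {
            if (b.title() != null && !b.title().isBlank()) texts.add(b.title());
            if (b.caption() != null && !b.caption().isBlank()) texts.add(b.caption());
        }
        return texts;
    }

    public static void main(String[] args) {
        List<FigureBlock> blocks = List.of(
                new FigureBlock("Figure 1", "Temperature measured over 24 h.", "0 5 10 15 20"),
                new FigureBlock("Table 2", "Samples weighed at 3 g each.", "1 2 3 | 4 5 6"));
        // Prints the two titles and the two captions, never the content.
        captionsForQuantities(blocks).forEach(System.out::println);
    }
}
```

The `content` field is carried along only to make explicit what is discarded before the quantity parser runs.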


lfoppiano · 2018-11-11T13:27:11Z

Now only captions are processed:

lfoppiano · 2019-07-02T00:30:07Z

I think this is solved. Tables are still annotated sometimes, but this is due more to the table model.

lfoppiano self-assigned this Mar 17, 2017

lfoppiano changed the title ~~PDF annotation: tables, figures, formulas should be excluded (at least for the moment)~~ PDF annotation: tables, figures, formulas should be excluded Mar 17, 2017

lfoppiano added a commit that referenced this issue Mar 17, 2017

excluding figures, equations, tables and formulas to be annotated with the quantity model #22

1fc9193

everzeni mentioned this issue Aug 24, 2017

Reference markers, formulas and other irrelevant numbers #36

Closed

lfoppiano added a commit that referenced this issue Nov 11, 2018

Parsing figures and tables to process quantities in their captions #22

1cea5bb

lfoppiano closed this as completed Jul 2, 2019
