Trying to extract the table of contents ("Introduction", ..., "References"), I looked into the HTML file that Burdoc extracted. It distinguishes the headings from other items in the text fairly well. Burdoc extracted all the named outline entries correctly, but it also extracted one additional item, "Table 4", that is not part of the TOC.
I use the string "" to search the generated HTML file for the TOC.
There seems to be no difference whether or not I run Burdoc with the "--no-ml-tables" parameter.
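As a possible workaround on the consumer side, the headings can be collected from the generated HTML and caption-like entries such as "Table 4" dropped afterwards. The following is only a minimal sketch: it assumes the headings appear as standard `<h1>` to `<h6>` elements in the HTML output, which I have not verified for Burdoc, and `extract_toc`, `HeadingCollector` and `CAPTION_RE` are hypothetical names, not part of Burdoc.

```python
import re
from html.parser import HTMLParser

# Assumption: headings are emitted as standard <h1>..<h6> elements.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Caption-like labels ("Table 4", "Figure 2", "Fig. 3") that should not
# count as TOC entries. This pattern is an illustrative guess.
CAPTION_RE = re.compile(r"^(table|figure|fig\.?)\s*\d+\b", re.IGNORECASE)


class HeadingCollector(HTMLParser):
    """Collects the text content of every heading element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # text chunks of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            # Join the chunks and normalise whitespace.
            text = " ".join("".join(self._current).split())
            if text:
                self.headings.append(text)
            self._current = None


def extract_toc(html_text):
    """Return heading texts in document order, without caption-like items."""
    collector = HeadingCollector()
    collector.feed(html_text)
    return [h for h in collector.headings if not CAPTION_RE.match(h)]
```

This only filters the output after the fact; it does not change what Burdoc itself classifies as a heading.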
Ah, I think this one might be challenging as it's a false positive for one of the rules used to identify headings (a short bold piece of text directly preceding a standard paragraph and visually spaced from any prior text). Arguably it is a heading, albeit not one that'd be presented in a standard ToC.
I wouldn't expect --no-ml-tables to change this. Turning off table-finding means we don't actually try to identify tables in the text, so the text they contain still goes through the main text parsing pipeline. Burdoc also doesn't yet identify captions associated with tables, so it wouldn't make a difference even if the table had been found.
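To illustrate why "Table 4" trips the rule mentioned above (short bold text, directly preceding a standard paragraph, visually spaced from prior text), here is a rough sketch of such a heuristic. It is not Burdoc's actual implementation: the `Block` structure, the thresholds and the caption pattern are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    """Hypothetical text block; not a Burdoc data structure."""
    text: str
    bold: bool
    gap_above: float  # vertical space to the previous block, in points


# Illustrative thresholds, chosen arbitrarily for this sketch.
MAX_HEADING_WORDS = 10
MIN_GAP_ABOVE = 6.0

# Caption-like labels such as "Table 4" or "Figure 2".
CAPTION_RE = re.compile(r"^(table|figure|fig\.?)\s*\d+\b", re.IGNORECASE)


def looks_like_heading(block: Block, next_block: Optional[Block]) -> bool:
    """Short bold text, spaced from prior text, followed by a plain paragraph."""
    short = len(block.text.split()) <= MAX_HEADING_WORDS
    spaced = block.gap_above >= MIN_GAP_ABOVE
    followed_by_paragraph = next_block is not None and not next_block.bold
    return block.bold and short and spaced and followed_by_paragraph


def is_toc_heading(block: Block, next_block: Optional[Block]) -> bool:
    """Same rule, but caption-like labels are excluded."""
    return looks_like_heading(block, next_block) and not CAPTION_RE.match(block.text)
```

Under these assumptions, a bold "Table 4" label followed by body text passes `looks_like_heading` but is rejected by `is_toc_heading`, which is one way the false positive could be suppressed.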
The.pdf