Use OCR to support structuring in metadata editor #5476

oliver-stoehr · 2022-12-06T16:16:18Z

The process of structuring in the metadata editor should be supported by a small tool to increase the level of automation.
It should be possible to select one or multiple pages containing the table of contents and perform OCR on them.
The result of the OCR should be used to create structure elements and semi-automatically insert the text. The workflow in the editor could be:

1. The user selects the page containing the table of contents.
2. The user clicks a new button "automatic structuring". This performs the OCR on the selected page.
3. The user can view the analysed page and click on one entry of the table of contents. Clicking on one entry will copy the recognised text.
4. A dialog will appear to select the type of the structure element to be created.
5. The user should click into the structure tree to select the position of the new structure element.
6. The new structure element is created and the selected text from step 3 is pasted into the title metadata of the element.
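
The last three steps of this workflow could be sketched roughly as follows. This is only an illustration in Python, not Kitodo.Production code: `StructureElement` and `insert_structure_element` are hypothetical names, and the whitespace clean-up of the OCR text is an assumption about what a user would want.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructureElement:
    """Hypothetical stand-in for a logical structure element in the editor."""
    type: str
    title: str = ""
    children: List["StructureElement"] = field(default_factory=list)


def insert_structure_element(parent, position, element_type, recognised_text):
    """Create a child of `parent` at `position`, titled with the OCR text."""
    # Collapse line breaks and repeated spaces left over from the OCR result.
    title = " ".join(recognised_text.split())
    element = StructureElement(type=element_type, title=title)
    parent.children.insert(position, element)
    return element


# Existing structure tree with one chapter.
root = StructureElement(type="Monograph", title="Example volume")
root.children.append(StructureElement(type="Chapter", title="Preface"))

# Text copied from the analysed page, type chosen in the dialog,
# position chosen by clicking into the structure tree.
new = insert_structure_element(root, 1, "Chapter", "  I.  Introduction \n")
print([child.title for child in root.children])
```

The position is passed as a plain list index here; in the editor it would come from the user's click into the structure tree.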

A requirement for this feature would be that @OCR-D is integrated into Kitodo.Production to allow a convenient way to perform the OCR.

illipsum · 2022-12-07T08:41:43Z

Very good idea! I strongly support this feature request.

markusweigelt · 2022-12-09T09:30:15Z

An implementation project for the integration of OCR-D and Kitodo is currently underway: https://github.com/slub/ocrd_kitodo

In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.

However, this issue shows us that there is a need for this. We will discuss this and possibly include it in the integration.

bertsky · 2023-02-02T10:27:22Z

> In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.

To be precise, what we discussed so far is a different scenario: fully automatic structuring – as an external script task (backed by OCR-D) at the end of the Production workflow, when the METS has already been exported and can therefore be amended by automatic structuring tools (which naturally operate on METS/MODS directly).

But I also fully agree with you, @oliver-stoehr, that we should support the semi-automatic scenario from our side (ocrd_kitodo) as well. Currently, you can already have the OCR (script task) run on a single page and get back ALTO, so that should not be a problem. We can also try to provide OCR-D workflows tailored to layout analysis and recognition on index (ToC) pages. Most of the work for this feature will be on the Kitodo/UI side, though (esp. in step 3, where you need to embed some form of ALTO viewer, perhaps from Presentation).

bertsky · 2023-02-02T12:16:10Z

Additional ideas by @BartChris (originally reported to ocrd_kitodo):

* (implementation) Instead of a fully functional ALTO viewer, build a poor man's viewer that simply parses the TextString into a list of text lines and presents them in a text editor pane.
* (feature) This could also be used in a "preview" context (outside of the metadata editor).
* (feature) Indicate which pages have already been processed (by checking in the filesystem which image files have corresponding FULLTEXT files).
* (feature) When adding, removing, or moving pages, change the existing FULLTEXT files as well (so as to keep them in sync).
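
For the "poor man's viewer" idea, a minimal sketch of the parsing step might look like this, assuming standard ALTO where each `TextLine` holds `String` elements with a `CONTENT` attribute. The function names and the sample document are made up for illustration. Tags are matched by local name so that the ALTO version (namespace) does not matter, and hyphenation (`HYP`) elements are simply ignored.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml):
    """Return one string per ALTO TextLine, joining its String/@CONTENT values."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(word for word in words if word))
    return lines


# Made-up two-line table of contents, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="I."/><SP/><String CONTENT="Introduction"/><SP/><String CONTENT="1"/></TextLine>
    <TextLine><String CONTENT="II."/><SP/><String CONTENT="Sources"/><SP/><String CONTENT="17"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

for line in alto_text_lines(SAMPLE):
    print(line)
```

The resulting list of strings is what would be shown in the text editor pane, one entry per recognised line.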

BartChris · 2023-02-02T14:42:36Z

@solth I would like to suggest the UI parts of the OCR-D integration (needs more precision on what is needed) for the development fund, but I cannot assign labels. Could you perhaps mark the issue as a candidate for the development fund?

solth · 2023-02-02T17:11:50Z

> @solth I would like to suggest the UI parts of the OCR-D integration (needs more precision on what is needed) for the development fund, but I cannot assign labels. Could you perhaps mark the issue as a candidate for the development fund?

@BartChris I updated your role in the repository - could you check if you can assign labels to issues again?

BartChris · 2023-02-02T17:14:18Z

@solth Great, it works now.

aetherfaerber · 2023-03-07T11:12:21Z

> In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.
>
> To be precise, what we discussed so far is a different scenario: fully automatic structuring – as an external script task (backed by OCR-D) at the end of the Production workflow, when the METS has already been exported and can therefore be amended by automatic structuring tools (which naturally operate on METS/MODS directly).

Thank you for the clarification. I have made a suggestion towards such fully automatic structuring at #5573 (comment), albeit coming from a different scenario. As far as I understand, the implementation already present in the BAR's fork modifies the metadata before the images are loaded into the metadata editor (since the separator pages are removed before that).

Also, as far as I can see, one is free to access, through the metadata folder, the same METS file into which Kitodo writes the metadata entered in the metadata editor, and to change its contents at any point as one sees fit.

Could you explain why you think the file could/should only be amended after the export?

I would like to know how different these two scenarios are, and whether joining the suggestions makes any sense or whether we would need a completely new proposal for “fully automatic structuring by OCR”.

bertsky · 2023-03-07T13:37:33Z

> Also, as far as I can see, one is free to access, through the metadata folder, the same METS file into which Kitodo writes the metadata entered in the metadata editor, and to change its contents at any point as one sees fit.

No, according to @Kathrin-Huber IIRC that file (and its schema) is:

* parsed back on the fly into the runtime model; the parser cannot cope with everything you can do in METS/MODS
* not a stable format but subject to change with every version as developers see fit

So even if you can work out something for a particular version, it might break with the next update.

> Could you explain why you think the file could/should only be amended after the export?

That's the only reason. An external tool cannot know our exact schema. (The additions to Kitodo you sketched in #5573 – scanning physical barcode inserts with structuring information – would require an internal structuring tool, so that can be anywhere in the workflow.)

IMO the 3 scenarios are quite different:

1. semi-automatic support for structuring (this issue) just does a preliminary OCR of single pages: wherever the OCR is placed in the workflow (as a script task)
2. fully automatic structuring via OCR results runs visual-textual analysis tools across the entire document and tries to infer the structMap: after export, for the reason given above
3. fully automatic structuring via barcode pages just decodes the insert pages; it needs quite a few UI extensions and should be tightly integrated into the structure model: should be an internal tool
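
To illustrate what "operating on METS directly" after export could mean in scenario 2, here is a minimal sketch that appends one logical `div` to an exported METS file. It assumes the standard METS namespace and a `structMap` with `TYPE="LOGICAL"`. The function name, the IDs and the sample file are hypothetical, and a real tool would also have to link the new `div` to physical pages (`structLink`), which is omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def add_logical_div(mets_xml, div_id, div_type, label):
    """Append a child div to the top-level div of the logical structMap."""
    root = ET.fromstring(mets_xml)
    struct_map = root.find(f"{{{METS_NS}}}structMap[@TYPE='LOGICAL']")
    if struct_map is None:
        raise ValueError("no logical structMap found")
    top_div = struct_map.find(f"{{{METS_NS}}}div")
    if top_div is None:
        raise ValueError("logical structMap has no top-level div")
    ET.SubElement(
        top_div,
        f"{{{METS_NS}}}div",
        {"ID": div_id, "TYPE": div_type, "LABEL": label},
    )
    return ET.tostring(root, encoding="unicode")


# Made-up minimal export, for illustration only.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" LABEL="Example volume"/>
  </mets:structMap>
</mets:mets>"""

result = add_logical_div(SAMPLE_METS, "LOG_0001", "chapter", "I. Introduction")
print(result)
```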

BartChris · 2023-03-07T14:00:17Z

related to: #3837

aetherfaerber · 2023-03-07T15:03:06Z

> > Also, as far as I can see, one is free to access, through the metadata folder, the same METS file into which Kitodo writes the metadata entered in the metadata editor, and to change its contents at any point as one sees fit.
>
> No, according to @Kathrin-Huber IIRC that file (and its schema) is:
>
> * parsed back on the fly into the runtime model; the parser cannot cope with everything you can do in METS/MODS
> * not a stable format but subject to change with every version as developers see fit
>
> So even if you can work out something for a particular version, it might break with the next update.

Great! Thank you for pointing this out.

> > Could you explain why you think the file could/should only be amended after the export?
>
> That's the only reason. An external tool cannot know our exact schema. (The additions to Kitodo you sketched in #5573 – scanning physical barcode inserts with structuring information – would require an internal structuring tool, so that can be anywhere in the workflow.)
>
> IMO the 3 scenarios are quite different:
>
> 1. semi-automatic support for structuring (this issue) just does a preliminary OCR of single pages: wherever the OCR is placed in the workflow (as a script task)
> 2. fully automatic structuring via OCR results runs visual-textual analysis tools across the entire document and tries to infer the structMap: after export, for the reason given above
> 3. fully automatic structuring via barcode pages just decodes the insert pages; it needs quite a few UI extensions and should be tightly integrated into the structure model: should be an internal tool

Ok. I will retract my suggestion to have internal UI extensions for controlling fully automatic structuring then and will try to find out where my other suggestions (automatic structuring by imported metadata or filenames) might fit in. Thank you!

oliver-stoehr added the feature label Dec 6, 2022

markusweigelt mentioned this issue Dec 9, 2022

Semi-automatic structuring in metadata editor using OCR slub/ocrd_kitodo#60

Open

BartChris added the development fund 2023 label (a candidate for the Kitodo e.V. development fund) Feb 2, 2023

solth added the metadata editor label Feb 24, 2023

aetherfaerber mentioned this issue Mar 7, 2023

Generalisation of the Function "evaluate docket" #5573

Open

BartChris mentioned this issue Mar 20, 2023

Support for OCR results and maybe also OCR-D processing control from the Kitodo side #5600

Open

solth removed the development fund 2023 label (a candidate for the Kitodo e.V. development fund) Jul 12, 2023
