Use OCR to support structuring in metadata editor #5476

Open
oliver-stoehr opened this issue Dec 6, 2022 · 11 comments


@oliver-stoehr
Collaborator

The process of structuring in the metadata editor should be supported by a small tool to increase the level of automation.
It should be possible to select one or multiple pages containing the table of contents and perform OCR on them.
The result of the OCR should be used to create structure elements and semi-automatically insert the text. The workflow in the editor could be:

  1. The user selects the page containing the table of contents.
  2. The user clicks a new button "automatic structuring". This performs the OCR on the selected page.
  3. The user can view the analysed page and click on one entry of the table of contents. Clicking on one entry will copy the recognised text.
  4. A dialog will appear to select the type of the structure element to be created.
  5. The user should click into the structure tree to select the position of the new structure element.
  6. The new structure element is created and the selected text from step 3 is pasted into the title metadata of the element.
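
Steps 3 to 6 could be modelled roughly as follows. This is only a minimal sketch of the idea, not Kitodo.Production code: `StructureElement`, `insert_from_ocr`, the element types and the sample entries are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructureElement:
    """Hypothetical stand-in for a logical structure element in the editor."""
    type: str
    title: str = ""
    children: List["StructureElement"] = field(default_factory=list)


def insert_from_ocr(parent, position, element_type, recognised_text):
    """Steps 4 to 6: create an element of the chosen type at the chosen
    position and paste the recognised text into its title."""
    element = StructureElement(type=element_type, title=recognised_text.strip())
    parent.children.insert(position, element)
    return element


# Step 3: the user clicks one recognised entry of the table of contents.
toc_entries = ["Preface", "Chapter 1: Origins", "Chapter 2: Methods"]
clicked = toc_entries[1]

# Existing structure tree (assumed example data).
root = StructureElement(type="monograph", title="Example volume")
root.children.append(StructureElement(type="preface", title="Preface"))

new_element = insert_from_ocr(root, position=1, element_type="chapter",
                              recognised_text=clicked)
print([child.title for child in root.children])
# prints ['Preface', 'Chapter 1: Origins']
```

The point of the sketch is that the OCR result only supplies the title text; the element type and the position in the tree remain the user's choice.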

Requirement for this feature would be that @OCR-D is implemented in Kitodo.Production to allow a convenient way to perform the OCR.

@illipsum

illipsum commented Dec 7, 2022

Very good idea! I strongly support this feature request.

@markusweigelt
Collaborator

Currently an implementation project is underway for the integration of OCR-D and Kitodo. https://github.com/slub/ocrd_kitodo

In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.

However, this issue shows us that there is a need for this. We will discuss this and possibly include it in the integration.

@bertsky

bertsky commented Feb 2, 2023

> In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.

To be precise, what we discussed so far is a different scenario: fully automatic structuring – as an external script task (backed by OCR-D) at the end of the Production workflow, when the METS has already been exported and can therefore be amended by automatic structuring tools (which naturally operate on METS/MODS directly).

But I fully agree with you @oliver-stoehr that we should also support the semi-automatic scenario from our side (ocrd_kitodo). Currently, you can already have the OCR (script task) run on a single page and get back ALTO, so that should not be a problem. We can also try to provide OCR-D workflows dedicated to optimal layout analysis and recognition on index (TOC) pages. Most of the work for this feature will be on the Kitodo/UI side though (esp. in step 3, where you need to embed some form of ALTO viewer, perhaps from Presentation).

@bertsky

bertsky commented Feb 2, 2023

Additional ideas by @BartChris (originally reported to ocrd_kitodo):

  • (implementation) instead of a fully functional ALTO viewer, build a poor man's viewer which simply parses the TextString into a list of text lines and presents them in a text editor pane
  • (feature) could also be used in a "preview" context (outside of the metadata editor)
  • (feature) indicate which pages have been processed already (by looking into the filesystem, which image files there are FULLTEXT files for)
  • (feature) when adding/removing/moving pages, change the existing FULLTEXT files as well (so as to keep in sync)
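
The "poor man's viewer" and the "already processed" indicator could be sketched as below. This is a minimal sketch under assumptions, not part of ocrd_kitodo or Kitodo.Production: it assumes each ALTO `TextLine` holds `String` elements with a `CONTENT` attribute, and that a FULLTEXT file shares its base name with the image file. The function names and the directory layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip the XML namespace so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(alto_xml):
    """Return one plain-text string per ALTO TextLine."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [child.get("CONTENT", "") for child in element
                 if local_name(child.tag) == "String"]
        lines.append(" ".join(words))
    return lines


def pages_with_fulltext(image_dir, fulltext_dir):
    """Return the base names of images that have a same-named XML file
    in fulltext_dir (assumed naming convention)."""
    done = {path.stem for path in Path(fulltext_dir).glob("*.xml")}
    return sorted(path.stem for path in Path(image_dir).iterdir()
                  if path.is_file() and path.stem in done)


# Assumed sample data: two lines of a table of contents.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Chapter"/><SP/><String CONTENT="1"/><SP/><String CONTENT="7"/></TextLine>
    <TextLine><String CONTENT="Index"/><SP/><String CONTENT="212"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(alto_text_lines(SAMPLE))
# prints ['Chapter 1 7', 'Index 212']
```

The resulting list of strings could be shown in a plain text pane, which avoids embedding a full ALTO viewer at the cost of losing the coordinates of each line.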

@BartChris
Collaborator

BartChris commented Feb 2, 2023

@solth I would like to suggest the UI parts of the OCR-D integration (needs more precision on what is needed) for the development fund, but I cannot assign labels. Could you maybe mark the issue as a candidate for the development fund?

@solth
Member

solth commented Feb 2, 2023

> @solth I would like to suggest the UI parts of the OCR-D integration (needs more precision on what is needed) for the development fund, but I cannot assign labels. Could you maybe mark the issue as a candidate for the development fund?

@BartChris I updated your role in the repository - could you check if you can assign labels to issues again?

BartChris added the "development fund 2023" label (a candidate for the Kitodo e.V. development fund) on Feb 2, 2023

@BartChris
Collaborator

@solth Great, it works now.

@aetherfaerber

aetherfaerber commented Mar 7, 2023

> > In the context of our discussion, automated structuring was a topic, but for reasons not a main priority yet.
>
> To be precise, what we discussed so far is a different scenario: fully automatic structuring – as an external script task (backed by OCR-D) at the end of the Production workflow, when the METS has already been exported and can therefore be amended by automatic structuring tools (which naturally operate on METS/MODS directly).

Thank you for the clarification. I have made a suggestion towards such fully automatic structuring at #5573 (comment), albeit coming from a different scenario. As far as I understand, the implementation already present in the BAR's fork modifies the metadata before the images are loaded into the metadata editor (since the separator pages are removed before that).

Also, as far as I can see one is free to access the same METS file that Kitodo is going to write metadata entered in the metadata editor into through the metadata folder and change its contents at any point as one sees fit.

Could you explain why you think the file could/should only be amended after the export?

I would like to know how different these two scenarios are, and whether joining the suggestions makes any sense or whether we would need a completely new proposal for “fully automatic structuring by OCR”.

@bertsky

bertsky commented Mar 7, 2023

> Also, as far as I can see one is free to access the same METS file that Kitodo is going to write metadata entered in the metadata editor into through the metadata folder and change its contents at any point as one sees fit.

No, according to @Kathrin-Huber IIRC that file (and its schema) is:

  • parsed back on the fly into the runtime model; the parser cannot cope with everything you can do in METS/MODS
  • not a stable format but subject to change with every version as developers see fit

So even if you can work out something for a particular version, it might break with the next update.

> Could you explain why you think the file could/should only be amended after the export?

That's the only reason. An external tool cannot know our exact schema. (The additions to Kitodo you sketched in #5573 – scanning physical barcode inserts with structuring information – would require an internal structuring tool, so that can be anywhere in the workflow.)

IMO the 3 scenarios are quite different:

  1. semi-automatic support for structuring (this issue) just does a preliminary OCR of single pages: wherever the OCR is placed in the workflow (as a script task)
  2. fully automatic structuring via OCR results runs visual-textual analysis tools across the entire document and tries to infer the structMap: after export, for the reason given above
  3. fully automatic structuring via barcode pages just decodes the insert pages, it needs quite a few UI extensions and should be tightly integrated into the structure model: should be an internal tool
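
To illustrate what "amending the exported METS" in scenario 2 might look like, here is a minimal sketch, assuming the export contains a logical `structMap` with a root `div` and a `structLink` section. The function name, the IDs, the element type and the label are hypothetical, and the hard part (inferring the structure from OCR results) is not shown.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def add_logical_div(mets_root, div_id, div_type, label, physical_ids):
    """Append a logical div under the root logical div and link it to
    the given physical page divs via smLink entries."""
    logical = mets_root.find(
        f"{{{METS}}}structMap[@TYPE='LOGICAL']/{{{METS}}}div")
    new_div = ET.SubElement(
        logical, f"{{{METS}}}div",
        {"ID": div_id, "TYPE": div_type, "LABEL": label})
    struct_link = mets_root.find(f"{{{METS}}}structLink")
    for physical_id in physical_ids:
        ET.SubElement(
            struct_link, f"{{{METS}}}smLink",
            {f"{{{XLINK}}}from": div_id, f"{{{XLINK}}}to": physical_id})
    return new_div


# Assumed, heavily reduced example of an exported METS file.
EXPORTED = f"""<mets:mets xmlns:mets="{METS}" xmlns:xlink="{XLINK}">
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence">
      <mets:div ID="PHYS_0001" TYPE="page"/>
      <mets:div ID="PHYS_0002" TYPE="page"/>
    </mets:div>
  </mets:structMap>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph"/>
  </mets:structMap>
  <mets:structLink/>
</mets:mets>"""

root = ET.fromstring(EXPORTED)
add_logical_div(root, "LOG_0001", "chapter", "Chapter 1: Origins",
                ["PHYS_0001", "PHYS_0002"])
print(ET.tostring(root, encoding="unicode"))
```

Because this works on the exported file only, it does not depend on the internal format that the metadata editor parses back.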

@BartChris
Collaborator

BartChris commented Mar 7, 2023

related to: #3837

@aetherfaerber

> > Also, as far as I can see one is free to access the same METS file that Kitodo is going to write metadata entered in the metadata editor into through the metadata folder and change its contents at any point as one sees fit.
>
> No, according to @Kathrin-Huber IIRC that file (and its schema) is:
>
> * parsed back on the fly into the runtime model; the parser cannot cope with everything you can do in METS/MODS
> * not a stable format but subject to change with every version as developers see fit
>
> So even if you can work out something for a particular version, it might break with the next update.

Great! Thank you for pointing this out.

> > Could you explain why you think the file could/should only be amended after the export?
>
> That's the only reason. An external tool cannot know our exact schema. (The additions to Kitodo you sketched in #5573 – scanning physical barcode inserts with structuring information – would require an internal structuring tool, so that can be anywhere in the workflow.)
>
> IMO the 3 scenarios are quite different:
>
> 1. semi-automatic support for structuring (this issue) just does a preliminary OCR of single pages: wherever the OCR is placed in the workflow (as a script task)
> 2. fully automatic structuring via OCR results runs visual-textual analysis tools across the entire document and tries to infer the structMap: after export, for the reason given above
> 3. fully automatic structuring via barcode pages just decodes the insert pages, it needs quite a few UI extensions and should be tightly integrated into the structure model: should be an internal tool

Ok. I will retract my suggestion to have internal UI extensions for controlling fully automatic structuring then and will try to find out where my other suggestions (automatic structuring by imported metadata or filenames) might fit in. Thank you!
