Enable handling of Newspapers with Article Structure #886

Open
stefanCCS opened this issue Jan 18, 2023 · 1 comment
Labels
⚙ feature A new feature or enhancement.

Comments


stefanCCS commented Jan 18, 2023

Description

A newspaper issue may be digitized in a way that identifies the individual articles and their content.
The (OCRed) text itself is located in the ALTO files; the article structure is available in the logical section of the METS file.
Typically it looks like this:

<mets xmlns="http://www.loc.gov/METS/"  ...
    <metsHdr ...
    <dmdSec ...
    <amdSec ...
    <amdSec ...
    <fileSec ...
    <structMap LABEL="Physical Structure" TYPE="PHYSICAL">
        ...
    </structMap>
    <structMap LABEL="Logical Structure" TYPE="LOGICAL">
        ...
        <div ID="DIVL3" TYPE="ISSUE" LABEL="MyNewspaperTitle no. 123 from 01.01.2000">

            <!-- This is the title part of the newspaper -->
            <div ID="DIVL4" TYPE="TITLE_SECTION">
                <div ID="DIVL5" TYPE="HEADLINE" ORDER="1"> ...
                <div ID="DIVL6" TYPE="TEXTBLOCK" ORDER="2"> ...
            </div>
            
            <!-- This is the article part of the newspaper -->
            <div ID="DIVL11" TYPE="CONTENT">
                <div ID="DIVL12" TYPE="ARTICLE" DMDID="MODSMD_ARTICLE1" LABEL="TAGESRUNDSCHAU"> ...
                <div ID="DIVL26" TYPE="ARTICLE" DMDID="MODSMD_ARTICLE2" LABEL="Just another Article title ..."> ...
                ...
                
                <!-- An article might have a "HEADING" part, paragraph(s) and illustration(s) incl. caption -->
                <!-- There is always(!) an "fptr" pointing to the corresponding element in the ALTO file! -->
                <!-- With this information, the paragraph(s) of an article, incl. text and images, can be made visible in the Presentation -->


                <div ID="DIVL61" TYPE="ARTICLE" DMDID="MODSMD_ARTICLE6" LABEL="Just another Article title ...">
                    <div ID="DIVL62" TYPE="HEADING">
                        <div ID="DIVL63" TYPE="TITLE">
                            <fptr>
                                <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00027"/>
                            </fptr>
                        </div>
                    </div>
                    <div ID="DIVL64" TYPE="BODY">
                        <div ID="DIVL65" TYPE="PARAGRAPH" ORDER="1">
                            <div ID="DIVL66" TYPE="TEXT">
                                <fptr>
                                    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00028"/>
                                </fptr>
                            </div>
                        </div>
                        <div ID="DIVL67" TYPE="PARAGRAPH" ORDER="2">
                            <div ID="DIVL68" TYPE="TEXT">
                                <fptr>
                                    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00029"/>
                                </fptr>
                            </div>
                        </div>
                        <div ID="DIVL69" TYPE="PARAGRAPH" ORDER="3">
                            <div ID="DIVL70" TYPE="TEXT">
                                <fptr>
                                    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00030"/>
                                </fptr>
                            </div>
                        </div>
                        <div ID="DIVL71" TYPE="PARAGRAPH" ORDER="4">
                            <div ID="DIVL72" TYPE="TEXT">
                                <fptr>
                                    <seq>
                                        <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00031"/>
                                        <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00032"/>
                                    </seq>
                                </fptr>
                            </div>
                        </div>
                        ...
                        <div ID="DIVL73" TYPE="ILLUSTRATION" ORDER="1" DMDID="MODSMD_PICT2" LABEL="...(caption text ...)">
                            <div ID="DIVL74" TYPE="IMAGE">
                                <fptr>
                                    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_CB00002"/>
                                </fptr>
                            </div>
                            <div ID="DIVL75" TYPE="CAPTION">
                                <fptr>
                                    <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00033"/>
                                </fptr>
                            </div>
                        </div>
                    </div>
                </div>
                
                    <!-- Articles might also be grouped ... -->


                    <div ID="DIVL209" TYPE="SECTION" DMDID="MODSMD_SECTION1" LABEL="Ein Querschnitt durchs Tagesgeschehen">
                        <div ID="DIVL210" TYPE="HEADING"> ...
                        <div ID="DIVL212" TYPE="ARTICLE" ORDER="1" DMDID="MODSMD_ARTICLE19" LABEL="200 Millionen Mark vom Bund für die Förderung der Begabten"> ...
                        <div ID="DIVL218" TYPE="ARTICLE" ORDER="2" DMDID="MODSMD_ARTICLE20" LABEL="Privatuniversität Herdecke in finanziellen Schwierigkeiten"> ...
                        ...
                    </div>
...

==> So KITODO.PRESENTATION needs an "Article"-based view in addition to the page-based view.
E.g. like this:
[image: example of an article-based view]
Please be aware that, in general, this means showing content which might belong to more than one page!
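
To illustrate the multi-page point, here is a minimal Python sketch (not Kitodo code) that groups the ALTO blocks referenced below one logical div by their FILEID. The sample METS is a shortened stand-in for the structure above; the second file ID `ALTO00002` and the block `P2_TB00001` are made up for illustration, and `blocks_by_file` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}

# Shortened sample; ALTO00002 / P2_TB00001 are invented to show a page break.
SAMPLE = """\
<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="LOGICAL">
    <div ID="DIVL61" TYPE="ARTICLE" LABEL="Just another Article title">
      <div ID="DIVL66" TYPE="TEXT">
        <fptr><area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00028"/></fptr>
      </div>
      <div ID="DIVL72" TYPE="TEXT">
        <fptr>
          <seq>
            <area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00031"/>
            <area BETYPE="IDREF" FILEID="ALTO00002" BEGIN="P2_TB00001"/>
          </seq>
        </fptr>
      </div>
    </div>
  </structMap>
</mets>
"""


def blocks_by_file(mets_root, div_id):
    """Group the ALTO block IDs referenced below a logical div by FILEID."""
    path = f".//mets:structMap[@TYPE='LOGICAL']//mets:div[@ID='{div_id}']"
    div = mets_root.find(path, NS)
    if div is None:
        raise KeyError(div_id)
    grouped = defaultdict(list)
    # iter() walks the div and all descendants in document order,
    # so areas inside <seq> are picked up as well.
    for area in div.iter(f"{{{METS}}}area"):
        grouped[area.get("FILEID")].append(area.get("BEGIN"))
    return dict(grouped)


root = ET.fromstring(SAMPLE)
pages = blocks_by_file(root, "DIVL61")
print(pages)
```

An article whose result has more than one key touches more than one ALTO file, so the article view would have to load several pages.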

In addition to the viewing part, the search function also needs to be able to search in the text and find the corresponding articles (not just the text on the page itself).
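
One possible way to get from a full-text hit to an article, sketched in Python under assumptions: the search backend is assumed to report a hit as a pair of ALTO file ID and block ID, and a reverse index built from the logical structMap resolves that pair to the enclosing ARTICLE div. The helper name `index_blocks_to_articles`, the div `DIVL213`, the block `P1_TB00040` and the shortened label are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
NS = {"mets": METS}

# Shortened sample: one plain article and one article grouped in a SECTION.
SAMPLE = """\
<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="LOGICAL">
    <div ID="DIVL11" TYPE="CONTENT">
      <div ID="DIVL61" TYPE="ARTICLE" LABEL="Just another Article title">
        <div ID="DIVL66" TYPE="TEXT">
          <fptr><area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00028"/></fptr>
        </div>
      </div>
      <div ID="DIVL209" TYPE="SECTION" LABEL="Ein Querschnitt durchs Tagesgeschehen">
        <div ID="DIVL212" TYPE="ARTICLE" LABEL="200 Millionen Mark vom Bund">
          <div ID="DIVL213" TYPE="TEXT">
            <fptr><area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00040"/></fptr>
          </div>
        </div>
      </div>
    </div>
  </structMap>
</mets>
"""


def index_blocks_to_articles(mets_root):
    """Map (FILEID, BEGIN) to (article div ID, article LABEL)."""
    logical = mets_root.find("mets:structMap[@TYPE='LOGICAL']", NS)
    index = {}
    for div in logical.iter(f"{{{METS}}}div"):
        if div.get("TYPE") != "ARTICLE":
            continue
        # Every area below the article belongs to it, however deeply nested.
        for area in div.iter(f"{{{METS}}}area"):
            key = (area.get("FILEID"), area.get("BEGIN"))
            index[key] = (div.get("ID"), div.get("LABEL"))
    return index


index = index_blocks_to_articles(ET.fromstring(SAMPLE))
# A hit reported for block P1_TB00040 resolves to the article inside the section.
hit = index.get(("ALTO00001", "P1_TB00040"))
print(hit)
```

Whether such a mapping is better built at index time or resolved at query time is left open here.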

Please see this example of how this might look overall: https://cambridge.dlconsulting.com/?a=d&d=Chronicle19120127-01

You might also view this feature more generally:
For a logical structure element chosen on the left side, KITODO.PRESENTATION shall be able to highlight all ALTO elements on the page(s) which are referenced via <fptr> <area>.
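
A rough Python sketch of that general idea, not an actual Kitodo implementation: collect the BEGIN IDs referenced below the chosen logical div for one ALTO file, then read the bounding boxes of the matching ALTO elements from their HPOS, VPOS, WIDTH and HEIGHT attributes. The coordinates, the extra block `P1_TB00099`, the ALTO namespace version and both helper names are assumptions for illustration; measurement units are ignored.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"

METS_SAMPLE = """\
<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="LOGICAL">
    <div ID="DIVL61" TYPE="ARTICLE" LABEL="Just another Article title">
      <div ID="DIVL63" TYPE="TITLE">
        <fptr><area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00027"/></fptr>
      </div>
      <div ID="DIVL66" TYPE="TEXT">
        <fptr><area BETYPE="IDREF" FILEID="ALTO00001" BEGIN="P1_TB00028"/></fptr>
      </div>
    </div>
  </structMap>
</mets>
"""

# Hypothetical ALTO page; all coordinates are invented.
ALTO_SAMPLE = """\
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="P1_TB00027" HPOS="100" VPOS="200" WIDTH="800" HEIGHT="60"/>
        <TextBlock ID="P1_TB00028" HPOS="100" VPOS="270" WIDTH="800" HEIGHT="400"/>
        <TextBlock ID="P1_TB00099" HPOS="950" VPOS="200" WIDTH="800" HEIGHT="500"/>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def referenced_block_ids(mets_root, div_id, file_id):
    """BEGIN IDs referenced below the div with the given ID, for one ALTO file."""
    for div in mets_root.iter(f"{{{METS}}}div"):
        if div.get("ID") == div_id:
            return [a.get("BEGIN") for a in div.iter(f"{{{METS}}}area")
                    if a.get("FILEID") == file_id]
    return []


def highlight_boxes(alto_root, block_ids):
    """Map block ID to (HPOS, VPOS, WIDTH, HEIGHT); assumes all four are present."""
    wanted = set(block_ids)
    boxes = {}
    # Matching on the ID attribute keeps this independent of the ALTO version.
    for el in alto_root.iter():
        if el.get("ID") in wanted:
            boxes[el.get("ID")] = tuple(
                float(el.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
    return boxes


ids = referenced_block_ids(ET.fromstring(METS_SAMPLE), "DIVL61", "ALTO00001")
boxes = highlight_boxes(ET.fromstring(ALTO_SAMPLE), ids)
print(boxes)
```

Because the lookup starts from any div ID, the same approach would work for a SECTION, a single PARAGRAPH or an ILLUSTRATION, not only for an ARTICLE.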

Expected Benefits of this Development

This would bring KITODO.PRESENTATION to a status where it can compete with commercial presentation tools in the area of Newspaper portals.

Estimated Costs and Complexity

The estimated costs are high for this feature.

@stefanCCS stefanCCS added the ⭐ development fund 2022 A candidate for the Kitodo e.V. development fund. label Jan 18, 2023
@sebastian-meyer sebastian-meyer added ⭐ development fund 2023 A candidate for the Kitodo e.V. development fund. and removed ⭐ development fund 2022 A candidate for the Kitodo e.V. development fund. labels Jan 18, 2023
@sebastian-meyer sebastian-meyer changed the title [FUND] Enable KITODO.PRESENTATION to handle Newspapers with Artical Structure [FUND] Enable KITODO.PRESENTATION to handle Newspapers with Article Structure Jan 30, 2023
@sebastian-meyer sebastian-meyer changed the title [FUND] Enable KITODO.PRESENTATION to handle Newspapers with Article Structure [FUND] Enable handling of Newspapers with Article Structure Jan 30, 2023
@sebastian-meyer sebastian-meyer added the ⚙ feature A new feature or enhancement. label Mar 20, 2023
@sebastian-meyer
Member

Votes: 1

@sebastian-meyer sebastian-meyer changed the title [FUND] Enable handling of Newspapers with Article Structure Enable handling of Newspapers with Article Structure Jul 21, 2023
@sebastian-meyer sebastian-meyer removed the ⭐ development fund 2023 A candidate for the Kitodo e.V. development fund. label Jul 21, 2023