Allow running with incomplete descriptions #58

Merged Jan 10, 2022 (35 commits)

@bertsky bertsky commented Dec 6, 2021

Expands on #56, additionally fixes #57.
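The issue listed as linked to this pull request is titled "fulltext: fallback to physical structmap if no logical is available". A minimal sketch of such a fallback, assuming the METS is parsed with the standard library's `xml.etree.ElementTree` (the project's own code may use a different parser and different names; `pick_struct_map` is hypothetical):

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}


def pick_struct_map(mets_root):
    """Return the logical structMap if present, else the physical one, else None.

    Hypothetical helper for illustration; not taken from the pull request.
    """
    for map_type in ("LOGICAL", "PHYSICAL"):
        struct_map = mets_root.find(
            "mets:structMap[@TYPE='%s']" % map_type, METS_NS)
        if struct_map is not None:
            return struct_map
    return None


# A METS document that only carries a physical structMap.
physical_only = ET.fromstring(
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/">'
    '<mets:structMap TYPE="PHYSICAL"/>'
    '</mets:mets>'
)
chosen = pick_struct_map(physical_only)
```

Trying the map types in a fixed order keeps the preference explicit: a logical structMap always wins when both exist.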

@bertsky bertsky requested a review from wrznr December 6, 2021 07:58
bertsky commented Dec 6, 2021

27dffe8 is warranted IMO because of DTABf details for pb attributes.

71fd269 is useful if your input does not contain the images under DEFAULT (but, say, ORIGINAL or OCR-D-IMG). Also, having a parameter for the output file makes integration into scripts and makefiles easier.
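As a rough illustration of the two parameters described above, here is a sketch using the standard library's `argparse`. The option names `--img-filegrp` and `--output`, the `-` convention for standard output, and both helper functions are assumptions made for this example; they are not taken from commit 71fd269, and the tool itself may use a different CLI library.

```python
import argparse
import sys


def build_parser():
    """Build a parser with hypothetical option names (not the tool's actual CLI)."""
    parser = argparse.ArgumentParser(
        description="Convert METS/MODS to TEI (sketch)")
    parser.add_argument("mets", help="path of the METS file")
    # Assumed option: which METS fileGrp holds the page images.
    parser.add_argument("--img-filegrp", default="DEFAULT",
                        help="fileGrp to take images from, e.g. ORIGINAL")
    # Assumed option: where to write the TEI result.
    parser.add_argument("--output", default="-",
                        help="output file, or '-' for standard output")
    return parser


def open_output(path):
    """Return a writable text stream for the chosen output target."""
    if path == "-":
        return sys.stdout
    return open(path, "w", encoding="utf-8")
```

With an explicit output option, a makefile rule can name its target file directly instead of relying on shell redirection.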

@bertsky bertsky requested a review from kba December 6, 2021 23:01
@bertsky bertsky mentioned this pull request Dec 7, 2021
- for `sourceDesc/biblFull/titleStmt/title/@level`, only use allowed values
  (m/a/j/s/u), and try mapping from top-level logical `div/@TYPE`
- for `sourceDesc/bibl/@type`, try mapping from top-level logical `div/@TYPE`
- instead of ignoring `titleInfo` main and part/volume titles,
  - prefer main title from titleInfo over top-level logical `div/@LABEL`
  - prefer `titleInfo/@type=uniform` or empty over abbrev/alternative/translated
  - also parse and add `partNumber/partName` or `part`
- instead of spilling titleInfo between `fileDesc/titleStmt` and `biblFull/titleStmt`,
  copy the former to the latter when complete, and then add `@level` etc
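The level mapping and title preference listed above could look roughly like the following sketch. Only the allowed level values (m/a/j/s/u) and the preference for `uniform` or untyped `titleInfo` come from the commit message; the concrete `div/@TYPE` keys in the table and both function names are illustrative assumptions, not the mapping used in the actual commits.

```python
ALLOWED_LEVELS = {"m", "a", "j", "s", "u"}

# Illustrative mapping only; the real table in the PR may differ.
DIV_TYPE_TO_LEVEL = {
    "monograph": "m",
    "article": "a",
    "periodical": "j",
    "newspaper": "j",
    "multivolume_work": "s",
    "manuscript": "u",
}


def title_level(div_type):
    """Map a top-level logical div/@TYPE to a title/@level, or None if unmapped."""
    level = DIV_TYPE_TO_LEVEL.get((div_type or "").lower())
    return level if level in ALLOWED_LEVELS else None


def pick_title_info(title_infos):
    """Prefer titleInfo entries whose @type is 'uniform' or absent.

    `title_infos` is a list of (type, title) pairs; type is None or ""
    when the attribute is missing. Falls back to the first entry.
    """
    preferred = [title for typ, title in title_infos
                 if typ in (None, "", "uniform")]
    if preferred:
        return preferred[0]
    return title_infos[0][1] if title_infos else None
```

Returning `None` for unmapped types lets the caller omit `@level` rather than emit a value outside the allowed set.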
bertsky commented Dec 13, 2021

The last 2 commits improve the coverage and conformance of title and identifier metadata. They affect #36, but there is still much to do.

creation.text = collection
profile_desc = self.tree.xpath('//tei:msDesc/tei:msIdentifier', namespaces=ns)[0]
coll = etree.SubElement(profile_desc, "%scollection" % TEI)
coll.text = collection
Member

Did you check whether this is DTABf? I think I used this more abstract solution because the collection of the digital work is not necessarily the collection of the original.

Member Author

No, it's not in DTABf at all – sorry. But neither is creation (and it seems like a misfit).

You're right in that it's not clear whether relatedItem/@type=series applies to the physical copy or digital presentation (which could perhaps be differentiated on the TEI side by msIdentifier vs objectIdentifier IIUC).

Perhaps the whole thing should rather enter biblFull/seriesStmt?

Member Author

> You're right in that it's not clear whether relatedItem/@type=series applies to the physical copy or digital presentation (which could perhaps be differentiated on the TEI side by msIdentifier vs objectIdentifier IIUC).
>
> Perhaps the whole thing should rather enter biblFull/seriesStmt?

I am now certain that's the right place. And to differentiate between the series of the physical copy and the series of the digital presentation, we could use fileDesc/sourceDesc/biblFull/seriesStmt vs. fileDesc/seriesStmt (which is also allowed by DTABf RNG, but not documented).

Let's discuss further under #44!
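The two placements proposed in this thread can be sketched as follows. This is an illustration only: it uses the standard library's `xml.etree.ElementTree` rather than the `lxml` API seen in the diff, `add_series` is a hypothetical helper, and the skeleton header and series titles are made-up sample data. In `fileDesc`, the sketch inserts `seriesStmt` ahead of `sourceDesc`, on the assumption that the TEI content model expects that order.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = "{%s}" % TEI_NS
NS = {"tei": TEI_NS}


def add_series(parent, title, before=None):
    """Add <seriesStmt><title>...</title></seriesStmt> under parent.

    If `before` names a direct child that exists, insert ahead of it;
    otherwise append at the end. Hypothetical helper for illustration.
    """
    series = ET.Element(TEI + "seriesStmt")
    ET.SubElement(series, TEI + "title").text = title
    anchor = parent.find("tei:" + before, NS) if before else None
    if anchor is not None:
        parent.insert(list(parent).index(anchor), series)
    else:
        parent.append(series)
    return series


# Minimal made-up header skeleton.
doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc>'
    '<titleStmt/><publicationStmt/>'
    '<sourceDesc><biblFull><titleStmt/><publicationStmt/></biblFull></sourceDesc>'
    '</fileDesc></teiHeader></TEI>'
)
file_desc = doc.find(".//tei:fileDesc", NS)
bibl_full = doc.find(".//tei:sourceDesc/tei:biblFull", NS)

# Series of the digital presentation: fileDesc/seriesStmt
add_series(file_desc, "Digital collection", before="sourceDesc")
# Series of the physical copy: fileDesc/sourceDesc/biblFull/seriesStmt
add_series(bibl_full, "Printed series")
```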

Member

@wrznr wrznr left a comment

First of all: Many thanks for this great contribution.

Personal communication on some details requested.

@wrznr wrznr merged commit 04148d5 into slub:master Jan 10, 2022

Successfully merging this pull request may close these issues.

fulltext: fallback to physical structmap if no logical is available
3 participants