PICA: subject indexing analysis #154

Closed
pkiraly opened this issue Aug 16, 2022 · 3 comments
@pkiraly
Owner

pkiraly commented Aug 16, 2022

According to the Maquis plan, PICA contains subject indexing information in the following fields: 045A, 045B, 045F, 045R.

@pkiraly
Owner Author

pkiraly commented Aug 16, 2022

@nichtich Could you adjust this list?

pkiraly added a commit that referenced this issue Aug 16, 2022
@nichtich
Collaborator

The published K10plus subset limited to subject information lists the following fields, which relate to the content of a publication in its broadest sense:

003@ with internal record identifier “PPN” in subfield $0
013D type of content
013F target audience
041A keywords
044. all subject indexing fields starting with 044
045. all subject indexing fields starting with 045
144Z local library keywords
145S local library classification
145Z local library classification

Meanwhile we have further identified:

010@ language
013E musical type of document
013H additional type of document
017G and 017H URL for catalog enrichment (e.g. table of contents)
047I abstract

The definition of subject fields for MQA is likely stricter, so I'd say:

041A keywords
044. all subject indexing fields starting with 044
045. all subject indexing fields starting with 045
144Z local library keywords
145S local library classification

The current list of 044. and 045. fields can be obtained via:

curl -s 'https://format.k10plus.de/avram.pl?profile=k10plus-title' | jq -r '.fields|keys[]' | grep '^04[45]'
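
For anyone who prefers to do this filtering in code, here is a minimal Python sketch of the same jq/grep step. It assumes the Avram schema has already been downloaded and parsed into a dict with a top-level `fields` object (as the jq expression implies). The function name and the sample data are made up for illustration and are not part of MQA or the K10plus schema.

```python
import re

def subject_field_tags(schema: dict) -> list[str]:
    """Return the sorted keys of schema['fields'] that start with 044 or 045."""
    pattern = re.compile(r"^04[45]")
    return sorted(tag for tag in schema.get("fields", {}) if pattern.match(tag))

# Hypothetical excerpt of a parsed schema; the real one has many more fields.
sample_schema = {
    "fields": {
        "003@": {},
        "041A": {},
        "044K": {},
        "045A": {},
        "045B": {},
        "145S": {},
    }
}

print(subject_field_tags(sample_schema))
```

Sorting mirrors jq's `keys`, which also returns keys in sorted order.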

@pkiraly
Owner Author

pkiraly commented Sep 23, 2022

@nichtich wrote:

Attached is a file that could be used as a configuration file for MQA for subject headings. Each entry has

  • PICA : which PICA field(s) (e.g. 045B)
  • uri : which vocabulary (BARTOC entry, more details about the vocabulary can be found there)
  • prefLabel: name of vocabulary
  • ID: where to take local identifier/notation from
  • notationPattern : regular expression to check local identifier/notation
  • namespace : URI namespace (for some vocabularies)
  • SRC : not relevant to this question
  • VOC : not relevant to this question

The ID is given as regular expression including subfield code and with capturing group, e.g.

"PICA": "044L/00-99",
"ID": "^7gnd/(.+)"

means that all fields 044L (any occurrence) are GND and the GND identifier is in subfield $7, preceded by "gnd/".
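
A rough Python sketch of how such an "ID" expression might be applied: each subfield is turned into the string "code + value" and matched against the pattern, and the first capturing group is taken as the identifier. The representation of a field as a list of (code, value) pairs, the function name, and the sample values are all assumptions for illustration, not how MQA actually does it.

```python
import re
from typing import Optional

def extract_id(id_pattern: str, subfields: list[tuple[str, str]]) -> Optional[str]:
    """Return the first capture group of id_pattern matched against code+value."""
    regex = re.compile(id_pattern)
    for code, value in subfields:
        match = regex.match(code + value)
        if match:
            return match.group(1)
    return None  # no subfield carries an identifier in the expected form

# Made-up 044L field content, only to show the mechanics.
field_044l = [
    ("9", "000000000"),
    ("7", "gnd/4000000-0"),
    ("a", "Example heading"),
]

print(extract_id("^7gnd/(.+)", field_044l))
```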

The regular expression to check for valid GND identifiers could be part of the Avram schema, but that is a separate task.

[
  {
    "ID": "^a(.+)",
    "PICA": "045A",
    "SRC": "^A(.+)",
    "VOC": "lcc",
    "notationPattern": "[A-Z]{1,3}([0-9]+(\\.[0-9]+)?( *\\.?[A-Z]{0,3}[0-9]*([ -]\\.?[A-Z]{0,3}[0-9]+)?)?( *[0-9]+[a-z]*)?)?",
    "namespace": "http://id.loc.gov/authorities/classification/",
    "prefLabel": { "en": "Library of Congress Classification" },
    "uri": "http://bartoc.org/en/node/486"
  },
  ...
]
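
To illustrate how `notationPattern` and `namespace` from the entry above could be used together, here is a small Python sketch. It assumes the pattern is meant to match the whole notation (hence `fullmatch`) and that a concept URI is formed by appending the notation to the namespace. Both are assumptions, and the helper names and test notations are made up.

```python
import re
from typing import Optional

# The two relevant keys copied from the LCC entry above.
lcc_entry = {
    "notationPattern": "[A-Z]{1,3}([0-9]+(\\.[0-9]+)?( *\\.?[A-Z]{0,3}[0-9]*([ -]\\.?[A-Z]{0,3}[0-9]+)?)?( *[0-9]+[a-z]*)?)?",
    "namespace": "http://id.loc.gov/authorities/classification/",
}

def is_valid_notation(entry: dict, notation: str) -> bool:
    """Check the notation against the entry's pattern, anchored at both ends."""
    return re.fullmatch(entry["notationPattern"], notation) is not None

def notation_to_uri(entry: dict, notation: str) -> Optional[str]:
    """Build namespace + notation, or None if invalid or no namespace is given."""
    namespace = entry.get("namespace")
    if namespace is None or not is_valid_notation(entry, notation):
        return None
    return namespace + notation

print(is_valid_notation(lcc_entry, "QA76.73"))
print(is_valid_notation(lcc_entry, "qa76"))
print(notation_to_uri(lcc_entry, "QA76.73"))
```

Notations containing spaces would need URL encoding before being appended; the sketch leaves that out.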
