Replace caret (visual whitespace) with whitespace in MARC LCCN subfields #12468

Merged
hornc merged 2 commits into internetarchive:master from hornc:lccn-trim on Apr 28, 2026

@hornc (Collaborator) commented Apr 28, 2026

Replace caret (visual whitespace) with whitespace in MARC LCCN subfields

Example import MARC:
https://openlibrary.org/show-records/harvard_bibliographic_metadata/20220215_038.bib.mrc:21930613:2077

Using a caret as a visible stand-in for whitespace is non-standard and hopefully rare, but it does occur occasionally in imported MARC records.
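
A minimal sketch of the idea, not the code merged in this PR: treat `^` as a visible placeholder for a space before running the LCCN regex. The function name `extract_lccn`, the pattern `LCCN_PATTERN`, and the sample values are all hypothetical, chosen only for illustration; the actual pattern and control flow in `openlibrary/catalog/marc/parse.py` may differ.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: up to three prefix letters, optional
# whitespace, then 8 to 10 digits. This is an assumption for illustration,
# not the regex used by Open Library.
LCCN_PATTERN = re.compile(r"([a-z]{0,3}\s*\d{8,10})", re.IGNORECASE)


def extract_lccn(subfield_value: str) -> Optional[str]:
    """Return a candidate LCCN from a MARC 010$a value, or None if none is found."""
    # Some records use '^' where a blank is intended, so map it to a real
    # space before matching.
    cleaned = subfield_value.replace("^", " ")
    match = LCCN_PATTERN.search(cleaned)
    if match is None:
        return None
    # Drop all whitespace inside the matched text.
    return "".join(match.group(1).split())


# Made-up sample values, not taken from the example record.
print(extract_lccn("^^^84014285^"))   # 84014285
print(extract_lccn("sn^^84014285"))   # sn84014285
print(extract_lccn("^^^"))            # None
```

Replacing the caret with a space, rather than deleting it, keeps the value in the same shape as a record that used real blanks, so any later whitespace handling applies to both in the same way.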

Copilot AI review requested due to automatic review settings April 28, 2026 23:07

Copilot AI (Contributor) left a comment

Pull request overview

Updates MARC LCCN parsing to remove whitespace and caret (^) characters from 010$a values before extracting the LCCN.

Changes:

  • Normalize LCCN subfield values by removing spaces and ^ prior to regex extraction.

Comment thread openlibrary/catalog/marc/parse.py Outdated
@hornc changed the title from "strip whitespace and caret (visual whitespace) from LCCNs" to "Replace caret (visual whitespace) with whitespace in MARC LCCN subfields" on Apr 28, 2026
@hornc merged commit 0fb88c3 into internetarchive:master on Apr 28, 2026
3 checks passed
@hornc deleted the lccn-trim branch on April 28, 2026 23:28