This repository has been archived by the owner on May 11, 2022. It is now read-only.

Remove modsulator dependency #181

Closed
wants to merge 1 commit into from

@cbeer cbeer commented Feb 23, 2016

No description provided.

cbeer commented Feb 25, 2016

This copies only the relevant normalization from modsulator into a Dor::Utils::Normalizer class. @tingulfsen and I discussed alternatives (including copying normalizer into a separate gem), but we agreed the similarity is relatively trivial and may not be worth doing.

I have a separate question for @LynnMcRae -- it's not clear to me why we're trying to normalize (only) rightsMetadata (or, it's not clear to me where unnormalized XML would actually come from). It's very possible we don't need normalization here at all.

@LynnMcRae

@cbeer re: it's not clear to me why we're trying to normalize (only) rightsMetadata

It's not just rightsMetadata, it's also MODS (hence its reference from modsulator), and generic normalization could be applied to identityMetadata (which currently does its own cleanup I think), contentMetadata, etc. I would go so far as to say inserting it into the chain of saving XML in most places would be useful. We're motivated by a number of factors ... foremost is that we have processes and tools producing really crappy XML ... empty tags (rightsMetadata happens to be a prime offender), random blank lines, inconsistent formatting, etc. Partially it's just formatting niceties, which would normally be less important than any number of things, but the fact that they become preserved documents makes us want consistent human-readable artifacts. We also don't want syntax differences that introduce no semantic change to trigger diffs and versioning; consistent cleanup and stable formatting help with that. We worked on common normalization so that achieving this did not require policing every source and every program and every process that produces XML, e.g., activeFedora, to "do the right thing".
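For readers unfamiliar with the idea, here is a minimal sketch of that kind of cleanup: drop empty elements and stray blank lines, then serialize with fixed indentation so that formatting-only differences disappear. It is not the modsulator or Dor::Utils::Normalizer code. The `XmlCleanupSketch` module and its method names are hypothetical, and it uses REXML rather than whatever parser the real normalizer relies on. It also assumes that an element with no text and no children counts as empty even when it carries attributes, which may be wrong for elements whose mere presence is meaningful.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical illustration only; not the Dor::Utils::Normalizer API.
module XmlCleanupSketch
  module_function

  # True when the node is a text node holding only whitespace.
  def blank_text?(node)
    node.is_a?(REXML::Text) && node.value.strip.empty?
  end

  # Depth-first cleanup: children are handled first, so a parent that
  # ends up with nothing inside it is removed as well.
  # Assumption: whitespace-only text between elements is insignificant
  # (data-oriented XML, no meaningful mixed content).
  def prune!(element, keep: false)
    element.elements.to_a.each { |child| prune!(child) }
    element.children.select { |node| blank_text?(node) }.each(&:remove)
    element.remove if element.children.empty? && !keep
  end

  # Parse, prune everything below the root, and re-serialize with a
  # fixed two-space indentation. The root element is always kept.
  def normalize(xml)
    doc = REXML::Document.new(xml)
    prune!(doc.root, keep: true)

    formatter = REXML::Formatters::Pretty.new(2)
    formatter.compact = true # keep short text-only elements on one line
    output = +''
    formatter.write(doc.root, output)
    output
  end
end

messy = <<~XML
  <rightsMetadata>
    <copyright><human>Example copyright text</human></copyright>


    <use>
      <human type="creativeCommons"/>
      <machine type="creativeCommons"/>
    </use>
  </rightsMetadata>
XML

puts XmlCleanupSketch.normalize(messy)
```

Running the output through `normalize` a second time should return the same string, which is the property that keeps formatting noise from triggering new diffs or versions.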

cbeer commented Feb 25, 2016

What processes and tools are producing bad rightsMetadata?

@LynnMcRae

Revs has a lot of this (e.g., https://argo.stanford.edu/catalog/druid:bb000kq3835)

  <use>
    <human type="creativeCommons"/>
    <machine type="creativeCommons"/>
  </use>

Assembly tools produced inconsistent results over time -- long history, many hands. MDToolkit was a nightmare for MODS and the origin of the normalizing approach. APOs, which often supply default rights, were being made in Argo, Hydrus, and by hand, and those programs were not until recently careful about that (only new Argo produces the clean defaults we want, thanks to normalizer!). I mentioned activeFedora because I understood that somewhere in the processing chain a memory model was persisted and defined elements could be materialized whether they were used or not. We've probably compensated over time for a lot of the fallout, like PURL skipping such things rather than showing empty labels, but we still occasionally get tripped up.

All this becomes moot with Fedora4 and RDF, where we get consistent output on the serializations we produce for PURL and Preservation. This is just the same principle applied further upstream since we currently traffic in the XML as a concrete resource.

@atz atz closed this Feb 26, 2016
@jcoyne jcoyne deleted the no-modsulator branch March 14, 2018 18:59