Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

work.is_part_of metadata attribute usage in forms conflicts with AdminSet behavior #1461

Open
little9 opened this issue Aug 9, 2017 · 37 comments
Labels
Milestone

Comments

@little9
Copy link
Contributor

little9 commented Aug 9, 2017

Descriptive summary

In Hyrax 2.x, isPartOf is used for AdminSet membership. If you create a Hyrax app that uses
isPartOf and include it in a form, you're unable to save the work and if you edit an existing object
the property is replaced with a string.

Rationale

isPartOf is a common metadata term and it would be reasonable to think that institutions may want
to have the term be editable in forms.

Expected behavior

Adding is_part_of to the work's form object behaves like any other attribute.

Actual behavior

After adding work.is_part_of to the form:

29122234-8d8446c6-7cdf-11e7-8b44-f4ac6bdaf885

Trying to save a new work that has a work.is_part_of attribute raises this error:

Couldn't find Sipity::Workflow with [WHERE `sipity_workflows`.`active` = ? AND `sipity_workflows`.`permission_template_id` IN (SELECT `permission_templates`.`id` FROM `permission_templates` WHERE `permission_templates`.`admin_set_id` = '#<ActiveTriples::Resource:0x007f78451adc08>')]

Steps to reproduce the behavior

  1. In the work's Form class add is_part_of to self.terms
  2. Save or edit a work

Related work

curationexperts/epigaea#149

@mjgiarlo
Copy link
Member

mjgiarlo commented Aug 9, 2017

Short of rejiggering the Work/AdminSet relationship to use a different predicate -- which I'm not sure we want to change in Hyrax 2.0.0, and Hyrax 3.0.0 shouldn't come out until summer 2018 -- could folks use a different predicate for their descriptive metadata?

Consulting the @samvera/hyrax-metadataists!

@mark-dce
Copy link
Contributor

@mjgiarlo yes, folks obviously could use a different predicate; however, none that we are aware of, stand out as obvious or clear candiates. Hence this ticket.

We've discussed alternatives internally to DCE and have not come up with useful alternatives - partially because the specific current use case also implements dc.has_part, dc.has_format, dc.is_format_of, dc.replaces, and dc.is_replaced_by. In this broader context, using a non-dc predicate for is_part_of (because of the collision with internal Admin Set use) breaks a clear pattern present with other descriptive metadata - i.e. there's a clear parallelism when describing things using dc.is_part_of / dc.has_part that is broken when you use other_vocab.is_subunit_of / dc.has_part.

I guess my questions would be "How much are AdminSets meant to be externally referenced or to support behavior outside the repository?" and "Are there characteristics of the AdminSet relationship that would be more clearly represented by a narrower or application specific predicate?"

If the AdminSet relations are meant to be first class relationships that are actionable outside of Samvera, then dc.is_part_of probably does make sense. If AdminSets really aren't meaningful outside the context of Samvera and it's workflows, then I'd lobby to free up the current predicate and choose something more narrowly defined.

@ghost
Copy link

ghost commented Aug 10, 2017

I tend to agree with @mark-dce - for what is essentially an internal relationship, it would seem better to have an internally minted predicate like fedora does. What would be the impact of changing it? ... would it mean any live Hyraxes would need to run an update script to 'refresh' the metadata?

@jcoyne
Copy link
Member

jcoyne commented Aug 10, 2017

When using RDF we should be able to have multiple subsystems making assertions using the same predicate. It seems to me that the bug here is that belongs_to is replacing assertions it shouldn't.

@no-reply
Copy link
Contributor

@jcoyne can you say more?

The behavior we are seeing is that, given a work with property :part_of, predicate: RDF::Vocab::DC.isPartOf, my_work.part_of returns the admin sets. This is the expected behavior, right?

I think it's possible, in principle, to filter the Relation returned by #part_of to exclude resources of type AdminSet. However, it also looks to me like that would require round trips to fedora.

Is there a solution you have in mind?

@jcoyne
Copy link
Member

jcoyne commented Aug 10, 2017

Yes, I would think it's encumbent on the form to filter just the types it expects. I don't think it's improper for Works to express their relationships to AdminSets by using DC.isPartOf. Perhaps your usage needs a more narrowly scoped predicate?

@no-reply
Copy link
Contributor

Yes, I would think it's encumbent on the form to filter just the types it expects.

As noted, this would involve retrieving any fedora objects in the Relation to check their types. My sense is that this would be prohibitively expensive.

@Alexandermay
Copy link

This is just a comment from the Tufts' perspective: isPartOf is commonly defined, and frequently used, by most cultural heritage institutions, and by limiting it to an AdminSet membership, I worry that you would be forcing institutions to pick new a predicate for this concept. That institutions would all pick the same predicate seems unlikely, and that may impact future interoperability. It strikes us that a backend local predicate for the AdminSet makes more sense, as this would allow for more straightforward mappings.

@jlhardes
Copy link
Contributor

I don't know if it's a good idea for AdminSet to move away from using dcterms:isPartOf but it does seem to be causing conflicts. The following suggested predicates can either be looked at as a different predicate to use for Work/AdminSet relationships or offered as something that could be used by folks for their descriptive metadata when adding a field to express this kind of connection:

frbr:partOf - http://purl.org/vocab/frbr/core#partOf
schema:isPartOf - http://schema.org/isPartOf (although that seems to want the object of the statement to be a CreativeWork which might not quite fit with all objects that are part of something else)
dbpedia-owl:isPartOf - http://dbpedia.org/ontology/isPartOf

All of these require the object of the statement to be a URI of some kind but that makes sense to expect that if an object in Hyrax is part of something else that something else is also a digital object that can be referenced via URI. If the use of AdminSet is as internal as folks here are describing, it might be worth considering an internally-defined predicate for the Work/AdminSet connection so this kind of conflict is avoided altogether.

@mjgiarlo
Copy link
Member

mjgiarlo commented Aug 10, 2017

@no-reply 💬

As noted, this would involve retrieving any fedora objects in the Relation to check their types. My sense is that this would be prohibitively expensive.

Could we get that information from Solr?

If not -- and I suspect the concerns about performance are shared -- I'm fine with changing this predicate in Hyrax master for the 2.0.0 release per the arguments made above.

If we do make that decision, I would politely ask that whoever makes this change also commit to the following bits of work:

  1. Create a migration tool that alters this relationship for folks who are already using DC.isPartOf for the Work-AdminSet relationship (anyone running Sufia 7, CC 1/2, Hyrax 1/master in production). It's important to have this in Hyrax master since we've said that Hyrax 2 will be the bridge release for CC 2 folks.
  2. Test the tool with a Samvera community member who's got such data to confirm it works in a real production context and doesn't have unintended side effects.

Do folks find that reasonable?

@jlhardes
Copy link
Contributor

@elrayle - Is there anything from the Collections Extensions work that would impact or inform how this relationship should be defined between AdminSet and Work?

@mark-dce
Copy link
Contributor

@mjgiarlo - The DCE team would like to make a counter-proposal that might allow us to approach the issue in a more incremental and flexible way:

PHASE ONE
As part of the Hyrax 2.0.0 release

  1. Make the predicate used to associate Works to AdminSets an easily configurable option
  2. Set the default for the predicate to match the current implementation in Hyrax 1.x (i.e. ::RDF::Vocab::DC.isPartOf)

This is sufficient to meet the needs of the Greenfield 2.x adopters who wish to use DC.isPartOf as part of their own modeling, and use a different predicate for AdminSet relationships. At the same time, it does not interfere with users migrating production implementations from Hyrax 1.x based repositories.

PHASE TWO
At a later date in the 2.x release stream
3. Select a more narrowly defined predicate, perhaps something as specific as ::RDF::Vocab::Samvera.belongsToAdminSet
4. Provide a deprecation notice that the usage will be changed in 3.x
5. Provide migration tooling as described in Mike's comment above

This allows folks with existing 1.x based implementation to plan a scheduled data (predicate) migration independently of updating their application to use the 2.x codebase. The sooner we can do steps 3 and 4 the better, but the wouldn't have to be in place at the 2.0.0 release. The hardest part here might be reaching consensus on the new predicate.

PHASE THREE
As part of the 3.0.0 release
6. Change the default for the predicate used internally to indicate AdminSet membership to something other than DC.isPartOf

If folks want to carry forward the use of DC.isPartOf from their 1.x or 2.x implementation, they're free to keep the default configuration they've been using. Greenfield deployers of 3.0.0 get a more narrowly defined predicate for the AdminSet relationship when the application is initially generated.

===========
DCE can definitely signs up for PHASE ONE. We can probably help on a significant portion of PHASE TWO. After that, PHASE THREE feels almost trivial (other than remembering to do it).

@jeremyf
Copy link
Contributor

jeremyf commented Aug 23, 2017

I like @mark-dce's breakdown and phased implementation. It makes sense to narrow the administrative scope of this predicate

@mjgiarlo
Copy link
Member

mjgiarlo commented Aug 23, 2017

@mark-dce @no-reply :shipit:

@no-reply no-reply self-assigned this Aug 23, 2017
@no-reply no-reply modified the milestones: 2.0.0, Backlog Aug 23, 2017
@vantuyls vantuyls modified the milestones: 2.0.0, Backlog Sep 20, 2017
@mjgiarlo
Copy link
Member

@mark-dce @no-reply do y'all want this in the 2.0.0 release? We're getting close to 2.0.0.rc1, so we'd need this to drop pretty soon to include it in 2.0.0. Else it can go into 2.1.0, which I hope will follow 2.0.0 pretty soon afterward.

Heads-up also to @vantuyls.

@mjgiarlo mjgiarlo added this to the 2.x series milestone Sep 21, 2017
@mark-dce
Copy link
Contributor

@mjgiarlo can we get back with a PR by Monday AM 10/25 if possible and otherwise let it slide to 2.1? (We need it in a project that is shipping immanently and it would be cooler to have it here than in the local project, but I'm not 100% sure what else we need to get done first)

@mjgiarlo
Copy link
Member

@mark-dce 9/25? Absolutely.

no-reply pushed a commit that referenced this issue Sep 25, 2017
Allow a custom predicate to be used to relate objects to their administrative
sets. The default remains `dcterms:isPartOf`. This is the first step in a
process to change the default predicate for this relation. A follow-up PR, to be
merged after the Hyrax 2.0.0 release, will provide warnings for changed usage
and add migration tooling.

This addresses "Phase 1" of #1461 as decscibed at
#1461 (comment).
@elrayle
Copy link
Contributor

elrayle commented Sep 29, 2017

@no-reply Can you point me to code where...

The behavior we are seeing is that, given a work with property :part_of, predicate: RDF::Vocab::DC.isPartOf, my_work.part_of returns the admin sets. This is the expected behavior, right?

I think it's possible, in principle, to filter the Relation returned by #part_of to exclude resources of type AdminSet. However, it also looks to me like that would require round trips to fedora.

Here is what I see...

>> my_work = GenericWork.find('2r36tx526')
>> my_work.admin_set_id
=> "w66343603"

>> my_work.part_of
NoMethodError: undefined method `part_of' for #<GenericWork:0x007f9a04af22d8>
>> my_work.isPartOf
NoMethodError: undefined method `isPartOf' for #<GenericWork:0x007f9a04af22d8>

@elrayle
Copy link
Contributor

elrayle commented Oct 1, 2017

@no-reply I am hoping you can respond soon. I would like to have this resolved before 2.0 is released.

I would like to be sure I understand the issue you are seeing, especially, with respect to the need for round trips to fedora.

@no-reply
Copy link
Contributor

no-reply commented Oct 3, 2017

@elrayle you can reproduce the original issue defining a property using the predicate. e.g. GenericWork.property :isPartOf, predicate: ::RDF::DC.isPartOf.

The issue with respect to data round trips is that the member object does not know the type of the objects in the dc:isPartOf relation. Any solution for filtering will require fetching that data from elsewhere when the property is accessed. This will either need to apply to all predicates, or whatever predicate is in place for the admin set relation will need to be a special case.

I'm confused about the need to resolve this prior to 2.0. Can you explain the issue the collections group has with a configurable predicate?

@elrayle
Copy link
Contributor

elrayle commented Oct 3, 2017

@no_reply Asked the question in Slack...

I'm not sure I understand the issue with the original three-step plan on #1461.
particularly, I'm not clear on how the changes already merged for it negatively impact the collections wg's work.

Short Term Reality

The Collection Extensions (CE) work currently mimics Admin Sets as Collections in the UI. Processing for new/edit/show are still handled separately with Admin Sets using the original admin set code and collections using the CE code. Processing for workflows and visibility remain only available to Admin Sets. The reasoning was two-fold...

  1. Although the requirements gathering process suggested that there is a desire to have collections support workflows and visibility controls, the actual implementation is not straight forward and would have over complicated the initial implementation of CE.
  2. A lot of work has been completed in existing applications that make use of Admin Sets. Any changes to Admin Sets would require migration. The CE work is attempting to keep the migration impact to a minimum.

Long Term Goal

The long term goal is still to explore having Admin Sets be just another collection type. The CE work has kept this in mind and made some choices that set the stage for that work.

The advantages are...

  • one code base to maintain for all things grouping together works
  • closer alignment with the original PCDM model
  • allow Admin Sets to be more visibility policy oriented (their original intent)
  • creating a space, or at least a conversation, around what sets a workflow for a work (some sites have a preference for setting workflow by means other than an object living in an admin set, e.g. based on file type of document uploaded, based on organizational unit responsible for the works, based on purpose of the work (etd, research, curated work, etc.)
  • supporting sites in configuring their apps to meet local needs without extensive customization required

Conflict with the proposed phased change to Admin Set predicate

If Admin Set becomes just another collection type, then according to the PCDM model, the relationship between the Admin Set (i.e. Collection) and the Works in it would be...

<work> <pcdm:memberOf> <admin_set>

To use any other predicate sets the stage for requiring a migration of data at a later date.

@no-reply
Copy link
Contributor

no-reply commented Oct 3, 2017

All of that background makes sense to me, but I still can't connect the dots to how it changes the assessment of the issue we arrived at in early August.

The issue at that point was that we currently use a common predicate dc:isPartOf for AdminSet relations. People wanted to use another one (I suspect the stakeholders would be happy with pcdm:memberOf), but doing so would require a data migration. We had requirements with respect to migration tooling that we wanted to hold ourselves to before changing the default behavior. The timeline for developing that tooling seemed unrealistic for a 2.0 release.

Where we landed was that we would make the predicate configurable in 2.0.0, with an eye toward changing the default in a later release, after migration tooling had been released.

The questions I have are:

@elrayle
Copy link
Contributor

elrayle commented Oct 3, 2017

I see where you are coming from with the desire to have the migration tooling in place prior to requiring a change in the predicate.

I would be more comfortable with this proposal if...

Phase 1: Set the default to pcdm:memberOf and include release notes that indicated to avoid migration, override the default and set it to dc:isPartOf.

This would allow existing sites to avoid migration issues, but encourage new sites to use the predicate from the PCDM:model.

@no-reply
Copy link
Contributor

no-reply commented Oct 3, 2017

Phase 1: Set the default to pcdm:memberOf and include release notes that indicated to avoid migration, override the default and set it to dc:isPartOf.

Does this work out of the box? I believe this runs into the same predicate conflict issues that gave rise to the original issue. I.E. #admin_set would return all of the collections or objects that a given object is pcdm:memberOf.

@elrayle
Copy link
Contributor

elrayle commented Oct 3, 2017

@no-reply Can you show me the exact code that is causing the problem of returning all objects with dc:isPartOf? Ideally pointing me to the code in github so I can see it in context.

@elrayle
Copy link
Contributor

elrayle commented Oct 4, 2017

@no-reply I see where .object is called all over ActiveFedora. I do not see it called in Hyrax. Is there a particular part of Hyrax where this was giving you difficulties? Most of the UI is driven off Solr and not Fedora. And the search builders generally limit on model. Where in Hyrax is this causing a difficulty?

@no-reply
Copy link
Contributor

no-reply commented Oct 4, 2017

@elrayle the issue, its manifestation in the UI, and the steps to reproduce are documented in the issue text. I'm not sure what to add.

@mark-dce
Copy link
Contributor

mark-dce commented Oct 4, 2017

@elrayle - let me see if I can provide my "manager's hat" understanding of the issue and proposal. Other folks ( @mjgiarlo @no-reply @jeremyf @geekscruff ) may want to jump in with more technical or historical background.

Background
There is currently a usage of ::RDF::Vocab::DC.isPartOf at https://github.com/samvera/hyrax/blob/v2.0.0.beta4/app/models/admin_set.rb#L43 for AdminSet relationships that was originally inherited from the AdminSetBehaviors implementation in CurationConcerns here: https://github.com/samvera/curation_concerns/blob/v2.0.0.rc1/app/models/concerns/curation_concerns/admin_set_behavior.rb#L26

PROBLEM
Attempting to define a new metadata term in Hyrax that uses ::RDF::Vocab::DC.isPartOf as it's predicate causes a collision with the existing usage for admin sets.

Deeper Analysis
It's probably possible to filter the use of the predicate specifically in relation to admin sets in such a way that a more general use of is_part_of could be made, but there are concerns this kind of solution would not be performant. Beyond that, and aligning with your comments, it seems like a narrower, more specific predicate would be better for defining the AdminSet relationship. However, there are folks with both CC 2.x and Hyrax 1.x implementations in place that would be broken if we change the existing predicate. There isn't time to produce and test the necessary migration tool to address a predicate change that we could include in the Hyrax 2.0.0 release.

Proposed immediate solution

  1. Make the predicate for the admin set configurable so that folks building greenfield Hyrax 2.0.0 applications can choose a predicate other than ::RDF::Vocab::DC.isPartOf so is_part_of is freed up for other usage.
  2. Set the default to the existing predicate used in CC 2.x and Hyrax 1.x for backward compatibility for the folks who want to upgrade existing repos.

Proposed long term solution

  1. Agree on a common default predicate for admin set relations. I like pcdm:memberOf as part of the larger collection work - this seems really sane for the proposed collection modeling changes.
  2. Provide a migration tool that allows folks to migrate CC 2.x, Hyrax 1.x, and Hyrax 2.0.x collections to the collection model implemented in Hyrax 3.x (since by definition a change to the collection model which would require a data migration would be a breaking major version change).

The prior discussion centered around the two-phased solution since there's general consensus not to delay the 2.0.0 release for community agreement on a new predicate.

@mark-dce
Copy link
Contributor

mark-dce commented Oct 5, 2017

@elrayle Re-reading the entire thread an your comments, it seems like there is general consensus that the community would like to move away from the current implementation's use of ::RDF::Vocab::DC.isPartOf. So, it feels like the crux of the issue is

  1. Whether to set the default for backward compatibility or toward a preferred future predicate (with appropriate release notes)
  2. Whether the community can reach consensus on the forward-looking predicate in the very near future.

Does that seem like the right set of questions?

@elrayle
Copy link
Contributor

elrayle commented Oct 9, 2017

@mark-dce Yes, I agree with your summary. Several additional considerations.

  1. Moving toward the preferred future predicate now means any new apps created now would not face a known-migration in the near future.
  2. With the configurable predicate, existing apps can avoid the need for migration now in favor of a later migration when tools are available.
  3. Selecting pcdm:memberOf as the default predicate sets the stage for admin sets to become just another collection type

@mjgiarlo
Copy link
Member

mjgiarlo commented Oct 9, 2017

@elrayle @mark-dce @no-reply @jlhardes cc: @vantuyls Good points, all.

I'm hesitant to change the default this late in the release process of 2.0.0. Could we address this via documentation in the meantime, letting folks know of collection extension changes that are coming and recommending usage of the community-preferred predicate (from @mark-dce's 2. above)? If so... we need to figure out what the community-preferred predicate is, i.e., have a process for getting this done and a volunteer for taking the lead.

@no-reply
Copy link
Contributor

Bumping this to 3.x. The customizability included in the original work seems to have addressed the immediate issue.

@pscdodd
Copy link

pscdodd commented Mar 8, 2019

Added to help with finding this issue. Also get the following error in samvera 2.4.1:
ActiveRecord::RecordNotFound (Couldn't find Sipity::Workflow with [WHERE sipity_workflows.active = ? AND sipity_workflows.permission_template_id IN (SELECT permission_templates.id FROM permission_templates WHERE permission_templates.source_id IS NULL)])

no-reply pushed a commit that referenced this issue Sep 21, 2019
no-reply pushed a commit that referenced this issue Sep 21, 2019
elrayle pushed a commit that referenced this issue Sep 24, 2019
orangewolf pushed a commit to scientist-softserv/atla_digital_library that referenced this issue Nov 14, 2022
There are two changes to `BasicMetadata` between `CurationConcerns` 1.7.x and
`Hyrax` 2.3.x:

  - `dc:isPartOf` is now bound to AdminSets by
    default; (see: samvera/hyrax#1461). We need to
    determine if the existing app is using `dc:isPartOf`. If so, we can set
    `config.admin_set_predicate = some_other_predicate`, and reinstate the
    `part_of` property.
  - The property `#rights` (predicate: `dc:rights`) has been renamed
    `#license`. The best solution is may be to alias `#rights` and
    `#rights=`. If we need to retain the display name, that can be handled with
    `i18n` values.

There's a third issue which seems to exist in the legacy app:

  - In `BasicMetadata`, `#keyword` is mapped to `dc11:relation`. We have a
  custom mapping from `#relation` to the same predicate. This makes these
  properties duplicative. We need to look closer at the existing code to
  determine whether we should remove `#keyword`, `#relation`, or allow them to
  remain as properties projecting on the same BGP.

Connected to #45.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests