Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand merging to Localities, Paleo Ctxt, Collecting Event #4606

Open
wants to merge 40 commits into
base: production
Choose a base branch
from

Conversation

CarolineDenis
Copy link
Contributor

@CarolineDenis CarolineDenis commented Mar 4, 2024

Fixes #4046

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

  • perform same tests as agent merging but for new tables
  • Verify that we can’t merge PC and CE records when they are configured to be embedded

Utils:
https://discourse.specifysoftware.org/t/shared-vs-embedded-records-configuration/1712/1

@CarolineDenis
Copy link
Contributor Author

CarolineDenis commented Mar 5, 2024

What would be the logical new code?
@maxpatiiuk , @realVinayak

MergeSubviewButton line 52, CompareSubView.tsx

    () =>
      relationshipIsToMany(relationship)
        ? (resource as SpecifyResource<Accession>).getDependentResource(
            relationship.name as 'accessionAgents'
          )?.models.length ?? 0
        : resource.get(relationship.name) === undefined
        ? 0
        : 1,
    [relationship, resource]
  );

In the code above, relationship is a 'zero-to-one'.
relationshipIsToMany(relationship) returns true because

export const relationshipIsToMany = (
  relationship: Relationship | undefined
): boolean =>
  relationship?.type.includes('-to-many') === true ||
  relationship?.type === 'zero-to-one';

models.length is now throwing an error for other tables

Why do we have zero-to-one being consider as a IsToMany?

@grantfitzsimmons grantfitzsimmons self-requested a review March 5, 2024 16:31
@maxpatiiuk
Copy link
Member

maxpatiiuk commented Mar 6, 2024

In the code above, relationship is a 'zero-to-one'.
relationshipIsToMany(relationship) returns true because

zero-to-one is a "hack" relationship type introduced at some point in sp6 to do a quick fix for some user request, and now we have to live with the consequences (that's why quick fixes aren't a good idea most of the time)
I must admit, while Ben explained it to me several times, I didn't fully understand what zero-to-one means or why it's necessary. If I remember correctly, zero-to-one means "one-to-one" from one direction, but "many-to-many" in the reverse duration, or something like that, which is why in relationshipIsToMany we say true for zero-to-one fields.

But, @realVinayak or @melton-jason might be able to provide a better description that me based on their knowledge of back-end, business rules, databases and sp6

If it becomes really necessary, and we can't infer what zero-to-one means from the back-end code, we could contact Ben to ask him to explain it one more time


edit:

my messages to Ben about this:

Ok, I implemented zero-to-one. Tell me if I did it correctly:
they are handled as -to-many's:
image
but if you addded a -to-many item, you can not add another one (there is no Add button)

image

Ben:

That'll work. thanks


See also helpful information in https://specifydev.slack.com/archives/CC6V12D3J/p1617992659051300


Max:

I just found a relationship type I haven't seen before: zero-to-one
How should I treat these? are those like one-to-one but one directional?

Ben:

Yes. I don't know why it is called that. It is kind of stupid. It is just a one-to-one.

@melton-jason
Copy link
Contributor

melton-jason commented Mar 6, 2024

might be able to provide a better description that me based on their knowledge of back-end, business rules, databases and sp6

#4606 (comment)

The assesment @maxpatiiuk provided is correct to my knowledge.

There are two zero-to-one relationships in Specify and both on the Locality table:
localityDetails and geoCoordDetails.

I don't think the backend of Specify 7 has any dedicated logic to handling zero-to-one relationships, besides marking the to-"many" side of the relationship as dependent on the to-one side.

'Locality.geocoorddetails',
'Locality.latlonpolygons',
'Locality.localitycitations',
'Locality.localitydetails',

At a database level, it is indistinguishable from a one-to-many relationship.
That is, the LocalityDetail and GeoCoordDetail tables contain a ForeignKey to the Locality table. There are no other constraints in place, so it is perfectly valid from the database's perspective for a Locality to have more than one LocalityDetails or GeocoordDetails.

However, for whatever reason the restriction was needed in Specify 6 to enforce the relationship as a one-to-one. (i.e. a maximum of one LocalityDetails/GeocoordDetails per Locality).
This restriction is enforced and managed entirely within the application (MySQL or Mariadb has no concept of the relationship type or the restriction that is desired).

In the Specify 6 code, the relationship was denoted ZeroOrOne, but used zero-to-one when writing the specify_datamodel.xml.

https://github.com/specify/specify6/blob/945d28f00288d2d64979848fb2fb63fda34bdd1b/src/edu/ku/brc/af/core/db/DBRelationshipInfo.java#L36

Although, when working with the ORM Specify6 uses (Hibernate), the columns still needed to be mapped as OneToMany (on Locality) and ManyToOne (on LocalityDetails/GeocoordDetails)

localityDetails relationship on Locality:
https://github.com/specify/specify6/blob/945d28f00288d2d64979848fb2fb63fda34bdd1b/src/edu/ku/brc/specify/datamodel/Locality.java#L1083-L1091

locality relationship on LocalityDetail:
https://github.com/specify/specify6/blob/945d28f00288d2d64979848fb2fb63fda34bdd1b/src/edu/ku/brc/specify/datamodel/LocalityDetail.java#L935-L940

It seems they were first added in specify/specify6@3126d02, but I can not find any documenttation or reasoning as to why this restriction was needed or desired in the first place.

The relationship can be approached conceptually as "A Locality can have zero or one associated LocalityDetail/GeocoordDetail records".

@maxpatiiuk
Copy link
Member

Thanks for the detailed analysis @melton-jason!

The relationship can be approached conceptually as "A Locality can have zero or one associated LocalityDetail/GeocoordDetail records".

Interesting, because it seems like this could have been represented as an optional one-to-one, unless it has to be different for some reason specific to sp6 codebase

@maxpatiiuk
Copy link
Member

@melton-jason would it make sense to use the schemaoverwrites files to make sp7 front-end treat this as a -to-one relationship?
or would this cause issues because back-end would still return it as if it's a nested resources (thus will be an array rather than a direct resource in the API response)?
or should front-end act as if it's a one-to-many, but then have a business rule that causes a crash when more than one is added (causes a crash, so that it becomes obvious where in the code we forgot to restrict adding more than 1 resource for a zero-to-one)

@melton-jason
Copy link
Contributor

melton-jason commented Mar 7, 2024

Currently, it would be easiest for the frontend to treat zero-to-one relationships as a dependent one-to-many relationships.
This is how the backend currently serializes and returns records with the relationship type: as an array containing one or no records.

While it would require modifications to both the frontend and backend, It might be easier to treat the relationship as a dependent many-to-one relationship, like the collectionObjectAttribute field on CollectionObject. Instead of returning an array with zero or one element, the value of the field would either be the serialized resource or null.

However, there is a small concern regarding the name of the relationships: localityDetails and geocoordDetails. The pluralized name implies there can be more than one and that the relationship is a to-many.
So perhaps the most ideal solution is to make the relationship a true one-to-many and remove the one record maximum restriction.
Of course, this would require compatibility with Specify 6 to be broken and assumes it makes sense to allow more than one LocalityDetail and GeocoordDetail record per Locality.

@realVinayak
Copy link
Collaborator

Currently, it would be easiest for the frontend to treat zero-to-one relationships as a dependent one-to-many relationships. This is how the backend currently serializes and returns records with the relationship type: as an array containing one or no records.

While it would require modifications to both the frontend and backend, It might be easier to treat the relationship as a dependent many-to-one relationship, like the collectionObjectAttribute field on CollectionObject. Instead of returning an array with zero or one element, the value of the field would either be the serialized resource or null.

However, there is a small concern regarding the name of the relationships: localityDetails and geocoordDetails. The pluralized name implies there can be more than one and that the relationship is a to-many. So perhaps the most ideal solution is to make the relationship a true one-to-many and remove the one record maximum restriction. Of course, this would require compatibility with Specify 6 to be broken and assumes it makes sense to allow more than one LocalityDetail and GeocoordDetail record per Locality.

the current pr patches that for this. see relevant changes

Copy link
Member

@maxpatiiuk maxpatiiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small simplification, otherwise looking good

@CarolineDenis CarolineDenis requested a review from a team March 20, 2024 16:37
@CarolineDenis CarolineDenis marked this pull request as ready for review March 20, 2024 16:37
@acwhite211
Copy link
Member

acwhite211 commented May 20, 2024

When merging several collecting event duplicates, the collectors are merged together, though each collector should only appear once on a single CE. It should automatically remove duplicate collector records with the same agent for the final merged record.

Here is the current unit test to Collecting Event.

def test_replace_collecting_events(self):
c = Client()
c.force_login(self.specifyuser)
# Create collecting events
collecting_event_1 = models.Collectingevent.objects.create(discipline=self.discipline)
collecting_event_2 = models.Collectingevent.objects.create(discipline=self.discipline)
collecting_event_attribute = models.Collectingeventattribute.objects.create(discipline=self.discipline)
collecting_trip = models.Collectingtrip.objects.create(discipline=self.discipline)
locality = models.Locality.objects.create(discipline=self.discipline)
paleo_context = models.Paleocontext.objects.create(discipline=self.discipline)
collection_object = self.collectionobjects[0]
collection_object.collectingevent = collecting_event_1
collection_object.save()
collecting_event_1.collectingeventattribute = collecting_event_attribute
collecting_event_1.collectingtrip = collecting_trip
collecting_event_1.paleocontext = paleo_context
collecting_event_1.save()
# Assert that the api request ran successfully
response = c.post(
f'/api/specify/collectingevent/replace/{collecting_event_2.id}/',
data=json.dumps({
'old_record_ids': [collecting_event_1.id],
'background': False
}),
content_type='application/json'
)
self.assertEqual(response.status_code, 204)
# Assert that the old collecting event was deleted
with self.assertRaises(models.Collectingevent.DoesNotExist):
models.Collectingevent.objects.get(id=collecting_event_1.id)
# Assert that the dependent records of the old collecting event were deleted. (collectingeventattribute)
self.assertFalse(models.Collectingeventattribute.objects.filter(id=collecting_event_attribute.id).exists())
# Assert that the old collecting event's independent relationships, with a corresponding real database column,
# were updated correctly to the new collecting event. (collectingtrip, locality, paleocontext)
self.assertEqual(collecting_trip.collectingevents.filter(id=collecting_event_2.id).count(), 0)
self.assertEqual(locality.collectingevents.filter(id=collecting_event_2.id).count(), 0)
self.assertEqual(paleo_context.collectingevents.filter(id=collecting_event_2.id).count(), 0)
collecting_event_2 = models.Collectingevent.objects.get(id=collecting_event_2.id)
collecting_event_2.collectingtrip = collecting_trip
collecting_event_2.locality = locality
collecting_event_2.paleocontext = paleo_context
collecting_event_2.save()
collecting_trip = models.Collectingtrip.objects.get(id=collecting_trip.id)
locality = models.Locality.objects.get(id=locality.id)
paleo_context = models.Paleocontext.objects.get(id=paleo_context.id)
self.assertEqual(collecting_trip.collectingevents.filter(id=collecting_event_2.id).count(), 1)
self.assertEqual(locality.collectingevents.filter(id=collecting_event_2.id).count(), 1)
self.assertEqual(paleo_context.collectingevents.filter(id=collecting_event_2.id).count(), 1)
# Assert that the collecting event's independent relationships, without a corresponding real database column,
# were not deleted. (collectingobjects)
self.assertTrue(models.Collectionobject.objects.filter(id=collection_object.id).exists())
# Assert that a new api request will not find the old collection event
response = c.post(
f'/api/specify/collectingevent/replace/{collecting_event_2.id}/',
data=json.dumps({
'old_record_ids': [collecting_event_1.id],
'background': False
}),
content_type='application/json'
)
self.assertEqual(response.status_code, 404)

@grantfitzsimmons I'm working on getting this working in the record merging code. I've added to the unit test and just want to make sure these are the right test assertions that will show that this problem is solved:

# Setup test collectors, where each collector points to the same agent
collector_1 = models.Collector.objects.create(
    isprimary=True, ordernumber=1, agent=self.agent, collectingevent=collecting_event_1)
collector_2 = models.Collector.objects.create(
    isprimary=True, ordernumber=1, agent=self.agent, collectingevent=collecting_event_2)
collector_3 = models.Collector.objects.create(
    isprimary=True, ordernumber=1, agent=self.agent, collectingevent=collecting_event_3)

# ... after the merge request where collecting_event_2 if the old record and collecting_event_1 and collecting_event_3 are the new records.

# Assert that the collectors were updated correctly to the new collecting event.
# Also, assert that remove duplicate collector records with the same agent were deleted.
self.assertEqual(models.Collector.objects.filter(agent=self.agent).count(), 1) # only one collector is left
self.assertEqual(models.Collector.objects.filter(id=collector_1.id).count(), 0) # assert collector_1 does not exist
self.assertEqual(models.Collector.objects.filter(id=collector_2.id).count(), 1) # assert collector_2 still exists
self.assertEqual(models.Collector.objects.filter(id=collector_3.id).count(), 0) # assert collector_3 does not exist

@realVinayak
Copy link
Collaborator

Hmm, #4606 (comment), wasn't frontend supposed to warn if duplicate collectors were being inserted? should have done so, as part of save blockers (uniqueness rules)

Copy link
Collaborator

@realVinayak realVinayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit tests + #4940

specifyweb/specify/api_tests.py Outdated Show resolved Hide resolved
specifyweb/specify/record_merging.py Outdated Show resolved Hide resolved
@CarolineDenis
Copy link
Contributor Author

When merging several collecting event duplicates, the collectors are merged together, though each collector should only appear once on a single CE. It should automatically remove duplicate collector records with the same agent for the final merged record.

@grantfitzsimmons,
Can you rephrase this? Can you provide the steps / examples to reproduce?

Simplify deletion of dependent resources + Move tests
Copy link
Collaborator

@realVinayak realVinayak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some things that'll be helpful to be clarified.

specifyweb/specify/record_merging.py Outdated Show resolved Hide resolved
specifyweb/specify/record_merging.py Outdated Show resolved Hide resolved
@realVinayak
Copy link
Collaborator

realVinayak commented May 30, 2024

@acwhite211 see the simplifications 🤷 🤦‍♂️ . hack wasn't needed after all. didn't change any of the tests you said failed without this hack. also removed the need for the http response (the caller should handle it).

@CarolineDenis
Copy link
Contributor Author

NOTES:
At this moment, the collector bug is fixed only when the records owe try to merge have identical data, meaning, if the created by agent is different or if an agent as a remark in one record and not the other it will not work

@realVinayak
Copy link
Collaborator

NOTES: At this moment, the collector bug is fixed only when the records owe try to merge have identical data, meaning, if the created by agent is different or if an agent as a remark in one record and not the other it will not work

that case seems fine. if they aren't exact duplicates, it won't combine them into one. it flags duplicate agents now, so users can pick one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Dev Attention Needed
Development

Successfully merging this pull request may close these issues.

Expand merging table support
7 participants