Create parent company extended fields #1602
Conversation
3e85fef
to
ca9af42
Compare
src/django/api/models.py
Outdated
@@ -85,6 +86,33 @@ def create_superuser(self, email, password, **extra_fields): | |||
return self._create_user(email, password, **extra_fields) | |||
|
|||
|
|||
class ContributorManager(models.Manager): | |||
TRIGRAM_SIMILARY_THRESHOLD = 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was chosen via trial an error with development data. We may adjust this in the future if non enough matching occurs or an excessive number of mismatches need to be corrected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is working well and has a thorough test set. I tested a few different variants of parent company names and the results seemed reasonable in terms of matching threshold.
I left a few suggestions regarding variable names, but otherwise this is good to go.
src/django/api/models.py
Outdated
False is less than True so we order_by boolean fields in descending | ||
order | ||
""" | ||
threhold = ContributorManager.TRIGRAM_SIMILARY_THRESHOLD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this threshold
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected in fixup 945925c
src/django/api/models.py
Outdated
@@ -85,6 +86,33 @@ def create_superuser(self, email, password, **extra_fields): | |||
return self._create_user(email, password, **extra_fields) | |||
|
|||
|
|||
class ContributorManager(models.Manager): | |||
TRIGRAM_SIMILARY_THRESHOLD = 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this variable to TRIGRAM_SIMILARITY_THRESHOLD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.annotate(active_source_count=models.Count( | ||
Q(source__is_active=True))) \ | ||
.annotate( | ||
has_active_sources=ExpressionWrapper( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice two-step annotation
else: | ||
field_value = { | ||
'raw_value': field_value, | ||
'name': field_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like using different values here (contributor_name and _id vs just name) to indicate whether we've matched it directly to a contributor or not. If we want to make the list items links in the future, we will be able to do that pretty easily.
75954e9
to
a44f6f5
Compare
To support fuzzy matching contributors by a submitted parent company extended field we add a custom manager to the Contributor model. We set the TRIGRAM_SIMILARY_THRESHOLD by trial and error testing with development data. In the real world multiple people register accounts with the same contributor name to we prioritize matches to a verified account, then to one under which data has been submitted.
Update the extended field handling to process `parent_company` values by using the custom `filter_by_name` manager method and setting the appropriate keys in the `ExtendedField.value` dictionary based on whether there is a match or not.
These were mistakenly left in the code after testing.
a44f6f5
to
1f7a7b0
Compare
Thanks for the review. |
Overview
Create parent company extended fields when a contributor includes a
parent_company
in their CSV or API submission. Use a trigram search of existing contributor names to try an link the submitted value to a known contributor.Connects #1584
Demo
Testing Instructions
These instruction assume
./scripts/resetdb
has been run.Token 1d18b962d6f976b0b7e8fcf9fcc39b56cf278051
c1@example.com
and verify that the extended field created includes but the raw value and aname
key set to the same value.c3@example.com
and contribute parent-company.csvChecklist
fixup!
commits have been squashed