This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Create parent company extended fields #1602

Merged
merged 4 commits into from Jan 27, 2022

Conversation

@jwalgran jwalgran commented Jan 25, 2022

Overview

Create parent company extended fields when a contributor includes a parent_company in their CSV or API submission. Use a trigram search of existing contributor names to try to link the submitted value to a known contributor.
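
For readers unfamiliar with trigram matching, here is a minimal pure-Python sketch of the idea. It approximates how PostgreSQL's pg_trgm extension scores two strings (lowercase, split into alphanumeric words, pad each word, compare trigram sets). It is not the code in this PR, and the function names `trigrams` and `similarity` are chosen here for illustration only.

```python
import re


def trigrams(text):
    """Return the set of trigrams for text, padding each word pg_trgm-style."""
    found = set()
    # Treat runs of alphanumeric characters as words; ignore case.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            found.add(padded[i:i + 3])
    return found


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # A close variant of a contributor name scores high ...
    print(similarity("Azavea", "Azavea Inc"))
    # ... while an unrelated value scores low.
    print(similarity("Azavea", "A non matching value"))
```

A submitted parent company would be treated as a candidate match for a contributor only when a score like this clears a chosen threshold.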

Connects #1584

Demo

Screen Shot 2022-01-26 at 4 12 21 PM

Screen Shot 2022-01-26 at 4 13 27 PM

Testing Instructions

These instructions assume ./scripts/resetdb has been run.

{
  "country": "US",
  "name": "Azavea",
  "address": "990 Spring Garden Street, 5th Floor Philadelphia, PA 19123",
  "parent_company": "A non matching value"
}
  • Browse the facility and verify that the extended field value is displayed in the sidebar
  • In a separate browser session, browse to http://localhost:8081/admin/api/extendedfield/1/change/, log in as c1@example.com, and verify that the extended field created includes both the raw value and a name key set to the same value.
  • Log in as c3@example.com and contribute parent-company.csv
  • Fully process the list
./scripts/manage batch_process --list-id 16 --action parse
./scripts/manage batch_process --list-id 16 --action geocode
./scripts/manage batch_process --list-id 16 --action match
  • Browse the new facility and verify that the parent company was fuzzy matched to an existing contributor and that contributor is displayed in the sidebar.
  • In the admin browser session, browse to http://localhost:8081/admin/api/extendedfield/2/ and verify that the extended field created includes the raw value and matched contributor details.

Checklist

  • fixup! commits have been squashed
  • CI passes after rebase
  • CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

@jwalgran jwalgran force-pushed the feature/jcw/parent-company branch 2 times, most recently from 3e85fef to ca9af42 Compare January 26, 2022 22:02
@@ -85,6 +86,33 @@ def create_superuser(self, email, password, **extra_fields):
        return self._create_user(email, password, **extra_fields)


class ContributorManager(models.Manager):
    TRIGRAM_SIMILARY_THRESHOLD = 0.5

Contributor Author

This was chosen via trial and error with development data. We may adjust this in the future if not enough matching occurs or an excessive number of mismatches need to be corrected.

@jwalgran jwalgran marked this pull request as ready for review January 26, 2022 23:39
@TaiWilkin TaiWilkin left a comment

This is working well and has a thorough test set. I tested a few different variants of parent company names and the results seemed reasonable in terms of matching threshold.

I left a few suggestions regarding variable names, but otherwise this is good to go.

        False is less than True so we order_by boolean fields in descending
        order
        """
        threhold = ContributorManager.TRIGRAM_SIMILARY_THRESHOLD

Contributor

Can we rename this to `threshold`?

Contributor Author

🤦 👍

Contributor Author

Corrected in fixup 945925c

@@ -85,6 +86,33 @@ def create_superuser(self, email, password, **extra_fields):
        return self._create_user(email, password, **extra_fields)


class ContributorManager(models.Manager):
    TRIGRAM_SIMILARY_THRESHOLD = 0.5

Contributor

Can we rename this variable to TRIGRAM_SIMILARITY_THRESHOLD?

Contributor Author

🤦 👍

Contributor Author

@jwalgran jwalgran Jan 27, 2022

Corrected in fixup 75954e9 and a44f6f5

.annotate(active_source_count=models.Count(
    Q(source__is_active=True))) \
.annotate(
    has_active_sources=ExpressionWrapper(

Contributor

Nice two-step annotation

else:
    field_value = {
        'raw_value': field_value,
        'name': field_value

Contributor

I like using different values here (contributor_name and _id vs just name) to indicate whether we've matched it directly to a contributor or not. If we want to make the list items links in the future, we will be able to do that pretty easily.
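
To make the two shapes concrete, here is a hedged sketch of how the stored value might differ between the matched and unmatched cases. The unmatched keys (`raw_value`, `name`) appear in the diff above; the matched keys `contributor_name` and `contributor_id` are inferred from this comment, and the helper name `build_parent_company_value` and the `match` dictionary are hypothetical, not taken from the PR.

```python
def build_parent_company_value(raw_value, match=None):
    """Build the extended field value for a submitted parent company.

    `match` is assumed to be a dict with 'id' and 'name' keys describing the
    fuzzy-matched contributor, or None when nothing matched.
    """
    if match is not None:
        # Matched: keep the raw submission and record the linked contributor.
        return {
            'raw_value': raw_value,
            'contributor_name': match['name'],
            'contributor_id': match['id'],
        }
    # Unmatched: the display name is simply the raw submitted value.
    return {
        'raw_value': raw_value,
        'name': raw_value,
    }


if __name__ == "__main__":
    print(build_parent_company_value("A non matching value"))
    print(build_parent_company_value("Azavea Inc", {'id': 1, 'name': 'Azavea'}))
```

Because the key sets differ, a client can check for `contributor_id` to decide whether the item can be rendered as a link.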

@TaiWilkin TaiWilkin assigned jwalgran and unassigned TaiWilkin Jan 27, 2022
To support fuzzy matching contributors by a submitted parent company extended
field we add a custom manager to the Contributor model.

We set the TRIGRAM_SIMILARY_THRESHOLD by trial and error testing with
development data.

In the real world, multiple people register accounts with the same contributor
name, so we prioritize matches to a verified account, then to one under which
data has been submitted.
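
The prioritization described in this commit message can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the manager method from the PR: the `Candidate` class, its field names, the `best_match` helper, the use of `>=` for the threshold, and breaking remaining ties by similarity are all hypothetical. Only the 0.5 threshold and the "verified first, then has submitted data" order come from the conversation.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD = 0.5  # value discussed in this PR; the >= comparison is an assumption


@dataclass
class Candidate:
    name: str
    similarity: float         # assumed precomputed trigram similarity
    is_verified: bool         # account has been verified
    has_active_sources: bool  # data has been submitted under this account


def best_match(candidates: List[Candidate]) -> Optional[Candidate]:
    """Pick the preferred contributor among those clearing the threshold."""
    eligible = [c for c in candidates if c.similarity >= THRESHOLD]
    if not eligible:
        return None
    # False sorts below True, so taking the max prefers verified accounts,
    # then accounts with submitted data, then the closest name.
    return max(
        eligible,
        key=lambda c: (c.is_verified, c.has_active_sources, c.similarity),
    )


if __name__ == "__main__":
    pool = [
        Candidate("Acme Ltd", 0.9, False, False),
        Candidate("Acme", 0.6, True, False),
        Candidate("Acme Co", 0.7, False, True),
    ]
    print(best_match(pool))
```

In the PR itself the equivalent ordering is done in the database by ordering on boolean annotations in descending order.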

Update the extended field handling to process `parent_company` values by using
the custom `filter_by_name` manager method and setting the appropriate keys in
the `ExtendedField.value` dictionary based on whether there is a match or not.

These were mistakenly left in the code after testing.
@jwalgran

Thanks for the review.

@jwalgran jwalgran merged commit 4c30cf8 into develop Jan 27, 2022
@jwalgran jwalgran deleted the feature/jcw/parent-company branch January 27, 2022 18:29