Add facility/processing type taxonomy and look-up function #1601
Conversation
106e151
to
e7c1f37
Compare
@@ -0,0 +1,397 @@ | |||
from api.matching import clean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this stuff in it's own file since there are so many items.
@@ -600,6 +608,68 @@ def test_incorrect_country_code_raises_error(self, mock_get): | |||
) | |||
|
|||
|
|||
class FacilityAndProcessingTypeTest(TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the issue said to create tests using a loop, but I found it easier to incrementally add functionality to the look up function with individual tests. I can condense if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice implementation. I was excited to see how well this works. I made one suggestion on how we return when there is not match but we could defer that until we actually have client code using it.
|
||
# Match must score 85 or higher to be considered usable. | ||
if not matched_value or matched_value[1] < 85: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to consider returning (None, None, None, None)
here so that the "shape" of the return values from the function is always the same, which may make the calling code a litter clearer.
>>> (a, b, c, d) = get_facility_and_processing_type('WILL NOT MATCH ANYTHING')
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: cannot unpack non-iterable NoneType object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that at first, but changed it to None
because (None, None, None, None)
evaluates to not None
. But as you point out, I have some questions about the client code implementation that may require changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this. Thanks.
FUZZY_MATCH = 'FUZZY' | ||
|
||
ALL_PROCESSING_TYPES = { | ||
**OFFICE_PROCESSING_TYPES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice use of **
here to compactly build these derived data structures.
The look-up function takes the input value, cleans it up by removing non-letter characters and extra spaces, and then attempts to find a match in the taxonomy using various methods. Refs #1582
e7c1f37
to
29de8fa
Compare
Overview
The look-up function takes the input value, cleans it up by removing non-letter characters and extra spaces, and then attempts to find a match in the taxonomy using various methods.
Connects #1582
Demo
Note
FUZZY_ALIAS_MATCH
was not implemented because it ended up not being useful. All non-exact entries were either matched using the alias or fuzzy methods.Testing Instructions
./scripts/update
to install the new pip dependency../scripts/test
to ensure the new tests pass../scripts/manage shell
) and manually experiment with the matching function to see if it otherwise is working as intended.Checklist
fixup!
commits have been squashed