This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Add facility type and production type taxonomy to code and write helper functions to look up values #1582

Closed
4 tasks done
jwalgran opened this issue Jan 14, 2022 · 0 comments

jwalgran commented Jan 14, 2022

Overview

Add facility type and production type taxonomy to code and write helper functions to look up values

Taxonomy: https://docs.google.com/spreadsheets/d/1HlGoYgj0rbtxnhgm0LAWk5lgFCSw3A2V/edit?pli=1#gid=432729308

Describe the solution you'd like

  • Add the taxonomy to the codebase as constant data structures (https://docs.google.com/spreadsheets/d/1HlGoYgj0rbtxnhgm0LAWk5lgFCSw3A2V/edit#gid=432729308)
    • Use clean values as lookup keys
  • Add an alias lookup list for common alternate representations of each facility type and processing type
  • Add the thefuzz package (https://pypi.org/project/thefuzz/) for string distance calculation
  • Add a function to look up facility and processing types
    • The argument is a string that can be either a Facility Type or a Processing Type
    • Returns a tuple of (field_type, match_type, facility_type_value, production_type_value)
      • Field type is FACILITY_TYPE, PROCESSING_TYPE, or None
      • Match type is EXACT, ALIAS, FUZZY, ALIAS_FUZZY, or None
      • The _value fields are the matched taxonomy values
    • Proposed logic
      • Clean the input value
      • Try to match an exact processing type or facility type
      • If there is no match, try to match an alias for a processing type or facility type
      • If there is still no match, use thefuzz.process.extractOne to match against processing types, facility types, processing type aliases, and facility type aliases, in that order, accepting only a value whose score exceeds a threshold
    • Add a unit test that loops over many test cases, asserting that each input produces the expected output
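The proposed lookup could be sketched roughly as follows. This is a minimal, hypothetical sketch, not the code from the commits referenced below: the taxonomy and alias values are placeholders rather than entries from the spreadsheet, the name `lookup_type` and the 0.85 threshold are assumptions, a processing type match is assumed to also return its parent facility type, and the standard library's `difflib.SequenceMatcher` stands in for `thefuzz.process.extractOne` so the example runs without extra dependencies.

```python
import difflib
import re
from typing import Iterable, Optional, Tuple

# Field and match type labels, named after the values listed in the issue.
FACILITY_TYPE = "FACILITY_TYPE"
PROCESSING_TYPE = "PROCESSING_TYPE"
EXACT, ALIAS, FUZZY, ALIAS_FUZZY = "EXACT", "ALIAS", "FUZZY", "ALIAS_FUZZY"

# HYPOTHETICAL placeholder taxonomy; the real values come from the spreadsheet.
# Keys are clean values. Each processing type maps to its parent facility type.
FACILITY_TYPES = {"final product assembly", "textile or material production"}
PROCESSING_TYPES = {
    "cut and sew": "final product assembly",
    "knitting": "textile or material production",
    "weaving": "textile or material production",
}
# HYPOTHETICAL aliases: clean alternate spelling -> clean taxonomy value.
FACILITY_TYPE_ALIASES = {"assembly": "final product assembly"}
PROCESSING_TYPE_ALIASES = {"cut sew": "cut and sew", "sewing": "cut and sew"}

FUZZY_THRESHOLD = 0.85  # assumed cutoff on difflib's 0..1 similarity ratio

# (field_type, match_type, facility_type_value, production_type_value)
Result = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def clean(value: str) -> str:
    """Lowercase, drop non-letter characters, and collapse whitespace."""
    letters_only = re.sub(r"[^a-z ]", "", value.lower())
    return " ".join(letters_only.split())


def _processing(value: str, match_type: str) -> Result:
    return (PROCESSING_TYPE, match_type, PROCESSING_TYPES[value], value)


def _facility(value: str, match_type: str) -> Result:
    return (FACILITY_TYPE, match_type, value, None)


def _best_match(value: str, choices: Iterable[str]) -> Optional[str]:
    """Stand-in for thefuzz.process.extractOne: best choice above the cutoff."""
    best, best_score = None, FUZZY_THRESHOLD
    for choice in choices:
        score = difflib.SequenceMatcher(None, value, choice).ratio()
        if score > best_score:  # only accept scores exceeding the threshold
            best, best_score = choice, score
    return best


def lookup_type(raw: str) -> Result:
    value = clean(raw)
    # 1. Exact match on processing types, then facility types.
    if value in PROCESSING_TYPES:
        return _processing(value, EXACT)
    if value in FACILITY_TYPES:
        return _facility(value, EXACT)
    # 2. Exact match on the alias lists.
    if value in PROCESSING_TYPE_ALIASES:
        return _processing(PROCESSING_TYPE_ALIASES[value], ALIAS)
    if value in FACILITY_TYPE_ALIASES:
        return _facility(FACILITY_TYPE_ALIASES[value], ALIAS)
    # 3. Fuzzy match against each list in turn; the first hit wins.
    steps = (
        (PROCESSING_TYPES, lambda m: _processing(m, FUZZY)),
        (sorted(FACILITY_TYPES), lambda m: _facility(m, FUZZY)),
        (PROCESSING_TYPE_ALIASES,
         lambda m: _processing(PROCESSING_TYPE_ALIASES[m], ALIAS_FUZZY)),
        (FACILITY_TYPE_ALIASES,
         lambda m: _facility(FACILITY_TYPE_ALIASES[m], ALIAS_FUZZY)),
    )
    for choices, build in steps:
        match = _best_match(value, choices)
        if match is not None:
            return build(match)
    return (None, None, None, None)


if __name__ == "__main__":
    # Table-driven check in the spirit of the unit test the issue asks for.
    cases = {
        "  KNITTING ": (PROCESSING_TYPE, EXACT,
                        "textile or material production", "knitting"),
        "Cut & Sew": (PROCESSING_TYPE, ALIAS,
                      "final product assembly", "cut and sew"),
        "asembly": (FACILITY_TYPE, ALIAS_FUZZY, "final product assembly", None),
        "dyeing": (None, None, None, None),
    }
    for raw, expected in cases.items():
        assert lookup_type(raw) == expected, (raw, lookup_type(raw))
```

Using thefuzz instead would mean replacing `_best_match` with a call to `process.extractOne(value, choices, score_cutoff=...)`, taking the matched choice from the returned tuple, and rescaling the threshold to thefuzz's 0 to 100 score range.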
caseycesari self-assigned this Jan 21, 2022
caseycesari added a commit that referenced this issue Jan 28, 2022
The look-up function takes the input value, cleans it up by removing
non-letter characters and extra spaces, and then attempts to find a
match in the taxonomy using various methods.

Refs #1582
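The cleaning step described in the commit message might look like the sketch below, which also shows clean values serving as lookup keys. The lowercasing, the hypothetical `build_lookup` helper, and its duplicate-key check are assumptions added for illustration, not details taken from the commit.

```python
import re
from typing import Dict, Iterable


def clean(value: str) -> str:
    """Remove non-letter characters and extra spaces, as the commit describes.

    Lowercasing is an added assumption so that case never affects a match.
    """
    letters_only = re.sub(r"[^A-Za-z ]", "", value).lower()
    return " ".join(letters_only.split())


def build_lookup(labels: Iterable[str]) -> Dict[str, str]:
    """HYPOTHETICAL helper: map each clean key to its original display label."""
    lookup: Dict[str, str] = {}
    for label in labels:
        key = clean(label)
        if key in lookup:  # two labels collapsing to one key would be ambiguous
            raise ValueError(f"duplicate clean key {key!r} for {label!r}")
        lookup[key] = label
    return lookup
```

Keeping the original label as the value lets a match report the taxonomy's display spelling, while all comparisons use the clean key.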
caseycesari added a commit that referenced this issue Jan 31, 2022
jwalgran closed this as completed Feb 3, 2022