Store a table of common pattern value resolutions with Similarity Percentage (confidence) #70

Open
gelliottrsg opened this issue Mar 9, 2021 · 1 comment

Comments

@gelliottrsg

Users love the pattern detection, but they would like to match the detected patterns against a dataset that stores the most common resolutions for each pattern, as a potential one-to-many name/value lookup. For instance, the patterns 9999999999, 999-999-9999, and +9 9999999999 would each have entries in this new dataset that flag them as a potential phone number. A sample of the output could look like the attached image.

image
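
To make the request concrete, here is a minimal Python sketch of the kind of one-to-many lookup described above. It is only an illustration, not code from this project: the names `to_pattern`, `PATTERN_MEANINGS`, and `resolve` are hypothetical, the letter-to-`A`/`a` mapping is an assumption (the issue only shows digits mapped to `9`), and the "account number" entry is made up to show a pattern with more than one meaning.

```python
def to_pattern(value: str) -> str:
    """Reduce a value to a character pattern.

    Digits become '9'. Mapping letters to 'A'/'a' is an assumption made
    for this sketch; all other characters are kept unchanged.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical lookup table: one pattern may resolve to several meanings.
PATTERN_MEANINGS = {
    "9999999999": ["phone number", "account number"],
    "999-999-9999": ["phone number"],
    "+9 9999999999": ["phone number"],
}


def resolve(value: str) -> list:
    """Return the candidate meanings for a value, or an empty list."""
    return PATTERN_MEANINGS.get(to_pattern(value), [])


if __name__ == "__main__":
    for sample in ["555-123-4567", "+1 5551234567", "5551234567", "hello"]:
        print(sample, "->", to_pattern(sample), "->", resolve(sample))
```

A value whose pattern is not in the table simply resolves to an empty list, so unknown patterns are reported rather than guessed.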

@dcamper
Collaborator

dcamper commented Mar 10, 2021

@gelliottrsg This is a good idea. A couple of questions:

  1. I feel that the meaning behind a pattern is often specific to a use case. Very few patterns would actually be globally true (latitude/longitude comes to mind as one example). Phone numbers are not global, but there is a finite set of patterns for them, so they would be harder but doable. SSN patterns could easily be confused with other things. The question is: does it make sense for this functionality to have a dictionary of pattern->meaning pairs built in, or should the caller be required to supply the dictionary?
  2. How do you envision the 'similarity percent' and 'resolution ranking' values in your example being computed? The similarity value could be the number of records matching that pattern divided by the total number of records, but that is not clear from the example.
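
For discussion, here is a rough Python sketch of one way both questions could be read: the caller supplies the pattern->meaning dictionary, and the similarity percent is the share of records matching each pattern. None of this is an existing API in this project. `digit_pattern`, `rank_resolutions`, and the sample dictionary are hypothetical names, and ranking by descending share is only an assumption about what 'resolution ranking' might mean.

```python
from collections import Counter


def digit_pattern(value: str) -> str:
    """Replace every digit with '9'; keep all other characters unchanged."""
    return "".join("9" if ch.isdigit() else ch for ch in value)


def rank_resolutions(values, meanings):
    """Rank the patterns in `values` by the share of records that match.

    `meanings` is a caller-supplied dict mapping a pattern to a list of
    possible meanings (assumption: the dictionary is not built in).
    """
    counts = Counter(digit_pattern(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    # most_common() sorts by count, highest first.
    for pattern, n in counts.most_common():
        rows.append({
            "pattern": pattern,
            "similarity_pct": round(100.0 * n / total, 1),
            "meanings": meanings.get(pattern, []),
        })
    return rows


if __name__ == "__main__":
    caller_dictionary = {
        "9999999999": ["phone number"],
        "999-999-9999": ["phone number"],
    }
    data = ["5551234567", "555-123-4567", "555-987-6543", "n/a"]
    for row in rank_resolutions(data, caller_dictionary):
        print(row)
```

Under this reading the percentage describes how common a pattern is in the data, not how confident the pattern->meaning resolution is, so the two may need to be separate columns.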
