Add ratio_min() function to the difflib library #90244
Here I propose a new method, .ratio_min(self, m), which extends difflib's SequenceMatcher.ratio(). Like .ratio(), it returns a measure of two sequences' similarity (a float in [0, 1]). Unlike .ratio(), it ignores matching blocks shorter than a given threshold m, which is the method's only argument besides self. This helps avoid spuriously high similarity scores, which can otherwise arise from many short, coincidental matches.
def ratio_min(self, m):
    """Return a measure of the sequences' similarity (float in [0,1]).

    Where T is the total number of elements in both sequences, and
    M_min is the number of matched elements, counting only matching
    blocks of length at least m, this is 2.0*M_min / T.
    Note that this is 1 if the sequences are identical (and m does not
    exceed their length), and 0 if they have no substring of length m
    or more in common.

    .ratio_min() is similar to .ratio().
    .ratio_min(1) is equivalent to .ratio().

    >>> s = SequenceMatcher(None, "abcd", "bcde")
    >>> s.ratio_min(1)
    0.75
    >>> s.ratio_min(2)
    0.75
    >>> s.ratio_min(3)
    0.75
    >>> s.ratio_min(4)
    0.0
    """
    # Sum the sizes of matching blocks that are at least m elements long.
    matches = sum(triple[-1] for triple in self.get_matching_blocks() if triple[-1] >= m)
    return _calculate_ratio(matches, len(self.a) + len(self.b))
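For anyone who wants to try the idea without patching difflib, here is a minimal sketch of the same computation as a standalone function. The free-function form, the `isjunk` pass-through parameter, and the default `m=1` are assumptions made for illustration and are not part of the proposed method. It uses only the public `SequenceMatcher.get_matching_blocks()` API, so it does not depend on the private `_calculate_ratio` helper.

```python
from difflib import SequenceMatcher


def ratio_min(a, b, m=1, isjunk=None):
    """Similarity in [0, 1], counting only matching blocks of length >= m.

    Hypothetical standalone variant of the proposed method.
    """
    matcher = SequenceMatcher(isjunk, a, b)
    # get_matching_blocks() returns Match(a, b, size) triples; the final
    # entry is a zero-size sentinel, which adds nothing to the sum.
    matched = sum(block.size
                  for block in matcher.get_matching_blocks()
                  if block.size >= m)
    total = len(a) + len(b)
    # Follow difflib's convention that two empty sequences score 1.0.
    return 2.0 * matched / total if total else 1.0


if __name__ == "__main__":
    # "abc" and "axbxc" share only three isolated single-character matches.
    print(SequenceMatcher(None, "abc", "axbxc").ratio())  # 0.75
    print(ratio_min("abc", "axbxc", m=2))                 # 0.0
    # One genuine 3-character block ("bcd") survives a threshold of 3.
    print(ratio_min("abcd", "bcde", m=3))                 # 0.75
```

The first pair shows the kind of score the threshold is meant to suppress: three unrelated one-character matches give the same 0.75 as a real shared substring, and raising m to 2 drops it to 0.0.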
I am removing 3.10 from the "versions" field, since additions to the standard library are only considered for unreleased versions of Python.
Thanks for the suggestion and the PR, Giacomo! However, in my opinion, this is better suited to be something like a cookbook recipe. The number of use cases for this will be low, and there would be little advantage to having this in the stdlib rather than elsewhere.
I'm closing this for now since nobody has followed up and, to the best of my understanding, this wouldn't be an appropriate addition to the stdlib. This can be re-opened in the future if needed, of course.