Add ratio_min() function to the difflib library #90244

Closed
gi-ba-bu mannequin opened this issue Dec 15, 2021 · 4 comments
Labels: 3.11 (only security fixes), stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

@gi-ba-bu
Mannequin

gi-ba-bu mannequin commented Dec 15, 2021

BPO 46086
Nosy @tim-one, @taleinat, @gi-ba-bu
PRs
  • bpo-46086: Add ratio_min(self,m) function #30125
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/tim-one'
    closed_at = <Date 2022-01-20.22:15:08.930>
    created_at = <Date 2021-12-15.17:37:14.440>
    labels = ['type-feature', 'library', '3.11']
    title = 'Add ratio_min() function to the difflib library'
    updated_at = <Date 2022-01-20.22:15:08.929>
    user = 'https://github.com/gi-ba-bu'

    bugs.python.org fields:

    activity = <Date 2022-01-20.22:15:08.929>
    actor = 'taleinat'
    assignee = 'tim.peters'
    closed = True
    closed_date = <Date 2022-01-20.22:15:08.930>
    closer = 'taleinat'
    components = ['Library (Lib)']
    creation = <Date 2021-12-15.17:37:14.440>
    creator = 'gibu'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46086
    keywords = ['patch']
    message_count = 4.0
    messages = ['408622', '408629', '410104', '411048']
    nosy_count = 4.0
    nosy_names = ['tim.peters', 'taleinat', 'python-dev', 'gibu']
    pr_nums = ['30125']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue46086'
    versions = ['Python 3.11']

    @gi-ba-bu
    Mannequin Author

    gi-ba-bu mannequin commented Dec 15, 2021

    Here I propose a new method, .ratio_min(self, m).

    .ratio_min(self, m) extends difflib's SequenceMatcher.ratio(). Like .ratio(), it returns a measure of two sequences' similarity (a float in [0, 1]). Unlike .ratio(), it ignores matching blocks shorter than a given threshold m, which is the method's second parameter.

    This is useful for avoiding spuriously high similarity scores that come from short, incidental matches.

    # NEW FUNCTION: 
    
        def ratio_min(self, m):
            """Return a measure of the sequences' similarity (float in [0,1]).

            Where T is the total number of elements in both sequences, and
            M_min is the number of matched elements counting only matching
            blocks of length at least m, this is 2.0*M_min / T.
            Note that this is 1 if the sequences are identical and at least
            m elements long, and 0 if they have no matching block of length
            m or more.
            .ratio_min(1) is equivalent to .ratio().

            >>> s = SequenceMatcher(None, "abcd", "bcde")
            >>> s.ratio_min(1)
            0.75
            >>> s.ratio_min(2)
            0.75
            >>> s.ratio_min(3)
            0.75
            >>> s.ratio_min(4)
            0.0
            """

            # Sum the sizes of matching blocks that meet the threshold m;
            # the zero-length sentinel block contributes nothing.
            matches = sum(triple[-1] for triple in self.get_matching_blocks() if triple[-1] >= m)
            return _calculate_ratio(matches, len(self.a) + len(self.b))
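
The doctest values above can be checked by hand with the public difflib API. The following is only a sketch: `score` is a hypothetical helper that mirrors the proposed formula and is not part of difflib.

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abcd", "bcde")

# get_matching_blocks() returns Match(a, b, size) triples; the last one
# is always a zero-length sentinel.
blocks = s.get_matching_blocks()
sizes = [block.size for block in blocks]

# T in the docstring: total number of elements in both sequences.
total = len("abcd") + len("bcde")


def score(m):
    """Hypothetical helper: 2.0 * M_min / T for threshold m."""
    kept = sum(size for size in sizes if size >= m)
    return 2.0 * kept / total


print(sizes)     # [3, 0]  ("bcd" is the only real match)
print(score(3))  # 0.75    (2 * 3 / 8)
print(score(4))  # 0.0     (the length-3 block is filtered out)
```

The only non-empty matching block is "bcd", so any threshold up to 3 keeps it and any threshold above 3 drops the score to 0.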

    @gi-ba-bu gi-ba-bu mannequin added 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Dec 15, 2021
    @AlexWaygood
    Member

    I am removing 3.10 from the "versions" field, since additions to the standard library are only considered for unreleased versions of Python.

    @AlexWaygood AlexWaygood removed 3.10 only security fixes labels Dec 15, 2021
    @taleinat
    Contributor

    taleinat commented Jan 8, 2022

    Thanks for the suggestion and the PR, Giacomo!

    However, in my opinion, this is better suited to be something like a cookbook recipe. The number of use cases for this will be low, and there would be little advantage to having this in the stdlib rather than elsewhere.
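
As a rough illustration of what such a recipe could look like outside the stdlib, here is a minimal sketch that relies only on the public `SequenceMatcher` API. The name `ratio_min_recipe` and its signature are assumptions made for this example; they do not come from the PR.

```python
from difflib import SequenceMatcher


def ratio_min_recipe(a, b, m, isjunk=None):
    """Hypothetical standalone recipe: similarity in [0, 1], counting
    only matching blocks of at least m elements."""
    matcher = SequenceMatcher(isjunk, a, b)
    # Keep only blocks that meet the threshold; the zero-length
    # sentinel block adds nothing to the sum.
    matched = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= m
    )
    total = len(a) + len(b)
    # Two empty sequences count as identical, as in SequenceMatcher.ratio().
    return 2.0 * matched / total if total else 1.0


print(ratio_min_recipe("abcd", "bcde", 3))  # 0.75
print(ratio_min_recipe("abcd", "bcde", 4))  # 0.0
```

With `m=1` this sketch should agree with `SequenceMatcher.ratio()`, since every non-empty matching block then passes the threshold.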

    @taleinat
    Contributor

    I'm closing this for now since nobody has followed up and to the best of my understanding this wouldn't be an appropriate addition to the stdlib.

    This can be re-opened in the future if needed, of course.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    3 participants