SequenceMatcher bug #77293

Closed
mcft mannequin opened this issue Mar 20, 2018 · 2 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@mcft
Mannequin

mcft mannequin commented Mar 20, 2018

BPO 33112
Nosy @tim-one

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2018-03-20.20:35:51.540>
created_at = <Date 2018-03-20.20:09:08.532>
labels = ['type-bug']
title = 'SequenceMatcher bug'
updated_at = <Date 2018-03-20.20:35:51.534>
user = 'https://bugs.python.org/mcft'

bugs.python.org fields:

activity = <Date 2018-03-20.20:35:51.534>
actor = 'tim.peters'
assignee = 'none'
closed = True
closed_date = <Date 2018-03-20.20:35:51.540>
closer = 'tim.peters'
components = []
creation = <Date 2018-03-20.20:09:08.532>
creator = 'mcft'
dependencies = []
files = []
hgrepos = []
issue_num = 33112
keywords = []
message_count = 2.0
messages = ['314163', '314165']
nosy_count = 2.0
nosy_names = ['tim.peters', 'mcft']
pr_nums = []
priority = 'normal'
resolution = 'duplicate'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue33112'
versions = ['Python 3.5']

@mcft
Mannequin Author

mcft mannequin commented Mar 20, 2018

difflib.SequenceMatcher fails to align two sequences that differ by only 3 single-letter substitutions. The result is far off: it reports a similarity ratio of 0.16 instead of the expected value of about 0.99.

Here is a snippet to reproduce the failure:
>>> from difflib import SequenceMatcher
>>> aa_ref = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLGMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHARATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHVLDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> aa_seq = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLCMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHAHATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHALDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> sum(a!=b for a, b in zip(aa_ref, aa_seq))
3
>>> match = SequenceMatcher(a=aa_ref, b=aa_seq)
>>> match.ratio()
0.1619718309859155
>>> match.get_opcodes()
[('equal', 0, 43, 0, 43), ('delete', 43, 79, 43, 43), ('equal', 79, 81, 43, 45), ('replace', 81, 122, 45, 80), ('equal', 122, 123, 80, 81), ('replace', 123, 284, 81, 284)]

@mcft mcft mannequin added the type-bug An unexpected behavior, bug, or error label Mar 20, 2018
@tim-one
Member

tim-one commented Mar 20, 2018

Please see the response to bpo-31889. Short course: you need to pass autojunk=False to the SequenceMatcher constructor.
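
For readers landing here, a minimal sketch of that suggestion applied to the report above. It copies aa_ref from the report and, to keep the example short, rebuilds aa_seq by applying the three substitutions visible in the report (G to C, R to H, V to A) rather than pasting the second string; the context strings used to locate each substitution are my own choice. The variable names default_ratio and exact_ratio are illustrative only.

```python
from difflib import SequenceMatcher

# Reference sequence copied from the report above.
aa_ref = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLGMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHARATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHVLDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'

# Assumption: aa_seq is rebuilt from aa_ref by applying the three
# single-letter substitutions seen in the report, each located by a
# short stretch of surrounding context.
aa_seq = aa_ref
for old, new in [("TLGMTII", "TLCMTII"),
                 ("SHARATM", "SHAHATM"),
                 ("VLHVLDW", "VLHALDW")]:
    aa_seq = aa_seq.replace(old, new, 1)

# Sanity check: the two sequences differ at exactly three positions.
mismatches = sum(a != b for a, b in zip(aa_ref, aa_seq))

# Default behaviour: autojunk=True, the heuristic is active because
# the second sequence has at least 200 items.
default_ratio = SequenceMatcher(a=aa_ref, b=aa_seq).ratio()

# Heuristic disabled, as suggested in the comment above.
exact_ratio = SequenceMatcher(a=aa_ref, b=aa_seq, autojunk=False).ratio()

print(mismatches)
print(default_ratio)
print(exact_ratio)
```

Background, as I understand the difflib documentation: when the second sequence has at least 200 items, any item whose duplicates make up more than 1% of it is marked as "popular" and treated as junk for matching purposes. With an alphabet of about 20 amino acids, almost every residue in a 284-letter protein sequence crosses that threshold, so very few positions remain to seed a match. Passing autojunk=False turns the heuristic off, at the cost of slower matching on long inputs.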

@tim-one tim-one closed this as completed Mar 20, 2018
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022