TODO in tailmatch(): it does not support backward in all cases #60485

vstinner · 2012-10-19T00:51:24Z

BPO	16281
Nosy	@loewis, @vstinner, @serhiy-storchaka

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2013-01-03.02:20:27.254>
created_at = <Date 2012-10-19.00:51:23.526>
labels = ['interpreter-core']
title = 'TODO in tailmatch(): it does not support backward in all cases'
updated_at = <Date 2013-01-03.13:38:55.832>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2013-01-03.13:38:55.832>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2013-01-03.02:20:27.254>
closer = 'python-dev'
components = ['Interpreter Core']
creation = <Date 2012-10-19.00:51:23.526>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 16281
keywords = []
message_count = 9.0
messages = ['173301', '174441', '174847', '174884', '174893', '178901', '178902', '178941', '178944']
nosy_count = 4.0
nosy_names = ['loewis', 'vstinner', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue16281'
versions = ['Python 3.3', 'Python 3.4']

vstinner · 2012-10-19T00:51:22Z

Oh oh, it looks like the implementation of tailmatch() was not finished:

        /* If both are of the same kind, memcmp is sufficient */
        if (kind_self == kind_sub) {
            return ...;
        }
        /* otherwise we have to compare each character by first accesing it */
        else {
            /* We do not need to compare 0 and len(substring)-1 because
               the if statement above ensured already that they are equal
               when we end up here. */
            /* TODO: honor direction and do a forward or backwards search */
            for (i = 1; i < end_sub; ++i) {
                if (PyUnicode_READ(kind_self, data_self, offset + i) !=
                    PyUnicode_READ(kind_sub, data_sub, i))
                    return 0;
            }
            return 1;
        }

serhiy-storchaka · 2012-11-01T19:08:05Z

The result does not depend on the direction of comparison. This only affects speed. But who can to say in which direction comparison will be faster?

Here I see a one obvious opportunity for optimization:

if (kind_self < kind_sub)
    return 0;

After that and after processing the case (kind_self == kind_sub) only 3 special cases left: UCS1 in UCS2, UCS1 in UCS4, and UCS2 in UCS4. Get rid of slow PyUnicode_READ() for this cases will speed up the code. Also note that comparing first and last characters before memcmp can be a slowdown (because PyUnicode_READ() is slow). Try to compare first and last bytes.

vstinner · 2012-11-04T23:51:09Z

Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function can fail.

serhiy-storchaka · 2012-11-05T09:22:00Z

Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function
can fail.

But it does.

.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
Py_ssize_t start, Py_ssize_t end, int direction)

Return 1 if substr matches str[start:end] at the given tail end
(direction == -1 means to do a prefix match, direction == 1 a suffix match),
0 otherwise. Return -1 if an error occurred.

vstinner · 2012-11-05T11:36:09Z

Oh, PyUnicode_Tailmatch() documentation doesn't mention that the function
can fail.

But it does.

.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr,
Py_ssize_t start, Py_ssize_t end, int direction)

Return 1 if substr matches str[start:end] at the given tail end
(direction == -1 means to do a prefix match, direction == 1 a suffix match),
0 otherwise. Return -1 if an error occurred.

Oh, I read the "documentation" in unicodeobject.h:

/* Return 1 if substr matches str[start:end] at the given tail end, 0
otherwise. */

The problem is that tailmatch() returns 0 if PyUnicode_READY() failed.
It is a bug, even if it cannot occur because PyUnicode_Tailmatch()
checks that the both strings are ready.

python-dev · 2013-01-03T02:20:27Z

New changeset 49eb2488145d by Victor Stinner in branch 'default':
Close bpo-16281: handle tailmatch() failure and remove useless comment
http://hg.python.org/cpython/rev/49eb2488145d

vstinner · 2013-01-03T02:21:46Z

"Here I see a one obvious opportunity for optimization: ..."

@serhiy: Can you please open a new issue for this? I consider the issue as fixed: I just removed the TODO (for the reason explained in the changeset).

serhiy-storchaka · 2013-01-03T12:43:01Z

Shouldn't this be applied to 3.3?

As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope.

vstinner · 2013-01-03T13:38:56Z

Shouldn't this be applied to 3.3?

It's just a cleanup, it doesn't fix any real bug. I prefer to not
pollute old versions with cleanup.

As for optimization, I made some benchmarks and didn't saw any significant difference. Usually this function used to check short ASCII heads and tails and any optimization will not be seen even under a microscope.

Ok, agreed.

serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 15, 2012

python-dev mannequin closed this as completed Jan 3, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO in tailmatch(): it does not support backward in all cases #60485

TODO in tailmatch(): it does not support backward in all cases #60485

vstinner commented Oct 19, 2012

vstinner commented Oct 19, 2012

serhiy-storchaka commented Nov 1, 2012

vstinner commented Nov 4, 2012

serhiy-storchaka commented Nov 5, 2012

vstinner commented Nov 5, 2012

python-dev mannequin commented Jan 3, 2013

vstinner commented Jan 3, 2013

serhiy-storchaka commented Jan 3, 2013

vstinner commented Jan 3, 2013

TODO in tailmatch(): it does not support backward in all cases #60485

TODO in tailmatch(): it does not support backward in all cases #60485

Comments

vstinner commented Oct 19, 2012

vstinner commented Oct 19, 2012

serhiy-storchaka commented Nov 1, 2012

vstinner commented Nov 4, 2012

serhiy-storchaka commented Nov 5, 2012

vstinner commented Nov 5, 2012

python-dev mannequin commented Jan 3, 2013

vstinner commented Jan 3, 2013

serhiy-storchaka commented Jan 3, 2013

vstinner commented Jan 3, 2013