gh-118184: Support tuples for find, index, rfind & rindex #119501

Closed
wants to merge 86 commits into from

nineteendo
Contributor

@nineteendo nineteendo commented May 24, 2024

From @pfmoore on Discourse:

One other option is for someone just to submit one or more PRs implementing the proposed feature(s). The PRs will either get accepted or rejected, and then you have your answer. The lack of response might just be because there’s not a lot that’s interesting to say.

I don’t personally think this is worth the effort to implement, and I’m not convinced I’d find it very useful in practice. But I also don’t think it’s such a big deal that it needs a big debate, or community consensus, or a PEP. So if you want to put in the effort, just go for it.

Benchmark for 1,000,000 characters

script
# find_tuple.py
def find0(p, chars):
    for i, c in enumerate(p):
        if c in chars:
            break
    else:
        i = -1
    return i

def find1(p, subs):
    for i in range(len(p)):
        if p.startswith(subs, i):
            break
    else:
        i = -1
    return i

def find2(p, pattern):
    match = pattern.search(p)
    i = match.start() if match else -1
    return i

def find3(p, subs):
    i = -1
    for sub in subs:
        new_i = p.find(sub, 0, None if i == -1 else i)
        if new_i != -1:
            i = new_i
    return i

def find4(p, subs):
    i = p.find(subs)
    return i

def rfind0(p, chars):
    i = len(p) - 1
    while i >= 0 and p[i] not in chars:
        i -= 1
    return i

def rfind1(p, subs):
    for i in range(len(p), -1, -1):
        if p.startswith(subs, i):
            break
    else:
        i = -1
    return i

rfind2 = find2

def rfind3(p, subs):
    i = -1
    for sub in subs:
        new_i = p.rfind(sub, 0 if i == -1 else i)
        if new_i != -1:
            i = new_i
    return i

def rfind4(p, subs):
    i = p.rfind(subs)
    return i
# find_tuple.sh
echo find chars best case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'ab' + '_' * 999_998; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'ab' + '_' * 999_998; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find chars mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'b' + '_' * 999_999; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'b' + '_' * 999_999; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find chars worst case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; chars   = 'ab'"               "find_tuple.find0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple('ab')"        "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('[ab]')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab'"               "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple('ab')"        "find_tuple.find4(string, subs)"
echo find subs best case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'abcd' + '_' * 999_996; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'abcd' + '_' * 999_996; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find subs mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = 'cd' + '_' * 999_998; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = 'cd' + '_' * 999_998; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find subs worst case
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('ab|cd')" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = 'ab', 'cd'"          "find_tuple.find4(string, subs)"
echo find many prefixes
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('|'.join(f'prefix{i}' for i in range(100)))" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'prefix{i}' for i in range(100))"                "find_tuple.find4(string, subs)"
echo find many infixes
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, re; string = '_' * 1_000_000; pattern = re.compile('|'.join(f'{i}infix{i}' for i in range(100)))" "find_tuple.find2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;     string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                "find_tuple.find4(string, subs)"

echo ---

echo rfind chars best case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_998 + 'ba'; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'ba'; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind chars mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_999 + 'b'; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_999 + 'b'; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind chars worst case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; chars   = 'ab'"                      "find_tuple.rfind0(string, chars)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple('ab')"               "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile('(?r)[ab]')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab'"                      "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple('ab')"               "find_tuple.rfind4(string, subs)"
echo rfind subs best case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_996 + 'cdab'; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_996 + 'cdab'; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind subs mixed case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 999_998 + 'cd'; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 999_998 + 'cd'; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind subs worst case
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile('(?r)ab|cd')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = 'ab', 'cd'"                 "find_tuple.rfind4(string, subs)"
echo rfind many suffixes
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile(f'(?r){'|'.join(f'{i}suffix' for i in range(100))}')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}suffix' for i in range(100))"                            "find_tuple.rfind4(string, subs)"
echo rfind many infixes
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind1(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple, regex; string = '_' * 1_000_000; pattern = regex.compile(f'(?r){'|'.join(f'{i}infix{i}' for i in range(100))}')" "find_tuple.rfind2(string, pattern)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind3(string, subs)"
find-tuple/python.exe -m timeit -s "import find_tuple;        string = '_' * 1_000_000; subs    = tuple(f'{i}infix{i}' for i in range(100))"                            "find_tuple.rfind4(string, subs)"
find chars best case - 2.33x faster
2000000 loops, best of 5: 177 nsec per loop
1000000 loops, best of 5: 213 nsec per loop
1000000 loops, best of 5: 277 nsec per loop
1000000 loops, best of 5: 227 nsec per loop
5000000 loops, best of 5: 76 nsec per loop
find chars mixed case - 1.80x faster
2000000 loops, best of 5: 177 nsec per loop
1000000 loops, best of 5: 219 nsec per loop
1000000 loops, best of 5: 276 nsec per loop
20000 loops, best of 5: 16.2 usec per loop
5000000 loops, best of 5: 98.4 nsec per loop
find chars worst case - 1.69x slower
5 loops, best of 5: 41.7 msec per loop
5 loops, best of 5: 53.8 msec per loop
50 loops, best of 5: 4.32 msec per loop
10000 loops, best of 5: 32 usec per loop
5000 loops, best of 5: 54.1 usec per loop
find subs best case - 2.93x faster
1000000 loops, best of 5: 213 nsec per loop
1000000 loops, best of 5: 306 nsec per loop
1000000 loops, best of 5: 217 nsec per loop
5000000 loops, best of 5: 72.7 nsec per loop
find subs mixed case - 3.75x slower
1000000 loops, best of 5: 220 nsec per loop
1000000 loops, best of 5: 285 nsec per loop
500 loops, best of 5: 733 usec per loop
500000 loops, best of 5: 824 nsec per loop
find subs worst case - 1.04x slower
5 loops, best of 5: 53.2 msec per loop
50 loops, best of 5: 4.79 msec per loop
200 loops, best of 5: 1.46 msec per loop
200 loops, best of 5: 1.52 msec per loop
find many prefixes - 56.3x slower
1 loop, best of 5: 602 msec per loop
500 loops, best of 5: 480 usec per loop
10 loops, best of 5: 36.8 msec per loop
10 loops, best of 5: 27 msec per loop
find many infixes - 5.69x slower
1 loop, best of 5: 603 msec per loop
50 loops, best of 5: 4.32 msec per loop
10 loops, best of 5: 33.2 msec per loop
10 loops, best of 5: 24.6 msec per loop

rfind chars best case - 1.24x faster
2000000 loops, best of 5: 114 nsec per loop
1000000 loops, best of 5: 294 nsec per loop
500000 loops, best of 5: 517 nsec per loop
1000000 loops, best of 5: 221 nsec per loop
5000000 loops, best of 5: 91.9 nsec per loop
rfind chars mixed case - 6.16x slower
2000000 loops, best of 5: 115 nsec per loop
1000000 loops, best of 5: 301 nsec per loop
500000 loops, best of 5: 518 nsec per loop
500 loops, best of 5: 598 usec per loop
500000 loops, best of 5: 708 nsec per loop
rfind chars worst case - 1.06x slower
5 loops, best of 5: 51.3 msec per loop
5 loops, best of 5: 53.3 msec per loop
50 loops, best of 5: 7 msec per loop
200 loops, best of 5: 1.19 msec per loop
200 loops, best of 5: 1.26 msec per loop
rfind subs best case - 2.41x faster
1000000 loops, best of 5: 359 nsec per loop
500000 loops, best of 5: 574 nsec per loop
1000000 loops, best of 5: 229 nsec per loop
2000000 loops, best of 5: 94.9 nsec per loop
rfind subs mixed case - 2.26x slower
1000000 loops, best of 5: 368 nsec per loop
500000 loops, best of 5: 542 nsec per loop
500 loops, best of 5: 724 usec per loop
500000 loops, best of 5: 832 nsec per loop
rfind subs worst case - 1.04x slower
5 loops, best of 5: 53.9 msec per loop
50 loops, best of 5: 7 msec per loop
200 loops, best of 5: 1.44 msec per loop
200 loops, best of 5: 1.5 msec per loop
rfind many suffixes - 54.8x slower
1 loop, best of 5: 605 msec per loop
500 loops, best of 5: 484 usec per loop
10 loops, best of 5: 24.6 msec per loop
10 loops, best of 5: 26.5 msec per loop
rfind many infixes - 2.60x slower
1 loop, best of 5: 603 msec per loop
50 loops, best of 5: 9.37 msec per loop
10 loops, best of 5: 22.5 msec per loop
10 loops, best of 5: 24.4 msec per loop
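
The headline ratio on each group above is not explained in the text. Judging from the numbers, it appears to compare the last timing in a group (the tuple-aware `find4`/`rfind4`) with the fastest of the alternatives listed before it. The sketch below reproduces two of the headlines under that assumption; `headline` is a hypothetical helper, not part of the benchmark script.

```python
def headline(timings):
    """Summarise one benchmark group.

    `timings` holds the per-loop times of a group in a common unit, in the
    order they are printed: the alternatives first, the new method last.
    Assumption: the headline compares the new method with the fastest
    alternative (inferred from the numbers, not stated in the PR).
    """
    *alternatives, new = timings
    best_alternative = min(alternatives)
    if new <= best_alternative:
        return f"{best_alternative / new:.2f}x faster"
    return f"{new / best_alternative:.2f}x slower"


# "find chars best case" group, in nanoseconds
print(headline([177, 213, 277, 227, 76]))
# "find chars worst case" group, converted to nanoseconds
print(headline([41_700_000, 53_800_000, 4_320_000, 32_000, 54_100]))
```

With the timings listed above, these two calls give the same ratios as the "find chars best case" and "find chars worst case" headlines.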

Old benchmark on Ubuntu

expand
find best case - 2.02x faster - regex - 2.77x slower
1000000 loops, best of 5: 303 nsec per loop
1000000 loops, best of 5: 357 nsec per loop
1000000 loops, best of 5: 260 nsec per loop
2000000 loops, best of 5: 129 nsec per loop
find mixed case - 1.25x faster - regex - 1.54x slower
1000000 loops, best of 5: 301 nsec per loop
1000000 loops, best of 5: 370 nsec per loop
10000 loops, best of 5: 23.9 usec per loop
1000000 loops, best of 5: 240 nsec per loop
find worst case - 1.33x faster - regex - 500x slower
5 loops, best of 5: 47.2 msec per loop
20 loops, best of 5: 17.1 msec per loop
5000 loops, best of 5: 45.5 usec per loop
10000 loops, best of 5: 34.2 usec per loop

rfind best case - 1.28x faster - regex - 84,049x slower
1000000 loops, best of 5: 209 nsec per loop
20 loops, best of 5: 13.7 msec per loop
1000000 loops, best of 5: 265 nsec per loop
2000000 loops, best of 5: 163 nsec per loop
rfind mixed case - 1.27x slower - regex - 62,673x slower
1000000 loops, best of 5: 217 nsec per loop
20 loops, best of 5: 13.6 msec per loop
10000 loops, best of 5: 23.8 usec per loop
1000000 loops, best of 5: 275 nsec per loop
rfind worst case - 1.30x faster - regex - 1,068x slower
5 loops, best of 5: 85.9 msec per loop
10 loops, best of 5: 37.8 msec per loop
5000 loops, best of 5: 44.6 usec per loop
10000 loops, best of 5: 35.4 usec per loop

📚 Documentation preview 📚: https://cpython-previews--119501.org.readthedocs.build/

Contributor Author

@nineteendo nineteendo left a comment

Fix bytes & bytearray test

Lib/test/string_tests.py (outdated, resolved)
Lib/test/string_tests.py (outdated, resolved)
Contributor Author

@nineteendo nineteendo left a comment

Fix space

@nineteendo nineteendo marked this pull request as ready for review May 24, 2024 12:32
@nineteendo
Contributor Author

nineteendo commented May 24, 2024

Could someone run the benchmark on Linux? I believe it will make it faster in all cases (at least for relatively small strings).

@eendebakpt
Contributor

Could someone run the benchmark on Linux? I believe it will make it faster in all cases (at least for relatively small strings).

What about find where the argument is not a tuple but a string? Will that become slower?

@nineteendo
Contributor Author

nineteendo commented May 24, 2024

EDIT: 2ns faster:

script
# find_tuple.sh
echo find && main/python.exe -m timeit "'foobar'.find('foo')" && find-tuple/python.exe -m timeit "'foobar'.find('foo')"
echo rfind && main/python.exe -m timeit "'foobar'.rfind('foo')" && find-tuple/python.exe -m timeit "'foobar'.rfind('foo')"
find
10000000 loops, best of 5: 34 nsec per loop
10000000 loops, best of 5: 32.2 nsec per loop
rfind
10000000 loops, best of 5: 36.8 nsec per loop
10000000 loops, best of 5: 35.9 nsec per loop

@elisbyberi

@nineteendo There is an issue that has not been addressed in the discussion: https://discuss.python.org/t/add-tuple-support-to-more-str-functions/50628/66

Why? If we provide such APIs, why can we ignore the long input + many words use cases?

General purpose methods in Python are expected to work well for non-small input.
I expect users may do one_mega_string.count(tuple(thousands_of_words)).

@nineteendo
Contributor Author

nineteendo commented May 24, 2024

  • While it doesn't short-circuit for long input, it performs a lot better than re in the absolute worst case.
  • For many words you can use re as there will likely be patterns.
  • We can put in the docs that rfind(subs) is equivalent to max(string.rfind(sub1), string.rfind(sub2), ...) and that you shouldn't expect a huge improvement.
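
The equivalence in the last bullet can be written out in pure Python. This is only a sketch of the documented semantics, not the PR's C implementation; `rfind_any` and `find_any` are hypothetical names. The forward direction needs slightly more care, because `-1` (not found) must not win a `min()`.

```python
def rfind_any(string, subs):
    """Highest index at which any of `subs` starts, or -1."""
    # -1 is the smallest possible result, so max() handles "not found".
    return max((string.rfind(sub) for sub in subs), default=-1)


def find_any(string, subs):
    """Lowest index at which any of `subs` starts, or -1."""
    # Drop the -1 results first, otherwise min() would always pick them.
    hits = [i for i in (string.find(sub) for sub in subs) if i != -1]
    return min(hits, default=-1)


print(rfind_any("_cdab", ("ab", "cd")))
print(find_any("abcd_", ("ab", "cd")))
```

Both helpers scan the whole string once per substring, which is why this formulation does not short-circuit on long input.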

@nineteendo nineteendo marked this pull request as draft May 24, 2024 19:55
@nineteendo
Contributor Author

I'm going to try to improve the mixed case.

@nineteendo nineteendo marked this pull request as ready for review May 24, 2024 20:56
@nineteendo
Contributor Author

Could someone run the benchmark on Linux?

Doc/library/stdtypes.rst (outdated, resolved)
Lib/test/string_tests.py (resolved)
Lib/test/string_tests.py (resolved)
Objects/unicodeobject.c (outdated, resolved)
Objects/unicodeobject.c (outdated, resolved)
Objects/unicodeobject.c (outdated, resolved)
Objects/bytes_methods.c (outdated, resolved)
@erlend-aasland
Contributor

Adding do-not-merge, since the linked issue is closed as wont-implement.

Objects/bytes_methods.c (outdated, resolved)
Objects/bytes_methods.c (outdated, resolved)
@nineteendo
Contributor Author

nineteendo commented Jun 2, 2024

@serhiy-storchaka all your issues are now addressed. I don't like the way the heap_subs are cleaned up, but it does work.

if (heap_subs) {
    for (Py_ssize_t i = 0; i < subs_len; i++) {
        PyMem_Free((void *)heap_subs[i]);
    }
}

Could you please review again? If you think the code is too complex, remember you asked for this. I wanted to keep this as simple as possible, but I have been repeatedly asked to further optimise it.

@erlend-aasland
Contributor

Reminding the reviewers that there are mixed opinions regarding adding this feature at all (hence the linked issue closed as wont-implement); AFAIK, no core dev has expressed immense support of the idea.

@erlend-aasland
Contributor

Ideally, this PoC should be kept at your fork; the CPython repo is not the place for experimentation. Consider closing this PR and continue working on it on your fork.

Member

@serhiy-storchaka serhiy-storchaka left a comment

If you think the code is too complex, remember you asked for this. I wanted to keep this as simple as possible, but I have been repeatedly asked to further optimise it.

I did not ask for this. I only pointed out that the current version was not optimal and repeatedly did unnecessary things. If the simpler version were merged, we would spend months or years optimizing it. I gave some hints about how it could be optimized, but predicted that the result would be complex.

I am not enthusiastic about this feature because, on the one hand, it is not algorithmically optimal (the optimal algorithm needs the costly preparation step: the set of needles should be "compiled" before use), and on the other hand, it is too complex for "practicality beats purity".
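
To illustrate what "compiling" a set of needles before use can look like, here is a minimal sketch that uses the `re` module as a stand-in, much as `find2` in the benchmark does. The comment does not name a specific algorithm, so treating a compiled regular expression as the preparation step is an assumption; `compile_needles` and `find_compiled` are hypothetical names.

```python
import re


def compile_needles(needles):
    """Costly preparation step: build one matcher for all needles."""
    # re.escape makes every needle match literally, even if it contains
    # characters such as "." or "|" that are special in a pattern.
    return re.compile("|".join(map(re.escape, needles)))


def find_compiled(pattern, haystack):
    """Cheap repeated step: lowest index of any needle, or -1."""
    match = pattern.search(haystack)
    return match.start() if match else -1


pattern = compile_needles(("ab", "c.d"))  # pay the preparation cost once
print(find_compiled(pattern, "xxc.dab"))
print(find_compiled(pattern, "xxcXd"))
```

The trade-off is the one described above: the preparation only pays off when the same set of needles is reused or the haystack is long.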

Objects/bytes_methods.c (outdated, resolved)
@erlend-aasland erlend-aasland added the "pending" label (the issue will be closed if no feedback is provided) Jun 3, 2024
@nineteendo
Contributor Author

nineteendo commented Jun 3, 2024

the optimal algorithm needs the costly preparation step

Costly it is indeed. The only area where it's faster is where the performance wasn't needed (the worst case)!
We can get better performance there by simply setting the CHUNK_SIZE to 10,000 which affects the other cases much less.
I wanted to be cooperative and follow your suggestions, but that was a huge waste of my time. Is it fine to revert this?
How was I supposed to know PyMem_RawMalloc() is very slow?
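
For readers unfamiliar with the chunking idea behind `CHUNK_SIZE`, the following pure-Python sketch shows one way it could work: the string is scanned chunk by chunk, every needle is searched within the current chunk, and the search stops at the first chunk that contains a match. This is an illustration under assumptions, not the PR's C code; `find_chunked` and its `chunk_size` parameter are hypothetical.

```python
def find_chunked(string, subs, chunk_size=10_000):
    """Return the lowest index at which any of `subs` starts, or -1."""
    for start in range(0, len(string) + 1, chunk_size):
        end = start + chunk_size
        best = -1
        for sub in subs:
            # Allow a match to begin inside this chunk and finish past
            # its end, so matches that straddle a boundary are not lost.
            stop = min(end + len(sub) - 1, len(string))
            i = string.find(sub, start, stop)
            if i != -1 and (best == -1 or i < best):
                best = i
        if best != -1:
            # Short-circuit: later chunks are never scanned.
            return best
    return -1


print(find_chunked("_" * 25 + "cd" + "_" * 10 + "ab", ("ab", "cd"), chunk_size=8))
```

A small chunk size favours the mixed case, where one needle occurs early and another does not occur at all, while a large one reduces per-chunk overhead in the worst case. That is the trade-off the comment above refers to.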

find chars best case - 3.26x slower
5000000 loops, best of 5: 76 nsec per loop
1000000 loops, best of 5: 248 nsec per loop
find chars mixed case - 2.71x slower
5000000 loops, best of 5: 98.4 nsec per loop
1000000 loops, best of 5: 267 nsec per loop
find chars worst case - 1.22x faster
5000 loops, best of 5: 54.1 usec per loop
5000 loops, best of 5: 44.4 usec per loop
find subs best case - 3.45x slower
5000000 loops, best of 5: 72.7 nsec per loop
1000000 loops, best of 5: 251 nsec per loop
find subs mixed case - 1.21x slower
500000 loops, best of 5: 824 nsec per loop
500000 loops, best of 5: 999 nsec per loop
find subs worst case - no difference
200 loops, best of 5: 1.52 msec per loop
200 loops, best of 5: 1.51 msec per loop
find many prefixes - 1.02x faster
10 loops, best of 5: 27 msec per loop
10 loops, best of 5: 26.4 msec per loop
find many infixes - 1.03x faster
10 loops, best of 5: 24.6 msec per loop
10 loops, best of 5: 24 msec per loop

rfind chars best case - 3x slower
5000000 loops, best of 5: 91.9 nsec per loop
1000000 loops, best of 5: 276 nsec per loop
rfind chars mixed case - 1.25x slower
500000 loops, best of 5: 708 nsec per loop
500000 loops, best of 5: 887 nsec per loop
rfind chars worst case - no difference
200 loops, best of 5: 1.26 msec per loop
200 loops, best of 5: 1.26 msec per loop
rfind subs best case - 2.95x slower
2000000 loops, best of 5: 94.9 nsec per loop
1000000 loops, best of 5: 280 nsec per loop
rfind subs mixed case - 1.23x slower
500000 loops, best of 5: 832 nsec per loop
200000 loops, best of 5: 1.02 usec per loop
rfind subs worst case - no difference
200 loops, best of 5: 1.5 msec per loop
200 loops, best of 5: 1.49 msec per loop
rfind many suffixes - 1.02x faster
10 loops, best of 5: 26.5 msec per loop
10 loops, best of 5: 25.9 msec per loop
rfind many infixes - 1.03x faster
10 loops, best of 5: 24.4 msec per loop
10 loops, best of 5: 23.8 msec per loop

@erlend-aasland
Contributor

It looks indeed like this is not going anywhere. I suggest you continue your experiments on your own fork. If you manage to get an improved version up and running, try first to gain traction for the feature on Discord, before you create a new issue/PR. So far, no core dev is super enthusiastic about this, which means that this PR (and any other like it) is only a waste of CI and review resources.

@serhiy-storchaka
Member

I am sorry, but yes, without the support of at least one core developer it is a waste of your time and ours.

I worked on similar code, so I can estimate how much it could cost. I could be wrong, and I'd like to be wrong in this case, but it seems that at this stage the cost/benefit ratio is too high. Your current code likely has bugs, and fixing them can make it even more complicated. It has potential for simplification (the forward and the backward loops can be merged), but you need to understand the code from top to bottom to make it simpler and more efficient, and this may not be enough. I hope you learned something new and can write better code from the beginning next time and better estimate the cost of future changes.

Raw memory management is relatively slow. In this case you can use an array of constant size allocated on the stack for a small number of needles. It is also more efficient to allocate a single buffer in dynamic memory than several buffers.

Keeping a reference to the object providing a buffer is not enough: for example, a bytearray object can be resized once the buffer is released, and its internal buffer can then be reallocated in a different place.
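
The same rule is visible from Python: while a buffer export of a `bytearray` is held (here through a `memoryview`), resizing is refused, and once the export is released the object is free to resize. The snippet is only a Python-level analogue of the C-level concern; the variable names are illustrative.

```python
buf = bytearray(b"needle")
view = memoryview(buf)  # holds a buffer export on `buf`

try:
    buf.extend(b"!")  # resizing while the export is alive
    resized_while_exported = True
except BufferError:
    resized_while_exported = False

view.release()    # give the buffer back
buf.extend(b"!")  # now the bytearray may resize and move its storage

print(resized_while_exported, buf)
```

In C terms this suggests holding the acquired buffer for as long as the pointer is in use, rather than relying only on a reference to the owning object.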

@nineteendo
Contributor Author

nineteendo commented Jun 3, 2024

If you manage to get an improved version up and running, try first to gain traction for the feature on Discord, before you create a new issue/PR.

I have an improved version here using dynamic chunk sizes: nineteendo#2. It's now the fastest algorithm in the benchmark for cases where you wouldn't use regex (if you ignore that rfind() is 9% slower because memrchr() doesn't exist on macOS). The code is also a lot more readable now. I've asked dgrigonis to post a message on Discourse.

In this case you can use an array of constant size allocated on the stack for a small number of needles.

That really didn't feel right to me. It's a lot of code for a very small improvement in the worst case, while the new pull request beats this strategy using a lot less code.

Labels
awaiting change review, DO-NOT-MERGE, pending (the issue will be closed if no feedback is provided)

7 participants