gh-118184: Support tuples for find, index, rfind & rindex #119501

Closed
wants to merge 86 commits into from
86 commits
3632624
Support tuples for `find` & `rfind`
nineteendo May 24, 2024
e39b040
Update docs
nineteendo May 24, 2024
cb905bc
Add tests
nineteendo May 24, 2024
1807fd8
📜🤖 Added by blurb_it.
blurb-it[bot] May 24, 2024
cca08fa
Apply suggestions from code review
nineteendo May 24, 2024
302faa3
Apply suggestions from code review
nineteendo May 24, 2024
cb95578
Fix signature tests
nineteendo May 24, 2024
a35d3ae
Short circuit
nineteendo May 24, 2024
65c0a9e
Fix start for `rfind`
nineteendo May 25, 2024
5cbb1f0
Refactor checks
nineteendo May 25, 2024
00b2b04
Fix end for `rfind`
nineteendo May 25, 2024
e124603
Adjust indices
nineteendo May 25, 2024
41b0cd8
Micro optimisation
nineteendo May 25, 2024
7b83a22
Fix conversion
nineteendo May 25, 2024
c905458
Fix condition
nineteendo May 25, 2024
5c79f24
Add tests
nineteendo May 25, 2024
148b471
Clarify documentation
nineteendo May 25, 2024
351dc83
Add constant
nineteendo May 25, 2024
ddaf4b4
Duplicate constant
nineteendo May 25, 2024
2b044a1
Add tests
nineteendo May 25, 2024
a632f25
Remove newline
nineteendo May 25, 2024
ef28dab
Update Lib/test/string_tests.py
nineteendo May 25, 2024
4207d54
Update Lib/test/string_tests.py
nineteendo May 25, 2024
0dff482
Update Lib/test/string_tests.py
nineteendo May 25, 2024
fc0d9ea
Update Lib/test/string_tests.py
nineteendo May 25, 2024
cd317fd
Don't check twice on boundary
nineteendo May 25, 2024
43e8259
Apply suggestions from code review
nineteendo May 25, 2024
2524dc1
Apply suggestions from code review
nineteendo May 25, 2024
dbc8c94
Test bytes
nineteendo May 25, 2024
49a28a0
Add more bytes tests
nineteendo May 25, 2024
0bd606d
Support tuples for index & rindex
nineteendo May 25, 2024
b337fdc
Update Objects/bytes_methods.c
nineteendo May 25, 2024
e43373f
Update Misc/NEWS.d/next/Core and Builtins/2024-05-24-11-07-16.gh-issu…
nineteendo May 25, 2024
b47b0e0
Update docs
nineteendo May 25, 2024
6f71b39
Refactor code
nineteendo May 26, 2024
64ef311
Fix error message
nineteendo May 26, 2024
a116f33
Add asserts
nineteendo May 26, 2024
e29828d
Remove unnecessary check
nineteendo May 26, 2024
a85f84a
Revert "Remove unnecessary check"
nineteendo May 26, 2024
ac19e87
Optimise length of 0 & 1
nineteendo May 26, 2024
b62e8b4
Avoid testing with tuples of 1 item
nineteendo May 26, 2024
b6492db
Simplify news.
nineteendo May 26, 2024
dd23e04
Fix indentation
nineteendo May 26, 2024
38d2df8
Handle -2
nineteendo May 26, 2024
223cb1b
Update Misc/NEWS.d/next/Core and Builtins/2024-05-24-11-07-16.gh-issu…
nineteendo May 27, 2024
bc29c92
Guard overflow
nineteendo May 27, 2024
f14ee7d
Tweak `FIND_CHUNK_SIZE`
nineteendo May 27, 2024
3606e00
Refer to `re` & `regex`
nineteendo May 28, 2024
9e2006c
Release buffer
nineteendo May 28, 2024
fb48c41
Release other buffer
nineteendo May 29, 2024
308174c
Save lengths
nineteendo May 29, 2024
6a3d651
malloc
nineteendo May 29, 2024
3227e63
Fix malloc
nineteendo May 29, 2024
70d673f
Store needles for bytes
nineteendo Jun 1, 2024
7b205b3
Revert test
nineteendo Jun 1, 2024
0664ced
Restructure code
nineteendo Jun 1, 2024
b132742
Fix smelly symbol
nineteendo Jun 1, 2024
8189c66
Make static
nineteendo Jun 1, 2024
53d3a07
Remove variable
nineteendo Jun 1, 2024
648725d
Reverse comparison
nineteendo Jun 1, 2024
4fe06fb
Add brackets
nineteendo Jun 1, 2024
145f45d
Remove continue
nineteendo Jun 1, 2024
c96775c
2 arguments per line
nineteendo Jun 1, 2024
b4722c4
Exclude long needles
nineteendo Jun 1, 2024
5c8751a
Include needles with a larger kind
nineteendo Jun 1, 2024
c219cf5
fast find for strings
nineteendo Jun 2, 2024
ccbfa0e
Fix argument type
nineteendo Jun 2, 2024
aada7f5
Rename argument
nineteendo Jun 2, 2024
090ddee
Decrease diff
nineteendo Jun 2, 2024
6b85fd7
Decrease diff 2
nineteendo Jun 2, 2024
41a6c20
Decrease diff 3
nineteendo Jun 2, 2024
460effa
Remove continue
nineteendo Jun 2, 2024
d1c4af6
Parentheses
nineteendo Jun 2, 2024
c19ddcf
Store converted needles on the heap
nineteendo Jun 2, 2024
6beae49
cleanup
nineteendo Jun 2, 2024
ff514be
Fix uninitialised variable
nineteendo Jun 2, 2024
ff6eea2
Try to prevent segmentation fault
nineteendo Jun 2, 2024
0cbf03a
Fix cast
nineteendo Jun 2, 2024
d412046
Revert "Fix cast"
nineteendo Jun 2, 2024
168fe84
Revert "Try to prevent segmentation fault"
nineteendo Jun 2, 2024
41b11e5
Uninitialised memory?
nineteendo Jun 3, 2024
ffe1152
More tests
nineteendo Jun 3, 2024
ac91f79
Rename parameter
nineteendo Jun 3, 2024
6751992
Unnest
nineteendo Jun 3, 2024
44aebd1
Keep buffers acquired during search
nineteendo Jun 3, 2024
9a51fd9
Add `buffers_len`
nineteendo Jun 3, 2024
64 changes: 54 additions & 10 deletions Doc/library/stdtypes.rst
@@ -1724,8 +1724,13 @@ expression support in the :mod:`re` module).
.. method:: str.find(sub[, start[, end]])

Return the lowest index in the string where substring *sub* is found within
-the slice ``s[start:end]``. Optional arguments *start* and *end* are
-interpreted as in slice notation. Return ``-1`` if *sub* is not found.
+the slice ``s[start:end]``. *sub* can also be a tuple of substrings to look
+for. In this case the returned index, if found, will be the index of the
+first match. Optional arguments *start* and *end* are interpreted as in
+slice notation. Return ``-1`` if *sub* is not found.

.. seealso::
The :mod:`re` module, which provides advanced pattern matching.

.. note::

@@ -1736,6 +1741,9 @@ expression support in the :mod:`re` module).
>>> 'Py' in 'Python'
True

.. versionchanged:: 3.14
*sub* can now be a tuple of substrings.
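
Because this pull request is marked closed, the tuple form should not be assumed to exist in any given interpreter. The documented behaviour (lowest index at which any of the substrings occurs, ``-1`` otherwise) can be approximated in plain Python. The sketch below is only an illustration: `find_any` is a hypothetical helper name, not part of the patch, and it makes one pass per needle rather than using the chunked search the commits describe.

```python
def find_any(s, subs, start=None, end=None):
    """Hypothetical helper: lowest index at which any item of subs occurs."""
    # Search for every needle separately and keep only the successful hits.
    hits = [i for i in (s.find(sub, start, end) for sub in subs) if i != -1]
    # An empty tuple, or no hit at all, yields -1.
    return min(hits, default=-1)


print(find_any('__aa__bb__', ('aa', 'bb')))     # 2
print(find_any('__aa__bb__', ('bb', 'aa')))     # 2, order of needles is irrelevant
print(find_any('__aa__bb__', ('aa', 'bb'), 3))  # 6
print(find_any('__aa__bb__', ()))               # -1
```

The printed values match the expectations in the tuple tests added to `Lib/test/string_tests.py` for the same inputs.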


.. method:: str.format(*args, **kwargs)

@@ -1789,6 +1797,9 @@ expression support in the :mod:`re` module).
Like :meth:`~str.find`, but raise :exc:`ValueError` when the substring is
not found.

.. versionchanged:: 3.14
*sub* can now be a tuple of substrings.
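
The relationship between `index` and `find` carries over to tuples: same result on success, :exc:`ValueError` instead of ``-1`` on failure. A minimal sketch of that wrapper for interpreters without the patch follows. `index_any` is a hypothetical name, and the error message is copied from the existing single-substring `str.index`, which is an assumption about what the patch would raise.

```python
def index_any(s, subs, start=None, end=None):
    """Hypothetical helper: like a tuple-aware find, but raise on failure."""
    hits = [i for i in (s.find(sub, start, end) for sub in subs) if i != -1]
    if not hits:
        # Assumed message, borrowed from the existing str.index behaviour.
        raise ValueError("substring not found")
    return min(hits)


print(index_any('__aa__bb__', ('aa', 'bb')))  # 2
try:
    index_any('__aa__bb__', ('cc', 'dd'))
except ValueError as exc:
    print("ValueError:", exc)
```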


.. method:: str.isalnum()

@@ -2030,15 +2041,26 @@ expression support in the :mod:`re` module).
.. method:: str.rfind(sub[, start[, end]])

Return the highest index in the string where substring *sub* is found, such
-that *sub* is contained within ``s[start:end]``. Optional arguments *start*
-and *end* are interpreted as in slice notation. Return ``-1`` on failure.
+that *sub* is contained within ``s[start:end]``. *sub* can also be a tuple
+of substrings to look for. In this case the returned index, if found, will
+be the index of the last match. Optional arguments *start* and *end* are
+interpreted as in slice notation. Return ``-1`` on failure.

.. seealso::
The third-party :pypi:`regex` module, which provides advanced pattern matching.

.. versionchanged:: 3.14
*sub* can now be a tuple of substrings.
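
For the reverse search, the documented result is the highest index at which any of the substrings starts. A rough equivalent for interpreters without the patch is sketched below. `rfind_any` is a hypothetical helper name, and the one-pass-per-needle approach is a simplification, not the implementation in this pull request.

```python
def rfind_any(s, subs, start=None, end=None):
    """Hypothetical helper: highest index at which any item of subs occurs."""
    # -1 is smaller than every valid index, so max() handles misses directly;
    # default=-1 covers the empty tuple.
    return max((s.rfind(sub, start, end) for sub in subs), default=-1)


print(rfind_any('__aa__bb__', ('aa', 'bb')))         # 6
print(rfind_any('__aa__bb__', ('aa', 'bb'), 7, 10))  # -1
print(rfind_any('foo', ('', 'foo')))                 # 3, the empty string matches at the end
```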


.. method:: str.rindex(sub[, start[, end]])

Like :meth:`rfind` but raises :exc:`ValueError` when the substring *sub* is not
found.

.. versionchanged:: 3.14
*sub* can now be a tuple of substrings.


.. method:: str.rjust(width[, fillchar])

@@ -2859,13 +2881,18 @@ arbitrary binary data.
bytearray.find(sub[, start[, end]])

Return the lowest index in the data where the subsequence *sub* is found,
-such that *sub* is contained in the slice ``s[start:end]``. Optional
-arguments *start* and *end* are interpreted as in slice notation. Return
-``-1`` if *sub* is not found.
+such that *sub* is contained in the slice ``s[start:end]``. *sub* can
+also be a tuple of subsequences to look for. In this case the returned
+index, if found, will be the index of the first match. Optional arguments
+*start* and *end* are interpreted as in slice notation. Return ``-1`` if
+*sub* is not found.

The subsequence to search for may be any :term:`bytes-like object` or an
integer in the range 0 to 255.

.. seealso::
The :mod:`re` module, which provides advanced pattern matching.

.. note::

The :meth:`~bytes.find` method should be used only if you need to know the
@@ -2878,6 +2905,9 @@ arbitrary binary data.
.. versionchanged:: 3.3
Also accept an integer in the range 0 to 255 as the subsequence.

.. versionchanged:: 3.14
*sub* can now be a tuple of subsequences.
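
For `bytes` and `bytearray`, each tuple item may be a bytes-like object or an integer in the range 0 to 255, as with the single-argument form. The sketch below approximates that on an interpreter without the patch. `find_any_bytes` is a hypothetical helper name, and the values mirror the tuple tests added to `Lib/test/test_bytes.py`.

```python
def find_any_bytes(data, subs, start=None, end=None):
    """Hypothetical helper: lowest index of any subsequence or byte value."""
    # bytes.find already accepts both bytes-like objects and ints in 0..255.
    hits = [i for i in (data.find(sub, start, end) for sub in subs) if i != -1]
    return min(hits, default=-1)


b = b'mississippi'
i, w = 105, 119  # ord('i') and ord('w')
print(find_any_bytes(b, (i,)))               # 1
print(find_any_bytes(b, (w,)))               # -1
print(find_any_bytes(bytearray(b), (w, i)))  # 1
```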


.. method:: bytes.index(sub[, start[, end]])
bytearray.index(sub[, start[, end]])
@@ -2891,6 +2921,9 @@ arbitrary binary data.
.. versionchanged:: 3.3
Also accept an integer in the range 0 to 255 as the subsequence.

.. versionchanged:: 3.14
*sub* can now be a tuple of subsequences.


.. method:: bytes.join(iterable)
bytearray.join(iterable)
@@ -2947,16 +2980,24 @@ arbitrary binary data.
bytearray.rfind(sub[, start[, end]])

Return the highest index in the sequence where the subsequence *sub* is
-found, such that *sub* is contained within ``s[start:end]``. Optional
-arguments *start* and *end* are interpreted as in slice notation. Return
-``-1`` on failure.
+found, such that *sub* is contained within ``s[start:end]``. *sub* can
+also be a tuple of subsequences to look for. In this case the returned
+index, if found, will be the index of the last match. Optional arguments
+*start* and *end* are interpreted as in slice notation. Return ``-1`` on
+failure.

The subsequence to search for may be any :term:`bytes-like object` or an
integer in the range 0 to 255.

.. seealso::
The third-party :pypi:`regex` module, which provides advanced pattern matching.

.. versionchanged:: 3.3
Also accept an integer in the range 0 to 255 as the subsequence.

.. versionchanged:: 3.14
*sub* can now be a tuple of subsequences.


.. method:: bytes.rindex(sub[, start[, end]])
bytearray.rindex(sub[, start[, end]])
@@ -2970,6 +3011,9 @@ arbitrary binary data.
.. versionchanged:: 3.3
Also accept an integer in the range 0 to 255 as the subsequence.

.. versionchanged:: 3.14
*sub* can now be a tuple of subsequences.


.. method:: bytes.rpartition(sep)
bytearray.rpartition(sep)
67 changes: 67 additions & 0 deletions Lib/test/string_tests.py
@@ -180,8 +180,10 @@ def test_find(self):

if self.contains_bytes:
self.checkequal(-1, 'hello', 'find', 42)
self.checkequal(-1, 'hello', 'find', (42,))
else:
self.checkraises(TypeError, 'hello', 'find', 42)
self.checkraises(TypeError, 'hello', 'find', (42,))

self.checkequal(0, '', 'find', '')
self.checkequal(-1, '', 'find', '', 1, 1)
@@ -217,6 +219,33 @@ def test_find(self):
if loc != -1:
self.assertEqual(i[loc:loc+len(j)], j)

# test tuple arguments
N = 1000 # FIND_CHUNK_SIZE
self.checkequal(0, 'foo', 'find', ('foo',))
self.checkequal(-1, 'foo', 'find', ('bar',))
self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'))
self.checkequal(2, '__aa__bb__', 'find', ('bb', 'aa'))
self.checkequal(-1, '__aa__bb__', 'find', ('cc', 'dd'))
self.checkequal(-1, '__aa__bb__', 'find', ())
self.checkequal(6, '__aa__bb__', 'find', ('aa', 'bb'), 3)
self.checkequal(-1, '__aa__bb__', 'find', ('aa', 'cc'), 3)
self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'), 0, 10)
self.checkequal(-1, '__aa__bb__', 'find', ('aa', 'bb'), 0, 3)
self.checkequal(2, '__aa__bb__', 'find', ('aa', 'bb'), 0, 4)
self.checkraises(TypeError, 'hello', 'find', (None,))
s = '_' * (N - 2) + 'aaaa' + '_' * (N - 2)
self.checkequal((N - 2), s, 'find', ('aaaa', 'bb'))
self.checkequal(2, 'foobar', 'find', ('ob', 'oba'))
self.checkequal(1, 'foobar', 'find', ('ob', 'oob'))
self.checkequal(0, '', 'find', ('', '_'))
self.checkequal(2, '__abcd__', 'find', ('cd', 'ab'))
self.checkequal(2, '__abc__', 'find', ('bc', 'ab'))
self.checkequal(1, 'a' + 'b' * N, 'find', ('b' * N, 'c'))
s = 'ab' + 'c' * (10 * N)
self.checkequal(1, s, 'find', ('c' * (10 * N), 'b' + 'c' * (10 * N)))
self.checkequal(0, 'foobar', 'find', ('foo', 'bar'))
self.checkequal(-1, 'foo', 'find', ('foobar',))
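
Several of the expected values in the tuple tests above can be cross-checked on an unpatched interpreter with the :mod:`re` module, which the updated documentation points to. The sketch below builds an alternation of escaped needles; a regular-expression search reports the leftmost position at which any alternative matches, which corresponds to the lowest index. `find_any_re` is a hypothetical helper name, and it assumes non-negative, in-range *start* and *end*, because `Pattern.search` does not interpret negative positions the way slices do.

```python
import re


def find_any_re(s, subs, start=0, end=None):
    """Hypothetical helper: lowest match index via a regex alternation."""
    if not subs:
        # An empty alternation would match the empty string, so handle it here.
        return -1
    pattern = re.compile("|".join(re.escape(sub) for sub in subs))
    match = pattern.search(s, start, len(s) if end is None else end)
    return match.start() if match else -1


print(find_any_re('foobar', ('ob', 'oob')))         # 1
print(find_any_re('__aa__bb__', ('aa', 'bb'), 3))   # 6
print(find_any_re('__aa__bb__', ('aa', 'bb'), 0, 3))  # -1
print(find_any_re('__aa__bb__', ()))                # -1
```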

def test_rfind(self):
self.checkequal(9, 'abcdefghiabc', 'rfind', 'abc')
self.checkequal(12, 'abcdefghiabc', 'rfind', '')
@@ -238,8 +267,10 @@ def test_rfind(self):

if self.contains_bytes:
self.checkequal(-1, 'hello', 'rfind', 42)
self.checkequal(-1, 'hello', 'rfind', (42,))
else:
self.checkraises(TypeError, 'hello', 'rfind', 42)
self.checkraises(TypeError, 'hello', 'rfind', (42,))

# For a variety of combinations,
# verify that str.rfind() matches __contains__
@@ -270,6 +301,32 @@ def test_rfind(self):
# issue #15534
self.checkequal(0, '<......\u043c...', "rfind", "<")

# test tuple arguments
N = 1000 # RFIND_CHUNK_SIZE
self.checkequal(0, 'foo', 'rfind', ('foo',))
self.checkequal(-1, 'foo', 'rfind', ('bar',))
self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'))
self.checkequal(6, '__aa__bb__', 'rfind', ('bb', 'aa'))
self.checkequal(-1, '__aa__bb__', 'rfind', ('cc', 'dd'))
self.checkequal(-1, '__aa__bb__', 'rfind', ())
self.checkequal(-1, '__aa__bb__', 'rfind', ('aa', 'cc'), 3)
self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'), 0, 10)
self.checkequal(-1, '__aa__bb__', 'rfind', ('aa', 'bb'), 7, 10)
self.checkequal(6, '__aa__bb__', 'rfind', ('aa', 'bb'), 6, 10)
self.checkraises(TypeError, 'hello', 'rfind', (None,))
s = '_' * (N - 2) + 'aaaa' + '_' * (N - 2)
self.checkequal((N - 2), s, 'rfind', ('aaaa', 'bb'))
self.checkequal(2, 'foobar', 'rfind', ('oba', 'ob'))
self.checkequal(2, 'foobar', 'rfind', ('oob', 'ob'))
self.checkequal(1, '_', 'rfind', ('', 'a'))
self.checkequal(4, '__abcd__', 'rfind', ('ab', 'cd'))
self.checkequal(3, '__abc__', 'rfind', ('ab', 'bc'))
self.checkequal(0, 'b' * N + 'a', 'rfind', ('b' * N, 'c'))
s = 'ab' + 'c' * (10 * N)
self.checkequal(2, s, 'rfind', ('c' * (10 * N), 'b' + 'c' * (10 * N)))
self.checkequal(3, 'foo', 'rfind', ('', 'foo'))
self.checkequal(-1, 'foo', 'rfind', ('foobar',))

def test_index(self):
self.checkequal(0, 'abcdefghiabc', 'index', '')
self.checkequal(3, 'abcdefghiabc', 'index', 'def')
@@ -295,6 +352,11 @@ def test_index(self):
else:
self.checkraises(TypeError, 'hello', 'index', 42)

# test tuple arguments (should be wrapper around find)
self.checkequal(2, '__aa__bb__', 'index', ('aa', 'bb'))
self.checkequal(2, '__aa__bb__', 'index', ('bb', 'aa'))
self.checkraises(ValueError, '__aa__bb__', 'index', ('cc', 'dd'))

def test_rindex(self):
self.checkequal(12, 'abcdefghiabc', 'rindex', '')
self.checkequal(3, 'abcdefghiabc', 'rindex', 'def')
@@ -321,6 +383,11 @@ def test_rindex(self):
else:
self.checkraises(TypeError, 'hello', 'rindex', 42)

# test tuple arguments (should be wrapper around rfind)
self.checkequal(6, '__aa__bb__', 'rindex', ('aa', 'bb'))
self.checkequal(6, '__aa__bb__', 'rindex', ('bb', 'aa'))
self.checkraises(ValueError, '__aa__bb__', 'rindex', ('cc', 'dd'))

def test_find_periodic_pattern(self):
"""Cover the special path for periodic patterns."""
def reference_find(p, s):
12 changes: 12 additions & 0 deletions Lib/test/test_bytes.py
@@ -644,6 +644,12 @@ def test_find(self):
ValueError, r'byte must be in range\(0, 256\)',
b.find, index)

# test tuple arguments
self.assertEqual(b.find((i,)), 1)
self.assertEqual(b.find((w,)), -1)
self.assertEqual(b.find((i, w)), 1)
self.assertEqual(b.find((w, i)), 1)

def test_rfind(self):
b = self.type2test(b'mississippi')
i = 105
@@ -663,6 +669,12 @@ def test_rfind(self):
self.assertEqual(b.rfind(i, 3, 9), 7)
self.assertEqual(b.rfind(w, 1, 3), -1)

# test tuple arguments
self.assertEqual(b.rfind((i,)), 10)
self.assertEqual(b.rfind((w,)), -1)
self.assertEqual(b.rfind((i, w)), 10)
self.assertEqual(b.rfind((w, i)), 10)

def test_index(self):
b = self.type2test(b'mississippi')
i = 105
4 changes: 2 additions & 2 deletions Lib/test/test_inspect/test_inspect.py
@@ -5414,7 +5414,7 @@ def test_builtins_have_signatures(self):
'dict': {'pop'},
'int': {'__round__'},
'memoryview': {'cast', 'hex'},
-'str': {'count', 'endswith', 'find', 'index', 'maketrans', 'rfind', 'rindex', 'startswith'},
+'str': {'count', 'endswith', 'maketrans', 'startswith'},
}
self._test_module_has_signatures(builtins,
no_signature, unsupported_signature,
@@ -5589,7 +5589,7 @@ def test_typing_module_has_signatures(self):
'Generic': {'__class_getitem__', '__init_subclass__'},
}
methods_unsupported_signature = {
-'Text': {'count', 'find', 'index', 'rfind', 'rindex', 'startswith', 'endswith', 'maketrans'},
+'Text': {'count', 'startswith', 'endswith', 'maketrans'},
}
self._test_module_has_signatures(typing, no_signature,
methods_no_signature=methods_no_signature,
@@ -0,0 +1 @@
Support tuples for :class:`str`, :class:`bytes` and :class:`bytearray` methods ``find()``, ``index()``, ``rfind()`` and ``rindex()``.