Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-96354: Port docstring of unicode_find family to Argument Clinic #96356

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ofey404
Copy link
Contributor

@ofey404 ofey404 commented Aug 28, 2022

There are 9 method docstring in str, which is not consistent to the rest.

  • They are using old PyDoc_STRVAR docstring generation.
  • count, endswith, find, format, format_map, index, rfind, rindex and startswith

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ofey404
Copy link
Contributor Author

ofey404 commented Aug 28, 2022

I'm fixing failure in Lib/test/string_tests.py.

It seems not so straightforward for the Argument Clinic to do the same thing as parse_args_finds_unicode.

$ ./python -m unittest -v test.test_userstring.UserStringTest.test_find
test_find (test.test_userstring.UserStringTest.test_find) ... ERROR

======================================================================
ERROR: test_find (test.test_userstring.UserStringTest.test_find)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ofey/Code/cpython-source-reading/cpython/Lib/test/string_tests.py", line 176, in test_find
    self.checkequal(12, 'rrarrrrrrrrra', 'find', 'a', 4, None)
  File "/home/ofey/Code/cpython-source-reading/cpython/Lib/test/test_userstring.py", line 24, in checkequal
    realresult = getattr(object, methodname)(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ofey/Code/cpython-source-reading/cpython/Lib/collections/__init__.py", line 1465, in find
    return self.data.find(sub, start, end)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'NoneType' object cannot be interpreted as an integer

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)

@ofey404
Copy link
Contributor Author

ofey404 commented Aug 28, 2022

parse_args_finds_unicode would accept None on the position start and end, then:

  • Converting them to 0 and PY_SSIZE_T_MAX under the hood.

Should we implement this behavior in Argument Clinic?

  • Or another ugly way is to accept start and end as PyObject, then do the conversion in the implementation function.
  • I would prototype this.

Py_LOCAL_INLINE(int)
STRINGLIB(parse_args_finds)(const char * function_name, PyObject *args,
PyObject **subobj,
Py_ssize_t *start, Py_ssize_t *end)
{
PyObject *tmp_subobj;
Py_ssize_t tmp_start = 0;
Py_ssize_t tmp_end = PY_SSIZE_T_MAX;
PyObject *obj_start=Py_None, *obj_end=Py_None;
char format[FORMAT_BUFFER_SIZE] = "O|OO:";
size_t len = strlen(format);
strncpy(format + len, function_name, FORMAT_BUFFER_SIZE - len - 1);
format[FORMAT_BUFFER_SIZE - 1] = '\0';
if (!PyArg_ParseTuple(args, format, &tmp_subobj, &obj_start, &obj_end))
return 0;
/* To support None in "start" and "end" arguments, meaning
the same as if they were not passed.
*/
if (obj_start != Py_None)
if (!_PyEval_SliceIndex(obj_start, &tmp_start))
return 0;
if (obj_end != Py_None)
if (!_PyEval_SliceIndex(obj_end, &tmp_end))
return 0;
*start = tmp_start;
*end = tmp_end;
*subobj = tmp_subobj;
return 1;
}

@arhadthedev
Copy link
Member

Should we implement this behavior in Argument Clinic

I believe we should implements it as argument converters of str functions:

    start: Py_ssize_t = 0
    end: Py_ssize_t(c_default="PY_SSIZE_T_MAX") = sys.maxsize

@arhadthedev
Copy link
Member

By the way, thank you for using ensure_unicode() in your PR; I tried to enforce the clinic to generate the check on its side (https://github.com/python/cpython/compare/main...arhadthedev:cpython:asciilibparseargsfinds-ac?expand=1).

In addition to the porting, I've clarified the docstrings a little; you can do this too:

  • Sa string
  • With optional start, test S beginning at that position. With optional end, stop comparing S at that position.Optional arguments start and end are interpreted as in slice notation. (so all str-related functions have the same wording)
  • The substring is contained within S[start:end].The substring is contained within self[start:end].

@arhadthedev
Copy link
Member

Another useful feature of the clinic: you can set a return type to Py_ssize_t (like in str.count as unicode_count -> Py_ssize_t) and the tool will generate if ((_return_value == -1) && PyErr_Occurred()) { ... } for you; so you can effectively replace return NULL with return -1 (NULL means that the called function already set a Python exception so the generated check will trigger) and return PyLong_FromSsize_t(foo) with return foo.

@arhadthedev
Copy link
Member

Actually, you can freely take parts from my branch.

@ofey404
Copy link
Contributor Author

ofey404 commented Aug 28, 2022

Another useful feature of the clinic: you can set a return type to Py_ssize_t (like in str.count as unicode_count -> Py_ssize_t) and the tool will generate if ((_return_value == -1) && PyErr_Occurred()) { ... } for you; so you can effectively replace return NULL with return -1 (NULL means that the called function already set a Python exception so the generated check will trigger) and return PyLong_FromSsize_t(foo) with return foo.

Wow, that's neat. You are a really Argument Clinic Ninja.

Actually, you can freely take parts from my branch.

It really helps! And would you mind I add you as an author in my future commit?

@arhadthedev
Copy link
Member

And would you mind I add you as an author in my future commit?

Sure, no problem.

@ofey404 ofey404 force-pushed the ofey404/clinic-on-unicode-find-family branch from f494673 to 08b8bdd Compare August 28, 2022 11:57
@bedevere-bot
Copy link

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@ofey404
Copy link
Contributor Author

ofey404 commented Aug 28, 2022

Or another ugly way is to accept start and end as PyObject, then do the conversion in the implementation function.

An early version, just find. Push just for triggering CI, in order to make a quick check of correctness.

@arhadthedev And I add your name as co-author ;)

Another useful feature of the clinic: you can set a return type to Py_ssize_t (like in str.count as unicode_count -> Py_ssize_t) and the tool will generate if ((_return_value == -1) && PyErr_Occurred()) { ... } for you; so you can effectively replace return NULL with return -1 (NULL means that the called function already set a Python exception so the generated check will trigger) and return PyLong_FromSsize_t(foo) with return foo.

I search in the codebase, and few function generated by clinic is using this feature. And no function in Objects/unicodeobject.c used it.

  • For consistency, maybe we just keep the return value as it is? Maybe save it to a general refactor in the future?

About the test I changed: Lib/test/string_tests.py:test_find_etc_raise_correct_error_messages:

Remove the left parenthesis in test cases like r'^find\(' is fine, since Argument Clinic generate error without ().

Take lstrip as an example:

 $ ./python -c '"".lstrip("", None)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: lstrip expected at most 1 argument, got 2

@ofey404 ofey404 force-pushed the ofey404/clinic-on-unicode-find-family branch from b68e866 to 43d94ec Compare August 28, 2022 15:20
@ofey404
Copy link
Contributor Author

ofey404 commented Aug 28, 2022

7 out of 9 methods are ported to Argument Clinic.

  • count, find, index, rfind, rindex, startswith and endswith

I think this PR is in a good enough shape to be reviewed.


format and format_map are kept as they are. Since they are imported from Objects/stringlib/unicode_format.h, more effort should be done if we want to port them.

  • Besides, I want to test if this kind of refactor would be accepted by the community.
  • So I should not dig too deep, but move fast.

About STRINGLIB(parse_args_finds)

I once tried to remove STRINGLIB(parse_args_finds), since most of its usage are eliminated in this PR.

  • But there is another usage in Objects/bytes_methods.c. So save it for the future.

parse_args_finds_byte(const char *function_name, PyObject *args,
PyObject **subobj, char *byte,
Py_ssize_t *start, Py_ssize_t *end)
{
PyObject *tmp_subobj;
Py_ssize_t ival;
if(!stringlib_parse_args_finds(function_name, args, &tmp_subobj,
start, end))

Co-authored-by: Oleg Iarygin <oleg@arhadthedev.net>
@ofey404 ofey404 force-pushed the ofey404/clinic-on-unicode-find-family branch from 43d94ec to 0792275 Compare August 29, 2022 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants