Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C API: Check in PyTuple_SET_ITEM() and PyList_SET_ITEM() #106168

Closed
vstinner opened this issue Jun 28, 2023 · 4 comments
Closed

C API: Check in PyTuple_SET_ITEM() and PyList_SET_ITEM() #106168

vstinner opened this issue Jun 28, 2023 · 4 comments
Labels
type-bug An unexpected behavior, bug, or error

Comments

@vstinner
Copy link
Member

vstinner commented Jun 28, 2023

The PyTuple_SET_ITEM() and PyList_SET_ITEM() functions don't check that the index is valid. It should be checked with an assertion.

Linked PRs

@vstinner vstinner added the type-bug An unexpected behavior, bug, or error label Jun 28, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
PyTuple_SET_ITEM() and PyList_SET_ITEM() now check the index argument
with an assertion if Python is built in debug mode or is built with
assertions.

* PyStructSequence_GET_ITEM() and PyStructSequence_SET_ITEM() are now
  aliases to PyStructSequence_GetItem() and
  PyStructSequence_SetItem()
* PyStructSequence_GetItem() and PyStructSequence_SetItem() now check
  the index argument: must be lesser than REAL_SIZE(op).
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
PyTuple_SET_ITEM() and PyList_SET_ITEM() now check the index argument
with an assertion if Python is built in debug mode or is built with
assertions.

* PyStructSequence_GET_ITEM() and PyStructSequence_SET_ITEM() are now
  aliases to PyStructSequence_GetItem() and
  PyStructSequence_SetItem()
* PyStructSequence_GetItem() and PyStructSequence_SetItem() now check
  the index argument: must be lesser than REAL_SIZE(op).
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
PyTuple_SET_ITEM() and PyList_SET_ITEM() now check the index argument
with an assertion if Python is built in debug mode or is built with
assertions.

* PyStructSequence_GET_ITEM() and PyStructSequence_SET_ITEM() are now
  aliases to PyStructSequence_GetItem() and
  PyStructSequence_SetItem()
* PyStructSequence_GetItem() and PyStructSequence_SetItem() now check
  the index argument: must be lesser than REAL_SIZE(op).
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
PyTuple_SET_ITEM() and PyList_SET_ITEM() now check the index argument
with an assertion if Python is built in debug mode or is built with
assertions.

* list_extend() and _PyList_AppendTakeRef() now set the list size
  before calling PyList_SET_ITEM().
* PyStructSequence_GetItem() and PyStructSequence_SetItem() now check
  the index argument: must be lesser than REAL_SIZE(op).
* PyStructSequence_GET_ITEM() and PyStructSequence_SET_ITEM() are now
  aliases to PyStructSequence_GetItem() and
  PyStructSequence_SetItem().
vstinner added a commit that referenced this issue Jun 28, 2023
PyTuple_SET_ITEM() and PyList_SET_ITEM() now check the index argument
with an assertion if Python is built in debug mode or is built with
assertions.

* list_extend() and _PyList_AppendTakeRef() now set the list size
  before calling PyList_SET_ITEM().
* PyStructSequence_GetItem() and PyStructSequence_SetItem() now check
  the index argument: must be lesser than REAL_SIZE(op).
* PyStructSequence_GET_ITEM() and PyStructSequence_SET_ITEM() are now
  aliases to PyStructSequence_GetItem() and
  PyStructSequence_SetItem().
@scoder scoder reopened this Oct 29, 2023
@scoder
Copy link
Contributor

scoder commented Oct 29, 2023

The PyTuple_SET_ITEM() and PyList_SET_ITEM() functions don't check that the index is valid. It should be checked with an assertion.

I oppose this statement. Their documentation says that both allow unchecked writing into the list/tuple. Adding these checks breaks their contract.

@scoder
Copy link
Contributor

scoder commented Oct 29, 2023

If you insist on adding such a check, you can assert that the index is within the currently allocated array bounds. Writing outside of that range is definitely incorrect. However, writing beyond the current size may not be.

scoder added a commit to scoder/cython that referenced this issue Oct 29, 2023
scoder added a commit to scoder/cpython that referenced this issue Oct 30, 2023
…) against [0:allocated] instead of [0:size] to re-allow valid use cases that assign within the allocated area.
vstinner pushed a commit that referenced this issue Oct 30, 2023
…_ITEM() (#111480)

Check the index bound assertions in PyList_SET_ITEM() against [0:allocated] instead of [0:size] to re-allow valid use cases that assign within the allocated area.
@scoder
Copy link
Contributor

scoder commented Nov 1, 2023

[Copying my comment from the PR here as IMHO it describes the situation quite well and thus belongs rather into the ticket discussion.]

@vstinner wrote:

I dislike PyList_New() + PyList_SET_ITEM() API since the list is immediately tracked by the GC and so calling gc.get_objects() can expose an invalid list object (ex: calling repr(list) can crash)

That's why I prefer not requiring users to change the size before they set the value. Doing that would bring the list in an invalid state.

With this change, PyList_SetItem() and PyList_SET_ITEM() become inconsistent: PyList_SetItem() uses list->ob_size, whereas PyList_SET_ITEM() uses list->allocated.

I don't see an inconsistency here. They serve different needs, that's why we have two different functions. As you write, PyList_SetItem() decrefs the current value and replaces it with a new one. That can only safely be done within [0:size]. PyList_SET_ITEM() writes a value to an array position that does not yet have a value. That can only safely be done within [0:allocated]. Different needs, different functions, different boundary conditions.

You could rather argue that PyList_SET_ITEM() is only really valid within [size:allocated]. Within [0:size], it would overwrite existing, owned values. However, enforcing that would probably break even more code that might just look slightly brittle but is actually working perfectly fine (because it does what you suggested, it updates the size immediately before setting the value).

Cython uses the _SET_ITEM() functions to initialise freshly created lists and tuples, but it also calls PyList_SET_ITEM() in a fast, inlined PyList_Append() implementation:
https://github.com/cython/cython/blob/master/Cython/Utility/Optimize.c#L29-L77
We use this for list comprehensions, for example, where we control the list internally. As long as the next item fits into the already allocated array, we just put it there and increase the size.

@vstinner
Copy link
Member Author

vstinner commented Nov 1, 2023

I changed PyList the same way than PyTuple, but as you described, PyList is different and requires special treatement. I'm fine with #111480 that's why I merged it 👍

vstinner added a commit to vstinner/cpython that referenced this issue Nov 1, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Nov 1, 2023
FullteaR pushed a commit to FullteaR/cpython that referenced this issue Nov 3, 2023
…st_SET_ITEM() (python#111480)

Check the index bound assertions in PyList_SET_ITEM() against [0:allocated] instead of [0:size] to re-allow valid use cases that assign within the allocated area.
scoder added a commit to scoder/cpython that referenced this issue Nov 3, 2023
…d temporary inconsistencies.

Also remove the "what's new" sentence regarding the size setting since tuples cannot grow after allocation.
vstinner pushed a commit that referenced this issue Nov 3, 2023
gh-106168: Update the size only after setting the item, to avoid temporary inconsistencies.
Also remove the "what's new" sentence regarding the size setting since tuples cannot grow after allocation.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…st_SET_ITEM() (python#111480)

Check the index bound assertions in PyList_SET_ITEM() against [0:allocated] instead of [0:size] to re-allow valid use cases that assign within the allocated area.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
pythongh-106168: Update the size only after setting the item, to avoid temporary inconsistencies.
Also remove the "what's new" sentence regarding the size setting since tuples cannot grow after allocation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

2 participants