Conversation

vstinner
Member

@vstinner vstinner commented Oct 11, 2025

@vstinner
Member Author

cc @scoder @davidhewitt

@davidhewitt
Contributor

davidhewitt commented Oct 11, 2025

For consuming from PyO3 / Rust, I can see this function being obviously useful for cases of small dictionaries with statically known keys (think producing things that look like TypedDict to Python code). Many functions produce such dictionaries and we could generate optimal code which creates arrays on the stack for each of keys and items before performing the insert.

I think for arbitrary-sized collections, it's probably the case (in Rust) that either:

  • it's easier to create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)
  • or we just want to pass a hint to how large we expect the dictionary to be and then pass all the items in, e.g. to consume a Rust iterator (PyDictWriter? 😉)

@vstinner
Member Author

> it's easier to create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)

Do you mean producing an array of (key1, value1, key2, value2, ..., keyN, valueN) and then using an offset of 2?
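
To make the two layouts concrete, here is a minimal, self-contained sketch. It is a toy model, not CPython code: plain strings stand in for `PyObject *`, and the names `visit_items`, `item_sink`, `collector`, `collect`, `demo_parallel` and `demo_interleaved` are hypothetical. The only thing borrowed from the thread is the argument shape of the private `_PyDict_FromItems()` (keys, keys offset, values, values offset, length), with both offsets counted in pointer-sized steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback invoked once per (key, value) pair. */
typedef void (*item_sink)(const char *key, const char *value, void *ctx);

/* Visit `length` items. Both offsets are counted in pointer-sized steps,
 * mirroring the argument shape of the private _PyDict_FromItems(). */
static void
visit_items(const char *const *keys, size_t keys_offset,
            const char *const *values, size_t values_offset,
            size_t length, item_sink sink, void *ctx)
{
    for (size_t i = 0; i < length; i++) {
        sink(keys[i * keys_offset], values[i * values_offset], ctx);
    }
}

/* Toy sink: appends "key=value;" to a fixed-size text buffer. */
struct collector {
    char text[128];
};

static void
collect(const char *key, const char *value, void *ctx)
{
    struct collector *c = ctx;
    size_t used = strlen(c->text);
    snprintf(c->text + used, sizeof(c->text) - used, "%s=%s;", key, value);
}

/* Layout A: two parallel arrays, both walked with an offset of 1. */
static void
demo_parallel(struct collector *out)
{
    const char *keys[] = {"x", "y", "z"};
    const char *values[] = {"1", "2", "3"};
    out->text[0] = '\0';
    visit_items(keys, 1, values, 1, 3, collect, out);
}

/* Layout B: one array (key1, value1, ..., keyN, valueN). Keys start at
 * index 0, values at index 1, and both advance with an offset of 2. */
static void
demo_interleaved(struct collector *out)
{
    const char *flat[] = {"x", "1", "y", "2", "z", "3"};
    out->text[0] = '\0';
    visit_items(flat, 2, flat + 1, 2, 3, collect, out);
}
```

Both demos visit the same three pairs in the same order, so a consumer written against offsets does not need to care which layout the producer chose.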

@vstinner
Member Author

Adding this function would avoid having to make the private _PyStack_AsDict() function public, since its code is simple. It's basically a call to _PyDict_FromItems():

PyObject *
_PyStack_AsDict(PyObject *const *values, PyObject *kwnames)
{
    Py_ssize_t nkwargs;

    assert(kwnames != NULL);
    nkwargs = PyTuple_GET_SIZE(kwnames);
    return _PyDict_FromItems(&PyTuple_GET_ITEM(kwnames, 0), 1,
                             values, 1, nkwargs);
}

@davidhewitt
Contributor

I was thinking more like 2-tuples, the type might be written in Rust as Vec<(*mut PyObject, *mut PyObject)>. I think in practice it would be laid out in memory like the array of alternating key/value you propose but I think that's not necessarily guaranteed. With the API taking offsets I could query the actual layout information from the Rust compiler and use that to calculate the offsets correctly.

The 2-tuples are quite a natural structure for Rust producers of "items" (it's what they would expect when iterating a mapping type, for example).

But maybe the more common case would be the second one I suggest: a Rust iterator producing item 2-tuples with a size hint. At the moment we just start from PyDict_New and call PyDict_SetItem for each item produced by the iterator. It would be nice to have an API which makes good use of the size hint to minimise allocations.

I could of course use the PyDict_FromItems proposed in this PR; I would just need to collect all the items produced by the iterator into two temporary allocations (one for keys, one for values) first.
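
As a rough illustration of that collection step, the sketch below drains a producer into two parallel arrays, using the size hint only to size the first allocation. Everything here is an assumption made for the example: `pair_iter`, `pair_iter_next` and `collect_pairs` are hypothetical names, and `int` stands in for `PyObject *` so the code runs without Python headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical producer that yields `total` (key, value) pairs. */
struct pair_iter {
    size_t next;
    size_t total;
};

/* Return 1 and fill *key / *value, or return 0 when exhausted. */
static int
pair_iter_next(struct pair_iter *it, int *key, int *value)
{
    if (it->next >= it->total) {
        return 0;
    }
    *key = (int)it->next;
    *value = (int)(it->next * 10);
    it->next++;
    return 1;
}

/* Drain the iterator into two parallel arrays. `size_hint` only sizes the
 * first allocation; the arrays are doubled if the hint was too small.
 * Returns the item count, or (size_t)-1 on allocation failure. */
static size_t
collect_pairs(struct pair_iter *it, size_t size_hint,
              int **keys_out, int **values_out)
{
    size_t cap = size_hint > 0 ? size_hint : 4;
    size_t len = 0;
    int *keys = malloc(cap * sizeof *keys);
    int *values = malloc(cap * sizeof *values);
    int k, v;

    if (keys == NULL || values == NULL) {
        goto fail;
    }
    while (pair_iter_next(it, &k, &v)) {
        if (len == cap) {
            size_t new_cap = cap * 2;
            int *nk = realloc(keys, new_cap * sizeof *keys);
            if (nk == NULL) {
                goto fail;
            }
            keys = nk;
            int *nv = realloc(values, new_cap * sizeof *values);
            if (nv == NULL) {
                goto fail;
            }
            values = nv;
            cap = new_cap;
        }
        keys[len] = k;
        values[len] = v;
        len++;
    }
    *keys_out = keys;
    *values_out = values;
    return len;

fail:
    free(keys);
    free(values);
    return (size_t)-1;
}
```

The two resulting arrays are what a keys/values constructor would consume. The cost being discussed is visible here: two extra buffers that exist only to be copied from once.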

@scoder
Contributor

scoder commented Oct 12, 2025

>   • create a single array containing (Rust) tuples of key/value pairs, which the current private _PyDict_FromItems taking offsets of each of keys and values would be able to consume. (Maybe that could be a separate PyDict_FromItemsAndOffsets?)

Or name the function in this PR PyDict_FromKeysAndValues() and add a PyDict_FromItems() that takes a single pointer and two strides, one for the items and one for the values in the items. (EDIT: Actually, a single pointer won't suffice due to the initial irregular offset of the values, so that brings us basically to the interface of the current _PyDict_FromItems().)

Note that the current private _PyDict_FromItems() calculates the offsets in pointer-sized steps, so if that remains the implementation, then the C structures would need to be pointer-aligned. That seems ok from my side but I can't reason about Rust here.
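
For a C struct of pairs, that constraint can be checked at compile time. The following is a sketch under stated assumptions, not part of the PR: `struct item` is a hypothetical stand-in for a pair type, `void *` stands in for `PyObject *`, and `item_stride`, `strided_get` and `demo_pairs` are made-up helpers. The stride and the starting point of the values come from `sizeof` and the field addresses rather than from an assumed alternating layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type; void * stands in for PyObject *. */
struct item {
    void *key;
    void *value;
};

/* Offsets measured in pointer-sized steps only work if the struct size
 * and both field offsets are whole multiples of sizeof(void *). */
_Static_assert(sizeof(struct item) % sizeof(void *) == 0,
               "item size must be a multiple of the pointer size");
_Static_assert(offsetof(struct item, key) % sizeof(void *) == 0,
               "key must sit on a pointer-sized boundary");
_Static_assert(offsetof(struct item, value) % sizeof(void *) == 0,
               "value must sit on a pointer-sized boundary");

/* Distance between consecutive items, in pointer-sized steps. */
static size_t
item_stride(void)
{
    return sizeof(struct item) / sizeof(void *);
}

/* Read the i-th pointer starting at `base`, advancing `stride`
 * pointer-sized steps per item, the way a strided consumer would. */
static void *
strided_get(const void *base, size_t stride, size_t i)
{
    void *p;
    memcpy(&p, (const char *)base + i * stride * sizeof(void *), sizeof p);
    return p;
}

/* Returns 1 if strided reads recover the original keys and values. */
static int
demo_pairs(void)
{
    int k0, v0, k1, v1;
    struct item items[] = {{&k0, &v0}, {&k1, &v1}};
    size_t stride = item_stride();

    return strided_get(&items[0].key, stride, 0) == &k0
        && strided_get(&items[0].key, stride, 1) == &k1
        && strided_get(&items[0].value, stride, 0) == &v0
        && strided_get(&items[0].value, stride, 1) == &v1;
}
```

Keys start at `&items[0].key` and values at `&items[0].value`, which is the "initial irregular offset" mentioned above: one base pointer plus two strides is not enough, so two base pointers are passed.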

The case of building small literal dicts could also use a PyDict_FromItems() with alternating keys and values, BTW, so maybe that's generally the better API?

>   • or we just want to pass a hint to how large we expect the dictionary to be and then pass all the items in, e.g. to consume a Rust iterator (PyDictWriter? 😉)

That's really not much different from _PyDict_NewPresized() plus repeated PyDict_SetItem() calls.

Create a dictionary from *keys* and *values* of *length* items.
Return a new empty dictionary, or ``NULL`` on failure.

Suggested change
- Return a new empty dictionary, or ``NULL`` on failure.
+ Return a new dictionary, or ``NULL`` on failure.
