Skip to content

Conversation

@colesbury
Copy link
Contributor

@colesbury colesbury commented Aug 22, 2023

This adds a new header that provides atomic operations on common data types.

Implementing PEP 703 requires use of atomic operations on more data types than provided by pycore_atomic.h. Additionally, pycore_atomic.h is only usable from Py_BUILD_CORE modules; it can't be used in public headers. PEP 703 will require atomic operations in object.h for Py_INCREF/DECREF, for example. The intention is that this will be exposed through Python.h, although that is not the case yet.

To avoid build issues in third-party extensions, the pyatomic.h header generally does not require -std=gnu11 or -std=c11 to be passed to the compiler (for GCC or Clang). When compiling C, MSVC will use the pyatomic_msc.h, which uses compiler intrinsics. When compiled in C++ mode, MSVC will use the pyatomic_std.h implementation, which uses C++11 atomics.

This adds a new header that provides atomic operations on common data
types. The intention is that this will be exposed through Python.h,
although that is not the case yet. The only immediate use is in
the test file.
@colesbury colesbury added topic-C-API 3.13 bugs and security fixes labels Aug 22, 2023
@colesbury colesbury requested a review from vstinner August 22, 2023 20:12
@colesbury colesbury requested a review from a team as a code owner August 22, 2023 20:12
@vstinner
Copy link
Member

it can't be used in public headers

Is this API public or private? Names are prefixed by _Py, so it looks private. I'm trying to remove private APIs from the public API, rather than adding them :-) Should it be usable by 3rd party projects? Or is it ok to restrict its usage to Python?

At least, I suggest to exclude it from the limited C API for now: so move it to cpython/ directory, and check for Py_LIMITED_API. See other header files like Include/cpython/pydebug.h, the ones included with cpython/ in Python.h.


I like the approach using functions rather than atomic types: it's the approach used by the glib library: https://developer-old.gnome.org/glib/stable/glib-Atomic-Operations.html I'm curious why/how some function parameters don't need volatile, like g_atomic_int_inc():

While atomic has a volatile qualifier, this is a historical artifact and the pointer passed to it should not be volatile.

I wrote Include/internal/pycore_atomic_funcs.h which tries to address Include/internal/pycore_atomic.h compiler issues. But pycore_atomic_funcs.h is incomplete and was only used for a very few things (is it still used?).


Using atomic variables is hard :-( Do you have any kind of documentation? Or links to other documentations?

It would be good to have a doc, at least in pyatomic.h.

@encukou
Copy link
Member

encukou commented Aug 23, 2023

Is this API public or private? Names are prefixed by _Py, so it looks private. I'm trying to remove private APIs from the public API, rather than adding them :-) Should it be usable by 3rd party projects? Or is it ok to restrict its usage to Python?

They are private, but they're static inline functions that will be used by public static inline functions/macros (Py_INCREF/DECREF). So they need to be visible to the compiler without Py_BUILD_CORE.
The leading underscore is the way to go here.

Although, Include/pyatomic.h should go in Include/cpython too -- it's not stable ABI.

But pycore_atomic_funcs.h is incomplete and was only used for a very few things (is it still used?).

Yes, but AFAICS the uses can be replaced.

@colesbury
Copy link
Contributor Author

As @encukou wrote, the intention is that they're private and only used directly by CPython for now. I'll move them to the Include/cpython directory.

Regarding volatile, it's only meaningful for the MSVC implementation, which doesn't support C11 atomics yet (*). This can be pushed down into the MSVC function bodies so it's no longer part of the common function signatures.

I'll add some code documentation and links in pyatomic.h

(*) There's experimental support in the most recent builds.

@colesbury colesbury requested a review from encukou as a code owner August 23, 2023 19:01
@vstinner
Copy link
Member

vstinner commented Aug 23, 2023

the intention is that they're private and only used directly by CPython for now

Unless there is a good reason to expose this API to 3rd party code, I would suggest to move it to Include/internal/. For example, override pycore_atomic_funcs.h.

If the header file is only exposed in the internal C API, we will have less compilation issues.

Even after I moved Include/pyatomic.h to Include/internal/pycore_pyatomic.h, people get compilation errors... Someone wants to use pycore_dict.h in C++, but pycore_dict.h indirectly includes pycore_pyatomic.h which is incompatible with C++ :-( See #108216 bug report (feature request? :-)).

Is your header file, declaration and implementation (since there are static inline functions, the implementation is public!), compatible with C++?

To avoid compilation issues, would it be technically possible to have:

  • internal C API: fast static inline implementation, name prefixed by Py_, implemented in header files.
  • public C API: slow regular functions (opaque function calls, similar to the glib library), name prefixed by Py. Implementation calls the internal C API.

@encukou
Copy link
Member

encukou commented Aug 23, 2023

Oh, right, atomics are an optional C11 feature!

As far as I can see, the current pycore_atomic.h has an unsafe fallback to volatile, which is presumably OK-ish for systems/compilers without atomics? That feels a bit cheaty. IMO, we should just explicitly require C11/C++/MSVC atomics.

PEP-7 should be updated, and this should get a What's New entry similar to the 3.11 one for "Building CPython". You might want have PEP-703 state the compiler requirement explicitly.

@zooba
Copy link
Member

zooba commented Aug 23, 2023

To avoid compilation issues, would it be technically possible to have:

  • internal C API: fast static inline implementation, name prefixed by Py_, implemented in header files.
  • public C API: slow regular functions (opaque function calls, similar to the glib library), name prefixed by Py. Implementation calls the internal C API.

I agree with this, but I suspect we're going to need these available for macros/inlines. So they'll have to be available in the public API, even if they're not intended for direct use.

Stabe API shouldn't include them, obviously, which means any that's currently a macro/inline for the stable API can't use them either and needs an opaque function call.

@colesbury
Copy link
Contributor Author

colesbury commented Aug 23, 2023

@vstinner:

  • The code will be used indirectly in other static inline functions that are public so it needs to be visible; it can't be in Include/internal because it will need to be used by, for example, the static inline Py_INCREF function.
  • This implementation avoids the compilation errors people saw with pycore_pyatomic.h because of the preference builtins/intrinsics. (i.e., it doesn't require -std=c11 and can be compiled by a C++ compiler)

@encukou:

  • I'll update the what's new, but practically, this does not change compiler requirements from 3.11.

@zooba:

  • Yeah, as you wrote, we're going to need these available for macros/inlines. I think we can avoid using them in the stable ABI.

@vstinner
Copy link
Member

The code will be used indirectly in other static inline functions that are public

Oh ok, now I get it. It wasn't clear when I first reviewed your PR. In that case, it can be in Include/cpython/.

Do you know if you need any atomic function in Include/ header files (limited C API)?

@colesbury
Copy link
Contributor Author

@vstinner:

Do you know if you need any atomic function in Include/ header files (limited C API)?

In Include/object.h but only when Py_LIMITED_API is not defined. (For reference counting in --disable-gil builds.)


// Performs an atomic compare-and-exchange. If `*address` and `expected` are equal,
// then `value` is stored in `*address`. Returns 1 on success and 0 on failure.
// These correspond to the "strong" variations of the C11 atomic_compare_exchange_* functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C11 passes expected as a pointer, so that it's updated with the actual value when the latter doesn't match the former. Why not keep that convention here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation was two-fold: First, many of the _Py_atomic_compare_exchange calls in the nogil fork use constants as expected and this would be more verbose if it needs to be a pointer, e.g.:

_Py_atomic_compare_exchange_uint8(&m->v, LOCKED, UNLOCKED)

vs.

uint8_t expected = LOCKED:
_Py_atomic_compare_exchange_uint8(&m->v, &expected, UNLOCKED)

Second, I find this style (no pointer for expected) to be a bit less error-prone. I've been tripped up once or twice by having expected be modified when I didn't expect it.

I don't feel terribly strongly about this, so if there is a general preference for sticking closer to the C11-style API here, I can change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU the main motivation for the C11 style APIs is for retry loops in lockless data structure implementations. A simplistic example (this may be embarassingly wrong):

struct ListNode;

typedef struct ListNode {
  int value;
  struct ListNode* next;
} ListNode;

void ListAppend(ListNode* list, int new_value) {
  ListNode* new_node = (ListNode*) malloc(sizeof ListNode);
  new_node->value = new_value;
  new_node->next = NULL;
  ListNode* expected = NULL;
  while (_Py_atomic_compare_exchange_ptr(&list->next, &expected, new_node)) {
    list = expected;
    expected = NULL;
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the second argument is a reference (&expected), does it mean that it changes the value of the second argument (*expected)?

For C11 atomic_exchange(), the second argument is not a pointer, but a value (integer), no? https://en.cppreference.com/w/c/atomic/atomic_exchange

C11 atomic_compare_exchange_strong() and atomic_compare_exchange_weak() use a pointer for expected. But this API writes into *expected if the *obj is not equal to *expected.

The behavior of atomic_compare_exchange_* family is as if the following was executed atomically:

if (memcmp(obj, expected, sizeof *obj) == 0) {
    memcpy(obj, &desired, sizeof *obj);
    return true;
} else {
    memcpy(expected, obj, sizeof *obj);
    return false;
}

For this header fie, I would prefer to not have two flavors, the API is already quite long! I would prefer to have a single flavor. If there is an usecase where setting expected is relevant, I suggest to use a pointer for the second argument.

In short, I agree to change the API to int _Py_atomic_compare_exchange_int32(int32_t *obj, int32_t *expected, int32_t desired). The behavior should be well documented.

obj, expected and desired names come from the C11 API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make the second argument a pointer like the C11 API.

_Py_atomic_add_uintptr(uintptr_t *address, uintptr_t value);

static inline Py_ssize_t
_Py_atomic_add_ssize(Py_ssize_t *address, Py_ssize_t value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it's worth exposing atomic ops for all int sizes and signednesses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect to use at least one atomic operation on each of the data types here (but not every atomic op on every data type). I tried to be consistent on what's defined because it makes understanding what's available easier and testing easier.

_Py_atomic_compare_exchange_ssize(Py_ssize_t *address, Py_ssize_t expected, Py_ssize_t value);

static inline int
_Py_atomic_compare_exchange_ptr(void *address, void *expected, void *value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be void** address for clarity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that void ** requires an explicit cast for almost every use, because things like PyObject ** are not implicitly convertible to void **. In other words, currently we can write things like:

PyObject *old_exc = _Py_atomic_exchange_ptr(&tstate->async_exc, exc);

but if address was void **address, we'd have to write:

PyObject *old_exc = _Py_atomic_exchange_ptr((void **)&tstate->async_exc, exc);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A large part of the Python C API uses macro to convert arguments to PyObject*. Would it make sense to do the same here?

#define _Py_atomic_exchange_ptr(atomic, value) _Py_atomic_exchange_ptr(_Py_CAST(void**, atomic), (value))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the benefit of that style over the current approach, and it would silently allow passing some integer types to _Py_atomic_exchange_ptr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in that case I'm fine with the surprising void* type.

_Py_atomic_store_uint64_release(uint64_t *address, uint64_t value);

static inline void
_Py_atomic_store_ptr_release(void *address, void *value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert, but why is it useful to expose "release" operations if no "acquire" operations are exposed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can always use stronger orderings for correctness (i.e., "seq_cst" everywhere instead of "acquire"). From a performance view, "release" is substantially faster than "seq_cst" stores on x86/x86-64, but "acquire" generates the same code as "seq_cst" loads on both x86/x86-64 and aarch64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think CPython support is limited to x86 and ARM variants, so the set of atomic ops exposed should probably be made consistent nevertheless?

Also, using "seq_cst" in combination with "release" will probably make the code more difficult to reason about, than if "acquire" is exposed (and memory ordering is already hard to reason about!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a _Py_atomic_load_ptr_acquire for consistency (and I think I can remove the _Py_atomic_store_uint64_release).

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me if volatile must be used on the first parameter in the function definition, or not. For me, it's surprising to be able to cast int* to volatile long* in pyatomic_msc.h. Maybe explain the black magic in the documentation at the top of pyatomic.h?

I'm scared by the _ptr variant which takes a void* and is then casted to a pointer of a pointer (void**). It looks suspicious.

# error "this header file must not be included directly"
#endif

// This is the implementation of Python atomic operations using GCC's built-in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving it at the top of the file.

Comment on lines 10 to 14
static inline int
_Py_atomic_add_int(int *address, int value)
{
return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually, I prefer verbose syntax like the one that you used. But in this header file, you have tons of static inline functions, so I suggest using the compact syntax:

Suggested change
static inline int
_Py_atomic_add_int(int *address, int value)
{
return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST);
}
static inline int
_Py_atomic_add_int(int *address, int value)
{ return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST); }

_Py_atomic_store_ptr_release(void *address, void *value);


// Sequential consistency fence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a more elaborated documentation, "fence" is kind of weak.

__faststorefence doc:

Guarantees that every previous memory reference, including both load and store memory references, is globally visible before any subsequent memory reference.

_ReadWriteBarrier doc:

Limits the compiler optimizations that can reorder memory accesses across the point of the call.

dmb ish:

The data memory barrier ensures that all preceding writes are issued before any subsequent memory operations (including speculative memory access).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge I have is that it's really hard to describe what fences do in a way that's helpful and accurate. The above documentation is too strong for C11 fences.

There's https://en.cppreference.com/w/c/atomic/atomic_thread_fence, but I find it vague. And the C++ documentation (https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence) is more detailed but really hard to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add one or two of these links here. It's ok to have references to external doc, it's better than no doc :-)

@colesbury
Copy link
Contributor Author

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

@bedevere-bot bedevere-bot requested a review from vstinner August 24, 2023 18:25
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The glib library provides g_atomic_int_dec_and_test() to implement reference counting. How would you reimplement it with your API? I'm not used to atomic variables and I never know how to use them correctly.

_Py_atomic_store_ptr_release(void *address, void *value);


// Sequential consistency fence
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add one or two of these links here. It's ok to have references to external doc, it's better than no doc :-)

@colesbury
Copy link
Contributor Author

The glib library provides g_atomic_int_dec_and_test() to implement reference counting. How would you reimplement it with your API?

The decrement would look like the following. The atomic_add functions return the previous value like the C11/C++11 atomic_fetch_add function.

if (_Py_atomic_add_ssize(&op->ob_refcnt, -1) == 1) {
  // refcnt is zero, dealloc
}

@colesbury
Copy link
Contributor Author

@vstinner, would you please look this over again when you have a chance?

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should decide _Py_atomic_compare_exchange() second argument should be a pointer or not. Apparently, a pointer covers more cases and so should be used.

Comment on lines +429 to +431
<ClInclude Include="..\Include\cpython\pyatomic_msc.h">
<Filter>Include</Filter>
</ClInclude>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add GCC here:

Suggested change
<ClInclude Include="..\Include\cpython\pyatomic_msc.h">
<Filter>Include</Filter>
</ClInclude>
<ClInclude Include="..\Include\cpython\pyatomic_gcc.h">
<Filter>Include</Filter>
</ClInclude>
<ClInclude Include="..\Include\cpython\pyatomic_msc.h">
<Filter>Include</Filter>
</ClInclude>

It's just for the UI, not to build Python.


// Atomically adds `value` to `address` and returns the previous value
static inline int
_Py_atomic_add_int(int *address, int value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the address name. What you pass is not a void* pointer or an uintptr_t address, but a reference to an atomic variable. I suggest to use atomic or obj name.

For add operation, the C11 API uses arg for the second parameter name. I'm fine with value.


// Performs an atomic compare-and-exchange. If `*address` and `expected` are equal,
// then `value` is stored in `*address`. Returns 1 on success and 0 on failure.
// These correspond to the "strong" variations of the C11 atomic_compare_exchange_* functions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the second argument is a reference (&expected), does it mean that it changes the value of the second argument (*expected)?

For C11 atomic_exchange(), the second argument is not a pointer, but a value (integer), no? https://en.cppreference.com/w/c/atomic/atomic_exchange

C11 atomic_compare_exchange_strong() and atomic_compare_exchange_weak() use a pointer for expected. But this API writes into *expected if the *obj is not equal to *expected.

The behavior of atomic_compare_exchange_* family is as if the following was executed atomically:

if (memcmp(obj, expected, sizeof *obj) == 0) {
    memcpy(obj, &desired, sizeof *obj);
    return true;
} else {
    memcpy(expected, obj, sizeof *obj);
    return false;
}

For this header fie, I would prefer to not have two flavors, the API is already quite long! I would prefer to have a single flavor. If there is an usecase where setting expected is relevant, I suggest to use a pointer for the second argument.

In short, I agree to change the API to int _Py_atomic_compare_exchange_int32(int32_t *obj, int32_t *expected, int32_t desired). The behavior should be well documented.

obj, expected and desired names come from the C11 API.

_Py_atomic_compare_exchange_ssize(Py_ssize_t *address, Py_ssize_t expected, Py_ssize_t value);

static inline int
_Py_atomic_compare_exchange_ptr(void *address, void *expected, void *value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in that case I'm fine with the surprising void* type.

- Add pyatomic_*.h headers to MSVC filters
- Use a pointer for 2nd argument of compare_exchange functions
- Rename address to ptr
@colesbury
Copy link
Contributor Author

@vstinner, I've made the second argument to _Py_atomic_compare_exchange_ functions a pointer. I renamed the first argument to all the atomic functions to ptr for the reasons in my above comment. I can change it to obj if you prefer.

Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Copy link
Member

I renamed the first argument to all the atomic functions to ptr for the reasons in my above comment. I can change it to obj if you prefer.

Please rename the argument to obj.


Naming is a hard problem :-( If I want to write documentation for that, I would have issues to explain why I have to pass a "pointer" (ptr) to these functions.

https://en.cppreference.com/w/c/atomic/atomic_load says:

Parameters
obj - pointer to the atomic object to access
order - the memory synchronization ordering for this operation

The first argument is a pointer to the atomic object to access.

It's right that it's a pointer, so "atomic_ptr", "atomic_obj_ptr", "patomic", ... names would be good. But to make the name shorter, I would suggest to omit "pointer", and so just say "object" (or "atomic", but you dislike this name, so let's skip it).

I would prefer to document that _Py_atomic_load_int32(obj) gets the value of the atomic object obj, rather than the value of the atomic pointer ptr.

@pitrou
Copy link
Member

pitrou commented Aug 30, 2023

"object" in a CPython context is really misleading. I also don't understand what the issue with "pointer" or "ptr" is.

@vstinner
Copy link
Member

"object" in a CPython context is really misleading. I also don't understand what the issue with "pointer" or "ptr" is.

I would prefer to be as close as possible to C11 API. It doesn't matter that C11 API made bad decisions, the API is now standardized :-)

@pitrou
Copy link
Member

pitrou commented Aug 30, 2023

As you prefer... This is a private low-level API anyway, so the name of arguments is hardly fundamental.

@colesbury
Copy link
Contributor Author

I've renamed ptr to obj

@vstinner
Copy link
Member

Well. I have "a few more remarks", but I decided to copy this PR and make my changes directly there: please see my PR #108701.

@vstinner
Copy link
Member

I merged #108701 which includes my coding style changes. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants