gh-108337: Add pyatomic.h header #108338
This adds a new header that provides atomic operations on common data types. The intention is that this will be exposed through Python.h, although that is not the case yet. The only immediate use is in the test file.
Placates Tools/c-analyzer/check-c-globals.py.
| 
 Is this API public or private? Names are prefixed by  At least, I suggest to exclude it from the limited C API for now: so move it to cpython/ directory, and check for Py_LIMITED_API. See other header files like Include/cpython/pydebug.h, the ones included with cpython/ in Python.h. I like the approach using functions rather than atomic types: it's the approach used by the glib library: https://developer-old.gnome.org/glib/stable/glib-Atomic-Operations.html I'm curious why/how some function parameters don't need volatile, like g_atomic_int_inc(): 
 I wrote Include/internal/pycore_atomic_funcs.h which tries to address Include/internal/pycore_atomic.h compiler issues. But pycore_atomic_funcs.h is incomplete and was only used for a very few things (is it still used?). Using atomic variables is hard :-( Do you have any kind of documentation? Or links to other documentations? It would be good to have a doc, at least in pyatomic.h. | 
| 
 They are private, but they're  Although,  
 Yes, but AFAICS the uses can be replaced. | 
| As @encukou wrote, the intention is that they're private and only used directly by CPython for now. I'll move them to the  Regarding  I'll add some code documentation and links in pyatomic.h (*) There's experimental support in the most recent builds. | 
Instead use volatile casts in MSVC implementation where they are meaningful.
| 
 Unless there is a good reason to expose this API to 3rd party code, I would suggest to move it to Include/internal/. For example, override pycore_atomic_funcs.h. If the header file is only exposed in the internal C API, we will have less compilation issues. Even after I moved  Is your header file, declaration and implementation (since there are static inline functions, the implementation is public!), compatible with C++? To avoid compilation issues, would it be technically possible to have: 
 | 
| Oh, right, atomics are an optional C11 feature! As far as I can see, the current  PEP-7 should be updated, and this should get a What's New entry similar to the 3.11 one for "Building CPython". You might want to have PEP-703 state the compiler requirement explicitly. | 
| 
 I agree with this, but I suspect we're going to need these available for macros/inlines. So they'll have to be available in the public API, even if they're not intended for direct use. Stable API shouldn't include them, obviously, which means anything that's currently a macro/inline for the stable API can't use them either and needs an opaque function call. | 
| 
 Oh ok, now I get it. It wasn't clear when I first reviewed your PR. In that case, it can be in Include/cpython/. Do you know if you need any atomic function in Include/ header files (limited C API)? | 
| 
 In  | 
        
          
Include/cpython/pyatomic.h (outdated)
// Performs an atomic compare-and-exchange. If `*address` and `expected` are equal,
// then `value` is stored in `*address`. Returns 1 on success and 0 on failure.
// These correspond to the "strong" variations of the C11 atomic_compare_exchange_* functions.
C11 passes expected as a pointer, so that it's updated with the actual value when the latter doesn't match the former. Why not keep that convention here?
The motivation was two-fold: First, many of the _Py_atomic_compare_exchange calls in the nogil fork use constants as expected and this would be more verbose if it needs to be a pointer, e.g.:
_Py_atomic_compare_exchange_uint8(&m->v, LOCKED, UNLOCKED)
vs.
uint8_t expected = LOCKED;
_Py_atomic_compare_exchange_uint8(&m->v, &expected, UNLOCKED)
Second, I find this style (no pointer for expected) to be a bit less error-prone. I've been tripped up once or twice by having expected be modified when I didn't expect it.
I don't feel terribly strongly about this, so if there is a general preference for sticking closer to the C11-style API here, I can change it.
AFAIU the main motivation for the C11 style APIs is for retry loops in lockless data structure implementations. A simplistic example (this may be embarrassingly wrong):
#include <stdlib.h>  // malloc

struct ListNode;
typedef struct ListNode {
  int value;
  struct ListNode* next;
} ListNode;
void ListAppend(ListNode* list, int new_value) {
  ListNode* new_node = (ListNode*) malloc(sizeof(ListNode));
  new_node->value = new_value;
  new_node->next = NULL;
  ListNode* expected = NULL;
  // Retry while the CAS fails: on failure, `expected` holds the current next node.
  while (!_Py_atomic_compare_exchange_ptr(&list->next, &expected, new_node)) {
    list = expected;
    expected = NULL;
  }
}
If the second argument is a reference (&expected), does it mean that it changes the value of the second argument (*expected)?
For C11 atomic_exchange(), the second argument is not a pointer, but a value (integer), no? https://en.cppreference.com/w/c/atomic/atomic_exchange
C11 atomic_compare_exchange_strong() and atomic_compare_exchange_weak() use a pointer for expected. But this API writes into *expected if the *obj is not equal to *expected.
The behavior of atomic_compare_exchange_* family is as if the following was executed atomically:
if (memcmp(obj, expected, sizeof *obj) == 0) {
    memcpy(obj, &desired, sizeof *obj);
    return true;
} else {
    memcpy(expected, obj, sizeof *obj);
    return false;
}
For this header file, I would prefer to not have two flavors, the API is already quite long! I would prefer to have a single flavor. If there is a use case where setting expected is relevant, I suggest using a pointer for the second argument.
In short, I agree to change the API to int _Py_atomic_compare_exchange_int32(int32_t *obj, int32_t *expected, int32_t desired). The behavior should be well documented.
obj, expected and desired names come from the C11 API.
I'll make the second argument a pointer like the C11 API.
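The write-back behaviour discussed above can be illustrated with the standard C11 functions. This is a minimal sketch that assumes a C11 compiler with `<stdatomic.h>`; the `demo_*` helpers are hypothetical names made up for illustration and are not part of pyatomic.h.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: try one strong compare-exchange and return the value
// that *obj held when the call was made, whether or not the guess matched.
static int32_t
demo_cas_observed(_Atomic(int32_t) *obj, int32_t guess, int32_t desired)
{
    int32_t expected = guess;
    bool ok = atomic_compare_exchange_strong(obj, &expected, desired);
    // On failure, `expected` was overwritten with the current value of *obj.
    assert(ok || expected != guess);
    return expected;
}

// Hypothetical helper: raise *obj to at least `candidate` with a retry loop.
// Returns the value observed before any update made by this call.
static int32_t
demo_store_max(_Atomic(int32_t) *obj, int32_t candidate)
{
    int32_t expected = atomic_load(obj);
    // Each failed attempt refreshes `expected`, so no explicit reload is needed.
    while (expected < candidate &&
           !atomic_compare_exchange_strong(obj, &expected, candidate)) {
    }
    return expected;
}
```

The retry loop is the case where a pointer for `expected` pays off: the failed call already delivers the fresh value. With a by-value `expected`, the loop body would need a separate atomic load.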
        
          
Include/cpython/pyatomic.h (outdated)
_Py_atomic_add_uintptr(uintptr_t *address, uintptr_t value);

static inline Py_ssize_t
_Py_atomic_add_ssize(Py_ssize_t *address, Py_ssize_t value);
Not sure it's worth exposing atomic ops for all int sizes and signednesses?
I expect to use at least one atomic operation on each of the data types here (but not every atomic op on every data type). I tried to be consistent on what's defined because it makes understanding what's available easier and testing easier.
        
          
Include/cpython/pyatomic.h (outdated)
_Py_atomic_compare_exchange_ssize(Py_ssize_t *address, Py_ssize_t expected, Py_ssize_t value);

static inline int
_Py_atomic_compare_exchange_ptr(void *address, void *expected, void *value);
Should it be void** address for clarity?
The problem is that void ** requires an explicit cast for almost every use, because things like PyObject ** are not implicitly convertible to void **. In other words, currently we can write things like:
PyObject *old_exc = _Py_atomic_exchange_ptr(&tstate->async_exc, exc);
but if address was void **address, we'd have to write:
PyObject *old_exc = _Py_atomic_exchange_ptr((void **)&tstate->async_exc, exc);
A large part of the Python C API uses macros to convert arguments to PyObject*. Would it make sense to do the same here?
#define _Py_atomic_exchange_ptr(atomic, value) _Py_atomic_exchange_ptr(_Py_CAST(void**, atomic), (value))
I don't see the benefit of that style over the current approach, and it would silently allow passing some integer types to _Py_atomic_exchange_ptr.
Right, in that case I'm fine with the surprising void* type.
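The trade-off in this thread can be sketched as follows. This is an illustration only: `DemoObject` and the `demo_*` functions are hypothetical stand-ins, and the implementation assumes the GCC/Clang `__atomic_exchange_n` builtin rather than whatever pyatomic.h ends up using.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for PyObject.
typedef struct DemoObject { int id; } DemoObject;

// `obj` is declared void *, but it must point at a pointer-sized slot.
// The cast to void ** happens once, inside the helper.
static inline void *
demo_atomic_exchange_ptr(void *obj, void *value)
{
    return __atomic_exchange_n((void **)obj, value, __ATOMIC_SEQ_CST);
}

static DemoObject *
demo_swap_slot(DemoObject **slot, DemoObject *replacement)
{
    // DemoObject ** converts implicitly to void *, so no cast is needed.
    // If the parameter were `void **obj`, this call would need (void **)slot.
    return demo_atomic_exchange_ptr(slot, replacement);
}
```

The cost of the `void *` parameter is that the compiler cannot check that the argument really is a pointer to a pointer, which is the "surprising" part mentioned above.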
        
          
Include/cpython/pyatomic.h (outdated)
_Py_atomic_store_uint64_release(uint64_t *address, uint64_t value);

static inline void
_Py_atomic_store_ptr_release(void *address, void *value);
I'm not an expert, but why is it useful to expose "release" operations if no "acquire" operations are exposed?
You can always use stronger orderings for correctness (i.e., "seq_cst" everywhere instead of "acquire"). From a performance view, "release" is substantially faster than "seq_cst" stores on x86/x86-64, but "acquire" generates the same code as "seq_cst" loads on both x86/x86-64 and aarch64.
I don't think CPython support is limited to x86 and ARM variants, so the set of atomic ops exposed should probably be made consistent nevertheless?
Also, using "seq_cst" in combination with "release" will probably make the code more difficult to reason about, than if "acquire" is exposed (and memory ordering is already hard to reason about!).
I'll add a _Py_atomic_load_ptr_acquire for consistency (and I think I can remove the _Py_atomic_store_uint64_release).
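For readers less familiar with the pairing, a release store is normally matched by an acquire load on the reading side. The sketch below uses the standard C11 `<stdatomic.h>` calls rather than the `_Py_atomic_*` functions; `DemoMessage`, `demo_slot`, `demo_publish` and `demo_try_consume` are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

// Hypothetical payload type.
typedef struct DemoMessage { int payload; } DemoMessage;

// Shared slot, NULL until something is published.
static _Atomic(DemoMessage *) demo_slot = NULL;

// Producer side: fill in the message with plain writes, then publish the
// pointer with a release store.
static void
demo_publish(DemoMessage *msg, int payload)
{
    msg->payload = payload;
    atomic_store_explicit(&demo_slot, msg, memory_order_release);
}

// Consumer side: an acquire load that sees the pointer also sees the
// payload written before the matching release store.
static int
demo_try_consume(int *out)
{
    DemoMessage *msg = atomic_load_explicit(&demo_slot, memory_order_acquire);
    if (msg == NULL) {
        return 0;  // nothing published yet
    }
    *out = msg->payload;
    return 1;
}
```

Replacing the acquire load with a seq_cst load would also be correct, as noted above; exposing the acquire variant mainly makes the intended pairing visible in the code.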
It's unclear to me if volatile must be used on the first parameter in the function definition, or not. For me, it's surprising to be able to cast int* to volatile long* in pyatomic_msc.h. Maybe explain the black magic in the documentation at the top of pyatomic.h?
I'm scared by the _ptr variant which takes a void* and is then cast to a pointer to a pointer (void**). It looks suspicious.
        
          
Include/cpython/pyatomic_gcc.h (outdated)
# error "this header file must not be included directly"
#endif

// This is the implementation of Python atomic operations using GCC's built-in
I suggest moving it to the top of the file.
static inline int
_Py_atomic_add_int(int *address, int value)
{
    return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST);
}
Usually, I prefer verbose syntax like the one that you used. But in this header file, you have tons of static inline functions, so I suggest using the compact syntax:
- static inline int
- _Py_atomic_add_int(int *address, int value)
- {
-     return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST);
- }
+ static inline int
+ _Py_atomic_add_int(int *address, int value)
+ { return __atomic_fetch_add(address, value, __ATOMIC_SEQ_CST); }
        
          
Include/cpython/pyatomic.h (outdated)
_Py_atomic_store_ptr_release(void *address, void *value);


// Sequential consistency fence
I would prefer more elaborate documentation, "fence" is kind of weak.
Guarantees that every previous memory reference, including both load and store memory references, is globally visible before any subsequent memory reference.
Limits the compiler optimizations that can reorder memory accesses across the point of the call.
The data memory barrier ensures that all preceding writes are issued before any subsequent memory operations (including speculative memory access).
The challenge I have is that it's really hard to describe what fences do in a way that's helpful and accurate. The above documentation is too strong for C11 fences.
There's https://en.cppreference.com/w/c/atomic/atomic_thread_fence, but I find it vague. And the C++ documentation (https://en.cppreference.com/w/cpp/atomic/atomic_thread_fence) is more detailed but really hard to understand.
Please add one or two of these links here. It's ok to have references to external doc, it's better than no doc :-)
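One concrete case where a sequentially consistent fence matters is store-then-load ordering between two threads. The sketch below uses the C11 `atomic_thread_fence` call and hypothetical names (`demo_flag_a`, `demo_flag_b`, `demo_enter_a`, `demo_enter_b`); it is an illustration of the C11 guarantee and makes no claim about how pyatomic.h documents or implements its fence.

```c
#include <stdatomic.h>

// Both flags have static storage duration, so they start at zero.
static atomic_int demo_flag_a;
static atomic_int demo_flag_b;

// Meant for thread A: announce itself, fence, then look at B's flag.
static int
demo_enter_a(void)
{
    atomic_store_explicit(&demo_flag_a, 1, memory_order_relaxed);
    // Without this fence the relaxed load below could be satisfied before
    // the relaxed store above becomes visible to the other thread.
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&demo_flag_b, memory_order_relaxed);
}

// Meant for thread B: the mirror image of demo_enter_a.
static int
demo_enter_b(void)
{
    atomic_store_explicit(&demo_flag_b, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&demo_flag_a, memory_order_relaxed);
}
```

Because the two fences are totally ordered, the two functions cannot both return 0 when run concurrently: whichever fence comes second guarantees that its thread's load sees the other thread's store.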
| I have made the requested changes; please review again. | 
| Thanks for making the requested changes! @vstinner: please review the changes made to this pull request. | 
The glib library provides g_atomic_int_dec_and_test() to implement reference counting. How would you reimplement it with your API? I'm not used to atomic variables and I never know how to use them correctly.
        
          
| 
 The decrement would look like the following. The  if (_Py_atomic_add_ssize(&op->ob_refcnt, -1) == 1) {
  // refcnt is zero, dealloc
} | 
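A self-contained version of the same idea, written against C11 `<stdatomic.h>` instead of the `_Py_atomic_*` functions, might look like the sketch below. `DemoRefCounted` and `demo_dec_and_test` are hypothetical names; the key point is that the fetch-and-subtract returns the value held before the subtraction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical reference-counted object.
typedef struct DemoRefCounted {
    _Atomic(ptrdiff_t) refcnt;
} DemoRefCounted;

// Rough equivalent of glib's g_atomic_int_dec_and_test():
// returns true when this call dropped the last reference.
static bool
demo_dec_and_test(DemoRefCounted *op)
{
    // atomic_fetch_sub returns the previous value, so "previous == 1"
    // means the count is now zero.
    ptrdiff_t previous = atomic_fetch_sub(&op->refcnt, 1);
    assert(previous > 0);  // decrementing a dead object would be a bug
    return previous == 1;
}
```

Only one thread can observe `previous == 1`, so only that thread runs the deallocation path.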
| @vstinner, would you please look this over again when you have a chance? | 
We should decide whether the second argument of _Py_atomic_compare_exchange() should be a pointer or not. Apparently, a pointer covers more cases and so should be used.
<ClInclude Include="..\Include\cpython\pyatomic_msc.h">
  <Filter>Include</Filter>
</ClInclude>
You should add GCC here:
- <ClInclude Include="..\Include\cpython\pyatomic_msc.h">
-   <Filter>Include</Filter>
- </ClInclude>
+ <ClInclude Include="..\Include\cpython\pyatomic_gcc.h">
+   <Filter>Include</Filter>
+ </ClInclude>
+ <ClInclude Include="..\Include\cpython\pyatomic_msc.h">
+   <Filter>Include</Filter>
+ </ClInclude>
It's just for the UI, not to build Python.
        
          
Include/cpython/pyatomic.h (outdated)
// Atomically adds `value` to `address` and returns the previous value
static inline int
_Py_atomic_add_int(int *address, int value);
- https://en.cppreference.com/w/c/atomic uses the obj name.
- https://www.ibm.com/docs/en/zos/2.1.0?topic=c11-atomic-load also uses the obj name (C11 functions)
- pycore_atomic.h uses the ATOMIC_VAL name
- pycore_atomic_funcs.h uses the var name
I dislike the address name. What you pass is not a void* pointer or a uintptr_t address, but a reference to an atomic variable. I suggest using the atomic or obj name.
For add operation, the C11 API uses arg for the second parameter name. I'm fine with value.
        
          
        
          
- Add pyatomic_*.h headers to MSVC filters
- Use a pointer for 2nd argument of compare_exchange functions
- Rename address to ptr
| @vstinner, I've made the second argument to  | 
Co-authored-by: Victor Stinner <vstinner@python.org>
| 
 Please rename the argument to  Naming is a hard problem :-( If I want to write documentation for that, I would have issues to explain why I have to pass a "pointer" (ptr) to these functions. https://en.cppreference.com/w/c/atomic/atomic_load says: 
 The first argument is a pointer to the atomic object to access. It's right that it's a pointer, so "atomic_ptr", "atomic_obj_ptr", "patomic", ... names would be good. But to make the name shorter, I would suggest to omit "pointer", and so just say "object" (or "atomic", but you dislike this name, so let's skip it). I would prefer to document that _Py_atomic_load_int32(obj) gets the value of the atomic object obj, rather than the value of the atomic pointer ptr. | 
| "object" in a CPython context is really misleading. I also don't understand what the issue with "pointer" or "ptr" is. | 
| 
 I would prefer to be as close as possible to C11 API. It doesn't matter that C11 API made bad decisions, the API is now standardized :-) | 
| As you prefer... This is a private low-level API anyway, so the name of arguments is hardly fundamental. | 
| I've renamed  | 
| Well. I have "a few more remarks", but I decided to copy this PR and make my changes directly there: please see my PR #108701. | 
| I merged #108701 which includes my coding style changes. Thanks! | 
This adds a new header that provides atomic operations on common data types.
Implementing PEP 703 requires use of atomic operations on more data types than provided by pycore_atomic.h. Additionally, pycore_atomic.h is only usable from Py_BUILD_CORE modules; it can't be used in public headers. PEP 703 will require atomic operations in object.h for Py_INCREF/DECREF, for example. The intention is that this will be exposed through Python.h, although that is not the case yet.
To avoid build issues in third-party extensions, the pyatomic.h header generally does not require -std=gnu11 or -std=c11 to be passed to the compiler (for GCC or Clang). When compiling C, MSVC will use the pyatomic_msc.h, which uses compiler intrinsics. When compiled in C++ mode, MSVC will use the pyatomic_std.h implementation, which uses C++11 atomics.
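The per-compiler split described in the commit message can be pictured with a single operation. The sketch below is an assumption about the general shape, not a copy of the real headers: `demo_atomic_add_int` is a hypothetical name, and the exact preprocessor conditions used by pyatomic.h may differ.

```c
#include <assert.h>

#if defined(__GNUC__) || defined(__clang__)
// GCC and Clang builtins are available without -std=c11 or -std=gnu11.
static inline int
demo_atomic_add_int(int *obj, int value)
{ return __atomic_fetch_add(obj, value, __ATOMIC_SEQ_CST); }

#elif defined(_MSC_VER)
#include <intrin.h>
// MSVC intrinsic; int and long are both 32 bits on Windows.
static inline int
demo_atomic_add_int(int *obj, int value)
{ return (int)_InterlockedExchangeAdd((volatile long *)obj, (long)value); }

#else
#include <stdatomic.h>
// Fallback for other C11 compilers; the cast assumes that int and
// _Atomic(int) share a representation on the target.
static inline int
demo_atomic_add_int(int *obj, int value)
{ return atomic_fetch_add((_Atomic(int) *)obj, value); }
#endif
```

All three branches return the value held before the addition, so callers see the same contract whichever backend is selected.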