object.h uses an anonymous union in a struct (older C incompatible) #105059

Closed
zooba opened this issue May 29, 2023 · 49 comments
Labels
3.12 bugs and security fixes, 3.13 bugs and security fixes, deferred-blocker

Comments

@zooba
Member

zooba commented May 29, 2023

An anonymous union was added recently, making this header produce warnings/errors for C++ compilers that have not enabled the relevant extensions/standards.

https://github.com/python/cpython/blame/1668b41dc477bc9562e4c50ab36a232839b4621b/Include/object.h#L168-L173

This code looks like it's making a few too many assumptions anyway. I suspect we ought to be wrapping up accesses to ob_refcnt_split in a macro and masking, rather than using a union. Or at least setting up the union to not assume that SIZEOF_VOID_P > 4 implies SIZEOF_VOID_P == 8.

(Ping @eduardo-elizondo)

@zooba
Member Author

zooba commented May 29, 2023

Seems the only use of it is later in object.h:

#if SIZEOF_VOID_P > 4
    // Portable saturated add, branching on the carry flag and set low bits
    PY_UINT32_T cur_refcnt = op->ob_refcnt_split[PY_BIG_ENDIAN];
    PY_UINT32_T new_refcnt = cur_refcnt + 1;
    if (new_refcnt == 0) {
        return;
    }
    op->ob_refcnt_split[PY_BIG_ENDIAN] = new_refcnt;

Is this really better than this code?:

    Py_ssize_t new_refcnt = op->ob_refcnt + 1;
    if (new_refcnt & ~(Py_ssize_t)0xFFFFFFFF) return;
    // or
    if ((PY_UINT32_T)new_refcnt != new_refcnt) return;

Seems we can rely on the compilers to figure out efficient ways to do these, no?
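
To make the comparison concrete, here is a minimal sketch of the two strategies on plain 64-bit values. It is an illustration only, not CPython's code: the function names are hypothetical, int64_t stands in for Py_ssize_t, and the split-field variant is modelled with arithmetic rather than a union so it does not depend on byte order. Note that the mask is written as a 64-bit constant; a bare ~0xFFFFFFFF is computed in unsigned int where int is 32 bits wide, which yields 0.

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: bump only the low 32 bits, as the split-field code does. */
int64_t incref_low32(int64_t refcnt)
{
    assert(refcnt >= 0);
    uint64_t bits = (uint64_t)refcnt;
    uint32_t low = (uint32_t)bits + 1u;   /* unsigned wrap is well defined */
    if (low == 0) {
        return refcnt;                    /* saturated: leave unchanged */
    }
    /* Keep the upper half, replace the lower half. */
    return (int64_t)((bits & ~UINT64_C(0xFFFFFFFF)) | low);
}

/* Variant 2: add in 64 bits and stop once the result needs more than 32 bits. */
int64_t incref_masked(int64_t refcnt)
{
    assert(refcnt >= 0 && refcnt < INT64_MAX);
    int64_t new_refcnt = refcnt + 1;
    if (new_refcnt & ~INT64_C(0xFFFFFFFF)) {
        return refcnt;
    }
    return new_refcnt;
}
```

Both variants behave the same for refcounts that stay below 2^32; whether a compiler emits equally good code for them is a separate question that only benchmarks can answer.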

@zooba zooba added 3.12 bugs and security fixes 3.13 bugs and security fixes labels May 29, 2023
@sunmy2019
Member

sunmy2019 commented May 29, 2023

Note

    if (new_refcnt & ~(Py_ssize_t)0xFFFFFFFF) return;

    if ((PY_UINT32_T)new_refcnt != new_refcnt) return;

Both change the definition of immortality. Previously refcnt % 2^32 == -1, now refcnt >= 2^32 -1 or refcnt < -1 (impossible).

I don't mean this isn't desired; I just want to bring attention that something has changed here.

@zooba
Member Author

zooba commented May 29, 2023

Both change the definition of immortality. Previously refcnt % 2^32 == -1, now refcnt >= 2^32 -1 or refcnt < -1 (impossible).

You mean before you could (manually?) set a refcount to 0x00000001_00000000 and it wouldn't saturate, but now it will? I guess that's true, but also irrelevant. The definition of immortality isn't changed, just the condition under which Py_INCREF stops incrementing the refcount. And at least by my reading, that's going to apply to all objects, it's just that immortal objects start there and non-immortal ones will never reach it.
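
The difference between the two stop conditions can be checked on a few values. This is a sketch under the assumption that refcounts are non-negative 64-bit integers; the predicate names are hypothetical and only restate the two rules discussed above.

```c
#include <assert.h>
#include <stdint.h>

/* Original rule: stop incrementing when the low 32 bits are all ones. */
int stops_low32(int64_t refcnt)
{
    assert(refcnt >= 0);
    return (uint32_t)refcnt == UINT32_MAX;
}

/* Proposed rule: stop incrementing when refcnt + 1 no longer fits in 32 bits. */
int stops_range(int64_t refcnt)
{
    assert(refcnt >= 0);
    return refcnt >= INT64_C(0xFFFFFFFF);
}
```

The two rules agree at 0xFFFFFFFF and for every smaller value. They diverge for a manually set value such as 0x00000001_00000000, where only the second rule stops the increment.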

Incidentally, after a bit of playing with godbolt.org, it seems that gcc generates the same code for these two snippets (the first is the current code, and remember to add -O2 if you're trying to reproduce):

    // current code
    PY_UINT32_T cur_refcnt = op->ob_refcnt_split[PY_BIG_ENDIAN];
    PY_UINT32_T new_refcnt = cur_refcnt + 1;

    // alternative without the union
    PY_UINT32_T new_refcnt = (PY_UINT32_T)(op->ob_refcnt + 1);

The latter doesn't require a union at all, and so doesn't depend on the size of void *. Simpler code all around.

@sunmy2019
Member

    PY_UINT32_T new_refcnt = (PY_UINT32_T)(op->ob_refcnt + 1);

The latter doesn't require a union at all, and so doesn't depend on the size of void *. Simpler code all around.

Indeed! But it should be (PY_UINT32_T)(op->ob_refcnt) + 1;

Signed integer overflow is undefined behavior (there is even a compiler switch to control it), and here the UB would happen at op->ob_refcnt + 1.

Unsigned integer overflow, on the other hand, is well-defined behavior, so (PY_UINT32_T)(op->ob_refcnt) + 1; is fine.

@zooba
Member Author

zooba commented May 29, 2023

We don't actually overflow in the op->ob_refcnt + 1 case (we've taken a different path if it's only 32 bits). So we get truncation at the cast, not an overflow.

Either way, I think we ought to get something else in for b2, as this can block people from testing. Marking as a release blocker, and @Yhg1s can decide.

@sunmy2019
Member

So we get truncation at the cast, not an overflow.

If so, (PY_UINT32_T)(op->ob_refcnt) + 1 should be equal to (PY_UINT32_T)(op->ob_refcnt + 1), and the former is a little safer because it avoids any possible UB.
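
A small sketch of the two orderings, with int64_t assumed as a stand-in for Py_ssize_t and hypothetical function names. Casting first makes the addition unsigned, so it wraps modulo 2^32 for every input; adding first is only undefined at INT64_MAX, which the second function excludes with an assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Cast first: the addition is unsigned and wraps modulo 2^32. */
uint32_t cast_then_add(int64_t refcnt)
{
    assert(refcnt >= 0);
    return (uint32_t)refcnt + 1u;
}

/* Add first: the addition is signed 64-bit, then the result is truncated. */
uint32_t add_then_cast(int64_t refcnt)
{
    assert(refcnt >= 0 && refcnt < INT64_MAX);  /* refcnt + 1 must not overflow */
    return (uint32_t)(refcnt + 1);
}
```

For any input that both functions accept, the results are identical, which matches the observation that the cast truncates rather than overflows.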

@eduardo-elizondo
Contributor

eduardo-elizondo commented May 31, 2023

@zooba @sunmy2019 thanks for bringing up the issue! I think I might be able to get rid of the union and I'll try to get a patch in the next couple of days.

Separately, let's be careful about making any changes to the IncRef logic. The current code was very carefully crafted to maximize performance on Windows, Linux, and macOS and was thoroughly benchmarked. There are also subtle side effects that can happen if we don't perform the operations correctly. For example, the suggested fix:

PY_UINT32_T new_refcnt = (PY_UINT32_T)(op->ob_refcnt + 1);

Would eventually require us to do something like:

op->ob_refcnt = new_refcnt;

^ this will cause an implicit conversion from 32 to 64 bit as the generated code assigns the 32 bit register to the 64 bit register by padding it with zeros. This causes incorrect behavior since we can have Extension code that has increased the reference count beyond the maximum value and this implicit conversion would cause the reference count of the extension to be lost. Then, it will result in a crash since we will decref this to deletion even when there are valid references to the object. I've seen this exact crash manifested when testing in a large application.
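
The lost-count effect described above can be reproduced on plain values. This is a minimal sketch, assuming int64_t in place of Py_ssize_t and a hypothetical function name; it models storing the 32-bit result back into the 64-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate to 32 bits, then store back: the value is zero-extended,
   so anything held in the upper 32 bits is discarded. */
int64_t incref_truncating(int64_t refcnt)
{
    assert(refcnt >= 0 && refcnt < INT64_MAX);
    uint32_t new_refcnt = (uint32_t)(refcnt + 1);
    if (new_refcnt == 0) {
        return refcnt;          /* saturated: leave unchanged */
    }
    return new_refcnt;          /* implicit widening pads with zeros */
}
```

Starting from 0x00000001_00000005, the result is 6 rather than 0x00000001_00000006: the references counted in the upper half are gone, and later decrements can reach zero while the object is still in use.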

Anyways, give me a couple of days to explore a couple of options and I'll get back to this thread! In the meantime, if we can get some repro examples, that would be great!

@encukou
Member

encukou commented May 31, 2023

if we can get some repro examples, that would be great!

It would! Alas, compilers are weird and distutils is pining for fjords, so...

For anonymous structs/unions, Linux/GCC, with Python 3.13 installed (so that python3.13-config is in PATH):

$ cat repro.c
#include <Python.h>

int main () {
    return 0;
}
$ gcc $(python3.13-config --cflags --ldflags) -lpython3.13$(python3.13-config --abiflags) -std=c99 -pedantic repro.c
In file included from .../include/python3.13d/Python.h:44,
                 from repro.c:1:
.../include/python3.13d/object.h:173:6: warning: ISO C99 doesn’t support unnamed structs/unions [-Wpedantic]
  173 |     };
      |      ^

@zooba
Member Author

zooba commented May 31, 2023

Yeah, it looks like it's actually C compatibility, not C++. When I dug further into my failing build, it's actually one of the .c files.

Reproing using Petr's code from above with MSVC:

> cl /c /I(python3.12 -c "import sysconfig; print(sysconfig.get_config_var('INCLUDEPY'), end='')") /W4 .\t.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.37.32619.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

t.c
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.177.0_x64__3847v3x7pw1km\Include\object.h(173): warning C4201: nonstandard extension used: nameless struct/union
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.177.0_x64__3847v3x7pw1km\Include\cpython/unicodeobject.h(203): warning C4100: '_unused_op': unreferenced formal parameter
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.177.0_x64__3847v3x7pw1km\Include\cpython/unicodeobject.h(393): warning C4100: '_unused_op': unreferenced formal parameter
C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.177.0_x64__3847v3x7pw1km\Include\cpython/pytime.h(192): warning C4115: 'timeval': named type definition in parentheses

All four warnings disappear when you don't use /W4, which is our standard, but we shouldn't be warning even on high levels. The non-standard extension should definitely be avoided - I think the other two have been in there for a while.

I think we need a test that basically includes the headers and spits out warnings or errors. Shouldn't need to be anything more clever than a make target. That doesn't have to be part of this fix.

@zooba
Member Author

zooba commented May 31, 2023

Would eventually require us to do something like:

op->ob_refcnt = new_refcnt;

Then perhaps this, if "lower 32-bits set == saturated"?

Py_ssize_t new_refcnt = op->ob_refcnt + 1;
if ((PY_UINT32_T)new_refcnt == 0) { // might have to be & 0xFFFFFFFF instead of the cast
    return;
}
op->ob_refcnt = new_refcnt;

@eduardo-elizondo
Contributor

eduardo-elizondo commented Jun 4, 2023

Some updates, I just removed the union in GH-105275. I will follow-up with a couple of other different options and benchmark numbers. Will let you all know once it's ready to be reviewed.

@gpshead
Member

gpshead commented Jun 4, 2023

Per https://peps.python.org/pep-0007/#c-dialect "Python 3.11 and later use C11 without optional features".

Anonymous Structs and Unions are part of C11.

@gpshead gpshead changed the title from "object.h includes C++ incompatible code" to "object.h uses an anonymous union in a struct (older C incompatible)" Jun 4, 2023
@eduardo-elizondo
Contributor

eduardo-elizondo commented Jun 4, 2023

Per https://peps.python.org/pep-0007/#c-dialect "Python 3.11 and later use C11 without optional features".
Anonymous Structs and Unions are part of C11.

@gpshead so does that mean that PEP007 guarantees that it's ok to have anonymous unions in Python 3.11? Hence making the original issue invalid now?

@eduardo-elizondo
Contributor

eduardo-elizondo commented Jun 4, 2023

Then perhaps this, if "lower 32-bits set == saturated"?

Also @zooba to close the loop, I had tested this in the past but I just did it again to get some fresh numbers here. Looking at the generated code, it is not leveraging the saturated add which removes an extra instruction: https://godbolt.org/z/5rrdjPbMM.

But generated code is just a proxy so I benchmarked it on a Linux machine with GCC-11.1 and it recorded a: Geometric mean: 1.01x slower. I'm testing it with the memset as well which looks neutral (since it generates almost the same code).

Ultimately, I'm indifferent what structure we end up using, so I'm open to use the option you suggested if there's consensus on the regression. However, perf was a core part of the original upstreaming, so I went for the most perf efficient option. But let me know what you think!

Benchmark Results

All benchmarks:
===============

async_tree_eager: Mean +- std dev: [benchmark_cpython_baseline] 115 ms +- 1 ms -> [benchmark_cpython_no_union_v2] 116 ms +- 1 ms: 1.01x slower
async_tree_eager_memoization: Mean +- std dev: [benchmark_cpython_baseline] 267 ms +- 6 ms -> [benchmark_cpython_no_union_v2] 265 ms +- 6 ms: 1.01x faster
asyncio_tcp: Mean +- std dev: [benchmark_cpython_baseline] 582 ms +- 12 ms -> [benchmark_cpython_no_union_v2] 566 ms +- 11 ms: 1.03x faster
asyncio_tcp_ssl: Mean +- std dev: [benchmark_cpython_baseline] 2.02 sec +- 0.01 sec -> [benchmark_cpython_no_union_v2] 1.97 sec +- 0.01 sec: 1.03x faster
chaos: Mean +- std dev: [benchmark_cpython_baseline] 64.8 ms +- 0.6 ms -> [benchmark_cpython_no_union_v2] 65.5 ms +- 0.4 ms: 1.01x slower
bench_mp_pool: Mean +- std dev: [benchmark_cpython_baseline] 17.7 ms +- 0.4 ms -> [benchmark_cpython_no_union_v2] 17.2 ms +- 0.6 ms: 1.03x faster
coroutines: Mean +- std dev: [benchmark_cpython_baseline] 25.2 ms +- 0.8 ms -> [benchmark_cpython_no_union_v2] 24.3 ms +- 0.8 ms: 1.04x faster
coverage: Mean +- std dev: [benchmark_cpython_baseline] 99.7 ms +- 2.0 ms -> [benchmark_cpython_no_union_v2] 101 ms +- 4 ms: 1.01x slower
crypto_pyaes: Mean +- std dev: [benchmark_cpython_baseline] 81.7 ms +- 0.6 ms -> [benchmark_cpython_no_union_v2] 82.7 ms +- 0.6 ms: 1.01x slower
deepcopy: Mean +- std dev: [benchmark_cpython_baseline] 377 us +- 2 us -> [benchmark_cpython_no_union_v2] 381 us +- 4 us: 1.01x slower
deepcopy_reduce: Mean +- std dev: [benchmark_cpython_baseline] 3.39 us +- 0.05 us -> [benchmark_cpython_no_union_v2] 3.44 us +- 0.06 us: 1.02x slower
deepcopy_memo: Mean +- std dev: [benchmark_cpython_baseline] 41.6 us +- 0.3 us -> [benchmark_cpython_no_union_v2] 42.2 us +- 0.4 us: 1.01x slower
deltablue: Mean +- std dev: [benchmark_cpython_baseline] 3.49 ms +- 0.04 ms -> [benchmark_cpython_no_union_v2] 3.46 ms +- 0.06 ms: 1.01x faster
float: Mean +- std dev: [benchmark_cpython_baseline] 86.0 ms +- 1.0 ms -> [benchmark_cpython_no_union_v2] 84.3 ms +- 1.0 ms: 1.02x faster
create_gc_cycles: Mean +- std dev: [benchmark_cpython_baseline] 1.54 ms +- 0.01 ms -> [benchmark_cpython_no_union_v2] 1.48 ms +- 0.02 ms: 1.05x faster
gc_traversal: Mean +- std dev: [benchmark_cpython_baseline] 3.72 ms +- 0.01 ms -> [benchmark_cpython_no_union_v2] 3.93 ms +- 0.02 ms: 1.06x slower
genshi_xml: Mean +- std dev: [benchmark_cpython_baseline] 52.1 ms +- 0.5 ms -> [benchmark_cpython_no_union_v2] 53.0 ms +- 0.6 ms: 1.02x slower
go: Mean +- std dev: [benchmark_cpython_baseline] 141 ms +- 1 ms -> [benchmark_cpython_no_union_v2] 144 ms +- 1 ms: 1.02x slower
hexiom: Mean +- std dev: [benchmark_cpython_baseline] 6.37 ms +- 0.02 ms -> [benchmark_cpython_no_union_v2] 6.51 ms +- 0.02 ms: 1.02x slower
json_dumps: Mean +- std dev: [benchmark_cpython_baseline] 10.6 ms +- 0.1 ms -> [benchmark_cpython_no_union_v2] 10.5 ms +- 0.1 ms: 1.01x faster
json_loads: Mean +- std dev: [benchmark_cpython_baseline] 28.3 us +- 0.2 us -> [benchmark_cpython_no_union_v2] 27.7 us +- 0.2 us: 1.02x faster
logging_format: Mean +- std dev: [benchmark_cpython_baseline] 7.49 us +- 0.11 us -> [benchmark_cpython_no_union_v2] 7.45 us +- 0.09 us: 1.01x faster
logging_silent: Mean +- std dev: [benchmark_cpython_baseline] 109 ns +- 0 ns -> [benchmark_cpython_no_union_v2] 111 ns +- 1 ns: 1.02x slower
logging_simple: Mean +- std dev: [benchmark_cpython_baseline] 6.85 us +- 0.07 us -> [benchmark_cpython_no_union_v2] 6.89 us +- 0.09 us: 1.01x slower
mako: Mean +- std dev: [benchmark_cpython_baseline] 11.8 ms +- 0.0 ms -> [benchmark_cpython_no_union_v2] 11.7 ms +- 0.0 ms: 1.01x faster
nbody: Mean +- std dev: [benchmark_cpython_baseline] 93.9 ms +- 1.3 ms -> [benchmark_cpython_no_union_v2] 96.3 ms +- 1.8 ms: 1.03x slower
nqueens: Mean +- std dev: [benchmark_cpython_baseline] 85.9 ms +- 1.2 ms -> [benchmark_cpython_no_union_v2] 85.4 ms +- 0.7 ms: 1.01x faster
pathlib: Mean +- std dev: [benchmark_cpython_baseline] 22.9 ms +- 0.4 ms -> [benchmark_cpython_no_union_v2] 22.6 ms +- 0.3 ms: 1.01x faster
pickle: Mean +- std dev: [benchmark_cpython_baseline] 11.5 us +- 0.1 us -> [benchmark_cpython_no_union_v2] 11.4 us +- 0.1 us: 1.01x faster
pickle_dict: Mean +- std dev: [benchmark_cpython_baseline] 33.3 us +- 0.1 us -> [benchmark_cpython_no_union_v2] 34.1 us +- 0.1 us: 1.02x slower
pickle_list: Mean +- std dev: [benchmark_cpython_baseline] 4.54 us +- 0.04 us -> [benchmark_cpython_no_union_v2] 4.90 us +- 0.04 us: 1.08x slower
pickle_pure_python: Mean +- std dev: [benchmark_cpython_baseline] 324 us +- 2 us -> [benchmark_cpython_no_union_v2] 328 us +- 2 us: 1.01x slower
pidigits: Mean +- std dev: [benchmark_cpython_baseline] 201 ms +- 0 ms -> [benchmark_cpython_no_union_v2] 214 ms +- 0 ms: 1.06x slower
pprint_safe_repr: Mean +- std dev: [benchmark_cpython_baseline] 787 ms +- 5 ms -> [benchmark_cpython_no_union_v2] 802 ms +- 6 ms: 1.02x slower
pprint_pformat: Mean +- std dev: [benchmark_cpython_baseline] 1.61 sec +- 0.02 sec -> [benchmark_cpython_no_union_v2] 1.63 sec +- 0.01 sec: 1.01x slower
pyflate: Mean +- std dev: [benchmark_cpython_baseline] 457 ms +- 4 ms -> [benchmark_cpython_no_union_v2] 472 ms +- 5 ms: 1.03x slower
raytrace: Mean +- std dev: [benchmark_cpython_baseline] 311 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 321 ms +- 1 ms: 1.03x slower
regex_compile: Mean +- std dev: [benchmark_cpython_baseline] 144 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 145 ms +- 1 ms: 1.01x slower
regex_effbot: Mean +- std dev: [benchmark_cpython_baseline] 3.84 ms +- 0.02 ms -> [benchmark_cpython_no_union_v2] 3.92 ms +- 0.02 ms: 1.02x slower
regex_v8: Mean +- std dev: [benchmark_cpython_baseline] 22.7 ms +- 0.1 ms -> [benchmark_cpython_no_union_v2] 23.0 ms +- 0.1 ms: 1.01x slower
richards: Mean +- std dev: [benchmark_cpython_baseline] 47.0 ms +- 0.8 ms -> [benchmark_cpython_no_union_v2] 48.1 ms +- 1.2 ms: 1.02x slower
richards_super: Mean +- std dev: [benchmark_cpython_baseline] 52.9 ms +- 1.1 ms -> [benchmark_cpython_no_union_v2] 54.7 ms +- 1.0 ms: 1.03x slower
scimark_lu: Mean +- std dev: [benchmark_cpython_baseline] 119 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 122 ms +- 3 ms: 1.03x slower
scimark_monte_carlo: Mean +- std dev: [benchmark_cpython_baseline] 74.3 ms +- 1.0 ms -> [benchmark_cpython_no_union_v2] 74.9 ms +- 0.5 ms: 1.01x slower
scimark_sor: Mean +- std dev: [benchmark_cpython_baseline] 129 ms +- 4 ms -> [benchmark_cpython_no_union_v2] 132 ms +- 3 ms: 1.02x slower
spectral_norm: Mean +- std dev: [benchmark_cpython_baseline] 110 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 114 ms +- 0 ms: 1.03x slower
sqlglot_parse: Mean +- std dev: [benchmark_cpython_baseline] 1.38 ms +- 0.02 ms -> [benchmark_cpython_no_union_v2] 1.39 ms +- 0.01 ms: 1.01x slower
sqlglot_transpile: Mean +- std dev: [benchmark_cpython_baseline] 1.71 ms +- 0.03 ms -> [benchmark_cpython_no_union_v2] 1.73 ms +- 0.03 ms: 1.01x slower
sqlglot_optimize: Mean +- std dev: [benchmark_cpython_baseline] 55.4 ms +- 0.4 ms -> [benchmark_cpython_no_union_v2] 56.0 ms +- 0.4 ms: 1.01x slower
sqlglot_normalize: Mean +- std dev: [benchmark_cpython_baseline] 112 ms +- 1 ms -> [benchmark_cpython_no_union_v2] 113 ms +- 1 ms: 1.01x slower
sqlite_synth: Mean +- std dev: [benchmark_cpython_baseline] 2.78 us +- 0.05 us -> [benchmark_cpython_no_union_v2] 2.81 us +- 0.07 us: 1.01x slower
sympy_expand: Mean +- std dev: [benchmark_cpython_baseline] 465 ms +- 9 ms -> [benchmark_cpython_no_union_v2] 468 ms +- 3 ms: 1.01x slower
sympy_integrate: Mean +- std dev: [benchmark_cpython_baseline] 21.4 ms +- 0.2 ms -> [benchmark_cpython_no_union_v2] 21.6 ms +- 0.1 ms: 1.01x slower
sympy_sum: Mean +- std dev: [benchmark_cpython_baseline] 168 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 169 ms +- 2 ms: 1.01x slower
sympy_str: Mean +- std dev: [benchmark_cpython_baseline] 290 ms +- 3 ms -> [benchmark_cpython_no_union_v2] 294 ms +- 3 ms: 1.02x slower
typing_runtime_protocols: Mean +- std dev: [benchmark_cpython_baseline] 150 us +- 2 us -> [benchmark_cpython_no_union_v2] 148 us +- 1 us: 1.01x faster
unpack_sequence: Mean +- std dev: [benchmark_cpython_baseline] 51.5 ns +- 0.3 ns -> [benchmark_cpython_no_union_v2] 54.9 ns +- 0.2 ns: 1.07x slower
unpickle_list: Mean +- std dev: [benchmark_cpython_baseline] 5.01 us +- 0.07 us -> [benchmark_cpython_no_union_v2] 5.07 us +- 0.11 us: 1.01x slower
unpickle_pure_python: Mean +- std dev: [benchmark_cpython_baseline] 230 us +- 2 us -> [benchmark_cpython_no_union_v2] 234 us +- 1 us: 1.02x slower
xml_etree_parse: Mean +- std dev: [benchmark_cpython_baseline] 153 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 152 ms +- 3 ms: 1.01x faster
xml_etree_iterparse: Mean +- std dev: [benchmark_cpython_baseline] 107 ms +- 2 ms -> [benchmark_cpython_no_union_v2] 105 ms +- 1 ms: 1.01x faster
xml_etree_generate: Mean +- std dev: [benchmark_cpython_baseline] 91.2 ms +- 0.5 ms -> [benchmark_cpython_no_union_v2] 92.6 ms +- 0.5 ms: 1.01x slower
xml_etree_process: Mean +- std dev: [benchmark_cpython_baseline] 62.3 ms +- 0.5 ms -> [benchmark_cpython_no_union_v2] 63.8 ms +- 0.6 ms: 1.02x slower

Benchmark hidden because not significant (27): async_generators, async_tree_none, async_tree_cpu_io_mixed, async_tree_eager_cpu_io_mixed, async_tree_eager_io, async_tree_io, async_tree_memoization, chameleon, comprehensions, bench_thread_pool, docutils, dulwich_log, fannkuch, generators, genshi_text, html5lib, mdp, meteor_contest, python_startup, python_startup_no_site, regex_dna, scimark_fft, scimark_sparse_mat_mult, telco, tomli_loads, tornado_http, unpickle

Geometric mean: 1.01x slower

@gpshead
Member

gpshead commented Jun 4, 2023

Per https://peps.python.org/pep-0007/#c-dialect "Python 3.11 and later use C11 without optional features".
Anonymous Structs and Unions are part of C11.

@gpshead so does that mean that PEP007 guarantees that it's ok to have anonymous unions in Python 3.11? Hence making the original issue invalid now?

That is my interpretation. This seems to tie in with another discussion on https://discuss.python.org/t/requiring-compilers-c11-standard-mode-to-build-cpython/26481.


Regardless of the primary issue about the union being modern, I do think the point about SIZEOF_VOID_P > 4 being better written as SIZEOF_VOID_P == 8 is valid. Just make an #error if compiling somewhere with a > 8 pointer size. I'm not aware of any such systems but our code isn't ready to run on whatever they are.
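
One way such a guard could look is sketched below. The macro and function names are hypothetical, and the pointer size is derived from UINTPTR_MAX as an assumption, since the configure-time SIZEOF_VOID_P macro is not available outside a Python build.

```c
#include <assert.h>
#include <stdint.h>

/* Choose the refcount layout from the pointer width, and refuse to
   compile for any width the code was not written for. */
#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
#  define DEMO_SPLIT_REFCNT 1   /* 8-byte pointers: split 64-bit refcount */
#elif UINTPTR_MAX == 0xFFFFFFFFu
#  define DEMO_SPLIT_REFCNT 0   /* 4-byte pointers: plain refcount */
#else
#  error "demo: refcount layout is not defined for this pointer size"
#endif

int demo_uses_split_refcnt(void)
{
    /* Sanity check that the macro test matched the real pointer size. */
    assert(sizeof(void *) == 4 || sizeof(void *) == 8);
    return DEMO_SPLIT_REFCNT;
}
```

The point of the explicit #error branch is that an unexpected platform fails at compile time instead of silently taking the 64-bit path.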

@gpshead
Member

gpshead commented Jun 5, 2023

Is there a reason the union has to be anonymous? Why can't we just give it a name?

@encukou
Member

encukou commented Jun 5, 2023

Why can't we just give it a name?

That would mean all uses of obj->ob_refcnt would change to obj->just_a_name.refcnt.

Anonymous unions are a great tool for keeping API compatibility around changes like these. I wish we could use them :)
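
The compatibility argument can be seen in a small C11 sketch. The struct and function names are hypothetical, and the layouts are simplified stand-ins rather than the real PyObject; only the member names ob_refcnt and ob_refcnt_split come from the discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Anonymous union (C11): members are reached as if they were struct fields. */
typedef struct {
    union {
        int64_t ob_refcnt;
        uint32_t ob_refcnt_split[2];
    };
    const char *name;
} demo_anon;

/* Named union: every existing access would need an extra ".rc". */
typedef struct {
    union {
        int64_t ob_refcnt;
        uint32_t ob_refcnt_split[2];
    } rc;
    const char *name;
} demo_named;

int64_t get_anon(const demo_anon *op)
{
    assert(op != NULL);
    return op->ob_refcnt;       /* unchanged spelling for existing code */
}

int64_t get_named(const demo_named *op)
{
    assert(op != NULL);
    return op->rc.ob_refcnt;    /* source-incompatible spelling */
}
```

Both structs have the same size and layout; the only difference is how the member is spelled at each use site.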

@eduardo-elizondo
Contributor

Exactly as Petr said, it's just to keep the API compatibility so that we can still access ob_refcnt directly.

@gpshead
Member

gpshead commented Jun 5, 2023

okay, that's what I assumed given the public header file.

@vstinner
Member

vstinner commented Jun 6, 2023

Anonymous Structs and Unions are part of C11.

While building Python requires a C11 compiler, we are trying to keep the Python C API compatible with ISO C89 whenever possible. I would prefer to upgrade to C99/C11, but some programs explicitly require ISO C89 with warnings on C99 variable declarations in the middle of functions (pedantic mode). I don't know the rationale for treating these warnings as errors, but so far, it was cheap to keep ISO C89 compatibility.

@vstinner
Member

vstinner commented Jun 6, 2023

Exactly as Petr said, it's just to keep the API compatibility so that we can still access ob_refcnt directly.

While it's not officially deprecated, I would strongly suggest using the Py_REFCNT() and Py_SET_REFCNT() functions instead. See issue #83754.

@vstinner
Member

I wrote PR #105767 to fix the C API for C99 and older:

  • On C11 and newer and on C++, PyObject.ob_refcnt is a union, and Py_INCREF() and Py_DECREF() are inlined.
  • On C99 and older, the PyObject.ob_refcnt type is Py_ssize_t, and Py_INCREF() and Py_DECREF() are implemented as opaque functions.

Well, no longer inlining code on C99 and older can have an impact on performance which should be measured. For me, the first step is to fix the regression, and then design the most efficient implementation depending on the C or C++ standard, compiler flags, etc.
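
The general shape of such a split is sketched below. This is not the code from the PR: every name is hypothetical, the object is reduced to a bare counter, and the opaque function is defined in the same file only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t refcnt;
} demo_object;

/* Out-of-line version; in a real library this would live in a .c file
   so that the header exposes only its declaration. */
void demo_incref_opaque(demo_object *op)
{
    assert(op != NULL);
    op->refcnt++;
}

#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
/* C11 and newer, or C++: the fast path can be inlined in the header. */
static inline void demo_incref_inline(demo_object *op)
{
    op->refcnt++;
}
#  define DEMO_INCREF(op) demo_incref_inline(op)
#else
/* C99 and older: fall back to the opaque function call. */
#  define DEMO_INCREF(op) demo_incref_opaque(op)
#endif

/* Helper that exercises whichever path was selected. */
int64_t demo_incref_value(int64_t start)
{
    demo_object o;
    o.refcnt = start;
    DEMO_INCREF(&o);
    return o.refcnt;
}
```

Callers see the same DEMO_INCREF spelling in both modes; what changes is whether each increment costs a function call, which is the performance concern raised above.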

@h-vetinari

some programs explicitly require ISO C89 with warnings on C99 variable declarations in the middle of functions (pedantic mode). I don't know the rationale for treating these warnings as errors, but so far, it was cheap to keep ISO C89 compatibility.

I think something like -pedantic deserves at least a rationale, or to be thrown out as a requirement.

@vstinner
Member

vstinner commented Jun 14, 2023

no longer inlining code on C99 and older can have an impact on performance which should be measured.

I'm pretty sure that it can be avoided when building C extensions with GCC and clang, which support __extension__ (no warning on the anonymous union, even with -pedantic). But for now, I prefer to focus on the C11 vs C99 code path in my PR. Well, see my PR #105767 where I added some comments.

@encukou
Member

encukou commented Jun 14, 2023

I think something like -pedantic deserves at least a rationale,

It approximates the least common denominator of compilers we don't use in CI.
If we say we're compatible with some standard, we should test that with -pedantic.

@vstinner
Member

vstinner commented Jun 14, 2023

@zooba:

I think we need a test that basically includes the headers and spits out warnings or errors.

test_cppext checks for C++ compiler warnings and treats them as errors. It not only includes Python.h but also uses the most common APIs.

@vstinner
Member

@eduardo-elizondo:

This causes incorrect behavior since we can have Extension code that has increased the reference count beyond the maximum value and this implicit conversion would cause the reference count of the extension to be lost.

If it's not the case yet, we should have tests on these corner cases so they are no longer "undefined" behavior, but "tested" behavior 😁

@vstinner
Member

@encukou:

That would mean all uses of obj->ob_refcnt would change to obj->just_a_name.refcnt.

Well, I would strongly suggest that C extensions move away from directly accessing PyObject members and use the Py_REFCNT() and Py_SET_REFCNT() functions instead. It would be a Python 3.12 incompatible change that we should also consider.

@ndparker
Contributor

FWIW, I'm using -pedantic (among other flags, including -Werror) for my extension development builds just to see what I'm doing wrong. Right now I'm basically missing a diagnostic tool right there for python 3.12 testing.

@encukou
Member

encukou commented Jul 13, 2023

If you use -pedantic, you should pair it with the appropriate C standard. That's -std=c11 for Python 3.11+.

@ndparker
Contributor

Well, no. I was specifically referring to writing my extensions. I believe it has been highlighted in this issue that C11 should not be required for those.

vstinner added a commit to vstinner/cpython that referenced this issue Jul 25, 2023
Anonymous union is new in C11. To prevent compiler warning, use Clang
and GCC extension on C99 and older.
@vstinner
Member

I wrote PR #105767 to implement Py_INCREF() and Py_DECREF() as opaque function calls on C99 and older. I'm worried that it could have a significant negative impact on performance, so I abandoned it. Instead, I am trying to get rid of the compiler warning: PR #107232 (use GCC and clang __extension__).

vstinner added a commit that referenced this issue Jul 25, 2023
Anonymous union is new in C11. To prevent compiler warning
when using -pedantic compiler option, use Clang and GCC
extension on C99 and older.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2023
…-107232)

Anonymous union is new in C11. To prevent compiler warning
when using -pedantic compiler option, use Clang and GCC
extension on C99 and older.
(cherry picked from commit 6261585)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit to vstinner/cpython that referenced this issue Jul 25, 2023
Use pragram to ignore the MSCV compiler warning on the PyObject
nameless union.
vstinner added a commit to vstinner/cpython that referenced this issue Jul 25, 2023
Use pragma to ignore the MSCV compiler warning on the PyObject
nameless union.
vstinner added a commit that referenced this issue Jul 25, 2023
…) (#107236)

gh-105059: Use GCC/clang extension for PyObject union (GH-107232)

Anonymous union is new in C11. To prevent compiler warning
when using -pedantic compiler option, use Clang and GCC
extension on C99 and older.
(cherry picked from commit 6261585)

Co-authored-by: Victor Stinner <vstinner@python.org>
vstinner added a commit that referenced this issue Jul 25, 2023
Use pragma to ignore the MSCV compiler warning on the PyObject
nameless union.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2023
…H-107239)

Use pragma to ignore the MSCV compiler warning on the PyObject
nameless union.
(cherry picked from commit 1c8fe9b)

Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Member

I close the issue. Thanks everybody for your very useful feedback! I fixed the known compiler warnings (MSVC, GCC, clang).

Thanks @encukou for your reproducer, it was very useful on Linux and Windows.


C compilers try to implement the latest standards, but they also have many extensions which may or may not be enabled by default. GCC is known for enabling many extensions by default: they are only diagnosed with -pedantic. It took many years until the Linux kernel could be built with LLVM clang: clang needed time to implement some GCC extensions, and the Linux kernel had to be updated to avoid some others.

While anonymous union is now standard in C11, the feature existed in C compilers before it was standardized.

In a perfect world, we would strictly respect the C standard and maximize C backward compatibility of the Python C API by trying to stay compatible with C89. In practice, we use static inline, which was standardized not in C89 but in C99.

There is a simple fix to maximize backward compatibility: only expose opaque function void Py_INCREF(PyObject *op) in the C API, and then use whatever we want in the hidden implementation. That's exactly what I did in the limited C API version 3.12 and newer. But it has a significant negative impact on performance since Py_INCREF/Py_DECREF are called frequently in C extensions.

For Python 3.12, we are trying to implement immortal objects with the minimum performance overhead. The chosen implementation is to use an anonymous union on 64-bit platforms (if sizeof(void*) is larger than 4 bytes).

What I did is fix the annoying side effect: compiler warnings when the most pedantic warning mode is enabled, gcc -Wall -Wextra -pedantic (GCC) and cl /W4 (MSVC).

  • For GCC and clang, I used __extension__ to make the warning quiet: 6261585
  • For MSVC, I used __pragma to make the warning quiet: 1c8fe9b

These changes are being backported to Python 3.12.
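
As a rough illustration of the two techniques listed above, the sketch below silences the diagnostics around an anonymous union. It is modelled on the commit descriptions rather than copied from the header: the DEMO_ names and the reduced struct are hypothetical, while __extension__ (GCC/clang) and __pragma with warning C4201 (MSVC) are the mechanisms named in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* On C99 and older, GCC and clang accept anonymous unions as an extension;
   __extension__ keeps -pedantic from reporting it. */
#if (defined(__GNUC__) || defined(__clang__)) \
    && !(defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L)
#  define DEMO_EXTENSION __extension__
#else
#  define DEMO_EXTENSION
#endif

typedef struct {
#ifdef _MSC_VER
    /* Ignore C4201: "nonstandard extension used: nameless struct/union" */
    __pragma(warning(push))
    __pragma(warning(disable: 4201))
#endif
    DEMO_EXTENSION union {
        int64_t ob_refcnt;
        uint32_t ob_refcnt_split[2];
    };
#ifdef _MSC_VER
    __pragma(warning(pop))
#endif
} demo_object;

/* Store and reload through the anonymous member. */
int64_t demo_roundtrip(int64_t value)
{
    demo_object o;
    assert(value >= 0);
    o.ob_refcnt = value;
    return o.ob_refcnt;
}
```

The struct layout and the generated code are unaffected; only the diagnostics change, which is why this approach avoids the performance cost of an opaque function call.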


Maybe there is another efficient implementation of Py_INCREF/Py_DECREF which can be inlined and be compatible with C99 or even C89. But apparently, @eduardo-elizondo spent significant time on designing the current implementation and did not find a better silver bullet.

Well, at least now we know how to check for pedantic compiler warnings :-) If someone has a new clever implementation idea, please open a new issue!


I fail to reproduce the issue with C++. Program in C++: (...)

I didn't get any warning with g++. If someone gets a new compiler warning with Python 3.12 Py_INCREF/Py_DECREF, please open a separate issue with details on how to reproduce it.

vstinner added a commit that referenced this issue Jul 25, 2023
) (#107248)

gh-105059: Fix MSCV compiler warning on PyObject union (GH-107239)

Use pragma to ignore the MSCV compiler warning on the PyObject
nameless union.
(cherry picked from commit 1c8fe9b)

Co-authored-by: Victor Stinner <vstinner@python.org>
@vstinner
Member

By the way, I fixed another class of compiler warnings when building Python.h with MSVC cl /W4. I implemented Py_UNUSED() for MSVC in the main branch: PR #107250. It fixes 2 warnings about unused arguments in unicodeobject.h.

jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 27, 2023
…07232)

Anonymous union is new in C11. To prevent compiler warning
when using -pedantic compiler option, use Clang and GCC
extension on C99 and older.
jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 27, 2023
…107239)

Use pragma to ignore the MSCV compiler warning on the PyObject
nameless union.
9 participants