Skip to content

Add per-object locking for thread safety, sub-interpreter and free-threaded Python support#154

Merged
ifduyue merged 1 commit into
masterfrom
multi-threads
May 30, 2026
Merged

Add per-object locking for thread safety, sub-interpreter and free-threaded Python support#154
ifduyue merged 1 commit into
masterfrom
multi-threads

Conversation

@ifduyue
Copy link
Copy Markdown
Owner

@ifduyue ifduyue commented May 28, 2026

Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL.

Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active.

Key changes:

  • All state accessors (update, digest, hexdigest, intdigest, copy, reset, init) are now lock-guarded
  • update() releases GIL before acquiring lock for large data to avoid ABBA deadlock, matching hashlib pattern
  • One-shot hash functions (xxh32(), xxh3_64(), etc.) release GIL for data >=64KB
  • Switch from static PyTypeObject to heap types via PyType_FromModuleAndSpec, required for sub-interpreter safety
  • Declare Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (3.12+) and Py_MOD_GIL_NOT_USED (3.13+) in module slots
  • Deduplicate per-type init/update logic via XXHASH_INIT and XXHASH_DO_UPDATE macros
  • Optimize _get_buffer_or_str with a fast-path for bytes objects
  • Replace intermediate stack buffers in digest/hexdigest with direct writes into pre-allocated PyBytes/PyUnicode
  • Add comprehensive thread-safety stress tests (test_thread_safety.py)
  • Add sub-interpreter compatibility tests (test_interpreters.py)

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety enhancements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal state access when the GIL is released. It also adds a comprehensive test suite in tests/test_thread_safety.py to verify concurrent operations. The review feedback highlights critical improvements for the test suite, including correctly handling Windows process crashes and timeouts in subprocesses, failing tests on any non-zero exit code, and making the test repetitions configurable via an environment variable to optimize CI runtimes.

Comment on lines +432 to +460
def _run_many(self, code, scenario_name):
crashed = 0
errors = []
for i in range(self.REPETITIONS):
rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
if rc == -signal.SIGSEGV:
crashed += 1
errors.append(f"run {i}: SEGFAULT")
elif rc == -signal.SIGABRT:
crashed += 1
errors.append(f"run {i}: SIGABRT")
elif rc != 0:
# Non-zero exit – something went wrong
errors.append(f"run {i}: exit {rc} {err[:200]}")
# else: OK, no news is good news

if crashed:
self.fail(
f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed "
f"due to data race: {'; '.join(errors[:5])}"
)
if errors:
# Non-crash errors (timeouts, exceptions) – these might still
# indicate data-corruption issues even without a segfault.
print(
f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} "
f"non-crash failures (may indicate corruption): "
f"{'; '.join(errors[:3])}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two major issues with the current implementation of _run_many:

  1. On Windows, process crashes (such as Access Violations / SIGSEGV) return positive exit codes (like 3221225477 / 0xC0000005) rather than negative signal numbers. These are not caught by rc == -signal.SIGSEGV, meaning Windows crashes will only print a warning and the test will still pass!
  2. Any other non-zero exit code (such as timeouts or unexpected exceptions) only prints a warning instead of failing the test. Any non-zero exit code or timeout represents a failure in thread-safety (e.g., deadlock, corruption, or crash) and must fail the test.

We should fail the test on any non-zero exit code and correctly identify Windows crashes.

Suggested change
def _run_many(self, code, scenario_name):
crashed = 0
errors = []
for i in range(self.REPETITIONS):
rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
if rc == -signal.SIGSEGV:
crashed += 1
errors.append(f"run {i}: SEGFAULT")
elif rc == -signal.SIGABRT:
crashed += 1
errors.append(f"run {i}: SIGABRT")
elif rc != 0:
# Non-zero exit – something went wrong
errors.append(f"run {i}: exit {rc} {err[:200]}")
# else: OK, no news is good news
if crashed:
self.fail(
f"{scenario_name}: {crashed}/{self.REPETITIONS} runs crashed "
f"due to data race: {'; '.join(errors[:5])}"
)
if errors:
# Non-crash errors (timeouts, exceptions) – these might still
# indicate data-corruption issues even without a segfault.
print(
f"WARNING: {scenario_name}: {len(errors)}/{self.REPETITIONS} "
f"non-crash failures (may indicate corruption): "
f"{'; '.join(errors[:3])}"
)
def _run_many(self, code, scenario_name):
errors = []
for i in range(self.REPETITIONS):
rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
if rc != 0:
status = f"exit {rc}"
if rc == -signal.SIGSEGV or rc == 3221225477:
status = "SEGFAULT/Access Violation"
elif rc == -signal.SIGABRT:
status = "SIGABRT"
errors.append(f"run {i}: {status}\nSTDERR: {err[:200]}")
if errors:
self.fail(
f"{scenario_name}: {len(errors)}/{self.REPETITIONS} runs failed or crashed: "
f"{'; '.join(errors[:5])}"
)

Comment thread tests/test_thread_safety.py Outdated
Comment on lines +30 to +38
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _run_in_subprocess function does not handle subprocess.TimeoutExpired. If a timeout occurs, it will raise an unhandled exception and crash the entire test suite immediately, rather than reporting it as a test failure. Catching TimeoutExpired and returning a non-zero exit code (e.g., -1) ensures that timeouts are gracefully handled and reported as test failures.

Suggested change
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
try:
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
except subprocess.TimeoutExpired as e:
return -1, e.stdout or "", (e.stderr or "") + "\nTIMEOUT EXPIRED"

Comment thread tests/test_thread_safety.py Outdated
crashes, deadlocks, or unexpected exceptions occur.
"""

REPETITIONS = 20
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding REPETITIONS = 20 can make the test suite very slow, especially in CI environments where resources are constrained. Making it configurable via an environment variable (defaulting to a lower value like 5 or 10 for standard CI runs) allows developers to run aggressive stress tests locally while keeping CI fast and efficient.

Suggested change
REPETITIONS = 20
REPETITIONS = int(os.environ.get("XXHASH_TEST_REPETITIONS", 5))

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented May 28, 2026

Merging this PR will not alter performance

✅ 54 untouched benchmarks
🆕 94 new benchmarks
⏩ 66 skipped benchmarks1

Performance Changes

Benchmark BASE HEAD Efficiency
🆕 test_xxh32_ctor_seed_kw N/A 82.8 µs N/A
🆕 test_xxh32_stream_update_digest_64kb N/A 187.6 µs N/A
🆕 test_xxh32_stream_intdigest_5b N/A 85.6 µs N/A
🆕 test_xxh3_128_digest_64kb N/A 129.9 µs N/A
🆕 test_xxh32_stream_digest_2mb N/A 2.2 ms N/A
🆕 test_xxh3_128_ctor_empty N/A 81.9 µs N/A
🆕 test_xxh32_stream_update_digest_2mb N/A 3.3 ms N/A
🆕 test_xxh32_digest_5b N/A 81.7 µs N/A
🆕 test_xxh32_stream_update_hexdigest_5b N/A 85.8 µs N/A
🆕 test_xxh32_stream_update_hexdigest_64kb N/A 187.9 µs N/A
🆕 test_xxh3_128_ctor_seed_kw N/A 84 µs N/A
🆕 test_xxh32_stream_intdigest_64kb N/A 149.9 µs N/A
🆕 test_xxh32_stream_update_digest_5b N/A 85.2 µs N/A
🆕 test_xxh3_128_hexdigest_5b N/A 82.3 µs N/A
🆕 test_xxh32_stream_update_hexdigest_2mb N/A 3.3 ms N/A
🆕 test_xxh3_128_digest_5b N/A 81.7 µs N/A
🆕 test_xxh32_digest_64kb N/A 138.8 µs N/A
🆕 test_xxh32_stream_update_intdigest_2mb N/A 3.3 ms N/A
🆕 test_xxh32_stream_update_intdigest_64kb N/A 187.8 µs N/A
🆕 test_xxh3_128_hexdigest_64kb N/A 130.2 µs N/A
... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing multi-threads (0c1c69f) with master (e3eb2c0)

Open in CodSpeed

Footnotes

  1. 66 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust thread-safety mechanisms to the _xxhash C extension by implementing per-object locking using PyMutex (for Python 3.13+) and PyThread_type_lock (for Python 3.9-3.12). It also releases the GIL for hashing large data blocks, adds sub-interpreter support, and introduces a comprehensive suite of thread-safety tests. The review feedback suggests improving the test suite's robustness by gracefully handling subprocess timeouts inside the test runner to explicitly report potential deadlocks instead of raising raw tracebacks.

Comment thread tests/test_thread_safety.py Outdated
Comment on lines +30 to +38
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If a deadlock or hang occurs during the thread-safety tests, subprocess.run will raise subprocess.TimeoutExpired after the timeout. Handling this exception gracefully inside _run_in_subprocess and returning a specific error code (e.g., -999) allows the test suite to report a clear diagnostic message (like 'TIMEOUT (possible deadlock)') instead of failing with a raw traceback.

Suggested change
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
def _run_in_subprocess(code: str, timeout: float = 60.0):
"""Run *code* in a subprocess and return (returncode, stdout, stderr)."""
try:
proc = subprocess.run(
[sys.executable, "-c", code],
capture_output=True,
text=True,
timeout=timeout,
)
return proc.returncode, proc.stdout, proc.stderr
except subprocess.TimeoutExpired:
return -999, "", "TIMEOUT"

Comment on lines +435 to +446
for i in range(self.REPETITIONS):
rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
if rc == -signal.SIGSEGV:
crashed += 1
errors.append(f"run {i}: SEGFAULT")
elif rc == -signal.SIGABRT:
crashed += 1
errors.append(f"run {i}: SIGABRT")
elif rc != 0:
# Non-zero exit – something went wrong
errors.append(f"run {i}: exit {rc} {err[:200]}")
# else: OK, no news is good news
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Handle the custom timeout return code (-999) from _run_in_subprocess to explicitly report a possible deadlock/timeout failure in the test summary.

        for i in range(self.REPETITIONS):
            rc, out, err = _run_in_subprocess(code, timeout=self.TIMEOUT)
            if rc == -signal.SIGSEGV:
                crashed += 1
                errors.append(f"run {i}: SEGFAULT")
            elif rc == -signal.SIGABRT:
                crashed += 1
                errors.append(f"run {i}: SIGABRT")
            elif rc == -999:
                crashed += 1
                errors.append(f"run {i}: TIMEOUT (possible deadlock)")
            elif rc != 0:
                # Non-zero exit – something went wrong
                errors.append(f"run {i}: exit {rc} {err[:200]}")
            # else: OK, no news is good news

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) and releasing the GIL for large updates. It also adds comprehensive thread-safety tests. A high-severity issue was identified in the Python 3.13+ implementation where calling PyMutex_Lock while holding the GIL can lead to interpreter-wide GIL starvation. It is recommended to release the GIL while waiting to acquire the mutex.

Comment thread src/_xxhash.c
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces per-object locking (using PyThread_type_lock) to the xxhash C extension to ensure thread safety across all hash types, preventing data races during concurrent access. The changes include the addition of a new locking macro, updates to the initialization logic for hash objects, and the inclusion of comprehensive thread-safety tests. Additionally, build configuration files were updated and the version number was incremented. A high-severity review comment was raised regarding the potential for crashes when using static type objects with sub-interpreter support, suggesting a refactor to heap-allocated types.

Comment thread src/_xxhash.c
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to prevent data races during concurrent hash operations. It also migrates static types to heap types for sub-interpreter safety and adds comprehensive thread-safety and benchmark tests. The review feedback highlights several potential data races where the per-object lock is not acquired: specifically, within the init macro which can be called multiple times on a shared object, and when copying self->seed to a new object during copy operations without holding self's lock.

Comment thread src/_xxhash.c
Comment thread src/_xxhash.c Outdated
Comment on lines +949 to +952
p->seed = self->seed;
XXHASH_LOCK_ACQUIRE(self);
XXH32_copyState(p->xxhash_state, self->xxhash_state);
XXHASH_LOCK_RELEASE(self);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

    XXHASH_LOCK_ACQUIRE(self);
    p->seed = self->seed;
    XXH32_copyState(p->xxhash_state, self->xxhash_state);
    XXHASH_LOCK_RELEASE(self);

Comment thread src/_xxhash.c Outdated
Comment on lines +1300 to +1303
p->seed = self->seed;
XXHASH_LOCK_ACQUIRE(self);
XXH64_copyState(p->xxhash_state, self->xxhash_state);
XXHASH_LOCK_RELEASE(self);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

    XXHASH_LOCK_ACQUIRE(self);
    p->seed = self->seed;
    XXH64_copyState(p->xxhash_state, self->xxhash_state);
    XXHASH_LOCK_RELEASE(self);

Comment thread src/_xxhash.c Outdated
Comment on lines 1651 to 1653
p->seed = self->seed;
XXHASH_LOCK_ACQUIRE(self);
XXH3_copyState(p->xxhash_state, self->xxhash_state);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

    XXHASH_LOCK_ACQUIRE(self);
    p->seed = self->seed;
    XXH3_copyState(p->xxhash_state, self->xxhash_state);

Comment thread src/_xxhash.c Outdated
Comment on lines 2028 to 2030
p->seed = self->seed;
XXHASH_LOCK_ACQUIRE(self);
XXH3_copyState(p->xxhash_state, self->xxhash_state);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure full thread-safety, self->seed should be copied to p->seed while holding self's lock. Otherwise, if another thread concurrently calls __init__ on self, p->seed could read a stale or torn value.

    XXHASH_LOCK_ACQUIRE(self);
    p->seed = self->seed;
    XXH3_copyState(p->xxhash_state, self->xxhash_state);

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) to protect internal hash states during concurrent operations. It also transitions static types to heap types for sub-interpreter safety, optimizes GIL release thresholds, updates benchmarks, and adds a comprehensive thread-safety test suite. There are no review comments to address, and I have no additional feedback to provide.

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety mechanisms to the _xxhash C extension by implementing per-object locking (using PyMutex on Python 3.13+ and PyThread_type_lock on Python 3.9-3.12) and releasing the GIL for large hash updates. It also refactors the hash types to be heap-allocated for sub-interpreter safety and adds comprehensive thread-safety tests and benchmarks. The review feedback highlights critical issues regarding the use of PyObject_New instead of tp_alloc for heap types, which can cause memory corruption when subclassed. Additionally, the reviewer identified a thread-safety vulnerability where lazy lock allocation on Python 3.9-3.12 leaves small updates unprotected, and pointed out a stray, incorrectly indented statement at the end of the benchmark tests that would break execution.

Comment thread src/_xxhash.c Outdated
Comment on lines 687 to 689
if ((self = PyObject_New(PYXXH32Object, type)) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh32 in Python, the subclass type object will have a larger tp_basicsize (to accommodate __dict__, slots, etc.). PyObject_New only allocates sizeof(PYXXH32Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

    if ((self = (PYXXH32Object *)type->tp_alloc(type, 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 932 to 934
if ((p = PyObject_New(PYXXH32Object, Py_TYPE(self))) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH32Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

    if ((p = (PYXXH32Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 1137 to 1139
if ((self = PyObject_New(PYXXH64Object, type)) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

    if ((self = (PYXXH64Object *)type->tp_alloc(type, 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 1283 to 1285
if ((p = PyObject_New(PYXXH64Object, Py_TYPE(self))) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH64Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

    if ((p = (PYXXH64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 1488 to 1490
if ((self = PyObject_New(PYXXH3_64Object, type)) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_64 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_64Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

    if ((self = (PYXXH3_64Object *)type->tp_alloc(type, 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 1634 to 1636
if ((p = PyObject_New(PYXXH3_64Object, Py_TYPE(self))) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_64Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

    if ((p = (PYXXH3_64Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 1846 to 1848
if ((self = PyObject_New(PYXXH3_128Object, type)) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to allocate instances of heap types is unsafe when subclassing is supported. If a user subclasses xxh3_128 in Python, the subclass type object will have a larger tp_basicsize. PyObject_New only allocates sizeof(PYXXH3_128Object) bytes, which will lead to buffer overflows and memory corruption when subclass fields are accessed.

To support subclassing safely, always use type->tp_alloc(type, 0) to allocate instances of heap types.

    if ((self = (PYXXH3_128Object *)type->tp_alloc(type, 0)) == NULL) {
        return NULL;
    }

Comment thread src/_xxhash.c Outdated
Comment on lines 2011 to 2013
if ((p = PyObject_New(PYXXH3_128Object, Py_TYPE(self))) == NULL) {
return NULL;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using PyObject_New to copy instances of heap types is unsafe when subclassing is supported. If self is an instance of a subclass, Py_TYPE(self) will be the subclass type, which requires more memory than sizeof(PYXXH3_128Object). PyObject_New will allocate too little memory, leading to heap corruption.

To support subclassing safely, use Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0) to allocate the copy.

    if ((p = (PYXXH3_128Object *)Py_TYPE(self)->tp_alloc(Py_TYPE(self), 0)) == NULL) {
        return NULL;
    }

Comment thread tests/test_benchmark.py Outdated
Comment on lines +845 to +848
h.hexdigest()


h.hexdigest()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a stray h.hexdigest() statement at the module level (with incorrect indentation) at the end of the file. This will cause an IndentationError or NameError when the benchmark module is imported, breaking the benchmark execution.

Suggested change
h.hexdigest()
h.hexdigest()
h.hexdigest()

Comment thread src/_xxhash.c
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety improvements to the xxhash C extension by implementing per-object locking to protect internal hash states from concurrent access, alongside adding sub-interpreter support and comprehensive test suites. The review comments highlight critical issues across multiple object creation and copy functions where memory allocated by PyObject_New is not zero-initialized, potentially causing crashes during deallocation if state initialization fails. Additionally, a stray, indented h.hexdigest() call at the end of tests/test_benchmark.py needs to be removed to prevent an IndentationError or NameError.

Comment thread src/_xxhash.c
return NULL;
}

XXHASH_LOCK_INIT(self);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Memory allocated by PyObject_New is not zero-initialized. If XXH32_createState() fails and returns NULL, self->xxhash_state will contain garbage. When Py_DECREF(self) is called, the deallocator PYXXH32_dealloc will attempt to free this garbage pointer, leading to a crash or memory corruption. Initialize self->xxhash_state to NULL immediately after lock initialization to prevent this.

    XXHASH_LOCK_INIT(self);
    self->xxhash_state = NULL;

Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread tests/test_benchmark.py Outdated
h.hexdigest()


h.hexdigest()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a stray, indented h.hexdigest() call at the end of the file outside of any function. This will cause an IndentationError or NameError when the test module is imported or executed. This line should be removed.

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 28, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety to the _xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for Python 3.9-3.12) to protect internal hash states during concurrent operations. It also migrates static type definitions to heap types using PyType_FromModuleAndSpec for sub-interpreter safety, refactors argument parsing and initialization using macros, and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. I have no feedback to provide as the only review comment was incorrect; PyModule_AddType does not steal a reference to the type object, meaning the Py_DECREF calls in the original code are correct and necessary to prevent memory leaks.

@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 29, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety improvements to the _xxhash C extension by implementing per-object locking and releasing the GIL for large inputs. It also refactors static types to heap types to support sub-interpreters (PEP 489/684/689) and adds comprehensive thread-safety, sub-interpreter, and benchmark tests. However, the reviewer identified a critical performance regression: the PyType_Spec definitions for all four hash types (xxh32, xxh64, xxh3_64, and xxh3_128) are missing the Py_TPFLAGS_HAVE_VECTORCALL flag. Without this flag, CPython will ignore the assigned tp_vectorcall handler and fall back to the slower standard call path.

Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 29, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety locking mechanisms to protect internal hash states during concurrent GIL-released updates, transitions the extension to use heap types for sub-interpreter support, and adds a fast path for bytes objects. Feedback on the changes highlights that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the newly defined heap type specs (XXH32Type_spec, XXH64Type_spec, XXH3_64Type_spec, and XXH3_128Type_spec), which prevents CPython from utilizing the optimized vectorcall path and causes constructor calls to fall back to slower paths.

Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
@ifduyue
Copy link
Copy Markdown
Owner Author

ifduyue commented May 29, 2026

@gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces thread-safety to the xxhash C extension by implementing per-object locking (using PyMutex for Python 3.13+ and PyThread_type_lock for older versions) and migrates static types to heap types to support sub-interpreters safely. It also adds comprehensive thread-safety and sub-interpreter tests, alongside new benchmarks. The review feedback correctly identifies that the Py_TPFLAGS_HAVE_VECTORCALL flag is missing from the PyType_Spec definitions for all hash types (xxh32, xxh64, xxh3_64, and xxh3_128), which prevents the vectorcall optimization from being enabled and falls back to the slower standard call path.

Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
Comment thread src/_xxhash.c
@ifduyue ifduyue changed the title Add per-object locking for full thread-safety, sub-interpreter and free-threaded Python support Add per-object locking for thread safety, sub-interpreter and free-threaded Python support May 30, 2026
@ifduyue ifduyue force-pushed the multi-threads branch 2 times, most recently from 6d7989b to 74b3cbb Compare May 30, 2026 06:44
…readed Python support

Introduce a per-object lock (PyThread_type_lock on 3.9-3.12, PyMutex on 3.13+) to protect all access to xxhash_state, eliminating reliance on the GIL.

Lock is lazily allocated on first large update (>=64KB) for Python < 3.13, avoiding overhead in single-threaded small-data paths. On Python 3.13+ the lock is always active.

Key changes:
- All state accessors (update, digest, hexdigest, intdigest, copy, reset, init) are now lock-guarded
- update() releases GIL before acquiring lock for large data to avoid ABBA deadlock, matching hashlib pattern
- One-shot hash functions (xxh32(), xxh3_64(), etc.) release GIL for data >=64KB
- Switch from static PyTypeObject to heap types via PyType_FromModuleAndSpec, required for sub-interpreter safety
- Declare Py_MOD_PER_INTERPRETER_GIL_SUPPORTED (3.12+) and Py_MOD_GIL_NOT_USED (3.13+) in module slots
- Deduplicate per-type init/update logic via XXHASH_INIT and XXHASH_DO_UPDATE macros
- Optimize _get_buffer_or_str with a fast-path for bytes objects
- Replace intermediate stack buffers in digest/hexdigest with direct writes into pre-allocated PyBytes/PyUnicode
- Add comprehensive thread-safety stress tests (test_thread_safety.py)
- Add sub-interpreter compatibility tests (test_interpreters.py)
@ifduyue ifduyue merged commit 945da72 into master May 30, 2026
47 checks passed
@ifduyue ifduyue deleted the multi-threads branch May 30, 2026 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant