build(tox): migrate from pip to uv via tox-uv #6390
Codecov Results 📊✅ 13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 5.85s All tests are passing successfully. ✅ Patch coverage is 100.00%. Project has 16266 uncovered lines. Generated by Codecov Action |
f7d0dfa to 757ecf4
35deb2d to a767133
a3989a2 to 35675ae
c897101 to 9c875dc
Replace pip-backed virtualenv environments with uv using the tox-uv plugin.

- [tox] requires: swap virtualenv<20.26.3 pin for tox-uv; the virtualenv pin existed solely to prevent pip 24.1 being seeded into envs, which is irrelevant once uv manages all installs
- setenv: remove py3.14t VIRTUALENV_PIP=24.1 (virtualenv-specific, no-op with tox-uv's uv venv)
- commands: remove bare 'pip install' workaround lines; tox-uv does not seed pip into venvs so these would fail
- deps: add flask v1 compat packages (itsdangerous, markupsafe, jinja2) as factor-conditional deps to replace the removed pip install commands; the urllib3<2.0.0 boto3 pin was already present in the auto-generated deps
- CI templates updated (test_group.jinja); run scripts/generate-test-files.sh to regenerate the .github/workflows/test-integrations-*.yml files

Test matrix (envlist, Python versions, deps) is unchanged.

Note: Python 3.6 container handling is deferred; see plan canvas for the recommended approach of running tox under a modern Python host.

Co-authored-by: Neel Shah <neel.shah@sentry.io>

---

[View Session in Sentry](https://sentry.sentry.io/traces/?project=4510944073809921&query=gen_ai.conversation.id%3A%22slack%3AC02T4BB83AS%3A1779437966.628249%22)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/checkout sets safe.directory under a temporary HOME that is discarded after the step, so subsequent steps see "dubious ownership" and git fails. This makes get_default_release() return None and breaks release/session-tracking tests on the 3.6/3.7 container jobs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
27ce993 to 33b62fa
33b62fa to 030fd1b
py3.7 celery in CI hangs in test_celery_beat_cron_monitoring::test_explanation. kill_beat slept 1s then opened the pidfile; in the slower py3.7 container startup, the file didn't exist yet, the thread died silently, and beat ran forever (30-min job timeout). Poll for the pidfile up to 30s before starting the kill timer, and dump any future thread exception to stderr so the next failure surfaces a traceback instead of silently hanging. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
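The pidfile wait described in this commit could look roughly like the sketch below. It is an illustration, not the PR's actual code: the helper name `wait_for_pidfile`, the `poll_interval` parameter, and the "return None on timeout" convention are all hypothetical. Only the 30-second default and the idea of polling before the kill timer starts come from the commit message.

```python
import os
import time


def wait_for_pidfile(pidfile_path, pidfile_timeout=30.0, poll_interval=0.1):
    """Poll until the pidfile exists and holds a PID, or the timeout expires.

    Hypothetical helper: returns the PID as an int, or None on timeout.
    """
    deadline = time.monotonic() + pidfile_timeout
    while time.monotonic() < deadline:
        if os.path.exists(pidfile_path):
            with open(pidfile_path) as f:
                content = f.read().strip()
            if content:  # the file may exist briefly before the PID is written
                return int(content)
        time.sleep(poll_interval)
    return None
```

A caller would start the runtime timer only after this returns a PID, and treat `None` as a startup failure rather than opening the file blindly after a fixed sleep.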
The pidfile-race fix did not unhang py3.7 celery in CI. pytest-timeout only dumps the parent pytest process's threads — the parent is stuck in pytest-forked's waitpid, so the actual hang is somewhere inside the forked child, invisible. Schedule faulthandler.dump_traceback_later(45) in run_beat (which runs inside the forked child) and cancel it on successful beat shutdown. If beat hangs in CI again, the child's full thread dump lands in the log and tells us where to look. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
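As a rough sketch of the arm-then-cancel pattern this commit describes: the wrapper name `run_with_hang_dump` and its `file` parameter are hypothetical, and unlike the commit (which cancels only on successful beat shutdown) this version disarms the dump whenever the wrapped call returns or raises.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout=45, file=None):
    """Run fn; if it is still running after `timeout` seconds, dump all
    thread tracebacks to `file` (stderr by default).

    Hypothetical wrapper: the dump only logs, it does not interrupt fn.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout, exit=False, file=target)
    try:
        return fn()
    finally:
        # Disarm the watchdog so a run that finishes in time leaves no pending dump.
        faulthandler.cancel_dump_traceback_later()
```

Because the timer is armed in the process that calls it, the wrapper has to run inside the forked child for the child's threads to appear in the dump.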
Previous attempt put faulthandler.dump_traceback_later inside run_beat, but the py3.7 CI hang turns out to happen earlier — start_worker() never returns, so run_beat is never reached and the dump is never armed. Move the diagnostic into an autouse fixture in tests/integrations/celery/integration_tests/conftest.py so it covers the entire test body. Next CI hang should land a thread dump in the log showing where start_worker is wedged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So the py3.7-container kombu hang reveals its real exception instead of sleeping forever inside retry_over_time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After moving py3.6/3.7 jobs into the python:X.Y container, the three beat tests (test_explanation, test_beat_task_crons_success/error) hang in celery's start_worker on py3.7 only — kombu's pre-connect via default_channel never returns. This is a known kombu/redis-py + os.fork interaction on the old pin (kombu 4.6 + redis-py <3.2) that celery 4.4.7 ships with; it worked previously only because the broker was on loopback, not a bridged sibling container. Bumping the existing < (3, 7) skip to < (3, 8). py3.7 is EOL and the same tests run fine on 3.8+. Also dropping the diagnostic faulthandler conftest and the kill_beat exception-printing wrapper now that the root cause is understood; the pidfile-wait fix in kill_beat is kept since it's a legitimate startup-race fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workflow sets SENTRY_PYTHON_TEST_REDIS_HOST=redis for 3.6/3.7 container runs (since redis is a sibling service container, not on loopback), but tox strips env vars not listed in passenv. The celery beat tests therefore read the default 127.0.0.1, find nothing on loopback inside the python container, and hang in kombu's retry_over_time. Adding the var to passenv fixes it; un-skip the beat tests on py3.7. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
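The failure mode is easy to reproduce in isolation. The sketch below assumes the tests look the host up with a loopback default, as the commit message implies; the helper name `redis_host` and its `environ` parameter are hypothetical and exist only to make the behaviour testable.

```python
import os


def redis_host(environ=None):
    """Return the Redis host for tests, falling back to loopback when unset."""
    environ = os.environ if environ is None else environ
    return environ.get("SENTRY_PYTHON_TEST_REDIS_HOST", "127.0.0.1")


# Variable stripped by tox (not listed in passenv): falls back to loopback.
print(redis_host({}))
# Variable passed through: resolves to the sibling service container.
print(redis_host({"SENTRY_PYTHON_TEST_REDIS_HOST": "redis"}))
```

With the variable missing, nothing signals a misconfiguration: the lookup succeeds with the default, and the problem only shows up later as a connection that never completes.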
Waits up to `pidfile_timeout` seconds for the pidfile to appear before
starting the runtime timer, so slow process startup doesn't race the
killer into a FileNotFoundError that would leak a running beat.
"""
RuntimeError raised in background thread is silently swallowed, allowing beat process to leak
When the pidfile never appears within pidfile_timeout, the RuntimeError is raised inside the kill_beat background thread — Python does not propagate thread exceptions to the caller, so run_beat continues and beat_instance.run() is never terminated, defeating the leak-prevention goal. Consider signalling the failure via a threading.Event or checking t.is_alive() / t.join() with a timeout after beat_instance.run() returns, or using concurrent.futures.Future to surface the exception.
Evidence
- `kill_beat` is launched via `threading.Thread(target=kill_beat, ...)` in `run_beat` (line 56). `t.join()` is never called; `run_beat` proceeds directly to `beat_instance.run()`.
- In CPython, an unhandled exception in a `Thread` target is printed to stderr but does not propagate to the spawning thread.
- If the timeout fires and `RuntimeError` is raised at line 34, the killer thread exits silently, `beat_instance.run()` blocks indefinitely, and the beat process is never sent `SIGTERM`.
- The PR's stated goal ("so slow process startup doesn't race the killer into a `FileNotFoundError` that would leak a running beat") is therefore not achieved in the timeout branch.
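One way to act on the `concurrent.futures.Future` suggestion is sketched below. The helper `run_with_killer` and its parameters are hypothetical stand-ins for `run_beat`, `kill_beat`, and `beat_instance.run()`; this is not code from the PR.

```python
import concurrent.futures


def run_with_killer(killer, run, join_timeout=5.0):
    """Run `killer` in a background thread alongside the blocking `run` callable.

    Hypothetical helper: once `run` returns, re-raise anything `killer` raised.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(killer)
    try:
        run()
        # result() re-raises the killer's exception in the calling thread.
        future.result(timeout=join_timeout)
    finally:
        # Do not block here on a killer that is still running.
        pool.shutdown(wait=False)
```

This surfaces the killer's exception only after `run` returns. If `run` itself blocks forever because the killer failed, an outer timeout is still needed to stop it.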
Identified by Warden code-review · XNE-GLL
Description
- `uv` and `tox-uv` to manage python envs and packages instead of `pip`
- `astral-sh/setup-uv` action in CI instead of `setup-python`, `uv` always uses `python3.13`
- `tox-uv` `uv` picks up through `UV_PYTHON_REFERENCE`
- `deps` and not via `package` so `uv` resolves all deps in a single pass, this was necessary since `uv` resolution is stricter than `pip`

TODO
- `UV_PRERELEASE=if-necessary-or-explicit` and explicit pinning of transitive deps, maybe it makes sense to split off pre-releases in a separate action where we can have `UV_PRERELEASE=all`