build(tox): migrate from pip to uv via tox-uv #6390

Draft
sentry-junior[bot] wants to merge 26 commits into master from chore/migrate-to-tox-uv

@sentry-junior sentry-junior Bot commented May 22, 2026

Description

  • add uv and tox-uv to manage python envs and packages instead of pip
  • use the astral-sh/setup-uv action in CI instead of setup-python; uv always uses Python 3.13
  • except for 3.6 and 3.7, all python versions now go through tox-uv
  • 3.6 and 3.7 have their own containers, which uv picks up through UV_PYTHON_PREFERENCE
  • the SDK is now installed as part of deps rather than via package, so uv resolves all deps in a single pass; this was necessary because uv's resolution is stricter than pip's
  • some other pins were necessary to make CI pass

TODO

  • pre-releases are currently handled with UV_PRERELEASE=if-necessary-or-explicit and explicit pinning of transitive deps; it may make sense to split pre-releases off into a separate action where we can set UV_PRERELEASE=allow
  • coverage (which is a mess anyway)

github-actions Bot commented May 22, 2026

Codecov Results 📊

13 passed | Total: 13 | Pass Rate: 100% | Execution Time: 5.85s

All tests are passing successfully.

✅ Patch coverage is 100.00%. Project has 16266 uncovered lines.


Generated by Codecov Action

Comment thread scripts/populate_tox/tox.jinja Outdated
Comment thread .github/workflows/test-integrations-cloud.yml Outdated
Comment thread .github/workflows/test-integrations-common.yml
@sl0thentr0py sl0thentr0py force-pushed the chore/migrate-to-tox-uv branch 2 times, most recently from 35deb2d to a767133 Compare May 26, 2026 15:19
Comment thread .github/workflows/test-integrations-dbs.yml Outdated
Comment thread .github/workflows/test-integrations-tasks.yml
Comment thread scripts/populate_tox/tox.jinja
Comment thread Makefile
Comment thread .github/workflows/test-integrations-flags.yml Outdated
@sl0thentr0py sl0thentr0py force-pushed the chore/migrate-to-tox-uv branch from a3989a2 to 35675ae Compare May 26, 2026 16:55
Comment thread tox.ini Outdated
Comment thread scripts/populate_tox/tox.jinja Outdated
@sl0thentr0py sl0thentr0py force-pushed the chore/migrate-to-tox-uv branch from c897101 to 9c875dc Compare May 26, 2026 17:56
sentry-junior Bot and others added 13 commits May 26, 2026 21:26
Replace pip-backed virtualenv environments with uv using the tox-uv plugin.

- [tox] requires: swap virtualenv<20.26.3 pin for tox-uv; the virtualenv
  pin existed solely to prevent pip 24.1 being seeded into envs, which is
  irrelevant once uv manages all installs
- setenv: remove py3.14t VIRTUALENV_PIP=24.1 (virtualenv-specific, no-op
  with tox-uv's uv venv)
- commands: remove bare 'pip install' workaround lines; tox-uv does not
  seed pip into venvs so these would fail
- deps: add flask v1 compat packages (itsdangerous, markupsafe, jinja2)
  as factor-conditional deps to replace the removed pip install commands;
  the urllib3<2.0.0 boto3 pin was already present in the auto-generated deps
- CI templates updated (test_group.jinja); run scripts/generate-test-files.sh
  to regenerate the .github/workflows/test-integrations-*.yml files

Test matrix (envlist, Python versions, deps) is unchanged.

Note: Python 3.6 container handling is deferred; see plan canvas for
the recommended approach of running tox under a modern Python host.

Co-authored-by: Neel Shah <neel.shah@sentry.io>

---
[View Session in Sentry](https://sentry.sentry.io/traces/?project=4510944073809921&query=gen_ai.conversation.id%3A%22slack%3AC02T4BB83AS%3A1779437966.628249%22)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
actions/checkout sets safe.directory under a temporary HOME that is
discarded after the step, so subsequent steps see "dubious ownership"
and git fails. This makes get_default_release() return None and breaks
release/session-tracking tests on the 3.6/3.7 container jobs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
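
To illustrate the failure mode described in the commit above: a release lookup that shells out to git and swallows errors returns None as soon as git refuses to run in the checkout. The sketch below is a hypothetical stand-in, not the SDK's actual `get_default_release()`; the `read_release` name and its `cmd` and `cwd` parameters are assumptions made for illustration.

```python
import subprocess
import sys


def read_release(cmd=("git", "rev-parse", "HEAD"), cwd=None):
    """Return the command's stripped stdout, or None if it fails.

    Hypothetical helper: a "dubious ownership" error makes git exit
    non-zero, which this function turns into None.
    """
    try:
        out = subprocess.check_output(
            list(cmd), cwd=cwd, stderr=subprocess.DEVNULL
        )
    except (OSError, subprocess.CalledProcessError):
        # Missing binary or non-zero exit: treat both as "no release".
        return None
    return out.decode("utf-8", "replace").strip() or None


if __name__ == "__main__":
    # Simulate git refusing to run by exiting with status 128.
    failing = (sys.executable, "-c", "raise SystemExit(128)")
    print(read_release(cmd=failing))
```

Any test that asserts on a concrete release value would then fail in the same way the release and session-tracking tests did on the container jobs.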
sl0thentr0py and others added 3 commits May 26, 2026 21:26
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sl0thentr0py sl0thentr0py force-pushed the chore/migrate-to-tox-uv branch 3 times, most recently from 27ce993 to 33b62fa Compare May 26, 2026 19:37
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sl0thentr0py sl0thentr0py force-pushed the chore/migrate-to-tox-uv branch from 33b62fa to 030fd1b Compare May 26, 2026 19:39
sl0thentr0py and others added 4 commits May 26, 2026 21:51
py3.7 celery in CI hangs in test_celery_beat_cron_monitoring::test_explanation.
kill_beat slept 1s then opened the pidfile; in the slower py3.7 container
startup, the file didn't exist yet, the thread died silently, and beat
ran forever (30-min job timeout).

Poll for the pidfile up to 30s before starting the kill timer, and dump
any future thread exception to stderr so the next failure surfaces a
traceback instead of silently hanging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
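
The pidfile wait described in the commit above can be sketched roughly as follows. This is a minimal sketch, not the code in the PR: the `wait_for_pidfile` name and the `interval` parameter are assumptions, and only the 30-second default comes from the commit message.

```python
import os
import time


def wait_for_pidfile(path, timeout=30.0, interval=0.1):
    """Poll until `path` exists; return True if it appeared in time."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    # One last check so a file created right at the deadline still counts.
    return os.path.exists(path)
```

Returning a boolean instead of opening the file directly lets the caller decide what to do when startup is too slow, rather than dying on a FileNotFoundError.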
Comment thread .github/workflows/test-integrations-tasks.yml
sl0thentr0py and others added 5 commits May 26, 2026 23:24
The pidfile-race fix did not unhang py3.7 celery in CI. pytest-timeout
only dumps the parent pytest process's threads — the parent is stuck in
pytest-forked's waitpid, so the actual hang is somewhere inside the
forked child, invisible.

Schedule faulthandler.dump_traceback_later(45) in run_beat (which runs
inside the forked child) and cancel it on successful beat shutdown. If
beat hangs in CI again, the child's full thread dump lands in the log
and tells us where to look.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
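
For readers unfamiliar with the watchdog mentioned in the commit above, here is a small sketch of arming `faulthandler.dump_traceback_later` around a block and cancelling it on success. The `dump_threads_if_stuck` context manager is a hypothetical wrapper, not code from this PR; the two `faulthandler` calls are from the standard library.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def dump_threads_if_stuck(timeout=45, file=None):
    """Dump all thread tracebacks if the body runs longer than `timeout` seconds.

    The pending dump is cancelled when the body finishes first.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(timeout, file=target)
    try:
        yield
    finally:
        faulthandler.cancel_dump_traceback_later()
```

Because the watchdog is armed in whichever process enters the block, placing it inside the forked child is what makes the child's threads visible in the log.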
Previous attempt put faulthandler.dump_traceback_later inside run_beat,
but the py3.7 CI hang turns out to happen earlier — start_worker() never
returns, so run_beat is never reached and the dump is never armed.

Move the diagnostic into an autouse fixture in
tests/integrations/celery/integration_tests/conftest.py so it covers the
entire test body. Next CI hang should land a thread dump in the log
showing where start_worker is wedged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So the py3.7-container kombu hang reveals its real exception instead of
sleeping forever inside retry_over_time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After moving py3.6/3.7 jobs into the python:X.Y container, the three beat
tests (test_explanation, test_beat_task_crons_success/error) hang in
celery's start_worker on py3.7 only — kombu's pre-connect via
default_channel never returns. This is a known kombu/redis-py + os.fork
interaction on the old pin (kombu 4.6 + redis-py <3.2) that celery 4.4.7
ships with; it worked previously only because the broker was on loopback,
not a bridged sibling container.

Bumping the existing < (3, 7) skip to < (3, 8). py3.7 is EOL and the same
tests run fine on 3.8+. Also dropping the diagnostic faulthandler
conftest and the kill_beat exception-printing wrapper now that the
root cause is understood; the pidfile-wait fix in kill_beat is kept since
it's a legitimate startup-race fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The workflow sets SENTRY_PYTHON_TEST_REDIS_HOST=redis for 3.6/3.7
container runs (since redis is a sibling service container, not on
loopback), but tox strips env vars not listed in passenv. The celery
beat tests therefore read the default 127.0.0.1, find nothing on
loopback inside the python container, and hang in kombu's
retry_over_time. Adding the var to passenv fixes it; un-skip the beat
tests on py3.7.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
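
The effect of the missing passenv entry from the commit above can be reproduced with a plain environment lookup. In this sketch the variable name and the 127.0.0.1 default come from the commit message, while the `redis_host` helper and the way the test suite actually reads the variable are assumptions.

```python
import os


def redis_host(environ=None):
    """Return the Redis host for tests, falling back to loopback."""
    env = os.environ if environ is None else environ
    return env.get("SENTRY_PYTHON_TEST_REDIS_HOST", "127.0.0.1")
```

When tox strips the variable, the lookup behaves like `redis_host({})` and silently falls back to loopback, which is why the tests hung instead of failing with a clear configuration error.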
Waits up to `pidfile_timeout` seconds for the pidfile to appear before
starting the runtime timer, so slow process startup doesn't race the
killer into a FileNotFoundError that would leak a running beat.
"""
RuntimeError raised in background thread is silently swallowed, allowing beat process to leak

When the pidfile never appears within pidfile_timeout, the RuntimeError is raised inside the kill_beat background thread — Python does not propagate thread exceptions to the caller, so run_beat continues and beat_instance.run() is never terminated, defeating the leak-prevention goal. Consider signalling the failure via a threading.Event or checking t.is_alive() / t.join() with a timeout after beat_instance.run() returns, or using concurrent.futures.Future to surface the exception.

Evidence
  • kill_beat is launched via threading.Thread(target=kill_beat, ...) in run_beat (line 56).
  • t.join() is never called; run_beat proceeds directly to beat_instance.run().
  • In CPython, an unhandled exception in a Thread target is printed to stderr but does not propagate to the spawning thread.
  • If the timeout fires and RuntimeError is raised at line 34, the killer thread exits silently, beat_instance.run() blocks indefinitely, and the beat process is never sent SIGTERM.
  • The PR's stated goal ("so slow process startup doesn't race the killer into a FileNotFoundError that would leak a running beat") is therefore not achieved in the timeout branch.

Identified by Warden code-review · XNE-GLL
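
One of the options suggested in the review comment above, surfacing the exception through a `concurrent.futures.Future`, could look roughly like this. It is a sketch under stated assumptions: `kill_after_pidfile` and `run_with_killer` are hypothetical names, the pidfile check is reduced to a callable, and the real `run_beat` would call `future.result()` only after `beat_instance.run()` returns.

```python
import concurrent.futures
import time


def kill_after_pidfile(pidfile_ready, pidfile_timeout):
    """Hypothetical killer: raise if the pidfile never shows up."""
    deadline = time.monotonic() + pidfile_timeout
    while not pidfile_ready():
        if time.monotonic() >= deadline:
            raise RuntimeError("pidfile did not appear in time")
        time.sleep(0.01)
    return "killed"


def run_with_killer(pidfile_ready, pidfile_timeout=0.1):
    """Run the killer in a worker thread and re-raise its exception here."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(kill_after_pidfile, pidfile_ready, pidfile_timeout)
        # result() re-raises any exception from the worker in the caller.
        return future.result(timeout=pidfile_timeout + 5)
```

Unlike a bare `threading.Thread`, the future keeps the worker's exception until the caller asks for the result, so a timeout in the killer becomes a visible test failure.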
