
Test (cleanup) improvements #1619

Merged
jmchilton merged 4 commits into galaxyproject:master from mvdbeek:test_cleanup_improvements on Feb 23, 2026

mvdbeek commented Feb 22, 2026

Not sure this will really fix the timeouts we see for test_serve_multiple_tool_data_tables but it should at least be correct.

Use a separate SQLite database for the Celery message broker
(amqp_internal_connection) to avoid write lock contention between
gunicorn and Celery workers during Galaxy startup.

Gravity starts gunicorn, a Celery worker, and Celery beat by default.
All three processes build Galaxy app instances that access the same
SQLite database. With isolation_level=IMMEDIATE, concurrent write
transactions cause exclusive lock contention that can deadlock Galaxy
startup, particularly when heavier initialization (like custom
tool_data_table loading) widens the contention window.
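
To illustrate the contention described above, here is a minimal sketch that uses Python's standard sqlite3 module rather than Galaxy itself. The helper name `can_both_write` and the file names are hypothetical. It only shows that two writers issuing BEGIN IMMEDIATE collide on one database file but not on two separate files.

```python
import os
import sqlite3
import tempfile


def can_both_write(path_a: str, path_b: str) -> bool:
    """Return True if two connections can hold BEGIN IMMEDIATE at the same time."""
    # isolation_level=None leaves transaction control to us (we issue BEGIN explicitly).
    # timeout=0 makes a blocked writer fail immediately instead of waiting.
    first = sqlite3.connect(path_a, timeout=0, isolation_level=None)
    second = sqlite3.connect(path_b, timeout=0, isolation_level=None)
    try:
        first.execute("BEGIN IMMEDIATE")  # takes the write lock on path_a
        try:
            second.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            return False  # SQLite reports "database is locked"
        return True
    finally:
        # Closing a connection rolls back its open transaction.
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        shared = os.path.join(directory, "galaxy.sqlite")    # hypothetical file name
        broker = os.path.join(directory, "control.sqlite")   # hypothetical file name
        print("same file:", can_both_write(shared, shared))
        print("separate files:", can_both_write(shared, broker))
```

Giving amqp_internal_connection its own file corresponds to the second call: the broker's writes no longer compete with the main database for the same lock.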

The sleep() timeout parameter was compared against an iteration counter
(count > timeout), but each iteration takes ~1.5s (connect timeout +
sleep wait), so timeout=300 actually meant ~450s of wall time. This
exceeded the 360s pytest-timeout, preventing the internal timeout
handler (which prints Galaxy log contents) from ever running.

Use time.time() - start_time instead so timeout=300 means 300 actual
seconds, giving _serve's exception handler (with log_contents) time
to fire before pytest-timeout kills the test.
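
The arithmetic behind this: 300 iterations at roughly 1.5s each is about 450s, which is longer than the 360s pytest-timeout. A minimal sketch of a wall-clock based wait loop follows. The names `wait_for`, `check`, and `poll_interval` are hypothetical and not taken from the actual code; only the `time.time() - start_time` comparison comes from the description above.

```python
import time


def wait_for(check, timeout=300.0, poll_interval=1.0):
    """Poll check() until it returns True or `timeout` seconds of wall time pass."""
    start_time = time.time()
    while True:
        if check():
            return True
        # Compare elapsed wall time, not the number of iterations, so a slow
        # check() cannot stretch the effective timeout.
        if time.time() - start_time > timeout:
            return False
        time.sleep(poll_interval)
```

With this shape, a caller that raises on a `False` result gets control back after about `timeout` seconds regardless of how long each individual check takes, apart from the duration of the final check itself.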

The galaxy.yml config was being written with unresolved ${temp_directory}
template variables in property values. These were only resolved in the
GALAXY_CONFIG_OVERRIDE_* environment variables, but Gravity may not
propagate those env vars to gunicorn workers. This caused Galaxy workers
to fail during startup because paths like new_file_path, job_working_directory,
etc. contained literal "${temp_directory}" strings instead of actual paths.

Resolve all template variables in properties before writing them to the
YAML config file, making it self-contained and independent of env var
propagation.
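
As a rough sketch of the idea (not the actual implementation), the `${name}` placeholders can be resolved with Python's standard `string.Template` before the properties are serialized. The function name `resolve_properties` and the example values are hypothetical.

```python
from string import Template


def resolve_properties(properties, template_args):
    """Return a copy of `properties` with ${name} placeholders filled in."""
    resolved = {}
    for key, value in properties.items():
        if isinstance(value, str):
            # safe_substitute leaves unknown placeholders untouched instead of raising.
            value = Template(value).safe_substitute(template_args)
        resolved[key] = value
    return resolved


if __name__ == "__main__":
    properties = {
        "new_file_path": "${temp_directory}/tmp",
        "job_working_directory": "${temp_directory}/jobs",
    }
    print(resolve_properties(properties, {"temp_directory": "/tmp/example"}))
```

Writing the resolved dictionary to galaxy.yml means the file carries real paths, whether or not the GALAXY_CONFIG_OVERRIDE_* variables reach the workers.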

ServeTestCase shares a single galaxy_root across all test methods.
When test_serve_multiple_tool_data_tables starts a Galaxy daemon
referencing temp .xml.test files, the base CliTestCase.tearDown()
deletes those temp files then sends SIGINT to the gunicorn process.
However, SIGINT doesn't stop the gravity supervisor, which respawns
workers that crash on the now-deleted files. This pollutes the shared
galaxy_root and causes test_serve_workflow to fail starting Galaxy.

Fix by registering a cleanup hook that kills the process group.
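
A minimal, POSIX-only sketch of such a hook, assuming the daemon is launched in its own process group. The helper names `start_in_own_group` and `kill_process_group` are hypothetical and the real change may be structured differently.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(args):
    """Start a child as leader of a new session and process group."""
    return subprocess.Popen(args, start_new_session=True)


def kill_process_group(process, sig=signal.SIGTERM, wait_seconds=10):
    """Signal every process in the child's group, then reap the child."""
    try:
        os.killpg(os.getpgid(process.pid), sig)
    except ProcessLookupError:
        return  # the child is already gone and reaped
    process.wait(timeout=wait_seconds)


if __name__ == "__main__":
    child = start_in_own_group([sys.executable, "-c", "import time; time.sleep(60)"])
    kill_process_group(child)
    print("exit status:", child.returncode)
```

In a unittest-based test case this could be registered right after startup with `self.addCleanup(kill_process_group, process)`, so a supervisor and any workers it respawned are stopped together rather than only the process that received SIGINT.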

jmchilton merged commit a327b04 into galaxyproject:master on Feb 23, 2026
34 of 41 checks passed