I recently ran into an issue where a specific tool would not schedule, throwing an exception. The issue persisted across restarts of Galaxy: there was a pending request in PostgreSQL associated with Galaxy even though the Galaxy server had been shut down. Manually killing that request and then starting Galaxy allowed the tool to schedule properly.
Query listing the zombie request:
select pid, datname, usename, application_name, query_start, state from pg_stat_activity;
Query to kill the zombie request (substitute the pid reported by the query above):
select pg_terminate_backend(PID);
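The two steps above can also be combined into one statement. A sketch that terminates every lingering non-active backend attached to the Galaxy database in a single pass; the database name 'galaxy' is an assumption here and should be replaced with the actual datname shown by pg_stat_activity, and this should only be run while the Galaxy server is stopped:

```sql
-- Terminate all non-active backends attached to the Galaxy database.
-- pg_terminate_backend() returns true for each backend it signals.
SELECT pid, pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'galaxy'       -- assumed database name; adjust to yours
  AND state <> 'active'
  AND pid <> pg_backend_pid(); -- never terminate the current session
```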
I think this was caused by the way Galaxy shuts down: I had restarted the Galaxy server while it was scheduling a large batch of jobs, shortly before this issue arose. Is there a way to have Galaxy shut down gracefully rather than killing the process?
galaxy.jobs.runners.util.process_groups DEBUG 2019-12-05 15:50:44,928 [p:129610,w:1,m:0] [LocalRunner.work_thread-2] check_pg(): No process found in process group 147596
...
galaxy.jobs.runners ERROR 2019-12-04 14:54:13,905 [p:102142,w:1,m:0] [SlurmRunner.work_thread-0] (12440) Failure preparing job
Traceback (most recent call last):
File "lib/galaxy/jobs/runners/__init__.py", line 237, in prepare_job
stderr_file=stderr_file,
File "lib/galaxy/jobs/runners/__init__.py", line 278, in build_command_line
stderr_file=stderr_file,
File "lib/galaxy/jobs/command_factory.py", line 71, in build_command
__handle_dependency_resolution(commands_builder, job_wrapper, remote_command_params)
File "lib/galaxy/jobs/command_factory.py", line 182, in __handle_dependency_resolution
if local_dependency_resolution and job_wrapper.dependency_shell_commands:
File "lib/galaxy/jobs/__init__.py", line 920, in dependency_shell_commands
job_directory=self.working_directory
File "lib/galaxy/tools/__init__.py", line 1657, in build_dependency_shell_commands
installed_tool_dependencies=self.installed_tool_dependencies,
File "lib/galaxy/tools/__init__.py", line 1667, in installed_tool_dependencies
if self.tool_shed_repository:
File "lib/galaxy/tools/__init__.py", line 535, in tool_shed_repository
from_cache=True)
File "lib/tool_shed/util/repository_util.py", line 372, in get_installed_repository
repository_id=repository_id)
File "lib/galaxy/tools/cache.py", line 172, in get_installed_repository
if installed_changeset_revision and repo.installed_changeset_revision != installed_changeset_revision:
File "/localscratch/galaxy/rpp-fiona/19.09/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 282, in __get__
return self.impl.get(instance_state(instance), dict_)
File "/localscratch/galaxy/rpp-fiona/19.09/lib/python2.7/site-packages/sqlalchemy/orm/attributes.py", line 705, in get
value = state._load_expired(state, passive)
File "/localscratch/galaxy/rpp-fiona/19.09/lib/python2.7/site-packages/sqlalchemy/orm/state.py", line 660, in _load_expired
self.manager.deferred_scalar_loader(self, toload)
File "/localscratch/galaxy/rpp-fiona/19.09/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 913, in load_scalar_attributes
"attribute refresh operation cannot proceed" % (state_str(state))
DetachedInstanceError: Instance <ToolShedRepository at 0x7f2f296e4cd0> is not bound to a Session; attribute refresh operation cannot proceed (Background on this error at: http://sqlalche.me/e/bhk3)
Galaxy 19.09 and PostgreSQL 9
Related: #8933