On ISO7-stable, I observed this (surprising) deadlock. I thought we had fixed it in the past?
As observed in the LSM testsuite:
DETAIL: Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
HINT: See server log for query details.
Traceback (most recent call last):
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/util/__init__.py", line 600, in handle_result
task.result()
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/server/agentmanager.py", line 498, in _log_session_expiry_to_db
await data.AgentProcess.expire_process(session.id, now, connection)
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 3201, in expire_process
await aps.update_fields(connection=connection, expired=now)
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 1741, in update_fields
await self._execute_query(query, *values, connection=connection)
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 1599, in _execute_query
return await con.execute(query, *values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 353, in execute
_, status, _ = await self._execute(
^^^^^^^^^^^^^^^^^^^^
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1794, in _execute
result, _ = await self.__execute(
^^^^^^^^^^^^^^^^^^^^^
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1892, in __execute
result, stmt = await self._do_execute(
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1945, in _do_execute
result = await executor(stmt, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "asyncpg/protocol/protocol.pyx", line 207, in bind_execute
asyncpg.exceptions.DeadlockDetectedError: deadlock detected
DETAIL: Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
HINT: See server log for query details.
05:39:26.565 DEBUG Getting config in section client_rest_transport
db log
05:45:20.754 WARNING 2024-01-23 05:39:26.515 CET [477163] DETAIL: Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
05:45:20.754 WARNING Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
05:45:20.754 WARNING Process 477163: UPDATE agentprocess SET expired=$1 WHERE sid=$2
05:45:20.754 WARNING Process 477165: SELECT * FROM agent WHERE environment=$1 AND name=$2 LIMIT $3 FOR NO KEY UPDATE
05:45:20.755 WARNING 2024-01-23 05:39:26.515 CET [477163] HINT: See server log for query details.
Related
Related to this, it seems that on the checkpoint and lsm modules (both using inmanta-extensions) we are frequently losing the agent session. This didn't happen before. If the session loss is not resolved while addressing this race, it should be made into a separate ticket.
…g_session_seen_to_db` and `_log_session_creation_to_db` methods. (Issue #7024, PR #7128)
# Description
The `_log_session_expiry_to_db`, `_log_session_seen_to_db` and `_log_session_creation_to_db` methods manipulate the `Agent`, `AgentInstance` and `AgentProcess` tables. But they don't modify these tables in the same order. This can result in deadlocks. This PR makes sure that all three methods manipulate the different database tables in the same order.
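To illustrate why a fixed order helps, here is a minimal sketch that uses `asyncio.Lock` objects as stand-ins for the database locks. It is not the code from this PR: `TABLE_ORDER`, `run_in_fixed_order` and the task names are hypothetical, and the order chosen here is arbitrary rather than the one the PR settles on. Two callers request the tables in opposite orders, but because both acquire them in the same global order, neither can end up holding a lock the other is waiting for.

```python
import asyncio

# Hypothetical global order; the actual order used in the PR may differ.
TABLE_ORDER = ("agentprocess", "agentinstance", "agent")


async def run_in_fixed_order(locks, tables, log, name):
    """Acquire the locks for `tables` in TABLE_ORDER, whatever order was requested."""
    ordered = sorted(tables, key=TABLE_ORDER.index)
    acquired = []
    try:
        for table in ordered:
            await locks[table].acquire()
            acquired.append(table)
            log.append((name, table))
            await asyncio.sleep(0)  # yield so the other task can interleave
    finally:
        # Release in reverse order of acquisition.
        for table in reversed(acquired):
            locks[table].release()


async def main():
    locks = {table: asyncio.Lock() for table in TABLE_ORDER}
    log = []
    # The callers ask for the tables in opposite orders. Without the sort,
    # each could take its first lock and then wait forever for the other's.
    await asyncio.wait_for(
        asyncio.gather(
            run_in_fixed_order(locks, ["agentprocess", "agent"], log, "expiry"),
            run_in_fixed_order(locks, ["agent", "agentprocess"], log, "seen"),
        ),
        timeout=5,
    )
    return log


log = asyncio.run(main())
```

In the database the same idea applies to the order in which each method touches the tables inside its transaction, since row locks are held until the transaction ends.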
I must admit I didn't manage to reproduce this issue in a test case. So I am not sure whether this actually resolves the deadlock. But it's an improvement anyway. Let's see whether this change makes the deadlock disappear on our CI.
closes #7024
# Self Check:
- [x] Attached issue to pull request
- [x] Changelog entry
- [x] Type annotations are present
- [x] Code is clear and sufficiently documented
- [x] No (preventable) type errors (check using make mypy or make mypy-diff)
- [x] Sufficient test cases (reproduces the bug/tests the requested feature)
- [x] Correct, in line with design
- [ ] ~~End user documentation is included or an issue is created for end-user documentation~~
- [ ] ~~If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)~~