Deadlock on agent table #7024

Closed
wouterdb opened this issue Jan 23, 2024 · 0 comments
Assignees
Labels
build master task A ticket created by the build master meant to be picked up by any developer when it suits them.

wouterdb commented Jan 23, 2024

On ISO7-stable, I observed this (surprising) deadlock. I thought we had fixed it in the past?

As observed in the LSM testsuite:

DETAIL:  Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
HINT:  See server log for query details.
Traceback (most recent call last):
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/util/__init__.py", line 600, in handle_result
    task.result()
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/server/agentmanager.py", line 498, in _log_session_expiry_to_db
    await data.AgentProcess.expire_process(session.id, now, connection)
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 3201, in expire_process
    await aps.update_fields(connection=connection, expired=now)
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 1741, in update_fields
    await self._execute_query(query, *values, connection=connection)
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/****/data/__init__.py", line 1599, in _execute_query
    return await con.execute(query, *values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 353, in execute
    _, status, _ = await self._execute(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1794, in _execute
    result, _ = await self.__execute(
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1892, in __execute
    result, stmt = await self._do_execute(
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jenkins/workspace/modules_lsm_master/env/lib64/python3.11/site-packages/asyncpg/connection.py", line 1945, in _do_execute
    result = await executor(stmt, None)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "asyncpg/protocol/protocol.pyx", line 207, in bind_execute
asyncpg.exceptions.DeadlockDetectedError: deadlock detected
DETAIL:  Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
HINT:  See server log for query details.
05:39:26.565 DEBUG Getting config in section client_rest_transport

db log

05:45:20.754 WARNING 2024-01-23 05:39:26.515 CET [477163] DETAIL:  Process 477163 waits for ShareLock on transaction 27565; blocked by process 477165.
05:45:20.754 WARNING 	Process 477165 waits for ShareLock on transaction 27560; blocked by process 477163.
05:45:20.754 WARNING 	Process 477163: UPDATE agentprocess SET expired=$1 WHERE sid=$2
05:45:20.754 WARNING 	Process 477165: SELECT  *  FROM agent WHERE environment=$1 AND name=$2 LIMIT $3 FOR NO KEY UPDATE
05:45:20.755 WARNING 2024-01-23 05:39:26.515 CET [477163] HINT:  See server log for query details.

Related

Related to this, it seems that on the checkpoint and lsm modules (both using inmanta-extensions) we are frequently losing the agent session. This didn't happen before. If the session loss is not resolved while addressing this race, it should be split off into a separate ticket.

@wouterdb wouterdb added the build master task A ticket created by the build master meant to be picked up by any developer when it suits them. label Jan 23, 2024
@arnaudsjs arnaudsjs self-assigned this Feb 5, 2024
inmantaci pushed a commit that referenced this issue Feb 6, 2024
…g_session_seen_to_db` and `_log_session_creation_to_db` methods. (Issue #7024, PR #7128)

# Description

The `_log_session_expiry_to_db`, `_log_session_seen_to_db` and `_log_session_creation_to_db` methods manipulate the `Agent`, `AgentInstance` and `AgentProcess` tables. But they don't modify these tables in the same order. This can result in deadlocks. This PR makes sure that all three methods manipulate the different database tables in the same order.
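
The idea of touching tables in one fixed order can be illustrated with a small, database-free sketch. It is not the code from the PR: `asyncio.Lock` objects stand in for row locks, and `CANONICAL_ORDER`, `touch_tables` and `demo` are hypothetical names. The order actually chosen in the PR is not stated in this issue, so the one below is an assumption.

```python
import asyncio
from contextlib import AsyncExitStack

# Assumed canonical order; the real order used by the PR is not given in this issue.
CANONICAL_ORDER = ["agentprocess", "agentinstance", "agent"]


async def touch_tables(
    locks: dict[str, asyncio.Lock],
    tables: list[str],
    log: list[tuple[str, str]],
    label: str,
) -> None:
    """Acquire the locks for `tables` in canonical order, whatever order the caller gave."""
    ordered = sorted(tables, key=CANONICAL_ORDER.index)
    async with AsyncExitStack() as stack:
        for name in ordered:
            await stack.enter_async_context(locks[name])
            log.append((label, name))
            await asyncio.sleep(0)  # yield so the other task gets a chance to interleave


async def demo() -> list[tuple[str, str]]:
    locks = {name: asyncio.Lock() for name in CANONICAL_ORDER}
    log: list[tuple[str, str]] = []
    # The two callers name the tables in opposite orders. Without the sort in
    # touch_tables, each could hold one lock while waiting for the other.
    await asyncio.wait_for(
        asyncio.gather(
            touch_tables(locks, ["agent", "agentprocess"], log, "expiry"),
            touch_tables(locks, ["agentprocess", "agent"], log, "seen"),
        ),
        timeout=1.0,
    )
    return log


log = asyncio.run(demo())
```

Because both tasks request `agentprocess` before `agent`, the second task simply waits for the first to finish instead of forming a cycle, and `demo()` completes within the timeout.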

I must admit I didn't manage to reproduce this issue in a test case. So I am not sure whether this actually resolves the deadlock. But it's an improvement anyway. Let's see whether this change makes the deadlock disappear on our CI.

closes #7024

# Self Check:

- [x] Attached issue to pull request
- [x] Changelog entry
- [x] Type annotations are present
- [x] Code is clear and sufficiently documented
- [x] No (preventable) type errors (check using make mypy or make mypy-diff)
- [x] Sufficient test cases (reproduces the bug/tests the requested feature)
- [x] Correct, in line with design
- [ ] ~~End user documentation is included or an issue is created for end-user documentation~~
- [ ] ~~If this PR fixes a race condition in the test suite, also push the fix to the relevant stable branch(es) (see [test-fixes](https://internal.inmanta.com/development/core/tasks/build-master.html#test-fixes) for more info)~~
inmantaci pushed a commit that referenced this issue Feb 6, 2024
…g_session_seen_to_db` and `_log_session_creation_to_db` methods. (Issue #7024, PR #7128)

Pull request opened by the merge tool on behalf of #7128