
Perform row-level locking when using Agents #3299

Closed
knu wants to merge 3 commits from the lock_agents branch

Conversation

@knu knu commented Jul 12, 2023

Huginn agents are not robust when handling concurrent events because they have no built-in locking mechanism to ensure the integrity of their memory state.

This PR makes Huginn acquire a pessimistic lock whenever it reads and updates agent state.

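As a minimal sketch of this pattern (illustrative only, not the PR's exact diff; agent_id and the check call site are placeholders), the read-modify-write of an agent's state is wrapped in a transaction that holds a row-level lock:

# ActiveRecord's `lock` issues SELECT ... FOR UPDATE, and the row stays
# locked until the surrounding transaction commits.
Agent.transaction do
  agent = Agent.lock.find(agent_id) # agent_id is a placeholder
  agent.check                       # may read and mutate agent.memory
  agent.save!                       # persisted while the row is still locked
end
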
@knu knu requested a review from dsander July 12, 2023 18:13
Collaborator

@dsander dsander left a comment

In general it's a great idea to make this more resilient, but synchronizing things (even in our tiny distributed system) is very tricky. I am mostly worried about putting locks around everything we ever change and then potentially(?) ending up with a bunch of deadlocks.

What seems to be missing is everything in AgentControllerConcern. We do now have locks in the check and receive jobs, but these Agents can also control other Agents.

Not super sure, but aren't the locks in the AgentController unnecessary if we have the locks in the check and receive jobs?

Comment on lines +117 to +120
Agent.transaction do
  @agent = current_user.agents.lock.find(params[:id])
  @agent.update!(memory: {})
end
Collaborator

Mostly just curious: in some places you used .with_lock and in others wrapped .lock.find in a transaction "manually"; afaik they are interchangeable functionality-wise. WDYT about using one pattern everywhere it's feasible? I generally think with_lock is easier to read.

Suggested change
-Agent.transaction do
-  @agent = current_user.agents.lock.find(params[:id])
-  @agent.update!(memory: {})
-end
+@agent = current_user.agents.find(params[:id])
+@agent.with_lock do
+  @agent.update!(memory: {})
+end

Member Author

That would cost you two select queries when you needed just one and could lead to a race condition.

SELECT; -- find
-- The record may change
BEGIN;  -- with_lock do
  SELECT FOR UPDATE;
  -- ...
COMMIT; -- end

Member Author

@knu knu Jul 16, 2023

Of course, Huginn will never change the id or the ownership of an agent, so the race condition would do little to no harm in this particular case, but I'm talking about the situation in general.

@agent = current_user.agents.find(params[:id])
@scenario = current_user.scenarios.find(params[:scenario_id])
@agent.scenarios.destroy(@scenario)
Agent.transaction do
Collaborator

Hmm, do we need this? Afaik we don't have agents that change the scenario of another Agent.

knu commented Jul 16, 2023

> In general it's a great idea to make this more resilient, but synchronizing things (even in our tiny distributed system) is very tricky. I am mostly worried about putting locks around everything we ever change and then potentially(?) ending up with a bunch of deadlocks.

I deeply understand your concern. As far as I have investigated, running an agent is always done asynchronously, either via Agent.async_check or Agent.async_receive, so a deadlock will not occur even if an agent triggers other agents while it's running. There can be a cyclic workflow, but each run is independent and does not involve waiting for another to finish.

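As a simplified illustration of this point (the job class below is hypothetical, not Huginn's actual code), each run locks only its own agent's row and merely enqueues work for other agents, so no transaction ever waits on another agent's run:

class CheckAgentJob < ApplicationJob
  queue_as :default

  def perform(agent_id)
    Agent.transaction do
      agent = Agent.lock.find(agent_id) # lock only this agent's row
      agent.check                       # may enqueue jobs for other agents,
                                        # but never waits for them here
    end                                 # lock released at COMMIT
  end
end
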
> What seems to be missing is everything in AgentControllerConcern. We do now have locks in the check and receive jobs, but these Agents can also control other Agents.

> Not super sure, but aren't the locks in the AgentController unnecessary if we have the locks in the check and receive jobs?

Yes, this is one other thing to consider.

A controller agent can "run" other agents, but in an async way as I said above.

A controller agent can "configure" other agents, but controller agents are not allowed to be a controller of themselves, so it should be safe to acquire a lock on the target agents for updating while a controller agent is running.
Without a lock, updating an agent cannot be performed atomically, leading to a race condition between reading and writing.

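A hedged sketch of that idea (configure_target and new_options are made-up names, not the AgentControllerConcern code): the target agent's row is locked so the read-modify-write of its configuration cannot interleave with a concurrent run.

def configure_target(target_agent, new_options)
  # with_lock re-reads the row under SELECT ... FOR UPDATE inside a
  # transaction, so the merge below sees the latest persisted options.
  target_agent.with_lock do
    target_agent.options = target_agent.options.deep_merge(new_options)
    target_agent.save!
  end # COMMIT releases the lock
end
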
I've been running my instance with this patch applied for 3-4 days now. So far I haven't seen a problem, but I'm going to keep an eye on it for a while longer.

knu commented Jul 18, 2023

It turned out that this should be done in a more fine-grained manner. Putting the whole run of check() in a transaction means any events created in it are committed only after it finishes, and if the asynchronous receive() calls of the downstream agents take place before that, they won't be able to see the newly created events.

So, while there should be a global mechanism to prevent an agent from being run while it is already running, the transactions that keep the memory/event state consistent should be carefully designed and implemented on a per-agent, fine-grained basis. I'll reconsider this and come back with a better implementation.

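A simplified illustration of the visibility problem (names are placeholders, not Huginn's exact code): rows created inside a still-open transaction are invisible to other database sessions until COMMIT.

Agent.transaction do
  agent = Agent.lock.find(agent_id)
  event = agent.events.create!(payload: { "status" => "ok" })
  # Suppose a downstream receive job is enqueued here and a worker picks
  # it up immediately: its database session cannot see `event` yet,
  # because this transaction has not committed.
end
# Only here, at COMMIT, do the created events become visible to other
# sessions, and thus to the receive jobs of downstream agents.
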
@knu knu marked this pull request as draft July 18, 2023 01:30
@knu knu closed this Nov 3, 2023
@knu knu deleted the lock_agents branch November 3, 2023 15:47