-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform row-level locking when using Agents #3299
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it's a great idea to make this more resilient, but synchronizing things (even in our tiny distributed system) is very tricky. I am mostly worried about putting locks around everything we ever change and then potentially(?) end up with a bunch of deadlocks.
What seems to be missing is everything in AgentControllerConcern
. We do now have locks in the check and receive jobs, but these Agents can also control other Agents.
Not super sure, but aren't the locks in the AgentController
unnecessary if we have the locks in the check and receive jobs?
Agent.transaction do | ||
@agent = current_user.agents.lock.find(params[:id]) | ||
@agent.update!(memory: {}) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just curious, in some places you used .with_lock
and in other wrapped .lock.find
in a transaction "manually", afaik they are interchangeable functionality wise. WDYT about using one pattern everywhere it's feasible? I generally think with_lock
is easier to read
Agent.transaction do | |
@agent = current_user.agents.lock.find(params[:id]) | |
@agent.update!(memory: {}) | |
end | |
@agent = current_user.agents.find(params[:id]) | |
@agent.with_lock do | |
@agent.update!(memory: {}) | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would cost you two select queries when you needed just one and could lead to a race condition.
SELECT; -- find
-- The record may change
BEGIN; -- with_lock do
SELECT FOR UPDATE;
-- ...
COMMIT; -- end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, Huginn will never change id
or the ownership of an agent so the race condition would do little to no harm in this particular case, but I'm talking about the situation in general.
@agent = current_user.agents.find(params[:id]) | ||
@scenario = current_user.scenarios.find(params[:scenario_id]) | ||
@agent.scenarios.destroy(@scenario) | ||
Agent.transaction do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm do we need this? Afaik we don't have agents that change the scenario of another Agent.
I deeply understand your concern. As far as I investigated, running an agent is always done asynchronously either via Agent.async_check or Agent.async_receive, so a deadlock will not occur even if an agent triggers other agents while it's running. There can be a cyclic workflow, but each run is independent and does not involve a wait for another to finish.
Yes, this is one other thing to consider. A controller agent can "run" other agents, but in an async way as I said above. A controller agent can "configure" other agents, but controller agents are not allowed to be a controller of themselves, so it should be safe to acquire a lock on the target agents for updating while a controller agent is running. I've been running my instance with this patch applied for 3-4 days now. So far I haven't seen a problem, but I'm going to keep an eye on it for a while longer. |
It turned out that you should do this in a more fine-grained manner. Putting the whole run of check() in a transaction means any events created in it will be committed only after it finishes, and if the asynchronous calls for receive() of the downstream agents take place before that, they won't be able to see the created events. So, while there should be a global mechanism to prevent an agent from being run while it is running, transactions to keep the memory/event state consistent should be carefully designed and implemented on a per-agent, fine-grained basis. I'll reconsider this and come back with a better implementation. |
Huginn agents are not robust enough when it comes to dealing with concurrent events simultaneously because they don't have a proper locking mechanism built in to ensure the integrity of their memory state.
This PR is about making Huginn acquire a pessimistic lock whenever it reads and updates agent states.