2 Events from 1 Agent when running multiple workers #1454
Maybe it's late, but now I'm very consistently getting 1001 Events. I don't think I've changed anything significant. I changed my …
Which version are you running? Before #1432 was merged, you could get duplicate events by triggering the event propagation via the web interface at the same time the background processor ran the propagation.
Yep, running code that's after #1432. No other agents on the system besides the ones generated by the setup script.
I wonder if you get the same symptoms with Agents that don't interact with the network?
I changed my setup script to:

trigger = User.find(1).agents.create!(type: "Agents::ManualEventAgent", name: "PostTrigger"); nil;

500.times do |i|
  options = {
    :expected_receive_period_in_days => "2",
    :rules => [{
      :type => "regex",
      :value => "rain",
      :path => "conditions"
    }],
    :message => "Agent: #{i} -It looks like '{{conditions}}' tomorrow in {{location}}"
  }
  # One TriggerAgent per iteration, each listening to the same ManualEventAgent
  agent = User.find(1).agents.new(type: 'Agents::TriggerAgent', name: "First Agent #{i}", options: options)
  agent.sources << trigger
  agent.save!
end

and my Procfile has a start command to kick off 2 workers. Results: Event.count = 507. (A sketch of the setup follows.)
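For reference, a minimal sketch of that kind of setup — not the original Procfile; the process names and bin scripts are assumptions:

web: bundle exec rails server -p ${PORT-3000} -b ${IP-0.0.0.0}
schedule: bundle exec rails runner bin/schedule.rb
dj: bundle exec script/delayed_job run

started with foreman's concurrency flag to run two copies of the worker, e.g.:

foreman start -c dj=2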
I see, my guess is that the duplicate events are caused by the scaling feature of foreman. We added the bulky multiple delayed_job worker examples to the Procfiles because DJ needs the …
Got 509 with your Procfile.
Darn, thanks for checking that. Looks like another debugging session is in order. 😄
Is it possible some jobs are failing and being retried in some way, causing the duplicates? Is there something I can look for in the logs? For instance, checks like the ones below.
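A couple of console checks that could rule out retries — a sketch assuming delayed_job's standard attempts/last_error columns (completed jobs are deleted, so only pending or failed rows show up):

# Pending jobs that have already failed at least once and will be retried
Delayed::Job.where("attempts > 0").count

# Errors recorded from failed runs, if any
Delayed::Job.where.not(last_error: nil).pluck(:last_error).uniq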
It seems unlikely that the jobs of those agents would fail. Do you see a pattern in the duplicated events, e.g. are the duplicates always from the same agents, or is it completely random?
It looks like it's all over the place. Here are the duplicate Events (events with overlapping agent_ids) from the last run I did, where the count at the end was 509 (a query sketch for pulling these follows the dump):

[#<Event:0x007fd9b975d1f0
id: 72722,
user_id: 1,
agent_id: 22732,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 4 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975d038
id: 72723,
user_id: 1,
agent_id: 22732,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 4 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975ce80
id: 72778,
user_id: 1,
agent_id: 22768,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 40 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975ccc8
id: 72779,
user_id: 1,
agent_id: 22768,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 40 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975ca98
id: 72820,
user_id: 1,
agent_id: 22851,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 123 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c868
id: 72819,
user_id: 1,
agent_id: 22851,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 123 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c700
id: 72694,
user_id: 1,
agent_id: 22886,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 158 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c548
id: 72693,
user_id: 1,
agent_id: 22886,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 158 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c318
id: 72754,
user_id: 1,
agent_id: 22952,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 224 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c1d8
id: 72753,
user_id: 1,
agent_id: 22952,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 224 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:22 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b975c070
id: 72793,
user_id: 1,
agent_id: 23169,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 441 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b9757e30
id: 72792,
user_id: 1,
agent_id: 23169,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 441 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b9757bd8
id: 72849,
user_id: 1,
agent_id: 23175,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 447 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b9757958
id: 72850,
user_id: 1,
agent_id: 23175,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 447 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:23 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b9757688
id: 72676,
user_id: 1,
agent_id: 23225,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 497 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
expires_at: nil>,
#<Event:0x007fd9b9757548
id: 72679,
user_id: 1,
agent_id: 23225,
lat: nil,
lng: nil,
payload: {"message"=>"Agent: 497 -It looks like 'rain' tomorrow in "},
created_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
updated_at: Mon, 02 May 2016 13:47:21 PDT -07:00,
expires_at: nil>]
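A query along these lines can pull that list — a sketch assuming each agent should have emitted exactly one event in this experiment, so any agent_id with more than one event marks a duplicate:

# Agents that emitted more than one event
dup_agent_ids = Event.group(:agent_id).having("COUNT(*) > 1").pluck(:agent_id)

# The duplicated events themselves, with the pairs ordered together
Event.where(agent_id: dup_agent_ids).order(:agent_id, :id)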
I'm also curious to know whether someone else can reproduce the "extra events" I'm seeing. Maybe I just have a bad configuration locally?
I was able to reproduce it in the …
Spoke too soon, duplicates are still happening. Really strange.
Good to know I'm not alone. Maybe worth noting that I'm using postgres (PostgreSQL) 9.4.5 as my database. I haven't tried the same experiment against a MySQL database.
When I comment out these lines, the duplicates stop:

Delayed::Backend::ActiveRecord.configure do |config|
  config.reserve_sql_strategy = :default_sql
end

They seem to have been put in for #984, which was in response to #977, and tied to collectiveidea/delayed_job_active_record#111.
Hmm, maybe we can switch back to the optimized version for Postgres; I suspect that on MySQL we would still see the deadlocks.
As mentioned above, there were times when I saw more than 1000 Events for what should have been 500 new events.
I think using the optimized version for Postgres makes sense. Is switching on the adapter the best option? Something like the sketch below, perhaps.
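A minimal sketch of switching on the adapter — the reserve_sql_strategy values are real delayed_job_active_record options, but the initializer path and the adapter check are assumptions:

# config/initializers/delayed_job.rb (hypothetical location)
Delayed::Backend::ActiveRecord.configure do |config|
  # Keep the slower :default_sql only on MySQL, where the optimized
  # single-query job reservation has been reported to deadlock
  # (collectiveidea/delayed_job_active_record#111). Postgres goes back to
  # :optimized_sql, which avoids the duplicate reservations seen here.
  adapter = ActiveRecord::Base.connection_config[:adapter]
  config.reserve_sql_strategy = adapter =~ /mysql/ ? :default_sql : :optimized_sql
end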
Thanks for figuring this out!
As I'm digging into this problem I thought I'd open an issue, in case someone else comes across it and/or has some insight. All of this is running locally on my MacBook Pro. I ran a script that dsander posted in another thread to create 500 pairs of agents (1000 total), plus one ManualEventAgent to kick off a set of web requests back to the web server.
Expected outcome: 501 Events. Actual: 502 (or more) Events.
My Procfile looks like the two-worker setup sketched below:
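A minimal sketch, not the original file — the worker entries and identifiers are assumptions, but the shape is "web process plus two delayed_job workers":

web: bundle exec rails server -p ${PORT-3000} -b ${IP-0.0.0.0}
schedule: bundle exec rails runner bin/schedule.rb
dj1: bundle exec script/delayed_job -i 1 run
dj2: bundle exec script/delayed_job -i 2 run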
If I comment out the second worker, I get the expected 501 Events.
The extra event looks like a duplicate: the ids are back-to-back (e.g. 302 and 303) with the same agent_id.
Other stuff I can think of that might be relevant:
DELAYED_JOB_SLEEP_DELAY=10
Ruby 2.2.3
postgres (PostgreSQL) 9.4.5