Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix huge bottleneck in notification emails #4200

Merged
merged 1 commit into from
Dec 11, 2017

Conversation

coolo
Copy link
Member

@coolo coolo commented Dec 10, 2017

The way source_watchers and target_watchers were implemented was pretty
suboptimal. It took >20 seconds to dig out 128 users watching
openSUSE:Factory by going through the WatchedProject objects connected
to these 128 users and listed the users for these, which resulted in
128x128 SELECT user ... - but all we want is the list of 128 User
objects (to filter subscriptions for them)

So the new query results in one SQL statement taking the database a
friction of a second

@coolo
Copy link
Member Author

coolo commented Dec 10, 2017

Before:

I, [2017-12-10T09:32:57.197592 #3250]  INFO -- : [ActiveJob] [SendEventEmailsJob] [13052478-0991-4bb6-99fd-fdbc65e947b8] [3250:151741.11] Sent mail to dleuenberger
I, [2017-12-10T09:34:06.935899 #3250]  INFO -- : [ActiveJob] [SendEventEmailsJob] [13052478-0991-4bb6-99fd-fdbc65e947b8] [3250:151810.84] Sent mail to dleuenberger
I, [2017-12-10T09:34:54.675760 #3250]  INFO -- : [ActiveJob] [SendEventEmailsJob] [13052478-0991-4bb6-99fd-fdbc65e947b8] [3250:151858.58] Sent mail to dleuenberger
I, [2017-12-10T09:36:52.169911 #3250]  INFO -- : [ActiveJob] [SendEventEmailsJob] [13052478-0991-4bb6-99fd-fdbc65e947b8] [3250:151976.08] Sent mail to dleuenberger

(aka roughly one mail a minute)

After:

I, [2017-12-10T12:11:35.828373 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3029.05] Sent mail to dleuenberger
I, [2017-12-10T12:11:36.535037 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3029.76] Sent mail to dleuenberger
I, [2017-12-10T12:11:37.341072 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3030.57] Sent mail to dleuenberger
I, [2017-12-10T12:11:38.088358 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3031.31] Sent mail to dleuenberger
I, [2017-12-10T12:11:39.950355 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3033.18] Sent mail to dleuenberger
I, [2017-12-10T12:11:42.779564 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3036.00] Sent mail to dleuenberger
I, [2017-12-10T12:11:43.441353 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3036.67] Sent mail to dleuenberger
I, [2017-12-10T12:11:44.185412 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3037.41] Sent mail to dleuenberger
I, [2017-12-10T12:11:44.843651 #21396]  INFO -- : [ActiveJob] [SendEventEmailsJob] [b8da3c60-16bc-4070-9e48-5cab2fca8d1f] [21396:3038.07] Sent mail to dleuenberger

Now roughly one mail every 3 seconds. Still not perfect, but we should get the event backlog down with this speed.

@codecov
Copy link

codecov bot commented Dec 10, 2017

Codecov Report

Merging #4200 into master will increase coverage by 41.17%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #4200       +/-   ##
===========================================
+ Coverage   46.58%   87.75%   +41.17%     
===========================================
  Files         340      340               
  Lines       22439    18567     -3872     
===========================================
+ Hits        10453    16294     +5841     
+ Misses      11986     2273     -9713
Flag Coverage Δ
#api 81.06% <100%> (?)
#rspec 69.27% <100%> (+22.68%) ⬆️

@Vogtinator
Copy link
Member

This is likely the reason I got 4308 emails this morning instead of the usual ~100. If it is - good job!

@coolo
Copy link
Member Author

coolo commented Dec 11, 2017

fyi: the delayed job backlog is done as of this morning - we had >11.000 when I started to look at the problem.

@evanrolfe
Copy link

@coolo Nice work thanks for fixing this! Somewhat annoyed I didn't see this problem myself.. I thought that by INNER JOINing the watched_projects and users this wouldn't be a problem but obviously that wasn't the case.

@@ -134,8 +134,8 @@ def source_watchers

def find_watchers(project_key)
project_names = payload['actions'].map { |action| action[project_key] }.uniq
projects = Project.where(name: project_names).joins(watched_projects: :user)
projects.flat_map { |project| project.watched_projects.map(&:user) }
projects = WatchedProject.where(project: Project.where(name: project_names))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small point but please change the variable name from projects to watched_projects.

Copy link
Member

@Ana06 Ana06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we had an unneeded join 🙈

The way source_watchers and target_watchers were implemented was pretty
suboptimal. It took >20 seconds to dig out 128 users watching
openSUSE:Factory by going through the WatchedProject objects connected
to these 128 users and listed the users for these, which resulted in
128x128 SELECT user ... - but all we want is the list of 128 User
objects (to filter subscriptions for them)

So the new query results in one SQL statement taking the database a
friction of a second
@coolo
Copy link
Member Author

coolo commented Dec 11, 2017

@evanrolfe if it helps your ego - the problem was only so severe because Max created 4000 requests from the very often watched openSUSE:Factory. Normally we don't have so many requests to or from it, that you would notice.

For the curious: this is how the SQL looks before and after:

  Project Load (806.0ms)  SELECT `projects`.* FROM `projects` INNER JOIN `watched_projects` ON `watched_projects`.`project_id` = `projects`.`id` INNER JOIN `users` ON `users`.`id` = `watched_projects`.`user_id` WHERE (projects.id not in ($SECRET)) AND `projects`.`name` = 'openSUSE:Leap:15.0'
  WatchedProject Load (3.8ms)  SELECT `watched_projects`.* FROM `watched_projects` WHERE `watched_projects`.`project_id` = 236751
  User Load (24.9ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 5 LIMIT 1
  User Load (9.2ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 23185 LIMIT 1
  User Load (1.3ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 47748 LIMIT 1
  WatchedProject Load (3.5ms)  SELECT `watched_projects`.* FROM `watched_projects` WHERE `watched_projects`.`project_id` = 236751
  User Load (2.5ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 5 LIMIT 1
  User Load (8.8ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 23185 LIMIT 1
  User Load (0.6ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 47748 LIMIT 1
  WatchedProject Load (5.1ms)  SELECT `watched_projects`.* FROM `watched_projects` WHERE `watched_projects`.`project_id` = 236751
  User Load (6.9ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 5 LIMIT 1
  User Load (3.5ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 23185 LIMIT 1
  User Load (14.3ms)  SELECT  `users`.* FROM `users` WHERE `users`.`id` = 47748 LIMIT 1
User Load (1602.8ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` IN (SELECT `watched_projects`.`user_id` FROM `watched_projects` WHERE `watched_projects`.`project_id` IN (SELECT `projects`.`id` FROM `projects` WHERE (projects.id not in ($SECRET)) AND `projects`.`name` = 'openSUSE:Leap:15.0'))

projects = Project.where(name: project_names).joins(watched_projects: :user)
projects.flat_map { |project| project.watched_projects.map(&:user) }
watched_projects = WatchedProject.where(project: Project.where(name: project_names))
User.where(id: watched_projects.select(:user_id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before we where returning an array, while now a relations is returned. Maybe we want to keep returning an array using find instead:

User.find(watched_projects.select(:user_id))

I also find it more readable. I do not know if using find instead of where can affect performance (either in a negative or a positive way).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO returning a relation instead of an array is an improvement since you can always call .to_a on a relataion but you cant really turn an array into a relation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't matter - the next call on the relation is .each - which will cast into an array. But when you return a relation the caller has the freedom to use in_batches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then let's merge this! 😉

@Ana06 Ana06 merged commit ccda057 into openSUSE:master Dec 11, 2017
@coolo coolo deleted the optimize_watcher_query branch December 11, 2017 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants