Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

top_contributors_questions DB queries are too expensive to include in questions_list view #3373

Closed
jgmize opened this issue Oct 17, 2018 · 1 comment

Comments

@jgmize
Copy link
Contributor

jgmize commented Oct 17, 2018

We have had repeated performance problems where the DB CPU gets stuck near 100% due to queries similar to the following:

SELECT COUNT(*) FROM 
(SELECT `auth_user`.`id` AS Col1, COUNT(`questions_answer`.`id`) AS `query_count` 
 FROM `auth_user` 
 INNER JOIN `questions_answer` ON 
 (`auth_user`.`id` = `questions_answer`.`creator_id` ) 
 WHERE (`questions_answer`.`id`) IN 
 (SELECT U0.`id` FROM `questions_answer` U0 INNER JOIN `questions_question` U1 
  ON ( U0.`question_id` = U1.`id` ) 
  INNER JOIN `products_product` U4 ON ( U1.`product_id` = U4.`id` ) 
  WHERE (NOT (U0.`is_spam` = %s) AND NOT (U1.`is_spam` = %s) AND NOT (U0.`creator_id` = (U1.`creator_id`)) AND U0.`created` >= %s AND U1.`locale` = %s AND U4.`slug` = %s)) 
 GROUP BY `auth_user`.`id` ORDER BY NULL)

While the CPU is spiking, I could see many, many examples of this query in a MySQL show processlist, and you can also see an example of this query (and the ~20 sec execution time) in NR

I believe this query is from top_contributors_questions utility function called by the question_list view. The results of this query are cached for 15 minutes, but when there is a cache miss the repeated attempts to update the cache (especially when particularly aggressive bots are scanning the site) cause the DB to max out the CPU and have required manual intervention to bring the site back to acceptable performance levels.

@pmac
Copy link
Member

pmac commented Oct 17, 2018

We had a great discussion about this and we think we have a good path forward. The deal is this:

The query above generates the "Top Contriburors" list on the questions list pages (there is one for every product, and one for all products). This feature is revealed by hovering over the "Top Contributors" button on the list page on the right side:

screen shot 2018-10-17 at 11 55 38

Due to the fact that this feature is hidden-by-default and is causing downtime and some significant developer resources we propose the following:

  1. Disable this feature for now. This is the quickest fix for the current issue and should not significantly impact the usability of the pages.
  2. Discuss this feature and decide whether to bring it back.
  3. If we would like to bring it back, discuss the best UI for it (perhaps it should be visible by default) and code it so that the result is always cached and pre-calculated using a daily cron job.

jpetto added a commit to jpetto/kitsune that referenced this issue Oct 17, 2018
…ons pages.

- Also remove top contributors list from mobile view.
@pmac pmac closed this as completed in f34e62a Oct 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants