This repository was archived by the owner on Aug 26, 2022. It is now read-only.

Bug 1190171 - Revision content denormalization and other cleanups of the compare revision view.#3386

Merged
groovecoder merged 6 commits into master from bug1190171 on Aug 10, 2015
Conversation

@jezdez (Contributor) commented Aug 4, 2015

  • Adds ratelimiting to the compare revision view.
  • Refactors the signal handling to use new patterns and less confusing code.
  • Prefetch revision tags and document for fewer queries.
  • Denormalize the revision content in a separate field that is filled on save and uses tidy to do the CPU-intense work once instead of on every request.
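The denormalization in the last bullet can be sketched as follows. This is plain Python, not the actual Kuma model, and the whitespace-normalizing `tidy` stand-in replaces the real tidy library; it only illustrates paying the CPU cost once at save time instead of on every request:

```python
class Revision:
    """Illustrative sketch of the denormalization pattern (hypothetical names)."""

    def __init__(self, content):
        self.content = content      # raw submitted content, preserved as-is
        self.tidied_content = ""    # denormalized field, filled on save

    def save(self):
        # In the PR this happens via a post_save signal and a Celery task
        # running tidy; here a trivial whitespace normalizer stands in.
        self.tidied_content = " ".join(self.content.split())

    def get_tidied_content(self):
        # Serve the precomputed value; no per-request tidy work.
        return self.tidied_content
```

The raw `content` field stays untouched, so the original submission is never lost to the cleanup step.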

@jezdez (Contributor, Author) commented Aug 4, 2015

So this does a bunch of things, which we don't have to do all at once, but I wanted to offer it anyway, since I think we shouldn't run tidy on every request to that view but should keep the tidied revision content on file instead.

The other changes are probably less troubling.

Contributor:

Ah, this is a good way to preserve the submitted data (in the content field) and store the tidied content. But should we populate the tidied_content field when it's accessed via get_tidied_content() or should we just populate it when the Revision is saved?


Contributor Author:

The idea is to not have to go through all the hundreds of thousands of revisions at once but to spread the work out a bit. Updating the tidied content on access is only useful for old revisions; the post_save hook is for all new revisions.
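The two population paths described above can be sketched like this (hypothetical function names and dict-based revisions, not the actual Kuma code):

```python
def tidy(html):
    # Stand-in for the real tidy call; the real one is CPU-intensive.
    return " ".join(html.split())

def on_post_save(revision):
    # Runs for every newly saved revision, so new content is always tidied.
    revision["tidied_content"] = tidy(revision["content"])

def get_tidied_content(revision):
    # Old revisions saved before the field existed are backfilled lazily,
    # on first access, instead of migrating all of them in one pass.
    if not revision["tidied_content"]:
        revision["tidied_content"] = tidy(revision["content"])
    return revision["tidied_content"]
```

New revisions never hit the lazy branch; only pre-existing rows pay the one-time backfill cost.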

Contributor:

Doh, sorry I overlooked that; I jumped straight to this models.py diff to see what you did here.

@groovecoder (Contributor)

❤️ using Django 1.7 app system.

@jezdez jezdez force-pushed the bug1190171 branch 2 times, most recently from 880d013 to ecd578f on August 6, 2015 07:07
@jezdez jezdez removed the not ready label Aug 6, 2015
@groovecoder (Contributor)

Needs a rebase now.

@groovecoder groovecoder assigned jezdez and unassigned groovecoder Aug 6, 2015
jezdez added 5 commits August 6, 2015 20:41
This makes use of Django's new app loading classes, which are guaranteed to connect the signals at startup; this is the recommended way of doing it.

This also moves the search index config into the wiki app.
That reduces the number of queries being done when rendering the view.
@groovecoder (Contributor)

Intern tests pass. Spot-check on get_tidied_content() from a django shell works as expected - empty at first, triggers a celery task, then it's populated next time. Saving a new revision populates tidied_content before get_tidied_content is called.

Contributor:

This is still 60/m?

@jezdez (Contributor, Author) commented Aug 7, 2015

I actually decided to increase the rate limit of the tidy_revision_content Celery task to 120/m. The reasoning is simple: we have 550k revisions right now, and with a rate limit of 15/m we'd wait 12 days just to run the tasks. On the other hand, MySQL currently handles UPDATE queries very fast and didn't come anywhere close to a performance ceiling when I looked at last week's stats. The Celery nodes can also absorb the CPU-intensive tidying more easily since we can spread it across their collective CPUs; they aren't fully utilized right now either.

That said, I agree that there is a big risk in scheduling Celery tasks on every call to get_tidied_content: assuming we create a big stack of tasks through normal site browsing, we're guaranteed to create duplicates. That's why I've added a simple cache-based check to the get_tidied_content function that prevents duplicate scheduling. It's set to time out after 3 days, after which a call to get_tidied_content would again schedule a task. The calculation for that timeout is (546235 / 120) / 60 / 24 = 3.1 days, the number of days it takes in theory to run through all revisions.
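The timeout arithmetic and the duplicate-scheduling guard can be sketched as follows. A plain dict stands in for Django's cache, and the key format and function names are illustrative, not the actual Kuma code:

```python
# Back-of-the-envelope from the comment above: at 120 tasks/minute,
# how long to work through ~546k revisions?
REVISIONS = 546_235
RATE_PER_MINUTE = 120
days = REVISIONS / RATE_PER_MINUTE / 60 / 24  # roughly 3.16 days

# Duplicate-scheduling guard: remember which revisions already have a
# pending task so browsing the same page twice doesn't enqueue it twice.
cache = {}        # real code would use Django's cache with a ~3-day TTL
scheduled = []    # stand-in for tidy_revision_content.delay(revision_id)

def schedule_tidy_task(revision_id):
    key = "revision-tidy-scheduled-%s" % revision_id
    if key in cache:
        return False             # a task is already pending for this revision
    cache[key] = True            # real code: cache.set(key, True, 3 * 24 * 3600)
    scheduled.append(revision_id)
    return True
```

The TTL is deliberately a bit longer than the full-sweep time, so a revision can only be re-enqueued after the backlog has plausibly been drained once.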

This will be filled on post_save via a Celery task, and occasionally on demand for older revisions (also via Celery). The task is rate-limited so that it can get through the ~550k revisions rather quickly.
@jezdez jezdez assigned groovecoder and unassigned jezdez Aug 7, 2015
@groovecoder (Contributor)

Note: It's set to 120/m, which I assume is correct. (Not 120/s as written in the comment. 😉)

@jezdez (Contributor, Author) commented Aug 10, 2015

@groovecoder woops, good catch!

Edit: Fixed.

groovecoder added a commit that referenced this pull request Aug 10, 2015
Bug 1190171 - Revision content denormalization and other cleanups of the compare revision view.
@groovecoder groovecoder merged commit 9120a26 into master Aug 10, 2015
@jezdez jezdez deleted the bug1190171 branch August 11, 2015 09:33