Skip to content
This repository has been archived by the owner. It is now read-only.

Run talos tests with Activity Stream enabled/disabled #2006

Closed
Mardak opened this issue Jan 18, 2017 · 15 comments
Closed

Run talos tests with Activity Stream enabled/disabled #2006

Mardak opened this issue Jan 18, 2017 · 15 comments

Comments

@Mardak
Copy link
Member

@Mardak Mardak commented Jan 18, 2017

Split from #1977. We'll want to do some runs to see how Firefox performance is impacted both enabled and disabled to get a sense of what issues we'll need to fix/optimize or defer.

http://trychooser.pub.build.mozilla.org/

@sarracini
Copy link
Contributor

@sarracini sarracini commented Feb 8, 2017

How to make sense of Talos: https://wiki.mozilla.org/Buildbot/Talos/Tests

@sarracini sarracini moved this from Unassigned to Assigned in Land in Nightly / Graduate Feb 8, 2017
@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 13, 2017

Tracking some issues from running talos + mochitests:
https://public.etherpad-mozilla.org/p/as-mc-failures

@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 15, 2017

Below is a table of talos results (osx-10-10 opt) with the activity stream add-on mostly disabled (returns early from main but sdk still loaded and required some stuff) vs enabled. There's some noise as these are just single runs, but some of the deltas (between disabled/enabled) are quite significant that should be focused on assuming we treat disabled as baseline to somewhat isolate regressions from the SDK.

description mostly disabled enabled delta
try commit b0ed23505e69 d9d5a3f33d9b
perfherder link 21 important changes 22 important changes
tsvg_static regress 17.86% regress 56.51% 38.65%
tsvg_static e10s regress 4.81% regress 25.13% 20.32%
tsvgr_opacity improve -4.27% regress 9.85% 14.12%
tp5o regress 7.17% regress 19.02% 11.85%
tabpaint regress 15.45% regress 25.99% 10.54%
tp5o e10s regress 3.01% regress 11.41% 8.40%
tsvgx regress 4.33% regress 12.59% 8.26%
tart regress 4.26% regress 9.49% 5.23%
tsvgr_opacity e10s regress 2.55% regress 6.10% 3.55%
tpaint regress 6.66% regress 9.45% 2.79%
tp5o Main_RSS regress 3.57% regress 5.92% 2.35%
tart e10s regress 14.83% regress 16.57% 1.74%
damp regress 8.14% regress 9.49% 1.35%
cart regress 2.86% regress 4.19% 1.33%
ts_paint regress 5.87% regress 6.95% 1.08%
sessionrestore e10s regress 2.70% regress 3.68% 0.98%
tpaint e10s regress 13.62% regress 14.05% 0.43%
ts_paint e10s regress 5.89% regress 6.30% 0.41%
dromaeo_css regress -2.09% regress -2.23% 0.14%
tps regress 12.07% regress 11.88% -0.19%
sessionrestore_no_auto_restore regress 2.10% regress 1.43% -0.67%
tp5o_scroll e10s improve -3.00% improve -2.33% -0.67%
sessionrestore_no_auto_restore e10s regress 4.63% regress 2.76% -1.87%
damp e10s regress 11.82% regress 8.75% -3.07%
tps e10s regress 4.95% regress 0.30% -4.65%

There's a few svg related regressions, and perhaps activity stream is doing more work than necessary processing the page for metadata? A bunch of tests are loading pages in particular tp5o for top sites page loading, so we might want to look into ways to delay work a bit after load? And there's tpaint/tart/tabpaint related tests of opening a new window or tab that are affected by our lack of preloading.

@jmaher / @mikeconley anything else we should focus on first (probably in parallel with #1999 removing SDK)?

@jmaher
Copy link

@jmaher jmaher commented Feb 15, 2017

I did many retriggers on the base and try pushes, I believe what you have listed is pretty accurate, you can always revisit the perfherder compare links to see the updated data :)

My familiarity with activity stream and the work here is close to nil, but that doesn't mean it should remain there. Since almost all tests (not graphics related ones) are showing a regression, that is cause for concern- possibly we just focus on tpaint (as that might help the startup tests in general), tabpaint, and svgx- either way all need some love.

There is --spsProfiling available on try, that is useful to see where time is spent- delaying initialization/loading/work is helpful, and if there are changes to talos (preferences, delaying load, more initialization on the first load), we should consider that as well.

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Feb 16, 2017

I ran the worst offender on this list (tsvg_static, e10s disabled) locally, with the Activity Stream code both applied and unapplied.

I was able to reproduce the regression, which is good. After I applied the fix for bug 1340267, I was able to get some profiles from the test runs.

I arbitrarily picked the "gearflowers" subtest from tsvg_static. Here are the profiles:

With Activity Streams activated

Without Activity Streams

I'll now attempt an analysis. Gimme a few moments.

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Feb 16, 2017

So the regression in here doesn't seem to be called by any one thing, but I definitely see Activity Streams taking up time doing a few things, spread out over the test run

  1. It's spending time attempting to parse the .SVG and extra metadata out of it. The SVG file, gearflowers.svg, is intentionally complex, and is a large document. I would imagine your meta-data getting stuff is getting bogged down by that.

  2. In Talos, the browser starts up with a profile that has very little history. Ursula has told me that as frecency changes, Activity Streams will aggressively query Places to determine how best to populate the Activity Streams page. This appears to be happening during this test, as we're visiting this SVG page for the first time in a profile mostly devoid of history.

I would be very interesting to see what the delta tends to be between this time and this time. I may be missing something, but it looks as if the meta data parser is operating in the main thread. If that's the case... I think you're going to have a really bad time there, especially for large documents. Even if the parser is running in the parent process (which it might be? I think I see it being messaged up), then sending up the string representation of a large document is still going to come with a cost.

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Feb 16, 2017

@sarracini mentioned that there's some work going on to add a metadata parsing capability to Places that could be used instead. Do we know what that bug is?

@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 16, 2017

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Feb 16, 2017

I see. Is the parser going to be running on the main thread?

@mikeconley
Copy link
Contributor

@mikeconley mikeconley commented Feb 16, 2017

Perhaps I should direct that at @jaredkerim

@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 22, 2017

I pushed a try version that pointed package.json main option to an empty.js:

try: https://hg.mozilla.org/try/rev/5442a42878e4384952f7871a123d56a8d6baa6b5#l4.12

perfherder: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=25a94c1047e7&newProject=try&newRevision=5442a42878e4384952f7871a123d56a8d6baa6b5&framework=1

This means the add-on is basically just running the SDK's bootstrap.js, processing package.json, and loading empty.js. Below are the perfherder "important" regressions for osx (with 7 runs each):

test regression
ts_paint 7.00%
ts_paint e10s 6.14%
damp 5.49%
tabpaint 5.02%
sessionrestore e10s 4.87%
sessionrestore_no_auto_restore e10s 3.25%
sessionrestore_no_auto_restore 2.99%
sessionrestore 2.66%
tsvgr_opacity e10s 2.46%

So it looks like the earlier attempt to return early in main.js still had some additional regressions probably from the top-level requires that additionally loaded more files, e.g., TabTracker -> sdk/tabs, that might have contributed to the big tsvg_static regressions. At least it seems like getting rid of the SDK bootstrap.js/loader will get rid of the ts_paint and some other regressions… hopefully…

@jaredlockhart
Copy link
Contributor

@jaredlockhart jaredlockhart commented Feb 23, 2017

@Mardak Please run a TALOS run with a build of activity stream with all features turned on that includes this commit on top:

JaredLockhart-Mozilla@a445b3d

This will move metadata parsing out of the main thread and into the content threads, as per feedback from multiple Firefox developers. Let's see if and how this affects our TALOS performance. Please post the results in here so we can compare them in Perfherder.

@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 24, 2017

@jaredkerim Here's the results relative to activity stream enabled:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d9d5a3f33d9b&newProject=try&newRevision=5e7eca743493

And relative to mozilla-central:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=25a94c1047e7&newProject=try&newRevision=5e7eca743493

Plain activity-stream enabled has tsvg_static 56% regression while the additional commit has 62% regression.

@jaredlockhart
Copy link
Contributor

@jaredlockhart jaredlockhart commented Feb 24, 2017

@Mardak Thanks so much! I expected that this would increase the regression, I think that TALOS is optimized to monitor content process heavy workloads, not necessarily main process workloads. I think we will need to develop a TALOS test which is specifically designed to monitor the impact of activity stream.

Can you please run two more for me if you have a chance:

  1. Disable all feed processing in the main thread JaredLockhart-Mozilla@92f1bc3

  2. Disable the local metadata parser JaredLockhart-Mozilla@14feb1b

I want to know which one of those makes a bigger impact on performance. If it's true that TALOS is weighted more heavily to towards content space processing, then #2 should show a lower regression than #1 I suspect.

Thanks!

@Mardak
Copy link
Member Author

@Mardak Mardak commented Feb 25, 2017

Here's results for talos-no-feeds:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c7759c975fa3155bb78302e908261a710417849
Relative to enabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d9d5a3f33d9b&newProject=try&newRevision=0c7759c975fa3155bb78302e908261a710417849&framework=1&filter=osx

osx test improvement
tsvg_static 17.24%
tsvg_static e10s 13.94%
tp5o 5.75%
tp5o e10s 5.34%
tsvgx 4.83%
tsvgr_opacity 2.29%

And similarly for talos-no-framescript:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5efbfcc6b2869d5854a25afc1a309866573f5ee1
Relative to enabled: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=d9d5a3f33d9b&newProject=try&newRevision=5efbfcc6b2869d5854a25afc1a309866573f5ee1&framework=1&filter=osx

osx test improvement
tabpaint 7.68%
tart 7.08%
tp5o e10s 2.94%
tsvgr_opacity 2.25%

There's various changes only on linux64, e.g., a11yr 2.22% improvement yet tp5o Private Bytes e10s 4.73% regression. You can unfilter from the perfherder links.

@Mardak Mardak closed this Feb 25, 2017
@sarracini sarracini moved this from Assigned to Complete in Land in Nightly / Graduate Feb 27, 2017
@sarracini sarracini moved this from Active to Completed in MVP Included Feb 27, 2017
@k88hudson k88hudson mentioned this issue Feb 27, 2017
2 of 2 tasks complete
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants