Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perma-orange: browser_blockHPKP.js leaked about:newtab #2829

Closed
sarracini opened this issue Jul 7, 2017 · 15 comments

Comments

Projects
5 participants
@sarracini
Copy link
Contributor

commented Jul 7, 2017

For debug builds, which run leak checks:
LOG ERROR | TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_blockHPKP.js | leaked 2 window(s) until shutdown [url = about:newtab]

Linux: https://treeherder.mozilla.org/logviewer.html#?job_id=112104353&repo=pine&lineNumber=15758
Linux64: https://treeherder.mozilla.org/logviewer.html#?job_id=112105477&repo=pine&lineNumber=16160
OSX: https://treeherder.mozilla.org/logviewer.html#?job_id=112103112&repo=pine&lineNumber=16156

@dmose

This comment has been minimized.

Copy link
Member

commented Jul 7, 2017

Interestingly, this has occurred in m-c before: https://bugzilla.mozilla.org/show_bug.cgi?id=1281839.

@Mardak Mardak changed the title Intermittent failure browser_blockHPKP.js in Linux debug builds Intermittent failure browser_blockHPKP.js on debug builds Jul 7, 2017

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 7, 2017

@dmose I don't think that bug is related. It's an intermittent test failure for some test check vs here there's a leak.

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 7, 2017

This should be reproducible with this command: ./mach mochitest browser_blockHPKP.js --repeat=2

Note that it'll say the test Passed at the bottom, but if you scroll up a little bit, there'll be multiple lines of ERROR TEST-UNEXPECTED-FAILs

You can do debug artifact builds with this .mozconfig:

export MOZ_DEBUG="1"
ac_add_options --enable-debug 
ac_add_options --enable-artifact-builds

@Mardak Mardak changed the title Intermittent failure browser_blockHPKP.js on debug builds Intermittent failure browser_blockHPKP.js leaked about:newtab Jul 7, 2017

@Mardak Mardak added this to Debug in Pref-on Test Failures Jul 7, 2017

@Mardak Mardak added this to Debug in Pref-on Test Failures Jul 7, 2017

@Mardak Mardak added this to Debug in Pref-on Test Failures Jul 7, 2017

@Mardak Mardak added this to Debug in Pref-on Test Failures Jul 7, 2017

@Mardak Mardak added this to Debug in Pref-on Test Failures Jul 7, 2017

@Mardak Mardak changed the title Intermittent failure browser_blockHPKP.js leaked about:newtab Perma-orange: browser_blockHPKP.js leaked about:newtab Jul 11, 2017

@tspurway tspurway added the Critical label Jul 12, 2017

@tspurway tspurway self-assigned this Jul 12, 2017

@sarracini

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2017

After turning multiple things on an off in order to pinpoint what is causing this failure, here's what I found (with running the test locally with Ed's instructions):

  1. With Activity Stream enabled, running in the content process, and preloading about:newtab enabled, we leak 3 windows until shutdown with the url = about:newtab

  2. With Activity Stream enabled, running in the content process, and preloading about:newtab disabled, we leak 1 window until shutdown with the url = about:newtab

  3. With Activity Stream being enabled however it is a no-op system addon (as in all the code that initializes Activity Stream is deleted and system addon does nothing), running in the content process, an preloading about:newtab enabled, we leak 3 windows until shutdown with the url = about:newtab

  4. With Activity Stream enabled, running in the main process, and preloading about:newtab enabled, we do not leak any windows.

Other notes that were found:

  • disabling the pre-allocated content process manager makes no difference on this test
  • there is nothing really special about this test that would indicate that Activity Stream would behave differently here
  • the garbage collection and cycle collection logs that I collected with mconley didn't show anything interesting
  • the preloaded browser is indeed getting destroyed on shutdown (http://searchfox.org/mozilla-central/source/testing/mochitest/browser-test.js#612-617)
  • the activity stream page is indeed getting unloaded on shutdown for every time it get loaded

Normally a "leak until shutdown" means that the window goes away before we shutdown Firefox, but it doesn't go away after the test is over (and we garbage/cycle collect). This is usually caused by forgetting to unregister an observer or a listener. So I searched for things that could have been forgotten, in for example aboutNewtabService.js or AboutNewTab.jsm, and those were all dead ends.

The only thing that fixes the test is to run the new tab page in the main process. Which makes me believe that this test failure is exposing a bigger problem with these tests possibly not knowing how to handle content process shutdown for a preloaded browser, or something to that effect, however running it in the main process is no longer an option.

@sarracini

This comment has been minimized.

Copy link
Contributor Author

commented Jul 14, 2017

I tried to get some more cycle collection logs but they were pretty unhelpful, so I disabled this test and it did what I thought it would do, and just moved the leak to some other test in the suite. I tried commenting out this entire test and had it just open and close a newtab and it still leaks 3 windows until shutdown with url = about:newtab. I preff'ed off Activity Stream for that test and it worked, but have yet to run the entire suite - it's possible its still leaking other windows in other tests, but that may be the only thing left to do here if we don't move it back into the main process

@sarracini

This comment has been minimized.

Copy link
Contributor Author

commented Jul 14, 2017

Sadly, doing something like:

   Services.prefs.setBoolPref(khpkpPinninEnablePref, true);
   registerCleanupFunction(function() {
     Services.prefs.clearUserPref(kpkpEnforcementPref);
     Services.prefs.clearUserPref(khpkpPinninEnablePref);
     let uri = gIOService.newURI("https://" + kPinningDomain);
     gSSService.removeState(Ci.nsISiteSecurityService.HEADER_HPKP, uri, 0);
   });
-  whenNewTabLoaded(window, loadPinningPage);
+
+  // turn Activity Stream off with this test since they don't play nice together
+  // when the newtab page is being loaded in content
+  SpecialPowser.pushPrefEnv({
+      set: [["browser.newtabpage.activity-stream.enabled", false]]
+  }, whenNewTabLoaded(window, loadPinningPage));
 }

in the test alone also just causes the leak to go to some other test in the file - consistently the failure would look like:

ERROR TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_PageMetaData_pushstate.js | leaked 2 window(s) until shutdown [url = about:newtab]

Soooooo, what do we do now? @dmose @Mardak

@sarracini

This comment has been minimized.

Copy link
Contributor Author

commented Jul 14, 2017

@ncloudioj and I are going to use valgrind to debug more and track down this leak 😈

@dmose

This comment has been minimized.

Copy link
Member

commented Jul 14, 2017

Nice!

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 15, 2017

@Standard8 any ideas on how your fix for bug 1380570 made this leak go away? Thanks for fixing it as we were quite stumped on how to fix this issue other than making activity-stream about:newtab run in main process instead of child.

I bisected the new behavior to Bug 1380570 - browser_bookmark_titles.js fails with async Places Transactions turned on
https://bugzilla.mozilla.org/show_bug.cgi?id=1380570
Specifically https://hg.mozilla.org/mozilla-central/rev/e664a64cb729

I bisected with ./mach mochitest --setpref browser.newtabpage.activity-stream.enabled=true browser/base/content/test/general/browser_{blockHPKP,bookmark_titles}.js | grep UNEXPECTED-FAIL which consistently found a failure before the commit above.

Those aren't completely unrelated as they run almost one after the other:

browser/base/content/test/general/browser_blockHPKP.js
browser/base/content/test/general/browser_bookmark_popup.js
browser/base/content/test/general/browser_bookmark_titles.js

FYI, blockHPKP wouldn't leak by itself or if just with the tests before it. Only if it included bookmark_titles.

@Mardak Mardak closed this Jul 15, 2017

@Mardak Mardak moved this from Debug to Done in Pref-on Test Failures Jul 15, 2017

@Standard8

This comment has been minimized.

Copy link
Member

commented Jul 17, 2017

@Standard8 any ideas on how your fix for bug 1380570 made this leak go away? Thanks for fixing it as we were quite stumped on how to fix this issue other than making activity-stream about:newtab run in main process instead of child.

I bisected the new behavior to Bug 1380570 - browser_bookmark_titles.js fails with async Places Transactions turned on
https://bugzilla.mozilla.org/show_bug.cgi?id=1380570
Specifically https://hg.mozilla.org/mozilla-central/rev/e664a64cb729

@Mardak If it is that changeset, then my best guess would be due to the fact that I removed the browser.stop() call that stopped the initial load on a new tab - it didn't seem necessary for the test.

So my only thought is, are your tests updates causing the activity stream page to be loaded on a new tab? (I'm not 100% sure, but I think for most tests, they should be loading about:blank by default, not about:newtab).

Additionally, if activity stream is inserting itself into about:newtab, could it be the case that cancelling the load of about:newtab is causing it to leak?

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 17, 2017

Thanks for the suggestion. Turns out it wasn't the stop, so I started applying your patch line by line, and this is the line that avoids the leak:

-  PlacesUtils.bookmarks.removeItem(id);
+  await PlacesUtils.bookmarks.remove(await PlacesUtils.bookmarks.fetch({url: uri}));

I tested with keeping the original removeItem and just waiting longer than the ~100ms that fetch+remove was taking: await new Promise(resolve => setTimeout(resolve, 1000)); but that didn't help.

Strange stuff…

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 19, 2017

Oh. ha. ha. ha… The reason why the above diff fixes things is that it switches from the synchronous removeItem to an async fetch then async remove. Before @Standard8's change, activity stream is trying to access the bookmark that is immediately being removed, and activity stream throws an exception because the just added bookmark doesn't exist.

Why that exception results in a leak... not sure yet.

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 20, 2017

@k88hudson any ideas why having dispatch throw a plain exception on the content side might cause leaks? Note that eating the exception on the content side fixes this particular leak, but eating the exception on the main/jsm side doesn't prevent the leak.

diff --git a/browser/extensions/activity-stream/vendor/redux.js b/browser/extensions/activity-stream/vendor/redux.js
--- a/browser/extensions/activity-stream/vendor/redux.js
+++ b/browser/extensions/activity-stream/vendor/redux.js
@@ -330,6 +330,8 @@ function dispatch(action) {
            try {
              isDispatching = true;
              currentState = currentReducer(currentState, action);
+           } catch (e) {
+             throw e;
            } finally {
              isDispatching = false;
            }

This leaks and prints:

17 INFO Console message: [JavaScript Error: "action.data is null" {file: "resource://activity-stream/data/content/activity-stream.bundle.js" line: 705}]
61 ERROR TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_blockHPKP.js | leaked 2 window(s) until shutdown [url = about:newtab]

Whereas throwing a string (throw e + ""; or just throw "hello world";) doesn't leak and prints:

17 INFO Console message: [JavaScript Error: "TypeError: action.data is null" {file: "resource://gre/modules/RemotePageManager.jsm" line: 37}]

But if I throw an Error of that string, i.e., throw Error(e + "");, that leaks again printing:

17 INFO Console message: [JavaScript Error: "TypeError: action.data is null" {file: "resource://activity-stream/vendor/redux.js" line: 334}]
61 ERROR TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_blockHPKP.js | leaked 2 window(s) until shutdown [url = about:newtab]

The main difference is the Error objects has a context in the content space, and I suppose the Error object has references to other things from that scope. Although I did try throwing objects and functions, and there were no leaks.

When throwing the string, the line points to https://searchfox.org/mozilla-central/source/toolkit/modules/RemotePageManager.jsm#37 which blames @mikeconley. ;) But it's just a Cu.reportError(e);

@Mardak

This comment has been minimized.

Copy link
Member

commented Jul 20, 2017

https://searchfox.org/mozilla-central/source/toolkit/modules/RemotePageManager.jsm#37 which blames @mikeconley. ;) But it's just a Cu.reportError(e);

WELL… on that note. I tried removing that reportError. And tada! no leak (for this HPKP issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.