Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: check web_contents() for destroyed WebContents #27815

Merged
merged 1 commit into from
Mar 2, 2021

Conversation

zcbenz
Copy link
Member

@zcbenz zcbenz commented Feb 19, 2021

Description of Change

When Electron is quitting, it may happen that the content::WebContents is destroyed while electron::api::WebContents is still alive. This was causing a few CI flakes.

Checklist

Release Notes

Notes: none

@zcbenz zcbenz added semver/patch backwards-compatible bug fixes target/10-x-y labels Feb 19, 2021
@electron-cation electron-cation bot added the new-pr 🌱 PR opened in the last 24 hours label Feb 19, 2021
Copy link
Member

@nornagon nornagon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like this must be missing situations where we expect web_contents() to be non-null. There are 100+ references to web_contents() in this file alone.

Is there some way we could be more safe here?

@electron-cation electron-cation bot removed the new-pr 🌱 PR opened in the last 24 hours label Feb 20, 2021
@zcbenz
Copy link
Member Author

zcbenz commented Feb 22, 2021

it seems like this must be missing situations where we expect web_contents() to be non-null. There are 100+ references to web_contents() in this file alone.

Is there some way we could be more safe here?

The content::WebContents is only destroyed when the electron::api::WebContents is being destroyed, so for most cases web_contents() can not be null.

The only exception is when user decides to destroy the WebContents during an event, and the electron::api::WebContents continues to use web_contents() immediately after emitting the event, only in this case do we need to check if the WebContents has been destroyed. Fortunately there are only a few cases and we are already using the weak_this hack to do the check, this PR adds another check for a corner case on exit.

Ideally we should refactor the electron::api::WebContents class to eliminate the corner cases of lifetime at all, but I don't have a clear plan for now.

Copy link
Member

@nornagon nornagon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... this change makes web_contents() return nullptr before the api::WebContents is deleted, and there's an unspecified amount of time in between when web_contents() starts returning nullptr and when the api::WebContents is properly deleted. However, we have already called MarkDestroyed() so any entry point from JS will hit the IsDestroyed() check. Further, since we're no longer observing the WebContents, no WebContentsObserver methods will be called. That leaves only WebContentsDelegate methods which can be newly be called in this state when web_contents() can return null.

There are many many such methods and I suspect that we will introduce new bugs in future where we implement WebContentsDelegate methods expecting that web_contents() will be non-null, but it isn't always. To wit, here are a few that reference web_contents() that haven't been addressed in this PR:

  • WebContents::HandleContextMenu
  • WebContents::CloseContents
  • WebContents::OnGoToEntryOffset

I think this PR improves things a bit, but also sheds light on a bunch of problems that are already here.

@@ -1411,7 +1411,7 @@ void WebContents::RenderProcessGone(base::TerminationStatus status) {
Emit("crashed", status == base::TERMINATION_STATUS_PROCESS_WAS_KILLED);

// User might destroy WebContents in the crashed event.
if (!weak_this)
if (!weak_this || !web_contents())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this ever be called when web_contents() is nullptr? RenderProcessGone is a WebContentsObserver method, so if we've called Observe(nullptr) we shouldn't ever get an observer method call. If Emit("crashed") destroyed the WebContents, then the weak_this check will be enough to guard this.

@@ -1482,7 +1482,7 @@ void WebContents::DidFinishLoad(content::RenderFrameHost* render_frame_host,
// ⚠️WARNING!⚠️
// Emit() triggers JS which can call destroy() on |this|. It's not safe to
// assume that |this| points to valid memory at this point.
if (is_main_frame && weak_this)
if (is_main_frame && weak_this && web_contents())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too.

@zcbenz
Copy link
Member Author

zcbenz commented Feb 23, 2021

this change makes web_contents() return nullptr before the api::WebContents is deleted

The WebContentsDestroyed is called in the content::WebContents's destructor with lots of things already getting cleared, so by the time this method is called we should not be accessing web_contents() anymore. Making following web_contents() calls crash with null pointer error would be better leaving them in an undecided state that may or may not crash.

https://source.chromium.org/chromium/chromium/src/+/master:content/browser/web_contents/web_contents_impl.cc;l=1004;drc=5d4bf842d53c6c47f6c3656cea4a1d2964ac60d5;bpv=0;bpt=1

Can this ever be called when web_contents() is nullptr? RenderProcessGone is a WebContentsObserver method, so if we've called Observe(nullptr) we shouldn't ever get an observer method call. If Emit("crashed") destroyed the WebContents, then the weak_this check will be enough to guard this.

For most cases no, but there is an edge case for app.quit, as you can see in the test in this PR.

The call stack is something like this:

RenderProcessGone()
  => Emit("crashed")
    => Browser::Quit()
      => NativeWindow::Close()
        => BrowserWindow::OnWindowClosed()
          => api_web_contents_->DestroyWebContents(!Browser::Get()->is_shutting_down())
  => content::WebContents destroyed now

The content::WebContents is cleared by the DestroyWebContents method, usually it is destroyed asynchronously to avoid having null web_contents() when api::WebContents is alive, but when app is quitting it is destroyed synchronously and there is a short time when web_contents() is null while api::WebContents is not destroyed.

@zcbenz
Copy link
Member Author

zcbenz commented Feb 23, 2021

Hmmm after writing this down, I realized that the root cause for this crash, is that there is an edge case causing content::WebContents to be destroyed before api::WebContents, and we should fix the edge case instead of working around it.

I'll create another PR to cleanup the shutdown routine of WebContents to eliminate this kind of edge cases. But it will be a relatively large refactoring to core shutdown code, and I'm not confident backporting it would not cause other crashes in the stable branches. So I'm in favor of using this PR as a temporary fix to make CI happy, and I'll fix the root cause in the master branch.

@zcbenz
Copy link
Member Author

zcbenz commented Mar 2, 2021

@nornagon I have created #27920 to fix the root cause of the crash, by eliminating the cases that web_contents() could be null before api::WebContents is destroyed.

My plan is:

  1. Merge this PR and backport it to older branches. (This will fix the persistent crashReporter CI failure on 12).
  2. Revert this change in refactor: cleanup how WebContents is destroyed #27920, which would fix the root cause instead of working around the crash in this PR.
  3. Merge refactor: cleanup how WebContents is destroyed #27920 to master without backporting it, since it is a refactoring to shutdown code that may cause new crashes, or expose existing crashes that were hidden by the nondeterministic shutdown order.

Are you OK with it?

@nornagon
Copy link
Member

nornagon commented Mar 2, 2021

Hm, alright. Seems better than nothing probably. And any new crashes caused by this PR are ... probably rare?

@codebytere codebytere merged commit ede8611 into master Mar 2, 2021
@release-clerk
Copy link

release-clerk bot commented Mar 2, 2021

No Release Notes

@trop
Copy link
Contributor

trop bot commented Mar 2, 2021

I have automatically backported this PR to "11-x-y", please check out #27965

@trop
Copy link
Contributor

trop bot commented Mar 2, 2021

I have automatically backported this PR to "12-x-y", please check out #27966

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/patch backwards-compatible bug fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants