Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ScriptMemoryFailsafe use as_unsafe_cell #4692

Closed
jdm opened this issue Jan 20, 2015 · 36 comments
Closed

Make ScriptMemoryFailsafe use as_unsafe_cell #4692

jdm opened this issue Jan 20, 2015 · 36 comments

Comments

@jdm
Copy link
Member

@jdm jdm commented Jan 20, 2015

The ScriptMemoryFailsafe is supposed to be able to reclaim all JS-related memory in the event that a script task panics. This usually doesn't work, since it tries to borrow RefCell values that are often already borrowed when panics occur, so we end up aborting Servo. We can avoid this by calling as_unsafe_cell instead of borrow_mut in ScriptMemoryFailsafe's destructor; this is safe since we know that no other Rust code will be interacting with those values.

http://doc.servo.org/core/cell/struct.RefCell.html#method.as_unsafe_cell

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 25, 2015

@jdm Just trying to find an easy bug for my first servo fix. Not sure I understand this one. Should we call as_unsafe_cell instead of borrow_mut in the line below?
https://github.com/servo/servo/blob/master/components/script/script_task.rs#L249
How can I reproduce the original crash to verify that my patch fixes the issue?
Thanks! Looking forward to submitting my first PR!

@jdm
Copy link
Member Author

@jdm jdm commented Jan 25, 2015

Replacing both of the borrow_mut calls in that function is the intent here. As for reproducing the crash, I suggest adding a panic!() to the implementation of Console::Log and running an HTML page that contains a console.log calls; try running this in a build without your changes and take a look at the output - I'm pretty sure the output of a build with your changes should show different output that doesn't include anything about attempting to borrow values that are already borrowed.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 26, 2015

@jdm Thanks for your help. Still learning the ropes. This is my PR that just adds a panic! for console.log

#4721

I created a simple page that makes a console.log call:

https://dl.dropboxusercontent.com/u/4710867/v3brainstorming/template/index.html

I compile servo and run it:

./mach run https://dl.dropboxusercontent.com/u/4710867/v3brainstorming/template/index.html

This is the output I'm getting:

http://pastebin.mozilla.org/8369054

I don't see any reference to "borrow values that are already borrowed". Am I doing everything correctly?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 27, 2015

Yes, you are, and it appears I was mistaken, assuming that was the output of a build without the changes to ScriptMemoryFailsafe::drop.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 27, 2015

That being said, I think this is still a valuable change to make even if I'm having trouble coming up with reproducible ways of demonstrating its effectiveness.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 27, 2015

@jdm More noob questions. In this case:

https://github.com/servo/servo/blob/master/components/script/script_task.rs#L249

Can I call as_unsafe_cell directly? Do I have to implement a new method that calls as_unsafe_cell like this one? Can I just call one of the existing ones?

https://github.com/servo/servo/blob/master/components/script/dom/bindings/cell.rs#L32

@jdm
Copy link
Member Author

@jdm jdm commented Jan 27, 2015

Oh right, we have our wrapper type. You'll want to add a new method to it that calls through to as_unsafe_cell, like you proposed.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 27, 2015

@jdm. Thanks for your help. This is where I'm at:

https://github.com/dmarcos/servo/tree/issue4692

I added a method borrow_for_script_deallocation to DOMRefCell. I'm sure you have a better name for it. I'm not sure what to do with the assert.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 27, 2015

I think you can just assert SCRIPT; we don't have any guanratees about the state of the GC when we try to reclaim memory.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 27, 2015

Continuing from #4690, I just realized that the call to mut_js_info is going to bite us too :( We can either replace that with an equivalent method that calls through to as_unsafe_cell as well, or just wait for the fix in the Rust standard library that will allow borrowing already-borrowed RefCell values during unwinding.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 27, 2015

@jdm I let you make the call. It doesn't really matter to me. I just want to land my first patch on servo :). Do I keep working on this or I look for a new easy bug? I'm open to suggestions if you know of another easy one that might be interesting.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 28, 2015

I would really like to have Servo stop aborting whenever there's a panic in the script task. Let's add another method to Page that does the same as mut_js_info but uses as_unsafe_cell instead.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 28, 2015

@jdm Okay. Just to check I understood. Do I keep mut_js_info and add another method that calls as_unsafe_cell right? What would be a good name for the method?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 28, 2015

Let's just call it unsafe_mut_js_info for now.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 28, 2015

@jdm Ok. So I got a patch working:

https://github.com/dmarcos/servo/tree/issue4692

I spent some time making the compiler happy (any tricks to make compilation faster?). I'm still wrapping my head around type casting, memory allocation/deallocation and borrowing concepts in Rust. I spent too much time in JS land in the last couple of years :). I'm sure you can suggest a cleaner way to do what I did. Below the outputs with and without my patch when running my test page that calls windows.close() on loading:

https://dl.dropboxusercontent.com/u/4710867/v3brainstorming/template/index.html

There are no "DOMRefCell already mutably borrowed' messages anymore:

OUTPUT WITH PATCH

https://pastebin.mozilla.org/8407151

OUTPUT WITHOUT PATCH (master)

https://pastebin.mozilla.org/8407148

What do I do next to make my patch worth of being merged?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 29, 2015

Hmm, there are a couple things I would change there. First, I don't believe *(&mut page.unsafe_mut_js_info()) is doing what we want. unsafe_mut_js_info should probably be returning &'a mut Option<JSPageInfo> instead (and similarly borrow_for_script_deallocation). Also, be sure to indent the code inside the unsafe blocks by the full four spaces that the rest of the code uses. Finally, we shouldn't change the code in shut_down_layout. We don't have any guarantees about what other code may be borrowing those values when that method is executed, so it's not safe to sidestep the checks that complain about multiple borrows there.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

@jdm Thanks for helping with this.

dmarcos@1a6f486

  1. Fixed the indenting
  2. Changed unsafe_mut_js_info to return &'a mut Option
  3. Changed borrow_for_script_deallocation to return &'a mut T
  4. Removing changes from shut_down_layout.

I'm getting the following compilation error for borrow_for_script_deallocation:

dom/bindings/cell.rs:48:9: 48:44 error: cannot borrow immutable borrowed content as mutable
dom/bindings/cell.rs:48 &*self.value.as_unsafe_cell().get()

How should I turn the immutable return value into mutable? Just looked for examples in the servo code base but haven't found any.

@jdm
Copy link
Member Author

@jdm jdm commented Jan 29, 2015

You should use &mut *self.value.as_unsafe_cell().get() instead.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

@jsm Thanks!

It compiles now. After removing the changes in shut_down_layout I get one 'already mutably borrowed' error but I guess it's what you expect. One of the messages is gone, the other remains. This is the output with patch:

https://pastebin.mozilla.org/8410108

vs no patch (master):

https://pastebin.mozilla.org/8407148

What else do you want me to do?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 29, 2015

thread 'ScriptTask' has overflowed its stack - this message worries me. Can you run the test under a debugger and get the backtrace when this occurs?

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

@jdm Sure. I run this on lldb and this is the output I get:

https://pastebin.mozilla.org/8422656

It doesn't look meaningful to me. Should I be building servo on debug mode or something like this? How do I do it?

@jdm
Copy link
Member Author

@jdm jdm commented Jan 29, 2015

Is that the result of thread backtrace?

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

@jdm This is the result of thread backtrace:

https://pastebin.mozilla.org/8423377

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

This is repeated ad infinitum:

frame #14: 0x00000001018580a7 servo`panicking::panic_fmt::hd6c77548964d848fpfl + 167
frame #15: 0x0000000101f66288 servo`option::Option$LT$T$GT$::expect::_FILE_LINE::heee4863a37ed8f1dxoo + 24
frame #16: 0x000000010183656c servo`failure::on_fail::h518a2e1bba17c9c90Hz + 2572
frame #17: 0x000000010180581a servo`rt::unwind::begin_unwind_inner::h85e4f9f71209d373Qpz + 2282
frame #18: 0x0000000101805a14 servo`rt::unwind::begin_unwind_fmt::hf3d322d122a35172noz + 308
frame #19: 0x0000000101835aae servo`rust_begin_unwind + 94
@jdm
Copy link
Member Author

@jdm jdm commented Jan 29, 2015

Ok, what happens if you set a breakpoint on rust_panic? You can skip the first one (that's the borrow-related one), but the backtrace of subsequent ones should be interesting.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 29, 2015

@jdm This is what I'm getting (You can see the whole lldb sequence of commands):

https://pastebin.mozilla.org/8424649

I see DOMRefCell referenced there but still looks pretty cryptic to me.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm It seems that it gets stuck on Garbage Collection cycles. I don't understand how my changes are affecting garbage collection

@jdm
Copy link
Member Author

@jdm jdm commented Jan 30, 2015

I'm pretty sure you want continue instead of next, in order to skip over the first instance of the breakpoint. Right now I think we're just advancing to the next line and then looking at the backtrace.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm I'm building to run lldb one more time. Any tricks to make compilation faster? Any option you can recommend that I should use?

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm This is what I got:

https://pastebin.mozilla.org/8428521

It doesn't stop on the breakpoint a second time. After doing 'c' once I get the stack overflow

@jdm
Copy link
Member Author

@jdm jdm commented Jan 30, 2015

Huh, that's really quite peculiar. I'll clone your branch and see if I can reproduce.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm Yeah please go ahead. Let me know what you find. I'm just curious to understand what's going on besides looking forward to landing a first patch :)

This is the branch:

https://github.com/dmarcos/servo/tree/issue4692

@jdm
Copy link
Member Author

@jdm jdm commented Jan 30, 2015

I figured it out! The assertion in LiveDOMReference's destructor is failing (and this shows up in one of your previous logs), and this causes an endless chain of failure because the destructor attempts to keep executing over and over. You can just delete the whole destructor (dom/bindings/refcounted.rs), since the assertion is not actually particularly useful.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm Sweet! Just got rid of the destructor. I see different behavior though with and without the patch. Without the patch the program aborts and with it it doesn't seem to exit, I have to quit manually. Is this expected? Below the outputs.

Without the patch (master):

https://pastebin.mozilla.org/8429537

With the patch:

https://pastebin.mozilla.org/8429616

I pushed the changes to the branch:

https://github.com/dmarcos/servo/tree/issue4692

@jdm
Copy link
Member Author

@jdm jdm commented Jan 30, 2015

I noticed that too. It turns out it's because we route key events from the compositor to the current script task and back, to allow the focused element to consume events if necessary. In practice, this means Constellation::handle_key_msg runs and doesn't do anything, since current_frame returns None. What should be happening here is that the failed script task should be replaced automatically, but that seems to be broken. I'm fine with merging this change as-is and filing a followup to investigate why the failed task isn't getting replaced, and another followup to make handle_key_msg handle a missing script task better.

@dmarcos
Copy link
Contributor

@dmarcos dmarcos commented Jan 30, 2015

@jdm Okay! Here it's the PR. Thanks a lot for your help!

#4777

@jdm jdm closed this Feb 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.