Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent crash with “failed to fill whole buffer” #991

Closed
SimonSapin opened this issue Mar 19, 2017 · 11 comments
Closed

Intermittent crash with “failed to fill whole buffer” #991

SimonSapin opened this issue Mar 19, 2017 · 11 comments

Comments

@SimonSapin
Copy link
Member

SimonSapin commented Mar 19, 2017

Many intermittent test failures on Servo’s CI have in comment this error message: servo/servo#14323.

The backtraces have webrender::render_backend::RenderBackend::run just before core::result::unwrap_failed. That method calls std::io::Read::read_exact which does produce “failed to fill whole buffer” errors when the input is shorter than expected. In this case the input is auxiliary_data, obtained from self.payload_rx.recv().unwrap(). With IPC enabled (which I think is the case for Servo), self.payload_rx is a ipc_channel::ipc::IpcBytesReceiver, whose recv method returns Result<Vec<u8>, _>. But since ipc_channel has no documentation, I don’t know what length that vector is supposed to have. Does recv return exactly one vector for each corresponding IpcBytesSender::send call, or is this a byte stream that can be fragmented at any point? Anyway, somehow this vector is sometimes shorter than WebRender expects.

@SimonSapin
Copy link
Member Author

CC @pcwalton for IpcBytesReceiver::recv semantics.

@pcwalton
Copy link
Contributor

It should be returning exactly one vetcor for every corresponding IpcBytesSender::send() call. If not, it's a bug in ipc-channel.

@pcwalton
Copy link
Contributor

Filed servo/ipc-channel#160.

@SimonSapin
Copy link
Member Author

This could be either that, or a bug in WebRender: the expected vector lengths are display_list_descriptor.size() and auxiliary_lists_descriptor.size(), and these descriptors are part of ApiMsg::SetRootDisplayList. Maybe the sender is buggy and sends a message that doesn’t match what is sent in the bytes channel?

@antrik
Copy link
Contributor

antrik commented Apr 1, 2017

A bit of a long shot: but is it possible by any chance that a single Renderer -- and thus a single auxillary_tx AIUI -- is used by multiple LayoutThreads?...

(That would be very bad, since it would mean the epoch is not unique I believe, thus causing all kinds of funny races...)

@jdm
Copy link
Member

jdm commented Apr 1, 2017

There is a single Renderer used by multiple LayoutThreads, yes. We create one Renderer per Browser (https://dxr.mozilla.org/servo/rev/449758ef5dd399f7e1a5a9550dcdd614056cee9e/components/servo/lib.rs#179-191) and clone the RenderApiSender value for each layout thread spawned from the same constellation (there's a single constellation per Browser).

@emilio
Copy link
Member

emilio commented Apr 2, 2017

So that seems quite likely, thanks for digging into it @antrik!

So mainly set_root_display_list is called from multiple layout threads, which seems dubious, and it sends two messages, which don't necessarily need to be ordered (someone could send a message in between).

Those two messages are the SetRootDisplayList message, and the Payload.

So apart from the epoch thing that @antrik pointed out, it means that things can get out of sync pretty easily...

It seems dubious to me that the layout thread in the first place is setting the root pipeline, given that pipeline doesn't necessarily need to be displayed...

@emilio
Copy link
Member

emilio commented Apr 2, 2017

So we could add an explicit synchronization API to WR, or (ideally) we could fix servo to actually only render the top-level page, and send the stuff to the render back-end from the constellation instead of the layout thread.

That means, also, that we could move the IPC boundary off-webrender I think (the renderer lives in the constellation process), which will simplify things also for Gecko, is that right @glennw?

@SimonSapin
Copy link
Member Author

CC @nical, it sounds like moving the IPC boundary would be relevant to you.

@emilio
Copy link
Member

emilio commented Apr 2, 2017

Actually as @glennw pointed out, we should probably just check the pipeline id + the epoch. Let's see if that fixes the intermittents, or we need to add more synchronization between those messages.

@emilio
Copy link
Member

emilio commented Apr 2, 2017

Relevant IRC discussion at http://logs.glob.uno/?c=mozilla%23servo&s=2+Apr+2017&e=2+Apr+2017#c643593.

Turns out we already have all that leftover_auxiliary_data to deal with the synchronization, the problem was exclusively the epoch mismatch problems @antrik pointed out.

I'm going to fix that for now, but we should also consider moving all the IPC complexity into Servo, and leave WR IPC-free. (I believe @staktrace could be happy about that too :P).

emilio added a commit to emilio/webrender that referenced this issue Apr 2, 2017
bors-servo pushed a commit that referenced this issue Apr 2, 2017
channel: Add the pipeline id to the payload message, and filter by it when processing auxiliary lists.

Fixes #991

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/1050)
<!-- Reviewable:end -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants