[WIP] Add native Windows back-end #166

Open. Wants to merge 109 commits into base: master

@antrik
Contributor

antrik commented Jun 2, 2017

This picks up from #108

For now, it's just commit refactoring (squashing/splitting) of the original PR -- though the intention is to add a series of follow-up commits next, doing some improvements to the actual code, according to my latest review of the original PR.

Since the implementation is essentially working at this point though, it could also be merged as is, if that turns out necessary.

@highfive

Collaborator

highfive commented Jun 2, 2017

Warning

  • These commits modify unsafe code. Please review it carefully!
@antrik

Contributor Author

antrik commented Jun 2, 2017

Note: the Linux test failure is a known intermittent in the test condition (happening under memory pressure), not in the actual ipc-channel implementation -- it is on my list to fix when I get around to it...

@antrik antrik force-pushed the antrik:windows branch from f5441b6 to ef7758d Jun 2, 2017

@antrik

Contributor Author

antrik commented Jun 2, 2017

Dropped truncate() calls from the newly introduced tests -- missed that during the rebase...

@bors-servo

Contributor

bors-servo commented Aug 22, 2017

☔️ The latest upstream changes (presumably #167) made this pull request unmergeable. Please resolve the merge conflicts.

@antrik antrik force-pushed the antrik:windows branch 2 times, most recently from 29a368b to 60919ae Aug 25, 2017

@bors-servo

Contributor

bors-servo commented Sep 6, 2017

☔️ The latest upstream changes (presumably #168) made this pull request unmergeable. Please resolve the merge conflicts.

@antrik antrik force-pushed the antrik:windows branch 4 times, most recently from e62d6d8 to 3134354 Sep 23, 2017

@nox

Member

nox commented Sep 24, 2017

What's the status on this?

@antrik

Contributor Author

antrik commented Sep 24, 2017

@nox every time I get to work on the issues I already know, I find other bits to improve as well... So it's hard to give an estimate. If I don't run into any serious problems, it could be done in a few days -- but I wouldn't count on that...

@antrik antrik force-pushed the antrik:windows branch 4 times, most recently from cbe49f7 to 4cda562 Sep 24, 2017

@nox

Member

nox commented Sep 24, 2017

No problem @antrik, was just doing a bit of triaging.

@antrik antrik force-pushed the antrik:windows branch 6 times, most recently from 82db31d to 99f2c1a Sep 24, 2017

antrik added some commits Apr 30, 2018

windows: cleanup: Streamline handling of reader list in `select()`
When receiving an event for a reader, instead of keeping it in the list,
and only removing it at the end if it got closed, we now remove it
immediately, only to add it back later when a new read is started
successfully.

This simplifies the code a bit; and it should be more robust too, since
a reader's presence in the list is now tied more closely to it having a
read in flight, i.e. being able to receive events.
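
The list handling described in this commit can be sketched roughly as follows. This is a simplified model and not the PR's code: `Reader`, its `closed` field, and `handle_event()` are hypothetical names, and `start_read()` here only simulates whether a new async read could be started.

```rust
/// Hypothetical stand-in for a reader tracked by the receiver set.
#[derive(Debug)]
struct Reader {
    entry_id: u64,
    closed: bool,
}

impl Reader {
    /// Simulated: a new read can only be started while the sender is still open.
    /// Returns `true` if a read is now in flight.
    fn start_read(&mut self) -> bool {
        !self.closed
    }
}

/// Handle one completion event for the reader identified by `entry_id`.
fn handle_event(readers: &mut Vec<Reader>, entry_id: u64, sender_closed: bool) {
    // Remove the reader right away: it no longer has a read in flight.
    let pos = readers
        .iter()
        .position(|r| r.entry_id == entry_id)
        .expect("event for unknown reader");
    let mut reader = readers.swap_remove(pos);
    reader.closed = sender_closed;

    // Only put it back once a new read has been started successfully.
    if reader.start_read() {
        readers.push(reader);
    }
}
```

With this shape, membership in `readers` always means "has a read in flight", which is the invariant the commit message is after.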

windows: Fix `unsafe` coverage in `OsIpcReceiverSet.select()`
Correct IOCP event handling is critical to the soundness of the
`select()` code: mixing it up might result in trying to access a buffer
and `OVERLAPPED` structure that the kernel is not actually done with
yet...

This means that all code on the path from receiving IOCP events, to
changing the status of the corresponding readers in
`notify_completion()`, needs to be considered unsafe.

Handling the results of `notify_completion()` and `start_read()` on the
other hand isn't unsafe at all, since neither affects the individual
readers' `read_in_progress` flags -- which have sole responsibility for
keeping track of kernel aliasing. The worst that could happen here is
trying to fetch messages from a reader which didn't actually receive
(which is benign), and/or losing track of which readers are still open.

windows: Rename `set_id` to `entry_id`
`set_id` kept misleading me, sounding like it's the ID of the set the
receiver is part of, rather than the ID of the receiver within the
set...

windows: cleanup: More idiomatic handling of `io_err`
Initialise using exhaustive conditionals, avoiding mutation.

windows: cleanup: Pass status as a proper `Result<>`
Rather than passing a naked error code (with a magic value denoting
success), use a standard `Result<>` enum.

windows: cleanup: `match` on errors in `notify_completion()`
Use exhaustive match in one place rather than scattered conditionals.
This should be much easier to follow.
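
The two cleanups above (passing the status as a `Result<>` and handling it in one exhaustive `match`) might look something like the following. This is only an illustrative sketch: the `WinError` tuple struct, `status_from_code()`, and the `Completion` enum are assumptions made for the example, not the actual types in the PR. The numeric values of `ERROR_SUCCESS` (0) and `ERROR_BROKEN_PIPE` (109) are the standard Windows codes.

```rust
/// Stand-in for the crate's error type; the real `WinError` is not shown here.
#[derive(Debug, PartialEq)]
struct WinError(u32);

const ERROR_SUCCESS: u32 = 0;
const ERROR_BROKEN_PIPE: u32 = 109;

/// Hypothetical helper: turn a raw status code into a `Result` near the call
/// site, so no "magic" success value travels through the rest of the code.
fn status_from_code(code: u32) -> Result<(), WinError> {
    if code == ERROR_SUCCESS {
        Ok(())
    } else {
        Err(WinError(code))
    }
}

/// Hypothetical outcome of a completion notification.
#[derive(Debug, PartialEq)]
enum Completion {
    Data,
    Closed,
}

/// All cases are handled in a single exhaustive `match`.
fn notify_completion(status: Result<(), WinError>) -> Result<Completion, WinError> {
    match status {
        Ok(()) => Ok(Completion::Data),
        // A broken pipe means the sender was closed; treated as a normal outcome here.
        Err(WinError(ERROR_BROKEN_PIPE)) => Ok(Completion::Closed),
        // Anything else is passed up to the caller.
        Err(other) => Err(other),
    }
}
```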

windows: More defensive `read_in_progress` handling
Set the flag before issuing the system call, and only reset it later if
it turns out no read was actually started.

This way it's less likely that setting the flag gets omitted by mistake,
which could have catastrophic effects, since setting this is crucial for
tracking the kernel aliasing of `ov` and `read_buf`.

windows: Make `MessageReader.ov` optional
This field is only meaningful while an async read is in progress with
the kernel; and we never reuse it between reads. Making this more
explicit and robust by only giving it a value while it's in use.

windows: Drop explicit `read_in_progress` flag
Since the `ov` field is now only set when we have a read in progress,
and thus effectively serves as an indicator of that state, the explicit
flag became redundant: we can just check the presence of `ov` instead.

(And since a check is performed implicitly when unwrapping the `ov`
value, some of the explicit asserts on `read_in_progress` can be dropped
entirely.)
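
A minimal model of the last three commits, assuming a placeholder `Overlapped` type instead of the real Windows `OVERLAPPED` structure and a simulated system call (the `started` parameter is hypothetical): the state is set before the call, reset if no read was started, and the `Option` itself replaces the separate flag.

```rust
/// Placeholder for the Windows `OVERLAPPED` structure (assumption for this sketch).
#[derive(Debug, Default)]
struct Overlapped;

#[derive(Debug, Default)]
struct MessageReader {
    /// `Some` only while an async read is in flight, so it doubles as the
    /// "read in progress" indicator.
    ov: Option<Box<Overlapped>>,
}

impl MessageReader {
    fn read_in_progress(&self) -> bool {
        self.ov.is_some()
    }

    /// `started` simulates whether the system call actually began a read.
    fn start_read(&mut self, started: bool) {
        assert!(self.ov.is_none(), "read already in progress");
        // Set the state *before* issuing the (simulated) system call...
        self.ov = Some(Box::new(Overlapped::default()));
        // ...and only reset it if it turns out no read was started.
        if !started {
            self.ov = None;
        }
    }

    /// Taking the value out both checks and clears the in-progress state.
    fn complete_read(&mut self) -> Box<Overlapped> {
        self.ov.take().expect("no read in progress")
    }
}
```

The `expect()` in `complete_read()` plays the role of the implicit check mentioned above: unwrapping the `ov` value fails loudly if no read was in progress.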

windows: Avoid leaking unsafety outside `unsafe` blocks
Introduce an `AliasedCell` wrapper type for the fields that get aliased
by the kernel during async read operations, making sure that these
values can only be accessed through special unsafe methods, i.e. only
inside `unsafe` blocks, even if the wrappers get passed around through
safe code.

This makes invoking `MessageReader.start_read()` safe; and consequently
removes all unsafety from `OsIpcReceiver`.

(`OsIpcReceiverSet.select()` retains some unsafe code though, since it
does raw I/O itself... This can only be fixed by factoring out that
code.)

windows: Put `ov` and `buf` in the same `AliasedCell<>`
Put both of the fields aliased by the kernel during an async read
operation together in a common `AliasedCell<>`. This increases
robustness, and further tightens unsafe boundaries, by making sure the
two fields stay consistent with each other.

When they are wrapped in `AliasedCell<>` separately, safe code is
prevented from giving any of them invalid values individually -- but
they could still get out of sync with each other, if some code moves
just one of them and not the other. Consistency between these fields
however is crucial for correctness as well as soundness.
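
To illustrate the idea behind the two `AliasedCell` commits, here is a heavily simplified sketch. It is not the PR's implementation: the method names `alias_mut()` and `into_inner()`, the `AsyncData` layout, and the placeholder `Overlapped` type are assumptions made for the example. In particular, this version does nothing special when the cell is dropped, which a real implementation would have to deal with while the kernel still holds an alias.

```rust
/// Simplified stand-in for the wrapper described above. In real code the type
/// would live in its own module, so that `value` is private to everything else.
struct AliasedCell<T> {
    value: T,
}

impl<T> AliasedCell<T> {
    /// Wrapping a value is safe: nothing aliases it yet.
    fn new(value: T) -> Self {
        AliasedCell { value }
    }

    /// # Safety
    /// The caller must make sure that using the returned reference does not
    /// conflict with any outstanding external alias (e.g. held by the kernel).
    unsafe fn alias_mut(&mut self) -> &mut T {
        &mut self.value
    }

    /// # Safety
    /// The caller must make sure all external aliases are gone.
    unsafe fn into_inner(self) -> T {
        self.value
    }
}

/// Placeholder for the Windows `OVERLAPPED` structure.
#[derive(Default)]
struct Overlapped;

/// Both aliased fields travel together, so they cannot get out of sync.
struct AsyncData {
    ov: Overlapped,
    buf: Vec<u8>,
}
```

Safe code can move an `AliasedCell<AsyncData>` around freely, but it can neither look inside nor replace just one of the two fields; every access has to go through an `unsafe` block.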

windows: Don't panic on unknown errors in `notify_completion()`
While this might have been a problem in the past, the current code
properly passes errors from `notify_completion()` up through all layers;
so there is nothing really preventing us from orderly returning any kind
of error reported by the `GetOverlappedResult()` or
`GetQueuedCompletionStatus()` system calls.

(The only problem would be if `GetOverlappedResult()` has failure modes
that actually leave the async operation in progress, and thus the async
data in use: in that case, unpacking the `AliasedCell<>` in async and
returning to the caller would be wrong... But if that's the case, the
previous behaviour of panicking after unpacking the `AliasedCell<>` was
just as wrong.)

Since returning arbitrary errors requires us to invoke
`WinError::from_system()` (either directly, or indirectly through
`WinError::last()`), and this one wants to know the origin of the error
(so it can put it in debug traces), we need to do these conversions near
the call sites, and pass an already converted error to
`notify_completion()`. (Which seems cleaner anyway.)

This has the side effect of also logging "broken pipe" (sender closed)
errors in the debug trace, which might be slightly redundant with any
other tracing done by the "closed" handling... I don't think that's
really a problem, though.

windows: refactor: Split out `OsIpcReceiverSet.fetch_iocp_result()`
Split out the system call and associated handling for getting completion
notifications on an IOCP (set) from the rest of the `select()` method,
similar to how `fetch_async_result()` encapsulates the event handling
for regular readers.

This will be necessary for proper `Drop` handling for
`OsIpcReceiverSet`.

Incidentally, this exactly covers the unsafe code section of the
`select()` implementation; and as such, it's a good step towards better
layering of the IOCP handling in general.

I guess it could be argued that a method call is also more readable than
assigning from a large anonymous block...

windows: Make `cancel_io()` sound
According to the documentation of `CancelIoEx()`, successful completion
of the `CancelIoEx()` call only indicates that the cancel request has
been successfully *queued* -- but it does *not* mean we can safely free
the aliased buffers yet! Rather, we have to wait for a notification
signalling the completion of the async operation itself.

We thus split out the actual `CancelIoEx()` call into a new
`issue_async_cancel()` method, and turn `cancel_io()` into a wrapper
that waits for the actual async read to conclude (using
`fetch_async_result()`) after issuing the cancel request.

Since that doesn't work on readers in a receiver set, we need to add an
explicit `Drop` implementation for `OsIpcReceiverSet`, which issues
cancel requests for all outstanding read operations, and then uses
`fetch_iocp_result()` to wait for all of them to conclude.
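
The "queue the cancel, then wait for the operation to actually finish" pattern can be modelled without any Windows API. The following is a simulation only: a worker thread stands in for the kernel, a channel stands in for the completion notification, and `PendingRead`, `start_read()`, and `cancel_all()` are hypothetical names. Only `issue_async_cancel()` and `cancel_io()` mirror method names from the commit message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Simulated in-flight async read.
struct PendingRead {
    cancel_requested: Arc<AtomicBool>,
    /// Delivers the buffer back once the "kernel" is really done with it.
    completion: Receiver<Vec<u8>>,
}

fn start_read(buf: Vec<u8>) -> PendingRead {
    let cancel_requested = Arc::new(AtomicBool::new(false));
    let flag = cancel_requested.clone();
    let (tx, rx) = channel();
    thread::spawn(move || {
        // The worker keeps using the buffer until it notices the cancel request.
        while !flag.load(Ordering::Acquire) {
            thread::sleep(Duration::from_millis(1));
        }
        let _ = tx.send(buf);
    });
    PendingRead { cancel_requested, completion: rx }
}

impl PendingRead {
    /// Only *queues* the cancel request; the buffer is still in use afterwards.
    fn issue_async_cancel(&self) {
        self.cancel_requested.store(true, Ordering::Release);
    }

    /// Queue the cancel, then wait for the operation itself to conclude
    /// before handing the buffer back.
    fn cancel_io(self) -> Vec<u8> {
        self.issue_async_cancel();
        self.completion.recv().expect("worker exited without completing")
    }
}

/// Analogue of the `Drop` logic for a set: request all cancels first,
/// then wait for every outstanding operation.
fn cancel_all(reads: Vec<PendingRead>) -> Vec<Vec<u8>> {
    for read in &reads {
        read.issue_async_cancel();
    }
    reads
        .into_iter()
        .map(|read| read.completion.recv().expect("worker exited without completing"))
        .collect()
}
```

In this model Rust's ownership rules already prevent freeing the buffer early; with raw buffers handed to the kernel there is no such safety net, which is why the explicit wait matters.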

windows: Move `handle` into `AsyncData` as well
For the duration of an async read operation, move the pipe handle into
the `AliasedCell` along with the other fields used for the async
operation.

This prevents anything else from messing with the pipe while the async
read is in progress; and makes sure the handle and the other fields can
never get mismatched. While I'm not sure whether there is any scenario
in which such a mismatch could result in undefined behaviour, it's good
for general robustness in any case.

[RemoveMe] Temporarily restore all CI targets using `unix` back-end
Increase chance of catching intermittent failures while working on other
stuff...

windows: Properly hide all `win32-trace` code behind conditionals
Make sure all code related to the `win32-trace` feature is compiled only
when the feature is enabled.

This avoids unnecessary bloat; as well as potential compile errors when
other code conditional on this feature is added.

windows: Introduce `MessageReader.get_raw_handle()` debug helper
Add a (conditional) helper method for obtaining the raw handle of the
reader -- which is often needed for the `win32_trace` invocations -- to
abstract the internal structure of this type, thus facilitating further
refactoring.

An alternate approach would be overriding the `Debug` trait on
`MessageReader`, to just print the raw handle value. That would provide
better encapsulation; however, it would also preclude the possibility of
easily printing all the constituents of the structure during
debugging...

(Or we could leave `Debug` alone, and instead implement it as `Display`
-- but that feels like an abuse of the `Display` facility... Not sure
what to think about that.)
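
As a rough sketch of the last two commits: tracing code and the helper it needs can both be gated on the same feature, so neither is compiled when the feature is off. The macro definition, the `usize` handle, and the trace format below are assumptions for illustration; the PR's actual `win32_trace` facility is not shown in this thread.

```rust
// With the feature enabled, tracing prints to stderr...
#[cfg(feature = "win32-trace")]
macro_rules! win32_trace {
    ($($arg:tt)*) => { eprintln!($($arg)*) };
}

// ...without it, the invocation (including its arguments) compiles to nothing.
#[cfg(not(feature = "win32-trace"))]
macro_rules! win32_trace {
    ($($arg:tt)*) => {};
}

/// Simplified reader; the real one keeps its handle in a more complex structure.
struct MessageReader {
    handle: usize,
}

impl MessageReader {
    /// Debug helper, only compiled together with the tracing code that uses it.
    #[cfg(feature = "win32-trace")]
    fn get_raw_handle(&self) -> usize {
        self.handle
    }

    fn start_read(&mut self) -> bool {
        // Trace call sites go through the helper rather than touching fields
        // directly, so the internal layout can change without affecting them.
        win32_trace!("[reader {:?}] start_read", self.get_raw_handle());
        true
    }
}
```

Because the arguments vanish along with the macro body when the feature is off, the call to `get_raw_handle()` does not need its own `#[cfg]` at each call site.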

@antrik antrik force-pushed the antrik:windows branch from 0681aa6 to fac85e5 Jun 8, 2018

@bors-servo

Contributor

bors-servo commented Aug 9, 2018

☔️ The latest upstream changes (presumably #183) made this pull request unmergeable. Please resolve the merge conflicts.

@ebkalderon


ebkalderon commented Sep 30, 2018

Anyone know the current status of this PR? Is no one working on it anymore?
