
GCD based, Asynchronous Implementation of the API #96

Closed
wants to merge 17 commits into from

Conversation

@helje5 helje5 (Contributor) commented Nov 10, 2017

This PR contains a GCD DispatchIO-based implementation of the API, which should presumably scale significantly better than the synchronous one. Throughput should be somewhat worse, for obvious reasons.

I also setup a small test server demonstrating the API: http-testserver. Of special interest is the slow handler, which does the typical sleep() async demo.

Unlike Noze.io or Node.js, it does not use a single network processing queue, but rather a set of base queues, as suggested by Johannes. To support that, the handler API has been extended with a queue argument. All API calls of a handler need to be dispatched back to that queue (but the handler itself runs on the same queue, so the trivial cases require no extra work).

This implementation is supposed to properly support back pressure and pipelining.
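A rough sketch of what such a handler could look like. Note that the handler signature here is an assumption for illustration, not the PR's actual HTTPRequestHandler type; the point is the "hop back to the handler queue" pattern described above:

```swift
import Dispatch

// Hypothetical handler shape (the real HTTPRequestHandler in the PR
// differs): the server passes in the queue it invokes the handler on,
// and all response-writer calls must happen on that queue.
typealias SimpleHandler = (_ path: String,
                           _ queue: DispatchQueue,
                           _ writeResponse: @escaping (String) -> Void) -> Void

// The "slow" demo handler: do async work elsewhere, then dispatch
// back to the handler queue before touching the writer.
let slowHandler: SimpleHandler = { path, queue, writeResponse in
    DispatchQueue.global().asyncAfter(deadline: .now() + .milliseconds(10)) {
        // We are now off the handler queue, so hop back first.
        queue.async {
            writeResponse("Hello after sleep, \(path)!")
        }
    }
}
```

In the trivial case (handler responds synchronously) no extra dispatch is needed, since the handler already runs on its queue.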

Caveats:

  • tests don't work
  • little testing has been done, probably has tons of bugs ;-)
helje5 added 9 commits Nov 10, 2017
The little socket stuff we need we can do directly in the
code. No need to wrap such basics.

The StreamingParser was some combination of
response writer and parser. Those are really
distinct things (the writer doesn't really
need to know about the parser, and vice versa).
Similar to PR #86. In this case it is non-optional,
that thing being a concrete implementation.

All API functions need to run on the given queue. But
since the handler itself also runs on that queue, it
only needs to be done if the handler actually dispatches
to a different queue.

This is an optimization to avoid excessive dispatching
in async environments (if API calls could be issued
from arbitrary queues, they would always need to be
queued to the handler queue, which is possible but
pretty expensive).

P.S.: The argument is necessary because there is
nothing like `DispatchQueue.current` in GCD anymore.
This is a concrete implementation. Unless we do anything as
protocols, there is no need to have this as one.
HTTP server which uses GCD to handle async I/O.
... on Darwin. Use OSAtomicIncrement/Dec.
Presumably tests would need to be done differently in an async
setup. Not sure how much we could reuse here.
Import Dispatch explicitly, some SOCK_STREAM weirdness.
... debug log, remove.
@carlbrown carlbrown (Contributor) left a comment

I like this approach, but we at least need to get the tests running first before we merge it. I can help with that, if you think more than one of us can work on it without stepping all over each other.

@helje5 helje5 (Author, Contributor) commented Nov 10, 2017

It would be awesome if you would get the tests working again. I don't plan to work on such any time soon, so feel free to do anything you want!

Adjusting the tests might be challenging, because, well, they need to be asynchronous. For Noze.io I have a helper class which tries to deal with the asynchronicity, not sure whether it might help you or not (or even whether this is the correct way to test that stuff): NozeIOTestCase.swift

The other hurdle is that I made the HTTPResponseWriter concrete (i.e. it is not a protocol anymore). I suppose that makes some mocking hard. We could change it back to a protocol, but I think it was suggested that we should avoid them for performance reasons.

@carlbrown carlbrown mentioned this pull request Nov 10, 2017
@helje5 helje5 (Author, Contributor) commented Nov 10, 2017

BTW: I also still wonder whether we would want to allow multiple implementations, or whether that would just be bad for a basic core module. I can imagine that we might be able to use generics to efficiently do something like Apache MPMs, for example:

let asyncServer = HTTPServer<DispatchMPM>(port: ...)

and

let syncServer = HTTPServer<ThreadedMPM>(port: ...)

Both have their unique advantages and disadvantages. But maybe this would make it too complex, not sure.
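The MPM idea above could be sketched roughly as follows. All names here (MPM, DispatchMPM, ThreadedMPM, schedule) are illustrative inventions, not anything in the PR; the point is that making the concurrency model a generic parameter lets the compiler specialize the hot paths per model:

```swift
import Foundation
import Dispatch

// Hypothetical sketch: the concurrency model is a generic parameter
// of the server type, in the spirit of Apache's MPMs.
protocol MPM {
    init()
    // Run one unit of connection work under this model's strategy.
    func schedule(_ work: @escaping () -> Void)
}

struct DispatchMPM: MPM {
    let queue = DispatchQueue(label: "mpm.gcd", attributes: .concurrent)
    func schedule(_ work: @escaping () -> Void) { queue.async(execute: work) }
}

struct ThreadedMPM: MPM {
    func schedule(_ work: @escaping () -> Void) {
        let t = Thread { work() }   // classic one-thread-per-task model
        t.start()
    }
}

final class HTTPServer<Model: MPM> {
    let port: Int
    let mpm = Model()
    init(port: Int) { self.port = port }
}

let asyncServer = HTTPServer<DispatchMPM>(port: 8080)
let syncServer  = HTTPServer<ThreadedMPM>(port: 8081)
```

Whether the extra generic complexity pays off for a basic core module is exactly the open question raised above.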

@carlbrown carlbrown (Contributor) commented Nov 10, 2017

Adjusting the tests might be challenging, because, well, they need to be asynchronous. For Noze.io I have a helper class which tries to deal with the asynchronicity, not sure whether it might help you or not

Ok. Let me go to work on that, and we'll see how far I get.

//
// See http://swift.org/LICENSE.txt for license information
// Created by Helge Hess on 22.10.17.

@anayini (Contributor) commented Nov 10, 2017

We probably want to keep the swift.org license stuff, right?

@helje5 (Author, Contributor) commented Nov 11, 2017

Nah, I wrote it. It should have my name on it 😬 License is fine though.

@ogres commented Nov 16, 2017

This is probably not a good idea.
None of the other files in this repo have either this or a "Copyright © 2017 SomeCompany..." line.
Do you think each time someone changes this file, they should add "Changed by..." to it as well? We have the git history for that, so your work won't be forgotten :)


private var acceptCount = 0
private let handlerBaseQueues : [ DispatchQueue ] = {
var queues = [ DispatchQueue ]()

@anayini (Contributor) commented Nov 10, 2017

Nit: spacing; should be var queues = [DispatchQueue]()

@helje5 (Author, Contributor) commented Nov 11, 2017

Yeah, that is ugly. But feel free to run one of your style 'cleanups', I don't care.

@anayini (Contributor) commented Nov 11, 2017

👍 Sounds good

private let handlerBaseQueues : [ DispatchQueue ] = {
var queues = [ DispatchQueue ]()
for i in 0..<ProcessInfo.processInfo.processorCount {
let q = DispatchQueue(label: "de.zeezide.swift.server.http.handler\(i)")

@anayini (Contributor) commented Nov 10, 2017

Interesting: the idea here is that it's most efficient to have a queue per processor?

@helje5 (Author, Contributor) commented Nov 11, 2017

That is not an idea, that is a fact ;-) There is also activeProcessorCount, which might be more appropriate (is that ever different?)

All this really belongs in the Options; care to file an issue for this? E.g. the above is good for a server, but an on-device server may want to use only one thread.

@anayini (Contributor) commented Nov 11, 2017

Hm cool didn't know about that

Whereas the processorCount property reports the number of advertised processing cores, the activeProcessorCount property reflects the actual number of active processing cores on the system. There are a number of different factors that may cause a core to not be active, including boot arguments, thermal throttling, or a manufacturing defect.

I guess they can be different
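The two properties quoted above are both on ProcessInfo and are easy to compare directly; a minimal check:

```swift
import Foundation

// processorCount reports the advertised cores; activeProcessorCount
// reports the cores actually usable right now (it can be lower under
// thermal throttling, restrictive boot arguments, etc.).
let info = ProcessInfo.processInfo
print("processorCount:       \(info.processorCount)")
print("activeProcessorCount: \(info.activeProcessorCount)")
```

On most machines the two values match, which is why the difference is easy to overlook.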

@benasher44 commented Nov 13, 2017

Shouldn't this be the responsibility of libdispatch? Having 1 warm thread per processor is great, but GCD is (among other things) supposed to free us from having to worry about this detail (# of threads per process, and arguably # of active threads at all).

If we think it'll be commonly desirable to warm threads for this kind of application (a server written in Swift), can we move this logic behind a GCD API or have GCD (if it doesn't do this already) warm threads in a library constructor function? At the very least, I think we should open a ticket/issue and leave a TODO here to come back and clean this up.

@benasher44 commented Nov 13, 2017

DispatchQueue.concurrentPerform already leans on the internal function _dispatch_qos_max_parallelism to understand the capabilities of the system in terms of max parallelism, which contains some logic that is likely missed here (or at least repeated). I think that if we could figure out a way to get some help from GCD for this purpose, we'd end up with a better result (at least in terms of less repeated code, if not better performance).

@helje5 (Author, Contributor) commented Nov 30, 2017

more threads won't help us to be faster

I think keeping the base queues is worth it for the reverse reason. I also do on-device servers, and I would usually prefer to run those on fewer threads than the device has cores (one, or maybe three, on the iPhone Y). The same may be true on servers which are not dedicated to just one app.

What do you think?

@helje5 (Author, Contributor) commented Nov 30, 2017

Also usleep(10_000) vs foo(remaining: 10000) 🤔

@weissi commented Nov 30, 2017

@helje5 the usleep is just a crappy way to force it to not re-use the very same thread.

If we want to over-engineer it we could also just have a baseQueueProvider closure or something that either returns an appropriate base queue or if the user doesn't want that just nil. Because base queue nil means no target queue AFAIK.

@helje5 (Author, Contributor) commented Nov 30, 2017

Man, this was 10_000 vs 10000 ;-)

The closure doesn't sound too bad. Would be optional and part of the Options object passed in.
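The baseQueueProvider idea could be sketched like this. The ServerOptions shape and function names here are illustrative assumptions, not the PR's actual types; the key behavior is that a nil provider (or a provider returning nil) means no target queue, letting GCD decide:

```swift
import Dispatch

// Hypothetical Options shape: an optional provider returns a base
// (target) queue for a new connection, or nil to let GCD pick.
struct ServerOptions {
    var baseQueueProvider: (() -> DispatchQueue?)? = nil
}

func makeConnectionQueue(id: Int, options: ServerOptions) -> DispatchQueue {
    // A nil target queue leaves thread selection to GCD; otherwise the
    // connection's work funnels through the provided base queue.
    let target = options.baseQueueProvider.flatMap { $0() }
    return DispatchQueue(label: "http.conn.\(id)", target: target)
}

// Example provider: round-robin over a small fixed set of base queues.
let bases = (0..<2).map { DispatchQueue(label: "http.base.\($0)") }
var nextBase = 0
let options = ServerOptions(baseQueueProvider: {
    defer { nextBase = (nextBase + 1) % bases.count }
    return bases[nextBase]
})
```

This keeps the per-core policy out of the server core entirely: an on-device server can pass a provider that always returns the same single queue.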

@helje5 (Author, Contributor) commented Dec 7, 2017

I added the closure and dropped the specific algorithm.

for i in 0..<ProcessInfo.processInfo.processorCount {
let q = DispatchQueue(label: "de.zeezide.swift.server.http.handler\(i)")
queues.append(q)
q.async { _ = 46 + 2 } // warm up threads

@anayini (Contributor) commented Nov 10, 2017

What exactly happens in Dispatch when you do this? Is there like a fixed cost on a DispatchQueue during the first async to it?

@helje5 (Author, Contributor) commented Nov 11, 2017

See the thread with Johannes on the mailing list. My takeaway was that GCD only assigns threads on demand (which can take ~100ms on Linux, or something; I don't know).
But maybe this code is nonsense; a GCD expert should look over the whole codebase.

@weissi commented Nov 30, 2017

I did indeed say that and it is still the case (on Linux) for concurrent queues but not serial queues. Apologies.

private func handleListenEvent(on fd : Int32,
handler : @escaping HTTPRequestHandler,
localAddress : sockaddr_in)
{

@anayini (Contributor) commented Nov 10, 2017

In general I prefer:

private func handleListenEvent(
  on fd: Int32,
  handler: @escaping HTTPRequestHandler,
  localAddress: sockaddr_in
) {

which I think is common in many Swift style guides.

@helje5 (Author, Contributor) commented Nov 11, 2017

I don't, but I don't care much either.

@helje5 (Author, Contributor) commented Nov 13, 2017

Not that I care a lot but I only know this style guide: https://github.com/raywenderlich/swift-style-guide and that doesn't have it ...

guard !data.isEmpty else { return nil }
// TODO: reuse header values by matching the data via memcmp(), maybe first
// switch on length, compare c0
guard let s = String(data: data, encoding: .utf8) else { return nil }

@anayini (Contributor) commented Nov 10, 2017

Can put these both in one guard if you like

@helje5 (Author, Contributor) commented Nov 11, 2017

Why would I do that? Those are separate things; this one doesn't make sense to me.
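As an aside, the TODO in the excerpt above (reuse header values by matching the data via memcmp, switching on length first) might look roughly like this. The helper name and the candidate list are hypothetical, purely for illustration:

```swift
import Foundation

// Sketch of the TODO: compare raw bytes against well-known header
// values before paying for a full String decode. Filtering by length
// first keeps the number of byte comparisons small.
func internedHeaderValue(_ data: Data) -> String? {
    guard !data.isEmpty else { return nil }
    let known = ["close", "keep-alive", "chunked", "gzip"]
    for candidate in known where candidate.utf8.count == data.count {
        if data.elementsEqual(candidate.utf8) {
            return candidate        // reuse the constant, no new allocation
        }
    }
    return String(data: data, encoding: .utf8)  // slow path: decode
}
```

For frequently repeated values like "keep-alive" this avoids allocating a fresh String per request.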

internal class HTTPConnection : CustomStringConvertible {

/// The `HTTPServer` which created the connection. If the connection shuts
/// down, it will unregister from that server.

@anayini (Contributor) commented Nov 10, 2017

Nit: spacing / color violations

@helje5 (Author, Contributor) commented Nov 11, 2017

color violations??? :-)

2 spaces are right, has ARI approval: http://www.alwaysrightinstitute.com/swifter-space/

but feel free to change it.

private var writers = [ HTTPResponseWriter ]()
// FIXME: make this a linked list
internal init(fd : Int32,

@anayini (Contributor) commented Nov 10, 2017

Nit: usually use a newline before the first arg when splitting args across multiple lines

@helje5 (Author, Contributor) commented Nov 13, 2017

This actually sounds a little weird, is there a style guide or standard lib example which does it like this?

@anayini (Contributor) commented Nov 13, 2017

Hm - yeah maybe this is just a thing that we do in my company's internal style guide.

I actually don't even know, is there an Apple approved style guide? Couldn't find one with a quick search.

@helje5 (Author, Contributor) commented Nov 13, 2017

You should find out if you propose such changes ;->

@anayini (Contributor) commented Nov 13, 2017

I guess they do this only if the line would have been too long without, in which case they start the arguments on the next line.

I can go check to see if other projects are adhering to some style guide somewhere. Eventually we will probably run into a bunch of different tiny things like this.

@helje5 (Author, Contributor) commented Nov 13, 2017

This looks really ugly, but fair enough. On the pro side, it uses a proper 2-space indent :-> Maybe we should have an issue to track what style we want to use?

Just to be clear, I do care a lot about style (and know very well that mine is the only correct one), but I think that in a project like this it should just match the mainline. Please never argue with "we do it like this in my company's internal style guide" but with "Swift does it like this, check [here]".

@anayini (Contributor) commented Nov 13, 2017

👍 . Will file an issue.

@anayini (Contributor) commented Nov 13, 2017

I wasn't really arguing with "we do this in my company's internal style guide", only that I'm used to that style guide, so I made the mistake of marking this as wrong here.

writers.remove(at: idx)
}
else {
assert(false, "did not find writer which went done: \(self)")

@anayini (Contributor) commented Nov 10, 2017

Can use assertionFailure("did not find writer...") here

@helje5 (Author, Contributor) commented Nov 11, 2017

Don't care.

@helje5 (Author, Contributor) commented Nov 13, 2017

Because that got a thumb down, I guess I should clarify that what I meant is that I don't care whether or not we do this. I'm happy to do either :-) I don't care whether it is assert(false) or assertionFailure.

@helje5 helje5 (Author, Contributor) commented Nov 12, 2017

I enabled Issues on my fork https://github.com/ZeeZide/http/issues and marked those which may be necessary to merge this PR with a Required for PR label.

These tests pass on Linux if run one at a time,
but there's some issue so if you run them all,
the first test passes and the second one hangs. 

Still need to fix that.
@carlbrown carlbrown (Contributor) commented Nov 15, 2017

@helje5 I added a PR against your fork that has a batch of changes for getting tests to run on the Mac. (There will be another one once I find/fix a bug on Linux).

Gets the simple end-to-end tests running on Mac
@tanner0101 tanner0101 mentioned this pull request Dec 6, 2017
helje5 added 4 commits Dec 7, 2017
... this was utter nonsense, no idea why I did this :-)
The listen socket is already non-blocking and it already
deals w/ nio accept. 🤦‍♀️
Remove the specific algorithm for selecting a base
queue and use an optional closure to return one, as
suggested by Johannes.

By default this will not assign a target queue and
let GCD deal with this. See related thread on
PR #96.
@GeorgeLyon GeorgeLyon commented Dec 11, 2017

I don't want to derail this PR, but I think leaking the DispatchQueue the server uses into the handler is a mistake. Read and write operations on a descriptor don't need to be synchronized, as far as I am aware. An example where the server and handler synchronization primitives are separate can be found here: https://github.com/GeorgeLyon/Server

@helje5 helje5 (Author, Contributor) commented Dec 11, 2017

I don't want to derail this PR, but I think leaking the DispatchQueue the server uses into the handler is a mistake.

It is an optimisation. I don't really like it either, but the alternatives seem too expensive to me (require the response-writer to be free-threaded).

Read and write operations on a descriptor don't need to be synchronized as far as I am aware

Yeah, but I don't see how that helps us in any way. The goal here is to allow for asynchronous handlers which support back pressure. That implies more book-keeping. Which needs to be synchronised.

P.S.: Please let's continue the discussion on-list.
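The book-keeping mentioned above can be illustrated with a toy model (none of these types are the PR's actual ones): the connection counts in-flight writes and suspends its read side past a high-water mark, resuming once the writer drains.

```swift
import Dispatch

// Toy back-pressure bookkeeping, names purely illustrative.
final class ToyConnection {
    private let lock = DispatchQueue(label: "conn.state")  // serializes state
    private var pendingWrites = 0
    private(set) var readingSuspended = false
    private let highWaterMark = 4

    func didQueueWrite() {
        lock.sync {
            pendingWrites += 1
            if pendingWrites >= highWaterMark && !readingSuspended {
                readingSuspended = true   // would call readSource.suspend()
            }
        }
    }

    func didCompleteWrite() {
        lock.sync {
            pendingWrites -= 1
            if pendingWrites == 0 && readingSuspended {
                readingSuspended = false  // would call readSource.resume()
            }
        }
    }
}
```

This is the part that needs synchronization even though individual read/write syscalls on the descriptor do not: the suspend/resume decision depends on shared mutable state.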

helje5 added 2 commits Dec 18, 2017
The connection doesn't need to know about the
server anymore. Instead we pass in a closure to
be called when the connection tears down.

Note: there is an intentional cycle between
server and connection. It will be broken when
the connection is done.
... a little better I guess.
@helje5 helje5 closed this Mar 1, 2018