Java Language Bindings #5

Closed
benjchristensen opened this issue Feb 27, 2015 · 12 comments

@benjchristensen
Contributor

Another aspect of Reactive Streams IO that interests me is having language specific APIs as bindings on top of the network protocol and using Reactive Streams Publisher to expose the streams.

Specifically, I plan on defining RS interfaces in Java for client/server functionality over HTTP/1.1, SSE, WebSockets, HTTP/2, and TCP, as well as the RS.io network protocol we are now defining.

The intent would be for common RS.io interfaces to be used to implement the networking library (on top of various transport implementations such as Netty) and then allow various RS implementations like Akka, Rx, Reactor, etc to layer on top. The hope is for collaboration to enable solid, efficient, battle-tested reactive stream oriented networking APIs that can fit into our various opinionated apps.

[Image: screen shot 2015-02-26 at 11 55 33 pm]

The reason I'm seeking this is that we all seem to keep reinventing the wheel, defining APIs for input/output on protocols that don't change. I'm wondering whether, by putting our heads together, we can do better and get a foundation we can work on together while allowing the exploration and opinions to happen a layer above. Our collective move to RS seems like a good chance to work together before we all go and build our different-circumference wheels that all roll the same way.

Is this something that is too opinionated for RS, or would interfaces of this sort be useful across multiple libraries?

If it is of interest, should they be put in another repo such as reactive-streams-io-jvm or as a sub-module of reactive-streams-io?

UPDATE: This original posting was not very clear so I have clarified below: #5 (comment)

@danarmak

If I understand correctly, you're suggesting defining Java interfaces that would describe the RS.io protocol-level messaging, and writing a standard implementation for bridging these RS.io interfaces with standard RS. And leave it to implementers to write different bridges from the RS.io interfaces to the actual transports: one would implement RS.io on top of Netty and TCP, another on top of akka and http, etc.

What I'm afraid of is that different transport libraries (netty, akka-http, NIO sockets / websockets, maybe more exotic things) have different models of asynchronous communications, and perhaps of identifying and connecting to remote parties. And a single set of RS.io interfaces might make it inconvenient or inefficient to bridge some of them. I don't have enough knowledge of all the major transport implementation libraries to judge if this would happen.

Even if it did happen, I think there would be good value in publishing such a library for those who can use it, just not mandating that RS.io implementations MUST use it. An RS.io implementation would be expected to expose standard RS to the user, and experience would show if most or all of them could use this library.

@NiteshKant

I have created PR #7 to discuss the bindings for TCP clients and servers.

> If I understand correctly, you're suggesting defining Java interfaces that would describe the RS.io protocol-level messaging

Ben and I discussed this offline. The intention is to create much higher-level clients and servers that abstract the protocol interaction and provide an RS way of reading and writing data on a TCP socket.
These clients and servers will communicate using the RS.io protocol to propagate RS constructs like subscribe, cancel, and request(n) across network boundaries, but those details would not be exposed to the users of these abstractions.

> What I'm afraid of is that different transport libraries (netty, akka-http, NIO sockets / websockets, maybe more exotic things) have different models of asynchronous communications, and perhaps of identifying and connecting to remote parties.

Indeed, and I think it would be laborious, if not restrictive, to create a common contract for how these libraries should handle transport. The intention here is to create interfaces based solely on the RS SPI; the actual implementations will use their own idiosyncratic ways to interact with the transport. If we can get to a good API design for what clients and servers should look like, with help from all parties here, I think it will be a pretty good indication of how different users use TCP at the application layer.

@benjchristensen
Contributor Author

I'm thinking I should take this pursuit to a different repo. If it stays in RS perhaps 'reactive-streams-net-jvm'?

The intent would be Reactive Streams APIs for common networking. It may be too opinionated to get agreement, but I'd like to try before I do it on my own.

@rkuhn @viktorklang are you okay with me creating another repo for pursuing this even if it fails, or should I do it in a Netflix or personal repo?

I would then close this issue and leave reactive-streams-io to focus just on the network protocol.

@danarmak

@benjchristensen here are my 2 cents, FWIW.

RS and RS.io are both intended to allow interoperability between unrelated implementations, including (with RS.io) outside the JVM. On the other hand, a bridge between RS and RS.io is 'merely' an implementation, a library. And libraries don't need to be standardized.

Your first comment on this ticket suggested another set of standard interfaces - call them the JVM representation of RS.io - with a bridge library converting them to ordinary RS. Then those RS.io interfaces could be implemented by different people using different networking libraries and transports.

But if the RS.io network protocol is standardized, and if I've already written a JVM implementation for it, then implementing the RS interfaces on top of that (perhaps using a generic RS implementation) should be trivial. The RS.io protocol structure should (at least in my own proposal) be almost identical to RS signalling anyway. The 'bridge' library you suggest would be very small and simple. And I don't think something like that needs a standardization effort with everyone involved here, or that you should wait for a 👍 from everyone here on every change to it.

@benjchristensen
Contributor Author

I apologize for not being clearer. My intention with Reactive Streams IO (RS.io) is to create a network protocol for RS that is independent of language, transport and implementation.

In addition to RS and RS.io, I am interested in network APIs for both client and server for TCP, UDP, HTTP/1, HTTP/2, WebSockets, SSE, etc. I and colleagues like @NiteshKant have been exploring these APIs at Netflix and interacting with others such as @jbrisbin to see whether it makes sense to standardize these APIs. We can obviously create them on our own, but they feel so similar to what each of us keeps defining that I'm curious whether there is value in having standardized RS Network APIs (RS.net).

The proposed value would be that different underlying transport implementations could expose the same RS Network APIs that different RS implementations could all consume.

For example, with HTTP/1, we all need to model these things:

  • request headers and body
  • response headers and body
  • mime/content types
  • status codes
  • etc

Using Reactive Streams, an HTTP/1 response most likely looks something like this:

Publisher<HttpResponse<T>> response = submitGet(...);

The HttpResponse then would have the status codes, headers, etc and expose the body as another Publisher, something perhaps like this:

Publisher<T> content = httpResponse.getContent();
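
A minimal sketch of how those two shapes could fit together. It assumes `java.util.concurrent.Flow.Publisher` (JDK 9+) as a dependency-free stand-in for the Reactive Streams `Publisher`; the `HttpResponse` interface, its getters, the canned `submitGet`, and the `fromList`/`collect` helpers are all hypothetical and perform no network I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Flow;

final class HttpResponseSketch {

    /** Hypothetical RS.net-style response: metadata up front, body as a stream. */
    interface HttpResponse<T> {
        int getStatus();
        Map<String, String> getHeaders();
        Flow.Publisher<T> getContent();
    }

    /** Synchronous, single-threaded publisher over a list that honors request(n). */
    static <T> Flow.Publisher<T> fromList(List<T> items) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private int next;
            private long demand;
            private boolean emitting;
            private boolean done;

            @Override public void request(long n) {
                if (done) return;
                if (n <= 0) {
                    done = true;
                    subscriber.onError(new IllegalArgumentException("request must be positive"));
                    return;
                }
                demand = (demand + n < 0) ? Long.MAX_VALUE : demand + n; // cap on overflow
                if (emitting) return; // re-entrant request from onNext: the loop below picks it up
                emitting = true;
                while (!done && demand > 0 && next < items.size()) {
                    demand--;
                    subscriber.onNext(items.get(next++));
                }
                emitting = false;
                if (!done && next == items.size()) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override public void cancel() { done = true; }
        });
    }

    /** Stand-in for submitGet(...): emits one canned response, then completes. */
    static Flow.Publisher<HttpResponse<String>> submitGet(String uri) {
        HttpResponse<String> canned = new HttpResponse<>() {
            @Override public int getStatus() { return 200; }
            @Override public Map<String, String> getHeaders() { return Map.of("Content-Type", "text/plain"); }
            @Override public Flow.Publisher<String> getContent() { return fromList(List.of("chunk-1", "chunk-2")); }
        };
        return fromList(List.of(canned));
    }

    /** Drains a synchronous publisher into a list (unbounded demand). */
    static <T> List<T> collect(Flow.Publisher<T> publisher) {
        List<T> out = new ArrayList<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { out.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { }
        });
        return out;
    }

    public static void main(String[] args) {
        HttpResponse<String> response = collect(submitGet("/hello")).get(0);
        System.out.println(response.getStatus() + " " + collect(response.getContent()));
    }
}
```

The point of the nesting is that status and headers arrive as soon as the outer Publisher emits, while the body can be consumed with its own backpressure.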

There are only so many ways that these various network protocols can be combined with RS since RS is defined, and the protocols are defined.

If these RS.net interfaces existed, then a networking library like Akka.io, Spray, Netty, etc could implement the transport layer however it deems best and expose the various protocols using RS.net interfaces.

Then consumers could use existing bindings for Akka Streams, RxJava, Reactor etc for consuming, transforming and composing the IO.

I see value in this layer of abstraction between transport implementations and stream implementations and will end up building it for my needs. Before I do it on my own I want to find out if others see value in this and would like to participate.

@danarmak

@benjchristensen thanks, that's much clearer now!

For semantically simple transports like TCP, UDP, or WebSockets, I think such interfaces would be of great value. Particularly with one or more generic interfaces or types representing a 'stream of bytes' so that one could write e.g. an RS.io implementation without binding to a particular transport or a particular transport implementation.

On the other hand, HTTP is complex enough that people are forever reinventing the wheel modeling it. Personally I very much like the akka-http effort: it's a clean immutable model, it follows the HTTP specification to the letter without omitting or simplifying any details, it has native Java and Scala support, and its entities are already defined in terms of RS.

However, looking at how much effort was needed to arrive at this stage (e.g. the akka-http model still had to fix problems found in the spray model), I have to caution against doing it yet again. My job for several years now has involved writing HTTP proxies which have to support most if not all possible HTTP behavior, and I've been repeatedly at odds with different HTTP libraries, both on the JVM and on the CLR, not modeling the protocol correctly. I think the best bet at this point would be to try to standardize the akka-http model classes and maybe publish them separately from akka.

Taking a step back from that, there are many network protocols around and it would be wonderful for them all to have good published models matching the specification. But the example of HTTP leads me to feel each one should be written by an expert in that protocol, unless it's as easy to describe as, say, SOCKS.

The whole effort sounds to me very interesting and I might want to participate, but I must confess I really don't enjoy writing Java, only Scala. So I'll probably confine myself to writing Scala implementations once the interfaces are ready ;-)

@jbrisbin

jbrisbin commented Mar 2, 2015

@smaldini and I are working on extracting some of the things we did in Reactor's net module that put Reactive Streams on top of Netty. We just need to replace some references to our Stream with Publisher and things like that and we can push our own PR for discussion.

@NiteshKant

@jbrisbin @smaldini 👍

@jbrisbin

jbrisbin commented Mar 3, 2015

I created a gist [1] that sketches out some new interfaces based on our use of Reactive Streams in our -net support, which is pluggable with Netty and ZeroMQ currently. I made some changes to what we have, but tried to keep the same ideas.

We use a slightly different vocabulary for these components. I prefer Channel rather than Connection because it should be possible to create a completely passive Channel that does not maintain any active connection to the peer until it needs to write data. This is for cross-data-center use cases where it's impractical to maintain a long-lived connection but the user still wants the low-level speed and efficiency of a binary connection.

writes

  Publisher<Boolean> write(Publisher<? extends OUT> data);

In order to maintain the flow of the Reactive Stream, writes must be performed with a Publisher only. They also have all the callbacks required in a Subscriber to signal the peer of any abnormal conditions.

The concept of "flush" is a little more complicated because that only really applies to certain kinds of channels that don't write all of their data immediately (which is not every kind of channel e.g. HTTP). If the kernel only exposes a write(Publisher), however, that can be handled by configuration that indicates a "flush" onComplete. Convenience methods can be created that use a FlushableSingletonPublisher that takes a single value, causes it to be written to the underlying transport's channel, and then flushes that channel based on configuration.
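
A rough sketch of the "flush on onComplete" idea, assuming `java.util.concurrent.Flow` (JDK 9+) in place of the Reactive Streams interfaces. `BufferingChannel`, `single`, and `collect` are hypothetical names for illustration; `single` plays the role of the singleton publisher mentioned above and is not the actual `FlushableSingletonPublisher`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class FlushOnCompleteSketch {

    /** Emits one value on the first request, then completes (synchronous, single-threaded). */
    static <T> Flow.Publisher<T> single(T value) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private boolean started, cancelled;
            @Override public void request(long n) {
                if (started || cancelled) return;
                started = true;
                if (n <= 0) {
                    subscriber.onError(new IllegalArgumentException("request must be positive"));
                    return;
                }
                subscriber.onNext(value);
                if (!cancelled) subscriber.onComplete();
            }
            @Override public void cancel() { cancelled = true; }
        });
    }

    /** In-memory stand-in for a transport channel whose writes sit in a buffer until flushed. */
    static final class BufferingChannel<OUT> {
        final List<OUT> buffered = new ArrayList<>();
        final List<OUT> onTheWire = new ArrayList<>();

        private void flush() {
            onTheWire.addAll(buffered);
            buffered.clear();
        }

        /** Consumes data lazily on the first request of the result; emits true after the flush. */
        Flow.Publisher<Boolean> write(Flow.Publisher<? extends OUT> data) {
            return result -> result.onSubscribe(new Flow.Subscription() {
                private Flow.Subscription upstream;
                private boolean started;

                @Override public void request(long n) { // validation of n omitted for brevity
                    if (started) return;
                    started = true;
                    data.subscribe(new Flow.Subscriber<OUT>() {
                        @Override public void onSubscribe(Flow.Subscription s) { upstream = s; s.request(Long.MAX_VALUE); }
                        @Override public void onNext(OUT item) { buffered.add(item); }
                        @Override public void onError(Throwable t) { result.onError(t); }
                        @Override public void onComplete() {
                            flush(); // flush policy: completion of the written stream triggers the flush
                            result.onNext(Boolean.TRUE);
                            result.onComplete();
                        }
                    });
                }

                @Override public void cancel() { if (upstream != null) upstream.cancel(); }
            });
        }
    }

    /** Drains a synchronous publisher into a list (unbounded demand). */
    static <T> List<T> collect(Flow.Publisher<T> publisher) {
        List<T> out = new ArrayList<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { out.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { }
        });
        return out;
    }

    public static void main(String[] args) {
        BufferingChannel<String> channel = new BufferingChannel<>();
        Flow.Publisher<Boolean> ack = channel.write(single("hello")); // nothing is written yet
        System.out.println(collect(ack) + " " + channel.onTheWire);
    }
}
```

Here the flush policy is hard-coded; in the design described above it would come from configuration.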

connect

  Publisher<CHAN> connect(ClientSocketOptions opts);

  Publisher<CHAN> connect(ClientSocketOptions opts, Publisher<Client<IN, OUT, CHAN>> reconnect);

To make systems resilient, there should be a provision for reconnection when a client has a channel interrupted. We have a special class that tracks the number of attempts and returns a possibly different InetSocketAddress to support connecting to a variety of endpoints, depending on availability. I wasn't sure how to represent that without this class, so I chose a simple Publisher that passes the Subscriber the channel so they can choose to either try to connect to the previous address, a different one, change options, or what have you.

I also chose to represent the connect as taking an options class (again, convenience methods could abound over this) since we actually specify those at creation time of the client. But if I had it to do over again, I'd pass the options on every connect so I could change them without having to create a new client instance.

Having it to do over again, I'd also rather have the connect() return a Publisher I could subscribe to rather than have the Client itself be the Publisher. This allows for creating multiple connections from a single client but doing different things depending on the peer you're connecting to. If you make the Client a Publisher, then you're fixing the Client to deal with only one connection pipeline which can be limiting.
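
To illustrate why a per-call Publisher is less limiting, here is a small sketch, again assuming `java.util.concurrent.Flow` (JDK 9+) as a stand-in for Reactive Streams. The fields of `ClientSocketOptions`, the `Channel` and `Client` classes, and the choice to emit one channel and then complete are all assumptions made for illustration; no socket is opened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class ConnectSketch {

    /** Hypothetical per-connect options; the real class's fields are not shown in the thread. */
    static final class ClientSocketOptions {
        final String host;
        final int port;
        ClientSocketOptions(String host, int port) { this.host = host; this.port = port; }
    }

    /** Hypothetical channel that only remembers which peer it was opened against. */
    static final class Channel {
        final String peer;
        Channel(String peer) { this.peer = peer; }
    }

    /** One client, many connections: every connect(...) call yields its own Publisher. */
    static final class Client {
        int connectionsOpened;

        Flow.Publisher<Channel> connect(ClientSocketOptions opts) {
            return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
                private boolean started, cancelled;
                @Override public void request(long n) {
                    if (started || cancelled) return;
                    started = true;
                    if (n <= 0) {
                        subscriber.onError(new IllegalArgumentException("request must be positive"));
                        return;
                    }
                    connectionsOpened++; // "connecting" happens lazily, on first demand
                    subscriber.onNext(new Channel(opts.host + ":" + opts.port));
                    if (!cancelled) subscriber.onComplete();
                }
                @Override public void cancel() { cancelled = true; }
            });
        }
    }

    /** Drains a synchronous publisher into a list (unbounded demand). */
    static <T> List<T> collect(Flow.Publisher<T> publisher) {
        List<T> out = new ArrayList<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { out.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { }
        });
        return out;
    }

    public static void main(String[] args) {
        Client client = new Client();
        // Same client, different options per call, independent pipelines per peer.
        Channel a = collect(client.connect(new ClientSocketOptions("dc1.example", 9000))).get(0);
        Channel b = collect(client.connect(new ClientSocketOptions("dc2.example", 9001))).get(0);
        System.out.println(a.peer + " " + b.peer + " " + client.connectionsOpened);
    }
}
```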

intercept

  <I, O, C extends Channel<I, O>> Client<I, O, C> intercept(Function<CHAN, C> fn);

I haven't found a good way to represent the channel pipeline interception necessary to do transformations on objects for tasks like serialization/deserialization. The problem is that you need a Function<T,V> on the incoming side and a Function<V, T> on the outgoing. Here I chose to represent this function by simply returning the Channel with transformed input and output, which would be the case if you started with a Channel<Buffer, Buffer> but wanted to deal with a Channel<FramedRequest, FramedResponse>.
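
One possible way to picture the two-function problem is a variant that takes the decoder and encoder explicitly. This is only a sketch under assumptions: it uses `java.util.concurrent.Flow` (JDK 9+) instead of Reactive Streams, the `in()` method on `Channel`, the static `intercept` helper, and `LoopbackChannel` are hypothetical, and integers encoded as strings stand in for framed requests and responses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.function.Function;

final class InterceptSketch {

    /** Hypothetical minimal channel; in() is assumed, write(...) mirrors the signature above. */
    interface Channel<I, O> {
        Flow.Publisher<I> in();
        Flow.Publisher<Boolean> write(Flow.Publisher<? extends O> data);
    }

    /** Emits one value on the first request, then completes (synchronous, single-threaded). */
    static <T> Flow.Publisher<T> single(T value) {
        return subscriber -> subscriber.onSubscribe(new Flow.Subscription() {
            private boolean started, cancelled;
            @Override public void request(long n) {
                if (started || cancelled) return;
                started = true;
                if (n <= 0) {
                    subscriber.onError(new IllegalArgumentException("request must be positive"));
                    return;
                }
                subscriber.onNext(value);
                if (!cancelled) subscriber.onComplete();
            }
            @Override public void cancel() { cancelled = true; }
        });
    }

    /** Drains a synchronous publisher into a list (unbounded demand). */
    static <T> List<T> collect(Flow.Publisher<? extends T> publisher) {
        List<T> out = new ArrayList<>();
        publisher.subscribe(new Flow.Subscriber<T>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(T item) { out.add(item); }
            @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
            @Override public void onComplete() { }
        });
        return out;
    }

    /** Maps each element; demand passes straight through because the mapping is one-to-one. */
    static <A, B> Flow.Publisher<B> map(Flow.Publisher<? extends A> source, Function<A, B> fn) {
        return downstream -> source.subscribe(new Flow.Subscriber<A>() {
            @Override public void onSubscribe(Flow.Subscription s) { downstream.onSubscribe(s); }
            @Override public void onNext(A item) { downstream.onNext(fn.apply(item)); } // fn assumed not to throw
            @Override public void onError(Throwable t) { downstream.onError(t); }
            @Override public void onComplete() { downstream.onComplete(); }
        });
    }

    /** Wraps a raw channel with a decoder on the incoming side and an encoder on the outgoing side. */
    static <I, O, I2, O2> Channel<I2, O2> intercept(Channel<I, O> raw, Function<I, I2> decode, Function<O2, O> encode) {
        return new Channel<I2, O2>() {
            @Override public Flow.Publisher<I2> in() { return map(raw.in(), decode); }
            @Override public Flow.Publisher<Boolean> write(Flow.Publisher<? extends O2> data) {
                return raw.write(map(data, encode));
            }
        };
    }

    /** In-memory raw channel of strings: one canned inbound frame, outbound frames recorded. */
    static final class LoopbackChannel implements Channel<String, String> {
        final List<String> wire = new ArrayList<>();
        @Override public Flow.Publisher<String> in() { return single("42"); }
        @Override public Flow.Publisher<Boolean> write(Flow.Publisher<? extends String> data) {
            wire.addAll(collect(data)); // simplification: drains eagerly; a real transport would be lazy and async
            return single(Boolean.TRUE);
        }
    }

    public static void main(String[] args) {
        LoopbackChannel raw = new LoopbackChannel();
        Function<String, Integer> decode = Integer::parseInt;
        Function<Integer, String> encode = i -> Integer.toString(i);
        Channel<Integer, Integer> typed = intercept(raw, decode, encode);
        System.out.println(collect(typed.in()) + " " + collect(typed.write(single(7))) + " " + raw.wire);
    }
}
```

Taking both functions keeps the types of the two directions independent, at the cost of a wider signature than the single `Function<CHAN, C>` shown above.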

I think it's important to point out that these interfaces will not be used by users directly. They are for the transport module's use only. That module could provide the convenience methods necessary to make things easier to do (like write data of type T).

EDIT[1]: We've also found dealing with Void to be problematic in a number of ways. IMO it's better to use a Boolean, even if you know it's always going to be true, to avoid having to use nulls because the type is Void. We filter out nulls in our streams anyway, so we've had to make changes ourselves to move away from that.

EDIT[2]: We also need to be able to listen to more than just open and close events. We have the ability to listen for Netty's readIdle and writeIdle events now and we'll need that in this as well. I suspect we'll need an event domain class that represents common channel events (OPEN, CLOSE, READ_IDLE, WRITE_IDLE) and to expose a Publisher for those.
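
A sketch of what such an event domain class and its Publisher might look like, assuming `java.util.concurrent.Flow` and the JDK's `SubmissionPublisher` (JDK 9+) rather than a Reactive Streams implementation. `ChannelEvent`, `ObservableChannel`, `fire`, `complete`, and `observe` are hypothetical names; the events are fired by hand here instead of by a transport such as Netty.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

final class ChannelEventsSketch {

    /** The four events named above. */
    enum ChannelEvent { OPEN, CLOSE, READ_IDLE, WRITE_IDLE }

    /** Hypothetical channel that exposes its lifecycle events as a Publisher. */
    static final class ObservableChannel {
        private final SubmissionPublisher<ChannelEvent> events = new SubmissionPublisher<>();

        Flow.Publisher<ChannelEvent> events() { return events; }

        /** Would be called by the transport; blocks if a subscriber's buffer is full. */
        void fire(ChannelEvent event) { events.submit(event); }

        /** Signals onComplete to current subscribers once buffered events are delivered. */
        void complete() { events.close(); }
    }

    /** Subscribes, runs the activity, and returns the events seen until completion. */
    static List<ChannelEvent> observe(ObservableChannel channel, Runnable activity) {
        List<ChannelEvent> seen = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        channel.events().subscribe(new Flow.Subscriber<ChannelEvent>() {
            @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
            @Override public void onNext(ChannelEvent event) { seen.add(event); }
            @Override public void onError(Throwable t) { done.countDown(); }
            @Override public void onComplete() { done.countDown(); }
        });
        activity.run();
        try {
            // Delivery is asynchronous (common pool by default), so wait for completion.
            if (!done.await(5, TimeUnit.SECONDS)) throw new IllegalStateException("timed out");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        ObservableChannel channel = new ObservableChannel();
        List<ChannelEvent> seen = observe(channel, () -> {
            channel.fire(ChannelEvent.OPEN);
            channel.fire(ChannelEvent.READ_IDLE);
            channel.fire(ChannelEvent.CLOSE);
            channel.complete();
        });
        System.out.println(seen);
    }
}
```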

EDIT[3]: We probably don't even need a close() on the Channel since that will be handled by the Subscription.cancel.

[1] - https://gist.github.com/jbrisbin/cc6600d3cc63c442ea50

@jbrisbin

jbrisbin commented Mar 3, 2015

There's nothing preventing these abstractions from being extended to file-based use as well. It would be nice to be able to connect to an mmap-backed file and shunt data into it until complete, at which time the client would do a sendfile on it and ship it across the network.

In my microbenchmarks, using sendfile on microbatched data is bar none the highest throughput way to ship data around. That's why Spark does it that way AFAIK. :)

@benjchristensen
Contributor Author

I'm closing this out in favor of discussing Java bindings for general network protocols in https://github.com/reactive-streams/reactive-streams-net-jvm

@benjchristensen
Contributor Author

This conversation is moved to reactive-streams/reactive-streams-net-jvm#1
