Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide entire message including channel to subscription callbacks #385

Closed
wants to merge 1 commit into from

Conversation

carter-thaxton
Copy link

Provide the entire message object, including channel, id, and any data added by extensions, as a second argument to subscription callbacks.

This does not break the API in any way, and mirrors the data provided to extensions.

This fixes issue #119

…extensions, as a second argument to subscription callbacks. Does not break API
@justcfx2u
Copy link

Tossing in my "+1". Patch is succinct. Passing additional information around in the message payload as "duplicate" information does not seem as elegant as this modification appears to be.

@jcoglan
Copy link
Collaborator

jcoglan commented Jul 4, 2015

This change makes me nervous, for several reasons. It conflates several concerns that the Faye client deliberately keeps separate. Those concerns are:

  1. The content of application data that users publish
  2. The details of the Bayeux protocol
  3. The mechanism by which messages are routed
  4. The transport mechanism -- WebSocket, XHR, etc.

The client currently only exposes the first of these to the user, by design. The extension layer exposes the second. This means that users, on first using the library, are not presented with extraneous data; the data the comes out of subscribe() listeners is exactly that which goes into publish() calls. This symmetry keeps the client unsurprising to use, and hiding the enclosing Bayeux message discourages inappropriate uses of protocol elements like clientId.

The separation between 1 and 2 also tends to keep certain kinds of operations apart. Extensions are frequently used to mutate data between the client and the transport layer. Every message goes through your extensions once and only once, and are passed to each extension sequentially rather than in parallel. (Messages are not passed through extensions a second time if the transport needs to retry sending them.) This means you can mutate it and be sure that nothing else is using the message at the time you're mutating it. Functions passed to subscribe() are more like event listeners, which act more like parallel processors than sequential ones. Mutating data inside event listeners is an anti-pattern; many listeners can be bound to the same event (the same channel in Faye terms) and mutating event data makes the app's behaviour dependent on the order that event listeners are called. This causes hard-to-debug problems in large systems. Given that Faye exposes the full message mostly so that the user can mutate it, I don't think exposing it via an event listener API is appropriate.

This also points to a potential usability pitfall in this design: it will not be immediately apparent that the data property of the second argument is actually the same object as the first argument. This will cause surprises should people not realise this and mutate one or the other. It's unusual for one function argument to just be a substructure of another, because such redundancy is unnecessary.

Having an event listener emit more than one value makes the use of higher-level abstractions harder, because it makes listeners harder to compose. For example, subscribe() currently returns a Promise that's used to indicate when the subscription becomes active. However, the value it returns could also be extended to act as a Stream of messages, or users could convert it to one using Bacon or ReactiveX. If we give complete Bayeux messages the type Message, and their data portion the type Data, then such streams would be of type Stream Data, that is, a Stream that contains Data values. If the subscribe() callback also yielded a Message value, this would need to be accommodated into the type of the stream, making it either Stream Message or Stream (Data, Message). The former would constitute a breaking change to users of the stream, and the latter would also constitute a complication that makes the stream harder to compose.

This points to a subtlety in what we mean by 'breaking change'. Yes, in JavaScript you can add a parameter to a function and existing callers will not throw errors. However this is an accident of JavaScript's design; a client in Ruby that uses lambdas or bound methods for callbacks would break if the arity of subscribe callbacks were changed. JS might not crash if we make this change, but we have changed the function's type and this is liable to break abstractions built on this API, if they want to incorporate this change. (Some may incorporate it implicitly by using arguments rather than explicit parameters.) It may also break code that passes a function to subscribe() by name, where the function accepts multiple arguments the user (or later maintainer) is not aware of; this is why [1, 2, 3, 4].map(parseInt) has surprising results.

Even if you're not using a formal abstraction like a stream or some other functor, this is likely to require downstream changes compared to the solution I proposed in #119: listeners will often hand a message off to other functions for processing, and if the listener receives more arguments, those must also be handed off to processing functions, and such functions and their call sites will need to be changed to do this. (This is really just an informal way of saying that this change breaks the type of the event stream.)

Finally, this change encourages users to conflate 1 and 3, which I think is an anti-pattern. To me, a channel is just a "thing you get data out of", it is not itself meaningful data. If you think of every channel as a stream, then subscribing to a wildcard is equivalent to constructing the union of all the streams matching that pattern. Picture what happens if you do the equivalent thing with arrays:

var arr1 = ['alice', 'bob', 'carol'],
    arr2 = ['dan', 'eve', 'fred'];

var merged = arr1.concat(arr2);

If you then try to process merged, it would be odd to want to know the name of the variable that the source arrays that went into merged are bound to. Instead, you would want each value in merged to be self-contained enough to be interpreted on its own, without needing to know where it came from. You see the same thing in APIs, for example resources requested using an ID will frequently contain the ID in the response:

$ curl api.example.com/things/4

{
  "id": 4,
  # etc
}

This allows the response to be processed without having to carry around the URL it was derived from. I would apply the same pattern when using Faye: if a message is completely self-contained, it can be passed around to any part of the application, and those sites won't need to know which channel the message came from, or even that it came from Faye at all. This allows for decoupling and stops the application become deeply coupled to the Faye client or the Bayeux protocol.

However, if you do need to identify the source of the message, the small extension I propose in #119 lets you do this while retaining all the benefits I have outlined above, because from the subscribe() listener's point of view the message is still a single, self-contained value. This separation of concerns allows a lot of things to vary independently (which is historically why the Faye API has been so stable) and allows for useful compositions, which I feel this change makes significantly harder. In this sense I think the #119 solution is more "elegant" for preserving modularity.

@jcoglan
Copy link
Collaborator

jcoglan commented Jul 4, 2015

I should clarify that the above is just me pedantically working through my own reasoning about why I might feel a certain way about a change. Although I have tried to focus on objective properties of the code, all of it is in service of weighing something very fuzzy (i.e. usability) and I might be wrong.

That's why I've not closed this issue -- if I'm wrong, I'd like to be told.

@jarthod
Copy link

jarthod commented Sep 29, 2015

Hi @jcoglan

While I do understand the motivation between this choice to keep everything isolated, I still think this missing features breaks expectations for developers as most messaging solutions does this.

It pushes people to add the information to the data hash (what we do currently), but this feels a bit hackish IMO.

Also I think adding the channel as a second parameter is the good option and would not hurt compatibility, even in ruby as you can check the arity of the callback to ensure full compatility:

def on_message(&block)
  block.arity == 2 ? yield('data', 'channel') : yield('data')
end

http://rubyfiddle.com/riddles/e689b

Finally, I like your simple extension suggestion in #119, but it's a bit annoying to have to implement it yourself and add it everywhere you publish (it can be from several places or processes, js & ruby)

We can update our PR (#395, which is most complete about this feature I think) with arity check to be fully backward-compatible if you're interested.

@wankdanker
Copy link

I am in support of PR #395. Any pubsub interface I've used (admittedly not many) that supported wildcard subscriptions would pass the channel as an argument to the message callback.

I understand @jcoglan's reasoning and would suggest that PR #395 be modified to toggle this behavior based on a configuration option passed to the client constructor.

Basically, it should be opt-in for those who want it. That way it won't break backwards compatibility and those who do opt-in will be aware of the potential breakage or special handling needed for Streams or Promises or whatever.

That's my take. :)

@jcoglan
Copy link
Collaborator

jcoglan commented Nov 7, 2015

Sorry for taking my time over this. While I still think there are better solutions to this problem, as I outlined above, I understand this is a convenience that a bunch of people expect (seeing as there have been multiple similar PRs about it). I would like to introduce something that we can be sure won't break existing code.

I don't want to introduce new parameters into existing callbacks, since that would break variadic functions and prevent conversion of such callbacks into a streams as I mentioned above. Arity checking is a hack as is really complicated to get right in general, and is useless if the actual subscriber function is being indirected through another function anyway. I think the cleanest way to add this feature is to introduce a new method that explicitly signals that you want the channel as an argument.

I am not going to yield the entire message because that would mix application and protocol data. It breaks the symmetry of "what you pass to publish() comes out of subscribe()", it assumes client events correspond to protocol messages in general, it allows users to change data that might break the client and couple their app to the protocol, and other such problems. This is why we have extensions: to make it explicit when you're touching the protocol so you have a clearer awareness of what you're doing.

(We also have extensions entirely so people can make changes like this one without needing the core client to be changed, but I understand that over time some common use cases can make their way into core. But I also think that requiring subscribers to know where a message came from, because the message's own content is insufficient to process it, produces highly coupled designs, and so it should feel like a hack if you want that behaviour, but I digress.)

Here's my proposal. 52d6df2 adds a new method to Subscription called withChannel(). It lets you bind a listener to a channel while explicitly signalling that you want the channel in your callback. You use it like this:

// if I publish {"channel": "/chat/alice", "data": {"hello": "world"}}, then
// channel is '/chat/alice' and message is {hello: 'world'}
client.subscribe('/chat/*').withChannel(function(channel, message) { /* ... */ });

withChannel() returns the subscription itself so you can still use it as a promise:

var sub = client.subscribe('/chat/*').withChannel(function(channel, message) { /* ... */ });

sub.then(function() { /* ... */ });

I've put channel as the first argument rather than the second, partly because it reads better and partly because it makes sure that 'normal' subscribe() callbacks and withChannel() callbacks have different types; if message were the first arg then (because JS doesn't check arity) it would be too easy to accidentally use a subscribe() callback in withChannel() position and vice versa, leading to subtle errors. This is a bit like the reasoning for putting error first in Node callbacks: to make sure you handle it. If you're using withChannel() then I want to make sure you actually need and use that channel argument and don't silently ignore it as you could if message were the first arg. It makes sure the author's intent is clear.

How do people feel about this design? If it seems okay then I will add more tests and port it to Ruby, and release it in version 1.2.

@jarthod
Copy link

jarthod commented Nov 7, 2015

Seems like a good option to me, it reads nice and doesn't break anything.

@wankdanker
Copy link

Does there happen to be a client level event emitted for messages? Something like this would be good enough for my needs:

client.on('message', function (channel, message) { /* ... */ });
client.subscribe('/chat/*');

Otherwise, .withChannel seems fair.

@justcfx2u
Copy link

Since I chimed in before, I'll go ahead and add my $0.01; I like the 'withChannel' idea. Your reasoning seems spot-on. Since I have already written workarounds long ago I don't know when I will have the chance to test this new mechanism but I might try porting/converting an existing implementation just to see how it goes. Timeframe TBD, unfortunately.

@toxaq
Copy link

toxaq commented Feb 20, 2016

Is client.subscribeWild('/recipient/9db0b131/*', function(message, channel){...}) an option? I wanted to use a wildcard in the same way that I'd use a routing engine for differentiating processing on the serverside. I'd never considered how jcoglan explains how he considers using each channel as a stream that could be merged. I don't see the benefit of channels if the data needs to be self descriptive other than as a base level filter (obviously this represents my bias of expectation and isn't representative of anyone but me).

@jcoglan
Copy link
Collaborator

jcoglan commented Apr 27, 2016

The withChannel() idea I proposed above has finally been implemented in both Ruby and JS, and will appear in version 1.2. I'll close this issue and thank all of you for your patience. I'm sorry this didn't happen sooner but I've not had a lot of time to work on Faye lately.

@jcoglan jcoglan closed this Apr 27, 2016
@penguinpowernz
Copy link

Thanks for all your hard work @jcoglan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants