
Conflation #45

Closed
EugenDueck opened this issue Aug 3, 2012 · 12 comments

@EugenDueck
Contributor

I'd like to see engine.io support a feature called conflation, i.e. the removal of some messages based on criteria like

  • the current performance of the client targeted to receive the message
  • the frequency of messages
  • a newer message making stale all older messages of the same "topic" (a term used in messaging software, somewhat related to how socket.io uses "rooms")

So a more general version of the volatile feature in socket.io.

Conflation is especially beneficial to the performance of both server and client when broadcasting and multicasting (e.g. rooms) are used frequently, and it protects the server from being bogged down memory-wise by a single slow consumer whose outgoing buffer would otherwise grow to heaven (or until the heartbeat kills the connection, which could still be a lot of memory, depending on the message frequency, the heartbeat interval, and the number of slow consumers). [1]

In case my description above failed to convey the usefulness of conflation, http://magmasystems.blogspot.jp/2006/08/conflation.html has a brief description of the feature and its application to the distribution of price quotes in finance. IBM, too, uses conflation for the same purpose: http://publib.boulder.ibm.com/infocenter/imds/v3r0/index.jsp?topic=/com.ibm.imds.doc/welcome.html .

If engine.io wants to enable a conflation feature based on the client's performance in consuming messages, the feature has to be supported by the engine.io layer, because it depends on the client's state (is the connection open and drained?), which is - understandably - hidden from the application layer. Conflation based on message frequency alone can obviously be done entirely in the application layer, as the application controls how often it calls emit and can throttle those calls without help from engine.io.
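The application-layer, frequency-based throttling mentioned above could look roughly like this. A minimal sketch only: `makeThrottledEmit` and the injected `now` clock are illustrative names, not part of any socket.io/engine.io API.

```javascript
// Frequency-based conflation done purely in the application layer:
// drop emits that arrive faster than one per interval.
// The clock is injected via `now` so the policy is testable.
function makeThrottledEmit(sendFn, intervalMs, now) {
  let lastSent = -Infinity;
  return function (message) {
    const t = now();
    if (t - lastSent < intervalMs) return false; // too soon: drop (volatile-style)
    lastSent = t;
    sendFn(message);
    return true;
  };
}
```

This only needs the ability to call (or not call) emit, which is exactly why it requires no engine.io support.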

There's a rather straightforward way to implement it that is flexible in terms of the conflation logic, yet does not require complex logic inside engine.io itself, and I've actually already implemented it in socket.io v0.9.8.

Here's a simplified pseudo code diff, leaving out a couple of intermediate steps:

before

  • in myApp
    • io.emit(myJavascriptObject);
  • in socket.io/transport
    • transport.write(encodePacket(myJavascriptObject))

after

  • in myApp, configuration
    • io.set('conflater', function(messages) { /* for example: */ return [messages[messages.length - 1]]; });
  • in myApp, runtime
    • io.emit(myJavascriptObject);
  • in socket.io/transport
    • conflationBuffer.push(myJavascriptObject);
    • .onDrained(function() { transport.actualWriteMessages(encodePackets(io.get('conflater')(conflationBuffer))); });

If no conflater function has been configured during initialization, no buffering is done and the conflater is never called.
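As a runnable sketch of that buffering scheme: `ConflatingTransport`, `rawWrite`, and `onDrained` below are made-up names for illustration, not engine.io's actual transport code.

```javascript
// Buffers packets while a conflater is configured; on drain, hands the
// buffer to the conflater and writes whatever it returns. With no
// conflater configured, packets are written through immediately.
function ConflatingTransport(rawWrite, conflater) {
  this.buffer = [];
  this.rawWrite = rawWrite;       // stand-in for the real encode-and-write path
  this.conflater = conflater || null;
}
ConflatingTransport.prototype.write = function (packet) {
  if (!this.conflater) {          // no conflater: no buffering at all
    this.rawWrite([packet]);
    return;
  }
  this.buffer.push(packet);
};
ConflatingTransport.prototype.onDrained = function () {
  if (!this.conflater || this.buffer.length === 0) return;
  const out = this.conflater(this.buffer);
  this.buffer = [];
  this.rawWrite(out);
};
```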

The conflater function can

  • just return that buffer unchanged, in which case no conflation is performed
  • simply remove elements from the array, performing conflation
  • remove elements and replace them with fewer or different elements, i.e. performing aggregation
  • even add elements, for whatever reason - I can't think of one and don't currently care
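For illustration, here are hypothetical conflater functions covering the first cases above; the `topic` field on messages is an assumption, not anything socket.io defines.

```javascript
// Conflation: keep only the newest message per topic.
// Relies on Map preserving insertion order of keys while overwriting values.
function conflateByTopic(messages) {
  const latest = new Map();
  for (const m of messages) latest.set(m.topic, m); // later entries overwrite older ones
  return Array.from(latest.values());
}

// Aggregation: replace many messages with a single summary element.
function aggregateCount(messages) {
  return [{ topic: 'summary', count: messages.length }];
}

// Identity: return the buffer unchanged, i.e. no conflation at all.
function noConflation(messages) {
  return messages;
}
```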

The above simplification of the algorithm is not the whole truth, however: in reality, the functions in socket.io/transport are NOT given myJavascriptObject as provided by the caller, but the already encoded version of it. They only get to see encodePacket(myJavascriptObject). There are 2 good reasons for this:

  • myJavascriptObject will be encoded only once (in SocketNamespace.packet(..))
  • the transports can be given serialized packet versions straight out of the RedisStore or whatever other store there might be that has the need to serialize messages

Now, I would not want to hand the encoded message into the application layer for the following 2 reasons:

  • the app layer shouldn't know about how stuff gets encoded
  • the app layer will have a hard time working on encoded strings, rather than on proper JS objects

I have an idea how to solve this in a way that

  • hides the lower level encoding internals from the app layer
  • avoids multiple encodings / decodings of the same message (i.e. caching of results)
  • avoids having to maintain a 'cache hash table' or similar with all the related problems (when to garbage collect the cache?)
  • works for both scenarios without the need to serialize up until the point data is sent to the client (in socket.io: MemoryStore) and with that need (RedisStore)

The question is: is the conflation feature deemed important enough, and is it further considered impossible to implement without changing engine.io, as I believe? In that case, I would actually like to prepare a pull request.

@EugenDueck
Contributor Author

Somehow I must have forgotten to paste in the footnote that I refer to in the text. So I'll try to reproduce it:

[1] This could be implemented by either one of following or some combination, potentially configurable:

  • Make the buffer bounded, and when it gets full, kick client
  • Use a ring buffer, where older entries get overwritten
  • Call buffer = conflate(buffer) when the buffer is full - this depends on conflate(buffer) returning fewer elements than it receives, but that could be checked, and a fallback of kicking the client could be implemented
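A sketch of the third option, including the kick fallback; `BoundedBuffer`, `conflate`, and `kick` are illustrative names under the assumptions of this footnote.

```javascript
// Bounded per-client buffer: when it overflows, try conflating it down;
// if the conflater fails to free space, kick the slow consumer.
function BoundedBuffer(capacity, conflate, kick) {
  this.capacity = capacity;
  this.conflate = conflate;
  this.kick = kick;      // e.g. close the slow consumer's connection
  this.items = [];
}
BoundedBuffer.prototype.push = function (item) {
  this.items.push(item);
  if (this.items.length > this.capacity) {
    const reduced = this.conflate(this.items);
    if (reduced.length <= this.capacity) {
      this.items = reduced;   // conflation freed space
    } else {
      this.kick();            // fallback: conflater did not shrink the buffer
      this.items = [];
    }
  }
};
```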

@EugenDueck
Contributor Author

Quoting my last statement:

... depends on conflate(buffer) returning fewer elements than it receives

This reminds me that calling this feature conflation is probably too narrow. If I come up with a better name, I'll post it here, but conceptually this is a general feature that lets application code manipulate (conflate, augment, modify) the buffer just before it is actually sent to a client. Somewhat long for a name...

@rauchg
Contributor

rauchg commented Aug 7, 2012

I want this

@rauchg
Contributor

rauchg commented Aug 7, 2012

@EugenDueck I wonder if all that we should do is add a callback to send. In the case of websocket, we fire it when the ws transport fires its callback. In the case of polling, when the incoming GET poll request consumes the message.

Then it's up to a higher-level impl to decide on the conflation mechanics.
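A toy model of that suggestion: `FakeSocket` and `flush` are stand-ins, not engine.io's API; in the real thing the flush would be driven by the ws write callback or by the polling GET consuming the buffer.

```javascript
// send() takes a callback that fires only when the transport has actually
// flushed the packet; higher layers can build conflation on top of that.
function FakeSocket() {
  this.pending = [];
}
FakeSocket.prototype.send = function (data, callback) {
  this.pending.push({ data, callback });
};
// Simulates the transport draining (e.g. a GET poll consuming the buffer):
// fires each packet's callback and returns what was written.
FakeSocket.prototype.flush = function () {
  const out = this.pending;
  this.pending = [];
  for (const p of out) if (p.callback) p.callback();
  return out.map(p => p.data);
};
```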

@EugenDueck
Contributor Author

I wonder if all that we should do is add a callback to send.

That's basically what I'm doing, except that - in my socket.io implementation, which I just tested yesterday and which seems fine - I wrap the packets in a new Packet class, which holds and caches

  • the data variant of the packet (i.e. the JS object)
  • the on-the-wire variant of the packet

It offers methods getAsData and getAsOnTheWireOrSomething (I forgot the exact name I used). Those are the methods socket.io uses, and they will create the representation if it does not exist and cache it for reuse.

Otherwise there'd be a lot of waste in back- and forth-encoding, especially with multicast (rooms) and broadcast.
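A minimal sketch of such a dual-representation packet: the method and field names are guesses at what the comment describes, and JSON stands in for the real packet encoding.

```javascript
// Holds a packet as either a JS object, an on-the-wire string, or both.
// Each representation is created lazily at most once and then cached,
// so a broadcast encodes the packet a single time for all recipients.
function Packet(opts) {
  this._data = opts.data !== undefined ? opts.data : null; // JS-object form
  this._wire = opts.wire !== undefined ? opts.wire : null; // encoded form
  this._encodes = 0; // for illustration only: counts encode calls
}
Packet.prototype.getAsData = function () {
  if (this._data === null) this._data = JSON.parse(this._wire); // decode once
  return this._data;
};
Packet.prototype.getAsWire = function () {
  if (this._wire === null) {
    this._wire = JSON.stringify(this._data); // encode once
    this._encodes++;
  }
  return this._wire;
};
```

A store that already holds serialized packets (the RedisStore case) would construct `new Packet({ wire: ... })`, while a local emit constructs `new Packet({ data: ... })`; either way, consumers never trigger a second conversion.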

Will try to prepare an engine.io variant of it as soon as time permits. Question: do you agree that the callback (provided by the app that is using socket.io/engine.io) should not see the on-the-wire representation but the JS object?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Agreed completely. In fact, in engine.io the Socket holds an array of packet objects, and only the Transport deals with encoding logic.

@EugenDueck
Contributor Author

Alright. Will hopefully get a pull-request done later today.

A question about the rough timeline: when talking engine.io 1.0, are we talking days, weeks, or months?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Days

@EugenDueck
Contributor Author

Gotcha. One more question about the big picture, since I don't see them in engine.io: will stores stay in socket.io?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Yep. In fact, that's why I want the reliable messaging layer directly in socket.io. It'd be really cool if processes could die, come and go, yet messages stayed in, let's say, Redis.

@EugenDueck
Contributor Author

Indeed.

I'm sorry this is getting even more off-topic, but in terms of the big picture, which is good to keep in mind when making non-trivial changes: does (the current and/or the future) socket.io's RedisStore allow load balancing without sticky sessions? (I just haven't had the time to dig into it yet.)

@rauchg
Contributor

rauchg commented Aug 8, 2012

Per discussion, we're moving it into a different module.

@rauchg rauchg closed this as completed Aug 8, 2012
dave-r12 pushed a commit to dave-r12/engine.io that referenced this issue Jul 17, 2016
…ble-and-method-parameter-names-should-comply-with-a-naming-convention-fix-1

squid:S00117 - Local variable and method parameter names should comply with a naming convention