
Conflation #45

Closed
EugenDueck opened this issue Aug 3, 2012 · 12 comments

@EugenDueck
Contributor

I'd like to see engine.io support a feature called conflation, i.e. the removal of some messages based on criteria like

  • the current performance of the client targeted to receive the message
  • the frequency of messages
  • a newer message making stale all older messages of the same "topic" (a term used in messaging software, somewhat related to how socket.io uses "rooms")

So a more general version of the volatile feature in socket.io.

Conflation is especially beneficial to the performance of both server and client when broadcasting and multicasting (e.g. rooms) are used frequently, and it protects the server from being bogged down memory-wise by a single slow consumer whose outgoing buffer would otherwise grow to heaven (or until the heartbeat kills the connection, which could still be a lot of memory, depending on the message frequency, the heartbeat interval, and the number of slow consumers). [1]

In case my description above failed to convey the usefulness of conflation, http://magmasystems.blogspot.jp/2006/08/conflation.html has a brief description of the feature and its application to the distribution of price quotes in finance. IBM, too, uses conflation for the same purpose: http://publib.boulder.ibm.com/infocenter/imds/v3r0/index.jsp?topic=/com.ibm.imds.doc/welcome.html .

If engine.io wants to enable a conflation feature based on the client's performance in consuming messages, the feature has to be supported by the engine.io layer, because it depends on the client's state (is the connection open and drained?), which is - understandably - hidden from the application layer. Conflation based on message frequency alone can obviously be done entirely in the application layer, as the application controls how often it calls emit and can throttle those calls without help from engine.io.
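The application-layer, frequency-based throttling mentioned above could look roughly like this. A minimal sketch only: `makeThrottledEmit` and the injected `now` clock are illustrative names, not part of any socket.io/engine.io API.

```javascript
// Frequency-based conflation done purely in the application layer:
// drop emits that arrive faster than one per interval.
// The clock is injected via `now` so the policy is testable.
function makeThrottledEmit(sendFn, intervalMs, now) {
  let lastSent = -Infinity;
  return function (message) {
    const t = now();
    if (t - lastSent < intervalMs) return false; // too soon: drop (volatile-style)
    lastSent = t;
    sendFn(message);
    return true;
  };
}
```

This only needs the ability to call (or not call) emit, which is exactly why it requires no engine.io support.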

There's a rather straightforward way to implement it that is flexible in terms of the conflation logic, yet does not require complex logic inside engine.io itself, and I've actually already implemented it in socket.io v0.9.8.

Here's a simplified pseudo code diff, leaving out a couple of intermediate steps:

before

  • in myApp
    • io.emit(myJavascriptObject);
  • in socket.io/transport
    • transport.write(encodePacket(myJavascriptObject))

after

  • in myApp, configuration
    • io.set('conflater', function(messages) { /* for example: */ return [messages[messages.length - 1]]; });
  • in myApp, runtime
    • io.emit(myJavascriptObject);
  • in socket.io/transport
    • conflationBuffer.push(myJavascriptObject);
    • .onDrained(function() { transport.actualWriteMessages(encodePackets(io.get('conflater')(conflationBuffer))); });

If no conflater function has been configured during initialization, no buffering is done and the conflater is never called.
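As a runnable sketch of that buffering scheme: `ConflatingTransport`, `rawWrite`, and `onDrained` below are made-up names for illustration, not engine.io's actual transport code.

```javascript
// Buffers packets while a conflater is configured; on drain, hands the
// buffer to the conflater and writes whatever it returns. With no
// conflater configured, packets are written through immediately.
function ConflatingTransport(rawWrite, conflater) {
  this.buffer = [];
  this.rawWrite = rawWrite;       // stand-in for the real encode-and-write path
  this.conflater = conflater || null;
}
ConflatingTransport.prototype.write = function (packet) {
  if (!this.conflater) {          // no conflater: no buffering at all
    this.rawWrite([packet]);
    return;
  }
  this.buffer.push(packet);
};
ConflatingTransport.prototype.onDrained = function () {
  if (!this.conflater || this.buffer.length === 0) return;
  const out = this.conflater(this.buffer);
  this.buffer = [];
  this.rawWrite(out);
};
```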

The conflater function can

  • just return that buffer unchanged, in which case no conflation is performed
  • simply remove elements from the array, performing conflation
  • remove elements and replace them with fewer or different elements, i.e. performing aggregation
  • even add elements, for whatever reason - I can't think of one and don't currently care
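For illustration, here are hypothetical conflater functions covering the first cases above; the `topic` field on messages is an assumption, not anything socket.io defines.

```javascript
// Conflation: keep only the newest message per topic.
// Relies on Map preserving insertion order of keys while overwriting values.
function conflateByTopic(messages) {
  const latest = new Map();
  for (const m of messages) latest.set(m.topic, m); // later entries overwrite older ones
  return Array.from(latest.values());
}

// Aggregation: replace many messages with a single summary element.
function aggregateCount(messages) {
  return [{ topic: 'summary', count: messages.length }];
}

// Identity: return the buffer unchanged, i.e. no conflation at all.
function noConflation(messages) {
  return messages;
}
```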

The above simplification of the algorithm is not the whole truth, however: in reality, the functions in socket.io/transport are NOT given myJavascriptObject as provided by the caller, but the already encoded version of it. They only get to see encodePacket(myJavascriptObject). There are 2 good reasons for this:

  • myJavascriptObject will be encoded only once (in SocketNamespace.packet(..))
  • the transports can be given serialized packet versions straight out of the RedisStore or whatever other store there might be that has the need to serialize messages

Now, I would not want to hand the encoded message into the application layer for the following 2 reasons:

  • the app layer shouldn't know about how stuff gets encoded
  • the app layer will have a hard time working on encoded strings, rather than on proper JS objects

I have an idea how to solve this in a way that

  • hides the lower level encoding internals from the app layer
  • avoids multiple encodings / decodings of the same message (i.e. caching of results)
  • avoids having to maintain a 'cache hash table' or similar with all the related problems (when to garbage collect the cache?)
  • works for both scenarios without the need to serialize up until the point data is sent to the client (in socket.io: MemoryStore) and with that need (RedisStore)

The question is: is the conflation feature deemed important enough, and is it further considered impossible to implement without changing engine.io, as I believe? In that case, I would actually like to prepare a pull request.

@EugenDueck
Contributor Author

Somehow I must have forgotten to paste in the footnote that I refer to in the text. So I'll try to reproduce it:

[1] This could be implemented by either one of following or some combination, potentially configurable:

  • Make the buffer bounded, and when it gets full, kick client
  • Use a ring buffer, where older entries get overwritten
  • Call buffer = conflate(buffer) when the buffer is full - this depends on conflate(buffer) returning fewer elements than it receives, but that could be checked, and a fallback of kicking the client could be implemented
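A sketch of the third option, including the kick fallback; `BoundedBuffer`, `conflate`, and `kick` are illustrative names under the assumptions of this footnote.

```javascript
// Bounded per-client buffer: when it overflows, try conflating it down;
// if the conflater fails to free space, kick the slow consumer.
function BoundedBuffer(capacity, conflate, kick) {
  this.capacity = capacity;
  this.conflate = conflate;
  this.kick = kick;      // e.g. close the slow consumer's connection
  this.items = [];
}
BoundedBuffer.prototype.push = function (item) {
  this.items.push(item);
  if (this.items.length > this.capacity) {
    const reduced = this.conflate(this.items);
    if (reduced.length <= this.capacity) {
      this.items = reduced;   // conflation freed space
    } else {
      this.kick();            // fallback: conflater did not shrink the buffer
      this.items = [];
    }
  }
};
```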

@EugenDueck
Contributor Author

Quoting my last statement:

... depends on conflate(buffer) returning fewer elements than it receives

This reminds me that calling this feature conflation is probably too narrow. If I come up with a better name, I'll post it here, but conceptually this is a general feature that lets application code manipulate (conflate, augment, modify) the buffer just before it is actually sent to a client. Somewhat long for a name...

@rauchg
Contributor

rauchg commented Aug 7, 2012

I want this

@rauchg
Contributor

rauchg commented Aug 7, 2012

@EugenDueck I wonder if all that we should do is add a callback to send. In the case of websocket, we fire it when the ws transport fires its callback. In the case of polling, when the incoming GET poll request consumes the message.

Then it's up to a higher-level impl to decide on the conflation mechanics.
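A toy model of that suggestion: `FakeSocket` and `flush` are stand-ins, not engine.io's API; in the real thing the flush would be driven by the ws write callback or by the polling GET consuming the buffer.

```javascript
// send() takes a callback that fires only when the transport has actually
// flushed the packet; higher layers can build conflation on top of that.
function FakeSocket() {
  this.pending = [];
}
FakeSocket.prototype.send = function (data, callback) {
  this.pending.push({ data, callback });
};
// Simulates the transport draining (e.g. a GET poll consuming the buffer):
// fires each packet's callback and returns what was written.
FakeSocket.prototype.flush = function () {
  const out = this.pending;
  this.pending = [];
  for (const p of out) if (p.callback) p.callback();
  return out.map(p => p.data);
};
```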

@EugenDueck
Contributor Author

I wonder if all that we should do is add a callback to send.

That's basically what I'm doing, except that - in my socket.io implementation, which I just tested yesterday and which seems fine - I wrap the packets in a new Packet class, which holds and caches

  • the data variant of the packet (i.e. the JS object)
  • the on-the-wire variant of the packet

It offers methods getAsData and getAsOnTheWireOrSomething (I forgot the exact name I used). Those are the methods socket.io uses, and they will create the representation if it does not exist and cache it for reuse.

Otherwise there'd be a lot of waste in back- and forth-encoding, especially with multicast (rooms) and broadcast.
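A minimal sketch of such a dual-representation packet: the method and field names are guesses at what the comment describes, and JSON stands in for the real packet encoding.

```javascript
// Holds a packet as either a JS object, an on-the-wire string, or both.
// Each representation is created lazily at most once and then cached,
// so a broadcast encodes the packet a single time for all recipients.
function Packet(opts) {
  this._data = opts.data !== undefined ? opts.data : null; // JS-object form
  this._wire = opts.wire !== undefined ? opts.wire : null; // encoded form
  this._encodes = 0; // for illustration only: counts encode calls
}
Packet.prototype.getAsData = function () {
  if (this._data === null) this._data = JSON.parse(this._wire); // decode once
  return this._data;
};
Packet.prototype.getAsWire = function () {
  if (this._wire === null) {
    this._wire = JSON.stringify(this._data); // encode once
    this._encodes++;
  }
  return this._wire;
};
```

A store that already holds serialized packets (the RedisStore case) would construct `new Packet({ wire: ... })`, while a local emit constructs `new Packet({ data: ... })`; either way, consumers never trigger a second conversion.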

Will try to prepare an engine.io variant of it as soon as time permits. Question: do you agree that the callback (provided by the app that is using socket.io/engine.io) should not see the on-the-wire representation but the JS object?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Agreed completely. In fact, in engine.io the Socket holds an array of packet objects, and only the Transport deals with encoding logic.

@EugenDueck
Contributor Author

Alright. Will hopefully get a pull-request done later today.

A question about the rough timeline: when talking engine.io 1.0, are we talking days, weeks, or months?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Days

@EugenDueck
Contributor Author

Gotcha. One more question about the big picture, since I don't see them in engine.io: will stores stay in socket.io?

@rauchg
Contributor

rauchg commented Aug 7, 2012

Yep. In fact, that's why I want the reliable messaging layer directly in socket.io. It'd be really cool if processes could die, come and go, yet messages stayed in, let's say, Redis.

@EugenDueck
Contributor Author

Indeed.

I'm sorry this is getting even more off-topic, but in terms of the big picture, which is good to keep in mind when making non-trivial changes: does (the current and/or the future) socket.io's RedisStore allow load balancing without sticky sessions? (I just haven't had the time to dig into it yet.)

@rauchg
Contributor

rauchg commented Aug 8, 2012

Per discussion, we're moving it into a different module.

@rauchg rauchg closed this as completed Aug 8, 2012
dave-r12 pushed a commit to dave-r12/engine.io that referenced this issue Jul 17, 2016
…ble-and-method-parameter-names-should-comply-with-a-naming-convention-fix-1

squid:S00117 - Local variable and method parameter names should comply with a naming convention