Added multi-processor support for cube servers #106

Open
Conversation


godsflaw commented Jan 3, 2013

If options.workers is absent, this code should fork a worker for every CPU on the host. Obviously options.workers may be used to override this default. Please update wiki documentation accordingly.


RandomEtc commented Mar 5, 2013

I like the idea of this but I don't know a lot about the node cluster module. Does it support udp and websockets and everything else?

I might split this out into a separate bin file to keep the basic collector/emitter very simple and clear.

I worry a bit about some of the timing logic in event.js, around setInterval (not in your code, which is quite clear). I'm still getting familiar with the small details of Cube so I'm not sure if there are weird race conditions or if there's any duplicated work if there's more than one collector or evaluator handling the same event type?

Also - have you run into CPU issues with only a single instance?

godsflaw commented Mar 5, 2013

I think it is wise to dig and see if there are any race conditions. We did run into problems with inserts and especially queries against a single server, which may have caused me to make my change a little hastily. At quick inspection, it looked like everything was contained well within a single server instance. That is, it looks like one could get parallelism simply by running more collectors and evaluators, which made me think it was ideal for cluster.

We've been running this code for a few months and it's handling 250 largish documents a second with more than 10 indexes in one of the event collections. It produced the speedup we needed, and appears to run well.

It is worth noting that, for some of my stats that present a percentage, I will very rarely get back (and cache) values over 100%. This throws off the scaling of the cubism graphs. This bug, however, could exist in a number of places and is likely unrelated. Other than that, all my remaining limiting factors are related to MongoDB, and there are no other observable bugs.


RandomEtc commented Mar 5, 2013

Thanks for the extra notes. Let's keep this pull request open for the time being - if anyone has time to look more thoroughly at the use of intervals and timeouts in Cube and how they interact with node's cluster module then please post here. If I start looking into it I'll post back with an update.

Marsup commented Mar 7, 2013

We are also running horizontally scaled collectors and evaluators, but we did it before cluster made it into node. It seems to go well on our side too, except for one thing: if you plug the collectors into collectd, you must never send any "derive" events. These events depend on the previous value to compute the actual value before inserting into mongo, so you will of course get very unexpected values depending on which collector you reach :)
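A minimal sketch of why "derive" breaks across processes (the names and data shapes here are illustrative, not collectd's actual protocol): a derive value is a delta against the previous raw sample, and that previous sample lives in per-process memory, so the delta is only correct if every sample for a given metric reaches the same collector process.

```javascript
// Per-process state: metric name -> previous raw counter value.
// With multiple collector processes, each one has its own copy of this,
// so deltas computed from it diverge as soon as samples are split up.
var last = {};

// Compute the delta since the previous sample of this metric,
// or null if this process has not seen the metric before.
function derive(name, value) {
  var prev = last[name];
  last[name] = value;
  return prev === undefined ? null : value - prev;
}

// One collector sees every sample: deltas come out right.
derive("reqs", 100); // null (no previous value yet)
derive("reqs", 150); // 50
derive("reqs", 210); // 60
```

If the second sample had instead landed on a different collector process, that process would have no previous value and the delta would be wrong, which is the "very unexpected values" described above.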

I would add that scaling the server itself is a good thing, but it would be even better to scale the computations themselves. Cluster mode improves responsiveness when there are many clients, but each individual computation will still take just as long.
