Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of Layer 4 Hash Balancing, fixes several issues and supporting reverse proxy ips #17

Open
wants to merge 29 commits into
base: master
from

Conversation

@wzrdtales
Copy link

wzrdtales commented Dec 14, 2014

Foreword

This edit does not affect the behavior in any way. The backward compatibility is ensured.

Fixing following Issues

Fixes #15
Fixes #6
Probably Fixes #8, but there maybe is a need to look for more than only the ip in a specific header.

Introduction

As there was a need to support variables on layer 4 of the OSI Model, because of relying on IPs doesn't work if something is in between node and the user.

Prolog to Proxied connections

If you're using a proxy, like you do in many constellations, e.g. using a varnish as cache, using a Cloudflare like CDN or
using DDoS Protections which are build on a so called reverse proxy Server.
There are many cases where you may not be able to avoid proxying the users requests,
before they reach the node Application.

The Problem:

If we proxy any connection, the real IP will be lost. The original implementation of sticky-sessions
worked only on layer 3 of the OSI Model. But the Information we need, is right now on layer 4.

Note: Only versions smaller than 0.11.4 and greater than 0.9.6 are supported.
The reason for this is that the behavior of onread in net.js has changed:
https://github.com/joyent/node/blob/v0.11.14-release/lib/net.js#L492-L514

Reasoning the usage of onread core component

As I 'catch' the packet, to parse informations from the http header, I needed to push the packet back to the reading queue as it has never been received.

This was done by accessing the private element _handler which contains the method onread, which gets called whenever a file descriptor receives a buffer.
https://github.com/joyent/node/blob/v0.10.33-release/lib/net.js#L496-L560

If there is any other way to accomplish this, I would be pleased if you would tell me :)
Accessing this component is not really beautiful, as it changes often and is not stable and/or backward compatible. That is also the reason why this fix only works in the rnage of 0.9.7 to 0.11.

README.md Outdated
});
server.on( 'connection', function( socket )
{

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Style.

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Still applies ;)

worker = workers[ipHash % workers.length];
worker.send('sticky-session:connection', c);
});
internals.seed = internals.random.integer(0x0, 0x80000000);

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

May be we could use crypto.randomBytes(4).readUInt32BE(0, true)?

/**
* Access 'private' object _handle of file decriptor to republish the read packet.
*/
function node96Republish( fd, data )

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

What 96 means here?

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Nvm, I see.

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Jan 12, 2015

should mean node 0.9.6

*/
function node96Republish( fd, data )
{
fd._handle.onread( new Buffer( data ), 0, data.length );

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Do we really care about node 0.9.x anymore?

var index =version.indexOf('.');

//Writing version to internals.version
internals.version.sub = Number( version.substr( index + 1 ) );

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Number => parseInt(..., 10)

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Jan 12, 2015

well I do not care about 0.9.x but some may still rely on it.

worker.send( { cmd: 'sticky-session:connection', data: c.data }, fd );
}

function sticky(options, callback) {

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

This is a huuuge function. What if we could create a class and make parts of this function and instance methods, shared variables - a in-object properties (this.prop) ?

* Validating the version, as onread has changed multiple times.
* https://github.com/joyent/node/blob/v0.11.4-release/lib/net.js#L487-L519
*/
if( internals.version.major > 0 )

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

if (...) {: space after if, no space after (, no space before ) and { on the same line.

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

This applies to every if statement ;)

* https://github.com/joyent/node/blob/v0.11.4-release/lib/net.js#L487-L519
*/
if( internals.version.major > 0 )
throw new Error( 'sticky-sessions using layer4 is to old to be used with node major version ' + internals.version.major + ' please check for an udpate!' );

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

80 column limit, please.

strData = strData[ 0 ];

//Send ackknownledge + data and real ip adress back to master
process.send( { cmd: 'sticky-session:ack', realIP: strData, data: data }, msgData );

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

({ and no need in space character before ).

* Reading data once from file descriptor and extract ip from the header.
*/
msgData.once( 'data', function( data )
{

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

function (...) { all on same line without spaces after ( and before)`.

{

/**
* Reading data once from file descriptor and extract ip from the header.

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

80 column limit.

msgData.once( sync.event, function()
{
internals.republishPacket( msgData, msg.data );
} );

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

No space before ).

@@ -1,10 +1,12 @@
{
"name": "sticky-session",
"description": "Sticky session balancer based on a `cluster` module",
"version": "0.1.0",
"version": "0.2.0",

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Please leave it as it is for now :) I'll bump the version after landing this thing.

README.md Outdated
@@ -3,14 +3,83 @@
A simple performant way to use [socket.io](http://socket.io/) with a
[cluster](http://nodejs.org/docs/latest/api/cluster.html).

## Technical

Sticky Sessions are Hash Balanced by IP. Optionally layer4 header informations, for proxied connections, can be hashed.

This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

80 column limit everywhere.

README.md Outdated

Specifies on which event sticky-sessions should **listen** if **isSyncable** is set to **true**.


This comment has been minimized.

Copy link
@indutny

indutny Jan 12, 2015

Owner

Why 3 newlines?

}

function StickyAgent(options, callback)
{

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

Please put { on a previous line.

{
if (msg.cmd === 'sticky-session:ack') {

patchConnection(msg, c, self);

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

Why not make it a method? self.patchConnection(msg, c)

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Mar 24, 2015

Author

I don't want it to be a method that is accessable, thats the reason why i use it as function.

console.error('sticky-session: worker died');
spawn(i);
});

self.workers[i].on('message', function(msg, c)
{

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

Please put { on a previous line.

var workers = [];
for (var i = 0; i < num; i++) {
// Master will spawn `num` workers
self.workers = [];

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

You have similar thing in a prototype... why?

function StickyAgent(options, callback)
{
var version = process.version.substr(1);
var index =version.indexOf('.');

This comment has been minimized.

Copy link
@indutny

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Mar 24, 2015

Author

Wow I did this while I was not at my notebook and did this online in the online editor, to not forgot to do it... One time no linter and you oversee this so fast

/**
* Worker process
*/
listener: function(msg, socket) {

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

I'd rather do it StickyAgent.prototype.listener = function listener(msg, socket) {. You have one excessive indent because of the prototype = { ... } thing.

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Mar 24, 2015

Author

haha, well ok. I wanted to save StickyAgent.prototype. for every method, but I'm fine with that.

* Reading data once from file descriptor and extract ip from the
* header.
*/
if( socket )

This comment has been minimized.

Copy link
@indutny

indutny Mar 24, 2015

Owner

if (socket)?

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Mar 24, 2015

Author

Style failure again... yeah, also I can omit this ceck here. Don't know why I checked for it to be defined anymore.

@indutny

This comment has been minimized.

Copy link
Owner

indutny commented Mar 24, 2015

Some style nits for you, sorry for this :( .

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Mar 24, 2015

@indutny no problem, will take a look at this, this evening

@ZloyDyadka

This comment has been minimized.

Copy link

ZloyDyadka commented Mar 24, 2015

Waiting for update :)

@wzrdtales wzrdtales mentioned this pull request Mar 24, 2015
@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 1, 2015

Quick question for those of you working on this: Is there a reason why it would not work to use an http.server instance instead of a net.server and wait for the 'request' event, which has header information already parsed? One such header is 'x-forwarded-for', which can be used to tell the original IP in the case where a request arrived via proxy. ( http://stackoverflow.com/a/8107922 )

Would the solution mentioned above not work, or did you decide against using a layer 4 server implementation such as http.server for performance/compatibility reasons?

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Apr 2, 2015

1 performance, 2 there were some problems related to websockets, I'm sorry I can't tell you anymore which, but this is a while ago now when I first implemented a http.server solution. But I wanted to realize this without the http stack anyway for optimal performance.

StickyAgent.prototype.seed = 0;
StickyAgent.prototype.header = 'x-forwarded-for';
StickyAgent.prototype.republishPacket = node96Republish;
StickyAgent.prototype.sync = {

This comment has been minimized.

Copy link
@indutny

indutny Apr 2, 2015

Owner

Please move this and above to the constructor ;)

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Apr 2, 2015

Author

Ok, will do this in the evening ;)

socket.once('data', function(data) {

var strData = data.toString().toLowerCase();
var searchPos = strData.indexOf(self.header);

This comment has been minimized.

Copy link
@indutny

indutny Apr 2, 2015

Owner

Why not do:

this.headerRe = new RegExp(
  '\r\n' + this.header.replace(/([^a-z0-9])/g, '\\$1') + '\s*:\s*(.*?)\r\n',
  'g');

strData.match(this.headerRe);

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Apr 2, 2015

Author

Because:

    var strData = ["Host: www.google.de",
    "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: de-DE,de;q=0.5",
    "Accept-Encoding: gzip, deflate",
    "DNT: 1"
    ].join("\r\n");

    var header = "host";

    var start = new Date();

    for( var o = 0; o < 10; ++o )
    {

        for( var i = 0; i < 100000; ++i )
        {
         var headerRe = new RegExp(
           '\r\n' + header.replace(/([^a-z0-9])/g, '\\$1') + '\s*:\s*(.*?)\r\n',
           'g');

         strData.match(headerRe);
     }


     console.log("Regex: " + (new Date() - start))
     start = new Date();


     for( var i = 0; i < 100000; ++i )
     {
         var searchPos = strData.indexOf(header);
         var endPos = 0;

         searchPos = strData.indexOf(':', searchPos) + 1;
         strData = strData.substr(searchPos);

         endPos = strData.search(/\r\n|\r|\n/, searchPos);
         strData = strData.substr(0, endPos).trim();
     }               

     console.log("No Regex: " + (new Date() - start))
     start = new Date();

 }

Regex: 86
No Regex: 31
Regex: 69
No Regex: 27
Regex: 64
No Regex: 28
Regex: 71
No Regex: 29
Regex: 63
No Regex: 26
Regex: 63
No Regex: 27
Regex: 66
No Regex: 28
Regex: 64
No Regex: 28
Regex: 64
No Regex: 27
Regex: 63
No Regex: 28

If you want it via regex, which is naturally heavier I can change it. Your decision, I would prefer searching for the index and read.

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Apr 2, 2015

Author

Actually I see, the regex is faster if I store the RegExp object permanently and use it. I will do this too this evening.

var strData = ["Host: www.google.de",
"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: de-DE,de;q=0.5",
"Accept-Encoding: gzip, deflate",
"DNT: 1"
].join("\r\n");

var header = "host";

var start = new Date();
var headerRe = new RegExp(
   '\r\n' + header.replace(/([^a-z0-9])/g, '\\$1') + '\s*:\s*(.*?)\r\n',
   'g');


for( var o = 0; o < 10; ++o )
{

    for( var i = 0; i < 100000; ++i )
    {
        strData.match(headerRe);
    }


console.log("Regex: " + (new Date() - start))
start = new Date();


for( var i = 0; i < 100000; ++i )
{
 var searchPos = strData.indexOf(header);
 var endPos = 0;

 searchPos = strData.indexOf(':', searchPos) + 1;
 strData = strData.substr(searchPos);

 endPos = strData.search(/\r\n|\r|\n/, searchPos);
 strData = strData.substr(0, endPos).trim();
}               

console.log("No Regex: " + (new Date() - start))
start = new Date();

}

Regex: 24
No Regex: 32
Regex: 11
No Regex: 27
Regex: 9
No Regex: 28
Regex: 10
No Regex: 28
Regex: 9
No Regex: 28
Regex: 9
No Regex: 29
Regex: 10
No Regex: 27
Regex: 10
No Regex: 28
Regex: 10
No Regex: 29
Regex: 9
No Regex: 29

This comment has been minimized.

Copy link
@wzrdtales

wzrdtales Apr 3, 2015

Author

Ok while I looked into this, I realized what you're suggesting is not only slower because there was something missing, as I thought match would just return the (.*) matched value. Haven't used regex that much in javascript yet and suspected it behaves similar to other languages. It also fails if I try to use it with exec with a global defined regex, because of a race condition. For test I prepared the following benchmark:

var strData = ["Host: www.google.de",
"User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0",
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: de-DE,de;q=0.5",
"Accept-Encoding: gzip, deflate",
"DNT: 1"
].join("\r\n").toLowerCase();

var header = "host";
var copy;

var start = new Date();
var headerRe = new RegExp(
 '(\r\n|^)' + header.replace(/([^a-z0-9])/g, '\\$1') + '\s*:\s*(.*?)\r\n',
 'g');


for( var o = 0; o < 10; ++o )
{
  var missedMatches = 0;

  for( var i = 0; i < 100000; ++i )
  {
    var matches = headerRe.exec(strData);
    if ( matches === null )
      ++missedMatches;
    else
      copy = matches[2].trim();

  }


  console.log("Regex exec without global defined regex: " + (new Date() - start) + "ms on 100000 Records with " + missedMatches + "missed matches");
  start = new Date();

  missedMatches = 0;

  for( var i = 0; i < 100000; ++i )
  {
    var headerRe = new RegExp(
     '(\r\n|^)' + header.replace(/([^a-z0-9])/g, '\\$1') + '\s*:\s*(.*?)\r\n',
     'g');
    var matches = headerRe.exec(strData);
    if ( matches === null )
      ++missedMatches;
    else
      copy = matches[2].trim();
  }


  console.log("Regex exec with local defined regex: " + (new Date() - start) + "ms on 100000 Records with " + missedMatches + "missed matches");
  start = new Date();

  missedMatches = 0;

  for( var i = 0; i < 100000; ++i )
  {
    copy = strData.match(headerRe);

    if( copy === null )
    {
      ++missedMatches;
      continue;
    }

    copy = copy[0];

    var searchPos = copy.indexOf(':', 0) + 1;

    var endPos = copy.search(/\r\n|\r|\n/, searchPos);
    copy = copy.substr(searchPos, endPos).trim();
  }


  console.log("Regex match: " + (new Date() - start) + "ms on 100000 Records with " + missedMatches + "missed matches");
  start = new Date();

  missedMatches = 0;


  for( var i = 0; i < 100000; ++i )
  {
   var searchPos = strData.indexOf(header);
   var endPos = 0;

   if (searchPos === -1) {
    ++missedMatches;
    continue;
  }

  searchPos = strData.indexOf(':', searchPos) + 1;

  endPos = copy.search(/\r\n|\r|\n/, searchPos);
  copy = copy.substr(searchPos, endPos).trim();
}

console.log("search and replace: " + (new Date() - start) + "ms on 100000 Records with " + missedMatches + "missed matches");
start = new Date();

missedMatches = 0;

}

Results in:
Regex exec without global defined regex: 27ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 106ms on 100000 Records with 0missed matches
Regex match: 83ms on 100000 Records with 0missed matches
search and replace: 34ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 22ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 104ms on 100000 Records with 0missed matches
Regex match: 87ms on 100000 Records with 0missed matches
search and replace: 28ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 17ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 104ms on 100000 Records with 0missed matches
Regex match: 86ms on 100000 Records with 0missed matches
search and replace: 28ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 17ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 104ms on 100000 Records with 0missed matches
Regex match: 82ms on 100000 Records with 0missed matches
search and replace: 26ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 17ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 102ms on 100000 Records with 0missed matches
Regex match: 83ms on 100000 Records with 0missed matches
search and replace: 27ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 16ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 103ms on 100000 Records with 0missed matches
Regex match: 82ms on 100000 Records with 0missed matches
search and replace: 28ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 16ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 102ms on 100000 Records with 0missed matches
Regex match: 85ms on 100000 Records with 0missed matches
search and replace: 26ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 17ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 101ms on 100000 Records with 0missed matches
Regex match: 81ms on 100000 Records with 0missed matches
search and replace: 28ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 16ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 102ms on 100000 Records with 0missed matches
Regex match: 83ms on 100000 Records with 0missed matches
search and replace: 27ms on 100000 Records with 0missed matches
Regex exec without global defined regex: 17ms on 100000 Records with 50000missed matches
Regex exec with local defined regex: 102ms on 100000 Records with 0missed matches
Regex match: 83ms on 100000 Records with 0missed matches
search and replace: 27ms on 100000 Records with 0missed matches

@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 2, 2015

@wzrdtales Thanks, I can see now that you have also replaced the HTTP Server insance in the master process (as in the current version of this module) with a TCP server. That would certainly help with supporting other protocols such as web socket. Have you tested your header parser with protocols other than HTTP, or is it unnecessary since Web socket connections are initiated by HTTP upgrade requests?

The reason for my interest here is that I am working on an a multiprocess application and am attempting to implement sticky-sessions... Unfortunately, using this plugin directly will probably not be an option, since I need access to communication channels with the child processes in order to meet other requirements.

@indutny

This comment has been minimized.

Copy link
Owner

indutny commented Apr 2, 2015

@BSSolo interesting point.

@wzrdtales how will all of these work with, for example, SPDY?

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Apr 2, 2015

@BSSolo This would work with any protocol, as long as you have header informations you can parse. In the case of websockets you always have the initiation which has the http headers.
You would need to add header informations to your initial packet(s) if you're using other protocols.

@indutny If you don't mind I would take time to answere this more thoroughly :) But I'm right now on the way home now, so expect an answer, but a couple of hours away. And if we're talking about SPDY, we are on the way talking about HTTP 2 which in favour google dropped SPDY.

@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 2, 2015

@wzrdtales In that case, I wish I knew what problems related to web servers you encountered with the http.server approach. Not because there's anything wrong with the way you are doing it, but for my own selfish purposes as mentioned above.

HTTP proxies can be used to load balance web socket connections, after all.

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Apr 2, 2015

@BSSolo Unfortunately as I said, I don't know it that specific anymore. I can quickly type down this today or tomorrow, I have 2 Projects on my schedule today for which I want to finish features. So maybe I get it today to reproduce this, if not then tomorrow. We have a public holiday tomorrow here in germany, so tomorrow is definitely a day I have much more time then usual.

@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 2, 2015

@wzrdtales Thank you, just don't stress out too much about this. You have already been very helpful!

@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 2, 2015

@wzrdtales One thing to note for the http.server approach, Node automatically rejects all 'upgrade' requests unless there's an 'upgrade' event handler set. Listeners on both 'request' and 'upgrade' would be needed.

Edit: Seeing what might be a real problem now. Once the 'request' event etc. has fired, it's not trivial to write information back to the socket such that the same event will be fired again on the child process's httpServer. Writing the output of each new socket to a buffer and then feeding buffer contents back into the socket when a layer-4 event is fired miiight work, or it might not, depending on the implementation of http.server.

@wzrdtales

This comment has been minimized.

Copy link
Author

wzrdtales commented Apr 3, 2015

@BSSolo I haven't begun to reproduce this yet, working on some other things. But to give some hints, from what I remember the problem was.
I had the http server balancing working, transferring the socket from one worker to another another and feed him again with the data. But something with the websockets did not work, I don't know it that specific anymore though.

edit: I will also look to take some time today, or this weekend to answer one of the questions from before. Didn't got it yesterday.

@SmithPR

This comment has been minimized.

Copy link

SmithPR commented Apr 3, 2015

@wzrdtales Don't worry about it, scanning the header from the TCP socket's event is likely the simplest solution anyway, once all of the corner cases are handled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.