
realloc is slow #77

Closed
jfromaniello opened this issue Mar 20, 2017 · 6 comments

@jfromaniello

jfromaniello commented Mar 20, 2017

It seems to me that doing this:

const pbf = new PBF();
Response.write(myobject, pbf);

is slow, because it does a bunch of reallocs internally, i.e. it repeatedly creates a bigger Uint8Array and copies the content into it.
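The grow-and-copy pattern looks roughly like this (a simplified sketch for illustration, not pbf's actual code):

```javascript
// Simplified sketch of grow-and-copy reallocation: each time the data
// outgrows the current array, a larger one is allocated and every byte
// written so far is copied over.
function realloc(buf, min) {
  let length = buf.length || 16;
  while (length < min) length *= 2; // grow geometrically
  const bigger = new Uint8Array(length);
  bigger.set(buf);                  // copy the old contents
  return bigger;
}
```

Even with geometric growth, every realloc copies everything written so far, which adds up when encoding many small messages.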

If your messages are of similar or constant length, it seems better to do this:

var currentEncodingLength = 16;
function encode(message) {
  const pbf = new PBF(currentEncodingLength);
  Response.write(message, pbf);
  const result = pbf.finish();
  // round the observed length up to the next multiple of 16
  const length = Math.ceil(result.length / 16) * 16;
  // remember the largest size seen so far and start there next time
  currentEncodingLength = Math.max(currentEncodingLength, length);
  return result;
}

I got a big win by doing this.

@kjvalencik
Collaborator

You could likely achieve a similar win by using a pool of buffers.

@jfromaniello
Author

@kjvalencik thanks! How could I do that?

@kjvalencik
Collaborator

In node, Buffers are already pooled unless you use a slow buffer. In this case, your example would already be pooling.
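One quick way to see the pooling (a minimal sketch; it assumes a node version with Buffer.allocUnsafe, i.e. 4.5+):

```javascript
// Small allocations are views over a shared pre-allocated arena
// (Buffer.poolSize, 8192 bytes by default), so no new ArrayBuffer
// is created per call.
const small = Buffer.allocUnsafe(16);
console.log(small.buffer.byteLength === Buffer.poolSize); // true

// Allocations at or above the pool size get their own ArrayBuffer.
const large = Buffer.allocUnsafe(Buffer.poolSize + 1);
console.log(large.buffer.byteLength === Buffer.poolSize + 1); // true
```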

I'm not sure if browsers do this, but you could also do it explicitly with an object pooling library. In that case, you may get better performance by pooling Pbf instances. Something like (warning, untested code):

const Pbf = require("pbf");
const Bluebird = require("bluebird");
const MemoryPool = require("memory-pool");

function PbfPool({ size = 0, growBy = 256 } = {}) {
	const pool = new MemoryPool(size, () => new Pbf(), growBy);

	// return an allocator; the disposer hands the instance back to the
	// pool with its write position reset once the consumer is done
	return () => Bluebird
		.resolve(pool.allocate())
		.disposer(pbf => pool.free(Object.assign(pbf, {
			pos : 0,
			length : 0
		})));
}

const getPbf = PbfPool();

function encodeResponse(message) {
	return Bluebird.using(getPbf(), pbf => {
		Response.write(message, pbf);
		return pbf.finish();
	});
}

@jfromaniello
Author

jfromaniello commented Mar 20, 2017

I am working on node.js (v4). Currently I am doing this:

function encode(message) {
  const pbf = new PBF();
  Response.write(message, pbf);
  return new Buffer(pbf.finish());
}

Then I end up writing the result to a socket (every message is length prefixed). Is this the right thing to do?
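The length-prefix framing itself can be sketched like this (a minimal sketch; the 4-byte big-endian prefix and the frame/readFrame names are assumptions, not part of the protocol above):

```javascript
// Prefix each encoded message with its byte length so the reader can
// split the stream back into messages (4-byte big-endian assumed here).
function frame(payload) {
  const header = Buffer.alloc(4);
  header.writeUInt32BE(payload.length, 0);
  return Buffer.concat([header, payload]);
}

// Recover one message from the front of a framed chunk.
function readFrame(data) {
  const length = data.readUInt32BE(0);
  return data.slice(4, 4 + length);
}
```

Usage would then be something like `socket.write(frame(encode(message)))`.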

I think the new Buffer call actually copies the Uint8Array into a buffer. So I tested a bunch of different things:

// 300,000 ops/sec:
const pbf = new PBF();
Response.write(message, pbf);
return new Buffer(pbf.finish());

// 600,000 ops/sec:
const pbf = new PBF(32);
Response.write(message, pbf);
return new Buffer(pbf.finish());

// 1,100,000 ops/sec:
const pbf = new PBF(typedArray);
Response.write(message, pbf);
return new Buffer(pbf.finish());
//typedArray is a Uint8Array(32) defined at the beginning of the application.

Given that the code is always synchronous and runs on the same thread, and that the slice of the array is copied by new Buffer, I think the last one is safe. But I would like to hear your opinion, and whether there is a better way in node.js.

@mourner
Member

mourner commented Mar 20, 2017

The Buffer class in Node 4.x and higher extends Uint8Array, so you can create a buffer without an expensive copy, for example by using this module, which also explains the problem well. This seems like a usage issue, not something we should change on the pbf side.

@mourner mourner closed this as completed Mar 20, 2017
@jfromaniello
Author

@mourner I agree with you; it seems to me that this is not an issue in pbf. I was using this issue to try to figure out, with your help, the fastest way to encode.

I hope someone finds this info useful; here is how I understand it so far.

  • new PBF() accepts a Uint8Array. If you pass anything else, it is used to construct a new Uint8Array.
  • pbf.finish() returns a Uint8Array. It is usually a "subarray" of the internal array.
  • in node.js 4.x and higher you can wrap a Uint8Array without copying via Buffer.from(array.buffer, array.byteOffset, array.length), which is what typedarray-to-buffer does. This does not iterate and copy; the buffer points to the same memory as the array. Note that Buffer.from(typedArray) itself does copy.
  • if you do new Buffer(typedArray) in node.js, it also copies the array.
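A quick way to check the copy vs. view semantics above in plain node (nothing here is pbf-specific):

```javascript
const arr = new Uint8Array([1, 2, 3, 4]);

// Buffer.from(typedArray) copies: mutating the buffer leaves the array alone.
const copied = Buffer.from(arr);
copied[0] = 99;
console.log(arr[0]); // still 1

// Buffer.from(arrayBuffer, offset, length) shares memory: the buffer
// is a view over the same bytes.
const shared = Buffer.from(arr.buffer, arr.byteOffset, arr.length);
shared[0] = 99;
console.log(arr[0]); // now 99
```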

With all the above in mind, I wrote this benchmark of the different ways I could imagine encoding a message into a buffer:

const Benchmark = require('benchmark');
const suite     = new Benchmark.Suite();
const PBF       = require('pbf');
const toBuffer  = require('typedarray-to-buffer');

//this is the PBF generated file:
const Response  = require('../messages/Response').Response;

//this is a sample message:
const response = {
  request_id: 'abcdefg',
  'take': {
    conformant: false,
    remaining: 10,
    reset: 20,
    limit: 30
  }
};

//this is used in the 3rd test
const array = new Uint8Array(32);


const poolSize = 1024 * 5000;
var pool = new Uint8Array(poolSize);

suite
.add('Encoding with reallocs', function() {
  const pbf = new PBF();
  Response.write(response, pbf);
  return toBuffer(pbf.finish());
})
.add('Encoding with a starting size', function() {
  const pbf = new PBF(32);
  Response.write(response, pbf);
  return toBuffer(pbf.finish());
})
.add('Encoding on an existing array and copy to buffer', function() {
  const pbf = new PBF(array);
  Response.write(response, pbf);
  //this copies the array into a new buffer by iterating over it
  return new Buffer(pbf.finish());
})
.add('Encoding using a pool', function() {
  const pbf = new PBF(pool);
  Response.write(response, pbf);

  const view = pbf.finish();

  //remove the used part of the pool
  pool = pool.subarray(view.length);
  if (pool.length < 32) {
    // console.log('allocating pool', pool.length);
    pool = new Uint8Array(poolSize);
  }

  //convert the ArrayBufferView to buffer.
  return toBuffer(view);
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run({ 'async': true });

The result of this benchmark on my machine is this:

» node bench/encode.js
Encoding with reallocs x 254,456 ops/sec ±3.27% (76 runs sampled)
Encoding with a starting size x 472,597 ops/sec ±1.60% (77 runs sampled)
Encoding on an existing array and copy to buffer x 1,215,514 ops/sec ±1.54% (84 runs sampled)
Encoding using a pool x 746,853 ops/sec ±2.56% (78 runs sampled)
Fastest is Encoding on an existing array and copy to buffer

Does this benchmark look good to you? Can you think of a faster way?

My code is here:
https://github.com/limitd/protocol/blob/same_typed_array/bench/encode.js
