Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle negative varint #56

Closed
wants to merge 1 commit into from
Closed

Conversation

kjvalencik
Copy link
Collaborator

Resolves #55

@@ -374,30 +375,73 @@ Pbf.prototype = {
}
};

function readVarintRemainder(val, pbf) {
var buf = pbf.buf, b;
function readVarintRemainder(l, u, p) {
Copy link
Collaborator Author

@kjvalencik kjvalencik Jul 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function uses single character variables names to keep it under 600 characters. Longer functions are not inlined by v8, which drastically decreases performance.

@@ -74,12 +76,12 @@ test('readVarint64', function(t) {
test('readVarint & writeVarint handle really big numbers', function(t) {
var buf = new Pbf();
var bigNum1 = Math.pow(2, 60);
var bigNum2 = Math.pow(2, 69);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a valid number to serialize in varint. The 64th bit on a signed integer is the sign bit. If we try to use the full 10th bytes, there is no way to distinguish the sign bit from the rest of the number.

@kjvalencik
Copy link
Collaborator Author

Unfortunately, binary literals aren't added until ES2015, which, IMHO, would make this easier to read / understand. Any thoughts on using constants as a pseudo-binary literal? E.g.,

// Instead of 0x80
var b10000000 = parseInt('10000000', 2);

@mourner
Copy link
Member

mourner commented Jul 22, 2016

Whoa. I somehow missed all the notifications for this PR, sorry for a late reply!

Since this is a substantial change, it will probably affect benchmark numbers — how do those look? Also, how much of a breaking change the isUnsigned parameter is? Should we worry about it, or try to add the change without changing the public API (e.g. with additional methods)?

@kjvalencik
Copy link
Collaborator Author

kjvalencik commented Jul 22, 2016

The isUnsigned value parameter isn't very important unless you are trying to encode uint64 that are greater than 2^63.

This did seem to affect the benchmarks somewhat. I think it may be because of the larger functions.

# Old
decode vector tile with pbf x 902 ops/sec ±1.47% (87 runs sampled)
encode vector tile with pbf x 589 ops/sec ±1.70% (84 runs sampled)

# New
decode vector tile with pbf x 875 ops/sec ±1.07% (84 runs sampled)
encode vector tile with pbf x 515 ops/sec ±1.38% (85 runs sampled)

I'm not sure on the precise impacts because the benchmarks are very noisy on my machine--possibly due to other processes running.

@mourner
Copy link
Member

mourner commented Jul 22, 2016

trying to encode uint64 that are greater than 2^63.

@kjvalencik interesting! Are you relying on this in your use case? I wonder whether it's something that could justify a 13% write performance drop, or something that we could document as a limitation instead.

@kjvalencik
Copy link
Collaborator Author

I'm not relying on it anywhere. Since we are always going to a 64-bit float anyway, having that large of a value would result most likely result in loss of precision anyway.

However, my use case does depend on putting negative numbers in varint fields. If I have some time, I'll try tweaking the code and see if I can determine where the slowdown is occurring. Even if the code is more verbose, there might be some shortcuts for smaller numbers.

@mourner
Copy link
Member

mourner commented Jul 22, 2016

Can we drop the isUnsigned parameter from the public API while retaining the fix then?

@kjvalencik
Copy link
Collaborator Author

I pushed a version without the isUnsigned parameter at kjvalencik@8ea9edb

Interestingly, it must have been something about handling unsigned that was causing the slowdown. After removing it, the new benchmarks are somewhat faster than before.

decode vector tile with pbf x 920 ops/sec ±1.16% (86 runs sampled)
encode vector tile with pbf x 615 ops/sec ±1.28% (83 runs sampled

If you would like to go forward with that, I will add some documentation in the README and update this PR. Thanks!

}
}

if (toNum(low, high, true) < Math.abs(val)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narrowed it down to this line causing the performance issue. Changing this to the following fixes the speed regression.

    if (val >= 0x8000000000000000 || val < -0x8000000000000000) {
        throw new Error('Given varint doesn\'t fit into 10 bytes');
    }
decode vector tile with pbf x 920 ops/sec ±1.04% (89 runs sampled)
encode vector tile with pbf x 613 ops/sec ±1.11% (83 runs sampled

@mourner
Copy link
Member

mourner commented Jul 22, 2016

@kjvalencik fantastic! Let's clean up, rebase and it's good to merge. I wonder what caused the speedup though — the vector tile in the bench shouldn't have many big varints if at all.

@kjvalencik
Copy link
Collaborator Author

@mourner I was thinking about backwards compatibility. Even if I remove the isUnsigned flag, it breaks backwards compatibility because previously all values were treated as unsigned.

Instead, what do you think about flipping it to isSigned. That way, if the value isn't provided, it will be false and the API doesn't change. But, it's still available for new built files to take advantage of.

@kjvalencik
Copy link
Collaborator Author

@mourner Flipped isUnsigned to isSigned so that the API is backwards compatible.

@@ -120,9 +120,9 @@ function compileFieldRead(ctx, field) {
case 'bool': return prefix + 'Boolean()';
case 'enum':
case 'uint32':
case 'uint64':
case 'uint64': return prefix + 'Varint(false)';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just use readVarint() for more compact code, or will it affect performance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of any performance cost of coercing to boolean. 👍

@mourner
Copy link
Member

mourner commented Aug 25, 2016

Can you rebase on master please?

@kjvalencik
Copy link
Collaborator Author

Yes, but I won't have a chance until Monday. Thanks!

@mourner mourner mentioned this pull request Aug 25, 2016
@mourner
Copy link
Member

mourner commented Aug 25, 2016

-> #60

@mourner mourner closed this Aug 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants