Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyze varints and zigzag encoding #1

Closed
olt opened this issue Jun 11, 2013 · 8 comments
Closed

Analyze varints and zigzag encoding #1

olt opened this issue Jun 11, 2013 · 8 comments

Comments

@olt
Copy link

olt commented Jun 11, 2013

The 0.01 draft needs at least three bytes for all values above 128. Varint encoding as used by protobuf only uses two bytes for values between 128 and 16384.

https://developers.google.com/protocol-buffers/docs/encoding

@nicklasaven
Copy link
Contributor

Well, the first one will need three because you need one byte to give the new size. But the coordinates after that uses only 2 bytes until next change.

@nicklasaven
Copy link
Contributor

That link was very interesting about zig zag encoding. But if I understand right you sacrify one bit to tell there is a next byte. Is that correct? Then one byte will only give you the range from -64 to 63. Or do I miss the point?

@olt
Copy link
Author

olt commented Jun 11, 2013

Ok, I missed that the size is for all following coordinates.

For signed integers it is right that on byte can store from -63 to 64 (IIRC). But you always store numbers in the smallest possible way and do not need to change the sizes back and forth.

@nicklasaven
Copy link
Contributor

Yes, you are right. Very interesting. Here I think it comes down to statistics about real world data. What I have noticed is that it is quite big differences in overview maps and more detailed maps. More detailed maps often have only a few meters between the points and hen almost all vertex points is stored as 1 byte even with a precision of 1 decimal. With 1 decimal it can hold values from -12.7 to + 12.7.

@olt
Copy link
Author

olt commented Jun 11, 2013

Yeah, only a comparison with real world data will show which method is better. But varints could be used for all numbers (IDs and also first coordinates) and should benefit for small geometries.

@nicklasaven
Copy link
Contributor

Yes, that is an important point about ID and first coordinate.

Another question is what is fastest to parse in javascript, since that is another bottle neck.

hmm, it just struck me.
With this varint, you will sacrifice one bit per byte, not only for the whole value.
so 16 bits will only give -8191 to +8191 instead of -32767 to +32767 with INT16

@nicklasaven
Copy link
Contributor

I have added varInt as mthod 1 in the spec and very soon in the PostGIS and javascript implementations.

It looks very promising :-)

@nicklasaven
Copy link
Contributor

VarInt is now the one and only method in TWKB spec and PostGIS implementation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants