Fancy encoding for properties #26
Comments
I like the indexes trick from vector-tile-spec, it's pretty neat: https://github.com/mapbox/vector-tile-spec/blob/master/1.0.1/vector_tile.proto#L32-L39. Encode both names and values once; use a single integer array for pairs of indexes to both
For stricter schema-based formats (e.g., shapefile) going into this, I think it would make sense to encode names once, and then encode an array of values (using
Can you get away with the same approach for non-strict schema formats like GeoJSON? I was thinking in those cases it would be possible to scan the GeoJSON and extract all unique keys, then store a bitmask (encoded as repeated pseudocode:
@brendan-ward it may work but it's hard and slow to decode and is a bit complicated... What do you think about storing values as a global array as well? The way I commented above is simple and should be almost as efficient.
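For illustration, the index-pair idea discussed here (names and values each stored once, plus a flat integer array of key/value index pairs per feature) could look roughly like the sketch below. The function names are hypothetical and not part of this library, and the sketch assumes property values are hashable scalars (strings, numbers, booleans).

```python
def encode_properties(features):
    """Deduplicate keys and values across features.

    Returns (keys, values, encoded), where encoded[i] is a flat list
    [key_idx, value_idx, key_idx, value_idx, ...] for feature i.
    Assumes values are hashable scalars (hypothetical simplification).
    """
    keys, values = [], []
    key_index, value_index = {}, {}
    encoded = []
    for props in features:
        pairs = []
        for name, value in props.items():
            k = key_index.setdefault(name, len(keys))
            if k == len(keys):  # first time this name is seen
                keys.append(name)
            # Include the type so 1, 1.0 and True are kept distinct.
            vkey = (type(value).__name__, value)
            v = value_index.setdefault(vkey, len(values))
            if v == len(values):  # first time this value is seen
                values.append(value)
            pairs.extend((k, v))
        encoded.append(pairs)
    return keys, values, encoded


def decode_properties(keys, values, pairs):
    """Rebuild one feature's properties from its flat index-pair list."""
    return {keys[pairs[i]]: values[pairs[i + 1]]
            for i in range(0, len(pairs), 2)}


features = [
    {"type": "road", "lanes": 2},
    {"type": "road", "lanes": 4},
    {"type": "path"},
]
keys, values, encoded = encode_properties(features)
```

Decoding is a pair of array lookups per property, which is the main reason this layout is simple to read back.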
@mourner The index approach you suggested could work great too; I was only offering up other alternatives I had been considering (should have made that clearer, sorry). Definitely not advocating for slower performance or complexity of implementation. I guess my concern would be that while
Well, we could bump it to uint64; it's still varint-encoded, so it shouldn't affect sizes, but I think uint32 (0 to 4294967295) is plenty.
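To make the size argument concrete: a base-128 varint, as used by Protocol Buffers for both uint32 and uint64 fields, takes a number of bytes that depends on the value's magnitude and not on the declared field width. The encoder below is a minimal illustrative sketch; `encode_varint` is a hypothetical helper, not this library's code.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F      # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow: set continuation bit
        else:
            out.append(byte)         # last byte: continuation bit clear
            return bytes(out)


# Small indexes cost the same whether the field is uint32 or uint64.
small = encode_varint(300)          # 2 bytes
max_u32 = encode_varint(2**32 - 1)  # 5 bytes
max_u64 = encode_varint(2**64 - 1)  # 10 bytes
```

Only values above the uint32 range would ever need more than 5 bytes, so widening the field type changes nothing for typical index values.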
@mourner the more I look at your approach, the more I'm also convinced it is the right way to go, especially if you are planning to store unique string values (why not extend that to
Closing in favor of #27
Since @mourner is checking this library out, might as well:
Could we encode properties in a more efficient way? Right now we're re-encoding property name & value for every single feature. This could save a lot of space if (a) we encode names once and then do an array or (b) we do some magic around structs
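As a rough sketch of option (a), assuming every feature shares one known list of property names (as in a strict-schema source), names could be written once and each feature reduced to a positional array of values. The helpers and the `names`/`rows` layout below are hypothetical, for illustration only.

```python
def encode_table(features, names):
    """Store property names once; each feature becomes a positional value list.

    A property missing from a feature is stored as None (an assumption of
    this sketch, so a real None value is indistinguishable from "absent").
    """
    return {
        "names": list(names),
        "rows": [[props.get(name) for name in names] for props in features],
    }


def decode_table(table):
    """Rebuild per-feature dicts by zipping the shared names with each row."""
    return [dict(zip(table["names"], row)) for row in table["rows"]]


features = [
    {"name": "A", "population": 10},
    {"name": "B", "population": 20},
]
table = encode_table(features, ["name", "population"])
```

With this layout each name is serialized once per collection instead of once per feature, which is where the space saving would come from.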