TWKB feedback for discussion #5

twpayne · 2014-02-27T14:08:06Z

Refs openlayers/openlayers#1615 and openlayers/openlayers#1718.

The TWKB format is very promising, particularly in the use of delta encoding for small transfer sizes. However, I think there are a number of opportunities for improvement:

It is not possible to know if an arbitrary blob of binary data contains TWKB data. This could be fixed by adding a header.
When streaming data, it is impossible to know if enough data has been read to decode a complete geometry/feature. The only way to know if a geometry/feature is complete is to attempt to read the geometry/feature and see if you run out of bytes.
The variable precision is interesting, but needs to allow for negative values as well. For projections that use metres the coordinate values are large (e.g. EPSG:3857 has coordinate values from -2e7 to 2e7) and so precision 0 ("no decimal places") is still 1m accuracy, which is overkill.
The format as-is does not allow any kind of extension. For example, with large geometries it might be worthwhile to transmit a pre-computed extent, and with features it might be worthwhile to transmit arbitrary attributes as well as the feature's numerical id.
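
To illustrate the precision point in the list above, here is a minimal sketch of how a signed precision could work. It assumes that precision means "number of decimal places kept" and that a negative value rounds to tens, hundreds, and so on; the function names `quantize` and `dequantize` are hypothetical and not part of TWKB.

```python
def quantize(value: float, precision: int) -> int:
    """Turn a coordinate into an integer, keeping `precision` decimal places.

    Assumption: a negative precision drops digits left of the decimal point,
    e.g. precision -1 rounds to the nearest 10 units.
    """
    if precision >= 0:
        return round(value * 10 ** precision)
    return round(value / 10 ** -precision)


def dequantize(stored: int, precision: int) -> float:
    """Inverse of quantize, up to the rounding error."""
    if precision >= 0:
        return stored / 10 ** precision
    return float(stored * 10 ** -precision)


x = 2003750.83  # a metre-based coordinate, e.g. in EPSG:3857
print(quantize(x, 0))    # 1 m resolution
print(quantize(x, -1))   # 10 m resolution, smaller integer to encode
print(dequantize(quantize(x, -1), -1))
```

The smaller the stored integer, the fewer bytes a variable-length encoding needs for it, which is the motivation for allowing precision below 0 with metre-based projections.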

Concrete suggestions:

Use of a binary container format, e.g. RIFF or PNG-style chunks. Such a container format will give identification, structure, and extensibility.
Support delta-encoding as one of many ways of encoding coordinates. Other ways could include normal WKB, and an uncompressed encoding based on OpenLayers 3's flat coordinates structure. The advantage of flat coordinate structures is that they have virtually zero parsing overhead, are very friendly to garbage collectors, and are already in the correct format to transfer to the GPU for WebGL rendering.
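
As a rough sketch of the two ideas mentioned here, the following shows delta encoding applied to a flat coordinate list of the form [x0, y0, x1, y1, ...]. The function names and the `stride` parameter are hypothetical; this is not the TWKB wire format, only the per-dimension differencing step, assuming coordinates have already been converted to integers.

```python
def delta_encode(flat: list[int], stride: int = 2) -> list[int]:
    """Replace each value by its difference from the previous vertex,
    per dimension. The first vertex is a delta from 0, so it is kept as-is."""
    out = []
    prev = [0] * stride
    for i, value in enumerate(flat):
        dim = i % stride
        out.append(value - prev[dim])
        prev[dim] = value
    return out


def delta_decode(deltas: list[int], stride: int = 2) -> list[int]:
    """Rebuild the flat coordinate list by accumulating the deltas."""
    out = []
    prev = [0] * stride
    for i, delta in enumerate(deltas):
        dim = i % stride
        prev[dim] += delta
        out.append(prev[dim])
    return out


flat = [1000, 2000, 1003, 1998, 1007, 2001]
deltas = delta_encode(flat)
print(deltas)  # [1000, 2000, 3, -2, 4, 3]
assert delta_decode(deltas) == flat
```

After the first vertex the values are small, which is what makes a variable-length integer encoding pay off. The decoded result is again a flat list, so the two approaches are not mutually exclusive on the client side.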

nicklasaven · 2014-03-01T18:41:04Z

Thanks Tom, these are great things to discuss.

Warning, this will be a little long.

I have a slightly different perspective. I am not saying my conclusions are right, though.

I fully agree that the precision must support negative values. I also think supporting bounding boxes is good. Maybe srid/epsg should be supported too.

I look at twkb as "tiny wkb". By that I mean that it is a smaller and more flexible variant of wkb. I don't look at it as a complete transport format, which would lead me to the same conclusion as you, that it needs a structured container. I think twkb can be carried by a container but should not be the container itself.

I will try to explain why I think twkb should only be the spatial part of the data.

By keeping it strictly a spatial format it is easier to make it fit more situations. I imagine using twkb as the format to read from a PostGIS database into a desktop GIS, for instance, instead of using wkb. In that scenario it is important that a single point is small. The smallest single point in twkb is now 4 bytes. A single point with real data is about 10 bytes. This query returns 9 (bytes):
select length(st_astwkb('point(123456 7654321)'::geometry,0));
Putting a point like this in some sort of container will easily double the size.
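
To make the byte count above more concrete, here is a small sketch that sizes the two coordinates of that point as zig-zag encoded varints (the thread later mentions that twkb uses this kind of varInt). The helper names are hypothetical, and the protobuf-style 7-bits-per-byte layout is an assumption about the encoding.

```python
def zigzag(n: int) -> int:
    """Map signed integers to unsigned ones: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n * 2 if n >= 0 else -n * 2 - 1


def varint(u: int) -> bytes:
    """Encode an unsigned integer 7 bits at a time, low bits first.
    The high bit of each byte signals that more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


x_bytes = varint(zigzag(123456))
y_bytes = varint(zigzag(7654321))
print(len(x_bytes), len(y_bytes), len(x_bytes) + len(y_bytes))
```

Under these assumptions the coordinates take 3 + 4 = 7 bytes. That would leave 2 of the reported 9 bytes for type and metadata, but that split is only an inference from the numbers and is not stated in this thread.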
At the other end, I think a use case for twkb can be to send a mixture of different geometries from a file or directly from the database to OpenLayers, for instance working as vector tiles. That is a much more complex situation where you want the geometries to be chunked as tiles for fast access, but you also want to be able to identify each and every single geometry.
The point is that I think it is easier to optimize twkb for broad usage if it only handles the spatial part.
Another reason for keeping the attributes out of twkb is that I hope we will get better at handling data in a more relational way on the client side too in the future. I do not often get the opportunity to work with GIS questions at work, but now I am working on a small project. I have the municipalities in Norway; if I remember right there are 434 of them. I want to apply a lot of different data to those polygons. I have no reason to combine the attributes with the polygons already at the server. I just want to push the polygons as compressed as possible and then send the data separately. The user then uses a dropdown to choose a data set. Isn't that a quite common situation? In my opinion this "spatial is special" paradigm too often leads to very bloated data sets, even server side. I think we have all seen tables like {geometry, road_name, region_name, country_name....}. I asked a question about that a long time ago at gis.stackexchange:
http://gis.stackexchange.com/questions/3468
I think a complex and complete container will make it harder to handle the data in the most efficient way, because the format itself will dictate how you are supposed to pack and relate your data.

To say this another way: I think twkb should be a candidate for the geometry type in geopackage http://www.opengeospatial.org/standards/geopackage
not a competitor to geopackage itself.

From this perspective the discussion is about what belongs in twkb and what belongs in an outer container.
I can see the point in putting the size in twkb when streaming data without any separator. In my example: http://sandbox.jordogskog.no//twkb_node/ the geometries are sent one by one through a websocket, so it is not real streaming. Maybe the size should be included from my perspective too.
About supporting encoding methods like the one in wkb or flat coordinates, I disagree. If there is a situation where there is a point in encoding as wkb, why not use wkb?

One very interesting part of this is also how to look at vector data in general. When dealing with raster data and tiles it is important that the data you send is "ready to show" when it reaches the client, so that the work on the client side is minimal. That is because every zoom needs new data. I think that one of the most interesting things about handling vector data instead is that the client can use the same data sent from the server for many zoom levels. It will cost something in simplification and so on, but the format you get the data in from the server becomes less important. Then it is more important that you get the data as compressed as possible, since you will get a lot more detail than you ask for when zoomed out. Seen that way, it is not that bad if you have to read the data sequentially and parse it. You do that, put it in some kind of storage and use it from there when the user zooms in.

Parsing twkb is also faster on most computers I have tested than parsing the same data as JSON.
http://sandbox.jordogskog.no/twkb_test/
The exception is Chrome on iPad. There must be some bug. It works great in Chrome on Linux, Android and Windows, but not on iPad. But Safari works great.

Does it make sense to use this perspective of twkb, that it is only for the spatial part?
To me it seems like a way to make it optimized for both server side and client side. I must admit that I have focused more on the packaging part in PostGIS than on the usage client side. That is why discussions like this are very important if twkb is to be usable.

twpayne · 2014-03-03T12:16:16Z

Thanks for the reply.

OK, so if I understand correctly, the goal with TinyWKB is to be an alternative to WKB, i.e. a way of encoding a geometry in binary, with no encoding of attributes. In this case, does it make sense to remove the encoding of the "id" attribute both as applied to a single geometry (types 1, 2, 3, 4, 5, 6, 7) and when applied to multiple distinct geometries (types 21, 22, 23, 24)?

To avoid the "spatial is special" problem you highlight, how about using an existing compact format (e.g. MsgPack, Cap'n Proto, BJSON etc.)?

Have you compared the size of TinyWKB to the same data in WKB and gzip'd? Note that almost all clients and servers support transparent gzip'ing over HTTP, and that the gzip'ing can be done in fast, native code in the browser, in parallel with JavaScript execution.

nicklasaven · 2014-03-03T21:35:32Z

Not only an alternative to wkb. By adding the id as a part of the geometry type it is more independent of the container. The geometries can be aggregated to tiles or used as parts in a topological model client side and so on.

About the question of whether types 21 to 24 are relevant, I think you have a point. I am not sure. They don't add much value over just putting many individual geometries after each other. When I created them I thought that they made sense as they are created from an aggregation function in PostGIS.
They might be of value if they have a two-level id, so the whole twkb of type 21, for instance, has one id independent of the ids of the geometries it is built from.

About other compact formats like the ones you mention, I don't know. I haven't tested. But I believe that a format and compression tailored for its purpose can be more efficient than a generic one. What I want to avoid is saying "spatial is so special" so that we don't have to follow common good practice like good database design. It is very possible that I use the arguments for my own purposes only, but that is not my intention :-)

About gzipping wkb, I hadn't tried it before, but I did now with one single example, so it doesn't give the whole picture. I used the "areal types" layer that I use here:
http://sandbox.jordogskog.no/twkb_node/
As twkb it became 982 kB.
As wkb it ended up as 6.8 MB.
When gzipping the wkb file it was reduced to 4.8 MB.

The comparison is not fair since the twkb only holds 5 decimals and the wkb has space for full double precision values. But you cannot reduce the space it occupies, and obviously, in my example at least, the gzipping wasn't making that big a difference. With geojson the zipping makes a big difference. A zipped geojson is not that much bigger than twkb. At least less than double. You can see it here:
http://sandbox.jordogskog.no/twkb_test
I didn't find any way to get the sizes of the compressed php files, but the dev tools show them. And the sizes of the uncompressed websocket responses are shown.

Another gain, I think, at least when reading the geometry directly from the database, is that the server never has to handle the full-blown uncompressed geometries. In PostGIS the geometry is read and parsed the same way as when it is used in any native spatial function like ST_Area or ST_Distance. From that it is directly encoded and compressed. It is never parsed to wkb or some other format. I think that can be of importance on high-traffic servers. It is quite a lot faster to write a big table to a new table as twkb instead of wkb internally in the database. I read that as meaning the data pumping inside the database is a good thing to reduce. That will not be reduced by methods where web server software, for instance, packs data into some great format.

But I guess you are right that the ungzipping is more efficient when we are in a JavaScript environment since it is done in native code. But parsing wkb can also be done in parallel with web workers, can't it?

But as I have said earlier, these are only my thoughts, and my intention is that twkb shall survive my own ego, so I hope that we find some way to make it good. This type of discussion is very important to get there.

nicklasaven · 2014-03-31T19:16:20Z

About opening for negative precision values:
As it stands now, 4 bits are used for the precision value. I think that should be enough even for a signed value.
Then it can range from -8 to +7.

But how to encode it as signed?

Should we use the two's complement method as for ordinary integers, or "zig-zag encoding" like the varInt used for the rest of twkb?

Or are there other options too?
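
For comparison, here is a small sketch of the two options applied to a 4-bit field. The function names are hypothetical and nothing here is taken from the TWKB implementation; it only shows that both schemes cover the range -8 to +7 in 4 bits and differ in which bit patterns they assign.

```python
def twos_complement_encode(p: int) -> int:
    """Store p in 4 bits the way ordinary signed integers are stored."""
    if not -8 <= p <= 7:
        raise ValueError("precision must be in -8..7")
    return p & 0xF


def twos_complement_decode(nibble: int) -> int:
    """If the top bit of the nibble is set, the value is negative."""
    return nibble - 16 if nibble & 0x8 else nibble


def zigzag_encode(p: int) -> int:
    """Interleave signs: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    if not -8 <= p <= 7:
        raise ValueError("precision must be in -8..7")
    return p * 2 if p >= 0 else -p * 2 - 1


def zigzag_decode(nibble: int) -> int:
    """The lowest bit carries the sign, the remaining bits the magnitude."""
    return (nibble >> 1) ^ -(nibble & 1)


for p in (-8, -1, 0, 1, 7):
    print(p, format(twos_complement_encode(p), "04b"), format(zigzag_encode(p), "04b"))
```

With a fixed 4-bit field both schemes cost the same space, so the choice is mainly about consistency: zig-zag matches the varints already in the format, while two's complement matches what most languages do natively.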

nicklasaven · 2014-07-28T20:27:59Z

Tom, I suggest that we follow your suggestion and add size of the geometry as an option, see #7 and #8
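
As an illustration of why a size helps when streaming, here is a minimal sketch of length-prefixed framing. It assumes the size is written as an unsigned varint in front of an opaque geometry payload; the function names and that placement are hypothetical, and the actual layout is whatever #7 and #8 define.

```python
def encode_varint(u: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length."""
    return encode_varint(len(payload)) + payload


def read_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Split off every complete payload; return them plus the unread tail."""
    frames = []
    pos = 0
    while True:
        size, shift, i, have_size = 0, 0, pos, False
        while i < len(buffer):
            byte = buffer[i]
            i += 1
            size |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                have_size = True
                break
        # Stop when the size itself or the payload is not fully received yet.
        if not have_size or i + size > len(buffer):
            return frames, buffer[pos:]
        frames.append(buffer[i:i + size])
        pos = i + size


stream = frame(b"abc") + frame(b"defgh")
done, rest = read_frames(stream[:6])          # second geometry is cut off
print(done, rest)
done, rest = read_frames(rest + stream[6:])   # remaining bytes arrive
print(done, rest)
```

The reader can tell from the prefix alone whether a complete geometry has arrived, without attempting a decode and running out of bytes.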

twpayne · 2014-07-28T20:39:52Z

Good stuff @nicklasaven! Note that I'm no longer working in the open source/geospatial/ol3 world and so can no longer contribute to the project. All the best!

nicklasaven · 2014-07-28T21:04:30Z

That was bad news Tom.
ATB
Nicklas

twpayne mentioned this issue Feb 27, 2014

Adding TWKB format openlayers/openlayers#1615

Closed

pramsey closed this as completed Oct 10, 2015
