flatgeobuf as a source format #777

Open
andrewharvey opened this issue Jul 26, 2019 · 17 comments

Comments

@andrewharvey
Contributor

One of the performance bottlenecks of using Tippecanoe is generating the GeoJSON input, which is expensive (performance-wise) to produce and, I assume, for tippecanoe to read (though I must say, it's still very fast).

I commonly use ogr2ogr to read a range of input formats (frequently ESRI FileGDB) and pipe out GeoJSONSeq for tippecanoe to read, however this is slow.

The performance of flatgeobuf looks really compelling. If Tippecanoe could natively read it, that might greatly improve performance for some use cases.
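For readers unfamiliar with the workflow described above, here is a minimal sketch of the ogr2ogr to tippecanoe pipe, driven from Python. The file names are hypothetical, and the sketch assumes both tools are on the PATH, that GDAL's `GeoJSONSeq` driver writes to `/vsistdout/`, and that tippecanoe reads standard input when no input file is named.

```python
import subprocess


def build_pipeline(source_path, output_path):
    """Return the argv lists for the two halves of the pipe."""
    # ogr2ogr converts the source to newline-delimited GeoJSON on stdout.
    ogr_cmd = ["ogr2ogr", "-f", "GeoJSONSeq", "/vsistdout/", source_path]
    # tippecanoe is given no input file, so it is assumed to read stdin.
    tippecanoe_cmd = ["tippecanoe", "-o", output_path]
    return ogr_cmd, tippecanoe_cmd


def run_pipeline(source_path, output_path):
    """Run ogr2ogr | tippecanoe and return both exit codes."""
    ogr_cmd, tippecanoe_cmd = build_pipeline(source_path, output_path)
    producer = subprocess.Popen(ogr_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(tippecanoe_cmd, stdin=producer.stdout)
    # Close our copy of the pipe so ogr2ogr is signalled if tippecanoe exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc
```

Calling `run_pipeline("input.gdb", "out.mbtiles")` (both names are placeholders) would run the conversion. Every feature is serialized to text by one process and parsed again by the other, which is the cost this issue is about.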

@gertcuykens

gertcuykens commented Jul 26, 2019

Totally agree that geojson is not made for heavy lifting, and neither is geobuf. But I would stick with something protobuf-based, because protobuf is already used for generating the tiles in the first place. (Secondly, protobuf works better in golang, but that is my problem.) Also, I'd like to suggest reducing the number of moving parts as much as possible, especially when dealing with a C++ code base :)

@e-n-f
Contributor

e-n-f commented Jul 27, 2019

Thanks. I'll have to take a look at this.

@gertcuykens

gertcuykens commented Jul 27, 2019

Although I am going to shoot myself in the foot here, because this solution requires a lot of work to be able to generate flatgeobuf files in golang applications, I still believe that, in the bigger picture, geojson has to be replaced with something better. My preference was something protobuf-like, but if it is going to be flatgeobuf instead then so be it. See flatgeobuf/flatgeobuf#7, asking for collaboration to make sure it gets the best chance of success.

@bjornharrtell

Great to see this interest! :) I'm fairly confident flatgeobuf is well suited as a serialization format to replace geojson when performance matters. Note that flatgeobuf is based on flatbuffers, not protobuf, also for performance reasons, which are well explained at http://google.github.io/flatbuffers/. Flatbuffers has Go support, so it's "just" a matter of wiring that up to my format. I have limited experience with Go, but I'm open to trying to provide the language support for Go ASAP if that means you are willing to be a (very) early adopter.

@gertcuykens

gertcuykens commented Jul 27, 2019

Thanks, I will test, but for some reason flatbuffers is only faster in C++; in golang it turns out gogoprotobuf is faster than flatbuffers (source: https://github.com/alecthomas/go_serialization_benchmarks). I guess more time has been invested in protobuf optimisation because of other heavily used systems like grpc that all use protobuf. Also, if I am not mistaken, tippecanoe uses https://github.com/mapbox/protozero for its tile generation, which is also way faster than regular protobuf.

@bjornharrtell

bjornharrtell commented Jul 27, 2019

Interesting. I think you are right, it's likely because the flatbuffers Go implementation hasn't got the attention that the reference C++ one has. However, there are also C and Rust implementations that seem to perform close to the C++ one, so the potential is there. And if I read the numbers correctly, the flatbuffers Go implementation is not far behind gogoprotobuf even in its current state, so perhaps it's not a dealbreaker?

I have been considering whether to use flatbuffers or protobuf as a base for some time, but I remain convinced that flatbuffers has some desirable properties over protobuf, even if things like this are not making the choice easier. I'm not too keen on protobuf needing "special" implementations and/or considerations/constraints to be fast.

@gertcuykens

gertcuykens commented Jul 27, 2019

Agreed, performance isn't going to be the key factor; it's more the quality of the boilerplate that protoc and flatc create. I will try to make some examples, but last time I looked into this, a good amount more code was needed to initialise / modify a flatbuffer object than a protobuf object in Go, if I recall correctly. Note that all the protobuf variations still use the same proto files, so for me that's fair enough as long as they don't break the wire format and are just implementation details.

@bjornharrtell

bjornharrtell commented Jul 27, 2019

I see your point, but I'm not sure transforming between two protobuf schemas would be that much prettier, and I guess it would potentially be significantly slower than both flatbuffer to protobuf or flatbuffer to flatbuffer in the optimal case. However, I can definitely understand wanting to use a single serialization method/format in the use case discussed, and protobuf is of course more common, so it's another sound argument to respin flatgeobuf on protobuf. But for several reasons I'm not prepared to go down that path right now without further consideration/motivation.

@bjornharrtell

bjornharrtell commented Jul 27, 2019

Another thought - do you not already have an abstraction layer for the transformation between GeoJSON and protobuf? I've implemented such abstraction layers in the flatgeobuf language support for C++, .NET, Java and JS/TS, and I imagine it should be possible to find a good target for Go too, perhaps https://github.com/paulmach/orb?

Of course, for maximum performance, transformation should be kept to a minimum. For example, in my GDAL driver implementation I'm accessing the flatbuffer directly, and I can zero-copy at least the coordinate arrays because GDAL and flatgeobuf share the same basic memory model for coordinate arrays.
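To illustrate what zero-copy access to a coordinate array means, here is a minimal Python sketch. The buffer layout (an 8-byte header followed by raw doubles) is hypothetical and is not the FlatGeobuf layout, and it is not how the GDAL driver is written. It also reads doubles in native byte order, which matches the little-endian flatbuffers encoding only on little-endian hosts.

```python
import struct
from array import array

HEADER_SIZE = 8  # hypothetical fixed-size header, NOT the FlatGeobuf layout


def make_buffer(coords):
    """Pack a dummy header followed by the coordinates as native doubles."""
    return bytearray(b"\x00" * HEADER_SIZE + array("d", coords).tobytes())


def view_coords(buf):
    """Return a view of the doubles stored in buf without copying them."""
    return memoryview(buf)[HEADER_SIZE:].cast("d")


buf = make_buffer([10.0, 20.0, 30.0, 40.0])
xy = view_coords(buf)
first_pair = (xy[0], xy[1])

# Overwrite the first double directly in the underlying buffer.
struct.pack_into("=d", buf, HEADER_SIZE, 99.0)

# The view reflects the change because it shares memory with buf.
changed = xy[0]
```

A JSON-style parser would instead allocate a new number object for every ordinate, which is the work a shared memory model avoids.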

@gertcuykens

https://github.com/paulmach/orb is definitely the way to go for implementing geojson objects in Go :) But you don't have to worry about a geojson interface, I guess. My problem, and I think a lot of other people's problem, is how to feed, for example, this ridiculously giant osm planet pbf to tippecanoe. At the moment I read osm data and stuff it into geojson or geobuf files, because we don't have another choice :) So any wire format that tippecanoe agrees on as an input source is going to be a huge win. For example, if we all agree on a stream of small records based on a FlatBuffer schema (https://google.github.io/flatbuffers/flatbuffers_guide_tutorial.html), and I can go from osm.pbf to saving a file on disk without requiring 128GB of memory, and tippecanoe can handle it too, we are golden :D
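The "stream of small records" idea can be sketched with simple length-prefixed framing, so that neither the writer nor the reader ever holds more than one record in memory. This is a generic illustration in Python, not a format that tippecanoe accepts; the 4-byte little-endian prefix is an assumption chosen for the example, and the payloads stand in for serialized features.

```python
import io
import struct

LENGTH_PREFIX = struct.Struct("<I")  # assumed 4-byte little-endian length


def write_records(stream, records):
    """Write each payload preceded by its byte length."""
    for payload in records:
        stream.write(LENGTH_PREFIX.pack(len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield payloads one at a time from a buffered binary stream."""
    while True:
        prefix = stream.read(LENGTH_PREFIX.size)
        if not prefix:
            return  # clean end of stream
        if len(prefix) < LENGTH_PREFIX.size:
            raise ValueError("truncated length prefix")
        (length,) = LENGTH_PREFIX.unpack(prefix)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record")
        yield payload


# Round trip through an in-memory stream; a file or pipe works the same way.
demo = io.BytesIO()
write_records(demo, [b"feature-1", b"", b"feature-3"])
demo.seek(0)
round_trip = list(read_records(demo))
```

Because `read_records` is a generator, a consumer can process a planet-sized file record by record, with memory use bounded by the largest single record.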

@andrewharvey
Contributor Author

^ all of this is good, but tangential to this ticket for potential flatgeobuf support in tippecanoe, since tippecanoe is written in C++.

For me it's only worth supporting in tippecanoe if it's going to be faster to pipe flatgeobuf to tippecanoe from ogr2ogr, compared to piping GeoJSONSeq to tippecanoe.

It's still a nice-to-have for me (and low priority), since the core GeoJSON support is rock solid and isn't going anywhere, given that it's a hugely popular and widely supported format.

@bjornharrtell

@andrewharvey hmm yes I was starting to get confused because I couldn't find any Go code in tippecanoe.

Raw read performance of FlatGeobuf in GDAL is about 30 times faster than GeoJSON. I will make some measurements for write performance soon but I expect it to be in the same ballpark (without spatial index generation).

Even if reworking this part of tippecanoe is not a high priority, it would be a nice experiment and a possible motivation to get FlatGeobuf accepted in GDAL, so I'm interested in contributing if time permits.

@gertcuykens

gertcuykens commented Jul 28, 2019

No, Go indeed has nothing to do with tippecanoe itself, but rather with building tools that generate input for tippecanoe, as ogr2ogr does. So it should be possible to replace ogr2ogr with as many tools or programming languages as possible. For example, nodejs is used extensively by mapbox for creating tools, and I want to be sure to point out that ogr2ogr is just a small part of the bigger picture here when considering a universal format to feed to tippecanoe.

@1riggs

1riggs commented Feb 9, 2021

Has this been added to tippecanoe? I see there is support for geobuf files but I assume that's only mapbox/geobuf files and not flatgeobuf, as produced by say ogr2ogr.

@bjornharrtell

@1riggs unfortunately no progress AFAIK. I still don't have a finished reference implementation in Go. Would be fun to do but my interest in Go is unfortunately being eclipsed by Rust.

@bdon

bdon commented Feb 14, 2022

I'm tracking this issue in a new repository at https://github.com/protomaps/tippecanoe/issues/2 - input is welcomed.

@bdon

bdon commented Mar 27, 2022

This has been implemented in https://github.com/protomaps/tippecanoe , although with only the minimum geometry type and column type support needed to convert all GDAL-created Natural Earth FGBs with output identical to GeoJSONSeq.

I'm seeing a general 5-10x speedup for the parsing phase vs GeoJSONSeq, not to mention that FGB files should also be smaller and much faster to create than GeoJSONSeq. No streaming support yet though. Happy to help look at people's FGBs but will move convo into https://github.com/protomaps/tippecanoe/issues/2 .


6 participants