-
Absolutely yes - I'd very much like to move away from libprotobuf to something built on protozero. There'd certainly be a performance boost, and libprotobuf is more complex to build now that Google have shunted it into Abseil. I had a brief look at it a couple of months back; reconstructing vector tile features for |
-
As well as all other formats that it supports. With all workloads involving geometry, libosmium reading PBFs is never a performance limiter. It's always building or processing geoms or IO stuff. |
-
A few things re libosmium:

@cldellow
> For users on distros where it's not packaged, building it might be a hassle.

There is nothing to build; libosmium is a header-only library. And I wouldn't describe it as "heavyweight", because thanks to the header-only architecture you can just use the parts you need and ignore the others. So libosmium is quite easy to use and integrate. Whether you need the other libraries that libosmium uses depends on the use case: if you don't want to parse OSM XML files, for instance, you don't have the dependency on expat.

@cldellow
> It sounds like MapBox's protozero would be a good alternative

Interestingly enough, I developed protozero exactly for the problem you are discussing here: to replace the use of Google's protobuf libraries in libosmium, because they had too much memory allocation overhead. So I can recommend using protozero, but I am biased. Just keep in mind that protozero is a very low-level library and you really have to understand the details of the Protobuf format to use it properly. It is very easy to shoot yourself in the foot or to implement something that will not work in some corner case.

@systemed
> I'd tend to agree that libosmium doesn't bring us much over and above protozero - our use case is fairly specialised.

It's interesting that you see it that way, because I created libosmium for exactly your use case (and many others, of course). Reading OSM files quickly is what libosmium does, and many tools read them several times. If you see the need to reimplement OSM file parsing, then libosmium is failing you in some way and I'd like to improve libosmium so it also works for you. Maybe this is not possible, because libosmium has made some architectural decisions that affect its use too much. For instance, it is implemented in a way that works with all sorts of OSM file formats, hiding the details from the user of the library. Of course you can take some shortcuts if that is not important to you. In the end there are some tradeoffs here. I'd rather you spent the time improving OSM file reading in libosmium, where many people can use that improvement, than rolling your own and duplicating effort. But again, I am biased.

@cldellow
> I tried to read the planet using the osmium_count example...

As @pnorman mentioned, this is not the best example for measuring performance, but there is a problem somewhere in libosmium that limits multi-threading and that I haven't found yet. There are circumstances (with large numbers of CPUs) where it seems threads are blocking each other for no reason. Debugging multithreading performance is hard, and it seems I have reached the limits of my capabilities there. (I don't even know how "real" that problem is; even measuring that is hard. Maybe this is just the limit of the concurrency of several CPUs having to access the same memory or so.) I'd very much appreciate any help with that. If you can speed up libosmium, many people will benefit.

@cldellow
> Tilemaker can skip entire PrimitiveBlock structures when looking for nodes, ways and relations. By contrast, it looks like libosmium skips them at the PrimitiveGroup level, which still requires reading and decompressing the PrimitiveBlock structure.

I'd have to look into the details here; it's a long time ago that I wrote that code. But it might be one of those corner cases in the format where what you can do depends on whether you implement the spec or whether you implement something that works with the files currently out there. Libosmium tries very hard to make sure it can read any file that is created according to the spec. I believe this is important for compatibility with current and future software. Maybe I am wrong and there is some performance to be gained here; then we should improve libosmium. (Btw, this is not a theoretical problem: we do have problems with some popular PBF reading software that doesn't parse PBFs according to the spec. This makes further development of the PBF format difficult. The format has extensibility built in due to its Protobuf roots, but we can't use that because it breaks that software.)

@cldellow
> It doesn't look like libosmium exposes an interface that operates at the PrimitiveBlock level, ...

That's true. This goes back to the "hide the file format from the user" architecture. But I am willing to rethink that, as I told the developer of the PR you mentioned. The reason that PR was not merged is that it was far from ready to be merged, as acknowledged by that developer themselves. The PR had some interesting ideas, but ultimately the developer gave up on it after some misunderstandings about what I needed from him to push this forward.

So where does that leave us (in my opinion)? It would be great if we could improve libosmium in a way that makes it work for tilemaker. There will always be some (performance and other) tradeoffs in using a general library rather than rolling your own; we are all aware of that. I am a big fan of creating libraries: in the short run it always seems to be more work, but in the long run it saves everybody some effort. I created protozero as its own library and didn't just add that code to libosmium, because I thought it works on its own and is reasonably self-contained. And this later allowed other uses, like vtzero. Maybe there is a way to rewrite some of the PBF-parsing code in libosmium so that it becomes usable outside the wider libosmium context and can stand on its own, so that at least libosmium and tilemaker can share that lower-level PBF code. Maybe inside libosmium, or maybe it has to go into a small intermediary library?
-
I do like the idea of contributing back to libosmium, though! I might take a stab at some PRs that try to tackle the queue contention. For some broader context--as part of this discussion, I reviewed some of the libosmium code. It frankly seems to be written to a much higher level of quality than what I can currently write. :) So when I consider the task of proposing an architecture change that is suitable for a general-purpose library plus implementing that change to a suitable level of quality before I can unblock my immediate task, I find myself thinking I should start with something less ambitious. |
-
Do you have any thoughts on switching away from libprotobuf?
It has nice developer ergonomics, but for the OSM use case of "very big structures with lots of strings", it feels like it has bad performance and memory characteristics -- strings get materialized as `std::string`, which causes a lot of heap churn.
It sounds like MapBox's protozero would be a good alternative (you've mentioned it in #60 (comment) and #502 (comment)).
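To illustrate the heap-churn point, here is a minimal standard-library sketch (C++17) of the alternative: handing out views into the already-loaded buffer instead of copying each string. It is not protozero's or libprotobuf's API. The function names are hypothetical, and the only format assumption is that the strings sit in a message as repeated length-delimited field 1, as in the OSM PBF `StringTable`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Decode one base-128 varint at `pos`, advancing `pos`.
// Returns std::nullopt on truncated or over-long input.
inline std::optional<std::uint64_t> read_varint(std::string_view buf, std::size_t& pos) {
    std::uint64_t value = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        const auto byte = static_cast<unsigned char>(buf[pos++]);
        value |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) return value;
    }
    return std::nullopt;
}

// Collect every occurrence of length-delimited field 1 as a view into `buf`.
// No string bytes are copied; the views are valid only while `buf` is alive.
inline std::optional<std::vector<std::string_view>> string_table_views(std::string_view buf) {
    std::vector<std::string_view> out;
    std::size_t pos = 0;
    while (pos < buf.size()) {
        const auto key = read_varint(buf, pos);
        if (!key) return std::nullopt;
        const std::uint64_t field = *key >> 3;
        const std::uint64_t wire_type = *key & 0x7;
        if (wire_type != 2) return std::nullopt;  // sketch: only length-delimited fields
        const auto len = read_varint(buf, pos);
        if (!len || *len > buf.size() - pos) return std::nullopt;
        if (field == 1) out.push_back(buf.substr(pos, static_cast<std::size_t>(*len)));
        pos += static_cast<std::size_t>(*len);
        assert(pos <= buf.size());
    }
    return out;
}

// For contrast: materializing owning strings makes one copy per entry, and
// entries too long for the small-string optimization each need a heap allocation.
inline std::vector<std::string> materialize(const std::vector<std::string_view>& views) {
    return std::vector<std::string>(views.begin(), views.end());
}
```

The tradeoff is lifetime management: the views dangle once the underlying block buffer is freed, so anything that must outlive the block still has to be copied.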
We could either roll our own reading on top of protozero, or consider depending on libosmium, which uses protozero.
My sense is that libosmium is quite a general, heavyweight tool compared to Tilemaker's use case. For users on distros where it's not packaged, building it might be a hassle. Further, we'd have to have some knowledge of what it's doing under the covers--for example, on a brief skim of its documentation and GitHub issues, I think we'd need to set some configuration options to avoid it trying to do its own threading and kernel page cache handling.
On the other hand, it's potentially quite nice to have a ready-made OM and API for the OSM PBF format.
My instinct is it's probably better to reinvent the wheel and roll our own reading on top of protozero. (And eventually, use vtzero for reading/writing vector tiles, so we can drop libprotobuf entirely.)