New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversion to C++ #187
Conversion to C++ #187
Conversation
Added a very basic XML parsing unit test, which is run in the standard autoconf manner. Also ported the existing regression tests to be run from the autoconf test harness as well. Part of this meant pulling most of the code into a temporary library so that it can be re-used by all the binaries, including the unit tests. Hopefully this won't cause any major issues - it's all libtool-based, so should be portable, and is statically linked to the resulting binary so needs no installation.
No actual functionality changes should have been made. However, it has been a long time since C++ was really forward-compatible from C, so there was a bunch of syntax and type stuff that needed changing.
…he output, or rest of the statics yet.
…re, spatial_ref_sys) in the event that creating these extension fails. previously this was hardcoded for postgis bits to debian like system. for hstore it used to just fail outright
…function pointers
…rked them const.
…ns instead of passing osmdata_t object. clean up main by breaking out input function pointer into seprate function call. break out output_t creation into seprate function call. moved cleanup of pointers that osmdata_t holds into its destructor
…ackend Conflicts: output-pgsql.cpp
remove unneded module, fixes errors on default Python installation
replace variable length arrays with std::vector
@zerebubuth, let's stick this on the EWG agenda for the 6th. (17:30 UTC, #osm-ewg) |
Update parse-pbf.c replacing "protobuf_c_default_allocator" with "NULL" so that it compiles with protobuf 1.0.1.
Use "Update parse-pbf.c for protobuf 1.0.X" commit from master branch
@@ -121,6 +121,14 @@ fi | |||
dnl Check for pthread library | |||
AX_PTHREAD(,[AC_MSG_ERROR([no])]) | |||
|
|||
dnl Check for Boost libraries | |||
AX_BOOST_BASE([1.49], , [AC_MSG_ERROR([cannot find Boost libraries, which are are required for building avecado. Please install libboost-dev.])]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be osm2pgsql, not avecado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You got me! Copy paste fail!
Am 07.10.2014 18:02 schrieb "Paul Norman" notifications@github.com:
In configure.ac:
@@ -121,6 +121,14 @@ fi
dnl Check for pthread library
AX_PTHREAD(,[AC_MSG_ERROR([no])])+dnl Check for Boost libraries
+AX_BOOST_BASE([1.49], , [AC_MSG_ERROR([cannot find Boost libraries, which are are required for building avecado. Please install libboost-dev.])])should be osm2pgsql, not avecado
—
Reply to this email directly or view it on GitHub
https://github.com/openstreetmap/osm2pgsql/pull/187/files#r18552486.
Looking at the Coverity scan, most of the issues raised seem to be in largely untouched C-code and have to do with memory. Nothing blocking, though we should review it later |
…1.48 The PPA only goes back to 1.54, so we use the standard packages for 1.48 and don't check between that and 1.54
@apmon, any updates on the review? fyi, matt, kevin and myself are in the denver office this week |
Check multiple boost versions with travis, drop boost requirement to 1.48
Prefix isnan with std
@lonvia any thoughts from a gazetteer point of view? |
int CopyActive; | ||
unsigned int BufferLen; | ||
|
||
char Buffer[BUFFER_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should be converted to a std::string just like in table.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed. i flirted with changing more of this many times but thought it was too risky since i wasnt much of an expert with nominatim. if we could it would be really nice to use table_t directly in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mechanics behind filling the table are more or less directly copied from output_pgsql, so shouldn't be really a problem to use table_t. But it's nothing urgent. I'd be happy to look into it, once this is merged.
I've mainly looked at output-gazetter now and skimmed the other changes shortly. I've put some comments in the code. Some other random musings:
|
Taking this on. |
Also increments the version to 0.87.0 Closes osm2pgsql-dev#187 Fixes osm2pgsql-dev#156 Fixes osm2pgsql-dev#186 Fixes osm2pgsql-dev#105 Fixes osm2pgsql-dev#111
C++ osm2pgsql rewrite
This PR is substantial, as this diff is thousands of lines of additions and removals.
Summary
C++98 conversion
osm2pgsql has been rewritten in object-oriented C++98 with use of boost libraries. Any server that someone is contemplating running as rendering server should support this, as other components require support of this. In particular, any setup that can compile Mapnik should compile this without issues.
Despite the substantial rewrite, parts of osm2pgsql are still using C idioms such as
void *
pointers.C++ brings substantial advantages for writing and not having to manually manage memory and prevent reading out of bounds.
Note: Do a
make distclean
when switching to/from this branchFixes #156
Boost
The version requirement of Boost has been kept to one that Ubuntu 12.04 supports out of the box, and any machine able to run Mapnik 2 should have a recent enough version of Boost.
Unit test
Unit tests have been added for the portions rewritten in the conversion. This unit test framework can easily be extended.
Fixes #186
Validation
As our goal was not to change the osm2pgsql output, a validation of the output was conducted with a planet from 130904. The rendering tables were checked for number of rows, area/length, and checks were done with selected tags. Way/relations were checked for number of rows, and total number of array members.
The only variation seen was in middle tables, which was expected. To facilitate addition of the multi-backend it became necessary to move discarding delete tags from the middle to backend. This means that the middle tables have a complete representation of OSM data, regardless of style files.
In-memory pending tracker
Previously osm2pgsql stored a pending flag on the middle ways and relations tables. This was undesirable for a few reasons
On a full planet import, the pending information for all of the ways is approximately 30MB, so memory usage is not an issue.
The move from in-db to in-memory for the middle was a gain of 7% on single-threaded middle ways, on top of earlier gains in other stages, and overall gains from having smaller less bloated tables.
Fixes #105
Fixes #111
Multi-backend
The new multi-backend has been developed for situations involving multiple database servers. It allows the middle slim tables to be in a different database than the rendering tables. This allows the rendering tables to be replicated with PostgreSQL replication without also replicating the slim tables, the rendering database servers to be provisioned with less database disk space, and multiple styles to be updated from one osm2pgsql process.
Because of the increased complexity of configuration when multiple servers are involved, the multi-backend takes a JSON file as a configuration, rather than increasing the already excessive number of osm2pgsql command-line options.
This feature is still in development, and will be of interest to only those running multiple rendering databases.
Performance
Although the conversion to C++ was not done for performance reasons, in practice the use of standard library data structures allows for easier performance gains. Future optimizations should be significantly easier. Overall, there is a performance gain, probably due to in-memory pending.
Performance tests were carried out on the machine used for previous testing (Hetzner EX40-SSD, i7-4770, 32GB RAM, 2x240GB SSD in RAID0) as well as extracts on a lower-end machine (i5-3570 @ 3.4 GHz, 8GB RAM, 500GB 3.5" 7200RPM HDD). These represent one of the fastest possible machines for osm2pgsql imports, and a typical developer desktop with a single mechanical drive.
Unlike some other results, performance numbers do include index creation, as that time could be different thanks to in-memory pending changes.
Versions benchmarked were 8fc8964 (current master) and 4b6489a. Pending relations are not present in a standard import, so do not impact the benchmarks. Note: Index numbers are still generating, but should not significantly depend on the changes, and will certainly be done before discussion is complete.
Command lines used were
osm2pgsql/osm2pgsql --create --style osm2pgsql/default.style --flat-nodes flat.nodes --cache 20000 --slim --number-processes 8 planet-130904.osm.pbf
for full planet tests.osm2pgsql/osm2pgsql --create --style ~/osm2pgsql/default.style --cache 6000 --slim --number-processes 4 germany-140601.osm.pbf
Negative percentages are faster
Note: Germany only needed 2577MB of cache, but I didn't know this number until after I ran the import, but this doesn't effect the conclusions.
Known issues
The console output is different, and no better than before.