Conversion to C++ #187

pnorman · 2014-09-29T19:53:03Z

C++ osm2pgsql rewrite

This PR is substantial: the diff adds and removes thousands of lines.

Summary

Rewrite in C++98
Uses boost
Adds unit tests
Performance increases
In-memory pending tracker
New multi backend, which separates the middle tables from the rendering tables and lets rendering tables on multiple databases be updated at once, with different styles or Lua transforms

C++98 conversion

osm2pgsql has been rewritten in object-oriented C++98 using the Boost libraries. Any server that someone is contemplating running as a rendering server should support this, since other components already require it. In particular, any setup that can compile Mapnik should compile this without issues.

Despite the substantial rewrite, parts of osm2pgsql are still using C idioms such as void * pointers.

C++ brings substantial advantages when writing code, chiefly not having to manage memory manually or guard by hand against out-of-bounds reads.
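To illustrate the difference between the two idioms, here is a minimal sketch. It is not code from this branch: `c_counter`, `node_sink_t`, `vector_sink_t` and `feed` are hypothetical names chosen for the example. The C-style version passes state through an untyped `void *`, while the C++ version uses an abstract class with virtual functions and a `std::vector` that manages its own memory.

```cpp
#include <cstddef>
#include <vector>

// C idiom: state travels as an untyped pointer and the callee has to
// cast it back to the right type by convention. Nothing checks the cast.
struct c_counter { int nodes; };

inline void c_add_node(void *state) {
    static_cast<c_counter *>(state)->nodes += 1;
}

// C++ idiom: an abstract interface with virtual functions. The compiler
// checks the types at every call site.
class node_sink_t {
public:
    virtual ~node_sink_t() {}
    virtual void add_node(long id) = 0;
    virtual std::size_t count() const = 0;
};

// One hypothetical implementation: keeps the ids in a std::vector.
class vector_sink_t : public node_sink_t {
public:
    virtual void add_node(long id) { ids_.push_back(id); }
    virtual std::size_t count() const { return ids_.size(); }
private:
    std::vector<long> ids_; // grows as needed, released by the destructor
};

// Callers only depend on the interface, not on the concrete class.
inline std::size_t feed(node_sink_t &sink, long first, long last) {
    for (long id = first; id <= last; ++id) {
        sink.add_node(id);
    }
    return sink.count();
}
```

Calling `feed` on a `vector_sink_t` with the range 1 to 5 stores five ids without any explicit allocation or bounds bookkeeping in the caller.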

Note: Do a make distclean when switching to/from this branch

Fixes #156

Boost

The version requirement of Boost has been kept to one that Ubuntu 12.04 supports out of the box, and any machine able to run Mapnik 2 should have a recent enough version of Boost.

Unit test

Unit tests have been added for the portions rewritten in the conversion. This unit test framework can easily be extended.

Fixes #186

Validation

As our goal was not to change the osm2pgsql output, the output was validated with a planet from 130904. The rendering tables were checked for number of rows and area/length, and checks were done with selected tags. The way and relation tables were checked for number of rows and total number of array members.

The only variation seen was in the middle tables, which was expected. To facilitate the addition of the multi-backend, it became necessary to move the discarding of tags flagged as delete from the middle to the backend. This means that the middle tables have a complete representation of OSM data, regardless of style files.
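The idea of filtering in the backend rather than in the middle can be sketched as follows. This is an illustration under assumptions, not the code in this branch: `tags_t` and `filter_for_backend` are hypothetical names, and the delete list is modelled as a plain set of keys.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical tag container: key -> value.
typedef std::map<std::string, std::string> tags_t;

// Hypothetical backend-side filter. The middle keeps `all_tags` untouched;
// each backend derives its own view by dropping the keys that its style
// marks for deletion.
inline tags_t filter_for_backend(const tags_t &all_tags,
                                 const std::set<std::string> &delete_keys) {
    tags_t kept;
    for (tags_t::const_iterator it = all_tags.begin();
         it != all_tags.end(); ++it) {
        if (delete_keys.count(it->first) == 0) {
            kept.insert(*it);
        }
    }
    return kept;
}
```

Because the input is taken by const reference and never modified, two backends with different delete lists can be fed from the same stored tags.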

In-memory pending tracker

Previously osm2pgsql stored a pending flag on the middle ways and relations tables. This was undesirable for a few reasons:

The data didn't need to be persisted between runs
After any run, pending should have been false for all rows
MVCC wasn't necessary, so we weren't getting any advantages from PostgreSQL
Loading into the tables with pending=true and then updating to pending=false at a later stage led to significant table and index bloat, including a partial index which was 3.6GB after import instead of empty
Having pending in-memory made parts of threading easier

On a full planet import, the pending information for all of the ways is approximately 30MB, so memory usage is not an issue.
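A figure of that order is what one would expect from roughly one bit per ID: 30 MB is about 240 million bits. The sketch below shows such a bitmap tracker. It is an assumption made for illustration, not a description of the data structure used in this branch; `pending_bitmap_t` and its members are hypothetical names, and IDs are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical in-memory pending tracker using one bit per id.
// Assumes ids are non-negative and reasonably dense.
class pending_bitmap_t {
public:
    pending_bitmap_t() : count_(0) {}

    // Flags an id as pending, growing the bitmap if needed.
    void mark(std::size_t id) {
        if (id >= bits_.size()) {
            bits_.resize(id + 1, false);
        }
        if (!bits_[id]) {
            bits_[id] = true;
            ++count_;
        }
    }

    // Clears the pending flag once the id has been processed.
    void clear(std::size_t id) {
        if (id < bits_.size() && bits_[id]) {
            bits_[id] = false;
            --count_;
        }
    }

    bool is_marked(std::size_t id) const {
        return id < bits_.size() && bits_[id];
    }

    // Number of ids currently flagged.
    std::size_t pending() const { return count_; }

    // Approximate payload size in bytes: one bit per tracked id.
    std::size_t approx_bytes() const { return (bits_.size() + 7) / 8; }

private:
    std::vector<bool> bits_; // bit-packed by the standard library
    std::size_t count_;
};
```

Nothing here is persisted, which matches the observation above that pending state does not need to survive between runs.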

Moving the pending state from the database into memory gave a 7% gain on single-threaded middle ways. This is on top of earlier gains in other stages, and of overall gains from having smaller, less bloated tables.

Fixes #105

Fixes #111

Multi-backend

The new multi-backend has been developed for situations involving multiple database servers. It allows the middle slim tables to be in a different database than the rendering tables. This allows the rendering tables to be replicated with PostgreSQL replication without also replicating the slim tables, the rendering database servers to be provisioned with less database disk space, and multiple styles to be updated from one osm2pgsql process.

Because of the increased complexity of configuration when multiple servers are involved, the multi-backend takes a JSON file as a configuration, rather than increasing the already excessive number of osm2pgsql command-line options.

This feature is still in development, and will be of interest only to those running multiple rendering databases.

Performance

Although the conversion to C++ was not done for performance reasons, in practice the use of standard library data structures allows for easier performance gains. Future optimizations should be significantly easier. Overall, there is a performance gain, probably due to in-memory pending.

Performance tests were carried out on the machine used for previous testing (Hetzner EX40-SSD, i7-4770, 32GB RAM, 2x240GB SSD in RAID0) as well as extracts on a lower-end machine (i5-3570 @ 3.4 GHz, 8GB RAM, 500GB 3.5" 7200RPM HDD). These represent one of the fastest possible machines for osm2pgsql imports, and a typical developer desktop with a single mechanical drive.

Unlike some other results, performance numbers do include index creation, as that time could be different thanks to in-memory pending changes.

Versions benchmarked were 8fc8964 (current master) and 4b6489a. Pending relations are not present in a standard import, so do not impact the benchmarks. Note: Index numbers are still being generated, but should not significantly depend on the changes, and will certainly be done before discussion is complete.

Command lines used were

Full planet tests: osm2pgsql/osm2pgsql --create --style osm2pgsql/default.style --flat-nodes flat.nodes --cache 20000 --slim --number-processes 8 planet-130904.osm.pbf
Germany extract tests: osm2pgsql/osm2pgsql --create --style ~/osm2pgsql/default.style --cache 6000 --slim --number-processes 4 germany-140601.osm.pbf

Stage                     `8fc8964`           `4b6489a`           Change
                          i7 SSD    i5 HDD    i7 SSD    i5 HDD    i7 SSD    i5 HDD
Processing: Nodes            759       689      1018       699      +34%     +1.5%
Processing: Ways            4315       587      4566       580     +5.8%     -1.2%
Processing: Relations       9863     12210      9187     11636     -6.9%     -4.7%
Processing: Total          14937     13486     14771     12915     -1.1%     -4.2%
Pending Ways                4989      2001      3810      1287      -24%      -36%
Ordering and indexes        9500     27976       ###       ###     +###%     +###%
Grand Total                29467     43519       ###       ###     +###%     +###%

Negative percentages are faster
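The Change columns are consistent with the usual relative-change formula, (after - before) / before. A minimal sketch for checking a row, assuming each entry is a plain before/after pair; `percent_change` is an illustrative name, not part of osm2pgsql:

```cpp
// Relative change from `before` to `after`, in percent.
// A negative result means the newer version took less time.
inline double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

For example, Pending Ways on the i7 SSD went from 4989 to 3810, which gives about -23.6%, shown in the table as -24%.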

Note: Germany only needed 2577MB of cache. I didn't know this number until after I ran the import, but this doesn't affect the conclusions.

Known issues

The console output is different, and no better than before.

Added a very basic XML parsing unit test, which is run in the standard autoconf manner. Also ported the existing regression tests to be run from the autoconf test harness as well. Part of this meant pulling most of the code into a temporary library so that it can be re-used by all the binaries, including the unit tests. Hopefully this won't cause any major issues - it's all libtool-based, so should be portable, and is statically linked to the resulting binary so needs no installation.


…ns instead of passing osmdata_t object. clean up main by breaking out input function pointer into seprate function call. break out output_t creation into seprate function call. moved cleanup of pointers that osmdata_t holds into its destructor


pnorman · 2014-10-03T20:50:16Z

@zerebubuth, let's stick this on the EWG agenda for the 6th. (17:30 UTC, #osm-ewg)


pnorman · 2014-10-07T22:02:21Z

configure.ac

@@ -121,6 +121,14 @@ fi
 dnl Check for pthread library
 AX_PTHREAD(,[AC_MSG_ERROR([no])])

+dnl Check for Boost libraries
+AX_BOOST_BASE([1.49], , [AC_MSG_ERROR([cannot find Boost libraries, which are are required for building avecado. Please install libboost-dev.])])


should be osm2pgsql, not avecado

You got me! Copy paste fail!

pnorman · 2014-10-15T15:57:34Z

Looking at the Coverity scan, most of the issues raised seem to be in largely untouched C code and have to do with memory. Nothing blocking, though we should review it later.


pnorman · 2014-10-20T21:43:19Z

@apmon, any updates on the review? fyi, matt, kevin and myself are in the denver office this week


pnorman · 2014-10-24T01:53:16Z

@lonvia any thoughts from a gazetteer point of view?

lonvia · 2014-10-24T20:34:09Z

output-gazetteer.hpp

+    int CopyActive;
+    unsigned int BufferLen;
+
+    char Buffer[BUFFER_SIZE];


probably should be converted to a std::string just like in table.cpp

Indeed. I flirted with changing more of this many times but thought it was too risky, since I wasn't much of an expert with Nominatim. If we could, it would be really nice to use table_t directly in here.

The mechanics behind filling the table are more or less directly copied from output_pgsql, so it shouldn't really be a problem to use table_t. But it's nothing urgent. I'd be happy to look into it once this is merged.
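For reference, the kind of change suggested in this thread might look roughly like the sketch below. It is not the code in table.cpp or output-gazetteer: `copy_buffer_t`, `add_row`, `flush` and `sent` are hypothetical names, and the database write is replaced by a stand-in that records each flushed chunk. A `std::string` replaces the fixed `char Buffer[BUFFER_SIZE]` and the separate `BufferLen` counter.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical copy buffer backed by std::string instead of a fixed
// char array plus a manually tracked length.
class copy_buffer_t {
public:
    explicit copy_buffer_t(std::size_t flush_at) : flush_at_(flush_at) {}

    // Appends one row and flushes once the buffered data reaches the
    // threshold. The string grows as needed, so a long row cannot
    // overrun the buffer.
    void add_row(const std::string &row) {
        buffer_ += row;
        buffer_ += '\n';
        if (buffer_.size() >= flush_at_) {
            flush();
        }
    }

    // Hands the buffered data over and empties the buffer.
    void flush() {
        if (buffer_.empty()) {
            return;
        }
        sent_.push_back(buffer_); // stand-in for writing to the database
        buffer_.clear();
    }

    // Chunks flushed so far, exposed so the sketch can be inspected.
    const std::vector<std::string> &sent() const { return sent_; }

private:
    std::size_t flush_at_;
    std::string buffer_;
    std::vector<std::string> sent_;
};
```

With a threshold of 8 bytes, adding "abc" and then "defg" produces a single flushed chunk containing both rows.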

lonvia · 2014-10-24T20:52:58Z

I've mainly looked at output-gazetteer for now and briefly skimmed the other changes. I've put some comments in the code. Some other random musings:

Please check for and remove trailing white space.
buffer.hpp/.cpp does not seem to be used anywhere. There are includes in output-pgsql and output-multi but I cannot find any reference to the buffer class.
In output-gazetteer: any particular reason why split_tags() and escape_array_record() are global functions instead of class methods?
Could naming conventions for variables, classes, members etc. be made consistent? (I'm aware that the C code wasn't any better, but that is no reason to keep bad habits up.)

pnorman · 2014-10-25T07:24:05Z

Please check for and remove trailing white space

Taking this on.


zerebubuth and others added 30 commits May 8, 2014 17:26

Added first simple tests for middle.

8f6a507

Added dependency on Boost C++ utility libraries.

32b1b84

Added test for postgres middle.

9b61fae

Updated README for new Boost dependency and tests.

d1d21d0

Fix typo in temporary path.

df03ca0

D'oh, need to be superuser before creating tablespaces.

dee8d77

Renamed C files to C++ in preparation for destaticification.

3966274

No actual functionality changes should have been made. However, it has been a long time since C++ was really forward-compatible from C, so there was a bunch of syntax and type stuff that needed changing.

Made middle structure use virtual functions. Have not yet touched t…

eb95391

…he output, or rest of the statics yet.

update regression test to search for contrib sql files (postgis, hsto…

a2ac1df

…re, spatial_ref_sys) in the event that creating these extension fails. previously this was hardcoded for postgis bits to debian like system. for hstore it used to just fail outright

fix merge conflict

74f4650

Fix bug in use of strncpy().

0d28634

Propagate exit status information from the tests out to the test harn…

5a3e921

…ess correctly.

Changed output_t instances to be of class type with vtables.

84ed64f

remove a bit of c syntax for initializing the output_t pgsql structs …

df7678b

…function pointers

merge unstructification from mla

54a1912

Removed some static variables from output-pgsql.

d2001ec

Made enable_way_area not static.

cebccd0

Make s_table not static.

75d9f19

Made a note that flagsname and tagflags are OK as statics, and ma…

5baf333

…rked them const.

Wildcard matching should operate on const char arrays.

521c7bd

add comments to main for input file usage. add comments for style fil…

0eadfd0

…e reading

Removed a lot of static, global state variables from output_pgsql_t.

97d48cf

Merge remote-tracking branch 'MapQuest/kk_multiback' into mla_multi_b…

c2341d3

…ackend Conflicts: output-pgsql.cpp

start constifying and destaticifying middle-pgsql

310a475

Removed static allocations from methods in output_pgsql_t.

44a8e7e

Don't need extern any more.

9405ba4

Add very simple escaping test.

1b9d83e

Reserve correct amount of space for pgsql escaping.

09d39bb

zerebubuth and others added 3 commits October 3, 2014 11:45

Merge pull request #23 from alex85k/py-nopwd

d2d77a1

remove unneded module, fixes errors on default Python installation

replace variable length arrays with std::vector

aeff786

Merge pull request #24 from alex85k/array-to-vector

6b1a6cb

replace variable length arrays with std::vector

rachekalmir and others added 2 commits October 5, 2014 14:19

Update parse-pbf.c for protobuf 1.0.X

69bbb51

Update parse-pbf.c replacing "protobuf_c_default_allocator" with "NULL" so that it compiles with protobuf 1.0.1.

Merge pull request #25 from alex85k/from-master

6814348

Use "Update parse-pbf.c for protobuf 1.0.X" commit from master branch

boerngen-schmidt mentioned this pull request Oct 6, 2014

Huge performance loss when using --extra-atributes #189

Open

pnorman mentioned this pull request Oct 7, 2014

Turn off autovacuum during import #155

Closed

pnorman reviewed Oct 7, 2014

kevinkreiser added 2 commits October 15, 2014 10:50

dont use a freed pointer in a log line

e98e404

Merge remote-tracking branch 'origin/cpp_conversion' into cpp_conversion

edae78b

kevinkreiser and others added 3 commits October 15, 2014 14:59

and dont use tabs where you should use space

9128e8a

Prefix isnan with std

ce4c860

Check multiple boost versions with travis, drop boost requirement to …

d291f74

…1.48 The PPA only goes back to 1.54, so we use the standard packages for 1.48 and don't check between that and 1.54

pnorman added 2 commits October 20, 2014 16:22

Merge pull request #28 from pnorman/travis_boost

5c19247

Check multiple boost versions with travis, drop boost requirement to 1.48

Merge pull request #27 from pnorman/std_isnan

64e0572

Prefix isnan with std

lonvia reviewed Oct 24, 2014

pnorman added a commit to pnorman/osm2pgsql that referenced this pull request Oct 25, 2014

Merge C++ branch

739d422

Also increments the version to 0.87.0 Closes osm2pgsql-dev#187 Fixes osm2pgsql-dev#156 Fixes osm2pgsql-dev#186 Fixes osm2pgsql-dev#105 Fixes osm2pgsql-dev#111

pnorman closed this Oct 25, 2014

pnorman mentioned this pull request Oct 25, 2014

Segmentation fault after pending ways #191

Closed

pnorman deleted the cpp_conversion branch October 25, 2014 08:56

pnorman mentioned this pull request Oct 25, 2014

Buffer.cpp not used #200

Closed
