Recluster rendering servers #86
According to @cquest the OSM FR servers gain 25%-50% rendering throughput when they recluster their database after about a year, by reducing table and index bloat. This can be done without a full outage, only stopping updates, but we probably want to wait until #79 is done to increase capacity. The reclustering depends on database IO and CPU, which are not maxed out.
The overall plan is to create a new copy of each table, build new indexes, then replace the old table. Because update frequency is more important for osm.org than for other hosts, I'd recommend doing it slightly differently. Instead of reclustering all the tables in one go, do one table, resume updates and let them catch up, do another, etc.
Starting with the points table and progressing by table size minimizes the disk usage. I think there's enough free space that it doesn't matter, but this is a best practice.
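To pick that order, the table sizes can be checked from the catalog (a sketch; the table names assume the default openstreetmap-carto/osm2pgsql prefix):

```sql
-- List the rendering tables smallest-first, the order proposed above.
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('planet_osm_point', 'planet_osm_line',
                  'planet_osm_roads', 'planet_osm_polygon')
ORDER BY pg_total_relation_size(oid);
```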
My recommendation is to do the following on both servers, starting with whichever has gone the longest since its initial import:
DROP SCHEMA recluster; DROP SCHEMA backup;
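For a single table, the copy/index/swap flow sketched above could look something like this. This is illustrative only: the ORDER BY expression and the index definition are assumptions, and in practice should match whatever clustering order and indexes the openstreetmap-carto scripts use.

```sql
-- Sketch for planet_osm_point; repeat per table, largest last.
-- ORDER BY is an assumption: use the same ordering the style's
-- clustering scripts use (e.g. a geohash of the geometry).
BEGIN;
CREATE TABLE recluster.planet_osm_point AS
  SELECT * FROM public.planet_osm_point
  ORDER BY ST_GeoHash(ST_Transform(ST_Envelope(way), 4326));
CREATE INDEX planet_osm_point_way_idx
  ON recluster.planet_osm_point USING GIST (way);
ANALYZE recluster.planet_osm_point;
-- Swap: the old table goes to the backup schema, the new one into place.
ALTER TABLE public.planet_osm_point SET SCHEMA backup;
ALTER TABLE recluster.planet_osm_point SET SCHEMA public;
COMMIT;
```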
In case of a problem a rollback can be done by restoring the table from the backup schema.
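With the schema layout above, that rollback is a pair of schema moves (again using the point table as the example):

```sql
-- Set the new copy aside and restore the original table.
ALTER TABLE public.planet_osm_point SET SCHEMA recluster;
ALTER TABLE backup.planet_osm_point SET SCHEMA public;
```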
If diffs are mistakenly restarted early the state file needs to be reset and diffs re-run.
Why not wait for a reimport?
The OpenStreetMap Carto Lua branch, which will require a reimport with hstore, is not out of development. We have a few open issues to resolve before we can merge, and are short on developer time for them. Once the Lua branch is merged we will still be releasing 2.x releases which work with the old database, to allow time to change over.
Doing a reimport with the current settings is in some ways better, but requires either a full outage of the server, a fair amount of database disk space, or the possibility of updates being down for an extended time, and the certainty that updates will be stopped for about a day.
 If the old DB slim tables are dropped this saves room, but stops any updates on the old DB
I'm running a test on the server for testing old-style multipolygons. It's got faster single-threaded performance and absurdly faster drives, but it should give an indication. I'll add times when it's done.
This seems to be in the wrong place, unless you're seriously suggesting that we turn all that into a chef recipe...
Personally I really don't want to get involved with doing something that insanely complicated and risky.
We have a third tile server coming hopefully in a matter of weeks at which point my plan was to reimport everything on all three servers anyway. I was assuming that would be with the new style but if it has to be with the old one then fine.
It sounds like, from the above instructions, that this can be done "online" (apart from stopped diffs) and doesn't interrupt rendering so could it potentially be automated?
If it's possible to write a script which does this, and incorporate that into a quarterly / bi-annual / annual run, then that would be great. It sounds like otherwise it's a lot of manual CLI work interspersed with periods of waiting.
Also, doing each table like this means it only requires the overhead of duplicating the largest table, right?
Yes, doing it offline is much simpler.
Yes, if there's enough room and IO capacity I'd do all the tables at once in parallel, but this way reduces the extra space required.
Well there's no way that something that complicated and long running can be run directly from chef so the only way it could be done is by writing a script that chef installs which runs from cron.
But it's so intricate that I'm really reluctant to run it automatically with no human supervision. I haven't read it all, but the fact that there are 13 steps and vast amounts of custom SQL scares the pants off me. My first question would be: what are the recovery measures if any step goes wrong? What is the risk of breaking something in a way that there is no way to back out of?
In any case there are all sorts of non-automatable things in there like "check tiles are still rendering" and "wait for updates to complete".
Most of the SQL is taken from openstreetmap-carto.
Recovery steps are there, but not clear enough.
Based on feedback, I'll write a script that handles the SQL. I think I can even handle locking in case updates were not stopped.
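One way to handle the locking (a sketch, not necessarily what the script will do) is to take an exclusive lock on the table inside the transaction, so a concurrent update run blocks rather than writing mid-copy:

```sql
BEGIN;
-- Blocks until any in-flight update releases its locks, and keeps
-- new updates out until COMMIT.
LOCK TABLE public.planet_osm_point IN ACCESS EXCLUSIVE MODE;
-- ... copy/index/swap steps for this table ...
COMMIT;
```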
We do run many "long-running" processes on various servers; database dumps, planet file conversions, etc... in an automated, regular fashion. So we just need to try and make sure that it interferes with the regular rendering as little as possible, and doesn't wedge itself too often.
I was wondering if it's possible to wait for
The first time I did a reclustering, it reduced I/O by 80% :)
Then I can see I/O slowly going up again because of fragmented data in postgresql pages.
I'm now stopping the updates, reclustering one table, letting updates run again, then doing another one, etc... so the whole process is done online and updates are not stopped for too long.
Regarding slim tables, there is a huge benefit to reindexing planet_osm_ways... its GIN index gets larger than the data itself!
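That reindex is a one-liner, assuming the default osm2pgsql name for the GIN index on the nodes column (note REINDEX takes an exclusive lock, so it belongs in the updates-stopped window):

```sql
-- Rebuild the bloated GIN index on the slim ways table.
REINDEX INDEX planet_osm_ways_nodes;
-- Compare table vs. index size afterwards:
SELECT pg_size_pretty(pg_relation_size('planet_osm_ways'))       AS table_size,
       pg_size_pretty(pg_relation_size('planet_osm_ways_nodes')) AS index_size;
```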
I'm also planning to test pg_repack, which is supposed to do the same as CLUSTER but without requiring an exclusive access lock on the table.
Don't forget the GRANT on the new tables ;)
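i.e. after each swap, something like the following; the role name is whatever the rendering daemon connects as, so "www-data" here is an assumption:

```sql
-- Re-grant read access on the freshly swapped-in tables.
GRANT SELECT ON planet_osm_point, planet_osm_line,
               planet_osm_roads, planet_osm_polygon
  TO "www-data";
```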
One more step is an ext4 defrag of the postgresql files... it helps the kernel merge I/Os into more than 8KB pages, which even for many SSDs are very small chunks of data that they don't handle efficiently. This is theoretical; I haven't checked the real benefit.
I thought updates were a daemon, not a cron job?
Writing the lock file would stop it if you write it between runs. If you write it during a run, it'll do nothing since it'll clean up at the end.
pg_repack won't work for the rendering tables as it requires a UNIQUE index on a NOT NULL column. It would be useful for the slim tables, and on the API database.
Yup, you're right, it is. Although it's very simple so would be easy to convert to cron if we wanted. But it seems like there's no reason to at the moment.
What I had in mind was something like:
The only, rather major, flaw is that this isn't how we run
That's a shame. I'm almost tempted to see how long an