Enable Postgres connection polling or query connection limit #885
The default number of processes is normally set to the number of processors available on the system. If you got a number larger than 20, it makes me wonder whether you have an unusually large system or whether something is going wrong with the detection. Independently of that, your suggestion to cap the default number of processes is still a good one. If concurrency gets too high, Postgres does not fare too well in my experience. I'd probably go for 15. @pnorman, any opinions?
The number of connections used is about num tables * num processes, so I'd expect 80 connections for 20 processes, which would fit in the defaults. The limit is based on # of threads, not # of cores, so it's pretty easy to exceed 20 with only two sockets.
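The estimate above can be sketched as a quick back-of-the-envelope check. This is an illustrative sketch only; the table counts are taken from the comments in this thread, not from osm2pgsql's actual internals:

```python
def estimated_connections(num_tables: int, num_processes: int) -> int:
    # Rough model from the comment above: roughly one database
    # connection per table per worker process.
    return num_tables * num_processes

# 4 tables x 20 processes = 80 connections, which still fits under
# Postgres' default max_connections of 100.
print(estimated_connections(4, 20))   # 80
# The multi backend with 30+ tables on a 40-thread machine does not:
print(estimated_connections(30, 40))  # 1200
```

This makes it clear why the limit is hit so easily: the connection count grows with the product of both factors, not with either one alone.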
Postgres' parallelism has gotten better with each version, and I would go much higher than 15. It'd depend on hardware, but I'd generally try 75% of hardware threads, keeping in mind that some stages are single-threaded and will not parallelize at all. The problem that I see is that each process needs a connection for each table instead of one connection for the entire process. This is a real issue with the multi backend, where you might have 30+ tables.
On further thought, 15 is still far too high. Parallel execution is used in two places now: when indexes are created, and in the second processing stage of updates.
In both cases the number of parallel threads should not be too high. Indexing requires a lot of memory, and having too many indexing jobs running in parallel actually slows down the process. Updates are generally run in parallel with other load (like rendering) and should therefore not take up too many CPUs either. I'd rather go for a maximum of 4 as the default.
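The proposed default could be expressed as a minimal sketch. The cap of 4 comes from the comment above; the function name and the fallback to 1 when detection fails are assumptions, not osm2pgsql code:

```python
import os

def default_num_processes(cap: int = 4) -> int:
    # Use the detected hardware thread count, but never exceed the cap,
    # so indexing memory use and update-time CPU load stay bounded.
    detected = os.cpu_count() or 1  # fall back to 1 if detection fails
    return min(detected, cap)
```

On the 40-thread machine mentioned below, this would yield 4 instead of 40, keeping the connection count well under any default `max_connections`.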
In my case the machine has 40 threads (2x 10-core Xeon with HT). So how many tables is that? 3 (nodes, ways, relations)? It easily broke past the psql connection limit. I limited it to 20 processes with great results, though I didn't run exhaustive tests with more or fewer processes.
The processing stage opens quite a few connections to the database. This can be avoided when no objects are pending. Improves on osm2pgsql-dev#885.
@pnorman Any objections to having the default at a maximum of 4 threads?
I could see that working now that it doesn't impact a normal import. |
I did some testing with a smaller osm.pbf and the Postgres default max_connections of 100 on a many-core processor.
osm2pgsql version 0.94.0 (64 bit id space)
Thanks for testing. Version 1.0.0 might do a little bit better as we are now reusing some of the connections. It's still worth finally adding a limit.
Parallel processing is used only when indexes are created and in the second processing stage of updates. In both cases the number of parallel threads should not be too high. Indexing requires a lot of memory, and having too many indexing jobs running in parallel actually slows down the process. Updates are generally run in parallel with other load (like rendering) and should therefore not take up too many CPUs either. Fixes osm2pgsql-dev#885.
The tool's default connection count (--number-processes) currently seems to be too high for a default server configuration. Besides, a huge number of simultaneous connections does not always work better than just a few.
The result is that in a default PostgreSQL server setup with, say, a limit of 100 max connections (it can be less depending on the system; see max_connections), the limit is exceeded during import. For example, importing the Canada data obtained from geofabrik.de with PostgreSQL 10 exceeds it by an unknown amount (the tool does not report how many connections it wanted before failing), like this:
osm2pgsql
could either pool the connections to the database (throttling as soon as the first one is denied), use a smaller number of connections, or query the current max_connections from the database and use no more than that (or stay below it by a threshold to avoid conflicts with other running applications), down to a minimum of one (1) connection to the database. The maximum number of connections configured on a database can be queried via the
show max_connections
command within the database. An example output from the client follows (this works at least for Postgres versions 9 and 10 -- mine is 10.6, and this one is probably 9.1):
In this example, although the server had a limit of 100 connections, the application failed with the connection limit error message above. Setting
--number-processes
to 20, for example, worked fine. As a quick fix, I'd say a default of 20 would be reasonable for most setups, while still allowing a generous level of parallel queries running at once. Using the full max_connections amount is not always appropriate and should be analyzed case by case, so setting a very high connection count is not good, given the server's defaults alone.
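The throttling idea from this report can be sketched as follows. This is a hedged illustration only: the function name, the per-table connection model, and the safety margin of 10 reserved connections are all assumptions, not osm2pgsql behavior:

```python
def clamp_processes(requested: int, num_tables: int,
                    max_connections: int, reserve: int = 10) -> int:
    # Leave `reserve` connections free for other applications,
    # and never drop below one worker process.
    budget = max(max_connections - reserve, 1)
    allowed = max(budget // max(num_tables, 1), 1)
    return min(requested, allowed)

# With the default max_connections of 100 and 4 tables, a request
# for 40 workers would be throttled down to 22.
print(clamp_processes(40, 4, 100))  # 22
```

The same logic could take `max_connections` from a `show max_connections` query instead of a hard-coded value, which is exactly the case-by-case analysis suggested above.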
If the developer team thinks 20 is far too few, 100 would at least be likely to work in default Postgres installations.