.o5m really slow compared to .osm.pbf #473

Closed
leblowl opened this Issue Oct 26, 2015 · 7 comments

leblowl commented Oct 26, 2015

With o5m I only get node processing speeds of around 50 k/s. Converting the same dataset to osm.pbf with osmconvert results in speeds of around 1300 k/s for nodes. I also get this funky "osm2pgsql Warning: unknown .o5m dataset id: 0xdb" warning, but it never seems to affect the overall import result. I have tried importing various filtered subsets of planet_osm, filtered using osmfilter. Is this normal behavior? I was expecting o5m to be just as fast. Please let me know if y'all need any more info, thank you!

leblowl commented Oct 26, 2015

Using a command like this: osm2pgsql -d gis -S osm2pgsql.style ../../data/streets.o5m --slim --cache 36000 --flat-nodes node.cache --hstore --number-processes 16

Collaborator

pnorman commented Oct 26, 2015

What version? Also, how many hardware threads does your machine have, and why --cache 36000? The latter two are unlikely to be the cause of any problems, but that much cache is unnecessary and not what --help recommends.

leblowl commented Oct 26, 2015

Version 0.89.0-dev (64bit id space)

CPU info:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
Stepping:              7
CPU MHz:               2148.203
CPU max MHz:           2500.0000
CPU min MHz:           1200.0000
BogoMIPS:              3991.36
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5,12-17
NUMA node1 CPU(s):     6-11,18-23

I was unsure whether extra cache made a difference, as I had seen quite a few other examples around the web with larger cache sizes.

I've used --number-processes 10 too; not much difference. I think the drives, and then various PostgreSQL settings I have no clue about, are my main bottleneck.
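As a side note (a quick sanity check, not something computed in the thread), the lscpu fields quoted above multiply out to the reported 24 logical CPUs, which is the natural upper bound to consider for --number-processes:

```python
# Derive the logical CPU count from the lscpu fields quoted above.
threads_per_core = 2   # "Thread(s) per core"
cores_per_socket = 6   # "Core(s) per socket"
sockets = 2            # "Socket(s)"

hw_threads = threads_per_core * cores_per_socket * sockets
print(hw_threads)  # 24, matching the "CPU(s): 24" line
```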

leblowl commented Oct 26, 2015

I just pulled the latest changes and rebuilt. With an osm.pbf streets extract I am now getting speeds of up to 3500 k/s for nodes so far; with o5m, it still sits around 50 k/s...
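To put the gap in perspective, here is a back-of-the-envelope estimate (the ~3 billion node count is an approximate 2015 planet figure, not a number stated in this thread) of how long the node pass alone would take at the two reported rates:

```python
# Rough node-pass duration at the two reported processing rates.
# planet_nodes is an assumed ballpark for a 2015 full planet,
# not a figure from this thread.
planet_nodes = 3_000_000_000

o5m_rate = 50_000       # nodes/s reported for .o5m
pbf_rate = 3_500_000    # nodes/s reported for .osm.pbf

o5m_hours = planet_nodes / o5m_rate / 3600
pbf_minutes = planet_nodes / pbf_rate / 60

print(f"{o5m_hours:.1f} h vs {pbf_minutes:.1f} min")  # → "16.7 h vs 14.3 min"
```

At 50 k/s the o5m path would take most of a day just for nodes, which is why the regression mattered even though both formats decode the same data.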

Collaborator

pnorman commented Oct 27, 2015

Testing with -O null, 22000 cache, slim, flat-nodes and a 150923 planet, I get 4301 k nodes/s with PBF and 1915 k nodes/s with o5m. This happened to be with a CMake build that includes -O2, so I retried with git master and got similar results, although slower in both cases.

There is however a difference between my test dataset and yours, I was testing on a full planet, and you've got a subset.

Could you try osm2pgsql -d gis -S osm2pgsql.style -O null --flat-nodes nodes.bin --slim --cache 22000 --number-processes 1 with

  • the roads as PBF
  • the roads as o5m
  • the planet as PBF
  • the planet as o5m (osmconvert planet-latest.osm.pbf -o=planet-latest.o5m, takes about 15-30 minutes to convert)

You don't need to let it complete all the nodes; aborting after 60 seconds is fine. Just do it the same way on each test.
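The four-way test above is easy to run inconsistently by hand. Below is a minimal sketch that only prints one identical invocation per input (a dry run, so nothing touches the database); the file names are placeholders for the extracts mentioned above:

```shell
#!/bin/sh
# Dry run: emit one identical osm2pgsql command per test input, so the
# benchmark matrix stays consistent. File names are illustrative.
benchmark_cmd() {
    # timeout(1) aborts each run after 60 s, matching the suggestion above
    printf 'timeout 60 osm2pgsql -d gis -S osm2pgsql.style -O null --flat-nodes nodes.bin --slim --cache 22000 --number-processes 1 %s\n' "$1"
}

for input in roads.osm.pbf roads.o5m planet-latest.osm.pbf planet-latest.o5m; do
    benchmark_cmd "$input"
done
```

Piping the printed lines to `sh` (after substituting real file names) would execute the matrix as pnorman describes.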

Collaborator

pnorman commented Nov 7, 2015

We've changed the o5m parser to use libosmium, can you check if there are still any issues?

leblowl commented Nov 20, 2015

Hey there, sorry it has taken me so long to respond.

I just got a chance to test the new update, and all looks good. Using osm2pgsql -d gis -S osm2pgsql.style -O null --flat-nodes nodes.bin --slim --cache 22000 --number-processes 1 and the same streets dataset as before results in speeds of 3768.7 k nodes/s with o5m and 3707.1 k nodes/s with osm.pbf. Note: these are both snapshot speeds from when I hit ^C, but in general the speeds look about the same so far, which is about what I would expect.

I also no longer get the funky "osm2pgsql Warning: unknown .o5m dataset id: 0xdb" message.

Much thanks!

@pnorman pnorman closed this Nov 20, 2015
