
duplicate key error in slim mode #16

Closed
milovanderlinden opened this issue Apr 8, 2013 · 10 comments

@milovanderlinden

I am merging two planet extracts from planet.openstreetmap.nl: the planets for Aruba and Curacao. Although both islands are pretty far apart, they share one common relation: the nautical boundary.

During import in regular mode, all goes well. But in slim mode, I am getting errors:

--2013-04-08 15:43:51-- http://planet.openstreetmap.nl/aruba/planet-aruba-130408.osm.pbf
Resolving planet.openstreetmap.nl (planet.openstreetmap.nl)... 2a00:d10:101::13:1, 93.186.179.161
Connecting to planet.openstreetmap.nl (planet.openstreetmap.nl)|2a00:d10:101::13:1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1069130 (1.0M)
Saving to: ‘planet-aruba-130408.osm.pbf’

100%[===============================================================================================================================================================================>] 1,069,130 1.83MB/s in 0.6s

2013-04-08 15:43:52 (1.83 MB/s) - ‘planet-aruba-130408.osm.pbf’ saved [1069130/1069130]

osm2pgsql SVN version 0.81.0 (64bit id space)

Using projection SRS 4326 (Latlong)
Setting up table: planet_osm_point
NOTICE: table "planet_osm_point_tmp" does not exist, skipping
Setting up table: planet_osm_line
NOTICE: table "planet_osm_line_tmp" does not exist, skipping
Setting up table: planet_osm_polygon
NOTICE: table "planet_osm_polygon_tmp" does not exist, skipping
Setting up table: planet_osm_roads
NOTICE: table "planet_osm_roads_tmp" does not exist, skipping
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=800MB, maxblocks=102401*8192, allocation method=11
Mid: pgsql, scale=10000000 cache=800
Setting up table: planet_osm_nodes
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "planet_osm_nodes_pkey" for table "planet_osm_nodes"
Setting up table: planet_osm_ways
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "planet_osm_ways_pkey" for table "planet_osm_ways"
Setting up table: planet_osm_rels
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "planet_osm_rels_pkey" for table "planet_osm_rels"

Reading in file: planet-aruba-130408.osm.pbf
Processing: Node(112k 112.2k/s) Way(8k 4.40k/s) Relation(50 50.00/s) parse time: 3s

Node stats: total(112174), max(2170360891) in 1s
Way stats: total(8792), max(196359215) in 2s
Relation stats: total(51), max(2323309) in 0s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads

Going over pending ways...
4885 ways are pending

Using 1 helper-processes
Helper process 0 out of 1 initialised
Process 0 finished processing 4885 ways in 4 sec

All child processes exited

4885 Pending ways took 4s at a rate of 1221.25/s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads

Going over pending relations...
0 relations are pending

Using 1 helper-processes
Process 0 finished processing 0 relations in 0 sec

All child processes exited

node cache: stored: 112174(100.00%), storage efficiency: 50.92% (dense blocks: 18, sparse nodes: 100935), hit rate: 100.00%
Sorting data and creating indexes for planet_osm_roads
Sorting data and creating indexes for planet_osm_point
Sorting data and creating indexes for planet_osm_polygon
Sorting data and creating indexes for planet_osm_line
Stopping table: planet_osm_nodes
Stopping table: planet_osm_ways
Building index on table: planet_osm_ways (fastupdate=off)
Stopping table: planet_osm_rels
Building index on table: planet_osm_rels (fastupdate=off)
Stopped table: planet_osm_nodes in 0s
Stopped table: planet_osm_rels in 0s
Analyzing planet_osm_roads finished
Analyzing planet_osm_line finished
Analyzing planet_osm_polygon finished
Analyzing planet_osm_point finished
Stopped table: planet_osm_ways in 0s
Copying planet_osm_roads to cluster by geometry finished
Creating geometry index on planet_osm_roads
Creating osm_id index on planet_osm_roads
Copying planet_osm_line to cluster by geometry finished
Creating geometry index on planet_osm_line
Creating indexes on planet_osm_roads finished
Creating osm_id index on planet_osm_line
All indexes on planet_osm_roads created in 2s
Completed planet_osm_roads
Copying planet_osm_polygon to cluster by geometry finished
Creating geometry index on planet_osm_polygon
Creating osm_id index on planet_osm_polygon
Creating indexes on planet_osm_line finished
All indexes on planet_osm_line created in 2s
Completed planet_osm_line
Creating indexes on planet_osm_polygon finished
Copying planet_osm_point to cluster by geometry finished
Creating geometry index on planet_osm_point
All indexes on planet_osm_polygon created in 2s
Completed planet_osm_polygon
Creating osm_id index on planet_osm_point
Creating indexes on planet_osm_point finished
All indexes on planet_osm_point created in 4s
Completed planet_osm_point

Osm2pgsql took 15s overall

Reading in file: planet-curacao-130408.osm.pbf
Processing: Node(34k 34.7k/s) Way(5k 5.02k/s) Relation(30 30.00/s) parse time: 1s

Node stats: total(34735), max(2229592357) in 1s
Way stats: total(5020), max(213219417) in 0s
Relation stats: total(33), max(2676830) in 0s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
COPY_END for COPY planet_osm_rels FROM STDIN;
failed: ERROR: duplicate key value violates unique constraint "planet_osm_rels_pkey"
DETAIL: Key (id)=(2323309) already exists.
CONTEXT: COPY planet_osm_rels, line 30: "2323309 1 354 {268396336,202476411,202476412,86298925,202487355,202487361,202487362,86297905,8629810..."

@milovanderlinden

By the way: this issue is with a self-compiled osm2pgsql built from git master. My system is Debian wheezy.

@apmon

apmon commented Apr 8, 2013

This is a well-known issue and one that is not likely to be addressed any time soon.

While the rendering tables allow multiple geometries per OSM feature/id and therefore don't have a unique constraint on osm_id, the slim tables are used for update processing and thus need to do lookups based on osm_id, for which the id has to be unique.

For performance reasons osm2pgsql uses the COPY command, which batches up inserts into a single command. As such, if there is a unique constraint (duplicate key) violation, the whole batch fails. Because osm2pgsql does stream-based processing, it does not have the ability to go back and reprocess that batch of nodes/ways/relations to properly deal with the duplicate entry. Therefore, it cannot recover from such a duplicate key failure and terminates.
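The failure mode is easy to reproduce outside of osm2pgsql. A minimal sketch, assuming a local PostgreSQL server and a hypothetical database named gis; the temporary table only mimics the primary key that planet_osm_rels has on id:

psql -d gis <<'SQL'
-- scratch table with the same kind of primary key as planet_osm_rels
CREATE TEMP TABLE rels_demo (id bigint PRIMARY KEY);
-- the second row repeats the id, so the whole COPY statement fails and is
-- rolled back; the first row is lost along with it
COPY rels_demo (id) FROM STDIN;
2323309
2323309
\.
SQL

Recovering would mean re-sending the batch without the offending row, which is exactly the replay machinery osm2pgsql does not have.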

Osm2pgsql therefore doesn't support importing multiple files into one database in slim mode if they contain duplicate OSM features. Instead, I would recommend merging the two files prior to importing with osm2pgsql, for example with osmosis, which supports de-duplication on merge.
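For the two extracts from this issue, such a merge could look like the following sketch (standard osmosis --read-pbf/--merge/--write-pbf tasks; the output file name is only an example):

osmosis --read-pbf planet-aruba-130408.osm.pbf --read-pbf planet-curacao-130408.osm.pbf --merge --write-pbf merged.osm.pbf

The resulting merged.osm.pbf then contains the shared relation only once and can be imported in slim mode in a single run.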

@pnorman

pnorman commented Sep 12, 2014

@apmon, can we close this, as importing multiple overlapping files in slim mode isn't something osm2pgsql does?

@apmon

apmon commented Sep 16, 2014

Yes, closing this, as it is not likely that we will support overlapping files in slim mode anytime soon. Given that the duplicate key errors happen somewhere deep in the COPY operations, osm2pgsql would have to catch them and then replay all statements in the failed COPY operation. However, there is currently no mechanism to cache/buffer those statements for replay, as everything is done in a memoryless streaming process.

@apmon closed this as completed Sep 16, 2014
@mirabilos

@apmon can you point out some howto for:

• Import several extracts once (e.g. Belgium plus Nordrhein-Westfalen plus Rheinland-Pfalz plus Luxembourg)
• Keeping these up to date

Thanks! I’m trying to dig through the information here, on the OSM wiki, on the switch2osm page, and in the Osmosis documentation, but I can’t find anything comprehensible to me for doing just that. I don’t think I’ve got the disc space needed for a full Europe extract right now.

By the way… what are the disc space requirements for those extracts from Geofabrik? I’m using up about 8 GiB for the entire program installation plus an imported Belgium extract already…

@iqqmuT

iqqmuT commented Apr 3, 2015

Use osmconvert to merge multiple pbf files:

  1. Download pbf files
  2. Convert each pbf file to o5m file: osmconvert extract1.osm.pbf -o=extract1.o5m
  3. Merge o5m files into one pbf file: osmconvert extract1.o5m extract2.o5m -o=merged.osm.pbf
  4. Now you can import merged.osm.pbf with osm2pgsql in slim mode
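Applied to the two extracts from this issue, the whole sequence might look like this sketch (the database name gis and the --create/--slim flags are only example choices for the final import):

osmconvert planet-aruba-130408.osm.pbf -o=aruba.o5m
osmconvert planet-curacao-130408.osm.pbf -o=curacao.o5m
osmconvert aruba.o5m curacao.o5m -o=merged.osm.pbf
osm2pgsql --create --slim -d gis merged.osm.pbf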

@mirabilos

Ah, nice. Does that also work for the diffs?

@ReckyXXX

You can import the merged file as @iqqmuT says, but that will drop the already loaded map data in the PostgreSQL database and start a new import process.

On the other hand, you can subtract the already existing data from the new region file using:

osmconvert new_region.pbf -o=new_region.o5m
osmconvert already_loaded_map.pbf -o=already_loaded_map.o5m
osmconvert new_region.o5m --subtract already_loaded_map.o5m -o=new_region_cleaned.o5m
osmconvert new_region_cleaned.o5m -o=new_region_cleaned.pbf

And then you can just append the new file to PostgreSQL using:
osm2pgsql --append ... new_region_cleaned.pbf
without duplicate key errors.

@mirabilos

mirabilos commented Feb 20, 2017 via email

@ReckyXXX

@mirabilos :

This doesn’t scale up to, say, five regions.

You can merge all new regions together into one file and then add it to PostgreSQL using my method.
The idea is that you don't need to merge the already loaded map file with them and wait for it all to load anew.
Subtracting with osmconvert is not very time-consuming, and using it with osm2pgsql --append instead of --create can save a lot of time.
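A rough sketch of that combined workflow, assuming two hypothetical new extracts (region1, region2), an original import made from already_loaded_map.pbf, and an example database named gis that was imported with --slim:

# convert the new extracts and merge them into one file
osmconvert region1.osm.pbf -o=region1.o5m
osmconvert region2.osm.pbf -o=region2.o5m
osmconvert region1.o5m region2.o5m -o=new_regions.o5m
# subtract everything that is already in the database
osmconvert already_loaded_map.pbf -o=already_loaded_map.o5m
osmconvert new_regions.o5m --subtract already_loaded_map.o5m -o=new_regions_cleaned.o5m
osmconvert new_regions_cleaned.o5m -o=new_regions_cleaned.pbf
# append only the cleaned data to the existing slim import
osm2pgsql --append --slim -d gis new_regions_cleaned.pbf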
