[osmium-merge] With 2 input files, merge does not clean duplicates #41
Comments
Strictly speaking, you are using … I never thought about the case you have. It is hard to understand the difference and to explain it to users. And it is easy to fix by removing the special case you mentioned, so I'll fix it.
Seems I was overhasty. This is not so easy to fix. You say it doesn't happen in the three-file case, but that isn't so. Maybe the duplicates disappear in your case, but that is not true in general. Why do you have files with duplicate entries in the first place?
I merge generated files into a single one. It's not data from the OSM site but data in an OSM-compliant format. The files contain administrative boundaries of different levels. There is no duplicate data within the same file, but the 2 files contain some of the same nodes and even the same ways, islands for example. In those cases, the output contains all the nodes and ways from both input files. (I use osmconvert to be able to grep inside the pbf files.) Extract from input file 1:
Extract from input file 2:
Extract from output file:
But if I use the fra.osm.pbf file a second time, the output is:
My thought: as the timestamps are not the same, the set_union used in the 2-file case does not see the 2 ways as identical, whereas in the 3-file case no set_union is done. I'm not a C++ dev but a Java one, so I might have missed something :)
If you have two different variants of the same object with the same id and version, then all bets are off. That's not correct data and I can't guarantee any outcome. I'll clarify this in the man page. The reason you are seeing different behaviour is probably that one algorithm uses == comparison and the other uses <. But this is not something you can rely upon.
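The maintainer's explanation can be illustrated with a small C++ sketch. This is not osmium's code: the `Obj` struct, both merge functions, and the assumption that `operator<` breaks ties on the timestamp while `operator==` ignores it are all hypothetical, chosen only to mirror the behaviour reported in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an OSM object; not a libosmium class.
struct Obj {
    char type;               // 'n' node, 'w' way, 'r' relation
    std::int64_t id;
    int version;
    std::int64_t timestamp;  // seconds since the epoch
};

// Assumed equality: only type, id and version are compared.
bool operator==(const Obj& a, const Obj& b) {
    return std::tie(a.type, a.id, a.version) == std::tie(b.type, b.id, b.version);
}

// Assumed ordering: type, id, version, then timestamp as a tie-breaker.
bool operator<(const Obj& a, const Obj& b) {
    return std::tie(a.type, a.id, a.version, a.timestamp) <
           std::tie(b.type, b.id, b.version, b.timestamp);
}

// "Two-input" path: std::set_union only collapses elements that are
// equivalent under operator<, so variants with different timestamps both survive.
std::vector<Obj> merge_with_set_union(const std::vector<Obj>& a, const std::vector<Obj>& b) {
    std::vector<Obj> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// "N-input" path: concatenate, sort with operator<, then drop adjacent
// elements that are equal under operator==.
std::vector<Obj> merge_then_unique(const std::vector<std::vector<Obj>>& inputs) {
    std::vector<Obj> out;
    for (const auto& in : inputs) {
        out.insert(out.end(), in.begin(), in.end());
    }
    std::stable_sort(out.begin(), out.end());
    out.erase(std::unique(out.begin(), out.end()), out.end());
    return out;
}
```

With two variants of way 42 that differ only in timestamp, `merge_with_set_union` keeps both while `merge_then_unique` keeps one, which matches the two-file versus three-file difference described above. In the second case, which variant survives is decided by the tie-breaking rule, not by anything meaningful in the data, which is why such input cannot be relied upon.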
Mention in man page that object comparison is only done on type, id, and version. See #41.
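Because object comparison is only done on type, id, and version, it may help to check inputs for objects that share that key but carry different data before merging. The sketch below is a hypothetical pre-check, not an osmium feature; it reuses a made-up `Obj` struct and treats a differing timestamp as the sign of "different data", which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical stand-in for an OSM object; not a libosmium class.
struct Obj {
    char type;
    std::int64_t id;
    int version;
    std::int64_t timestamp;
};

// The key merge comparison is based on: (type, id, version).
using Key = std::tuple<char, std::int64_t, int>;

// Count keys that appear across the inputs with more than one distinct timestamp.
// Such objects look identical to the merge but differ in content.
std::size_t count_conflicting_keys(const std::vector<std::vector<Obj>>& inputs) {
    std::map<Key, std::int64_t> first_seen;  // key -> timestamp of first occurrence
    std::set<Key> conflicts;
    for (const auto& in : inputs) {
        for (const auto& o : in) {
            Key k{o.type, o.id, o.version};
            auto [it, inserted] = first_seen.emplace(k, o.timestamp);
            if (!inserted && it->second != o.timestamp) {
                conflicts.insert(k);
            }
        }
    }
    return conflicts.size();
}
```

Exact duplicates (same key, same timestamp) are not reported, since any merge strategy treats them consistently; only the "two different variants" case the maintainer warns about is counted.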
When I try to merge 2 pbf files containing duplicates, the output still has the duplicates.
osmium merge file1.osm.pbf file2.osm.pbf -o result.osm.pbf
But if I use one of the files a second time, the output no longer contains duplicates.
osmium merge file1.osm.pbf file2.osm.pbf file1.osm.pbf -o result.osm.pbf
The issue seems to come from the special handling of the 2-file case, which seems to only do a union of the files.