possible regression in the export branch #62

Closed
emilburzo opened this issue Jul 14, 2017 · 11 comments

Comments

@emilburzo
Contributor

emilburzo commented Jul 14, 2017

Hi!

I'm trying to convert a ~130 GB osm file to geojsonseq using the code from the export branch.

The exact command I'm using is: osmium export -vi sparse_file_array -u type_id -r filtered.osm -o filtered.geojsonseq

But I'm getting the following error:

[ 0:00]   other options:
[ 0:00]     index type: sparse_file_array
[ 0:00]     add unique IDs: type and id
[ 0:00]     keep untagged features: no
[ 0:00] First pass through input file (reading relations)...
[31:01] First pass done.
[31:01] Second pass through input file...
osmium: /home/ubuntu/libosmium/include/osmium/storage/item_stash.hpp:157: const size_t& osmium::ItemStash::get_item_impl(osmium::ItemStash::handle_type) const: Assertion `offset < m_buffer.committed()' failed.
21847 Aborted                 (core dumped) osmium export -vi sparse_file_array -u type_id -r $1 -o $2

This worked previously (the file did have about 10 GB less data though), so I'm not sure if it's a regression in the export branch, in libosmium or if I just need more RAM?

I'm going to try to use the code from rev 2511be9 to try and identify the regression, but if you have any ideas please let me know :)

PS: Amazing tool!

@emilburzo
Contributor Author

Yep, rev 2511be9 works.

@joto
Member

joto commented Jul 14, 2017

The code where this fails was only introduced recently. This looks like a bug in libosmium. Problem is that the place where this bug occurs is really dependent on the input data and will likely not happen with small test files.

What are you using as input? Can you give me that file somehow?

@joto
Member

joto commented Jul 14, 2017

@emilburzo Could you try something for me: Find the function should_gc() in libosmium include/osmium/storage/item_stash.hpp line 171 and change it to always return false. Then recompile osmium-tool and try your command again. This will need more memory now, so it might be that you don't have enough. If it runs through after this change, we'll know for sure that the error is in the area I think it is.

@emilburzo
Contributor Author

emilburzo commented Jul 14, 2017

@joto I changed the should_gc() function:

$ git diff
diff --git a/include/osmium/storage/item_stash.hpp b/include/osmium/storage/item_stash.hpp
index 81d9802..495b7b3 100644
--- a/include/osmium/storage/item_stash.hpp
+++ b/include/osmium/storage/item_stash.hpp
@@ -168,16 +168,7 @@ namespace osmium {
         // buffer grow (*3). The checks (*1) and (*2) make sure there is
         // minimum and maximum for the number of removed objects.
         bool should_gc() const noexcept {
-            if (m_count_removed < 10 * 1000) { // *1
-                return false;
-            }
-            if (m_count_removed >  5 * 1000 * 1000) { // *2
-                return true;
-            }
-            if (m_count_removed * 5 < m_count_items) { // *3
-                return false;
-            }
-            return m_buffer.capacity() - m_buffer.committed() < 10 * 1024; // *4
+               return false;
         }
 
     public:

(is that correct?)

and recompiled osmium-tool, but it still crashes unfortunately:

[ 0:00] Started osmium export
[ 0:00]   osmium version 1.6.1 (v1.6.1-10-g4d44ac9)
[ 0:00]   libosmium version 2.12.2
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file name: /home/ubuntu/filtered-tags.osm
[ 0:00]     file format: 
[ 0:00]   output options:
[ 0:00]     file name: /home/ubuntu/filtered-tags.geojsonseq
[ 0:00]     file format: geojsonseq (without RS)
[ 0:00]     overwrite: no
[ 0:00]     fsync: no
[ 0:00]   attributes:
[ 0:00]     type:      (omitted)
[ 0:00]     id:        (omitted)
[ 0:00]     version:   (omitted)
[ 0:00]     changeset: (omitted)
[ 0:00]     timestamp: (omitted)
[ 0:00]     uid:       (omitted)
[ 0:00]     user:      (omitted)
[ 0:00]     way_nodes: (omitted)
[ 0:00]   linear tags:
[ 0:00]   area tags:
[ 0:00]   other options:
[ 0:00]     index type: sparse_file_array
[ 0:00]     add unique IDs: type and id
[ 0:00]     keep untagged features: no
[ 0:00] First pass through input file (reading relations)...
[31:12] First pass done.
[31:12] Second pass through input file...
osm2geojson.sh: line 6: 18424 Segmentation fault      (core dumped) osmium export -vi sparse_file_array -u type_id -r $1 -o $2

Although it did progress a lot further (~16GB geojsonseq file) than before (~4GB geojsonseq file)

My workflow is:

  • download osm pbf planet dump
  • run osmium tags-filter pbf to osm (with a long list of tags)
  • run osmium export osm to geojsonseq

I don't have anywhere to host that huge file, but I can send you the list of tags (where?) if it helps.

@joto
Member

joto commented Jul 14, 2017

I think I found the problem. Can you recompile with newest libosmium master and try again?

@joto
Member

joto commented Jul 14, 2017

(And just btw: Why are you writing the file into the osm format and not using pbf? That would make steps 2 and 3 much faster.)
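
As a rough sketch of this suggestion (not from the thread itself), steps 2 and 3 with PBF as the intermediate format might look like the following. It assumes osmium infers the output format from the `.osm.pbf` suffix; the function names, file names, and the `DRY_RUN` switch are hypothetical and only there so the commands can be previewed without a planet file.

```shell
#!/bin/sh
# Sketch only: names and paths are placeholders, not taken from the thread.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: filter_and_export PLANET.osm.pbf OUT.geojsonseq EXPRESSION...
filter_and_export() {
    planet="$1"; out="$2"; shift 2
    # Intermediate file is PBF rather than OSM XML.
    filtered="${out%.geojsonseq}.osm.pbf"

    # Step 2: filter by tags, writing PBF.
    run osmium tags-filter -v "$planet" "$@" -o "$filtered"
    # Step 3: export the filtered PBF with the same options as the original command.
    run osmium export -vi sparse_file_array -u type_id -r "$filtered" -o "$out"
}
```

Calling `filter_and_export planet-latest.osm.pbf filtered.geojsonseq` followed by the tag expressions prints the two commands; setting `DRY_RUN=0` runs them instead.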

@emilburzo
Contributor Author

> I think I found the problem. Can you recompile with newest libosmium master and try again?

It went a lot further this time, the geojsonseq output file has ~30 GB (the complete one has ~38 GB)

[ 0:00] First pass through input file (reading relations)...
[31:54] First pass done.
[31:54] Second pass through input file...
osm2geojson.sh: line 6: 27181 Segmentation fault      (core dumped) osmium export -vi sparse_file_array -u type_id -r $1 -o $2

> (And just btw: Why are you writing the file into the osm format and not using pbf? That would make steps 2 and 3 much faster.)

Just an assumption I never actually tested (I assumed plaintext would need less processing/be faster).

Thanks for the tip!

@joto
Member

joto commented Jul 16, 2017

Okay, that doesn't look good. Can you tell me the exact input data you used, the libosmium and osmium-tool software versions involved, and the exact commands you used so that I can try to reproduce the problem?

Oh, and how much memory do you have?

@emilburzo
Contributor Author

Exact steps for a vanilla Ubuntu 16.04 install:

sudo apt update
sudo apt install -y build-essential cmake zlib1g-dev libexpat1-dev libbz2-dev libboost-program-options-dev libboost-dev
cd ~/
wget https://ftp5.gwdg.de/pub/misc/openstreetmap/planet.openstreetmap.org/pbf/planet-latest.osm.pbf
git clone https://github.com/osmcode/libosmium.git
git clone https://github.com/osmcode/osmium-tool.git
cd ~/osmium-tool
git checkout export
make -j4
wget -O tags http://hq.emilburzo.com/public/tags
~/osmium-tool/build/src/osmium tags-filter -v ~/planet-latest.osm.pbf $(cat tags | tr '\n' ' ') -o ~/filtered-planet.osm
~/osmium-tool/build/src/osmium export -vi sparse_file_array -u type_id -r ~/filtered-planet.osm -o ~/filtered-planet.geojsonseq

> libosmium and osmium-tool software versions

I'm using the latest master branch of libosmium and the export branch of osmium-tool.

From osmium's output:

  • osmium version 1.6.1 (v1.6.1-10-g4d44ac9)
  • libosmium version 2.12.2

> Oh, and how much memory do you have?

30.5 GB (AWS r4.xlarge)

@joto
Member

joto commented Jul 17, 2017

I think I have found the problem. It is in libosmium. Can you try with current master?

(And btw: Instead of $(cat tags | tr '\n' ' ') you should be able to just use -e tags.)
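
For illustration, the filter step with the expressions file passed directly might look like the sketch below. It assumes `tags` holds one expression per line and that `-e` is available in the osmium-tool build in use; the function name and the `DRY_RUN` guard are hypothetical additions so the command can be previewed without running it.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Usage: filter_with_file INPUT EXPRESSIONS_FILE OUTPUT
filter_with_file() {
    # Build the command line; -e reads the filter expressions from a file,
    # replacing the $(cat tags | tr '\n' ' ') substitution.
    set -- osmium tags-filter -v "$1" -e "$2" -o "$3"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `filter_with_file ~/planet-latest.osm.pbf tags ~/filtered-planet.osm` prints the command that would be run; with `DRY_RUN=0` it is executed.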

@emilburzo
Contributor Author

Spot on! It worked:

[90:35] Second pass done.
[90:35] Wrote 54921999 features.
[90:35] Encountered 1555 errors.
[90:35] Peak memory used: 17416 MBytes
[90:35] Done.

Thanks for your help (and the very useful tips!).

joto closed this as completed Jul 18, 2017