Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault #71

Closed
marfago opened this issue Apr 10, 2015 · 33 comments
Closed

Segmentation Fault #71

marfago opened this issue Apr 10, 2015 · 33 comments

Comments

@marfago
Copy link

marfago commented Apr 10, 2015

HI,

Im not expert with linux and c libraries, so probaly my description is incomplete or not very handful.
Anyway Im experiencing a problem with libspatialindex.so.3.0.0. Basically I have a plone (python) instance using a rtree library (https://github.com/Toblerity/Rtree). Now and then the instance dies with no clues and message. I've checked the dmesg report from linux and here is what I see:

[1476438.713131] traps: python[317] general protection ip:7fb04f5a2ff9 sp:7fb04a3440b0 error:0 in libspatialindex.so.3.0.0[7fb04f506000+b8000]
[1476763.870627] python[379]: segfault at 0 ip 00007f855e9ccaea sp 00007f85409640f8 error 4 in libc-2.19.so[7f855e944000+1bb000]
[1478334.385962] traps: python[1211] general protection ip:7ff88dbf5ff9 sp:7ff889198390 error:0 in libspatialindex.so.3.0.0[7ff88db59000+b8000]
[1526321.346560] traps: python[17381] general protection ip:7f33d0f48ff9 sp:7f33cc4eb390 error:0 in libspatialindex.so.3.0.0[7f33d0eac000+b8000]
[1527794.507131] traps: python[20645] general protection ip:7f8ccb8c6ff9 sp:7f8cc5766390 error:0 in libspatialindex.so.3.0.0[7f8ccb82a000+b8000]
[1528432.553798] python[20984]: segfault at 22 ip 00007f8f4de55ff2 sp 00007f8f404f6390 error 4 in libspatialindex.so.3.0.0[7f8f4ddb9000+b8000]
[1528892.762885] traps: python[21271] general protection ip:7f97170eeff2 sp:7f9711e90390 error:0 in libspatialindex.so.3.0.0[7f9717052000+b8000]
[1529146.029769] python[21388]: segfault at 0 ip 00007f29b5d25aea sp 00007f298bff9a88 error 4 in libc-2.19.so[7f29b5c9d000+1bb000]
[1529364.634798] python[21435]: segfault at 0 ip 00007fa394931aea sp 00007fa3780cc148 error 4 in libc-2.19.so[7fa3948a9000+1bb000]
[1529636.648433] python[21533]: segfault at 0 ip 00007f7c14cbaaea sp 00007f7bf37f6af8 error 4 in libc-2.19.so[7f7c14c32000+1bb000]

Im working on a linux instance (a VPS):

Linux 3.13.0-46-generic #79-Ubuntu SMP Tue Mar 10 20:06:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Any suggestion is very appreciated. Thank you in advance.

@ghost
Copy link

ghost commented Jun 5, 2015

Hello! Did anybody addressed this issue?

In my company we run on it and it is really breaks RTrees library. We use https://github.com/Toblerity/Rtree inside our API on two places, one as in-memory and other one within file. We prevent it by not using threads in uwsgi, we use separate processes instead and because of that there is no shared memory and then we avoid segfaults. But we have drop in performance.
So, does anybody know when this will be fixed?

Best regards.

@marfago
Copy link
Author

marfago commented Jun 5, 2015

Yes, I used a similar strategy, just keeping a single index instance.

@hobu
Copy link
Member

hobu commented Jun 5, 2015

In my company we run on it ...
So, does anybody know when this will be fixed?

This is not a fruitful strategy for getting bugs fixed by volunteers. We're certainly interested in a patch if you have one to fix it, however, and since it seems so important to your company, maybe it could fund the effort to fix it in terms of dollars to a developer with the capability to do so or in-kind software development expertise to run it down?

I have no doubt there's an edge issue of some kind, but this bug report doesn't give me any thing to go on. A random stack trace with no debugging info is not sufficient. Is there an RTree python script that demonstrates the problem definitively?

@ghost
Copy link

ghost commented Jun 8, 2015

Hello @hobu! Thanks for the reply.

RTree python lib from the https://github.com/Toblerity/Rtree definitively stops working on high load. When lots of users accessing it. In that case it can not build index anymore or if there is index it can not execute any of its functions. It's hard to give you more info because I get errors in the python lib. It just forwards error messages from your libspatialindex and we randomly get errors like these:
RTreeError: LASError in "Index_Intersects_obj": InvalidPageException: Unknown page id 15221
RTreeError: Error in "IndexProperty_GetOverwrite": Property IndexIdentifier was empty
RTreeError: Error in "IndexItem_GetBounds": Property IndexIdentifier was empty
RTreeError: Error in "IndexProperty_GetDimension": Property IndexIdentifier was empty

But when we look into the uWSGI log there are segmentation fault errors like this one above.

If you can tell us where to look and what could cause this I'm sure that our cpp team can take care of it?

@hobu
Copy link
Member

hobu commented Jun 9, 2015

@boyanpro we need a python script that demonstrates the issue without depending on any of your data or any of your server infrastructure. It needs to reliably trigger the error.

@ghost
Copy link

ghost commented Jun 9, 2015

You can use https://github.com/Toblerity/Rtree and simply build an index (mine is 97MB big) and do any of the operations (intersection or contains) and serve it with uWSGI by using threads. Run a JMeter test with high load on it and it will snap. With using processes only (without threads) it works fine because processes can't share memory.

@hobu
Copy link
Member

hobu commented Jun 9, 2015

I need a simple RTree python script that demonstrates the issue. I'm not going to simulate your environment to trigger this issue. You must provide a script that does so.

@ghost
Copy link

ghost commented Jun 9, 2015

You can use one of these two simple examples here: http://toblerity.org/rtree/presentations/geomeetup-12-06-2012-hobu/#/

It fails even with a simplest script. Just give it high load. Even 50-100 users per second will crash it. You can use Jmeter.

@marfago
Copy link
Author

marfago commented Jun 24, 2015

I've attached a zipped file (rename it to zip). Launching a query on it dies with this error:
[715956.692631] python[6940]: segfault at 7f8b91216a7c ip 00007f8bac0c8ffe sp 00007f8b9d65aa98 error 4 in libc-2.19.so[7f8bac031000+1bb000]

index zip

About the concurrency, I can confirm that feeding the rtree index with two threads exits with seg fault.

@Urtiroler
Copy link

Hi,

im responsible for C++ Development in boyanpro's company and as per you suggestion we spent some time on pinpointing/fixing some of the issues. I'd like to mention however that the way you phrased your invitation to help out was not well received and to us did not reflect the spirit of open source development.

One of our C++ experts - sasavilic - will post his comments after this message.
We'll also submit a fix for the issues with error handling that will only work for gcc - but I think you'll get the idea behind it. I would suggest however that you explicitly mention somewhere that this library is not thread safe - since running into multithreading issues is basically guaranteed.

@sasavilic
Copy link

Fixing this library is unfortunately not easy. This library is clearly not designed to work in a multithreaded environment and believe that they might be a few more spots with thread-safety issues.

It is neither thread-safe, nor reentrant. One can not have unprotected shared variables if library is to be reentrant. The way the error are passed over C interface causes segmentation fault under high load.

The smart pointer implementation is unnecessary complex and without clear semantic about pointer ownership and reference counting. The way the reference counting is done is a sorf of magic and really hard to understand and track. One can not expect that (const SmartPointer& p) will be alive after function ends. Even if it did, it is bad practice to store pointer to const reference object. Yet the implementation stores the pointer on it into m_prev and it dereferences it later during release, possibly having silent memory overwrite. We recompiled 1.7.0 with debug information and under high load the PoolPointer (which has same issues as SmartPointer) would return PointerPool's pointer multiple times to the pool.

In previous version of library (i.e. 1.7.0), there can be found a custom alternative for pthread mutex. This is not thread-safe either.

I also wondered, why should one use mutex here at all. Either is the library completely thread-safe, in which case synchronisation should be done much better or there is no shared state and by definition no need for synchronisation and is still thread safe as long as user does not try to share objects. And if the sole intention was to protect index file from multiple un-synchronised access, it would still not be enough, because index could be changed by different process. In 1.7.0 version, I can't see that file locks are being employed anywhere.

I also found a few potential memory leaks, like this when exception is throw for example here. I guess, that deep down there is more to it. And I really don't see a point in allocating visitor and r on heap, when nice stack allocation could do the work and be safe against resource leaking. RAII is you friend.

Until this issues are fixed, I highly discourage using this library in multi-threaded environment.

@hobu
Copy link
Member

hobu commented Jul 16, 2015

I'd like to mention however that the way you phrased your invitation to help out was not well received and to us did not reflect the spirit of open source development.

@Urtiroler can you understand my frustration? @boyanpro has essentially no github contribution history, provides a "bug report" that essentially says, "it's broken, my company depends on it, when will it be fixed?", and then when asked for some detail demonstrating exactly how to break it, replies with nothing specific to help resolve the issue. Multiple times.

I had no doubt there was a threading issue, and there has been a number of bugs in the repository related to the issue. None as well researched or complete as @sasavilic's analysis. There also is a branch that has been slowly worked on to replace a lot of the custom and poorly implemented things like SmartPointer and PoolPointer with C++11 constructs for both efficiency and correctness. The development of this library is entirely volunteer, however, and its development is moving slowly. If your team wishes to work on this branch, it is https://github.com/libspatialindex/libspatialindex/tree/cpp-11 some of the things that @sasavilic points out are incorrect were removed, but plenty of things remain -- especially in the C API.

The C API was not designed to be thread safe. @sasavilic is right that it has unprotected static members getting updated without regard to synchronization, and isn't very clean about RAII when errors happen. The two biggest impediments to making it better or a) lack of tests that cleanly demonstrate threading issues and b) lack of developer time to go through and fix stuff. If your team has the desire and resources to work these issues, I am certainly willing to do whatever I can in support of it, including give you guys direct access to the repository (and RTree's repository too, if necessary).

@davisp
Copy link
Contributor

davisp commented Jul 16, 2015

Just a quick note that @sasavilic's analysis is spot on. There's a lot of needless indirection that prevents a clear understanding of the library. It'd be nice to have the time and resources to try and clean up some of the older bits (like the custom shared pointers) but as @hobu mentioned, no one has gotten to it. I personally try and fix what I can but ended up coding more defensively for some of the harder to fix bugs.

One thing of note about the shared pointers is that its also drastically wrong in a few places. Specifically [1] misses the whole concept of shared_from_this which is why you're seeing the double free stuff. I've resorted to destroying and recreating the Index objects to avoid some of these memory issues.

Also, on the file handling side of things I avoided that entirely as I found it was too untrustworthy. I ended up using the custom storage manager and piped updates to a LevelDB instance that uses write batches to ensure index integrity.

[1] https://github.com/libspatialindex/libspatialindex/blob/master/src/rtree/Leaf.cc#L50-L51

@ghost
Copy link

ghost commented Jul 17, 2015

@boyanpro has essentially no github contribution history, provides a "bug report" that essentially says, "it's broken, my company depends on it, when will it be fixed?", and then when asked for some detail demonstrating exactly how to break it, replies with nothing specific to help resolve the issue. Multiple times.

That is simply not true, I gave the same example to @sasavilic and he was able to reproduce the issue immediately. So, I presume that you was aware of these issues. Now we can see that these issues are not hidden somewhere, but they are obvious. And that is the problem because we have libraries like Rtree from Toblerity Project leaning on libspatialindex, but essentially libspatialindex is broken and that makes Rtree broken, but yet Rtree is the most popular python library for rtrees. So, it would be very useful if you put the notice that libspatialindex is not thread-safe. Btw, my github profile doesn't say anything about my contribution to OS software during past 7 years.

@hobu
Copy link
Member

hobu commented Jul 17, 2015

Ok, I didn't want to argue this anymore, but neither #71 (comment) nor #71 (comment) are code examples that demonstrate the issue. They are rabbit holes to send someone down to see if they can reproduce given your unspecified conditions. As I said, I believe you, and @sasavilic's analysis clearly backs you up, but you can't expect me or any other volunteer developer to run down your issue without a concise demonstration.

yet Rtree is the most popular python library for rtrees

Because it has the best and most complete support for the most things except for multithreaded access. Serialization, bulk loading, multiple splitting strategies. As the person responsible for developing or maintaining both Rtree and libspatialindex, I'm glad you've found them useful. There are tons of pure python rtree implementations to choose from. Maybe there's one that is thread safe. It's a hard challenge to do well (correct, fast, easy to use, featureful).

So, it would be very useful if you put the notice that libspatialindex is not thread-safe.

Please make a pull request doing so in a place you think it is most appropriate, and I'll be happy to merge it.

@kounoupis
Copy link
Member

I should chime in regarding the shared pointer implementation.

If any of you guys is planning to fix it or use shared_ptr instead, have in mind that the PointerPool and PoolPointer implementation, used to get rid of new/delete, provided a 3x speedup (mostly the Region pool). So you cannot just use shared_ptr. You would have to tie that in somehow with some sort of pointer caching. This is what creates the pointer "acquire" mess. I am sure there is a simpler way to do this though, but in the current implementation the solution is not shared_from_this, because if I remember correctly there are two components referencing the same pointer with different reference counts.

Finally, a somewhat related issue in case you decide to simplify things. I remember that converting from vectors to arrays provided a 50% performance improvement (this is related to the Leaf/Index pool).

@rburhum
Copy link

rburhum commented Jul 20, 2015

but you can't expect me or any other volunteer developer to run down your issue without a concise demonstration.

Keyword here is volunteer.

Thank you @hobu and team for the work on libspatialindex. There are many of us that are thankful for projects like this.

@sasavilic
Copy link

So you cannot just use shared_ptr. You would have to tie that in somehow with some sort of pointer caching.

Yes, they can :). Just instead of using default deleter, they must implement they own deleter which then returns pointer to the pool (instead of deleting it).

@kounoupis
Copy link
Member

Correct. My point was to be careful and not "just use shared_ptr", and simply get rid of PointerPool, which is the obvious quick fix.

@hobu
Copy link
Member

hobu commented Jul 20, 2015

My point was to be careful and not "just use shared_ptr", and simply get rid of PointerPool, which is the obvious quick fix.

The current cpp-11 branch does exactly this, btw. It will need some update.

@marfago
Copy link
Author

marfago commented Sep 17, 2015

Can anyone point me to a most recent (but safer) version of this library?
Unfortunately we are using it (even though we are not expert with c++) and we would not be able to fix any problem.
But, even with a low load the library stops and dies with segmentation fault.
Thank you.

@ghost
Copy link

ghost commented Sep 17, 2015

The library is a mess, you have to find another solution.

@marfago
Copy link
Author

marfago commented Sep 17, 2015

Any good alternative which can replace RTree as well?

@ghost
Copy link

ghost commented Sep 17, 2015

We replaced it with python microservice which has PostGIS in the background. So, basically PostGIS does this job (geo spatial query) instead of RTree.

@marfago
Copy link
Author

marfago commented Sep 17, 2015

At this point, libspatialindex should be clearly stated as unreliable.
This trap is catching people and someone (me included) might hurt seriously.

@hobu
Copy link
Member

hobu commented Sep 17, 2015

At this point, libspatialindex should be clearly stated as unreliable.

No. It's a nearly 15 year old library that's never really made a claim about being threadsafe or multithreaded, and you're using in a way that wasn't really intended. This ticket should give anyone looking for "multithread libspatialindex" enough ammo to figure out what's up.

PostGIS is a great solution if what you really want is a multithreaded spatially-aware database. libspatialindex isn't that. You can use libspatialindex to build a PostGIS-like thing if you take care (locking/access) and are willing to invest the resources in cleaning up the issues libspatialindex might have for your scenario. There are multiple, significant examples of this approach.

@rburhum
Copy link

rburhum commented Sep 17, 2015

How dare you @hobu create something for free, shared it with everyone without any monetary reward, and not fix it immediately when it is demanded that you do so - now. Shame on you.

@hobu
Copy link
Member

hobu commented Sep 17, 2015

@rburhum It was @kounoupis that created libspatialindex. I have just acted as the maintainer.

@rburhum
Copy link

rburhum commented Sep 17, 2015

I guess it is shared blame then :)

@ghost
Copy link

ghost commented Sep 18, 2015

15 years old is answer to all of these questions posted here. So, the library is reliable like my uncle's 15 years old VW Golf.

@r-barnes
Copy link
Contributor

I'm using OpenMP for parallelism. Putting a #pragma omp critical around my intersectsWithQuery calls seemed to solve all my problems.

For me, this simple solution works well since I am using an R*-tree to prefilter polygon-overlap calculations. The overlap calculation is significantly more expensive than the R*-tree query, so the cost of a critical section to the wall-time is low.

On my machine not using a critical section around intersectsWithQuery led to a segfault. I have four cores, a hardware environment that's probably common now. If anyone's interested in a reproducible example, I think I could put one together. However, since the foregoing discussion seems to indicate a design which begets race conditions, I think the simplest solution is probably just to use mutexes around queries/updates rather than viewing this as being a bug.

@hobu hobu closed this as completed Oct 6, 2019
@codeananda
Copy link

codeananda commented Jan 12, 2024

You can use libspatialindex to build a PostGIS-like thing if you take care (locking/access) and are willing to invest the resources in cleaning up the issues libspatialindex might have for your scenario. There are multiple, significant examples of this approach.

I've just read through this whole thread. Appreciate all the hard work that has gone on! Would it be possible @hobu if you could share some links to any of the multiple significant examples? I searched for other rtree packages on github but this one is clearly the most established. In fact, I only found rtreelib which hasn't had a commit in 4+ years.

Of particular interest, would be links to efforts that make rtrees multithreaded

@codeananda
Copy link

We replaced it with python microservice which has PostGIS in the background. So, basically PostGIS does this job (geo spatial query) instead of RTree.

We tried using PostGIS but it was super slow (50x slower than rtrees at least).

Instead, we are using geopandas (and potentially dask-geopandas).

The sjoin and sjoin_nearest methods are crazy fast. Just make sure to used vectorised operations (and groupbys) as much as possible and it'll be speedy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants