SIGBUS with ~StubRoutines::jlong_disjoint_arraycopy #442
Comments
Tried with partial mmap ... too slow to use in my case, so I'm stuck with this bug.
Temporarily I changed the main collection on this DB, removed valuesOutsideNodesEnable() and added more aggressive maintenance, so the DB never goes too far past the 1G mark... hopefully this will work around the issue. I'll just have to wait for BTree compacting. ... but I'd still like to know WHY I was getting that error.
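For readers unfamiliar with that option, a small sketch of the kind of collection setup being changed here, assuming the 1.0 BTreeMap builder API; the map name and value types are placeholders, not the actual schema:

```java
import org.mapdb.BTreeMap;
import org.mapdb.DB;

// "values" is a hypothetical map name; key/value types are placeholders.
public class CollectionSetupSketch {
    static BTreeMap<Long, byte[]> createMap(DB db) {
        return db.createTreeMap("values")
                 // .valuesOutsideNodesEnable() // previously enabled: stores large
                 //                             // values outside the BTree nodes
                 .make();
    }
}
```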
Hi, sometimes I am getting a similar error when using Unsafe storage. This article says it could mean that a write into a memory-mapped file failed. I will start on 1.0.6, so I will investigate it as well.
I got this error when building master. VolumeTest triggered it.

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.004 sec - in org.mapdb.UtilsTest
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x0000000107fa5062, pid=73621, tid=4867
JRE version: Java(TM) SE Runtime Environment (7.0_71-b14) (build 1.7.0_71-b14)
Java VM: Java HotSpot(TM) 64-Bit Server VM (24.71-b01 mixed mode bsd-amd64 compressed oops)
Problematic frame:
v ~StubRoutines::jlong_disjoint_arraycopy
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
An error report file with more information is saved as:
/Users//repo/third/MapDB/hs_err_pid73621.log
If you would like to submit a bug report, please visit:
http://bugreport.sun.com/bugreport/crash.jsp
/bin/sh: line 1: 73621 Abort trap: 6 /Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/jre/bin/java -jar /Users//repo/third/MapDB/target/surefire/surefirebooter4802539880473851706.jar /Users//repo/third/MapDB/target/surefire/surefire5886035989891522075tmp /Users//repo/third/MapDB/target/surefire/surefire_01025126588681928282tmp
@niels1voo That is a different kind of bug, coming from the new Unsafe storage. I just fixed it.
I started investigating this issue in 1.0.6. Most likely it's caused by the Async Writer. There is some shifting during compaction, and I think the Async Writer somehow writes to a file which has already been closed. That could cause this error in 1.0.6.
I think I understand what's going on now. MapDB forcibly unmaps memory-mapped files on close, so a write that arrives after close hits an unmapped buffer and kills the JVM. @freakolowsky I could not replicate the issue, but I think the current snapshot should handle it.
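For context, a minimal standalone sketch (Java 7/8, not MapDB code) of why a forced unmap is fatal rather than recoverable when another thread still writes to the buffer:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch only: shows the hazard of eagerly releasing a mapping, which is
// roughly what the "Cleaner Hack" discussed in this thread amounts to.
public class CleanerHackSketch {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("demo.db", "rw");
             FileChannel ch = raf.getChannel()) {

            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putInt(0, 1); // fine: the mapping is live

            // Release the mapping immediately instead of waiting for GC.
            ((sun.misc.DirectBuffer) buf).cleaner().clean();

            // Any late write (for example from an async writer thread that still
            // holds a reference) now hits an unmapped page: the JVM dies with
            // SIGSEGV/SIGBUS instead of throwing an exception.
            // buf.putInt(0, 2); // uncommenting this crashes the process
        }
    }
}
```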
How stable is this snapshot? The only way to reproduce this bug in my case is to run it on one of our production sites, and I'm currently still handling the lag from last month's crash, so I can't afford to make too many mistakes at the moment. I will try to reproduce it in a test environment, but I'm not promising anything.
It will become 1.0.7 in a few days. Perhaps wait for the stable release.
We have also just started experiencing this issue. We've only seen it once in the field, but are trying to reproduce locally on 1.0.6 so we can confirm 1.0.7 does indeed fix the issue. @freakolowsky Were you able to verify 1.0.7 in your environment? |
I have been able to artificially reproduce this (including on version 1.0.7) with concurrent access from multiple JVMs. I do not think this is supported, because of the lack of file locks, but it does fail in nearly the same way with a SIGBUS. Not sure if that helps or not.
I can reproduce this as well, when MapDB runs out of disk space. I think it is caused when an mmap write fails for some reason.
We have tested 1.0.7 and don't believe it addresses the SIGBUS problem. We have anecdotal evidence suggesting this happened much more frequently after the 1.0.5 release, though. This is starting to become pretty critical for our deployment of MapDB. What do you think the next steps are?
@mhuffman-r7 I don't have a good answer. I will write more tests and try to isolate this issue.
Thanks @jankotek. We've still been trying to reproduce this with a smaller case, without success, so it must be an issue of scale/size/environment that is hard to pinpoint. If we have a breakthrough we'll post it here. We have considered disabling the memory-mapped configuration, but we find that in our usage it degrades performance by about 30-40% overall, so it's less than ideal.
@mhuffman-r7 I'm on 1.0.7, but still running with my workaround (aggressive maintenance that keeps my DB small, i.e. < 2GB). I haven't seen a SIGBUS since. I also clone the DB and overwrite the actual file with the clone every day to deal with node fragmentation (@jankotek until this is handled internally it would really be useful to have a method for that purpose in org.mapdb.DB). I've never hit this bug on setups where DB sizes are rather small, so I do my best to keep them that way, as I process a lot of data: on one of my high-load setups over 5*10^9 records, with up to 20 dependent child records each on average, are processed daily, but I retain very little of it. On that setup fragmented nodes can bloat the state file 5x or more. Example from this morning: the state file was 1.2G before cloning and 0.3G after (and cloning just copies from one DB to another). @jankotek I could run my code without the workaround on dummy data in my testing environment if that would help you identify the issue. Just tell me what to look for.
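The daily clone-and-swap maintenance described above could look roughly like this. It is only a sketch under my own assumptions: the collection name "records", its types, and the final file swap are placeholders, not the actual code used here.

```java
import java.io.File;
import java.util.concurrent.ConcurrentNavigableMap;
import org.mapdb.DB;
import org.mapdb.DBMaker;

// Rough sketch of "clone and swap" to shed node fragmentation.
public class CloneSketch {
    public static void main(String[] args) {
        DB oldDb = DBMaker.newFileDB(new File("state.db")).make();
        DB newDb = DBMaker.newFileDB(new File("state.clone.db")).make();

        ConcurrentNavigableMap<Long, String> src = oldDb.getTreeMap("records");
        ConcurrentNavigableMap<Long, String> dst = newDb.getTreeMap("records");
        dst.putAll(src);   // rewriting entries leaves the fragmentation behind

        newDb.commit();
        newDb.close();
        oldDb.close();
        // finally: replace state.db with state.clone.db (e.g. an atomic rename)
    }
}
```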
I will prepare some test cases; perhaps you could run them on your setup.
Has any progress been made on this research? |
I started investigating this issue again. So far I can only reproduce it if an mmap write fails, for example when free disk space is exhausted. Will report soon.
I have a new theory about why the JVM crashes. MapDB uses sparse files, and ByteBuffers are mapped beyond the end of the file. When data is written into an mmaped ByteBuffer beyond EOF, the file expands. If the file cannot expand for some reason, that can crash the JVM. Here is a patch; so far it solves the JVM crash I had when free disk space ran out. I will polish it a bit and apply it to 1.0.8-SNAPSHOT. Feel free to try it on your JVM.
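To make the theory concrete, here is a standalone sketch of the failure mode (not MapDB code). The "mitigation" half is only my guess at the direction of the fix, i.e. expanding the file with explicit writes so that a full disk produces an IOException instead of a signal:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class SparseMmapSketch {
    public static void main(String[] args) throws IOException {
        final long chunk = 16L * 1024 * 1024; // 16 MB region

        try (RandomAccessFile raf = new RandomAccessFile("store.db", "rw");
             FileChannel ch = raf.getChannel()) {

            // Mapping past EOF silently grows the file as a *sparse* region:
            // the size changes, but no disk blocks are allocated yet.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, chunk);

            // Risky: writing through the mapping allocates blocks inside a page
            // fault. If the disk is full at that moment, the kernel raises
            // SIGBUS and the JVM dies; no IOException is ever thrown.
            //   buf.putLong(0, 42L);

            // Possible mitigation: allocate the blocks with ordinary channel
            // writes first, so ENOSPC surfaces as an IOException that the
            // store can handle gracefully.
            ByteBuffer zeros = ByteBuffer.allocate(1 << 20);
            for (long pos = 0; pos < chunk; pos += zeros.capacity()) {
                zeros.clear();
                ch.write(zeros, pos); // simplified: ignores partial writes
            }

            buf.putLong(0, 42L); // now backed by allocated blocks
            buf.force();
        }
    }
}
```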
So far I have found these reasons that can explain the JVM crash:
- a write into a memory-mapped buffer fails because the file cannot expand, for example when the disk is full
- a buffer is written to after it has been forcibly unmapped on close (the Cleaner Hack), for example by the Async Writer during compaction
So there are going to be changes in both 1.0.8 and 2.0 beta1.
This sounds promising. In our environments we only see this with about 2-3 customers in production and haven't been able to reproduce it locally. FWIW, we have only seen this issue on Linux. Customers experiencing this sometimes also have file-too-large issues and other edge cases, so this seems to support your analysis. Thanks.
Unmap on close is now disabled by default in the 2.0 branch. There is an option to enable it: DBMaker.fileMmapCleanerHackEnable(). For 1.0 I don't want to disable it, as that is too big a change in default behavior, so I will instead add a new option to disable the unmap hack. That should prevent the JVM crash.
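A sketch of what a 2.0-style configuration could look like with the new default. Only fileMmapCleanerHackEnable() is named in this thread; the other builder methods (fileDB(), fileMmapEnable()) are my assumption about the 2.0 API and should be checked against the actual beta:

```java
import java.io.File;
import org.mapdb.DB;
import org.mapdb.DBMaker;

public class Mmap20ConfigSketch {
    static DB open() {
        return DBMaker
                .fileDB(new File("store.db"))    // assumed 2.0-style entry point
                .fileMmapEnable()                // keep mmap for performance
                // .fileMmapCleanerHackEnable()  // re-enables unmap-on-close (the risky behavior)
                .make();
    }
}
```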
…xpand file size. It could crash JVM. See #442
…xpand file size. It could crash JVM. See #442
The MapDB 1.0 snapshot now has an option to disable the Cleaner Hack. MapDB 2.0 now has the Cleaner Hack disabled by default, with an option to enable it again. I consider my work on this bug done, so the current versions of both repos will make it into the final release.
MapDB 2.0-beta1 and 1.0.8 with the fixes were just released. In 1.0.8 you need to disable the Cleaner Hack with the new option. Please let me know your results. Right now I do not know for sure whether this issue is fixed. I will leave this open for a couple of weeks and eventually close it if nobody can replicate it any longer.
Correction: 2.0-beta2 does not have all the fixes and could still crash. I just added the fix to the master branch.
No complaints, so closing this issue.
We have yet to try out the new changes in production, and may wait a while to attempt to do so. I'll let you know if we see any issues with this attempted fix. |
I ran into this error today and here are my observations. I have a system that I always run on an m3.large instance, because the SSD on an m3.medium is not big enough to hold all the data generated by a batch job. This batch job also does mmap I/O to another database, a Jena TDB database. Anyhow, I changed the scripts that spin up the AWS instance and accidentally put in 'm3.medium', and found that in that condition I get this error consistently. It has worked with 'm3.large' and has worked on my Windows machine, where I never had this problem. This makes me wonder if this message is a roundabout way of saying "disk full", or can be in some situations.
This issue happens if a delayed write fails, for example if the disk is full. The disk space problem is solved in MapDB 3 and MapDB 2. What version are you using?
|
Using 1.0.6.
After my fileDB passes 6GB I sometimes get the following JRE crash:
This happens when calling commit or compact. After a restart the store is normally recovered (all I have to do is remove the failed compact files).
DB make code:
I'm trying it now with mmap disabled, but compacting in this state is painfully slow. I will try partial mmap next (after the current compact run finishes :D).
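For reference, the mmap variants compared throughout this thread map onto the 1.0 DBMaker builder roughly as sketched below. This is illustrative only, not the reporter's configuration, and the builder method names are from the 1.0 API as I recall them, so verify them against your version:

```java
import java.io.File;
import org.mapdb.DB;
import org.mapdb.DBMaker;

public class StoreConfigSketch {
    // "state.db" is a placeholder file name.
    static DB open() {
        return DBMaker
                .newFileDB(new File("state.db"))
                .mmapFileEnable()             // full mmap: fastest, the mode crashing here
                // .mmapFileEnablePartial()   // partial mmap: slower, used as a workaround above
                // (omit both mmap calls to fall back to plain file I/O, slowest)
                .closeOnJvmShutdown()
                .make();
    }
}
```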