File::Stat.new() issue in 1.7.23 on Windows 7 #3525

Closed
ph opened this Issue Dec 8, 2015 · 22 comments


ph commented Dec 8, 2015

I have been investigating a Logstash issue with the file input on Windows 7 x86.

See the JVM error log at https://gist.github.com/ph/c0dcc55573337a8d2296

When we call File::Stat.new on a file that has been deleted by another process, it throws an EXCEPTION_PRIV_INSTRUCTION exception. Following the trace and the logic flow, I have narrowed it down in our code to this line: https://github.com/jordansissel/ruby-filewatch/blob/master/lib/filewatch/watch.rb#L102

A few things I know:

  • When I run the same code on Windows Server 2012 or Windows 10, I don't have the problem.
  • When I roll back JRuby to 1.7.22, I no longer see the issue on Windows 7.

I have tried to create a simple test case outside of Logstash, without luck so far.
I do have a reproducible way using Logstash; check the issue for the full test case.
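For contrast with the crash described above, here is a minimal sketch of what a caller would normally expect when the file disappears: an ordinary Ruby exception that can be rescued. The helper name `safe_stat` and the file name are made up for illustration and are not part of ruby-filewatch; the sketch assumes a healthy runtime raises Errno::ENOENT for a missing path.

```ruby
require "tmpdir"

# Hypothetical helper (not part of ruby-filewatch): stat a path that another
# process may delete at any moment. Returns a File::Stat, or nil if it is gone.
def safe_stat(path)
  File::Stat.new(path)
rescue Errno::ENOENT
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "watched.log")
  File.write(path, "hello\n")
  puts "size while present: #{safe_stat(path).size}"
  File.delete(path) # stands in for deletion by another process
  puts "after deletion: #{safe_stat(path).inspect}"
end
```

A rescue like this only helps when the failure surfaces as a Ruby exception; it cannot catch a JVM-level crash such as the one in the gist.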

ph commented Dec 8, 2015

@enebo Could this be related to the latest changes to stat handling on Windows? Could I provide anything else to help debug this issue?

Member

enebo commented Dec 8, 2015

@ph hmmm this is pretty weird. It sort of sounds like some race exists where FindFirstFileW thinks something is there and then it really isn't and it crashes. I am wondering if there are any issues in MSDN (or however bugs are tracked in Windows) about FindFirstFileW. It is a little odd to be seeing this problem since this is a very common API call in Windows.

Member

enebo commented Dec 9, 2015

@ph Another suggestion (if possible) is to see if MRI will also crash in this way. We are both calling the same methods (although we call through our jffi subsystem and they compile C code). I am curious whether it also crashes on MRI.

ph commented Dec 9, 2015

@enebo Concerning the race, that is what I thought too from the error log. As for reproducing under MRI, we will try to come up with an easier test case; it might take some time.

cc @jsvd
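One shape a standalone test for the suspected race might take is sketched below: one thread keeps creating and deleting a file while the main thread stats it, and every outcome is tallied. This is only an illustration and is not known to reproduce the crash; `stat_race` and the file name are hypothetical. The syntax is kept old-style so the same script could be tried on both MRI and JRuby 1.7.

```ruby
require "tmpdir"

# Hypothetical stress harness: tallies what File::Stat.new does while another
# thread creates and deletes the same path.
def stat_race(iterations)
  counts = Hash.new(0)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "race.log")
    stop = false
    churner = Thread.new do
      until stop
        begin
          File.write(path, "x")
          File.delete(path)
        rescue SystemCallError
          # Create/delete may fail transiently; keep churning.
        end
      end
    end
    begin
      iterations.times do
        begin
          File::Stat.new(path)
          counts[:present] += 1
        rescue SystemCallError => e
          counts[e.class] += 1 # e.g. Errno::ENOENT when the file is gone
        end
      end
    ensure
      stop = true
      churner.join
    end
  end
  counts
end

puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{stat_race(1000).inspect}"
```

On a runtime without the bug, every iteration should end up in one of the tallies; a process that dies partway through would point at the native call rather than at Ruby-level error handling.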

Member

enebo commented Dec 9, 2015

@ph @jsvd I did some cursory searches for problems with FindFirstFileW and with the errno returned and it was not particularly useful.

Member

enebo commented Dec 13, 2015

@ph @jsvd I did some cursory searches for problems with FindFirstFileW and with the errno returned and it was not particularly useful.

WOW --- this text area has had this sentence in it for days :| I should remember to hit the comment button.

enebo added this to the JRuby 1.7.24 milestone Jan 6, 2016

Member

enebo commented Jan 11, 2016

Hmm I might have had a breakthrough! https://msdn.microsoft.com/en-us/library/windows/desktop/aa364946%28v=vs.85%29.aspx

Check out this section on transactional support:

"Transacted Operations

If a file is open for modification in a transaction, no other thread can open the file for modification until the transaction is committed. So if a transacted thread opens the file first, any subsequent threads that try modifying the file before the transaction is committed receives a sharing violation. If a non-transacted thread modifies the file before the transacted thread does, and the file is still open when the transaction attempts to open it, the transaction receives the error ERROR_TRANSACTIONAL_CONFLICT.
"
This is not exactly the same thing, but it seems close in spirit. I might look into the transactional versions of these file functions.

@ph do you have a Logstash script that reproduces this which I can run locally on my Win7 machine?

Member

enebo commented Jan 11, 2016

Hmm, the next link, to the transacted version, highly recommends that I not use it...

Member

enebo commented Jan 11, 2016

Ignore my last two comments. I had not looked at the actual crash dump since before I left for Japan, and I realize I should have :)

jnr.posix.WindowsLibC$jnr$ffi$0.FindFirstFileW([BLjnr/posix/windows/WindowsFindData;)Ljnr/posix/HANDLE;+60

So I spent time this afternoon looking at GetFileAttributesExW, when the crash is actually happening after that call, in the failure branch, in FindFirstFileW. So this bug just got much different and hopefully will get some traction now that I am actually looking at the right method :|

Member

enebo commented Jan 12, 2016

I can reproduce this problem entirely within jnr-posix now. If I create a temp file via java.io.File, then FindFirstFileW will crash the JVM. If I access the parent directory of that file, then the call passes. I would expect all-or-nothing behavior, unless some fields are laid out wrong in WindowsFindData and directories are not trying to write to those fields somehow??? Anyway, this is good news, since I have something local which is broken.

This particular bug was not occurring very often because FindFirstFileW is only called on a failure from GetFileAttributesExW. So, in other words, only in a weird case.
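The lookup order described in this comment can be modeled roughly as follows. This is a sketch of the control flow only, not the jnr-posix implementation: `primary` and `fallback` are hypothetical callables standing in for GetFileAttributesExW and FindFirstFileW, and raising Errno::ENOENT when both fail is an assumption made for the example.

```ruby
# Model of the lookup order only. `primary` and `fallback` are hypothetical
# callables standing in for GetFileAttributesExW and FindFirstFileW; each
# returns an attributes object, or nil on failure.
def stat_with_fallback(path, primary, fallback)
  attrs = primary.call(path)
  return [:primary, attrs] if attrs

  # Rarely taken branch: only reached when the primary lookup fails.
  attrs = fallback.call(path)
  return [:fallback, attrs] if attrs

  # Assumption: when both lookups fail, the caller sees ENOENT.
  raise Errno::ENOENT, path
end

get_attrs  = lambda { |path| path == "plain.txt" ? { :size => 1 } : nil }
find_first = lambda { |path| path == "odd.txt" ? { :size => 2 } : nil }

p stat_with_fallback("plain.txt", get_attrs, find_first)
p stat_with_fallback("odd.txt", get_attrs, find_first)
```

Because the second branch runs only after the first call fails, a defect confined to it stays hidden for most paths, which fits the observation that the crash showed up only in unusual cases.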

ph commented Jan 12, 2016

@enebo Woohoo! Do you still need a reproducible Logstash script? There are more details on our side at logstash-plugins/logstash-input-file#82

I will gladly test it if you have a fix!

Member

enebo commented Jan 12, 2016

@ph I will be updating the jruby-1_7 branch with a new jnr-posix which has resolved the issue. I would like to get some confirmation. Perhaps I can ask you to test this tomorrow to make sure it is really fixed?

ph commented Jan 12, 2016

@enebo Sounds like a plan! Point me to a binary and it will be easy to test.

enebo added a commit that referenced this issue Jan 12, 2016

Upodate to latest jnr-posix. Fixes #3525. stat.writable? incorrectly …
…reporting false for some directories on Windows 7

Member

enebo commented Jan 12, 2016

Whoops wrong issue description. @ph can you build jruby-1_7 or do you need a nightly build made?

ph commented Jan 12, 2016

@enebo I would prefer a nightly build

Member

enebo commented Jan 13, 2016

@ph If you got the snapshot yesterday then get another today since there are two more commits in it now.

ph commented Jan 14, 2016

Same kind of error with the newest build (I used the .zip, which should be the same as the tgz).

screenshot 2016-01-14 09 30 32

full trace in https://gist.github.com/ph/4384f48b0ffa580a595b

Member

enebo commented Jan 14, 2016

Ok well I will see if I can repro with the linked issue from logstash locally on my win7 box. Stay tuned...

Member

enebo commented Jan 19, 2016

@jsvd @ph http://ci.jruby.org/snapshots/jruby-1_7/jruby-bin-1.7.24-SNAPSHOT.tar.gz

I think I got it for realsies this time. I cannot reproduce on a 32-bit JVM running the Logstash example now, but I would love to get confirmation that this is good with Logstash. (I just ran the nightly, so that link should be the latest version.)

ph commented Jan 19, 2016

@enebo With the latest snapshot that you posted, I cannot reproduce the issue on Windows 7 👍

Member

enebo commented Jan 19, 2016

@ph excellent. Look forward to a JRuby release tomorrow then.
