JVM crash with file-system watching enabled on Windows #13135

Closed
wolfs opened this issue May 15, 2020 · 12 comments · Fixed by #13411

@wolfs
Member

wolfs commented May 15, 2020

ArtifactTransformCachingIntegrationTest."cache cleanup does not delete entries that are currently being created" seems to crash in file-events.dll.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00007ff90d7b3b69, pid=10600, tid=8716
#
# JRE version: OpenJDK Runtime Environment (14.0+36) (build 14+36-1461)
# Java VM: OpenJDK 64-Bit Server VM (14+36-1461, mixed mode, sharing, tiered, compressed oops, g1 gc, windows-amd64)
# Problematic frame:
# C  [native-platform-file-events.dll+0x23b69]

See https://builds.gradle.org/viewLog.html?buildId=34884728&buildTypeId=Gradle_Check_VfsRetention_30_bucket6&fromExperimentalUI=true


cc: @gradle/build-cache

@wolfs
Member Author

wolfs commented May 15, 2020

Crash log

There doesn't seem to be a daemon log, maybe because the build was started with --no-daemon.

The re-run worked: https://e.grdev.net/s/fixi5bxoa5a3i/tests/:dependencyManagement:vfsRetentionIntegTest/org.gradle.integtests.resolve.transform.ArtifactTransformCachingIntegrationTest/cache%20cleanup%20does%20not%20delete%20entries%20that%20are%20currently%20being%20created#1

Since it was a no-daemon build and judging from the build log, the crash seems to happen in startWatching.

@wolfs
Member Author

wolfs commented May 18, 2020

Same here and here.

@wolfs wolfs self-assigned this May 18, 2020
@lptr
Member

lptr commented May 19, 2020

Commit where things started to fail: 09185e8

@wolfs
Member Author

wolfs commented May 19, 2020

Some resources on how to find out where the crash happens on Windows:

@lptr
Member

lptr commented May 19, 2020

FTR, the PDB generated by @wolfs points to line 128 as the crash location, which probably means the line below:

https://github.com/gradle/native-platform/blob/16b2181e9fdf949385b3aa05ec983e4c263864d2/src/file-events/cpp/win_fsnotifier.cpp#L128-L132

    if (status != WatchPointStatus::LISTENING) {
        logToJava(LogLevel::FINE, "Ignoring incoming events for %s as watch-point is not listening (%d bytes, errorCode = %d, status = %d)",
            utf16ToUtf8String(path).c_str(), bytesTransferred, errorCode, status);
        return;
    }

@wolfs
Member Author

wolfs commented May 19, 2020

I managed to reproduce the crash on master (d91f4d0) as well.

Let's see if I can reduce the number of tests we need to run.

@wolfs
Member Author

wolfs commented May 19, 2020

Usage of the tools above:
To show the headers with dumpbin, you can use

"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /headers ..\gradle\intTestHomeDir\worker-1\native\5fef6b658a83c5bcc1b372000e2bec0065fa8a789a588892999cc2c26706b2e4\windows-amd64\native-platform-file-events.dll
...
OPTIONAL HEADER VALUES
             20B magic # (PE32+)
           14.25 linker version
           13600 size of code
            F400 size of initialized data
               0 size of uninitialized data
            4D94 entry point (0000000180004D94)
            1000 base of code
       180000000 image base (0000000180000000 to 0000000180026FFF)
...

Then you can disassemble the DLL like this:

"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\dumpbin.exe" /exports /disasm ..\gradle\intTestHomeDir\worker-1\native\5fef6b658a83c5bcc1b372000e2bec0065fa8a789a588892999cc2c26706b2e4\windows-amd64\native-platform-file-events.dll > native-platform-file-events.asm

In that file, you can look up the address from the hs_err file:

  0000000180023B57: E8 3F EE FD FF     call        000000018000299B
  0000000180023B5C: E9 36 01 00 00     jmp         0000000180023C97
  0000000180023B61: 48 8B 84 24 E0 00  mov         rax,qword ptr [rsp+00000000000000E0h]
                    00 00
  0000000180023B69: 83 78 68 01        cmp         dword ptr [rax+68h],1
  0000000180023B6D: 0F 84 CB 00 00 00  je          0000000180023C3E
  0000000180023B73: BA 02 00 00 00     mov         edx,2
  0000000180023B78: 48 8B 0D 29 64 15  mov         rcx,qword ptr [0000000180179FA8h]

Though that probably won't help you much to pin down the line in the code.
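The address arithmetic behind that lookup can be sketched as follows. This is only an illustration: the constants are copied from the hs_err header and the dumpbin output above, while the helper names `runtimeBase` and `disasmAddress` are hypothetical and not part of any tool. The hs_err frame gives an offset into the DLL (`+0x23b69`), and the dumpbin listing is printed relative to the preferred image base, so the address to search for is the image base plus that offset.

```cpp
#include <cstdint>

// Values copied from the hs_err header and the dumpbin /headers output in this thread.
constexpr std::uint64_t kCrashPc       = 0x00007ff90d7b3b69ULL; // pc=... from hs_err
constexpr std::uint64_t kFrameOffset   = 0x23b69ULL;            // native-platform-file-events.dll+0x23b69
constexpr std::uint64_t kPreferredBase = 0x180000000ULL;        // "image base" from dumpbin
constexpr std::uint64_t kImageEnd      = 0x180026FFFULL;        // end of the image range from dumpbin

// Hypothetical helper: the address the DLL was actually loaded at in the crashed process.
constexpr std::uint64_t runtimeBase(std::uint64_t pc, std::uint64_t offset) {
    return pc - offset;
}

// Hypothetical helper: the address to search for in the dumpbin /disasm listing,
// which uses the preferred image base rather than the runtime load address.
constexpr std::uint64_t disasmAddress(std::uint64_t preferredBase, std::uint64_t offset) {
    return preferredBase + offset;
}

// The DLL was loaded at 0x00007ff90d790000 in the crashed JVM.
static_assert(runtimeBase(kCrashPc, kFrameOffset) == 0x00007ff90d790000ULL, "runtime base");
// The crashing instruction is the one listed at 0000000180023B69 (the cmp above).
static_assert(disasmAddress(kPreferredBase, kFrameOffset) == 0x180023B69ULL, "disasm address");
// Sanity check: the address lies inside the image range reported by dumpbin.
static_assert(disasmAddress(kPreferredBase, kFrameOffset) <= kImageEnd, "inside image");
```

The checks run at compile time, so building the file is enough to confirm that `0000000180023B69` is the right line to look for in the listing.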

So, if you have the pdb file for the dll, you can use llvm-pdbutil to show the mapping from address to line:

llvm-pdbutil.exe pretty --lines build/libs/nativePlatformFileEvents/shared/windows_amd64/native-platform-file-events.pdb > native-platform-file-events.lines

In the resulting file you can search for (part of) the address of the crash to find out the source code lines:

...
    <project-dir>\src\file-events\cpp\win_fsnotifier.cpp (MD5: 995F304E9F7C13F091B902D16166C592)
...
      Line 124, Address: [0x00023b4f - 0x00023b5b] (13 bytes)
      Line 125, Address: [0x00023b5c - 0x00023b60] (5 bytes)
      Line 128, Address: [0x00023b61 - 0x00023b72] (18 bytes)
      Line 129, Address: [0x00023b73 - 0x00023c3b] (201 bytes)
      Line 131, Address: [0x00023c3c - 0x00023c3d] (2 bytes)
      Line 133, Address: [0x00023c3e - 0x00023c4c] (15 bytes)
...

There you go: it is line 128 in win_fsnotifier.cpp.
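For illustration, the manual search in the `.lines` file amounts to a range lookup. The sketch below assumes C++14 or later and hard-codes the six ranges quoted above; the `LineRange` type and the `lineForOffset` function are hypothetical names, not part of llvm-pdbutil. The ranges are treated as inclusive, which matches the byte counts in the listing (for example `0x23b61` to `0x23b72` is 18 bytes).

```cpp
#include <cstdint>

// One entry of the llvm-pdbutil "pretty --lines" output: a source line and
// the inclusive range of DLL offsets that were generated for it.
struct LineRange {
    int line;
    std::uint32_t first;
    std::uint32_t last;
};

// Ranges copied from the win_fsnotifier.cpp section of the listing above.
constexpr LineRange kRanges[] = {
    {124, 0x00023b4f, 0x00023b5b},
    {125, 0x00023b5c, 0x00023b60},
    {128, 0x00023b61, 0x00023b72},
    {129, 0x00023b73, 0x00023c3b},
    {131, 0x00023c3c, 0x00023c3d},
    {133, 0x00023c3e, 0x00023c4c},
};

// Hypothetical helper: returns the source line whose range contains the
// offset, or -1 if the offset is not covered by any of the known ranges.
constexpr int lineForOffset(std::uint32_t offset) {
    for (const LineRange& r : kRanges) {
        if (offset >= r.first && offset <= r.last) {
            return r.line;
        }
    }
    return -1;
}

// The offset from the hs_err file falls into the range recorded for line 128.
static_assert(lineForOffset(0x23b69) == 128, "crash offset maps to line 128");
```

An offset outside the table, such as `0x2143a` from the later crashes, yields -1 here because it belongs to a different source file's section of the listing.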

@wolfs
Member Author

wolfs commented May 20, 2020

Looking at the flaky tests for the last 24 hours on master, I found more occurrences of the JVM crash:

The other flaky tests (e.g. https://e.grdev.net/s/i62lewlh4r3wg) are also caused by JVM crashes in native-platform-file-events.dll, this time at 0x2143a.

@lptr
Copy link
Member

lptr commented May 20, 2020

So 0x2143a seems to map to here:

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.25.28610\include\xlocale (MD5: 48B1BDF6F3916796193CD4CB8DDA87E7)
      Line 1084, Address: [0x0002140c - 0x00021431] (38 bytes)
      Line 1086, Address: [0x00021432 - 0x00021441] (16 bytes)
      Line 1087, Address: [0x00021442 - 0x00021446] (5 bytes)
        while (_Mid1 != _Last1 && _Mid2 != _Last2) { // convert and put a wide char
            unsigned long _Ch;
/* --> */   const unsigned short _Ch1 = static_cast<unsigned short>(*_Mid1);
            bool _Save                = false;

I guess it's the de-referencing of _Mid1, i.e. we are trying to convert a wide string from an invalid memory location. That makes sense, and points to an invalid path being converted again.

@wolfs
Copy link
Member Author

wolfs commented May 25, 2020

Sometimes it seems that cancelling a watch point fails with "device or resource busy" (see for example here). I only recall seeing this problem when not using isolated daemons, see #12604.

@wolfs wolfs changed the title from "Analyze JVM crash with VFS retention Windows" to "Fix JVM crash with VFS retention Windows" May 26, 2020
@wolfs wolfs added this to the 6.6 RC1 milestone May 26, 2020
@wolfs wolfs removed their assignment May 26, 2020
@wolfs
Member Author

wolfs commented Jun 8, 2020

Looks like DependencyVerificationIntegrityCheckIntegTest is now so flaky that the build fails consistently on master: https://e.grdev.net/s/buk5vdoiy3xqs/tests/flaky

wolfs added a commit that referenced this issue Jun 8, 2020
@wolfs
Member Author

wolfs commented Jun 8, 2020

I ignored the test on master for now: 6995019

@lptr lptr changed the title from "Fix JVM crash with VFS retention Windows" to "JVM crash with file-system watching enabled on Windows" Jun 16, 2020