Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hangs on write into .vdi or .vmdk mounted image #15

Closed
andry81 opened this issue Nov 16, 2022 · 11 comments
Closed

Hangs on write into .vdi or .vmdk mounted image #15

andry81 opened this issue Nov 16, 2022 · 11 comments

Comments

@andry81
Copy link

andry81 commented Nov 16, 2022

  • Windows x64 Pro SP1

Got the stack trace from the Process Hacker one of threads has having the cycles. It does change over the time, but the WriteFile been blocked on something. The whole system suffers from WriteFile blocking and none of application can be closed or terminated which had call to WriteFile recently. Even can not be terminated through the Process Hacker itself. The Process Hacker somehow survived that and I could take a single stack dump. Only the hard reset can reset the system, which means it is a critical bug.

Can't say how the notepad could save the stack trace into a file, may be the blocking is not stable.

0, ntoskrnl.exe!KeInsertQueueApc+0x5cb
1, ntoskrnl.exe!ObfReferenceObject+0x99d
2, ntoskrnl.exe!KeWaitForSingleObject+0x1a3
3, ntoskrnl.exe!longjmp+0x1c470
4, Ntfs.sys+0x9bd48
5, fltmgr.sys+0x1102
6, fltmgr.sys!FltIsCallbackDataDirty+0x23ba
7, fltmgr.sys!FltDeletePushLock+0x3ee
8, ntoskrnl.exe!NtWriteFile+0x45f
9, ntoskrnl.exe!longjmp+0x5bf3
10, ntdll.dll!ZwWriteFile+0xa
11, KernelBase.dll!WriteFile+0x7b
12, kernel32.dll!WriteFile+0x36
13, DomainNeutralILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte*, Int32, Int32 ByRef, IntPtr) + 0xc8 <-- mscorlib.ni.dll+0x63c9e8
14, System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef) + 0x83 <-- mscorlib.ni.dll+0x5ad683
15, System.IO.FileStream.WriteCore(Byte[], Int32, Int32) + 0x5d <-- mscorlib.ni.dll+0x5ad5dd
16, DiscUtils.Vmdk.HostedSparseExtentStream.Write(Byte[], Int32, Int32) + 0x12c <-- 0x7fe852b360c
17, LTR.IO.ImDisk.Devio.Server.Providers.DevioProviderFromStream.Write(Byte[], Int32, Int32, Int64) + 0x3e <-- 0x7fe852b34ae
18, LTR.IO.ImDisk.Devio.Server.Providers.DevioProviderManagedBase.Write(IntPtr, Int32, Int32, Int64) + 0x6b <-- 0x7fe852b343b
19, LTR.IO.ImDisk.Devio.Server.Services.DevioShmService.WriteData(System.Runtime.InteropServices.SafeBuffer) + 0x175 <-- 0x7fe852b2fa5
20, LTR.IO.ImDisk.Devio.Server.Services.DevioShmService.RunService() + 0x509 <-- 0x7fe852b17a9
21, LTR.IO.ImDisk.Devio.Server.Services.DevioServiceBase.ServiceThreadProcedure() + 0x24 <-- 0x7fe852b0e24
22, System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x172 <-- mscorlib.ni.dll+0x58df12
23, System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) + 0x15 <-- mscorlib.ni.dll+0x58dd95
24, System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) + 0x55 <-- mscorlib.ni.dll+0x58dd65
25, System.Threading.ThreadHelper.ThreadStart() + 0x55 <-- mscorlib.ni.dll+0x633ea5
26, clr.dll!LogHelp_LogAssert+0x1f13
27, clr.dll!LogHelp_LogAssert+0x1e28
28, clr.dll!LogHelp_LogAssert+0x2690
29, clr.dll!ClrCreateManagedInstance+0x20bf
30, clr.dll!LogHelp_LogAssert+0x3540
31, clr.dll!LogHelp_LogAssert+0x34b3
32, clr.dll!LogHelp_LogAssert+0x33f2
33, clr.dll!LogHelp_LogAssert+0x35cf
34, clr.dll!ClrCreateManagedInstance+0x1fa9
35, clr.dll!LogHelp_LogAssert+0x68c5
36, kernel32.dll!BaseThreadInitThunk+0xd
37, ntdll.dll!RtlUserThreadStart+0x1d

How to reproduce:

  1. Create dynamic .vdi or .vmdk file of 1TB size.
  2. Mount to drive B: and format for NTFS 4KB fast.
  3. Start copying 300GB of files from an SSD into it.
  4. After ~50GB it does hang and system from now can not be rebooted nor a process terminated.

I have reproduced it on both .vdi (fastes coping) and .vmdk (slower, the stack trace is taken from). The .vhd is too slow to reach the 50GB point (< 10MB/s on any files) and might be not affected because of slowness (I feel there is might be some kind of a race condition with high speed writes).

@andry81 andry81 changed the title Hangs on write into .vdi or .vdmk mounted image Hangs on write into .vdi or .vmdk mounted image Nov 16, 2022
@andry81
Copy link
Author

andry81 commented Nov 16, 2022

I have tried the Arsenal Image Mounter v3.9.228 and got the same bug but, with slightly different stack trace at ~60GB point. I've took it after termination of the hanged dotnet.exe process (all threads has closed instead of 2, where one have has cycles).

Stack trace of one thread with cycles after dotnet.exe termination initiated by the PH

0, ntoskrnl.exe!KeInsertQueueApc+0x5cb
1, ntoskrnl.exe!ObfReferenceObject+0x99d
2, ntoskrnl.exe!KeDelayExecutionThread+0x186
3, ntoskrnl.exe!RtlFreeUnicodeString+0xa3
4, ntoskrnl.exe!IoCreateController+0x13ad
5, ntoskrnl.exe!ObReferenceObjectByHandle+0x7c9
6, ntoskrnl.exe!KeInsertQueueApc+0x6cc
7, ntoskrnl.exe!KiCpuId+0x2730
8, ntoskrnl.exe!longjmp+0x5c97
9, ntdll.dll!NtWaitForMultipleObjects+0xa
10, KernelBase.dll!GetCurrentProcess+0x40
11, kernel32.dll!WaitForMultipleObjectsEx+0xb3
12, coreclr.dll+0x3587f
13, coreclr.dll+0x356f1
14, coreclr.dll+0x35260
15, 0x7fe87fa23c7
16, 0x7fe87fa4443
17, 0x7fe87fa3fa0
18, 0x7fe87f9efce
19, 0x7fe87f9ee0a
20, 0x7fe87fa742f
21, 0x7fe87f9df7a
22, 0x7fe87fa4d2f
23, 0x7fe87fa485f
24, 0x7fe87f993b6
25, 0x7fe87f985c4
26, System.Private.CoreLib.dll+0x2c286f
27, coreclr.dll!coreclr_shutdown_2+0x16003
28, coreclr.dll+0x56d9c
29, coreclr.dll!coreclr_execute_assembly+0x26723
30, coreclr.dll+0x336f5
31, coreclr.dll+0x335fa
32, coreclr.dll+0x33419
33, kernel32.dll!BaseThreadInitThunk+0xd
34, ntdll.dll!RtlUserThreadStart+0x1d

At this time it could unblock the hanged copy operation with the Win32 Exception dialog after ~2-4 mins of waiting (update: the dialog does appear only if the Remove button been pressed in the UI beside the dotnet.exe process termination):

Error when dismounting virtual disk.

The system cannot find the file specified. (2)

Do you want to forcefully remove the virtual disk device instead?

I don't know, but I've issued the mountvol <mounted-drive> /d and have terminated dotnet.exe process and that may be somehow got loose the hang.

Or is that connected to the CLR and .NET?

.NET runtime installer: windowsdesktop-runtime-6.0.10-win-x64.exe

@andry81
Copy link
Author

andry81 commented Nov 17, 2022

I've minimized the reproduction steps and just copy now 40GB single file. It got stuck on ~10GB get copied. I think this somehow related to the physical memory size which in mine case is 64GB.

The .vdi file can be created through the Arsenal Image Mounter UI as is.

Then updated .NET Desktop runtime up to 6.0.11. Reproduced the issue.

Then I've made w:\1.vdi to mount and w:\2.vdi to copy into first one. I have used Windows Explorer to copy. It got stuck at very end. Created the minidump from the Process Hacker.

dotnet.exe.zip

@andry81
Copy link
Author

andry81 commented Nov 17, 2022

I've updated .NET Desktop runtime up to 7.0.0. The case with the w:\1.vdi file is not reproduced. But the initial SSD copy operation still does reproduce.

Then I downgraded to .NET Desktop runtime 5.0.17. Run Arsenal Image Mounter v3.6.188 and got the same bug as for the ImDisk without ability to revive the system. The stack trace was looking the same with the WriteFile in the middle.

@LTRData
Copy link
Owner

LTRData commented Nov 17, 2022

Thanks for your report.

I'll do some tests soon and see what I can reproduce here.

Could this possibly be related to cache of some kind? If you look at Task Manager, memory tab, while this happens, do you notice a huge increase of used memory that slowly drops down to normal levels again during the period where the drive appears hung? Do you notice any differences when the target image file is stored in different locations such as on system volume, external disks, network shares etc?

If you want to forcefully remove an Arsenal Image Mounter mounted disk, the best option is usually to use aim_ll.exe tool:
aim_ll -d (to remove all mounted disks) or aim_ll -d -m D: (to remove disk mounted to drive letter D:) or aim_ll -d -u 000100 (to remove disk with a specified device number in Arsenal Image Mounter driver). You can use aim_ll -l to list all mounted disks.
https://github.com/ArsenalRecon/Arsenal-Image-Mounter/tree/master/Command%20line%20applications

If you kill the dotnet.exe process that mounted the disk, the disk will hang and all I/O to it will eventually timeout. The only way to remove it is then usually to use aim_ll -d.

@andry81
Copy link
Author

andry81 commented Nov 17, 2022

Could this possibly be related to cache of some kind? If you look at Task Manager, memory tab, while this happens, do you notice a huge increase of used memory that slowly drops down to normal levels again during the period where the drive appears hung?

No. Memory hasn't any noticeable consumption.

Do you notice any differences when the target image file is stored in different locations such as on system volume, external disks, network shares etc?

I didn't notice the difference. But I've only tested it for 2 volumes from HDD and SSD.

If you want to forcefully remove an Arsenal Image Mounter mounted disk, the best option is usually to use aim_ll.exe tool:
aim_ll -d (to remove all mounted disks) or aim_ll -d -m D: (to remove disk mounted to drive letter D:) or aim_ll -d -u 000100 (to remove disk with a specified device number in Arsenal Image Mounter driver). You can use aim_ll -l to list all mounted disks.

There is no such utility, only aim_cli.exe

If you kill the dotnet.exe process that mounted the disk, the disk will hang and all I/O to it will eventually timeout. The only way to remove it is then usually to use aim_ll -d.

Currently this is the only way to revive the system. By timeout.

@LTRData
Copy link
Owner

LTRData commented Nov 17, 2022

Thanks for testing. I have done some tests here, but I cannot reproduce anything like this.

If you want to forcefully remove an Arsenal Image Mounter mounted disk, the best option is usually to use aim_ll.exe tool:
aim_ll -d (to remove all mounted disks) or aim_ll -d -m D: (to remove disk mounted to drive letter D:) or aim_ll -d -u 000100 (to remove disk with a specified device number in Arsenal Image Mounter driver). You can use aim_ll -l to list all mounted disks.

There is no such utility, only aim_cli.exe

You need to download aim_ll using the link:
https://github.com/ArsenalRecon/Arsenal-Image-Mounter/blob/master/Command%20line%20applications

If you kill the dotnet.exe process that mounted the disk, the disk will hang and all I/O to it will eventually timeout. The only way to remove it is then usually to use aim_ll -d.

Currently this is the only way to revive the system. By timeout.

If possible, could you test with aim_ll -d and see if that removes the mounted disk? I am curious to know that, because it points in a bit different direction depending on whether that helps or not.

@andry81
Copy link
Author

andry81 commented Nov 19, 2022

Thanks for testing. I have done some tests here, but I cannot reproduce anything like this.

Can't say it is clearly related to the ImDisk or Arsenal Image Mounter. There is another driver which uses mounting - VeraCrypt. I've feelings it somehow involved into the issue. I am using 1.25.9 has installed over 1.24-update7 using .inf file directly (currently there is an issue with the installation of the last installer on Windows 7 x64 and another issue with uninstalling of a previous version).

May be you could test using mine case?

@LTRData
Copy link
Owner

LTRData commented Nov 19, 2022

Thanks for more details, I'll do some tests together with VeraCrypt as well and see what I can find.

One thing that really surprises me here, is that you get slower I/O with vhd format. That is usually the fastest and most efficient, since it does not use DiscUtils implementation but instead a separate highly optimized kernel driver implementation. Or more specifically, that is in the case of AIM, maybe you compared performance when using ImDisk instead? Could you try with a mounted vhd in AIM and see if you still get the hang? Also, is it the same if you use Windows built-in features for mounting the vhd (in Disk Management)?

@LTRData
Copy link
Owner

LTRData commented Nov 21, 2022

I have done some tests now with VeraCrypt installed and running as well. I can however still not reproduce any similar problem.

@andry81
Copy link
Author

andry81 commented Dec 6, 2022

I've tried to retest the case and got stuck on another issue.

If try to run Arsenal Image Mounter and create .vdi or .vhd image then, the app does freeze with the Please wait... dialog. But, the buttons still can be pressed and I pressed Refresh and saw incompletely mounted item in the list (has no drive letter). If open the Device Manager dialog and open tree item with disk drives then, there would be the Arsenal Virtual SCSI Disk Device item together with an orange exclamation icon. The properties shows the driver is programmatically disabled (code 32) but, if open driver details tab there would be a list of files loaded together with the Arsenal driver:

C:\Windows\system32\DRIVERS\disk.sys
C:\Windows\system32\drivers\partmgr.sys
C:\Windows\system32\drivers\veracrypt.sys

The Arsenal Image Mounter GUI still usable and I can just remove the item with the exception:

DriveNotFoundException

Error when dismounting virtual disk.

Device 000000 is not ready

No drive found for device number 000000

Do You want to forcefully remove the virtual disk device instead?

[Yes] [No]

The Arsenal app still shows the Please wait... splash dialog and I think this is another bug.

But, I've disable all suspicious or relevant drivers before the test: Veracrypt, VirtualBox, ImDisk. Did remove .NET Desktop runtime 7.0.0.

Seems the Veracrypt somehow has forced to load together with the Arsenal driver.

Update:
I renamed the sys file and it made the trick. The driver file is removed from the list. But the freeze still persist.

@LTRData
Copy link
Owner

LTRData commented Dec 6, 2022

I would recommend that you search registry for references to veracrypt.sys. I did not expect it to load as a filter driver in another stack in this way, but it seems like it does in your case. Or something else might be interfering in some way. Not sure really what is going on. If you have another machine or virtual machine etc where you can try out configuration changes, could you try and see if you can reproduce it there and if not, which of applications installed can you install there too without this issue happening?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants