Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application using scalable allocator crashes on Windows 11 when opening a file dialog #665

Closed
myoung301 opened this issue Nov 19, 2021 · 17 comments
Assignees

Comments

@myoung301
Copy link

A minimal program was created to reproduce the issue. The program opens a file dialog using GetOpenFileName.

On Windows 10, the program runs successfully both with and without the TBB scalable allocator.

On Windows 11, the program crashes when the scalable allocator is used. The crash occurs right after the file dialog displays. The application runs successfully with the scalable allocator removed.

The TBB allocator was enabled using the linker option:

tbbmalloc_proxy.lib /INCLUDE:"__TBB_malloc_proxy"

The code:

#include <windows.h>
#include <Commdlg.h>

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{

    OPENFILENAME ofn;
    char szFile[500];
    ZeroMemory(&ofn, sizeof(ofn));
    ofn.lStructSize = sizeof(ofn);
    ofn.hwndOwner = NULL;
    ofn.lpstrFile = szFile;
    ofn.lpstrFile[0] = '\0';
    ofn.nMaxFile = sizeof(szFile);
    ofn.lpstrFilter = "All\0*.*\0Text\0*.TXT\0";
    ofn.nFilterIndex = 1;
    ofn.lpstrFileTitle = NULL;
    ofn.nMaxFileTitle = 0;
    ofn.lpstrInitialDir = NULL;
    ofn.Flags = OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST;
    GetOpenFileName(&ofn);

    MessageBox(NULL, ofn.lpstrFile, "File Name", MB_OK);
    return 0;
}
@anton-potapov
Copy link
Contributor

@myoung301 can you provide more details :

  • compiler (and version) being used
  • call stack of the crash
  • version of the build (debug /release )

@myoung301
Copy link
Author

@anton-potapov

compiler: Visual Studio 2017 (15.0.0)
version: release

Call stack from windbg:

ntdll!RtlReportFatalFailure+0x9
ntdll!RtlReportCriticalFailure+0x97
ntdll!RtlpHeapHandleError+0x12
ntdll!RtlpHpHeapHandleError+0x7a
ntdll!RtlpLogHeapFailure+0x45
ntdll!RtlpFreeHeapInternal+0x825
ntdll!RtlFreeHeap+0x51
SHELL32!operator delete+0x28
SHELL32!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo+0x185464
SHELL32!CFSIconOverlayManager::LoadIconOverlayIdentifiers+0xb7
SHELL32!CFSIconOverlayManager::_s_LoadIconOverlayIdentifiers+0x76
SHELL32!CFSIconOverlayManager::LoadNonloadedOverlayIdentifiers+0x13c
SHELL32!EnableExternalOverlayIdentifiers+0x4a
SHELL32!CFSIconOverlayManager::RefreshOverlayImages+0xc0
SHELL32!SHELL32_GetIconOverlayManager+0x98
windows_storage!CFSFolder::_GetOverlayInfo+0x124
explorerframe!CNscOverlayTask::_Extract+0x65
explorerframe!CNscOverlayTask::InternalResumeRT+0x55
explorerframe!CRunnableTask::Run+0xce
windows_storage!CShellTask::TT_Run+0x46
windows_storage!CShellTaskThread::ThreadProc+0xdd
windows_storage!CShellTaskThread::s_ThreadProc+0x44
shcore!ExecuteWorkItemThreadProc+0x20
ntdll!RtlpTpWorkCallback+0x171
ntdll!TppWorkerThread+0x686
KERNEL32!BaseThreadInitThunk+0x10
ntdll!RtlUserThreadStart+0x2b

@anton-potapov
Copy link
Contributor

@zheltovs could you please take a look?

@anton-potapov
Copy link
Contributor

@myoung301, we were not able to reproduce the issue at the moment.
Does it shows all the times ? Or sporadically ?
As well could you please specify the exact command line used to compile the sample ?

@vlserov
Copy link

vlserov commented Dec 9, 2021

@myoung301, we are able to reproduce the issue with your steps and got same call stack.
Analysis is in progress.

@vlserov vlserov self-assigned this Dec 9, 2021
@Sedeniono
Copy link

Hi, could you please share some more details regarding the reproduction? We received a crash dump from a customer showing the exact same call stack, but I failed to reproduce the issue so far.

What I tried: I compiled the example code above on Windows 10 (not 11) using Visual Studio 2019, Windows SDK 10.0.19041.0, C++14, Release, cl.exe 19.29.30137. The scalable allocator of oneTBB comes from revision 4eec89f.
I then copied the binaries to the Windows 11 Professional machine which uses a clean install of Win11 (and installing all offered updates). I tried it with versions 10.0.22000.282 and 10.0.22000.348 of the shell32.dll and the Windows.FileExplorer.Common.dll, to no avail.
After digging around in shell32.dll, I guessed that the crash might have something to do with cloud sync tools such as owncloud creating an entry under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\SyncRootManager and syncing the user's Desktop. But unfortunately, I was still not able to reproduce the issue.

It seems some specific Windows setting or software is required to trigger the problem?

@metallermartin
Copy link

We have customer reports of crashes showing the same stack trace. However, we use mimalloc as a custom allocator instead of oneTBBs. So the issue does not seem specific to oneTBB, but is more a general windows issue.

However we also failed to reproduce the crashes on our dev machines or local VMs. I agree with @Sedeniono assessment that there must be some specific setting or software (probably containing explorer items) that triggers the crash.

@myoung301
Copy link
Author

I and a colleague both confirmed that if we sign out of OneDrive, the crash goes away. We have to be careful to fully sign out (the 'Unlink this PC' option in settings).

We've also reproduced the problem with mimalloc, as mentioned. So there is an issue when using either custom allocator and OneDrive on Windows 11.

@Sedeniono
Copy link

Sedeniono commented Dec 14, 2021

@myoung301 @vlserov I can confirm that the example posted above crashes both with oneTBB and mimalloc if OneDrive is syncing the Desktop. By that I mean OneDrive's option to backup the Desktop. Note that if you enable this option, the Desktop folder itself gets moved under the OneDrive folder.

After some more debugging, I can confirm that one ingredient of the crash is indeed the registry path I mentioned above. More precisely, one requirement for the crash to occur is that the SID-key ("S-1-5-21-...") under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\SyncRootManager\OneDrive!......!101\UserSyncRoots needs to contain a path that contains the Desktop. The path by default points to C:\Users\<User>\OneDrive, and the Desktop got moved by OneDrive to C:\Users\<User>\OneDrive\Desktop. If you modify the SID-key to e.g. C:\Users\<User>\OneDriveNotExisting, the crash no longer occurs. If you "Unlink this PC" in the OneDrive settings, the entire registry key gets deleted.
In principle, the same could apply to any cloud sync tool (although I have, unexpectedly, not observed it yet with ownCloud).

I tested it with versions 10.0.22000.282 and 10.0.22000.348 of the shell32.dll and the Windows.FileExplorer.Common.dll files, and in both versions the FileOpen-Dialog crashes.

In short, I think it is a bug in shell32.dll, where memory is allocated via a redirected malloc but then free'd directly (bypassing mimalloc/oneTBB); maybe someone can confirm this. More details below.

One workaround is to ensure that the Desktop is not located under the OneDrive folder, or any other synced folder that is mentioned in the SyncRootManager registry key. Besides this, a temporary solution is of course to disable the custom allocator, e.g. via the environment variable.

More details:
All of the following analysis used version 10.0.22000.282 of the shell32.dll and the Windows.FileExplorer.Common.dll. There are some minor difference compared to 10.0.22000.348, but none that matter.

The following image shows the first relevant part of shell32.dll!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo:
1_CFSIconOverlayManager_MaybeAddIconOverlayHandlersFromStateRepo

Notice that it first gets the path to the Desktop (SHGetKnownFolderPath()). Then the first virtual function call (the first __guard_xfg_dispatch_icall_fptr) calls CSyncRootManager::GetStorageProviderInfoFromPath which only returns something relevant if the Desktop is synced. This checks the registry key mentioned above.

If this is the case, the second virtual function call CStorageProviderInfo::GetIconOverlayHandlerClsidArray is reached. This function in turn allocates memory if additional conditions are met. I am not entirely sure about them: The conditions are met in case of OneDrive, but not in case of ownCloud, and I failed to find the systematic differences between these two applications (both put stuff into the SyncRootManager registry path and both register icon overlays). In any case, for OneDrive it continues to allocate memory via windows.fileexplorer.common.dll!operator new:
2_malloc
which in turn calls ucrtbase!malloc, which is redirected (in this case by mimalloc):
3_mallocRedirected

Now, back in shell32.dll!CFSIconOverlayManager::MaybeAddIconOverlayHandlersFromStateRepo, the code continues to do stuff with the allocated memory, and finally tries to delete it:
4_CFSIconOverlayManager_MaybeAddIconOverlayHandlersFromStateRepo_Crash
It calls shell32.dll!operator delete[], which calls ntdll!FreeHeap directly:
5_operator_delete
No references to mimalloc in the FreeHeap callstack as far as I can see. In the end, the ntdll!RtlLogHeapFailure is called.

So, I think we have a Microsoft bug in shell32.dll: Memory is freed not via an ordinary (and thus redirected) free() but rather directly via the Heap-functions, which seems incorrect to me. The memory allocation in windows.fileexplorer.common.dll, on the other hand, does use the ordinary malloc() correctly. This mismatch probably causes the crash?

@ChrisGuzak
Copy link

for MS folks, http://task.ms/38281009

@Sedeniono
Copy link

@ChrisGuzak Am I interpreting your post correctly that the shell32.dll team at Microsoft is now aware of the issue? Otherwise, what does your link point to?

@rodyflee
Copy link

rodyflee commented Mar 4, 2022

@ChrisGuzak Is this article meant to be access by non MS people? Or can we get access to it?

@isaevil
Copy link
Contributor

isaevil commented Nov 30, 2022

@ChrisGuzak what is the status on this?

@holenno1
Copy link

So MSFT appears to have fixed this issue with KB5020044 (preview) or KB5021255 (stable).

The release notes for KB5020044 has this bullet point:

It addresses an issue that might cause certain apps to stop responding. This occurs when you use the Open File dialog.

Our preliminary testing does suggest the crash we experienced on Windows 11 with the Open File dialog box has gone away with this Windows update.

Digging into the Windows binaries (thanks @Sedeniono for his deep dive), previously (Windows 11 build 22000) CStorageProviderInfo::GetIconOverlayHandlerClsidArray allocates memory using operator new[](size_t, std::nothrow_t)

?GetIconOverlayHandlerClsidArray@CStorageProviderInfo@@UEAAJPEAIPEAPEAU_GUID@@PEAH@Z:
  00000001800284E0: 48 89 5C 24 08     mov         qword ptr [rsp+8],rbx
  00000001800284E5: 48 89 6C 24 10     mov         qword ptr [rsp+10h],rbp
  00000001800284EA: 48 89 74 24 18     mov         qword ptr [rsp+18h],rsi
...
  0000000180028556: E8 D1 B2 FD FF     call        ??_U@YAPEAX_KAEBUnothrow_t@std@@@Z
...
  00000001800285B2: 48 83 C4 20        add         rsp,20h
  00000001800285B6: 41 5F              pop         r15
  00000001800285B8: 41 5E              pop         r14
  00000001800285BA: 5F                 pop         rdi
  00000001800285BB: C3                 ret

which is then freed with HeapFree in shell32.dll via operator delete.
As noted earlier, this is the reason for the crash we were seeing.

In Windows 11 build 22621, CStorageProviderInfo::GetIconOverlayHandlerClsidArray now calls a helper function CopyClsidsToCoTaskMemAlloc

?GetIconOverlayHandlerClsidArray@CStorageProviderInfo@@UEAAJPEAIPEAPEAU_GUID@@PEAH@Z:
  00000001800496E0: 41 83 21 00        and         dword ptr [r9],0
  00000001800496E4: 48 81 C1 48 02 00  add         rcx,248h
                    00
  00000001800496EB: E9 08 D7 FF FF     jmp         ?CopyClsidsToCoTaskMemAlloc@@YAJAEBV?$vector@U_GUID@@V?$allocator@U_GUID@@@std@@@std@@PEAIPEAPEAU_GUID@@@Z

which in turn calls CoTaskMemAlloc to allocate memory.

?CopyClsidsToCoTaskMemAlloc@@YAJAEBV?$vector@U_GUID@@V?$allocator@U_GUID@@@std@@@std@@PEAIPEAPEAU_GUID@@@Z:
  0000000180046DF8: 48 8B C4           mov         rax,rsp
  0000000180046DFB: 48 89 58 08        mov         qword ptr [rax+8],rbx
...
  0000000180046E62: 48 FF 15 A7 10 03  call        qword ptr [__imp_CoTaskMemAlloc]
                    00
...
  0000000180046ED6: 48 83 C4 20        add         rsp,20h
  0000000180046EDA: 41 5F              pop         r15
  0000000180046EDC: C3                 ret

Instead of operator delete/HeapFree, this memory is then freed in shell32.dll by
wil::unique_any_array_ptr<_GUID, wil::function_deleter<void (*)(void *),CoTaskMemFree>, wil::empty_deleter>::reset()
image

The use of CoTaskMem* effectively work arounds the malloc/HeapFree debacle.


Regarding the use of HeapFree to free a malloc allocated pointer as seen in Windows 11 build 22000, I tend to believe this is not actually a MSFT bug.

Instead, I think link-time optimisation is to blame here. My hunch is that shell32.dll is built as part of the same project as ntdll.dll, which is how the body of malloc (which just calls RtlAllocateHeap with GetProcessHeap) got inlined into shell32.dll's operator new.

@nofuturre
Copy link

@myoung301 is this issue still relevant?

@myoung301
Copy link
Author

@nofuturre - sorry, I can not confirm. I am no longer using the version of Windows that was problematic.

@nofuturre
Copy link

If anyone encounter with this issue in future please open new issue with a link to this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests