[bug] CACHE_MANAGER BSOD on Windows 10 #1061

Closed
philiparvidsson opened this issue Aug 24, 2018 · 20 comments

@philiparvidsson

philiparvidsson commented Aug 24, 2018

Description

On 2 of 3 Windows 10 machines, I get a BSOD when I open PDF files with SumatraPDF. It is 100% reproducible on the two machines, but does not happen at all on the third (a Dell XPS laptop). The two affected machines have nothing in common except that both have Intel i7 CPUs and NVIDIA GeForce GPUs, and both are desktops with two monitors connected.

Software

Version: SumatraPDF x64 3.1.2 (portable)
OS: Windows 10 Pro x64 (1803)

No AV-software on any of the machines.

Steps to reproduce

  1. Open PDF with SumatraPDF
  2. Close SumatraPDF
  3. Open same PDF with SumatraPDF (must be same PDF!)

Expected result

PDF is displayed on screen.

Actual result

Windows 10 BSODs with a "CACHE_MANAGER" stop code (the BSOD provides no more information than that).

Attachments

image

@philiparvidsson philiparvidsson changed the title CACHE_MANAGER BSOD on Windows 10 [bug] CACHE_MANAGER BSOD on Windows 10 Aug 24, 2018
@SumatraPeter

SumatraPeter commented Aug 24, 2018

  1. Reproducible with any PDF?

  2. Reproducible with the latest pre-release version?

  3. Does command-line dev option -console show anything useful?

@Mattiwatti
Contributor

This is a bug in a kernel mode component (either a driver or the kernel itself), hence the BSOD instead of a "SumatraPDF has stopped working..." message like you'd see with a regular application crash. Most likely some behaviour of SumatraPDF (possibly in combination with your specific hardware) is exposing this bug.

I can't reproduce the BSOD here, but if you could upload a crash dump file (the screen says it's generating one, normally it should be at %SystemRoot%\MEMORY.DMP) that might be good enough to establish the cause. Make sure the system crash dump level is at least 'kernel memory dump' (right click This PC -> Properties -> Advanced System Settings -> 'Settings...' under 'Startup and Recovery' -> set the dropdown box to 'kernel memory dump' if it's not already). Also check 'disable automatic deletion...', because this setting seems to confuse Windows 10, making it delete a fresh crash dump immediately after rebooting.
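
For anyone who prefers to script these settings instead of clicking through the dialog, here is a rough sketch that builds a .reg file for them. The value names `CrashDumpEnabled` (with 2 meaning 'kernel memory dump') and `AlwaysKeepMemoryDump` are assumptions about what the dialog writes under the `CrashControl` key, so check the dialog afterwards to confirm the settings took effect.

```python
# Sketch: build .reg text for the crash dump settings described above.
# Assumption: the value names and data below match what the
# 'Startup and Recovery' dialog writes; verify in the dialog afterwards.
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def build_reg_text(values):
    """Return .reg file text that sets DWORD values under CrashControl."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{CRASH_CONTROL_KEY}]"]
    for name, data in values.items():
        lines.append(f'"{name}"=dword:{data:08x}')
    return "\r\n".join(lines) + "\r\n"


settings = {
    "CrashDumpEnabled": 2,      # assumed: 2 = kernel memory dump
    "AlwaysKeepMemoryDump": 1,  # assumed: disable automatic deletion
}
reg_text = build_reg_text(settings)
print(reg_text)
```

Saving the printed text to a .reg file and importing it should be equivalent to the manual steps, provided the assumed value names are right.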

@philiparvidsson
Author

I'll get back to you guys tomorrow. The issue is reproducible with any valid PDF file on the systems I encounter it on. I'll try the pre-release tomorrow as well.

@GitHubRulesOK
Collaborator

@Mattiwatti
Contributor

@GitHubRulesOK The Windows SDK debugging tools are indeed the best way to diagnose a crash, but the correct approach depends greatly on whether the crash occurs in user or kernel mode. The advice in your link is excellent if you have a problem where SumatraPDF crashes. However, in this case it is the system that is crashing, which cannot normally be caused by a user mode application on its own. Such an application can merely expose or exacerbate an existing bug in kernel code.

More briefly stated, the following step

4.4) When Sumatra crashes, type: !analyze -v and paste the result of that to the bug report

is slightly over-optimistic for this case 😄 Rather, what will happen is that Sumatra will appear to run for a short time, until the blue screen comes in and it is game over. At this point you can forget about saving a process dump or running !analyze -v, because WinDbg, SumatraPDF, Microsoft Clippy, and all other user mode processes are dead.

Diagnosing a kernel mode crash is in fact very similar to the above, except that you will need (ideally) a second machine to debug the victim machine that is going to crash. However this is not practical to set up for the average user, and a kernel developer can often find enough context to isolate the cause from a crash dump alone (this is 'post mortem debugging'). This is why crash dump files are so useful - if you are met with a bugcheck, you're going to have to reboot regardless, so being able to step through the code wouldn't have been a lot of help anyway.

@philiparvidsson I've just discovered that it is possible to regain the good old bugcheck parameters that made the whole damn screen useful in the first place. So if you can't obtain a crash dump, try simply importing this as a .reg file and rebooting:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]
"DisplayParameters"=dword:00000001

Your next BSOD should then look something like this:
[image: proper]
Note the 4 tiny numbers in the top left - those are the parameters for this particular bugcheck.

From parameter 1 it is possible to tell that the exact security violation was 'Stack cookie instrumentation code detected a stack-based buffer overrun'. Parameter 2 gives the address of the trap frame for the bugcheck-causing exception, and parameter 3 gives the exception record address for same (viewable with .exr in WinDbg). The last parameter is reserved/unused.

So, this 'number stuff for nerds' is actually pretty damn useful, because it allows developers to actually fix bugs! Too bad Microsoft decided that giant smiley faces are more important.

@philiparvidsson
Author

So, some results:

  1. I tried the pre-release version. I started it, and since it was not associated with PDF files (portable version), I drag-dropped PDF files into it. I could not reproduce the BSOD.
  2. Next, I tried the version that BSODs for me on this and another computer. I drag-dropped files into it. I could not reproduce the BSOD.
  3. I tried double-clicking the PDF instead. First time it loaded, second time, the BSOD occurred.
  4. I associated the new pre-release version with PDF files and repeated the process. No BSODs occurred.

I'm sticking with the pre-release for now. Do you guys still want a crash dump?

(Also some amazing knowledge sharing re. kernel/user mode and debugging here - big thanks for that!)

@Mattiwatti
Contributor

Thanks for the detailed steps. I tried the same on both a live machine and a VirtualBox VM with Windows 10 x64. They are Enterprise editions, not Pro, but I really doubt that matters. I didn't really expect to be able to reproduce the issue to be honest, it seems far too hardware and/or device driver specific for that.

And yes, I would really like to see a crash dump of this if it's not too much trouble for you to make one, because this is an interesting case as one of those things that 'shouldn't happen' (TM). Crash dumps can be quite big, but they compress very well due to being sparse. The best method is to reboot (after making sure the crash dump settings are correct as described above), produce the BSOD directly after startup, reboot again and then 7zip the .dmp file. I normally make complete memory dumps (32 or 64 GB) and they can often be reduced to ~250 MB this way.
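
To illustrate why sparse dumps shrink so much, here is a small sketch that compresses a synthetic, mostly-zero buffer with Python's `lzma` module (the same family of algorithm 7-Zip uses by default). The buffer is a made-up stand-in for a dump, not real dump data, so the exact ratio says nothing about any particular MEMORY.DMP.

```python
import lzma

# Synthetic stand-in for a sparse dump: 1000 pages, almost all of them zero.
PAGE_SIZE = 4096
pages = [bytes(PAGE_SIZE)] * 1000
pages[10] = bytes(range(256)) * 16  # a single page with non-zero content
fake_dump = b"".join(pages)

compressed = lzma.compress(fake_dump)
ratio = len(fake_dump) / len(compressed)
print(f"{len(fake_dump)} bytes -> {len(compressed)} bytes (about {ratio:.0f}x smaller)")
```

A real dump contains far more non-zero data than this, which is why the reduction quoted above is more modest.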

@philiparvidsson
Author

Ok, I configured my system according to your instructions - below is a kernel mem dump.

BSOD

image

image

Parameter values

0x0000000000000273
0xFFFFFFFFC0000420
0x0000000000000000
0x0000000000000000

MEMORY.DMP

Link (~2GB, compressed to ~150MB): http://philiparvidsson.com/pub/memory.dmp.zip

@kjk
Member

kjk commented Aug 26, 2018

@philiparvidsson There's about 0% chance that this is caused by Sumatra.

The contract between the OS and user-mode applications (like Sumatra) is such that a user mode app cannot crash the OS.

It's caused by a bad driver.

Based on google research, it's most likely related to a storage device.

The way to fix it is to uninstall or update the driver.

See e.g.

The screenshots don't provide much information. You can probably find out more by looking at Event Viewer (see https://www.reviversoft.com/blog/2013/12/how-to-find-out-the-cause-of-your-bsod/) or by using http://www.nirsoft.net/utils/blue_screen_view.html.

You want to find out which driver caused the crash (usually the file with .sys extension). Then use google to figure out which software it belongs to. Upgrade or uninstall the software.

Or google for that name + "bsod" or "windows 10 crash". Chances are there were other people with the same driver causing the same crash and they figured how to fix it.

Let us know if you manage to solve it, but I'll close this bug soon because there's no way this is caused by Sumatra.

@philiparvidsson
Author

philiparvidsson commented Aug 26, 2018

@kjk have a look at my conversation with @Mattiwatti - he might be able to do something with the mem dump.

Re. storage - one of the two computers has a Samsung 850 EVO, the other one (from today) has a Samsung 860 EVO. So that might be it, I guess. shrugs I don't have any attached devices (USB storage etc) and PDF was hence read from my hard drive. Both computers only have a single drive in them. They don't share mobo (this one has an Asus Z270F mobo, no idea what the other one has) but I guess it's possible they're sharing some storage controller chip.

Also, I only encounter this with SumatraPDF (not meaning to put blame on SumatraPDF here, but it's interesting to investigate what problem its behavior is exposing!)

Oh and also, the pre-release version is not causing the BSOD, so it's somehow related to what SumatraPDF is doing - not completely unrelated.

@kjk
Member

kjk commented Aug 26, 2018

I ran the analyze on that .dmp: https://gist.github.com/kjk/fbc5ad95b396eaa8dd0161fb355e73a7

The process that actually crashed is Dropbox, and the crash seems to happen in file reading code. But that's all I can see there.

You might try:

@philiparvidsson
Author

@kjk, good catch! If I close Dropbox, there's no crash when opening a PDF.

  1. Current version (3.1.2 x64 portable) BSODs only when Dropbox is running.
  2. Pre-release version never BSODs.

I recently read that Dropbox is injecting code into other processes, so that might be related. See this for some info: https://www.ghacks.net/2018/08/20/about-google-chromes-incompatible-applications-warning/ (TL;DR: Chrome has started issuing warnings on applications that are injecting code into it - Dropbox is one of them. Dropbox basically seems to be injecting code everywhere.)

I don't want to turn this into some political discourse, but it bothers me to no end that Dropbox does this (despite pre-release version working fine). I don't know if there's any chance of getting hold of the Dropbox devs re. this. I'm a looong time Dropbox user, and Dropbox is becoming more and more of a monster.

Either way, I'm still curious what it is that's actually causing the BSOD with SumatraPDF in particular.

@kjk
Member

kjk commented Aug 26, 2018

That looks like a serious bug in Dropbox. I believe they actually do have a kernel driver that might cause BSOD. You could report this to them, pointing to this discussion, maybe via https://www.dropboxforum.com/t5/Desktop-client-builds/bd-p/101003016

I don't think it's particular to Sumatra. Sumatra seems to trigger that bug in Dropbox but it doesn't mean that there are no other ways to trigger it, just that we don't know them.

The issue here seems related to reading the same file a second time. Per your information, this happens when you re-open the same PDF file. CACHE_MANAGER suggests that it's related to the file caching subsystem of the OS. It looks like Dropbox, on the first read, corrupts the kernel-side file cache memory, and the kernel then detects this when the file is read the second time.

I imagine other programs with a similar access pattern might trigger the same bug, for example opening the same text file twice in Notepad or opening the same image file twice in an image viewer.
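
The access pattern in question (open, read, close, then open and read the same file again) can be sketched in a few lines. This only mimics the pattern on a temporary placeholder file; whether it triggers the bugcheck on an affected machine is untested and an open question.

```python
import os
import tempfile


def read_whole_file(path):
    """Open the file, read it fully, and close it again."""
    with open(path, "rb") as f:
        return f.read()


# Hypothetical test file with placeholder content; any existing file would do.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"%PDF-1.4 placeholder\n" * 1024)
    path = tmp.name

first = read_whole_file(path)   # first open: the OS may populate its file cache
second = read_whole_file(path)  # second open: the read may be served from the cache
os.remove(path)
print(len(first), len(second))
```

On a healthy system both reads simply return identical bytes.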

I'm closing this bug as there's nothing I can do in Sumatra to fix it.

@kjk kjk closed this as completed Aug 26, 2018
@Mattiwatti
Contributor

Mattiwatti commented Aug 26, 2018

Haha, yes, I was just writing a reply to blame Dropbox. It is the current process at the time of the crash.

There seem to be some strange issues with the dump file that I have never seen before:

Page 200085d73 too large to be in the dump file.
Page 200085d74 too large to be in the dump file.
Page 200085d73 too large to be in the dump file.
Page 20008bab5 too large to be in the dump file.
Page 20008bab5 too large to be in the dump file.
Page 20008bab5 too large to be in the dump file.
Page 20008bab5 too large to be in the dump file.
Page 200086d6f too large to be in the dump file.
Page 200080026 too large to be in the dump file.
...
EXCEPTION_RECORD:  ffffffffc0000420 -- (.exr 0xffffffffc0000420)
Cannot read Exception record @ ffffffffc0000420

This is a bit frustrating because I am looking for evidence that Dropbox is messing up (meaning in kernel mode, because they have recently started deploying kernel drivers(!) via their obnoxious constant updates). However, the stacktrace does not indicate any third party driver interference:

ffffc005`2587f2d8 fffff800`28e704fd : 00000000`00000034 00000000`00000273 ffffffff`c0000420 00000000`00000000 : nt!KeBugCheckEx
ffffc005`2587f2e0 fffff802`3ba15aa5 : 00000000`00000000 ffffe780`00000000 ffffaf00`00000053 ffffe780`00000000 : nt!CcCopyReadEx+0x1a266d
ffffc005`2587f370 fffff802`3ba183cd : 00000000`00000053 ffffc005`2587f600 ffffe780`c54fc340 00000000`00000001 : Ntfs!NtfsCachedRead+0x1a1
ffffc005`2587f3e0 fffff802`3ba16238 : ffffaf00`705c07d8 ffffaf00`6fe1e010 00000000`00000088 ffffe780`c54fc3d0 : Ntfs!NtfsCommonRead+0x1f9d
ffffc005`2587f5d0 fffff800`28cba199 : ffffaf00`70876af0 ffffaf00`6fe1e010 ffffaf00`6fe1e010 ffffaf00`68c5c8d0 : Ntfs!NtfsFsdRead+0x1d8
ffffc005`2587f690 fffff802`3ad87207 : ffffc005`2587f760 fffff802`3ad853b4 00000000`00000000 00000000`00000003 : nt!IofCallDriver+0x59
ffffc005`2587f6d0 fffff802`3ad851c6 : ffffc005`2587f760 ffffaf00`6cdd1a30 00000000`00400000 00007fff`ffff0000 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x157
ffffc005`2587f740 fffff800`28cba199 : ffffaf00`6fe1e010 fffff800`28cba465 00000000`00000000 00000000`00000000 : FLTMGR!FltpDispatch+0xb6
ffffc005`2587f7a0 fffff800`2916d54b : ffffc005`2587fa80 ffffaf00`6cdd1a30 00000000`00000000 00000000`19e8ed98 : nt!IofCallDriver+0x59
ffffc005`2587f7e0 fffff800`292065d2 : ffffaf00`00000000 ffffaf00`6cdd1a30 00000000`00000000 ffffc005`2587fa80 : nt!IopSynchronousServiceTail+0x1ab
ffffc005`2587f890 fffff800`28e5a343 : ffffffff`ffffffff 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtReadFile+0x692
ffffc005`2587f990 00000000`77851e4c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`19e8ed78 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77851e4c
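
A quick way to check a trace like this for unexpected modules is to pull out the `module!symbol` pairs and compare the module names against the ones you expect. The sketch below does that for the frames above; the allow-list of three Windows components is an assumption that fits this trace only, and a filter driver whose callback has already returned would not show up on the stack at all.

```python
import re

# Call sites copied from the trace above (addresses and arguments omitted).
FRAMES = """\
nt!KeBugCheckEx
nt!CcCopyReadEx+0x1a266d
Ntfs!NtfsCachedRead+0x1a1
Ntfs!NtfsCommonRead+0x1f9d
Ntfs!NtfsFsdRead+0x1d8
nt!IofCallDriver+0x59
FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x157
FLTMGR!FltpDispatch+0xb6
nt!IofCallDriver+0x59
nt!IopSynchronousServiceTail+0x1ab
nt!NtReadFile+0x692
nt!KiSystemServiceCopyEnd+0x13
"""

# Assumption: for this trace, these are the expected Windows components.
EXPECTED_MODULES = {"nt", "Ntfs", "FLTMGR"}

FRAME_RE = re.compile(r"(?P<module>\w+)!(?P<symbol>\w+)(?:\+0x[0-9a-fA-F]+)?")


def modules_in_trace(text):
    """Return the module names found in a stack trace, in first-seen order."""
    seen = []
    for match in FRAME_RE.finditer(text):
        module = match.group("module")
        if module not in seen:
            seen.append(module)
    return seen


modules = modules_in_trace(FRAMES)
unexpected = [m for m in modules if m not in EXPECTED_MODULES]
print("modules:", modules)
print("unexpected:", unexpected)
```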

Furthermore no Dropbox driver (that I'm aware of) can be found in your loaded module list, although you do have a lot of crap in it:

*** ERROR: Module load completed but symbols could not be loaded for WRkrn.sys
*** ERROR: Module load completed but symbols could not be loaded for SamsungRapidFSFltr.sys
*** ERROR: Module load completed but symbols could not be loaded for SamsungRapidDiskFltr.sys
*** WARNING: Unable to verify timestamp for Null.SYS
*** ERROR: Module load completed but symbols could not be loaded for Null.SYS
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for drmk.sys - 
*** ERROR: Module load completed but symbols could not be loaded for bcmwl63a.sys
*** ERROR: Module load completed but symbols could not be loaded for LGBusEnum.sys
*** ERROR: Module load completed but symbols could not be loaded for LGJoyXlCore.sys
*** ERROR: Module load completed but symbols could not be loaded for lgcoretemp.sys
*** ERROR: Module load completed but symbols could not be loaded for wrUrlFlt.sys
*** ERROR: Module load completed but symbols could not be loaded for LGVirHid.sys

These are the 'wtf is this' drivers that I found off hand. Note that Null.sys is NOT a third party driver and should checksum correctly and have debug symbols loaded. But this driver is located in one of the pages that could not fit in the dump file, so I can't say whether this is a real issue or just a hiccup with the crash dump.

Unrelated, but you should nuke those Samsung Rapid Mode drivers ASAP. There are numerous issues with those drivers, and worst of all, they even degrade performance compared to not having Rapid Mode enabled (naturally, since the whole idea of it is to poorly reimplement a cache that already exists in the kernel, which does a fine job on its own).

!analyze -v output (abbreviated):

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CACHE_MANAGER (34)
    See the comment for FAT_FILE_SYSTEM (0x23)

FAT_FILE_SYSTEM                  (0x23)

    If you see FatExceptionFilter on the stack then the 2nd and 3rd
    parameters are the exception record and context record. Do a .cxr
    on the 3rd parameter and then kb to obtain a more informative stack
    trace.

Arguments:
Arg1: 0000000000000273
Arg2: ffffffffc0000420
Arg3: 0000000000000000
Arg4: 0000000000000000

Because the filesystem is not FAT at all but NTFS, this is a bogus analysis, or a lazy bugcheck by a developer who should have used a different bug code.

The exact bugcheck call you are seeing is this:

BOOLEAN NTAPI CcCopyReadEx(PFILE_OBJECT FileObject, PLARGE_INTEGER FileOffset, ULONG Length, BOOLEAN Wait, PVOID Buffer, PIO_STATUS_BLOCK IoStatus, PETHREAD IoIssuerThread)
{
  // ...
  // Bugcheck if the requested range ends past the file size recorded in the shared cache map.
  if (Length + FileOffset->QuadPart > FileObject->SectionObjectPointer->SharedCacheMap->FileSize.QuadPart)
    KeBugCheckEx(CACHE_MANAGER, 0x273, 0xFFFFFFFFC0000420, 0UL, 0UL);
}

For CC (cache manager) bugchecks, the first parameter (here 0x273 / 627) is the line number. So that's pretty useless information.
The second parameter is simply an NTSTATUS lazily extended to fit a ULONG_PTR. NTSTATUS 0xC0000420 = STATUS_ASSERTION_FAILURE.
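
As a small worked example of the two parameters, the sketch below converts the first one to decimal, recovers the 32-bit NTSTATUS from the second, and splits it into its severity, facility, and code fields. The helper names are made up for illustration.

```python
def sign_extend_32_to_64(value32):
    """Sign-extend a 32-bit value to 64 bits and return it as an unsigned int."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        return value32 | 0xFFFFFFFF00000000
    return value32


def decode_ntstatus(status):
    """Split a 32-bit NTSTATUS into its severity, facility, and code fields."""
    status &= 0xFFFFFFFF
    return {
        "severity": status >> 30,            # top two bits; 3 means error
        "facility": (status >> 16) & 0xFFF,  # 12-bit facility field
        "code": status & 0xFFFF,             # low 16 bits
    }


arg1 = 0x273
arg2 = 0xFFFFFFFFC0000420

status = arg2 & 0xFFFFFFFF  # drop the sign-extension bits
print(arg1)                               # line number in decimal
print(hex(status))                        # the 32-bit NTSTATUS
print(hex(sign_extend_32_to_64(status)))  # back to the value shown on the BSOD
print(decode_ntstatus(status))
```

The high bit of the status is set, which is why widening it to a pointer-sized value fills the upper 32 bits with ones.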

Interestingly this bugcheck was not always here; older versions of Windows simply truncate the filesize if this happens.
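
The difference between the two behaviours can be sketched with plain integers. The first function mirrors the condition in the snippet above; the second models the truncation described for older Windows versions and is an assumption about how that clamping worked, not a transcription of kernel code. The numbers are hypothetical.

```python
def read_exceeds_cached_size(file_offset, length, cached_file_size):
    """Mirror of the bugcheck condition: the read ends past the cached file size."""
    return file_offset + length > cached_file_size


def truncated_length(file_offset, length, cached_file_size):
    """Assumed older behaviour: clamp the read so it ends at the cached file size."""
    return max(0, min(length, cached_file_size - file_offset))


# Hypothetical case: a 4096-byte read at offset 8192 of a file
# that the cache believes is 10000 bytes long.
print(read_exceeds_cached_size(8192, 4096, 10000))
print(truncated_length(8192, 4096, 10000))
```

With these numbers the read would end at byte 12288, past the cached size, so the newer code path bugchecks where the assumed older one would return a shorter read.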

Now to find out if and how I can reproduce this so I can crash people's Windows 10 machines 😄

(Edit: updated code snippet using the full SHARED_CACHE_MAP definition dumped from PDBs to make it more readable.)

@philiparvidsson
Author

Wow, amazing job @Mattiwatti - genuinely impressive. I'll try disabling the RAPID mode drivers and see what happens! Brb!

Would a complete memory dump be more useful for you (would prefer sending it over a private channel, e.g., link via mail)? Do you need a new dump?

@philiparvidsson
Author

No go, still crashing with RAPID mode off! 👎

@Mattiwatti
Contributor

Sure, I will take a look at a complete dump if you're willing to send me one. However it's possible that no further useful info can be found in it (depending on what exactly Dropbox is doing, from which process, and when). Keep in mind that you'll probably need to increase the size of your pagefile to get such a dump (to, say, your total RAM + 1 GB).

I do agree with @kjk (and my own first post... heh) that this issue is unrelated to Sumatra, so I think it would be better to continue this discussion via email. My address is my github username@gmail.com.

@GitHubRulesOK
Collaborator

Kudos to @philiparvidsson for perseverance and @Mattiwatti for expert insights

@kjk it may be worth updating the wiki debug page to point to the moved debugger link I found above, and also to link to this issue as a typical example of tracking down a BSOD crash.

@philiparvidsson
Author

In case Dropbox devs arrive here (I've forwarded the issue to their sec team which in turn forwarded it to their desktop client team): @Mattiwatti and I are delving deeper into this, feel free to contact me (and I assume @Mattiwatti as well) if you need more information.

@PavelPr

PavelPr commented Mar 11, 2021

Hello everyone!

I apologize for the (almost three years) late off-topic, but despite not being a SumatraPDF user, I have encountered a similar issue as well: an out-of-the-blue CACHE_MANAGER bugcheck. Having Googled about this issue, I am now here. I was hoping you guys might have some insights, as @philiparvidsson said that he and @Mattiwatti were diving deeper into the issue.

The NTSTATUS I see is the same: STATUS_ASSERTION_FAILURE. Yet, the call stack is different:

nt!KeBugCheckEx
nt!CcAcquireByteRangeForWrite+0x1f35a1
nt!CcFlushCachePriv+0x31b
nt!CcWriteBehindInternal+0x1f4
nt!CcWriteBehind+0x91
nt!CcWorkerThread+0x259
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28

The failing process is System, which tells me that this was either directly or indirectly triggered by ring0 code. Perhaps you guys could shed some light?

Thanks and apologies again for the off-topic!
