Minidump files get placed in the wrong directory and aren't uploaded to Sentry #833
Comments
Hi @mntt-SSE! I could not look into this on any of my Windows machines yet, but it seems you can access both types of machines to verify behavior and configuration. Can you look in the registry to see whether the machines differ in the settings for |
Thx for verifying, @mntt-SSE; that is what I expected. |
Maybe I should also say that what is interesting are the values in |
For further root-causing, it would certainly help if you could also gather the debug output of the Native SDK on the systems where the dump happens in |
The I have so far tried to:
I tried restarting the computer after each step, but nothing worked :/ I also checked the corresponding keys under I tried attaching WinDbg to crashpad_handler.exe but I don't seem to get any logging from it in the Logs widget. Do I have to build it with a certain configuration or flag? I should also mention that I haven't used WinDbg before so I might be missing something. |
What I meant was that even though you have compared the two logs, which were identical on both types of machines, it would be great to see them. I am asking because the only other reason I can imagine this happening is if our backend didn't start correctly (and for some reason,
That means that it is configured to use the default values. |
Ah, sorry, here is what's being printed on startup from a computer that doesn't work (which is the same as when it works):
|
Okay, the logs are wrapped, interleaved, and incomplete. They also show some other transmission (maybe a manual event) but not the crash (because with Maybe it is easier to debug this issue with the plain
Since you You can also verify this externally by deleting your database path, executing your program, provoking a crash, and looking into (the newly created) path; there must be a |
I have been plagued by the same issue: some machines just refuse to send any reports. After reading this discussion and talking to @supervacuus on Discord, I spent a day trying a bunch of things: To start by answering the last question, if the 2 last lines of the sentry output: Those lines do appear on working machines and do not on the machines that don't work. This indicates that the The first thing that proved that it was not a Windows issue (as a lot of blogs and discussions around the web indicate it could be) was that it worked to do So something in the startup process overwrites our exception handler, and I'm pretty sure it is not any of our dependencies, since it should behave the same for all machines in that case, and I couldn't find any reference to I tried the I finally tried replacing the |
Thank you @Cyriuz and @supervacuus for looking into this! I have also done some more investigation. Firstly, the reason I didn't get the full Sentry logs was because I initialized Sentry before the other log handling. When I switched the order I got output identical to yours @supervacuus, except for the Secondly, I have found a workaround. I'm building a Qt application (Qt 5.15.8), and I noticed that if I initialize Sentry after the first window is shown (using |
If you run this after
I tried to simplify the other examples I found as much as possible, but I'm no Windows programmer, so take it with a grain of salt. The best way to get to the bottom of what it is that clears the filter would be to make a proper hook and print the call stack in there, but it's somewhat involved and I'm not sure what I would do with the info anyway, since I guess it could theoretically be multiple things. To get the WER module you have to build sentry-native with |
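The check discussed above can be sketched roughly as follows. This is not the snippet from the comment. It assumes the filter in question is the process-wide one installed through the Win32 SetUnhandledExceptionFilter call, which returns the previously installed filter. The helper names peek_unhandled_filter, filter_was_replaced and sample_filter are hypothetical, and the non-Windows branch is only a stand-in so the sketch compiles anywhere.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in for non-Windows builds (NOT a real API): it only mimics the
   "install new filter, return previous one" contract of the Win32 call. */
typedef long LONG;
#define WINAPI
#define EXCEPTION_CONTINUE_SEARCH 0
typedef struct EXCEPTION_POINTERS EXCEPTION_POINTERS;
typedef LONG(WINAPI *LPTOP_LEVEL_EXCEPTION_FILTER)(EXCEPTION_POINTERS *);
static LPTOP_LEVEL_EXCEPTION_FILTER g_fake_filter;
static LPTOP_LEVEL_EXCEPTION_FILTER
SetUnhandledExceptionFilter(LPTOP_LEVEL_EXCEPTION_FILTER filter)
{
    LPTOP_LEVEL_EXCEPTION_FILTER previous = g_fake_filter;
    g_fake_filter = filter;
    return previous;
}
#endif

/* Hypothetical stand-in for the filter a crash backend would install. */
static LONG WINAPI
sample_filter(EXCEPTION_POINTERS *info)
{
    (void)info;
    return EXCEPTION_CONTINUE_SEARCH;
}

/* Hypothetical helper: read the current top-level filter by swapping in
   NULL and restoring it right away. Not atomic, so diagnostics only. */
static LPTOP_LEVEL_EXCEPTION_FILTER
peek_unhandled_filter(void)
{
    LPTOP_LEVEL_EXCEPTION_FILTER current = SetUnhandledExceptionFilter(NULL);
    SetUnhandledExceptionFilter(current);
    return current;
}

/* Hypothetical helper: returns 1 if the installed filter differs from
   the one recorded in `baseline`, 0 otherwise. */
static int
filter_was_replaced(LPTOP_LEVEL_EXCEPTION_FILTER baseline)
{
    return peek_unhandled_filter() != baseline;
}
```

Recording the result of peek_unhandled_filter() right after the SDK is initialized, and calling filter_was_replaced() with that value later (for example after the first window is shown), would indicate whether something replaced the filter in between. It does not tell you who did it; that still needs the hook and call stack approach mentioned above.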
That is some very good feedback, thanks for the investigation! We have indeed heard reports before that Qt / display drivers are constantly overwriting the While it would be possible to hook / lock overwriting it, we decided against that, as it is a very deep "incision" that we don’t want to do by default. |
Fair enough. Although I wonder if it can still be considered somewhat standard as so many other projects seem to do the same? And considering that sentry-native is, in my understanding at least, supposed to take ownership of the entire crash reporting pipeline I feel like there might be an argument for including it, I guess it could even be optional? Or maybe it could just be a good topic for a page on the sentry docs as a workaround so people can find it easier? I assume it potentially affects most people that use sentry native on Windows. |
IMO hooking the There are legitimate uses for temporarily using it and then restoring the original handler afterwards. We don’t want to break those use cases. Although, as the examples show, restoring the original handler is often not done correctly by other tools either. We might want to offer this either as a compile-time flag or an explicit runtime API though, wdyt @supervacuus ? Then the responsibility of (not) breaking any unpredictable use cases is up to the app author. |
Just going to throw in my 2 cents here. Had a very similar issue with regard to Qt (5.15.8): the moment it was initialized, the entire crash reporting ceased to function. There were no registry keys or misplaced dumps; in fact, there was absolutely nothing. A quick test basically confirmed what I suspected: when I threw an exception right after Sentry initialized, it worked. Did the same right after calling Weirdly enough it worked on some Windows machines just fine (e.g. server, ...) but on others basically nothing worked. No dumps, no logs (only the one indicating that Sentry had initialized), no indication that something went wrong whatsoever. |
This issue is also affecting a user of the Unreal SDK v0.16.0. It happens with both auto and manual initialisation:
Are there any updates on this issue? |
We must understand that we cannot solve this problem within the Native SDK. While a solution like the one proposed here: #833 (comment) can work to mitigate the issue, we have to be clear that this kind of solution is the reason why the problem exists in the first place. We already had support topics that took weeks to resolve. In the end, we discovered that a customer code dependency overwrote Also, IMO, the correct approach for Sentry is to extend the "Advanced Usage" documentation and provide users of the Native SDK with steps to identify the issue, isolate potential culprits, and implement solid mitigations. I did this with multiple users in an ad hoc fashion over the last two years, and while all users had different setups and root causes, there were a lot of common steps in between. Another aspect is to collect known culprits and potential mitigations, if any. Since the focus over the last six to nine months has been primarily on supporting downstream SDKs to improve quality on mobile platforms, these Desktop documentation topics have been put on the back burner. But we will tackle them. cc: @kahest |
While I respect your decision and stance on the issue, mostly for discussion's sake, I would argue that because of the nature of this library, its only purpose being catching crashes, the argument that the "solution is the reason why the problem exists in the first place" is not really helpful. For sure it is not good that many dependencies and drivers use this API wrongly or hook it, but that doesn't change the fact that this is the situation we're in. We want to take ownership of the exception handling but can't possibly do it any other way. If it was just dependencies it would be one thing: it is easy enough to debug who is resetting the hook, and we can complain upstream. But for drivers that do it on customer PCs, it's just not feasible. Is it really that big of a deal to say that we don't want anyone else to be able to handle exceptions, given the scope of this lib? My proposed fix doesn't involve using the APIs wrongly, just making sure no one else is allowed. I agree that it feels wrong in the context of doing it in a library, and maybe it is also wrong to do it here, but I see no other option, and I think it would save a lot of users of this lib a lot of time debugging things. In any case, docs covering all of this that are easy to find would be great, so you can easily find them and make your own decisions :) |
Description
I'm using the Native SDK with the bundled crashpad_handler, and I noticed that crashes for some users wouldn't get uploaded to Sentry. After some digging I realized that for those users that didn't get their crashes uploaded, the minidump (dmp) files got placed in a different directory from the one configured with sentry_options_set_database_path. The directory they get placed in is [user]/Local/AppData/CrashDumps. It only happens for some users, but for those this is consistent.
Does anyone know what could be the cause of this? Or what I could do to debug it further? I've looked in the documentation and source code, but I haven't been able to figure out how I get the crashpad_handler to log stuff.
Thanks in advance!
When does the problem happen
Environment
Steps To Reproduce
It's only reproducible on some machines, but this is how I set up the crash handler in my app:
Log output
I've compared the Sentry Native output between a computer where everything is working fine and one where it is not, and they are identical. I don't know how to get output from the crashpad_handler.