Android 8/9 default trust manager: OOM when getting "https://icloud.com" #520
That cert is a bit weird, because it doesn't actually match icloud.com (it's for images.apple.com). It looks like icloud.com returns that cert if SNI isn't provided, but OkHttp should always be enabling SNI. So it appears there are at least two problems here:
I'll try to figure out what's going on with the latter and point the OkHttp folks at the former. |
I have to apologize. Maybe I got the cert from openssl when debugging and took the wrong one. You would have to see what cert okhttp uses directly. |
Cool. Now that I've looked into this more, it looks like the icloud.com cert is pointing to a CRL that revokes around 10k certificates, and we're using a bunch of memory to parse all of those. That's kind of ridiculous on their part, but we should handle this better. |
Same test, similar (but not the same) exception again today:
|
Is this an issue of Android 8.0.0 which is fixed in 8.0.1 or later? |
Following up 2 years later - we're still seeing instances of this issue. A few of our CDNs were using yet another certificate by DigiCert that was pointing to a 15MB CRL (http://crl4.digicert.com/ssca-sha2-g6.crl). It took us almost 3 days to get a reply from DigiCert (even though we're a 'priority support' customer), and they've successfully pinned our account/did some other magic to point to a much smaller CRL (~300KB). We had to reissue all the certificates and update our endpoints, and once we did that it seems the crashes stopped :) This happens mostly on low-RAM devices running Android 8. I hope our experience will help others facing the same issue. |
What caused this problem? I have also run into a similar one. |
We are also encountering this problem intermittently on some phones and pretty consistently on the other phones, especially if the date on the phone is advanced, leading to SSL errors. Here is the partial stack trace from one of the phones: Phone and OS details: Android: 9 Additional Information: |
It looks like the Android platform ( In parallel, I did not expect this code path in the trust manager to be doing online CRL checking - it explicitly disables it in favour of stapled OCSP. I wonder if this is a vendor change... the Have you seen this on non-Samsung devices? Or on Android 11 or newer? |
@prbprbprb My apologies for the delay in responding. Here is the base part of the URL that we were facing problems with: https://assets.adobedtm.com/ (The CRL size is ~20MB) Coming to the non-Samsung devices, we don't do a lot of work with non-Samsung devices, so we haven't seen this issue on them. I am pretty sure that this was seen on an Android 11 device too. Hope this information is sufficient to continue the investigation. |
At Pinterest we're seeing this occasionally spike. We are also seeing it on Android 11+ devices as well as non-Samsung devices. Sample stack trace: java.lang.OutOfMemoryError: Failed to allocate a 50531696 byte allocation with 3200416 free bytes and 3125KB until OOM, target footprint 268435456, growth limit 268435456 |
I wonder whether switching to an older release of okhttp3 would help. In my case URL is I tried the following version.
and the following version as well.
More detail is here. I think that if I sync the device time and then call the API, it seems to work without triggering the issue. Has anyone else had the same experience? |
I think the issue here is all Conscrypt and Android, so switching okhttp versions probably won't help. There seem to be two things going on:
If you could flesh out your SO code into a working repro for me, that would help a lot with figuring out (1) and maybe preventing it. That change can probably go into an Android Mainline update of Conscrypt which will help devices running Android 11+. The RAM-eating cache is in Android platform code outside of Conscrypt. It's also fixable (which I'll try and do), but needs some more analysis to tune the cache sizes - I'm not sure we'd want to remove it altogether. Unfortunately there's no feasible way to get such a fix to older devices where it would do the most good. The Android runtime is a Mainline module in Android 12+ so we can get it there at least. |
Question on stackoverflow if you can workaround this. I suspect something like
Might work. |
I had a play around with this to confirm the issue. Seeing 30MB certificates for login.microsoftonline.com cashapp/certifikit#104 Looking at OneSignal/OneSignal-Android-SDK#1517, it appears that HttpUrlConnection is used by URICertStore.
So a potential workaround is to intercept these requests by overriding it.
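A minimal sketch of that interception idea, using only `java.net` classes. It is not the code from the gist linked in this comment. The `Delegate` hook is hypothetical: something has to open the non-CRL connections, and what that is depends on the app. Matching on a `.crl` suffix is also an assumption, since a CRL distribution point does not have to end that way. Blocking CRL downloads is a security policy change, so treat this as an experiment rather than a recommendation.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Locale;

public class CrlBlockingHandler extends URLStreamHandler {
    /** Hypothetical hook that opens every connection that is not a CRL fetch. */
    public interface Delegate {
        URLConnection open(URL url) throws IOException;
    }

    private final Delegate delegate;

    public CrlBlockingHandler(Delegate delegate) {
        this.delegate = delegate;
    }

    /** Assumption: CRL distribution URLs end in ".crl", as the ones quoted in this thread do. */
    public static boolean looksLikeCrl(URL url) {
        String path = url.getPath();
        return path != null && path.toLowerCase(Locale.ROOT).endsWith(".crl");
    }

    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        if (looksLikeCrl(url)) {
            // Fail fast instead of downloading and parsing a multi-megabyte CRL.
            throw new IOException("CRL download blocked: " + url);
        }
        return delegate.open(url);
    }
}
```

Installing it process-wide would look like `URL.setURLStreamHandlerFactory(protocol -> "http".equals(protocol) ? new CrlBlockingHandler(delegate) : null);`. The factory can only be set once per process, and returning `null` leaves other protocols on their default handlers.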
https://gist.github.com/swankjesse/dd91c0a8854e1559b00f5fc9c7bfae70 We can't ship this in OkHttp even if it fixes things, because we aren't the right place to do Security policy changes, but maybe the play provider could workaround it? I suspect it might have a measurable impact on a bunch of android apps that are hitting this without realising it. It seems like a really common problem. @prbprbprb three questions
|
I can't reproduce the problem on Android 9 emulator. I think that's a good thing. My test is using URL.setURLStreamHandlerFactory and hitting the hostnames listed above. So I suspect either fixed in those new emulator images, or it's actually constrained to only the samsung devices that @prbprbprb mentioned. Any ideas how to reproduce with an emulator? or some particular device? |
Here is the repo I pushed to test the API. I observe that memory usage increases when I kill the app from the recent list in Android and then launch it again only after disabling Automatic date & time and time zone and setting the date 10-15 days ahead of the current date. Steps:-
For now, I have resolved the issue by forcing automatic time sync, since we run the app in kiosk mode on a custom Android 8.1 build. |
@himanshumistri Thanks, that was it. Failing on Android 9 emulator. So it confirms it is not solely a Samsung issue. HTTP Connections may not work anyway because of time but I can at least reproduce the CRL request.
This does give another option for OkHttp users. |
|
Apologies, I missed the @himanshumistri 's repro link above, I'll have a look at that. The fact that it's reproducible on the emulator pretty much rules out any OEM changes being involved. I also did some archeology and the code which supposedly disables CRL checking was submitted in 2016, and shipped in Android 8.0, and it's been largely unchanged since. |
If you want to repro with a URLHandler you can tweak, this commit to a branch will be useful yschimke/okhttp@e73c446 |
Understanding this a bit better now. Doesn't reproduce on 8.0, but on 8.1 it does download the CRL. I believe the trigger is whether the OCSP response stapled to the cert is still valid (its validity window reaches about 5 days into the future). On 8.0 setting the date a few days in the future has no effect, but setting it further causes a handshake failure with I suspect the bug is that the "mode" for the revocation status checker is set to Mitigation is going to be tricky if not impossible as 8.1 is outside its support window. |
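To make the time-window trigger concrete, here is a rough sketch of the freshness test a validator applies to a stapled OCSP response. It is not the platform code. The class and method names are made up for illustration, the 5-day window is the estimate from the comment above, and the 15-minute clock-skew allowance is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

public class OcspWindowSketch {
    /** True when 'now' lies inside [thisUpdate - skew, nextUpdate + skew]. */
    public static boolean isFresh(Instant now, Instant thisUpdate, Instant nextUpdate, Duration skew) {
        return !now.isBefore(thisUpdate.minus(skew)) && !now.isAfter(nextUpdate.plus(skew));
    }

    public static void main(String[] args) {
        Instant thisUpdate = Instant.parse("2022-01-01T00:00:00Z");
        Instant nextUpdate = thisUpdate.plus(Duration.ofDays(5)); // assumed ~5 day window
        Duration skew = Duration.ofMinutes(15);                   // assumed tolerance

        // Device clock one day after thisUpdate: the stapled response is accepted.
        System.out.println(isFresh(thisUpdate.plus(Duration.ofDays(1)), thisUpdate, nextUpdate, skew)); // true
        // Device clock set 12 days ahead: the response looks expired and the OCSP check fails.
        System.out.println(isFresh(thisUpdate.plus(Duration.ofDays(12)), thisUpdate, nextUpdate, skew)); // false
    }
}
```

The second case matches the reports in this thread of the problem appearing when the device date is set 10-15 days ahead.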
Hi @prbprbprb @yschimke |
Sorry, not from me. But I assume he's talking about this https://cs.android.com/android/platform/superproject/+/master:libcore/ojluni/src/main/java/sun/security/provider/certpath/RevocationChecker.java;l=76;bpv=1 |
I need to double check my working - I just reproduced it on 8.0, so maybe I wasn't careful enough about killing the process between runs... And FWIW it reproduces with a simple URL connection, as you'd expect. Worse, it sort of reproduces on Android 10, but doesn't cause memory issues because the CRL fetch gets suppressed for other reasons:
So I suspect (but have yet to confirm) that the logic is "wrong" on all current Android versions; it just doesn't affect more recent versions because the CRL fetch fails for other reasons.
This is where the CRL download gets triggered. Obviously that's a bit of a hack, the correct fix is to ensure the If you're working with memory constrained devices, you might also want to tune these values. E.g. maybe 10 for the cert cache and less for the CRL cache and 1M for the max size. That's totally untested though! It sounds like you're an Android OEM? If you're registered on partner.android.com then please also raise a bug there, as it makes it easier for me to justify spending time on this. |
Soooooo..... It's actually a Conscrypt bug, I believe. The The emulator doesn't include Conscrypt source for 8.0/8.1 (dunno why), but with breakpoints in the Sun code I verified that (1) the previous paragraph is correct and (2) forcing the PR forthcoming which will fix the platform Conscrypt for R, S and T (and some Q) devices and I'll get it into the next release of the Maven libraries (due Real Soon Now!). However note that whilst this stops the latency and memory pressure of a CRL download, the TLS handshake will still fail as the OCSP data being outside of its trusted time window is considered a hard failure. |
Fixes google#520. If stapled OCSP is present, TrustmanagerImpl.setOcspResponses() adds a PKIXRevocationChecker to the list in the current PKIXParameters to check it, with option END_ENTITY_ONLY set to ensure only the leaf certificate is checked. However unless the NO_FALLBACK option is also set then if the OCSP check fails (e.g. because the date on the device is wrong) then the Sun PKIXRevocationChecker will fall back to downloading a CRL and checking that. On Android 8.0/8.1 this causes large latency and possible OOM. On later Androids, the CRL download fails if the CRL distribution URL is plain HTTP, avoiding the issue but potentially causing confusing errors. I don't *yet* have a regression test for this as it turns out we don't have _any_ tests for the OCSP path, but confirmed by single stepping that adding NO_FALLBACK prevents CRL download when the OCSP data is rejected.
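The option change described in this commit message can be sketched against the public JDK API. This is an illustration, not the Conscrypt patch. The class and method names are made up, and in `java.security.cert.PKIXRevocationChecker.Option` the end-entity constant is spelled `ONLY_END_ENTITY`.

```java
import java.security.cert.CertPathValidator;
import java.security.cert.PKIXRevocationChecker;
import java.security.cert.PKIXRevocationChecker.Option;
import java.util.EnumSet;

public class RevocationOptionsSketch {
    /** Builds a revocation checker that checks only the leaf and never falls back to CRLs. */
    public static PKIXRevocationChecker newOcspOnlyChecker() throws Exception {
        CertPathValidator validator = CertPathValidator.getInstance("PKIX");
        PKIXRevocationChecker checker = (PKIXRevocationChecker) validator.getRevocationChecker();
        // ONLY_END_ENTITY: check the leaf certificate only.
        // NO_FALLBACK: if the OCSP check fails, do not fall back to downloading a CRL.
        checker.setOptions(EnumSet.of(Option.ONLY_END_ENTITY, Option.NO_FALLBACK));
        return checker;
    }
}
```

A caller would add the returned checker to its `PKIXParameters` with `addCertPathChecker(...)` and supply the stapled response through `setOcspResponses(...)`. With `NO_FALLBACK` set, a rejected OCSP response fails validation without triggering a CRL download.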
@prbprbprb - just to confirm - will your fixes also fix the OOM in #520 (comment)? And would this fix have to come via an OS update or can OkHttp take a fix? |
@prbprbprb please correct me if I'm wrong. My understanding is that it's unlikely to get out to Android 8 and 9 via system updates, even if it lands in the source for them. And it hasn't landed yet #1066 Do mainline updates mean this gets deployed to Android 10 and onwards? https://source.android.com/devices/architecture/modular-system https://www.xda-developers.com/android-project-mainline-modules-explanation/ There is no fix possible in OkHttp. I assume you could still do one of the following: a) Use Play provider for Conscrypt? |
(Copied from square/okhttp#4155)
When using okhttp (which uses the platform default trust manager, which seems to be provided by Conscrypt) to GET (or PROPFIND) https://icloud.com (without www) with code like that:
it takes a long time until the process is killed with OOM and the test fails:
This happens with Android 8.0 and 9.0 (emulator from SDK), but not with Android 4.4 (haven't tested other versions yet).
The problem occurs with okhttp 3.10.0 and 3.11.0 (haven't tested other versions yet).
Everything is working for some other URLs I have tested, including www.icloud.com. It seems to be related to parsing the certificate. When using a custom trust manager (from https://gitlab.com/bitfireAT/cert4android), it works.
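The original test code is not preserved in this copy of the issue. As a stand-in, here is a minimal sketch using a plain `HttpURLConnection`, which comments in this thread report is enough to trigger the problem on affected Android versions. The class name, helper name and timeout values are illustrative. On a desktop JVM this only issues an ordinary request and is not expected to reproduce the OOM.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class IcloudReproSketch {
    /** Prepares (but does not yet send) a GET request to the given URL. */
    public static HttpURLConnection prepareGet(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // illustrative timeouts, in milliseconds
        conn.setReadTimeout(10_000);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepareGet("https://icloud.com");
        try {
            // The TLS handshake, and with it certificate validation, happens here.
            System.out.println("HTTP " + conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```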
I don't know whether this is an okhttp problem (looks like an Android problem?), but I guess it's quite important to understand why a simple GET request causes the whole process to crash.
The questionable certificate seems to be: