[Bug] On slow network, main thread blocked after clicking url bar from homescreen on cold startup due to MLS #9935
Comments
@csadilek Do you know what might have changed in a-c to have caused this? |
I think the possible solutions are:
I'd rather we do 2. so we can mimic the performance characteristics of production builds (and developers are closer to running what production builds do in their debug builds) but I'm not sure how feasible that is. |
For context, MLS is the Mozilla Location Service which is used to determine roughly where the user is so we can provide locale-specific search engines for them. During triage we realized this could be indicative of an underlying performance problem: if a production version of MLS without an API key pauses the app for 10+ seconds, can the production version with the API key also create a critical performance issue? @pocmo Can you explain to me why this timeout occurs if we don't have an API key? Do you think it's possible that this is indicative of an underlying performance problem we need to address? fwiw, csadilek thought it could be a perf problem but not a P1 perf problem. Assigning self to remember to identify the priority. |
This is not the cause. It is trying to put even more bandaid on this thing. :)
There are definitely multiple weird things going on in Fenix and I didn't have the time to look into them. Fenix has some wrappers around the AC search code and those seem to change the behavior in a way that I do not understand yet.
So yeah, it does make sense to use the dummy implementation in builds that don't have an API key. No need to ping the server for nothing. But at the same time the app should not freeze if this request takes long and as of now it can take at least 10+ seconds with those timeouts. |
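A minimal sketch of the "dummy implementation when there is no API key" idea from the comment above. All names here (`RegionService`, `DummyRegionService`, `NetworkRegionService`, `createRegionService`) are hypothetical stand-ins for illustration and are not the actual android-components or Fenix API:

```kotlin
// Hypothetical interface for illustration; not the real a-c API.
interface RegionService {
    fun fetchRegion(): String?
}

// No-op implementation: never touches the network, so it cannot hit a timeout.
object DummyRegionService : RegionService {
    override fun fetchRegion(): String? = null
}

// Stand-in for a network-backed service. A real implementation would issue
// an HTTP request using apiKey; here a fixed value is returned instead.
class NetworkRegionService(private val apiKey: String) : RegionService {
    override fun fetchRegion(): String? = "US"
}

// Builds without a key get the dummy, so they never ping the server for nothing.
fun createRegionService(apiKey: String?): RegionService =
    if (apiKey.isNullOrBlank()) DummyRegionService else NetworkRegionService(apiKey)
```

This only removes the pointless request in keyless builds. It does not fix the freeze for builds that do have a key, so the call would still need to move off the main thread.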
FWIW, those are exactly the same timeouts as Fennec uses: |
We're blocking the main thread when fetching geolocation for search engines. This is even worse when there is no MLS key (like with the two build flavors that mcomella has mentioned). Sebastian and mcomella have good suggestions in their comments above, and these would be good perf bugs to try profiling (for before/after). Might be useful to have some SearchFragment knowledge
|
Question for UX, while we're waiting for search engines to load, we're planning on just showing an empty list. |
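One way the "show an empty list while search engines load" behaviour could look, sketched with plain JVM concurrency classes. `AsyncEngineList` and its members are hypothetical names, and Fenix itself would more likely use coroutines; this is only meant to illustrate the non-blocking read:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ExecutorService
import java.util.function.Supplier

// Hypothetical helper: runs a slow loader on a background executor and
// exposes the result without ever blocking the calling thread.
class AsyncEngineList(
    executor: ExecutorService,
    loader: () -> List<String>
) {
    private val pending: CompletableFuture<List<String>> =
        CompletableFuture.supplyAsync(Supplier { loader() }, executor)

    // Returns an empty list until the loader has finished.
    fun currentEngines(): List<String> = pending.getNow(emptyList())

    // Invokes the callback once the engines are available.
    fun onLoaded(callback: (List<String>) -> Unit) {
        pending.thenAccept { callback(it) }
    }
}
```

The UI would render `currentEngines()` immediately and refresh from `onLoaded`. The callback runs on whichever thread completes the future, so real UI code would need to post back to the main thread.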
@sblatz @boek I checked out the latest master 673507d and I still see the issue in profiles: https://share.firefox.dev/2ZjPRDd It looks like |
I investigated the issue I discovered. It appears the following call stack happens:
When the region has not been successfully fetched, we try to fetch it from the network. The connection timeout is 10s and the read timeout is 10s so, in the worst case, we'd block for 20s in a single call. If the region has been successfully fetched, we read it from SharedPrefs. This method gets called multiple times on SearchFragment startup, so it's possible to block for longer than 20s if we're unable to connect to the server. However, once we connect to the server successfully, we cache the value and this should never happen again because we can always read from SharedPrefs. I think we still need to address this issue: even if it's unlikely to happen, and then likely to happen only once, potentially blocking for multiple seconds on the first user action of the first run leaves a poor first impression and I'd prefer to avoid that. Furthermore, in the event that the MLS server is down, every new user would experience this. Notes on reproducing:
|
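The worst-case arithmetic and the cache-first lookup described in the comment above can be sketched as follows. The 10s timeouts come from the thread; `CachedRegionProvider`, the `"region"` key, and the in-memory map standing in for SharedPrefs are assumptions made for illustration:

```kotlin
// Timeouts as reported in the thread: 10s to connect plus 10s to read.
const val CONNECT_TIMEOUT_SECONDS = 10
const val READ_TIMEOUT_SECONDS = 10

// Upper bound on blocking time when every uncached call times out on both phases.
fun worstCaseBlockSeconds(uncachedCalls: Int): Int =
    uncachedCalls * (CONNECT_TIMEOUT_SECONDS + READ_TIMEOUT_SECONDS)

// Hypothetical cache-first provider; the map stands in for SharedPrefs.
class CachedRegionProvider(
    private val cache: MutableMap<String, String>,
    private val fetchFromNetwork: () -> String?
) {
    fun region(): String? {
        // Fast path: a previously fetched region is read from the cache.
        cache["region"]?.let { return it }
        // Slow path: every call repeats the network attempt until one succeeds.
        val fetched = fetchFromNetwork() ?: return null
        cache["region"] = fetched
        return fetched
    }
}
```

With these numbers a single failed call can block for up to 20s, and three failed calls during startup for up to 60s. After the first successful fetch, later calls never reach the network.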
Could this be causing this crash if we're loading without search engines? #11906 |
I believe this will be resolved with the PR #11974 (review) |
From #11974 (review) this was verified from the perf perspective. @boek Does this need manual QA as well? |
Steps to reproduce
geckoBetaFennecBeta or geckoBetaForPerformanceTest
Expected behavior
Quick to open
Actual behavior
Long pause. In forPerformanceTest with the profiler going (so with much overhead), it takes > 10 seconds.
Device information
I took a profile and it looks like FenixSearchEngineProvider.installedSearchEngines is blocking the main thread for a long time. I looked at blame and nothing changed in Fenix so this is likely an issue in a-c.
Comment from @MarcLeclair: This issue occurs because Fenix blocks the UI thread when calling LocationServices in FenixSearchEngineProvider.kt, in the method installedSearchEngine(). The best way to see this is to put LocationService.dummy() in the else branch. Since the app built locally doesn't have any token, the app makes an empty HTTP request that just hangs the app. FYI, it is most obvious on the first click, but it happens on any subsequent calls too; those just seem faster (no idea why, I didn't look much further into it).