
Re-enable storage maintenance call #7227

Closed
grigoryk opened this issue Dec 16, 2019 · 34 comments · Fixed by #27689, fork-house/fenix#14, Leland-Takamine/fenix#159 or nathanmkaya/fenix#108

Comments

@grigoryk
Contributor

grigoryk commented Dec 16, 2019

In #2765 we added periodic storage maintenance calls that run once a day (or so), during startup.

In #6937 we saw that this interferes with our fennec migration code, which also runs on startup. So, maintenance was disabled.

In #6938 I proposed a solution for this problem, but it doesn't apply now that we're running migration code in a background service.

This issue is to re-enable the maintenance call in a way that doesn't interfere with other parts of the browser that may need to acquire higher levels of SQL locks (reserved+).
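For context, the shape of the original approach from #2765 was roughly a once-a-day check guarded by a stored timestamp, run during startup. The sketch below is only illustrative; the preference key and the `runMaintenance` hook are placeholders, not Fenix's actual code.

```kotlin
import android.content.Context
import java.util.concurrent.TimeUnit

// Illustrative only: the preference key and `runMaintenance` parameter are
// placeholders, not the implementation that #2765 actually landed.
private const val LAST_MAINTENANCE_PREF = "last_storage_maintenance_ms"

suspend fun maybeRunStorageMaintenance(context: Context, runMaintenance: suspend () -> Unit) {
    val prefs = context.getSharedPreferences("storage_maintenance", Context.MODE_PRIVATE)
    val lastRun = prefs.getLong(LAST_MAINTENANCE_PREF, 0L)
    val now = System.currentTimeMillis()
    // Run at most once per day; maintenance uses the places writer connection,
    // which needs reserved+ SQLite locks and can block other writers (e.g. migration).
    if (now - lastRun >= TimeUnit.DAYS.toMillis(1)) {
        runMaintenance()
        prefs.edit().putLong(LAST_MAINTENANCE_PREF, now).apply()
    }
}
```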


@grigoryk grigoryk added eng:health Improve code health performance Possible performance wins labels Dec 16, 2019
@boek boek added P2 Upcoming release S2 Major Functionality/product severely impaired and a satisfactory workaround doesn't exist labels Dec 18, 2019
@boek
Contributor

boek commented Dec 18, 2019

@grigoryk are we tracking this anywhere? Or is this a nice-to-have in the future?

@mcomella mcomella added this to Needs prioritization in Performance, front-end roadmap Dec 26, 2019
@mcomella mcomella moved this from Needs prioritization to Needs prioritization: waiting in Performance, front-end roadmap Jan 21, 2020
@mcomella
Contributor

Perf self-triage: this doesn't seem like it'll land in GA, so I'm moving it to low impact so we don't have to track it.

@mcomella mcomella moved this from Needs prioritization: waiting to Low impact (unprioritized) in Performance, front-end roadmap Feb 24, 2020
@mcomella mcomella removed the performance Possible performance wins label Aug 3, 2020
@mcomella mcomella removed this from Low impact (unprioritized) in Performance, front-end roadmap Aug 3, 2020
@mcomella
Contributor

mcomella commented Aug 3, 2020

This isn't inherently a performance problem, so I'm removing it from our board. However, the proposed solution may have perf implications; if that's the case, please let us know. Thanks!

@mcomella mcomella added the performance Possible performance wins label Aug 19, 2020
@mcomella mcomella added this to Needs prioritization in Performance, front-end roadmap via automation Aug 19, 2020
@mcomella
Contributor

mcomella commented Aug 19, 2020

From the original issue:

@grigoryk indicated that it does not [run maintenance], which will result in worse perf, more disk usage, etc.

Re-adding to our board because this may impact long term app performance.

I'd guess the solution should happen via WorkManager when the app isn't running. If it does happen while the app is running, please don't put it on the startup critical path: it should probably live in the visualCompletenessQueue if it runs during startup.
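A minimal sketch of the in-app variant, assuming Fenix's visualCompletenessQueue is backed by A-C's `RunWhenReadyQueue`; the `runMaintenance` hook and coroutine scope are stand-ins, not the real components.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import mozilla.components.support.utils.RunWhenReadyQueue

// Defers the maintenance call until visual completeness has been reported,
// keeping it off the startup critical path.
fun scheduleMaintenanceAfterVisualCompleteness(
    visualCompletenessQueue: RunWhenReadyQueue,
    ioScope: CoroutineScope = CoroutineScope(Dispatchers.IO),
    runMaintenance: suspend () -> Unit,
) {
    visualCompletenessQueue.runIfReadyOrQueue {
        ioScope.launch { runMaintenance() }
    }
}
```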

@mcomella mcomella moved this from Needs prioritization to Top 10 Inter-Team bugs in Performance, front-end roadmap Aug 19, 2020
@jonalmeida jonalmeida assigned jonalmeida and grigoryk and unassigned jonalmeida Sep 28, 2020
@grigoryk grigoryk removed this from ⏳ Sprint Backlog in A-C: Android Components Sprint Planning Oct 5, 2020
@stale

stale bot commented Mar 27, 2021

See: #17373 This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Mar 27, 2021
@stale stale bot closed this as completed Apr 3, 2021
Performance, front-end roadmap automation moved this from Top 10 Inter-Team bugs to Done Apr 3, 2021
@mcomella mcomella added this to Needs prioritization in Performance, front-end roadmap via automation Apr 5, 2021
@mcomella mcomella removed the wontfix label Apr 5, 2021
@mcomella
Contributor

mcomella commented Apr 5, 2021

@grigoryk I noticed stalebot closed this but this seems important. What do you think the urgency/priority is on this issue? If high, do you think it's something the team would have cycles to work on?

@mhammond
Contributor

Thought about two more issues worth considering:

* I think we'd want to use a dedicated writer for this though, so we don't interrupt/cancel the wrong operation by mistake when using a shared writer.

I don't think a dedicated writer can work without the risk of the "other" writer seeing "database is locked" errors. I do agree we need to work out how not to interrupt the wrong operation, though (I think Ben was suggesting a way to "scope" the interrupt, but I haven't thought it through).

* We should also make sure we don't run this every time the device is idle. Maybe just once a day?

The call only removes 6 visits each time it is called, so doing it once per day seems unlikely to effectively keep a cap on the number of visits.

@csadilek
Contributor

csadilek commented Aug 30, 2022

The call only removes 6 visits each time it is called, so doing it once per day seems unlikely to effectively keep a cap on the number of visits.

OK, makes sense! But running it every time the device is idle might also not be ideal for battery consumption. If you all have a suggestion of how often we should ideally run this, please let us know. We could start with a compromise and introduce some upper limit of runs per day otherwise.

@mhammond
Contributor

It's certainly true that we never call it today, so ramping up slowly should in theory have some small benefit. Another challenge is that a world where this has been in place and called regularly (i.e., there will usually be just a few records to purge) is quite different to the world we have today (where there will be many as it catches up).

IOW, calling it daily is still better than what we do today, so let's do that! But it does open the question of telemetry etc.: how will we ever know when we've found the sweet spot?

@bendk
Contributor

bendk commented Aug 31, 2022

I don't think using a dedicated writer will be able to work without the risk of the "other" writer seeing "database is locked" errors. I do agree we need to work out how to not interrupt the wrong operation though (and I think Ben was suggesting a way to "scope" the interrupt, but I haven't thought it through)

I didn't have a plan for how to stop this, so maybe adding another writer is the simplest way to move forward. My one concern is that the number of connections continues to grow as we add more interruptible operations. If that happens, we should probably come up with a different solution.

OK, makes sense! But running it every time the device is idle might also not be ideal for battery consumption. If you all have a suggestion of how often we should ideally run this, please let us know. We could start with a compromise and introduce some upper limit of runs per day otherwise.

I like the idea of starting once per day and increasing based on telemetry. Maybe we count how many devices are still over the size target after running maintenance and try to get that down to some small-ish number.

As for how often: maybe we could return a boolean indicating whether the DB is still above the size target after running maintenance. If that's false, then just run it once a day (maybe even less often if the app isn't used every day). If it's true, then run it more often, with some upper limit.
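A sketch of that scheme, assuming a hypothetical maintenance call that reports whether the DB is still over the size target; the concrete intervals are placeholders.

```kotlin
import java.util.concurrent.TimeUnit

// `runMaintenanceAndCheckSize` is hypothetical: it runs one maintenance pass
// and returns true if the DB is still above the size target afterwards.
suspend fun nextMaintenanceDelayMs(runMaintenanceAndCheckSize: suspend () -> Boolean): Long {
    val stillOverTarget = runMaintenanceAndCheckSize()
    return if (stillOverTarget) {
        // Still catching up: run again sooner, subject to some upper limit per day.
        TimeUnit.HOURS.toMillis(4)
    } else {
        // At or below the target: once a day is enough (or less if the app isn't used daily).
        TimeUnit.DAYS.toMillis(1)
    }
}
```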

@Mugurell
Contributor

Saw this next in our backlog, but it seems like there are still ongoing discussions about the plan and a dependency on AS?
Not sure what the next steps are for Fenix. @csadilek @mhammond @bendk.

@mhammond
Contributor

mhammond commented Oct 10, 2022

I thought we had a plan written somewhere in one place but can't find it. I believe we agreed:

  • We'd start small, running once per day, with a target size of 75 MiB
  • We accept that this is not interruptible because we expect it to complete very quickly. If we don't already, it would be fine to interrupt the "writer" connection at shutdown, which would interrupt any executing operations, possibly including this one; that's fine too.
  • Christian suggested that it might make sense to put this behind a feature flag in Nightly.
  • We have already landed telemetry in app-services in #5123, so as long as we have a fairly recent app-services we should get telemetry. Our team will monitor that and offer advice for next steps (which will hopefully be to run more often than once per day).

I'm sure @bendk or @csadilek will correct what I got wrong or add what I forgot.
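Put together, the starting point could look roughly like the sketch below. The `runMaintenance(dbSizeLimit)` shape and the feature-flag check are assumptions based on this discussion, not the final A-C API.

```kotlin
// 75 MiB target size agreed above; passing a UInt `dbSizeLimit` mirrors the
// shape discussed for app-services but is an assumption here.
val DB_SIZE_LIMIT: UInt = 75u * 1024u * 1024u

suspend fun runDailyStorageMaintenance(
    maintenanceFeatureEnabled: Boolean, // e.g. a Nightly-only feature flag
    runMaintenance: suspend (dbSizeLimit: UInt) -> Unit,
) {
    if (!maintenanceFeatureEnabled) return
    // Not interruptible on its own: it is expected to complete quickly, and
    // interrupting the writer connection at shutdown covers it anyway.
    runMaintenance(DB_SIZE_LIMIT)
}
```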

@csadilek
Contributor

csadilek commented Oct 11, 2022

Thanks @mhammond. Yes, I think this captures it all. The rest of the discussion happened in mozilla/application-services#5115.

@Mugurell Please sync up with @jonalmeida and @kycn though, because I think they had planned to look into this soon, or possibly already started.

@kycn kycn self-assigned this Oct 12, 2022
kycn added a commit to kycn/fenix that referenced this issue Nov 3, 2022
Re-enable storage maintenance call by introducing WorkManager worker on A-C side and consuming it from Fenix.
The work request is periodic and the repeat interval is 24h. It requires the device to be idle and not to have
low battery. This feature is available only for Nightly for now.
@github-actions github-actions bot added the eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged label Nov 3, 2022
@kycn
Contributor

kycn commented Nov 3, 2022

Status update:
A PeriodicWorkRequest is introduced in A-C with a 24h repeat interval.
It runs when the following constraints are met: setRequiresBatteryNotLow(true) and setRequiresDeviceIdle(true). Fenix enqueues this request to WorkManager if the visualCompletenessQueue is ready.
The related PRs: Fenix and A-C.
The feature is behind a feature flag and is enabled only for the Debug and Nightly variants.
@csadilek @jonalmeida @mhammond @bendk
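For reference, the scheduling described above maps to standard WorkManager APIs roughly as in the sketch below; the worker class and unique-work name are illustrative, not the exact A-C implementation.

```kotlin
import android.content.Context
import androidx.work.Constraints
import androidx.work.CoroutineWorker
import androidx.work.ExistingPeriodicWorkPolicy
import androidx.work.PeriodicWorkRequestBuilder
import androidx.work.WorkManager
import androidx.work.WorkerParameters
import java.util.concurrent.TimeUnit

// Illustrative worker; the real A-C worker does the places maintenance call.
class StorageMaintenanceWorker(
    context: Context,
    params: WorkerParameters,
) : CoroutineWorker(context, params) {
    override suspend fun doWork(): Result {
        // Call into storage maintenance here (omitted in this sketch).
        return Result.success()
    }
}

fun scheduleStorageMaintenance(context: Context) {
    val constraints = Constraints.Builder()
        .setRequiresBatteryNotLow(true)
        .setRequiresDeviceIdle(true) // API 23+
        .build()

    val request = PeriodicWorkRequestBuilder<StorageMaintenanceWorker>(24, TimeUnit.HOURS)
        .setConstraints(constraints)
        .build()

    // KEEP avoids rescheduling (and resetting the period) on every startup.
    WorkManager.getInstance(context).enqueueUniquePeriodicWork(
        "StorageMaintenanceWorker.periodicWork",
        ExistingPeriodicWorkPolicy.KEEP,
        request,
    )
}
```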

@mergify mergify bot closed this as completed in #27689 Nov 3, 2022
mergify bot pushed a commit that referenced this issue Nov 3, 2022
Performance, front-end roadmap automation moved this from Perf P2-ish (unordered) to Done Nov 3, 2022
@github-actions github-actions bot added this to the 108 milestone Nov 3, 2022
@github-actions github-actions bot reopened this Nov 3, 2022
@github-actions github-actions bot added eng:qa:needed QA Needed and removed eng:reopen-for-qa Reopens and tags the issue for QA needed when the issue is merged labels Nov 3, 2022
Performance, front-end roadmap automation moved this from Done to Needs triage Nov 3, 2022
@jonalmeida
Contributor

No QA testing is needed here.

Performance, front-end roadmap automation moved this from Needs triage to Done Nov 8, 2022
@jonalmeida jonalmeida removed the eng:qa:needed QA Needed label Nov 8, 2022
JohanLorenzo pushed a commit to mozilla-mobile/firefox-android that referenced this issue Feb 14, 2023