Getting started #2

Open
jakearchibald opened this issue Feb 24, 2015 · 38 comments

Comments

@jakearchibald

I'm hearing from partners and developers that this is really important. One of the most common questions I get is "How much can I store in the cache?", and partners wince when I admit that the browser is free to nuke the unsynced local data at will.

I'm fond of the simplicity of this proposal:

navigator.requestStorageDurability().then(yey, ney);

…but I'd like to get a small cross-browser group together to flag up any huge concerns, bikeshed the API, and eventually help with the spec.

Mozilla: @annevk @sicking @wanderview - who's interested?
Microsoft: @jacobrossi - who would be the best person at MS to talk to about getting involved? Is it you?
Apple: Any ideas?
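A minimal sketch of how a page might call the proposed method with feature detection. `requestStorageDurability()` exists only as the proposal in this comment; the helper name `ensureDurableStorage` and its return strings are hypothetical, and the navigator object is passed in so the logic can also be exercised with a stub.

```javascript
// Sketch only: requestStorageDurability() is the API proposed in this thread,
// not a shipped browser method. `nav` is passed in (e.g. `navigator`) so the
// same logic can run against a stub object.
async function ensureDurableStorage(nav) {
  if (typeof nav.requestStorageDurability !== "function") {
    // No support: data stays best-effort and may be cleared at will.
    return "unsupported";
  }
  try {
    await nav.requestStorageDurability(); // may show a prompt; resolves on grant
    return "durable";
  } catch (err) {
    // Rejection corresponds to the `ney` branch of the one-liner above.
    return "best-effort";
  }
}
```

In a page this would be called as `ensureDurableStorage(navigator).then((mode) => console.log(mode));`.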

@wanderview

I'm interested.

How does this relate to kinu/quota-api? Is it a replacement, an extension, or tangential?

I definitely think we need something for this problem. I'm just trying to determine whether solving it is tied to the quota-api. I think some folks here are unsure of the current quota-api direction.

@jakearchibald
Author

I think this replaces the "request" parts of the quota API (@slightlyoff correct me if I'm wrong) but it can still be used for looking up quota. Perhaps navigator.storageQuota is a natural home for requestStorageDurability, but if storageQuota is contentious I don't have a problem keeping them separate.

Once we work out who wants to be involved I'll set up a call to work through some of that stuff.

@sicking

sicking commented Feb 25, 2015

There are a few things here that I think are interesting to do:

  • The ability to say "Make all the stuff that I've saved be guaranteed not to be thrown out based on browser heuristics". This would sadly likely involve a prompt unless the user has installed the website. And likely this would not just persist stuff stored in cache, but also stuff stored in IndexedDB and localStorage.
  • We've gotten requests from developers to be able to store cached objects which can be thrown out one by one. I.e. cache three large resources, and then have the browser toss out just one of them if we run low on disk space.
    Right now both the cache API and IndexedDB will delete all data for an origin if we run low on disk space. This is a good default policy, but not good as the only policy.
  • We need the ability for websites to indicate that not all data is created equal. Other platforms have /tmp or /cache folders which the platform reserves the right to nuke data from if it's running low on disk space.
    It has been suggested to use some form of callback to support this, which would allow the website to delete less important data according to whatever complex algorithm it wants.
    However, for the same reasons that memory-pressure events don't really work, and other platforms have moved away from them, I don't think that we should do that here. Instead we should follow other platforms and allow websites to declare what data can be nuked without too much concern, so that the OS can do so quickly on demand without running website logic.

@slightlyoff
Owner

One at a time:

  • I'm ok with a prompt! I think we can figure out better ways to reduce that friction in other contexts (e.g., installed apps), but if a prompt gets it done in the uninstalled case, great.
  • It's interesting that the discardable storage thing has come up here. We've talked in the past about a storage pressure event that SWs could use to do their own cleanup (based on an app-specific decision about priority). Would that not work?
  • Does a storage pressure event solve the /tmp/ issue? I'm not so worried about running site logic, but I'd like to understand that concern more deeply.

@sicking

sicking commented Feb 25, 2015

I don't think a storage-pressure event is realistic. It's been tried in other platforms and has failed there.

The problem is that at the time when website A wants access to storage, you simply don't want to have to fire up the SW of websites B, C and D to fire events at them and hope that they free up enough storage. That gives a poor user experience as the user is actively using A and so you don't want to leave A hanging for that long.

Other platforms have tried this and given up. That's why we have things like /tmp and /cache directories, why Android introduced ashmem, and why POSIX has mmap. Memory pressure notifications just don't work well enough.

@triblondon

Is there an intention to explore practical use cases and potential associated UX in detail? It would be reassuring to see how it's envisaged that this feature would fit into a publishing app's user journey, but equally for ecommerce, media streaming, social networking, all the big use cases.

In the FT's case, for example, we already have pretty horrific amounts of "on boarding" code to walk the user through the save to home screen and storage permission prompt dance. If this technology involves another permission prompt it would be nice to start rationalising.

@sicking

sicking commented Mar 2, 2015

Definitely would love to see some use cases being discussed. Not just for the UX pieces, but also for the rest of the API.

Regarding the UX, I think there would be some API for getting access to "persistent storage". This API might bring up a prompt. The Permissions.get("persistent-storage") API could be used to test if a prompt would indeed be shown. Other storage APIs would never (at least by default) show UX. So the page would be entirely in control of if and when UX is displayed.
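A sketch of the flow described above, where the page checks the permission state first and then decides when (or whether) to trigger UI. It assumes a Permissions-style `query({ name })` that resolves to an object whose `state` is `"granted"`, `"denied"` or `"prompt"`; the `"persistent-storage"` name comes from the comment, while `planPersistenceRequest` and its return values are hypothetical.

```javascript
// Assumes `permissions.query({ name })` resolves to
// { state: "granted" | "denied" | "prompt" }.
async function planPersistenceRequest(permissions) {
  const { state } = await permissions.query({ name: "persistent-storage" });
  switch (state) {
    case "granted":
      return "request-now"; // no prompt expected, safe to ask immediately
    case "denied":
      return "skip"; // asking again would just fail
    default:
      // "prompt": UI would appear, so ask at a moment the page chooses
      return "wait-for-user-action";
  }
}
```

This keeps the page in control of UX: ordinary storage writes never trigger a prompt, and the only UI comes from an explicit request made when the page decides.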

@sicking

sicking commented Mar 2, 2015

Some additional thoughts on tricky questions that need to be answered:

What happens with data that was written before the page asked for persistent storage?
A related question is whether being granted persistent storage should make all previously stored data persistent, i.e. would any IndexedDB or Cache API data now become persistent? We did a bunch of experimentation with this, and in our experience it should.

Should it be possible to grant a site access to X GB of storage? I.e. is "having access to persistent storage" more than just a boolean?

Both the Chrome and the Firefox UIs for persistent storage (Chrome only uses it for its filesystem API) grant the page X GB of storage rather than unlimited storage.

If that's a model that we want to keep, then that brings in the tricky situation of what to do if a page tries to store some persistent data and reaches the storage limit. Does that make the write operation fail, or does it bring up a prompt to get access to more storage?

Making the write operation fail makes it significantly harder to write data, since you have to make every write operation handle reaching the storage limit gracefully, especially if what you want to do is ask the user for more storage space and then retry. I would expect most developers to fail at doing this properly.

At the same time, it would be bad if any write operation could result in UX popping up, since that would make it hard for the page to control when UX is shown.

One option here would be to allow the page to choose a policy, either by firing an event on the page when a write operation is about to fail due to quota limits, or by allowing the page to set a policy.

Probably the most reasonable solution here is to simply fail write operations when a quota limit is reached. And then make it the responsibility of the page to use the quota API to see when it's getting close to that limit and ask for more storage.
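The "fail the write, let the page watch its quota" option can be simulated as follows. This is not a browser API: `OriginStorage`, `QuotaExceededError`, `isNearLimit` and the 90% threshold are hypothetical names and values chosen for illustration.

```javascript
// Hypothetical error type for writes that would exceed the origin's quota.
class QuotaExceededError extends Error {
  constructor(message) {
    super(message);
    this.name = "QuotaExceededError";
  }
}

// Simulated per-origin storage: writes past the quota fail; they never prompt.
class OriginStorage {
  constructor(quotaBytes) {
    this.quota = quotaBytes;
    this.usage = 0;
  }
  write(bytes) {
    if (this.usage + bytes > this.quota) {
      throw new QuotaExceededError(`write of ${bytes} bytes exceeds quota`);
    }
    this.usage += bytes;
  }
}

// Page-side check: ask for more storage (at a time of the page's choosing)
// before writes start failing.
function isNearLimit(usage, quota, fraction = 0.9) {
  return usage >= quota * fraction;
}
```

With a 100-byte quota, a page that has written 95 bytes would see `isNearLimit` return true and could ask for more space then, rather than discovering the limit through a failed write.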

@annevk

annevk commented Mar 6, 2015

So it sounds like there should be some API to persist storage stored in IDB, localStorage?, Cache API, and potentially the filesystem API if we ever agree on one. Any other API?

It also sounds like we should have some way to have explicit temporary storage, similar to /tmp that can be cleared when necessary. For this we could provide a different way to get hold of a Cache or IDB, perhaps?

The quota story seems the least compelling. I think once persistence is granted also granting unlimited space (up to some reasonable limit based on disk) is best as the user is not well informed on these matters. Obviously browsers should provide origin/eTLD+1-organized storage management.

It seems like rolling out a persistence grant first and temporary storage as a follow up could be a reasonable way to get somewhere quickly. sessionStorage could be suggested as a hackish way of getting temporary storage while v1 is being rolled out.

@sicking

sicking commented Mar 6, 2015

I think the simplest way to start would be the spec at https://dvcs.w3.org/hg/quota/raw-file/tip/Overview.html, but merged with @jakearchibald's proposal at the beginning of this thread by changing requestPersistentQuota(size) to requestStorageDurability() (or whatever we want to call it).

Either way we'd also need something which indicates if storage is by default persistent or not.

@annevk

annevk commented Mar 9, 2015

Right, so v1 would be introducing a storage mode (persistent/temporary) per eTLD+1/origin that affects a set of APIs per my list above. You need to be able to query this storage mode and change it to persistent somehow.

Then v2 could introduce explicit temporary storage APIs. You would use these for storing resources that are volatile, such as media in a social media app.
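A rough model of the v1 idea: one storage mode per site, shared by every storage API in the list, which can be queried and can only move from temporary to persistent. `StorageModeRegistry`, its method names and the `askUser` callback (standing in for whatever prompt or heuristic a UA uses) are hypothetical.

```javascript
// Hypothetical model: one mode per site (eTLD+1 or origin), shared by
// IndexedDB, localStorage, the Cache API, etc.
class StorageModeRegistry {
  constructor(askUser) {
    this.askUser = askUser; // (site) => boolean; stands in for a prompt or UA heuristic
    this.modes = new Map();
  }
  query(site) {
    return this.modes.get(site) || "temporary"; // default: best-effort storage
  }
  requestPersistent(site) {
    if (this.query(site) === "persistent") return "persistent"; // already granted
    if (this.askUser(site)) this.modes.set(site, "persistent");
    return this.query(site);
  }
}
```

Because the mode belongs to the site rather than to individual databases or caches, a single grant covers all of the listed APIs at once.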

@sicking

sicking commented Mar 9, 2015

Sounds good. Though I think v1 should also allow querying amount of data used since that's something that developers often ask for.

@annevk

annevk commented Mar 10, 2015

Why do you need to know how much data you've used?

@wanderview

Why do you need to know how much data you've used?

Imagine you are writing an app that must manage remote resources that will often exceed the local storage of a mobile device:

  • photo gallery
  • music player
  • dropbox-like remote file manager

You obviously want as much content to work offline as possible, but you must limit it somehow. Implementing this kind of limit is easier if you have an API to get usage info.

Otherwise every app has to estimate disk usage manually (which of course takes even more space in code, IDB data, etc).
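To illustrate why usage figures help, here is one way a photo or music app might decide what to keep offline. It assumes some API reports current `usage` and `quota` in bytes; `planOfflineCache`, the item shape and the 10% headroom are hypothetical.

```javascript
// items: [{ id, size, lastAccess }]; usage/quota in bytes from some usage API.
function planOfflineCache(items, usage, quota, headroom = 0.1) {
  // Leave `headroom` of the quota unused so unrelated writes don't start failing.
  let budget = quota * (1 - headroom) - usage;
  const chosen = [];
  // Most recently accessed first: favors what the user is likely to open offline.
  const byRecency = [...items].sort((a, b) => b.lastAccess - a.lastAccess);
  for (const item of byRecency) {
    if (item.size <= budget) {
      chosen.push(item.id);
      budget -= item.size;
    }
  }
  return chosen;
}
```

Without a usage API, the app would have to track `usage` itself, which is the extra bookkeeping the comment above describes.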

@sicking

sicking commented Mar 10, 2015

Exactly. And while you are using temporary storage (i.e. before you've requested durable storage) you don't want to waste bandwidth downloading resources that you won't be able to save.

@annevk

annevk commented Mar 12, 2015

Say I have 20 GiB available and I go to both music.com and photos.com and I need them to do everything offline. How would this quota thing pan out?

@sicking

sicking commented Mar 12, 2015

The math Gecko uses is here:

http://hg.mozilla.org/mozilla-central/file/0190a1d17294/dom/quota/QuotaManager.cpp#l1245
and
http://hg.mozilla.org/mozilla-central/file/0190a1d17294/dom/quota/QuotaManager.cpp#l2401

For the example that you cite, the quota limit for temporary storage would be min(20GB * 50% * 20%, 2GB) = 2GB

I think this was heavily inspired by the Chrome limits, so I think those are similar.

@sicking

sicking commented Mar 12, 2015

That's actually somewhat simplified. The full equation in Gecko is:

max(min(availableStorage * 50% * 20%, 2GB), min(availableStorage, 10MB))

In most practical situations, that means that the number comes out to 10% of availableStorage, capped at 2GB max.

availableStorage is the amount of free disk space plus amount of disk space currently used by temporary storage.
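The formula above, transcribed directly so the numbers can be checked. `geckoTemporaryLimit`, `MB` and `GB` are illustrative names, and GB is taken as 1024³ bytes; the actual computation lives in the `QuotaManager.cpp` links above.

```javascript
const MB = 1024 * 1024;
const GB = 1024 * MB;

// Transcription of:
// max(min(availableStorage * 50% * 20%, 2GB), min(availableStorage, 10MB))
// where availableStorage = free disk space + space already used by temporary storage.
function geckoTemporaryLimit(freeDiskBytes, temporaryUsageBytes) {
  const availableStorage = freeDiskBytes + temporaryUsageBytes;
  return Math.max(
    Math.min(availableStorage * 0.5 * 0.2, 2 * GB),
    Math.min(availableStorage, 10 * MB)
  );
}
```

For example, 40 GB of available storage hits the 2 GB cap, 5 GB gives roughly 0.5 GB (the 10% case), and 50 MB gives 10 MB because the second term dominates.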

@annevk

annevk commented Mar 13, 2015

That seems annoying. If I grant persistent elastic storage I'd like to actually let that application make use of the space so I can have 8 GiB of music for offline consumption (without fear of it being deleted). If we want to compete with native systems things like that need to be possible, and I don't really see how allocating a quota helps if the availability should be elastic.

@sicking

sicking commented Mar 13, 2015

Note that I was only talking about temporary storage. Persistent storage is currently always unlimited in Gecko I think. But we only use it for installed websites, so it's slightly different from what's discussed here.

@kinu

kinu commented Mar 16, 2015

/sub (Will read through tomorrow)
/cc @inexorabletash

@davidsgrogan

@sicking What platform tried a storage pressure event? I'd like to find out more; links or search terms appreciated! We (some chrome folks) have been talking about something similar but a little different than what you're describing. We envisioned the event firing when the UA detects the system is nearing running out of space, not when some other site is already failing writes.

@annevk Just FYI, chrome would probably have the persistent/durable bit control appcache and websql in addition to the APIs you listed above and on the wiki.

@sicking

sicking commented Mar 17, 2015

The only pressure events that I know about are for memory management, not disk management. But Android has exactly the thing that you're talking about, i.e. an event which is supposed to fire /before/ the device runs out of memory.

The problem that Android runs into is that memory growth can happen very fast. And swapping in a different process and firing an event at it is very slow. So oftentimes the system ends up running out of memory before apps have a chance to handle the pressure notification.

Hence Android is moving towards declarative solutions which allows the OS to free up memory areas without running application logic.

Obviously memory handling is different from disk handling. But it seems to me that it'll play out very similarly, but slower. While writing to disk takes longer than allocating memory does, firing disk-pressure notifications also takes significantly longer than memory-pressure notifications.

Firing a disk-pressure notification will involve starting multiple processes, doing a bunch of IO to load the various scripts involved, doing more IO to let the apps actually delete the data that's not critical, doing yet more IO to "vacuum" the levelDB/sqlite databases that are backing the storage APIs, etc. On top of that, a lot of the time these steps will require more data to be written to disk before the "vacuum" can actually reduce disk usage.

I looked for some docs on how Android does memory pressure notifications but couldn't immediately find them. But some googling should turn it up. Also search for ashmem and mmap for declarative alternatives.

@kinu

kinu commented Mar 18, 2015

@sicking
From what you described I can imagine how pressure notifications could fail to work, but I think there's one more possible difference between the memory and disk situations: for disk, we can possibly reserve some extra space for the time being. For example, in the current code base both Gecko and Chrome (basically) try to limit the total temporary space to 50% of available space, so we can allow one app to write a lot more for the time being, fire a pressure event then, and evict the oldest one much later. We don't need to assume the current implementation (which was designed in its early days), but I think we can employ a similar scenario in a different implementation.
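kinu's point can be sketched as a soft and a hard limit on total temporary storage: writes that land between the two are accepted and trigger a pressure event, and eviction can happen later instead of blocking the write. `classifyWrite` and its limit parameters are hypothetical.

```javascript
// Hypothetical UA-side policy for total temporary storage.
// softLimit: where pressure events start firing; hardLimit: writes beyond it fail.
function classifyWrite(currentUsage, writeBytes, softLimit, hardLimit) {
  const after = currentUsage + writeBytes;
  if (after > hardLimit) return "reject";
  // Accept now, fire a pressure event, and evict later rather than inline.
  if (after > softLimit) return "accept-and-fire-pressure-event";
  return "accept";
}
```

The gap between the two limits is the reserved space that buys time for the slow steps sicking lists (starting processes, running cleanup, vacuuming).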

(By the way, I won't object to a declarative or priority-based approach if we can come up with a comfortable set of APIs that work across different storage APIs. I remember we once discussed whether we could introduce a 'namespace' or something that groups a set of storage objects as a unit of eviction that is finer than an origin, though we never polished the idea into a spec.)

@annevk

annevk commented Mar 18, 2015

If a user grants persistence to an app and resources that app stores for the user, the user would likely not be very happy if the app removed some of those resources upon receiving such an event.

We have some ideas for making the non-persistent scenarios better along the lines of introducing multiple storage areas per site (eTLD+1/origin) and providing actual cache-like storage (where individual entries can be removed when not accessed in a while). Both could also be useful in the persistent case as it could allow for scenarios where the user only deletes books, but not the reader. Or the user deletes levels, but not the game. However, I hope that we can make progress on persistent storage first as that involves touching a lot less API-wise.

@kinu

kinu commented Mar 18, 2015

In case it was not clear, I totally agree that the resources for an app that is granted persistence should not be removed, regardless of whether such events are received. I also agree that it'd be great if we can make progress on persistent storage first, as it looks like we are getting the same or similar requests from developers.

@kinu

kinu commented Mar 18, 2015

Oops, you mean even the app should not delete the data... please ignore the first sentence of my previous comment as it's out of context.

@shacharz

Hi all,
I thought direct feedback from a developer use case that relies extensively on IndexedDB and the Filesystem API, with both temporary and persistent storage in Firefox and Chrome, might be valuable.

  • Storage type: we would like both persistent and temporary to exist, where by temporary we mean no user prompt is needed, and by persistent we mean a user prompt is needed. We'd like temporary to be as large as possible but understandably volatile, so the user won't be harmed by apps taking up their disk space.
  • An API to understand how much data is used and how much is left (similar to the quota API, although when persistent storage was unlimited the quota didn't give the amount of free disk space left).
  • Move objects from temporary to persistent (without re-writing them)
  • When storage has reached its limit (even though we aim not to, using the quota API), notify with an error, and let us control what to prompt the user with, if anything.
  • When the browser clears temporary data, do it one object at a time based on LRU or something like that, and only as much as it needs, not the whole storage.
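The last item, clearing least recently used objects only until enough space is free, could look like this. `evictLRU` and the entry shape `{ key, size, lastAccess }` are hypothetical; a real UA would also need to tell the app which entries disappeared.

```javascript
// entries: [{ key, size, lastAccess }] for one origin's temporary data.
// Evicts least recently used entries first, stopping once bytesNeeded are freed.
function evictLRU(entries, bytesNeeded) {
  const oldestFirst = [...entries].sort((a, b) => a.lastAccess - b.lastAccess);
  const evicted = [];
  let freed = 0;
  for (const entry of oldestFirst) {
    if (freed >= bytesNeeded) break; // freed enough; keep the rest
    evicted.push(entry.key);
    freed += entry.size;
  }
  return { evicted, freed };
}
```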

@davidsgrogan

Storage type: we would like both persistent and temporary to exist - where by temporary we mean no user prompt is needed, and persistent user prompt is needed.

I assume that you would also be ok with the browser granting persistence to your site without prompting the user? Or is there some reason I'm missing why you, as the site developer, want the browser to prompt the user before granting persistence?

When storage has reached limit (even though we aim not to, using the quota api) notify with an error, and we'd like to control what to prompt to the user if at all.

I think this is how IndexedDB works today, at least on chrome. Do other (browser, storage API) combos prompt the user? I think firefox used to prompt the user if a site wrote more than 50mb into IndexedDB. Is that what you're referring to? Does it still do that?

@shacharz

I assume that you would also be ok with the browser granting persistence to your site without prompting the user?

That's correct, we aim for as few prompts as possible. But sometimes the user does things in the app that simply require more offline storage, and that's when we move to persistent, knowing a prompt will show up. Still, I would like to know whether a browser can grant persistence without prompting, so I can take advantage of that. For example, for someone who already approved persistent storage in the past, I'd like to know that via the API.

I think this is how IndexedDB works today, at least on chrome.

With the FS API on Chrome, using temporary storage gives an exceeded-limit error. With persistent storage, previously, when storage was unlimited, reaching the free disk-space limit would just return unspecified write errors. Not sure what happens now; I need to check the data.
With IDB on FF (previous versions), you'd need to write 50MB to trigger the prompt to the user, and it would return a concise error if free disk space ran out.
The latest version of FF introduced default/persistent storage and pretty much changed everything. I think it goes something like this: using the previous syntax/API gives no prompt and allows you to store

max(min(availableStorage * 50% * 20%, 2GB), min(availableStorage, 10MB))
Using a different syntax will prompt the user and create persistent storage with unlimited quota.
AFAIK there's no quota-like API in FF yet, and they're waiting for this spec to implement it.

Is that what you're referring to?

I referred both to the FF prompt and the unspecified write error that used to be in Chrome.

@sicking

sicking commented Mar 19, 2015

It's a good point that we could allow exceeding the global quota and then fire a pressure event to try to "clean things up" to get down to the desired amount of storage.

However both application defined priorities, as well as a pressure event, would have the problem that the application can only reason about local priorities. I.e. an application can only compare some set of its own data with some other set of its own data and decide which of those two are less important to save.

So if I use gmail every single day, but there's a game website which I only visited once 3 months ago and have no plans to visit again, gmail can only reason about what of its data it should get rid of. Gmail can't realize that the right solution in this case is to delete all of the data from the game website, and none from gmail.

One solution that we talked about is to allow a website to create named storage areas of temporary storage. The contract would be that the browser is allowed to delete a named storage area without needing to delete other data from that origin.

This way the browser could do a global ordering of all named storage areas from all origins, and then delete the least recently used ones.

This way the browser could know that all of the storage areas from the game are older than any of the storage areas from gmail, and delete all of the game data.

A game could use this to store individual game levels in separate named areas. Or gmail could store groups of emails in individual storage areas. This way a level which hasn't been played in a game, or label which hasn't been viewed, will get deleted before data from another app which was more recently used.
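A sketch of the global ordering over named storage areas from all origins, which is what lets the browser drop the stale game's areas before touching gmail's. The area shape, the `persistent` flag, `evictNamedAreas` and the example origins are hypothetical.

```javascript
// areas: [{ origin, name, size, lastUsed, persistent }]
// Deletes whole named areas across all origins, least recently used first,
// and never touches areas covered by persistent storage.
function evictNamedAreas(areas, bytesNeeded) {
  const candidates = areas
    .filter((a) => !a.persistent)
    .sort((a, b) => a.lastUsed - b.lastUsed);
  const deleted = [];
  let freed = 0;
  for (const area of candidates) {
    if (freed >= bytesNeeded) break;
    deleted.push(`${area.origin}/${area.name}`);
    freed += area.size;
  }
  return deleted;
}
```

Because the ordering is global, gmail never has to reason about the game's data; the browser sees that every game area is older than every gmail area.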

We also debated expanding IndexedDB or the Cache API so that the website can opt in to allowing the browser to delete individual key/value pairs from a given database or a given cache, in order to avoid having to create lots and lots of separate named storage areas. I.e. to avoid google docs having to create a new named area for each document, or spotify having to create a new named area for each song it caches locally.

@kinu

kinu commented Apr 2, 2015

(Reviving this thread) It's true that each app can only reason about local priorities upon eviction events, but having eviction events doesn't necessarily mean we can't apply our good old LRU-like heuristics for the global eviction order. In the gmail vs. game case, the UA could just evict all the game data if it hasn't been touched for a long time, and then start asking apps like gmail that were used more recently to delete some of their data. So the steps could look like: 1. purge some origins that are not durable and have very low usage, in LRU order, and then 2. fire beforeevict/clearcache events if the UA still needs more space and all remaining apps look somewhat fresh.
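The two steps above could be planned like this. `planEviction`, the origin record shape and the `staleBefore` cutoff (one reading of "not touched for a long time") are hypothetical; durable origins are left alone in both steps.

```javascript
// origins: [{ origin, durable, usage, lastUsed }]
function planEviction(origins, bytesNeeded, staleBefore) {
  // Step 1: purge whole non-durable origins not used since `staleBefore`, LRU first.
  const stale = origins
    .filter((o) => !o.durable && o.lastUsed < staleBefore)
    .sort((a, b) => a.lastUsed - b.lastUsed);
  const purged = [];
  let freed = 0;
  for (const o of stale) {
    if (freed >= bytesNeeded) break;
    purged.push(o.origin);
    freed += o.usage;
  }
  // Step 2: if still short, fire beforeevict-style events at the remaining
  // (fresh) non-durable origins so they can clean up themselves.
  const notify =
    freed >= bytesNeeded
      ? []
      : origins
          .filter((o) => !o.durable && !purged.includes(o.origin))
          .map((o) => o.origin);
  return { purged, freed, notify };
}
```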

I can see that there's some benefit in named storage areas, but assuming that one origin typically represents one logical app, an app whose data was auto-purged will still need to identify what data was deleted in order to reconstruct its metadata (like indexes) and keep its internal data consistent, and that could be tough for apps if we had no notification events upon data deletion. Also, designing a named-area API that works for all possible storage APIs could be tricky and time-consuming. Eviction events seem to have some advantages here: their API surface is relatively small, they can just let the app do what needs to be done on data deletion, and the app could even do something smarter than deletion, like data compression.

@KenjiBaheux

+1 to @kinu.

I believe that a good solution should:

  1. only ask the user when neither the UA nor the apps can come up with a solution that resolves the storage pressure.
  2. avoid impacting the user experience (e.g. avoid having any scrambling for storage eviction when the user is installing/accessing an app, must not delete potential critical data without user's consent)

A. Anything temporary is fair game (no need to ask for the user's consent) so:

  1. The UA should first try to rationalize temporary storage usage. Order should be in increasing user value (less valuable first).
  2. The UA should give each app a chance (onbeforeeviction with an event indicating the type of storage being looked at) to rationalize its temporary storage use
  3. The UA might judge that the current app didn't make a reasonable contribution and delete all its temporary data.
  4. The UA would then move on to the next App.

B. For durable storage, if needed,

  1. The UA should ask Apps to rationalize their use of durable storage. Order should be in increasing user value (less valuable first).
  2. The UA should give each app a chance (onbeforeeviction with an event indicating the type of storage being looked at) to rationalize its durable storage use
  3. The UA might judge that the apps so far didn't make a reasonable contribution and prompt the user for input. Roughly, it could be something like:
    • info about the amount of storage to free up
    • a list of Apps ordered from lowest bang for the byte (the idea is to have the badly behaved/low user-value apps at the top)
    • show info about amount of storage used by the app (total, %)
    • relevant actions (delete all data, launch, "remove"?... )
  4. if needed, the UA should move to the next App and repeat from step 1.

Step B.3 is to avoid making regularly used apps pay for the incompetence of lesser-used apps, and to encourage developers to do the right thing.
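Part A of the procedure above could be sketched as follows. `rationalizeTemporary`, the app record shape, the synchronous `onbeforeeviction(type)` callback returning freed bytes, and the "freed at least half" threshold for a reasonable contribution are all hypothetical; a real UA would run this asynchronously. The function updates each app's `temporaryUsage` in place.

```javascript
// apps: [{ id, userValue, temporaryUsage, onbeforeeviction(type) -> bytes freed }]
function rationalizeTemporary(apps, bytesNeeded, minContribution = 0.5) {
  let freed = 0;
  const wiped = [];
  // A.1: visit apps in increasing order of user value.
  const ordered = [...apps].sort((a, b) => a.userValue - b.userValue);
  for (const app of ordered) {
    if (freed >= bytesNeeded) break;
    const before = app.temporaryUsage;
    // A.2: give the app a chance to clean up, telling it which storage type is at stake.
    const released = Math.min(app.onbeforeeviction("temporary"), before);
    app.temporaryUsage -= released;
    freed += released;
    // A.3: if the contribution looks unreasonable, delete the rest of its temporary data.
    if (released < before * minContribution) {
      freed += app.temporaryUsage;
      app.temporaryUsage = 0;
      wiped.push(app.id);
    }
    // A.4: move on to the next app.
  }
  return { freed, wiped };
}
```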

One thing I'm wondering about is the need for firing the after-the-fact "oneviction" event, because I'm worried bad actors would use it to grab back their storage. Is there a need for an after-the-fact event, and can we avoid the backfiring issue?

@annevk

annevk commented Apr 7, 2015

I revised https://wiki.whatwg.org/wiki/Storage significantly to be a bit more clear about multiple storage boxes per origin and how that could work.

The main goals I have are to define the underpinnings of all storage APIs, enable persistent storage in a way that is competitive with native, and provide more storage options for sites so that clearing is no longer an all-or-nothing for them.

I'm still not convinced that if we reach the point where we need to start asking the user, firing a dozen beforeeviction events is going to do much good.

@sicking

sicking commented Apr 7, 2015

The big thing that has changed in my thinking since we last worked on the quota API is that I think we should make it so that when a website requests persistent storage, and the user grants it, the storage policy changes for all data currently stored by that website.

So I don't think that the page should have to first request access to persistent storage, and then use some new API or new API syntax to write to that storage.

Instead, getting access to persistent storage should make all data written using the default syntax in IDB/localStorage/WebSQL/CacheAPI/Appcache suddenly become persistent.

This seems like the simplest model for pages.

I do think that we can additionally add new storage API syntax to allow pages to explicitly declare that certain data should be stored in a temporary storage area, which they can do before or after having gotten access to persistent storage.
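Under this model, whether a piece of data may be cleared depends only on the site's current mode and on whether the data was explicitly put in a temporary area, so a later grant automatically covers data written before it. `isEvictable` and the `area` field are hypothetical.

```javascript
// originMode: "temporary" | "persistent" (the site's current mode)
// item.area: "default" for data written with the usual IDB/Cache/etc. syntax,
//            "temporary" for data the page explicitly marked as disposable.
function isEvictable(originMode, item) {
  if (item.area === "temporary") return true; // explicit temporary data stays disposable
  return originMode !== "persistent"; // default data follows the site's mode
}
```

A song cached with the default syntax, `{ area: "default" }`, is evictable while the site is `"temporary"` and stops being evictable as soon as the site becomes `"persistent"`, without being rewritten.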

@sicking

sicking commented Apr 7, 2015

Crap, sorry, that was intended for a different issue. But it's still relevant for this issue.

@shacharz

@davidsgrogan Something like this, to know whether a prompt is going to be shown, could be great:
https://w3c.github.io/permissions/

@davidsgrogan

We plan on making the Permissions API support durable/persistent storage.


10 participants