
Provide cache.putAll() method #867

Closed
AshleyScirra opened this issue Apr 6, 2016 · 14 comments

@AshleyScirra

If you want to write a bunch of separately obtained request-responses to a cache (e.g. you make requests before you've decided which cache to put them in), there is cache.put(), but not cache.putAll(). This makes it hard to make the write atomic. Why not add something like cache.putAll(requests[], responses[])?
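For reference, the proposed shape sketched as a userland helper (hypothetical; the point of a native putAll() would be that the engine could commit the batch atomically, which a userland loop cannot):

```javascript
// Hypothetical userland putAll(): writes request/response pairs in parallel.
// Unlike a native implementation, the individual put() calls here are
// independent writes, so a crash mid-way can still leave a partial cache.
async function putAll(cache, requests, responses) {
  if (requests.length !== responses.length) {
    throw new TypeError('requests and responses must have the same length');
  }
  await Promise.all(requests.map((req, i) => cache.put(req, responses[i])));
}
```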

@jakearchibald (Contributor)

Duplicate of #823?

@jakearchibald (Contributor)

Out of curiosity, why was addAll insufficient here?

@AshleyScirra (Author)

Because addAll() requires you to have a cache open at the same time as your list of requests. If you don't have the cache until your list of requests has completed, you have to put() each response individually.

@jakearchibald (Contributor)

Why wouldn't you have a cache?

@asutherland

Verging off-topic (with apologies to Jake), but in the context of discussions about transactions, I'm very interested in the motivating use-case. Specifically, do the atomic transactions:

  • simplify reasoning about what must be in the cache given the presence of other requests/responses? (ex: It's known that "a", "b", and "c" are all put in as part of the same transaction, so "a" implies "b" and "c".)
  • simplify the logic related to putting things into the cache? (ex: by reducing promise book-keeping.)
  • other?

@AshleyScirra (Author)

AshleyScirra commented Apr 6, 2016

My intent is to make it easier to make things atomic. You could do this:

  1. open a cache
  2. call cache.addAll() which makes a bunch of requests and waits for them to complete

but my thinking was that if the requests fail or the browser session is terminated, you are left with an empty cache. If your versioning system works by caches.has(), it might think it's now up to date even though the cache is empty. Ideally we want to avoid this.

So I tried doing this:

  1. make a bunch of requests and wait for them all to complete
  2. open a cache
  3. write all responses

This means if step 1 fails or is cancelled, there is no empty cache left behind. Right now step 3 has to be a lot of cache.put() in parallel, which isn't atomic either, which is the intent behind suggesting cache.putAll(). Even then, there is a chance between steps 2 and 3 that something goes wrong and leaves behind an empty cache. At least network latency is not a part of this gap, just write speed.
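The fetch-first flow above can be sketched like this (a sketch; fetchFn is passed in so the helper stays self-contained, and the window between open() and the last put() is exactly the remaining non-atomic gap described):

```javascript
// Fetch everything before any cache exists, then open and write.
// If step 1 fails, no cache is ever created; step 3 is still a series
// of independent put() calls, which is what putAll() would address.
async function fetchThenCache(cacheStorage, urls, fetchFn) {
  // 1. make a bunch of requests and wait for them all to complete
  const responses = await Promise.all(urls.map((url) => fetchFn(url)));
  // 2. open a cache
  const cache = await cacheStorage.open('static-v2');
  // 3. write all responses
  await Promise.all(urls.map((url, i) => cache.put(url, responses[i])));
  return cache;
}
```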

I am wondering if it would help to have a single atomic "open cache and write this data" operation, but then I guess we're on to transactions?

@wanderview (Member)

A caches.move(old, new) could help with this use case. Basically write to a temporary cache name and then when everything is complete move (rename) it to the final cache name.
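If caches.move() existed (it does not today; the name and signature here are hypothetical), the idiom would look like:

```javascript
// Write into a temporary cache; a crash leaves only the temp cache behind,
// never a half-filled final cache. The hypothetical move() makes the final
// name come into existence only once every entry has landed.
async function installViaTempCache(cacheStorage, urls) {
  const temp = await cacheStorage.open('static-v2-temp');
  await temp.addAll(urls);
  await cacheStorage.move('static-v2-temp', 'static-v2');
}
```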

I'm starting to get the sense that the transactions might be worth exposing, though.

@asutherland

Transaction-wise, it seems like most of the use cases can be characterized as wanting to synchronize an entire bundle of versioned resources in an all-or-nothing fashion. Partial progress isn't a problem other than if it's accidentally perceived as completed. In fact, it seems beneficial for partial progress to be stored so that forward progress is always made, especially if partial progress keeps happening for reasons related to resource exhaustion.

caches.move (or I've seen rename used by Jake in other issues) seems like it establishes a perfect idiom for this without requiring transactions and the potentially user-hostile issues that could crop up.

Imagine a hypothetical game "foogame" where levels are characterized by a versioned JSON manifest consisting of paths, file lengths, and hashes of the files. The SW receives a request for "level2", and the pseudo-code goes like this:

  1. The front-end asks for level2/manifest.json first since manifest.json is the magic filename we use to do the following magic things[1].
  2. We invoke caches.open("level2-pending") and caches.open("level2"). There is no "level2-pending"; if there were and we didn't have some weird "level2-in-use" mutex somehow[1 again], we would have done caches.move("level2-pending", "level2") and then used what was level2-pending. But since it didn't exist, we just use level2 since it's coherently there.
  3. A version-check process is spun off (using waitUntil) which pulls the manifest.json out of the cache while also fetching the current level2/manifest.json from the network.
  4. caches.open("level2-next") is opened, it exists, so we know we started to sync a newer version of the level in some previous turn of the event loop.
  5. We pull the manifest.json out of "level2-next" stashed in it as the first step of the sync process last time. (Actually, what we did was new a Cache(), diffed the old manifest.json and the new manifest.json, finding any files that existed in both with the same hash, then putAll()[2] the new manifest and all of those already existing files from level2. This allows us to assume that if there's anything in level2-next that we've already stolen what we can from level2.)
  6. We find that the manifest.json in level2-next is the same as the one we just pulled down from the network, so we know that any progress in level2-next is reusable.
  7. We run keys() against the contents of level2-next and remove anything already in there from our to-do list.
  8. We issue separate add() calls for everything still on the to-do list, accumulating them into a list of Promises that we Promise.all(). This avoids all-or-nothing throwing away of progress.
  9. If that promise resolves, we're done, we can move(level2-next, level2-pending) or to level2 if there's no weird mutex thing. Maybe if we're thorough we first verify that all the entries in the cache have the proper lengths because of Jake's scary research about truncation.
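Steps 7 and 8 above, the resume-friendly core of the flow, might look like this (a sketch with hypothetical cache names; because each add() is an independent call, one failure does not discard the progress the others already wrote):

```javascript
// Resume a partial sync: skip anything already present in the pending
// cache, then fetch only the remainder. Completed add() calls stay in
// the cache even if the overall Promise.all() rejects, so the next
// attempt has less to do.
async function resumeSync(cacheStorage, manifestUrls) {
  const next = await cacheStorage.open('level2-next');
  const done = new Set((await next.keys()).map((req) => req.url));
  const todo = manifestUrls.filter((url) => !done.has(url));
  await Promise.all(todo.map((url) => next.add(url)));
  return todo;
}
```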

1: re: magical mutex thing or coordination amongst multiple separate fetch events. I'm still coming up to speed about Service Workers and idioms, so I'm not clear if there's a solution already in place for this case. That is, I'm aware that this could theoretically be handled by rev'ing the service-worker and using the installing/installed/activating/etc. state machine, but that assumes everything is sliced up into nice bite-sized pieces. I'm doubting most developers would be on board with this.

The thing about the proposed transaction model where transactions can stay alive arbitrarily is that it seems like a backdoor mechanism to introduce mutexes/blackboards for this coordination process at the expense of massive complexity for the Cache API. It seems far better to be able to create an explicit API to support this instead. For example, a particularly excessive one with weak-ish references to clients so that entries can automatically disappear would be:

clientKeyedMaps['levels'].set(event.clientId, 2);
const levelUsed = (level) => [...clientKeyedMaps['levels'].values()].some(x => x === level);

2: Transaction-wise, I do think it makes sense to enable more control of creating a single list of CacheBatchOperation dictionaries for a single invocation of Batch Cache Operations. Exposing a wrapper so that multiple deletes and puts could be placed in there doesn't increase complexity. This avoids developers needing to reason about transactions stacking up and/or the nightmarish potential interactions of fetching/incumbent records.

@jakearchibald (Contributor)

@AshleyScirra

  1. make a bunch of requests and wait for them all to complete
  2. open a cache
  3. write all responses

This really sounds like addAll.

caches.open('static-v2').then(c => c.addAll(urls)).catch(err => {
  // if you don't want to leave the empty cache there,
  // delete it and only then rethrow:
  return caches.delete('static-v2').then(() => { throw err; });
});

@jakearchibald (Contributor)

I'm not against caches.move(currentName, newName), but many of these use-cases work better by versioning caches.
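The versioned-cache pattern referred to here keeps the old cache serving until the new one is fully populated; its cleanup half can be sketched as follows (a sketch, assuming the app bumps the cache name, e.g. 'static-v1' → 'static-v2', on each deploy):

```javascript
// Typically run during the service worker's 'activate' event, once the
// new version's cache is complete: delete every cache whose name is not
// the current one.
async function deleteOldCaches(cacheStorage, currentName) {
  const names = await cacheStorage.keys();
  await Promise.all(
    names
      .filter((name) => name !== currentName)
      .map((name) => cacheStorage.delete(name))
  );
}
```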

@NekR

NekR commented Apr 8, 2016

@jakearchibald what happens if the browser is terminated (e.g. crashes) while addAll() is working? Does the browser keep the 'empty' cache (no entries) or auto-delete it? (just asking)

@jakearchibald (Contributor)

Keeps. This is a good case for transactions.

@NekR

NekR commented Apr 18, 2016

Here is a possible polyfill for this problem:

function putAll(cache, entries) {
  const operations = entries.map((entry) => {
    return cache.put(entry[0], entry[1]);
  });

  return Promise.all(operations).then(() => {
    // Sentinel entry: only written once every put() has resolved.
    return cache.put('/__data_is_safe__', new Response(''));
  });
}

function hasCache(storage, cacheName) {
  // CacheStorage.open() takes only the name and resolves with the cache.
  return storage.open(cacheName).then((cache) => {
    return cache.match('/__data_is_safe__');
  }).then((response) => {
    return !!response;
  });
}

@jakearchibald (Contributor)

This is covered by the transactions proposal
