Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement CacheStorage? #2071

Closed
jimmywarting opened this issue Apr 14, 2023 · 24 comments · Fixed by #2076
Closed

implement CacheStorage? #2071

jimmywarting opened this issue Apr 14, 2023 · 24 comments · Fixed by #2076
Labels
enhancement New feature or request

Comments

@jimmywarting
Copy link
Contributor

I was wondering if you would be willing to look into implementing the Cache Storage api

it was something i was looking into in the past... but it got abandoned.
it is pretty much tightly coupled with Request/Response classes

and i would like to use it to avoid having to make extra requests if i have the key (Request) cached locally.

it's also in fact tightly coupled with the cache option in Request too.
it would be neat if this could just work with some default storage and Request / Responses where saved locally on the disc somewhere.

fetch(url, {
  cache: "no-cache", // *default, no-cache, reload, force-cache, only-if-cached
})

manual working with caches works to...

    caches.match(request).then((response) => {
      // caches.match() always resolves
      // but in case of success response will have value
      if (response !== undefined) {
        return response;
      } else {
        return fetch(event.request)
          .then((response) => {
            // response may be used only once
            // we need to save clone to put one copy in cache
            // and serve second one
            let responseClone = response.clone();

            caches.open("v1").then((cache) => {
              cache.put(event.request, responseClone);
            });
            return response;
          })
          .catch(() => caches.match("/gallery/myLittleVader.jpg"));
      }
    })

(taken from mdn)

maybe caches.open('node:undici') could store the default request / responses from fetch when using the cache option?

@jimmywarting jimmywarting added the enhancement New feature or request label Apr 14, 2023
@ronag
Copy link
Member

ronag commented Apr 14, 2023

I don't have time to do it myself but I'd be happy to see something like this added.

@KhafraDev
Copy link
Member

I'll look into it.

@KhafraDev
Copy link
Member

I believe we would need sqlite or some alternative to implement this correctly, which makes it a non-starter: "Furthermore, because CacheStorage requires file-system access...". Deno also uses sqlite for this.

@jimmywarting
Copy link
Contributor Author

Oh, didn't know deno had cache storage, that is cool :)
then i think NodeJS should also definitely have it too!

@KhafraDev
Copy link
Member

If node was to implement some form of sqlite database this would be actionable, but we can't really do anything better than an in-memory cache (which is worse than a userland alternative).

@ronag
Copy link
Member

ronag commented Apr 14, 2023

RocksDB with the BlobDB variant would be a perfect fit IMHO

@ronag
Copy link
Member

ronag commented Apr 14, 2023

@mcollina Any thoughts? Would vendoring a db into node core even be an option?

@jimmywarting
Copy link
Contributor Author

I think it would be good if the nodes http-loader could utilize fetch + CacheStorage.
But also for other things such as caching static api responses like some config etc.

@KhafraDev
Copy link
Member

I think it would be useful, but better as another library. Nothing requires fetch internals afaict so it should be possible. Adding a postinstall script and a million build dependencies doesn't seem like a good idea for undici.

@jimmywarting
Copy link
Contributor Author

jimmywarting commented Apr 14, 2023

... but better as another library. Nothing requires fetch internals afaict so it should be possible

What if you want to use: fetch(url, { cache: 'only-if-cached' })?
fetch would have to do something...

@KhafraDev
Copy link
Member

I was talking about the CacheStorage implementation itself, not really integrating it into fetch, which might need fetch internals. There are a number of fetch options that are either ignored or have no effect, so setting cache won't do anything. It is also possible for another library to wrap undici's fetch and implement the cache option themselves, similar as to what's been done with cookie jars and whatnot.

@KhafraDev
Copy link
Member

Here's what I came up with before I got tired of working on it: https://github.com/KhafraDev/node-cachestorage

It implements CacheStorage and Cache. None of Cache is implemented other than matching so you can't really test it out. If node ever got a database of some sort I'd probably copy this implementation into undici so feel free to contribute 😄.

@mcollina
Copy link
Member

To be honest, I don't think a naive implementation needs a database for this. We could just build it on top of a temporary folder and hashed URLs.

@KhafraDev
Copy link
Member

yeah, but then it's a worse version of something that can be done elsewhere. It also adds a lot more complexity to an already complicated (badly written) spec.

@jimmywarting
Copy link
Contributor Author

Having a IndexedDB would be something useful.
I know it's not the absolut loveliest api \w callbacks and what not. but it sure can save a lot of things structuralClone is able to clone. it includes: Circular ref, Object Literals, BigInts, Map, Set, ArrayBuffer, TypedArray, Blob, Files, Error, regex, FileSystem handles and some crypto stuff, etc.

and other databases are not so good at saving javascript objects that are structural cloneable. it goes very much hand in hand with the v8.Serializer to make it a searchable.

I could definitely see me building a CacheStorage on top of IndexedDB.
There are quite a bunch of npm package that expects that it exist so it can store curtain things without really having to care about the environment it's operating in. it's almost like how every android app have a built in nosql database
So it is much easier to build polyfills for this kind of stuff.

@ronag
Copy link
Member

ronag commented Apr 15, 2023

@benjamingr does implementing a db make us more web compliant?

@jimmywarting
Copy link
Contributor Author

jimmywarting commented Apr 15, 2023

Now when i think about it... @mcollina have a good argument about using the file system.
Responses could be very large so being able to pipe the body to/from the disc without allocating much memory would also be useful. some ppl download pretty large stuff.

and doing things such as:

const responseClone = response.clone()
cache.put(request, responseClone)
return response

lets you operate on it much earlier in parallel then if you had to put everything in a database and wait for it to be copied over with a single ArrayBuffer or something

The cache storage pretty much lets you iterate over all Request/Responses and sees them.
so if the Responses could be constructed with a lazy stream or heck even a disc based blob then that would be a plus

So a file based storage would not actually be that bad if it's stream/memory friendlier

@KhafraDev
Copy link
Member

I mentioned it previously but the CacheStorage/Cache spec is not written as concisely/cleanly as the fetch spec. It's actually much worse than the websocket spec, which is impressive. None of the steps are clear and it feels like steps are skipped over/not implemented (look at Cache.addAll, the spec doesn't mention caching the responses but it expects you to). Adding another layer of complexity here makes it even harder to implement.

Referencing Deno: their implementation is massively outdated and doesn't include many methods, either on Cache or CacheStorage. They're missing Cache.prototype.add/Cache.prototype.addAll, CacheStorage.prototype.matchAll, and probably a few others. It's also behind a flag I suspect, because logging globalThis.caches returns undefined.

Not to mention that the only way to open file-backed blobs is through fs.openAsBlob which was added in v19.8.0 and is still experimental.


However if you want to implement it (could be a fun weekend project), I have a branch that implements the outlines of both Cache and CacheStorage, with webidl conversions and everything needed.

@ronag
Copy link
Member

ronag commented Apr 15, 2023

I think using e.g. the chromium implementation is the best spec 🤷‍♂️

@jimmywarting
Copy link
Contributor Author

maybe it dose not necessary have to be just a disc- based blob. openAsBlob would take a long time until everybody could actually start using it... Response / fetch just probably see blobs in the init and convert the body to streams anyway? body = blob.stream() or something and Response also support streams and async iterables too. so it could maybe just be as simple as just doing something like new Response(fs.createReadStream(path))

@KhafraDev
Copy link
Member

However we also need to cache the body (and request headers and method) because each cache match is meant to return a new Response.

I took a look at the chromium implementation and it looks like each cache entry creates 3 temporary files (1 for the body, 1 for headers, and 1 for other request stuff). But since it's in c++ it's kinda hard to translate that to js.

@jimmywarting
Copy link
Contributor Author

the Request body isn't saved, right? that's how i remembered it... only the request header & request url is saved?
would it not just be possible to save just 2 files... one for Response body and then a json with request and response info...

// 1.json
{
  url: "https://example.com/cat.png"
  request: {
    headers,
    url
  },
  response: {
    headers,
    status,
    statusText,
    url: response.url // the final response url after redirects
  },
}
// 1.bin
the hole response body...

@KhafraDev
Copy link
Member

KhafraDev commented Apr 15, 2023

Yep, request body doesn't need to be saved, I was referring to the response body there. I'm not sure why they use 3 files, maybe I just misread because I can't find the comment again.

Their implementation creates a directory for each cache (ie. await caches.open('v1') will create a v1 directory, which is where the files are then saved. But we also don't want to name the files 1.json, 1.bin etc. because it wouldn't be very performant to do a cache lookup.

Some limitations/design details/questions:

  • We might not be able to create directories/files based on the hostname. On windows, for example, this is limited to 255 or 256 characters. It's also possible just to throw an error is the cache name is too long. Maybe we can hash it in a way that it's guaranteed to fit within that length?
  • One url can be cached multiple times depending on the vary header, so we should create new folders for each url (or move to a separate folder when 1 url is cached multiple times?).

@benjamingr
Copy link
Member

@ronag

@benjamingr does implementing a db make us more web compliant?

Not as far as I'm aware but it would make it easier to write code that works both in browsers/node which is the point of web compliance in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants