
Caching of external api calls from within SvelteKit #3642

Closed · dreitzner opened this issue Jan 31, 2022 · 16 comments

Comments

@dreitzner (Contributor)

Describe the problem

Here's the situation:

On my new personal website, I wanted to call the dev.to API to show the links in the blog section.

I made it work by adding a JSON endpoint and including it in the SSR.

The problem I see is that with every request to /blog the dev.to API will be called again (with a lot of traffic they might block me).

So if I could cache the response for about an hour, I could drastically reduce the API calls even when a lot of traffic hits.

As far as I understand, Cloudflare has some functionality like that.

Describe the proposed solution

It would be nice to have something similar to Cloudflare, where we use cache control on the request/response.

Alternatives considered

No response

Importance

nice to have

Additional Information

No response

@RickTheHat

I know that other SSG projects have this and it's wonderful; the one I'm specifically speaking of is 11ty / Eleventy: Quick Tip #009—Cache Data Requests

@Rich-Harris (Member)

You have a variety of options, depending on exactly what behaviour you want. You can return a cache-control header from your endpoint...

export async function get() {
  const res = await fetch(`https://dev.to/api/articles?username=dreitzner`);

  return {
    headers: {
      'cache-control': 'public, max-age=3600'
    },
    body: {
      articles: await res.json()
    }
  }
}

...or, if you wanted to respect the original cache headers (probably not, since it looks like the dev.to API uses public, no-cache), and you don't need to munge the response in any way, you can just proxy it...

export function get() {
  return fetch(`https://dev.to/api/articles?username=dreitzner`);
}

...or, since the API doesn't need a private key, you could fetch the data in load and use maxage, which will hit the API during SSR but then cache the rendered page, and hit the API directly for client-side navigation rather than going via your endpoint (which will result in more API hits if people navigate between blog posts, but from many IP addresses):

<script context="module">
  export async function load({ fetch }) {
    const res = await fetch(`https://dev.to/api/articles?username=dreitzner`);

    return {
      maxage: 3600,
      props: {
        articles: await res.json()
      }
    };
  }
</script>

Personally I'd probably opt for the first one, since it means you have the opportunity to slim down the response to just the bits you need. Though I'd typically pick a much lower maxage (a few minutes at most, rather than an hour), so that you can handle getting slashdotted (oops, might have just aged myself) while avoiding stale content.
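
A minimal sketch of that first option with a slimmed-down payload (the dev.to field names and the shorter max-age here are assumptions):

export async function get() {
  const res = await fetch(`https://dev.to/api/articles?username=dreitzner`);
  const articles = await res.json();

  return {
    headers: {
      // a few minutes rather than an hour
      'cache-control': 'public, max-age=300'
    },
    body: {
      // keep only the fields the blog list needs (field names are assumptions)
      articles: articles.map(({ title, url, published_at }) => ({ title, url, published_at }))
    }
  };
}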

@dreitzner (Contributor, Author) commented Feb 8, 2022

@Rich-Harris thank you for your detailed answer.

  1. it can't be in the load function => there is an API key that dev.to wants as a header, and I want to respect that ;)
  2. as my blog writing is very limited (full-time job and 4 kids), I would go so far as to cache the response for a day, as it would only augment other data
  3. if I'm not mistaken, cache headers will only be honored by the browser and not inside the SvelteKit/serverless function, which would result in calling the API once for each user

The approach I would like to see would be on the platform side, with a standard interface/approach inside SvelteKit that gets transformed by the adapter.

If I'm not mistaken, Cloudflare approaches this by adding a parameter (cf) to their fetch options:

const devTo = await fetch(
	'https://dev.to/api/articles?username=dreitzner',
	{
		headers: {
			'api-key': `${import.meta.env.VITE_DEVTO_API_KEY}`,
		},
		mode: 'cors', // no-cors, *cors, same-origin
		credentials: 'same-origin', // include, *same-origin, omit
		cf: {
			// Always cache this fetch regardless of content type
			// for a max of 60 seconds before revalidating the resource
			cacheTtl: 60,
			cacheEverything: true,
		},
	}
)

This should be cached inside the worker context, and the API should not be called again until the cache entry expires.

Another good example use case (from my daily business) would be product data that updates only once a day.
Calling the API again and again would add unnecessary overhead and slow down the response from the serverless function.

I hope I articulated what I would like to achieve well enough :)

Thanks again

@Tam2 commented Feb 11, 2022

We had a similar requirement: we call data from our API that doesn't change much and didn't want to load it from the API each time. We ended up using the https://github.com/jaredwray/keyv library within our endpoint, with something like this:

import Keyv from 'keyv';
const TTL_MINS = 60;
const TTL_MS = TTL_MINS * 60 * 1000;
const endpointCache = new Keyv({ namespace: 'xx', ttl: TTL_MS });

export async function get({ params }) {
  // Check cache first...
  const cache = await endpointCache.get(params.endpoint);

  if (cache) {
    console.log(`Returning from cache..`);
    return {
      body: {
        ...cache,
      },
    };
  }

  // `api` is the project's own client for the upstream service (not shown here)
  const data = await api.get(`taxonomy/${params.endpoint}`);
  await endpointCache.set(params.endpoint, data);
  return {
    body: {
      ...data,
    },
  };
}

@Bandit commented Feb 15, 2022

I have a similar issue where I want the server to cache the response, but my server is, well, serverless, so I can't use something like keyv as above (at least to my knowledge).

I like the idea of something writing out a static file that all client-side pages access, with the file periodically rebuilt as the data changes or goes stale. I'm not sure how to do that, though, and it's probably out of scope for SvelteKit to handle, I guess.

@Theo-Steiner (Contributor)

I just learned about the "Incremental Static Regeneration" ergonomics in Next.js and they truly are on the next level.
You just return revalidate: 30 from your getStaticProps, and if a request hits the cache after 30 seconds, the route is regenerated in the background; only if that succeeds is the newly generated route served... If you want to do this on demand, you can omit revalidate in getStaticProps and use endpoints as webhooks to revalidate programmatically after you change something inside your CMS... 🤯
This honestly blows my mind, and I wonder if something like this would be possible to implement for SvelteKit as well?
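
For reference, the Next.js API being described looks roughly like this (the page and prop names are just placeholders):

// pages/blog.js (Next.js, not SvelteKit)
export async function getStaticProps() {
  const res = await fetch('https://dev.to/api/articles?username=dreitzner');

  return {
    props: {
      articles: await res.json()
    },
    // regenerate the page in the background at most once every 30 seconds
    revalidate: 30
  };
}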

@Rich-Harris (Member) commented Apr 25, 2022

3. if I'm not mistaken, cache headers will only be honored by the browser and not inside the SvelteKit/serverless function, which would result in calling the API once for each user

It depends. Cache-Control: max-age=3600 will be respected by browsers, but Cache-Control: public, max-age=3600 will be respected by CDNs as well. In other words, if you have a CDN in front of your app, then only the first request in a one-hour window should hit the origin server. (Subsequent requests will have an additional Age header, so that a response served 20 minutes into the window will only be cached by the browser for the remaining 40 minutes.)
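
As a plain illustration of that arithmetic (the numbers are only examples, nothing SvelteKit-specific):

// cache-control: public, max-age=3600  -> fresh for an hour in total
const maxAge = 3600;
// Age: 1200 -> the CDN has already held this response for 20 minutes
const age = 1200;
// so the browser may only reuse it for the remaining 40 minutes
const remainingBrowserTtl = maxAge - age; // 2400 seconds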

I like the idea of something writing out a static file and then all client-side pages access this file, and periodically this file gets rebuilt as data changes/goes stale.

I just learned about the "Incremental Static Regeneration" ergonomics in Next.js and they truly are on the next level.

Yes, we definitely want to adopt ISR (particularly on-demand ISR, where revalidation is triggered by a webhook notifying of a change, rather than requests). It's something we hope to implement after 1.0: #661

@Rich-Harris (Member)

Going to close this as I don't think there's much we can or should do, short of #661, beyond the existing ability to set cache-control headers. Though I will note that it ought to be possible to do fetch(url, { cf }) in endpoints in the Cloudflare Workers context, since we're just using the platform-provided fetch implementation.
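
In other words, something along these lines ought to work under the Cloudflare adapter (an untested sketch; the cf options are lifted from the earlier comment):

export async function get() {
  // cf options are passed straight through to the platform-provided fetch
  const res = await fetch('https://dev.to/api/articles?username=dreitzner', {
    cf: {
      cacheTtl: 3600,
      cacheEverything: true
    }
  });

  return {
    body: {
      articles: await res.json()
    }
  };
}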

@kalepail commented May 21, 2022

I'm dynamically generating images as well as JSON data, and I've tried setting Cache-Control: public, max-age={maxAge}, but I'm noticing an absence of the CF-Cache-Status header in all responses. I'm serving from a custom domain, so I would expect to be hitting the Cloudflare CDN cache. I've used Workers outside of Svelte and this site's functionality, so I know how it's supposed to work, which leaves me a little lost as to what's going on and why resources returned with a Cache-Control: public, max-age={maxAge} header aren't being picked up by Cloudflare's cache.

Here's an example image resource served on a custom domain with the Cache-Control header set.
https://smol.xyz/glyphs/585a3c0c05f9ea1dc037b0726f151e6bc375a1714097444a560147cfa8683624/lrg.png

I would expect to see a CF-Cache-Status header in the response and for it to be HIT after the initial request.

https://developers.cloudflare.com/cache

You'll notice that https://smol.xyz/favicon.png does behave this way as it's served as a static uploaded image. So why does that work while the dynamic images don't?

@kalepail

Answering my own question:

You have to use the Cloudflare Workers Cache API (https://developers.cloudflare.com/workers/runtime-apis/cache/) per route, and then make sure you pass along the headers from any retrieved cache response (the piece I was missing), which include the CF-Cache-Status header.

It would be nice (and maybe possible, I just don't know where) to intercept all incoming Worker requests and inspect them for a cache key. I tried the hooks handle method, but it doesn't have access to the caches variable, which I think makes sense if I understand when and where the handle method fires.

@kalepail commented May 21, 2022

For anyone curious stumbling upon this in the future: first of all, "Hi!" 👋
Here are some helper methods I'm using to make interacting with the cache across lots of endpoints more sane.

export async function getCache(request, bodyType) {
  const cache = caches.default
  const cacheUrl = new URL(request.url)
  const cacheKey = new Request(cacheUrl.toString(), request)

  // look the request up in Cloudflare's default cache
  const response = await cache.match(cacheKey)

  if (response) return {
    cacheKey,
    cached: {
      // read the body with the requested method ('json', 'arrayBuffer', ...) and convert it to a Buffer
      body: await response[bodyType]().then(Buffer.from),
      headers: response.headers
    }
  }

  return { cacheKey }
}

export function setCache({
  context,
  key,
  body,
  headers
}) {
  const cache = caches.default

  // write to the cache without blocking the response
  context.waitUntil(cache.put(key, new Response(body, { headers })))
}

Then use it with something to the tune of:

// inside an endpoint handler that receives ({ request, platform });
// `body` and `headers` are assigned below depending on the cache result
const { cacheKey, cached } = await getCache(request, 'arrayBuffer')

if (cached) {
  body = cached.body
  headers = cached.headers
}

else {
  const { context } = platform
  
  // create your body
  
  headers = {
    'Content-Type': mimeType,
    'Content-Length': body.length,
    'Access-Control-Allow-Origin': '*',
    'Cache-Control': `public, max-age=${maxAge}`
  }

  setCache({
    context,
    key: cacheKey,
    body,
    headers,
  })
}

return {
  status: 200,
  body,
  headers,
}

@kristjanmar

You have a variety of options, depending on exactly what behaviour you want. You can return a cache-control header from your endpoint...

export async function get() {
  const res = await fetch(`https://dev.to/api/articles?username=dreitzner`);

  return {
    headers: {
      'cache-control': 'public, max-age=3600'
    },

I have added this cache-control header to my page's index.ts endpoint, but I can't see that it is having any effect. My site is hosted on Vercel and the response headers say:

cache-control: public, max-age=0, must-revalidate
x-vercel-cache: MISS

If I go to /page/__data.json it returns the correct "public, max-age=3600" cache headers. But visiting /page/ directly does not seem to trigger the cache.

Visiting the __data.json returns a response within 50ms, so the cache is working for the actual endpoint. But visiting the page itself takes several hundred ms, which is consistent with a completely uncached page.

Is there some other recommended way to cache pages/routes that are rendered with data from endpoints with the same name?

As in:

/page/index.ts -- fetches data and returns it as props
/page/index.svelte -- the page

@multiplehats

For those that stumble upon this issue:

I was doing Cache-Control instead of cache-control. Wasted a good hour on that.

Cheers!

@blujedis commented Jan 2, 2023

@multiplehats bro anyone that says they haven't been there is straight lying through their teeth...my fav is when it's something you've done before, wanna beat myself senseless with a frozen tuna lol.

@m0rtalis

@multiplehats @blujedis Aren't HTTP headers case-insensitive? What's the difference?

@multiplehats

@multiplehats @blujedis Aren't HTTP headers case-insensitive? What's the difference?

I thought so too, but that happened to solve it for me at the time. This was a long time ago though.
