This repository has been archived by the owner on Jan 4, 2023. It is now read-only.

Add a shorter timeout for fetches in custom metrics #192

Closed
rviscomi opened this issue Sep 15, 2020 · 13 comments · Fixed by #193

Comments

@rviscomi
Member

An asynchronous fetch in a custom metric can take ~30 seconds to time out. Rather than waiting for the promise to reject, race the fetch against a shorter timeout of ~5-10 seconds and settle the promise with whichever of the two async events happens first.

This would help ensure that the custom metrics don't interfere as much with the overall crawl rate, as 30 seconds * 7 million URLs * 2 runs per client (desktop, mobile) definitely adds up.
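For a rough sense of scale, the multiplication above can be worked out directly. This is a back-of-the-envelope sketch that assumes every page runs one fetch that hangs for the full timeout, and reads the 2 runs as one desktop plus one mobile test per URL. The totals are cumulative agent time spread across many parallel test agents, not wall-clock crawl time.

```javascript
// Worst-case assumption: every test waits the full timeout on one hanging fetch.
const URLS = 7_000_000;  // "7 million URLs" from the issue
const RUNS_PER_URL = 2;  // assumed: one desktop run + one mobile run

// Total seconds spent waiting if every fetch hits the timeout.
function worstCaseSeconds(timeoutSeconds) {
  return timeoutSeconds * URLS * RUNS_PER_URL;
}

for (const timeoutSeconds of [30, 5]) {
  const seconds = worstCaseSeconds(timeoutSeconds);
  const days = Math.round(seconds / 86400);
  console.log(`${timeoutSeconds} s timeout: ${seconds} s, about ${days} agent-days`);
}
// 30 s timeout: 420000000 s, about 4861 agent-days
// 5 s timeout: 70000000 s, about 810 agent-days
```

Under these assumptions, cutting the timeout from 30 s to 5 s removes five sixths of the worst-case waiting.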

Here are the instances of fetch in the custom metrics: https://github.com/search?q=fetch+repo%3AHTTPArchive%2Flegacy.httparchive.org+path%3Acustom_metrics&type=Code&ref=advsearch&l=&l=

@Tiggerito
Contributor

I'm happy to reduce it for the robotstxt one. If it takes 5 seconds to return a simple text file, I'd classify that as an error on its own.

@rviscomi
Member Author

Thanks @Tiggerito. Are you able to take all 3 files? Should be the same pattern in each.

@Tiggerito
Contributor

> Thanks @Tiggerito. Are you able to take all 3 files? Should be the same pattern in each.

I'd have to research how to do it. This looks like a neat solution that could be put in a shared place. I think I saw that we can include js files?

https://www.lowmess.com/blog/fetch-with-timeout/

I could test it with my metric first?

@rviscomi
Member Author

Here's a prototype of the JS I had in mind:

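// Stand-ins: a 30 s "fetch" and a 5 s timeout (named slowFetch so the global fetch is not overwritten)
const slowFetch = new Promise(resolve => setTimeout(resolve, 30000, 'fetch'));
const timeout = new Promise(resolve => setTimeout(resolve, 5000, 'timeout'));
Promise.race([slowFetch, timeout]).then(value => console.log(value)); // logs "timeout" after ~5 s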

Shouldn't need to include external JS to do it.

@Tiggerito
Contributor

I tested using this (from the article I referenced) in WebPageTest and it worked well:

const fetchWithTimeout = (uri, options = {}, time = 5000) => {
  const controller = new AbortController()
  const config = { ...options, signal: controller.signal }
  setTimeout(() => {
    controller.abort()
  }, time)
  return fetch(uri, config)
    .then((response) => {
      if (!response.ok) {
        throw new Error(`${response.status}: ${response.statusText}`)
      }
      return response
    })
    .catch((error) => {
      if (error.name === 'AbortError') {
        throw new Error('Response timed out')
      }
      throw new Error(error.message)
    })
}

If I set a small timeout it returns:

{"message":"Response timed out","error":{}}

Which we could easily alter. Do we have a standard thing to return when custom metrics fail?
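The `"error":{}` in that output is standard JavaScript behaviour rather than a bug in the snippet: an Error's own `message` property is non-enumerable, so `JSON.stringify` skips it. A minimal sketch of copying the readable fields out before serializing; `toSerializableError` and the `{ name, message }` shape are hypothetical, not an existing HTTP Archive convention for failed custom metrics.

```javascript
// An Error's own `message` property is non-enumerable, so JSON.stringify skips it.
const err = new Error('Response timed out');
console.log(JSON.stringify({ message: err.message, error: err }));
// {"message":"Response timed out","error":{}}

// Hypothetical helper: copy the readable fields into a plain object first.
function toSerializableError(error) {
  if (error instanceof Error) {
    return { name: error.name, message: error.message };
  }
  // Non-Error values (e.g. a rejected 'timeout' string) are stringified.
  return { name: 'Error', message: String(error) };
}

console.log(JSON.stringify({ error: toSerializableError(err) }));
// {"error":{"name":"Error","message":"Response timed out"}}
```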

One advantage of this pattern is that it cancels the request on timeout, so no risk of having forgotten requests continuing to be processed.

It's also easy to plug in. Add the code and change the fetch(url) to a fetchWithTimeout(url), and it works.
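To illustrate the cancellation point: `Promise.race` only decides which result the caller sees first, and the losing promise keeps running. The sketch below uses timers as hypothetical stand-ins for a slow request (no network involved) and contrasts that with an `AbortController`, whose `abort()` lets the pending work actually be stopped. `abortableRequest` is a made-up name for illustration.

```javascript
// Hypothetical stand-in for a slow request: a 200 ms timer that records completion.
let slowRequestFinished = false;
const slowRequest = new Promise(resolve =>
  setTimeout(() => {
    slowRequestFinished = true;
    resolve('response');
  }, 200));
const timeout = new Promise(resolve => setTimeout(resolve, 50, 'timeout'));

// Promise.race settles with the 50 ms timeout...
const raced = Promise.race([slowRequest, timeout]).then(winner => {
  console.log(winner, slowRequestFinished); // timeout false
  return winner;
});

// ...but the slow request is not cancelled and still completes later.
slowRequest.then(() => console.log('slow request finished anyway'));

// With an AbortController the pending work can actually be stopped.
function abortableRequest(ms, signal) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(resolve, ms, 'response');
    signal.addEventListener('abort', () => {
      clearTimeout(timer); // the simulated work is cancelled, not just ignored
      reject(new Error('aborted'));
    }, { once: true });
  });
}

const controller = new AbortController();
setTimeout(() => controller.abort(), 50);
const aborted = abortableRequest(200, controller.signal).catch(err => err.message);
aborted.then(message => console.log(message)); // aborted
```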

@rviscomi
Copy link
Member Author

Well, not to play favorites (I'm totally playing favorites 😁), but the Promise approach can also be implemented as a fetchWithTimeout function and is much simpler:

function fetchWithTimeout(url) {
  var network = fetch(url);
  var timeout = new Promise(resolve => setTimeout(resolve, 5000, 'timeout'));
  return Promise.race([network, timeout]).then(r => {
    if (r == 'timeout') return Promise.reject(r);
    return r;
  });
}
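The race approach generalizes to any promise. A minimal sketch, assuming a hypothetical `withTimeout` helper, that rejects with an `Error` rather than the string `'timeout'` and clears the timer once either side settles, so a pending timer does not linger after a fast response:

```javascript
// Hypothetical helper: bound any promise (e.g. fetch(url)) by a timeout.
function withTimeout(promise, ms = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    // The executor runs synchronously, so `timer` is set before the race starts.
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Settle with whichever finishes first, then clear the timer either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: withTimeout(fetch(url), 5000).then(response => response.text());
```

As with the function above, the underlying fetch is not cancelled when the timeout wins; it simply stops being awaited.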

@Tiggerito
Contributor

Now I understand promises more 😀

I'll raise your simplification:

function fetchWithTimeout(url) {
  var controller = new AbortController();
  setTimeout(() => {controller.abort()}, 5000);
  return fetch(url, {signal: controller.signal});
}
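A slightly extended sketch of the AbortController version, assuming callers may want to pass their own fetch options and a configurable timeout (the extra parameters are illustrative additions, not part of the function agreed above). It also clears the timer once `fetch` settles.

```javascript
// Sketch: AbortController timeout that forwards caller options (hypothetical extension).
function fetchWithTimeout(url, options = {}, ms = 5000) {
  const controller = new AbortController();
  // Abort the request if it has not settled within `ms` milliseconds.
  const timer = setTimeout(() => controller.abort(), ms);
  // Note: this replaces any `signal` the caller passed in `options`.
  return fetch(url, { ...options, signal: controller.signal })
    .finally(() => clearTimeout(timer));
}

// Usage (inside an async custom metric):
// const response = await fetchWithTimeout('/robots.txt', {}, 5000);
```

Clearing the timer when `fetch` resolves means the timeout covers only the wait for response headers; leaving the timer running, as in the version above, also aborts a slow body read. In runtimes that support it, `AbortSignal.timeout(ms)` creates an equivalent signal in one call, e.g. `fetch(url, { signal: AbortSignal.timeout(5000) })`.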

@rviscomi
Member Author

Hey @Tiggerito sorry for the delay, your function LGTM. Are you able to apply that to each fetch instance? Hoping to get this in today before the October crawl starts.

@Tiggerito
Contributor

Looks like today is an HTTP Archive day. Will get onto it.

@Tiggerito
Contributor

Testing the code now.

third-parties.js contains a fetch, but it is auto-generated by bin/library-detector.js from what looks like another repository. The fetch appears to be related to the serviceWorker, so it's not a trivial one to alter.

The only thing I can think of is to update the builder to include code that intercepts fetch calls. Something like:

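const originalFetch = fetch;

// Wrap the global fetch so every call is aborted after 5 seconds.
fetch = function(url, options = {}) {
  const controller = new AbortController();
  setTimeout(() => controller.abort(), 5000);
  // Copy the caller's options instead of mutating them (works for fetch(url) with no options; replaces any caller signal).
  return originalFetch(url, { ...options, signal: controller.signal });
}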

@rviscomi
Copy link
Member Author

These should be the only custom metrics with fetch: https://github.com/search?q=fetch+repo%3AHTTPArchive%2Flegacy.httparchive.org+path%3Acustom_metrics&type=Code&ref=advsearch&l=&l=

The code that generates the third parties script uses fetch, but it's not part of the custom metric code itself.

@Tiggerito
Copy link
Contributor

Cool. Working on the last one now: sass.

@rviscomi
Copy link
Member Author

Synced the HA server with the changes in #193 so this should take effect in the October crawl starting tomorrow. Thank you again for hopping on this @Tiggerito 🙏
