Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support retrying downloads if they transiently fail #1594

Closed
wants to merge 8 commits into from
14 changes: 1 addition & 13 deletions src/libostree/ostree-fetcher-curl.c
Original file line number Diff line number Diff line change
Expand Up @@ -337,19 +337,7 @@ check_multi_info (OstreeFetcher *fetcher)
curl_easy_getinfo (easy, CURLINFO_RESPONSE_CODE, &response);
if (!is_file && !(response >= 200 && response < 300))
{
GIOErrorEnum giocode;

/* TODO - share with soup */
switch (response)
{
case 404:
case 403:
case 410:
giocode = G_IO_ERROR_NOT_FOUND;
break;
default:
giocode = G_IO_ERROR_FAILED;
}
GIOErrorEnum giocode = _ostree_fetcher_http_status_code_to_io_error (response);

if (req->idx + 1 == req->mirrorlist->len)
{
Expand Down
15 changes: 5 additions & 10 deletions src/libostree/ostree-fetcher-soup.c
Original file line number Diff line number Diff line change
Expand Up @@ -1070,19 +1070,13 @@ on_request_sent (GObject *object,
soup_uri_to_string (soup_request_get_uri (pending->request), FALSE);

GIOErrorEnum code;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that feels odd to me is that the way this is structured, we'll be retrying the whole set of mirrors if the last one on the list returns a transient error. So e.g. the first N-1 mirrors could return NOT_FOUND and if the Nth mirror times out, we'll be retrying all N mirrors, right? Whereas I think what we want is to retry only the Nth mirror. It seems like fixing this properly would probably require indeed teaching each backend-specific fetcher about retries.

Anyway, I think it's definitely worth getting the current approach in and iterate on that since AFAIK mirrorlists are not widely used yet.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d definitely prefer to get the current approach in first. That said, I’m not sure whether I will have time afterwards to work on improvements to mirror handling (I’m working on this to fix a problem with our Endless downloads timing out, and mirrors don’t really impact on that).

However, I’m not entirely convinced about the situation you’re considering. Are we ever expecting the majority of mirrors to return NOT_FOUND? Sounds like an unlikely corner case to me. Even so, OstreeFetcherSoup maintains state about its mirrors, so I think the problem could be addressed by storing a bit more state there, rather than by rearchitecting things. Specifically, instead of storing a single mirrorlist_idx pointing to the best mirror to try, store some flags for each mirror, and track whether each mirror has returned NOT_FOUND, or returned a transient error, at all. Prioritise requesting from mirrors which haven’t. (This could also be implemented by storing a priority for each mirror, and demoting it on NOT_FOUND or transient errors.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I wasn't suggesting you would fix those up. Just something that'd need to be done down the road. As I said, I'm not aware of any big users of mirrorlists yet, so it's not a pressing concern.

Are we ever expecting the majority of mirrors to return NOT_FOUND? Sounds like an unlikely corner case to me.

Heh, sure. The NOT_FOUND was an example; anything "non-transient" would work. Anyway, I do agree with you that it could be done without lowering the retry logic down a level.


switch (msg->status_code)
{
case SOUP_STATUS_NOT_FOUND:
case SOUP_STATUS_FORBIDDEN:
case SOUP_STATUS_GONE:
code = G_IO_ERROR_NOT_FOUND;
break;
/* These statuses are internal to libsoup, and not standard HTTP ones: */
case SOUP_STATUS_CANCELLED:
code = G_IO_ERROR_CANCELLED;
break;
case SOUP_STATUS_REQUEST_TIMEOUT:
code = G_IO_ERROR_TIMED_OUT;
break;
case SOUP_STATUS_CANT_RESOLVE:
case SOUP_STATUS_CANT_CONNECT:
code = G_IO_ERROR_HOST_NOT_FOUND;
Expand All @@ -1092,10 +1086,11 @@ on_request_sent (GObject *object,
code = G_IO_ERROR_BROKEN_PIPE;
#else
code = G_IO_ERROR_CONNECTION_CLOSED;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to move this #endif?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argh, no, the problems of squashing two diffs

break;
#endif
default:
code = G_IO_ERROR_FAILED;
code = _ostree_fetcher_http_status_code_to_io_error (msg->status_code);
break;
}

{
Expand Down
19 changes: 19 additions & 0 deletions src/libostree/ostree-fetcher-util.c
Original file line number Diff line number Diff line change
Expand Up @@ -218,3 +218,22 @@ _ostree_fetcher_should_retry_request (const GError *error,

return FALSE;
}

/* Convert a HTTP status code representing an error from libsoup or libcurl to
* a #GIOError. This will return %G_IO_ERROR_FAILED if the status code is
* unknown or otherwise unhandled. */
GIOError
_ostree_fetcher_http_status_code_to_io_error (guint status_code)
{
switch (status_code)
{
case 403: /* SOUP_STATUS_FORBIDDEN */
case 404: /* SOUP_STATUS_NOT_FOUND */
case 410: /* SOUP_STATUS_GONE */
return G_IO_ERROR_NOT_FOUND;
case 408: /* SOUP_STATUS_REQUEST_TIMEOUT */
return G_IO_ERROR_TIMED_OUT;
default:
return G_IO_ERROR_FAILED;
}
}
2 changes: 2 additions & 0 deletions src/libostree/ostree-fetcher-util.h
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,8 @@ void _ostree_fetcher_journal_failure (const char *remote_name,
gboolean _ostree_fetcher_should_retry_request (const GError *error,
guint n_retries_remaining);

GIOError _ostree_fetcher_http_status_code_to_io_error (guint status_code);

G_END_DECLS

#endif