Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry downloads when baseurl or full path provided #158

Merged
merged 3 commits into from Jul 24, 2019

Conversation

j-mracek
Copy link
Member

@j-mracek j-mracek commented Jul 7, 2019

No description provided.

@@ -691,64 +691,74 @@ select_suitable_mirror(LrDownload *dd,
assert(!err || *err == NULL);

*selected_mirror = NULL;
// maximal errors is used to allow use mirrorst multiple times for a targed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of typos in the comment, but...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

gchar *mirrorurl = c_mirror->mirror->url; // shortcut

if (!maximal_errors) {
if (!maximal_errors && g_slist_find(target->tried_mirrors, c_mirror)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've just ckecked !maximal_errors on the line above this one, no need to ckeck it again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

LrMirror *c_mirror = elem->data;
gchar *mirrorurl = c_mirror->mirror->url; // shortcut

if (!maximal_errors) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maximal_errors is an int, can you check it as maximal_errors == 0?

Twice more below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@@ -691,64 +691,74 @@ select_suitable_mirror(LrDownload *dd,
assert(!err || *err == NULL);

*selected_mirror = NULL;
// maximal errors is used to allow use mirrorst multiple times for a targed
int maximal_errors = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... the name maximal_errors is misleading, the number represents the current error count, not the maximum, please rename it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved, the value was renamed to mirrors_iterated

continue;
}
if (c_mirror->successful_transfers == 0 &&
dd->allowed_mirror_failures > maximal_errors &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maximal_errors is guaranteed to be 0 here, so you're effectively checking dd->allowed_mirror_failures > 0, which was in the original code...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this is correct or there is some misunderstanding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're inside a condition here in which you've checked that maximal_errors is equal to 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

librepo/downloader.c Show resolved Hide resolved
{
if (!maximal_errors) {
// Skip each url that doesn't have "file://" or "file:" prefix
g_debug("%s: Skipping mirror %s - Offline mode enabled", __func__, mirrorurl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing indent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

return TRUE;
}
} while (g_slist_length(target->tried_mirrors) < dd->allowed_mirror_failures &&
++maximal_errors < dd->allowed_mirror_failures);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be personal preference, but why not use a while instead of do ... while, makes it easier to read and there's no benefit to do here I think.

Also, the whole mirror-picking mechanism seems to be complex already and adding one more loop around it pushes it to being quite convoluted.

Why not make it simply loop over the available mirrors until either success or number of retries is reached?

GSList *elem = target->lrmirrors;
int num_retires = 0;
while (elem != NULL && num_retires <= dd->allowed_mirror_failures) {
    if (try_mirror(...)) {
        break;
    }

    num_retries++;
    elem = g_slist_next(elem);
    // loop around in case we reach the end of the mirror list
    if (elem == NULL) {
        elem = target->lrmirrors;
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry but I have a feeling that the proposed approach will not deliver the required behaviour.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the required behaviour though? I have a feeling the current code is unnecessarily complex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukash From my point of view do {} while () loop is more readable that proposed change. The behaviour should allow to use all mirrors first. Then to allow of reuse of limited number of mirrors.
The construct directly says it. Please could you specify whatever it is no go request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lukash From my point of view do {} while () loop is more readable that proposed change.

Readability is subjective, but I don't think you can really say that your code is more readable than what I proposed. You have two nested loops there with several more nested ifs checking multiple conditions. Granted my example is simplified and will surely need more stuff added, it is a simple logical flow of what you'd expect the function to do. So I find it hard to believe you think your code is more readable...

The behaviour should allow to use all mirrors first. Then to allow of reuse of limited number of mirrors.

This is too vague and in fact my example can do exactly the same (by adding a single if). I believe that if you want to merge code like this, you should be able to describe with words what it is supposed to be doing and actually put that into a comment. Specifically focusing on the reasons why it actually needs to be complex. One thing is making it easier to understand for others, the other thing is, by adding complexity, you are adding more corner cases and more possibilities for bugs. (And we should really be avoiding this, that's why I'm having this argument)

Actually, you have two nested loops there, while there is no code in the outer loop (apart from incrementing maximal_errors) and both loops iterate over the same list of items. So I think it can be proven theoretically you can achive the same with a single loop 🙂.

The construct directly says it. Please could you specify whatever it is no go request?

I'm not sure I understand what you mean here, but if you're asking whether I insist on changing this in the PR, I think so, until I understand the reasons it needs to be like this...

Cheers!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comment to provide additional hint why the loop is there. I don't have a good feeling to change the loop system here. Especially in case that the change request is marked as "Might be personal preference".
But I have a different question: Shall we add some wait step after the first iteration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not just a "Might be personal preference", so please don't dismiss it as that.

I've fixed up your comment and added a couple more while reviewing the code, please have a look and if you don't mind, squash it with your commit: lukash@8d4dd9f

As for the wait, if you mean to add it to this loop you've added to select_suitable_mirror(), then I'd say certainly not, as that function only selects the mirror for the next download, so IMHO you'd just be messing up the flow by adding random waits. But I'm still barely understanding the code, so...

@@ -2331,11 +2342,15 @@ check_transfer_statuses(LrDownload *dd, GError **err)
target->tried_mirrors = g_slist_remove(target->tried_mirrors, target->mirror);
num_of_tried_mirrors = g_slist_length(target->tried_mirrors);
}

if (can_try_more_mirrors(dd, num_of_tried_mirrors))
gboolean complete_url_or_baseurl = complete_url_in_path || target->target->baseurl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everytime when I see baseurl here this function, the code is entirely misleading -- because baseurl isn't only about baseurl= statement, but it can mean link to mirrorlist or metalink.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I know. Even I believe that baseurl could be completely removed from the code, because it is handled as a single mirror. But it means to rewrite a lot of code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps add a comment with an explanation then? Because I (and so probably a generic reader as well) have no idea what you're talking about and I think I am misled 🙂

for (GSList *elem = target->lrmirrors; elem; elem = g_slist_next(elem)) {
LrMirror *c_mirror = elem->data;
gchar *mirrorurl = c_mirror->mirror->url; // shortcut

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is trailing whitespace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@j-mracek j-mracek force-pushed the baseurl branch 3 times, most recently from d25b2f7 to 957393b Compare July 23, 2019 08:15
__func__, c_mirror->failed_transfers, mirrorurl);
continue;
}
} else if (mirrors_iterated < c_mirror->failed_transfers) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So IIUC, mirrors can pick up failed_transfers on different targets that were downloaded previously and can thus always end up being skipped for the subsequent downloads.

Example:
I have one mirror and two targets. The first target is not on the mirror (404), the second is there. The first target will be attempted 4 times and raise c_mirror->failed_transfers to 4. For the second target the mirror will never be tried, because the conditions in the code above will always rule it out. Is this correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, then it seems like a problem with the algorithm, as it does not try every mirror at least once for each target, which I think it should?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm is correct, because there are two contradicting requirements:
a. try to download everything
b. fail fast

@@ -293,8 +293,11 @@ is_max_mirrors_unlimited(const LrDownload *download)
}

static gboolean
can_try_more_mirrors(const LrDownload *download, int num_of_tried_mirrors)
can_try_more(const LrDownload *download, int num_of_tried_mirrors, gboolean complete_tath_or_baseurl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: tath

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

@@ -2331,11 +2342,15 @@ check_transfer_statuses(LrDownload *dd, GError **err)
target->tried_mirrors = g_slist_remove(target->tried_mirrors, target->mirror);
num_of_tried_mirrors = g_slist_length(target->tried_mirrors);
}

if (can_try_more_mirrors(dd, num_of_tried_mirrors))
gboolean complete_url_or_baseurl = complete_url_in_path || target->target->baseurl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you perhaps add a comment with an explanation then? Because I (and so probably a generic reader as well) have no idea what you're talking about and I think I am misled 🙂


if (can_try_more_mirrors(dd, num_of_tried_mirrors))
gboolean complete_url_or_baseurl = complete_url_in_path || target->target->baseurl;
if (can_try_more(dd, num_of_tried_mirrors, complete_url_or_baseurl))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this function does two independent things now and is no longer very helpful. Consider putting the contents of the function directly in place here? I don't insist on this one though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function was renamed and added a description for the function. Hope that it helps.

/**
* @brief Can be tried another mirror or retried baseurl or fullpath
*
* @param download p_download:...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this some special syntax with the p_download:...? What does it do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, this was provided directly by kdevelop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Then it's better to remove it? The ... probably means you should add your own description? and I'm not sure what the p_* but I don't think we're using it, so let's not add it, it's redundant and useless?

FWIW I find the argument names self-describing, reiterating what the name already says in the argument description has little value. Since this is not a public function, I'd say a simple brief description is sufficient?

* @param complete_path_or_baseurl p_complete_path_or_baseurl: determine type of download - mirrors or baseurl/fullpath
* @return gboolean Return TRUE when another chance to download is allowed.
*/
can_try_more_paths(const LrDownload *download, int num_of_tried_mirrors, gboolean complete_path_or_baseurl)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the _path suffix si good, how about naming the function can_retry_download?

@@ -293,7 +293,15 @@ is_max_mirrors_unlimited(const LrDownload *download)
}

static gboolean
can_try_more(const LrDownload *download, int num_of_tried_mirrors, gboolean complete_path_or_baseurl)
/**
* @brief Can be tried another mirror or retried baseurl or fullpath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description isn't great, how about:

Returns whether the download can be retried, using the same URL in case of baseurl or full path, or using another mirror in case of using mirrors.

In case that only one mirror is available, originally download fails.
The patch allow to try the same mirror multiple times till number of
fails is equal to allowed_mirror_failures.

It should provide more robust download, but the fail will require more
time.

https://bugzilla.redhat.com/show_bug.cgi?id=1678588
It allows to try baseurls or full paths up to allowed_mirror_failures
(4x by default). It should provide more robust downloads.

https://bugzilla.redhat.com/show_bug.cgi?id=1678588
@@ -2349,8 +2357,11 @@ check_transfer_statuses(LrDownload *dd, GError **err)
target->tried_mirrors = g_slist_remove(target->tried_mirrors, target->mirror);
num_of_tried_mirrors = g_slist_length(target->tried_mirrors);
}
// complete_url_in_path and target->baseurl doesn't have an alternatives like using
// mirrors, therefore they are handled differently and conditions for retry download
// also differes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end, i'd phrase it:

... conditions for retry also differ

(the fact that it's a download retry is clear from the context)

Additional description should provide a better hint what it does.
@lukash
Copy link
Contributor

lukash commented Jul 24, 2019

Merging, thanks 🙂

@lukash lukash merged commit e2836ac into rpm-software-management:master Jul 24, 2019
@praiskup
Copy link
Member

Thanks guys, any plans for new release?

@pkratoch
Copy link
Contributor

@praiskup, I will release it to Fedora rawhide, 30 and 29 probably today or tomorrow. I will post the bodhi update links here when it's done.

pkratoch added a commit to pkratoch/librepo that referenced this pull request Jul 26, 2019
Due to the following PR, this test currently fails.
rpm-software-management#158

The issue is that librepo removes downloaded package if the download fails.
It is not a critical bug, but the fix will take some time and the PR is quite
urgent.

If you have trouble reproducing this test failiure, note that the test was
skipped when run on tmpfs.
@pkratoch
Copy link
Contributor

@praiskup, sorry for a delay.

After merging this PR, librepo build fails on test test_download_packages_with_resume_02. After a discussion with j-mracek, we decided to skip this test teporarily, since the issue is not severe and the fix will take some time, while this PR is more urgent. I made a PR #160 to skip the test.

@pkratoch
Copy link
Contributor

Fedora 30: https://bodhi.fedoraproject.org/updates/FEDORA-2019-13479b151a
Fedora 29: https://bodhi.fedoraproject.org/updates/FEDORA-2019-b11742ac7d

@praiskup
Copy link
Member

Today we observed this:

2019-08-16T12:04:48Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64 (https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64).
2019-08-16T12:04:48Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64 (https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64).
2019-08-16T12:04:48Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64 (https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64).
2019-08-16T12:04:48Z DEBUG error: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64 (https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64).
2019-08-16T12:04:48Z DEBUG Cannot download 'https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64': Cannot prepare internal mirrorlist: Status code: 503 for https://mirrors.fedoraproject.org/metalink?repo=fedora-30&arch=aarch64.
2019-08-16T12:04:48Z ERROR Failed to download metadata for repo 'fedora'
2019-08-16T12:04:48Z DDEBUG Cleaning up.

The metalink download should poll around the CNAMEs ... though, I'm curious whether we shouldn't put there some sleep call, according to the timestamp all the requests were done at the same time.

@praiskup
Copy link
Member

praiskup commented Aug 16, 2019

Does librepo/libcurl name resolution for each attempt, or are there some caches?
Moving to bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1741931

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants