Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upNetwork retry issue 1602 #2396
Conversation
rust-highfive
assigned
alexcrichton
Feb 19, 2016
This comment has been minimized.
This comment has been minimized.
|
Thanks for the PR @sbeckeriv! Some thoughts of mine:
|
This comment has been minimized.
This comment has been minimized.
|
Thanks for the reply. I will move the net_retry logic in to the utils config object. For the logic it self I believe I know how to do this. I am going to write a function that takes a closure and handles the logic. I will figure out a white list enum too.
I think with the closure concept I could push the logic down the stack, however, that would not blanket all network calls. Do you have some thoughts on how I might go about that? I think most network calls are else where crates. I could update curl-rust to support a retry option pretty easy. Make it an http setting like proxy and timeout. I read that libcurl does not directly support it. I need to read more around the ssh side. Let me know your thoughts. |
This comment has been minimized.
This comment has been minimized.
Nah I can help audit after the infrastructure in place, and then I can also help to ensure that we always push network requests through the same location. I'd also recommend that even if this were an option in curl-rust that we'd still do it in Cargo as we'd likely want to give a UI update as part of it as well anyway. |
This comment has been minimized.
This comment has been minimized.
|
Before I get to far thoughts on these changes: Wrapped retry logic: Thanks! It was fun to figure out the with_retry. |
This comment has been minimized.
This comment has been minimized.
|
Yeah, that looks great to me! I think we may want to have another callback for "was this error one we can retry from", but other than that it looks like the right infrastructure to me |
This comment has been minimized.
This comment has been minimized.
|
Hey, For the "was this error one we can retry from"; error handling is new to me. Should I pass it the CargoResult value or the unwrapped Err? What might that callback look like? How do I check its a Err I care about vs any other error? Thanks! |
alexcrichton
reviewed
Feb 25, 2016
| @@ -134,7 +140,7 @@ impl<'cfg> PackageRegistry<'cfg> { | |||
| Some(&(ref previous, _)) => { | |||
| if previous.precise() == namespace.precise() { | |||
| debug!("load/match {}", namespace); | |||
| return Ok(()) | |||
| return Ok(()); | |||
This comment has been minimized.
This comment has been minimized.
alexcrichton
Feb 25, 2016
Member
Perhaps you have rustfmt on auto-run? Could you back out these changes for now? They tend to clutter up the diff unfortunately :(
alexcrichton
reviewed
Feb 25, 2016
| @@ -93,7 +97,8 @@ impl<'cfg> PackageRegistry<'cfg> { | |||
| let mut ret = Vec::new(); | |||
|
|
|||
| for (_, source) in self.sources.sources_mut() { | |||
| try!(source.download(package_ids)); | |||
| try!(network::with_retry(&self.config, || source.download(package_ids))); | |||
This comment has been minimized.
This comment has been minimized.
alexcrichton
Feb 25, 2016
Member
This probably wants to be attached to the literal network calls themselves rather than way up here at the download layer. For example a download from everything but the registry doesn't actually need to be retried.
alexcrichton
reviewed
Feb 25, 2016
| let repo = try!(self.remote.checkout(&self.db_path)); | ||
|
|
||
| let repo = try!(network::with_retry(&self.config, ||{ | ||
| self.remote.checkout(&self.db_path) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Feb 27, 2016
Author
Contributor
Im going to need some advice here. I started to put a cargo_config: &'cfg Config attribute on GitRemote. This lead to a cascade of lifetime markers. After that cleared up I am getting more errors. I think I can reason about the errors but I just wanted to make sure my approach was right.
Thanks!
Becker
This comment has been minimized.
This comment has been minimized.
alexcrichton
Feb 28, 2016
Member
Ah it should be fine to probably just pass in the Config instead of storing it. Each source already stores a pointer to Config somewhere so it should be readily available to pass in
This comment has been minimized.
This comment has been minimized.
|
I think that it may be good to introduce a new trait: trait NetworkError {
pub fn may_be_spurious(&self) -> bool;
}That way you can define the retry function with a |
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
I pushed down the network call in to the checkout command. I added the NetworkError trait. I am unsure how to apply the bounds in the retry method. I added it as a trait bound to T but the T is the return Ok value for the cargo result not the error. Any advice or examples of how to do this would be great! Thanks again |
alexcrichton
reviewed
Mar 3, 2016
| /// Example: | ||
| /// use util::network; | ||
| /// cargo_result = network.with_retry(&config, || something.download()); | ||
| pub fn with_retry<T, F>(config: &Config, mut callback: F) -> CargoResult<T> |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 3, 2016
Member
I think the signature you want for this method is something along the lines of:
pub fn with_retry<T, E, F>(config: &Config, mut callback: F) -> CargoResult<T>
where F: FnMut() -> Result<T, E>,
E: NetworkError + CargoErrorThis would then match on the result of callback and call maybe_spurious to see whether it is indeed a spurious error or not.
alexcrichton
reviewed
Mar 3, 2016
| // NetworkError trait | ||
|
|
||
| pub trait NetworkError: Error + Send + 'static { | ||
| fn maybe_spurious(&self) -> bool { true } |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 3, 2016
Member
Yeah this looks along the right lines to me, but you may want to define it as:
pub trait NetworkError: CargoError {
fn maybe_spurious(&self) -> bool;
}That way you can elide the extra bounds by just inheriting from CargoError, and it also forces the implementations below to check correctly whether they're related to the network before unconditionally retrying.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 4, 2016
Author
Contributor
Did not work out of the box. I need some help deciphering what to do. I pushed the failing code but here is my errors: https://gist.github.com/sbeckeriv/bd2173d07e98b0e599d6
Thanks.
Becker
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 4, 2016
Author
Contributor
I think I put this on the wrong comment. I dont think I want to bound the with_retry to the NetworkError trait. I think I want to check in with_retry if the error is NetworkError or not, but all the errors should all be CargoError bound?
alexcrichton
reviewed
Mar 4, 2016
| let mut result: CargoResult<T> = Err(human("failed to initialize network call")); | ||
| for try in 0..net_retry + 1 { | ||
| result = callback(); | ||
| if result.is_err() { |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 6, 2016
Author
Contributor
What do you see maybe_spurious doing? I am thinking a is_retryable makes more sense. Then it would be if result.is_err() && result.is_retryable() which reads like I am thinking the code works. Right now it would return true for the two errors but it could do more for other errors.
This comment has been minimized.
This comment has been minimized.
|
@sbeckeriv yeah the point of the bounds was to highlight the fact that |
This comment has been minimized.
This comment has been minimized.
|
The bounds make perfect sense now. I pushed the config down to the bottom fetch call. I am still getting "mismatched type" errors on the result I am returning. The generics at this level are beyond me. The error text https://gist.github.com/sbeckeriv/3866bd2ab7241da32b7c I have tried many combinations. I thought the error came from how I initialized the result value but I removed that. The normal human err is does not impl network error. I then thought it was because the shell command internally could cause an error but I removed the try with no luck. Do i need to rebuild the result or implement a from? Thanks again, |
This comment has been minimized.
This comment has been minimized.
|
No worries! Here's how I might write this function: pub fn with_retry<T, E, F>(config: &Config, mut callback: F) -> CargoResult<T>
where F: FnMut() -> Result<T, E>,
E: NetworkError
{
let mut remaining = try!(config.net_retry());
loop {
match callback() {
Ok(ret) => return Ok(ret),
Err(ref e) if e.maybe_spurious() && remaining > 0 => {
// print to shell
remaining -= 1;
}
Err(e) => return Err(Box::new(e)),
}
}
}Perhaps that can help you out? |
This comment has been minimized.
This comment has been minimized.
|
That does! When I first started I decomposed but I went the other way for reasons I do not remember. I need to update my test but it compiles now. Thanks! |
This comment has been minimized.
This comment has been minimized.
|
I have the retry at the lowest level fetch. I think this covers all the git actions. I am looking in to the interactions with the registry. |
This comment has been minimized.
This comment has been minimized.
|
As I figure out the code I am not sure I want to extern cargo in to crates-io. I am guessing there are separated in to there own project for a reason. Now I would require Config and network to cross back in to creates. I could just copy the concept but there is no config system for cargo-io helpers. Maybe this is figured out in another pull request? Let me know your thoughts. Thanks again! |
alexcrichton
reviewed
Mar 11, 2016
| } | ||
|
|
||
| impl NetworkError for git2::Error { | ||
| fn maybe_spurious(&self) -> bool { true } |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 11, 2016
Member
For git errors this may want to at least check the class of the error for Net, and perhaps look at the code as well to ensure it's not a known non-spurious error.
alexcrichton
reviewed
Mar 11, 2016
| fn maybe_spurious(&self) -> bool { true } | ||
| } | ||
| impl NetworkError for curl::ErrCode { | ||
| fn maybe_spurious(&self) -> bool { true } |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 11, 2016
Member
We probably want to inspect the error code here a little more to ensure it actually is spurious
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 16, 2016
Author
Contributor
Im having a heck of a time understand how to use the curl errors. I believe this should work.
curl::ErrCode::COULDNT_CONNECT
But believing it doesn't make it work. Thoughts on how I can use the names? Should I give up and use the numbers instead?
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 16, 2016
Member
Ah using the raw error codes from curl is fine, this should basically just be a whitelist of known bad error codes that can be retried, and we can expand the list over time.
This comment has been minimized.
This comment has been minimized.
|
Looking good! I think though that this may not catch where we download from the registry (using curl), just git so far. API calls with crates.io also look like they may need instrumentation as well, and you can probably just implement |
This comment has been minimized.
This comment has been minimized.
|
I updated the error checking. For git2::ErrorCode its not clear which errors relate to network activity. Maybe EOF for a disconnect? In my test I found that not connecting to a host gave an git2::ErrorClass::Os error. I am checking for both Os and Net now. Maybe my test is invalid? For curl errors I left a comment to what I believe the number maps to based off of this website https://curl.haxx.se/libcurl/c/libcurl-errors.html Thanks again for all the help. |
alexcrichton
reviewed
Mar 17, 2016
| .file("src/lib.rs", ""); | ||
| .file("src/lib.rs", "").file(".cargo/config", r#" | ||
| [net] | ||
| retry=0 |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 17, 2016
Member
In theory these modifications to tests shouldn't be necessary. All of these are indicative of errors that can never be resolved, so they shouldn't print out warnings anyway
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 17, 2016
Author
Contributor
Yes. These are leftover from before the trait. I should be able to remove these.
alexcrichton
reviewed
Mar 17, 2016
| try!(config.shell() | ||
| .say(format!("failed network call. {} \ | ||
| tries remaining", remaining), | ||
| YELLOW)); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 17, 2016
Member
Could this use shell().warn(..)? It also may want to avoid using a period as we tend to avoid that kind of punctuation in Cargo.
alexcrichton
reviewed
Mar 17, 2016
| 5 | //COULDNT_RESOLVE_PROXY | ||
| 6 | //COULDNT_RESOLVE_HOST | ||
| 28 | //OPERATION_TIMEDOUT | ||
| 56 // RECV_ERROR |
This comment has been minimized.
This comment has been minimized.
alexcrichton
Mar 17, 2016
Member
Can this link to the curl-sys crate and use the symbolic names for these constants?
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 17, 2016
Author
Contributor
curl-sys is a dependency of curl but I cant figure out how to access it. Should I just added it as a dependency?
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
Mar 17, 2016
| fn maybe_spurious(&self) -> bool { | ||
| match self.class() { | ||
| git2::ErrorClass::Net | | ||
| git2::ErrorClass::Os => true, |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
Mar 17, 2016
Author
Contributor
This is the class I got for my test when it timed out. When I tell the test to connect to a server that is not running https://github.com/rust-lang/cargo/pull/2396/files#diff-5a6267705b81a22950ae48ebdce1a553R8
I thought it would have gave a Net error but Os is what it printed.
alexcrichton
reviewed
May 11, 2016
| @@ -1,3 +1,4 @@ | |||
| extern crate curl_sys; | |||
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
May 11, 2016
Author
Contributor
I moved it to src/cargo/lib.rs I think thats the "root". Let me know if that is wrong.
This comment has been minimized.
This comment has been minimized.
alexcrichton
reviewed
May 11, 2016
| curl_sys::CURLcode::CURLE_COULDNT_CONNECT | | ||
| curl_sys::CURLcode::CURLE_COULDNT_RESOLVE_PROXY | | ||
| curl_sys::CURLcode::CURLE_COULDNT_RESOLVE_HOST | | ||
| curl_sys::CURLcode::CURLE_OPERATION_TIMEDOUT | |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
May 11, 2016
Author
Contributor
It looks like curl retries (at least once) on CURLE_OPERATION_TIMEDOUT https://github.com/curl/curl/blob/b9728bca549709a26a5228f1d44f7488dd26811d/src/tool_operate.c#L1449
I do not have any data around which errors people are seeing. It looks like, after a quick search in the curl code, that CURLE_OPERATION_TIMEDOUT covers a lot of different case. After this is merge the list can be changed easy. In this case I believe catching more is better only because if this error code turns out to be bad the work around would be to configure the setting. If you release cargo without this and it turns out that it is the core failure case for curl everyone who is affected need to wait for a new release cycle.
Let me know if you would still like to remove it.
This comment has been minimized.
This comment has been minimized.
alexcrichton
May 11, 2016
Member
Ok, let's leave it in and we can always back it out if weird things happen
alexcrichton
reviewed
May 11, 2016
| Err(ref e) if e.maybe_spurious() && remaining > 0 => { | ||
| try!(config.shell() | ||
| .warn(format!("failed network call {} \ | ||
| tries remaining", remaining))); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
May 11, 2016
Author
Contributor
Yes! I agree. I completely removed the "Failed network call" part. It now looks like
" [2/-1] Failed to connect to 127.0.0.1: Connection refused 1 tries remaining"
This comment has been minimized.
This comment has been minimized.
|
Looking good @sbeckeriv, thanks! |
alexcrichton
reviewed
May 12, 2016
| Err(ref e) if e.maybe_spurious() && remaining > 0 => { | ||
| try!(config.shell() | ||
| .warn(format!("{} {} tries remaining", e, | ||
| remaining))); |
This comment has been minimized.
This comment has been minimized.
alexcrichton
May 12, 2016
Member
Perhaps this could be along the lines of:
let msg = format!("spurious network error ({} tries remaining): {}", remaining, e);
try!(config.shell().warn(msg));
This comment has been minimized.
This comment has been minimized.
|
Awesome! Just a minor tweak to some wording, and then perhaps some squashing? Other than that r=me |
sbeckeriv
force-pushed the
sbeckeriv:network-retry-1602
branch
from
4fc800d
to
b3de4c2
May 12, 2016
This comment has been minimized.
This comment has been minimized.
|
@alexcrichton I made it one commit. Thanks for working with me and all the help! I learned a lot. Becker |
alexcrichton
reviewed
May 12, 2016
| Ok(ret) => return Ok(ret), | ||
| Err(ref e) if e.maybe_spurious() && remaining > 0 => { | ||
| let msg = format!("spurious network error ({} tries \ | ||
| remaining): {}", remaining, e); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
sbeckeriv
May 12, 2016
Author
Contributor
The retry config is 2 by default.
count is 2
I run once and fail i output
2 tries remains
subtract 1 count is 1
run a second time
1 tries remaining
subtract 1 count is 0
retry again
just error.
The retry count is the count of retries not the count of total tries. I can subtract one from the config or update the docs to be explicitly clear it is the count of retries and not the total network tries?
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Looks like the Windows CI is failing for non-spurious reasons? |
sbeckeriv
force-pushed the
sbeckeriv:network-retry-1602
branch
from
b3de4c2
to
2f04d00
May 12, 2016
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
May 12, 2016
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
This comment has been minimized.
This comment has been minimized.
|
Sorry I dont have access to a windows box right now. I might tonight. Does windows not load the .cargo/config file or does it output extra information? |
This comment has been minimized.
This comment has been minimized.
|
It should load that, yeah, but I suspect the problem here is that the exact error message is slightly different. It should be possible to just use |
sbeckeriv
force-pushed the
sbeckeriv:network-retry-1602
branch
from
2f04d00
to
7b03532
May 12, 2016
This comment has been minimized.
This comment has been minimized.
|
Works locally. Lets see! |
This comment has been minimized.
This comment has been minimized.
bors
added a commit
that referenced
this pull request
May 12, 2016
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
sbeckeriv commentedFeb 19, 2016
•
edited
Dearest Reviewer,
This branch resolves #1602 which relates to retrying network
issues automatically. There is a new utility helper for retrying
any network call.
There is a new config called net.retry value in the .cargo.config
file. The default value is 2. The documentation has also been
updated to reflect the new value.
Thanks
Becker