Better error propagation for download and upload client #2925
Comments
Please assign to me :)
With pleasure ;-)
Proposed changes here: davidgcameron@b97ac11. I didn't want to make a pull request yet, since we should discuss how these changes may affect other clients (if there are any) using the download and upload clients. I've tested that the pilot (with my latest changes that were merged) and the rucio download and upload CLI work OK.
Thanks!
Why do we need to change the API at all? Are the log strings not enough?
Removing the exceptions might be a bit problematic, yes.
Logs are not enough for callers of the API that need to pass error messages up to other systems (e.g. the pilot passing the error back to panda to display on the panda monitor). Exceptions are bad because in a bulk call you may not want to stop transfers just because one file failed. The exception also does not tell you which file failed unless you do some error-prone string parsing. The doc for download_pfns even says that it returns a dictionary with file states including FAILED, so I suppose exceptions were not originally intended. Also, I don't think a transfer failure is an exceptional or unexpected situation, so throwing an exception is not really the right thing to do.
Alright, but doesn't there need to be an aggregated error message per API call that can be passed to other systems like panda? Otherwise, how will it be displayed if one file fails because no source was found, another fails checksum validation, and a third succeeds? 'Not all files downloaded' sounds reasonable to me at this point. For the download, the API will try to download all given files even if one fails. As Martin mentioned, it should be possible to put one error string per file into the exception, and this looks to me like the best (and non-API-breaking) solution. I think this should then be done for every error, not just when the protocol gives an error as in your current code changes.
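To make the aggregation question concrete, here is a minimal sketch of how per-file results could be folded into one message per API call. The `summarize` helper and the `state`/`reason` keys are hypothetical names chosen for illustration; they are not part of the rucio clients.

```python
def summarize(file_status):
    """Build one message for a whole call from per-file results.

    file_status is an assumed structure: file name -> {'state': ..., 'reason': ...}.
    """
    # Keep only the files that failed, together with their individual reasons.
    failed = {
        name: entry['reason']
        for name, entry in file_status.items()
        if entry['state'] == 'FAILED'
    }
    if not failed:
        return 'All files downloaded'
    # Sort by file name so the message is stable between runs.
    details = '; '.join(f'{name}: {reason}' for name, reason in sorted(failed.items()))
    return f'Not all files downloaded ({len(failed)} of {len(file_status)} failed): {details}'
```

A caller such as the pilot could forward this single string upstream while still keeping the per-file dictionary for anything that needs the detail.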
Summary of face-to-face discussion (please correct anything I got wrong): no change in the API, so as not to break backwards compatibility. There are two options for error propagation: pass the file status dictionary as a keyword argument in the exception, or use the traces list that is passed to the download client (traces_copy_out) and filled with information.
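The first option could look roughly like the sketch below: every file is attempted, and a single exception raised at the end carries the per-file status as a keyword argument. `DownloadFailed`, `download_all` and the `fetch` callable are hypothetical stand-ins, not the actual rucio classes or signatures.

```python
class DownloadFailed(Exception):
    """Hypothetical exception that carries per-file results with the message."""

    def __init__(self, message, file_status=None):
        super().__init__(message)
        # file name -> {'state': 'DONE' | 'FAILED', 'reason': str}
        self.file_status = dict(file_status or {})


def download_all(files, fetch):
    """Try every file, then raise once if any of them failed.

    fetch is an assumed callable that downloads one file and raises on error.
    """
    status = {}
    for name in files:
        try:
            fetch(name)
            status[name] = {'state': 'DONE', 'reason': ''}
        except Exception as error:
            # Record the failure and carry on with the remaining files.
            status[name] = {'state': 'FAILED', 'reason': str(error)}
    if any(entry['state'] == 'FAILED' for entry in status.values()):
        raise DownloadFailed('Not all files downloaded', file_status=status)
    return status
```

Existing callers that only catch the exception keep working unchanged, while newer callers can read `error.file_status` to find out which file failed and why.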
…ions Fix rucio#2925" This reverts commit b97ac11.
The commits above revert the previous changes and instead fill the 'stateReason' of the traces with error messages. I had to add a parameter to the uploadclient to support passing a reference to a traces list (it was already in the download client).
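As an illustration of the traces approach, the sketch below shows a caller-owned list being filled in place with one trace per file. Only `traces_copy_out` and `stateReason` come from this thread; `upload_files`, the `send` callable and the other trace keys are assumptions made for the example.

```python
def upload_files(files, send, traces_copy_out=None):
    """Hypothetical upload loop that records one trace per file.

    send is an assumed callable that uploads one file and raises on error.
    """
    for name in files:
        trace = {'filename': name, 'clientState': 'DONE', 'stateReason': 'OK'}
        try:
            send(name)
        except Exception as error:
            trace['clientState'] = 'FAILED'
            # The error text is what the caller can pass on to other systems.
            trace['stateReason'] = str(error)
        if traces_copy_out is not None:
            # The caller owns this list, so appending makes the trace visible to it.
            traces_copy_out.append(trace)
```

Because the list is passed by reference, the caller can inspect each `stateReason` after the call returns (or after it catches an exception) without any change to the return value.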
Thanks a lot! Looks fine to me except some small comments: In the download client you have this unused variable And I think you missed at least two points which raise errors quite frequently:
Is there a reason the upload client uses the And one less important point: is |
Thanks for the comments. I fixed the minor issues.
The download client sends a trace for each attempt whereas the upload client only sends the trace after all attempts have finished, so this is the reason to keep
Maybe @tbeerman could answer that one. It could be used, for example, for the automatic declaration of lost files.
Travis tests still fail due to:
Is that expected?
Did you try to restart, or does it fail consistently? Sometimes this one shows up, but I think it is more Travis related.
The tests also fail in the pull requests, but with an Oracle error which I think is unrelated to my changes.
…ion_in_download_and_upload_client Clients: return error info in traces Fix #2925
Motivation
Pilots using the rucio mover do not report useful information on why a transfer failed. This is because, on failure, the download and upload clients throw an exception with no information inside. The real reason for the error should be propagated back to the caller of the client.
Modification
The downloadclient and uploadclient should return a dictionary of file status (containing appropriate error information) instead of raising an exception.