New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Taught PulpImporter to retry (once) on _import_file() failure. #1278
Conversation
15a17f4
to
90c0a31
Compare
Attached issue: https://pulp.plan.io/issues/8633 |
Ordering-on-export insures we won't deadlock at import-time when repo-versions have overlapping content. Accepting this change without the referenced pulpcore fix will result in fixing the deadlock, but generating unique-constraint violations in the same place at import-time. fixes #7904 Requires PR: pulp/pulpcore#1278
Ordering-on-export insures we won't deadlock at import-time when repo-versions have overlapping content. Accepting this change without the referenced pulpcore fix will result in fixing the deadlock, but generating unique-constraint violations in the same place at import-time. fixes #7904 Requires PR: pulp/pulpcore#1278
Ordering-on-export insures we won't deadlock at import-time when repo-versions have overlapping content. Accepting this change without the referenced pulpcore fix will result in fixing the deadlock, but generating unique-constraint violations in the same place at import-time. fixes #7904 Required PR: pulp/pulpcore#1278
pulpcore/app/tasks/importer.py
Outdated
) | ||
# Second try will either succeed, or raise an error | ||
try: | ||
a_result = _import_file(os.path.join(rv_path, filename), res_class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will either succeed (as in have no errors) or raise an Exception, right?
In that case, there is no use for the while True:
construct.
Instead, i would put the first invocation in a try
construct, catch the specific constraints exception and do one retry in the except
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left the while-true construct in place because one thought I had was "is retrying exactly once the right thing to do?" I haven't convinced myself that there might be scenarios where multiple-retries are the Right Answer - but neither have I convinced myself one is The Way, either. Any thoughts?
RE retry-inside-except - mmm, yeah could do that. I try to avoid code that expects exceptions, inside an except: block, just because of Bad Experiences with nested exceptions in the past. May well be cleaner/more understandable here to do so tho, especially if "try just once" is the way we go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also get rid of while loop in case we end up re-trying once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition in django-import-export's get_or_init_instance() that is exercised by importing repo-versions concurrently. We attempt an import and check for errors, retrying ONCE if encountered. On a second error, fail the attempt. The test added for pulp_rpm #7904 cover this case. fixes #8633 [nocoverage]
90c0a31
to
6c013a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One cosmetic suggestion, however.
try: | ||
log.info(_("Importing file {}.").format(fpath)) | ||
with open(fpath, "r") as json_file: | ||
data = Dataset().load(json_file.read(), format="json") | ||
resource = resource_class() | ||
log.info(_("...Importing resource {}.").format(resource.__class__.__name__)) | ||
return resource.import_data(data, raise_errors=True) | ||
return resource.import_data(data, raise_errors=do_raise) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could call that variable raise_errors
for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I actually don't like exposing parameter names - esp someone else's parameter-names - in this way. To me, the parameter is telling _import_files() to either raise any errors found, or to deal with them. _import_files() is choosing to implement that contract by relying on what import_data() does - but that's an implementation detail.
Parameter names belong to the API that implement them, exposing them up the tree feels like a leaky/broken encapsulation to me.
Ordering-on-export insures we won't deadlock at import-time when repo-versions have overlapping content. Accepting this change without the referenced pulpcore fix will result in fixing the deadlock, but generating unique-constraint violations in the same place at import-time. fixes #7904 Required PR: pulp/pulpcore#1278
There's a race condition in django-import-export's get_or_init_instance()
that is exercised by importing repo-versions concurrently. We attempt
an import and check for errors, retrying ONCE if encountered. On a
second error, fail the attempt.
The test added for pulp_rpm #7904 cover this case.
fixes #8633
ref #7904
[nocoverage]