Fixed migration of docker content when corrupted content has been #352
Conversation
| @@ -129,6 +129,7 @@ def pipeline_stages(self): | |||
| DockerContentSaver(), | |||
| ResolveContentFutures(), | |||
| InterrelateContent(), | |||
| #RemoveEmtpyTags(), | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving post_hook only would not solve problem for upgraded installations that already contain problem with tagged_manifest=null. In this case a migration is needed to remove existing tags that point to null manifest andd then leave all the subsequent job for future runs on the post_save hook.
or - I get implemented this stage that would loop trough dc and check type(dc.content) == Tag and remove those that have tagged_manifest=null.
| @@ -407,61 +408,62 @@ def get_remote_by_importer_id(importer_id): | |||
| downloaded=pulp2content.downloaded | |||
| ) | |||
| if artifact is None: | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am pretty sure i have screwed up the indentation since at least the counter sduring the migration got off, but the gist of this change is that we cannot continue when artifact is None due to the content that has futures, those futures need to be resolved otherwise we would get into the problem where in one batch there would be manifests+tags( in the ContentSaverStage), and since tags rely on manifests being already saved at the tags _pre_save_hook() tags won't have needed information at the save() time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we also need to check with @quba42 on how this change will impact debian pipeline.
I have checked rpm one - it does have futures -Rpms, that are getting resolved and then in the interrelate stage relation betwen modules and rpms are created. For the case when a modular rpm would miss, there would be created one relation less. tldr - rpm plugin is not impacted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure.
Here is what I do rely on: Later types in the order given by the ordered dict DebMigrator.content_models, may query for the objects created by earlier types from their create_pulp3_content() method. e.g.:
release = pulp3_models.Release.objects.filter(<some_uniquely_identifying_filter>).first()
Currently, if such queries return None, the pipeline will fail.
I understand I need to add graceful handling for the case that this query will return None because the target content was never created because it's artifact was corrupt, and it was therefore skipped (should be trivial to do).
Edit: See #353 (perhaps not quite as trivial as I had hoped).
I am not sure if there can also be a case were the query returns None because the target content has not yet passed sufficiently far through the pipeline, which would be a bigger problem...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should not be the case because debian package is a future type content and will be resolved/saved/skipped if corrupted/ by the time PackageReleaseComponent content will be saved.
16a0850
to
3e57bca
Compare
| skipped_content += 1 | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, maybe let's not nest it more and try to keep continue in place.
It will mean that the futures needs to be set for resolution at the end, outside of the loop.
| if pb: | ||
| pb.increment() | ||
|
|
||
| if has_future and dc: | ||
| futures.append(dc) | ||
| resolve_futures = len(futures) >= DEFAULT_BATCH_SIZE or pb.done == pb.total | ||
| all_content_done = pb.done == pb.total or (pb.done + skipped_content) == pb.total |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like that skipped content is counted as a part of pb.done anyway, becasue of the line 464.
| if pb: | ||
| pb.increment() | ||
|
|
||
| if has_future and dc: | ||
| futures.append(dc) | ||
| resolve_futures = len(futures) >= DEFAULT_BATCH_SIZE or pb.done == pb.total | ||
| all_content_done = pb.done == pb.total or (pb.done + skipped_content) == pb.total | ||
| resolve_futures = len(futures) >= DEFAULT_BATCH_SIZE or all_content_done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide to have set future resolution outside of the loop as well, we can get rid off the second condition completely.
|
|
||
| """ | ||
| for dc in batch: | ||
| if type(dc.content) == Tag and not dc.content.tagged_manifest: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not critical here so it;s up to you, but I'd vote for the isinstance instead of type. Taught myself not to use type in Python at all. Just a as a good habit, so it doesn't bite in certain situations.
6a76c2f
to
c70cd6b
Compare
CHANGES/8594.bugfix
Outdated
| @@ -0,0 +1 @@ | |||
| Handle skipping of corrupted docker content that has futures. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about Handled properly skipping of corrupted docker content.?
Past tense for changelog entries + I think users have no idea what futures are, so I'm trying to make the changelog entry more user oriented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
Please see my only comment about the changelog entry.
c70cd6b
to
980d64c
Compare
identified.
[noissue]