Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sync pipeline #608

Merged
merged 1 commit into from Mar 25, 2022
Merged

Refactor sync pipeline #608

merged 1 commit into from Mar 25, 2022

Conversation

mdellweg
Copy link
Member

@mdellweg mdellweg commented Mar 3, 2022

fixes #495

@mdellweg mdellweg force-pushed the refactor_sync branch 8 times, most recently from 7fc3be1 to ee06a72 Compare March 8, 2022 12:50
Comment on lines +687 to +692
manifest_list=manifest_dc.content,
image_manifest=dc.content,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, i really think we are using these variables the wrong way around.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much effort will it take to rename those variables?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It involves a data migration at least. I don't want to mix that up in another big PR like this.

@mdellweg mdellweg changed the title WIP Refactor sync pipeline Refactor sync pipeline Mar 8, 2022
@mdellweg mdellweg marked this pull request as ready for review March 8, 2022 14:53
@mdellweg mdellweg force-pushed the refactor_sync branch 2 times, most recently from 6017733 to d801f93 Compare March 11, 2022 14:22
pulp_container/app/tasks/sync_stages.py Outdated Show resolved Hide resolved
Comment on lines +235 to +243
for manifest_dc in self.manifest_dcs:
config_blob_dc = manifest_dc.extra_data.get("config_blob_dc")
if config_blob_dc:
manifest_dc.content.config_blob = await config_blob_dc.resolution()
for blob_dc in manifest_dc.extra_data["blob_dcs"]:
# Just await here. They will be associated in the post_save hook.
await blob_dc.resolution()
await self.put(manifest_dc)
self.manifest_dcs.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we end up in a state where self.manifest_dcs.clear() will be invoked less frequently than the actual iteration (because of await blob_dc.resolution())? In the end, we will be iterating over the same manifest declarative contents.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the question. We collect the manifest_dc in a bucket, we make sure they are ready to processed, we shove them in the pipeline we empty the bucket and start over.
There is no break, so the only thing that can end the loop prematurely is a fatal exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that if there is any possibility to run resolve_flush for the same declarative content objects. So, we will be calling resolve_flush on declarative content that was already queued.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can happen that multiple declarative content objects result in the same content, but every dc instance will only be part of a resolve_flush call once. (That is why we clear the list.)

Comment on lines +687 to +692
manifest_list=manifest_dc.content,
image_manifest=dc.content,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much effort will it take to rename those variables?

pulp_container/app/tasks/sync_stages.py Outdated Show resolved Hide resolved
Comment on lines +141 to +144
[
self.create_listed_manifest(list_dc, manifest_data)
for manifest_data in content_data.get("manifests")
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also handle a state where a manifest could not be downloaded in self.create_listed_manifest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#648

Let's keep refactoring and features separate.

CHANGES/495.misc Outdated Show resolved Hide resolved
Remove content interrelate stage in favor of content.resolution and
_post_save.

fixes pulp#495

Co-authored-by: Ľuboš Mjachky <lmjachky@redhat.com>
Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through it and it exactly implements this pseudo code #495 (comment) except for the popping the dc and replacing with the resolved content, which is ok and i think even reads better simply by accessing the dc.content instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove the interrelate stage from the sync pipeline
3 participants