Conversation
|
Hello @bmbouter! Thanks for updating the PR.
Comment last updated on November 30, 2018 at 19:45 Hours UTC |
ff2fd97
to
2ba95be
Compare
| @@ -105,6 +107,7 @@ def tls_storage_path(self, name): | |||
| password = models.TextField(blank=True) | |||
| last_synced = models.DateTimeField(blank=True, null=True) | |||
| connection_limit = models.PositiveIntegerField(default=20) | |||
| policy = models.TextField(choices=POLICY_CHOICES, default=POLICY_MODE.IMMEDIATE) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a plugin writer to override these choices and default somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They mostly could right now by defining the policy attribute on the subclass with another set of choices. There is one issue in that the helptext is duplicated because our use of choices isn't correct afaict. This PR kind of does the same thing since that problem already exists w/ the TASK_STATES. Other than that though they could do that, and we should fix that too but I'm not sure how yet. I filed it as this issue here: https://pulp.plan.io/issues/4127
Note that these values affect the behavior of the streamer so if a plugin writer starts modifying the CHOICES values they are enabling/disabling features for the streamer. They need to be aware of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm removing this default because it doesn't work w/ DRF.
2ba95be
to
30f8e84
Compare
30f8e84
to
65f79c8
Compare
f50d4f9
to
beeb666
Compare
beeb666
to
7aa2048
Compare
bd06557
to
d0d5ede
Compare
d0d5ede
to
88fedc9
Compare
|
There is an issue with this because the 'policy' field is not optional. A default of 'immediate' is intended and is applied to the model layer here. I can't see how to set a default as a ChoiceField in the drf docs here. When I create a remote without setting it I get: Or should this be a required field with no default? |
88fedc9
to
c4f201d
Compare
Codecov Report
@@ Coverage Diff @@
## master #3738 +/- ##
==========================================
+ Coverage 54.25% 54.34% +0.09%
==========================================
Files 64 64
Lines 2835 2841 +6
==========================================
+ Hits 1538 1544 +6
Misses 1297 1297
Continue to review full report at Codecov.
|
pulpcore/pulpcore/constants.py
Outdated
| CACHE_ONLY='cache_only', | ||
| ) | ||
|
|
||
| POLICY_CHOICES = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should also have these descriptions in the docs somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree, and an explanation of the whole lazy thing in general. Maybe I should go ahead and write this page? https://docs.pulpproject.org/en/3.0/nightly/workflows/deferred-download.html
What about also me renaming that page to "Lazy Download"? Deferred Download just sounds so formal to me.
@asmacdo what do you think? ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. The only hesitation I have is if this workflow differs plugin to plugin. For now, I think what you propose makes sense, if we get variation, we can move those docs to the plugin_template and each plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmacdo I wrote this story which I can also do today. https://pulp.plan.io/issues/4218
Can you help me groom that story? Then I can write the docs in another PR very soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm taking the other PR as assigned so that we can merge this one. Thanks for grooming it @asmacdo
| ] | ||
| pipeline = [self.first_stage] | ||
| if self.download_artifacts: | ||
| pipeline.extend([QueryExistingArtifacts(), ArtifactDownloader(), ArtifactSaver()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering whether we should still have QueryExistingArtifacts() in the non-download case. I think this makes sense for on demand, we might not want to do it for cache only(?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a correctness interpretation of the name download_artifacts not including QueryExistingArtifacts is probably wrong. Query+Replace of Artifacts in the pipeline is not the same as downloading. So I see what you are saying.
My concern with adding it is the extra runtime cost; from our testing AIUI that stage adds a non-trivial amount of extra time to the overall runtime from the DB querying when processing say 70K Artifacts.
I think we should either do 1 of a few things.
a) Add back QueryExistingArtifacts and accept the runtime outcome
b) s/download_artifacts/handle_artifacts/ so that the name is more correct.
c) another idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to find out later that downloading a remote artifact is not necessary? (e.g. when the artifact is requested; I don't know the full picture) Then, not replacing artifacts here would not hurt too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm leaning toward leaving it the way it is. When artifacts are not downloaded they should not be query/replaced in the pending-artifact either. Mainly because the replacement isn't deterministic (may or may not be replaced) so logic further down the pipeline should not rely on it. Leaving QueryExistingArtifacts in the pipeline without the downloading stage seems to add overhead without adding any value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the renaming (b), I'm not convinced it add clarity. I'm not a fan of names that include "handle" because it's such a vague term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the handle name is bad. What about include_artifact_stages=True and then it would include/disinclude all 3 of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the core, or a plugin includes an artifact related stage (in the future) that needs to be left in the pipeline even when artifacts are not downloaded? This naming assumes that will never happen.
The name download_artifacts is toggling a specific behavior that has the side-effect of omitting some of the stages. The renaming to include_artifact_stages changes the toggled behavior to something that has the side-effect of toggling downloading which seems less clear. I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm leaving the 3 stages being omitted and the name as is.
| @@ -11,7 +11,7 @@ | |||
|
|
|||
| class DeclarativeVersion: | |||
|
|
|||
| def __init__(self, first_stage, repository, mirror=True): | |||
| def __init__(self, first_stage, repository, mirror=True, download_artifacts=True): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 passing (boolean) download_artifacts instead of sync policy name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @bmbouter.
c4f201d
to
16c8526
Compare
|
@jortel I pushed those changes. |
16c8526
to
e39db91
Compare
Required PR: pulp/pulp_file#132 This includes a serializer that validates input correctly. This also switches the finalize, write_data, and handle_headers callbacks to be coroutines themselves. This is required because within those callbacks the streamer needs to await on coroutines and that can only be done from withing a coroutine itself. This also adjusts the Remote.get_downloader() method signature to take `url` or `remote_artifact` as kwargs instead of `url` as a positional parameter. This allows plugin writers (and the streamer) to get downloaders configured with digest and size validation automatically from remote_artifacts. https://pulp.plan.io/issues/3763 closes pulp#3763
e39db91
to
3daf3bf
Compare
Required PR: pulp/pulp#3738 pulp_file now meaningfully uses the 'policy' attribute when syncing content. This is meant to be used with this PR: pulp/pulp#3738 https://pulp.plan.io/issues/4160 closes #4160
Required PR: pulp/pulp_file#132
This includes a serializer that validates input correctly.
https://pulp.plan.io/issues/3763
closes #3763