Skip to content
This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Adds 'policy' to Remote #3738

Merged
merged 1 commit into from Nov 30, 2018
Merged

Conversation

bmbouter
Copy link
Member

@bmbouter bmbouter commented Nov 5, 2018

Required PR: pulp/pulp_file#132

This includes a serializer that validates input correctly.

https://pulp.plan.io/issues/3763
closes #3763

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2018

Hello @bmbouter! Thanks for updating the PR.

Comment last updated on November 30, 2018 at 19:45 Hours UTC

@@ -105,6 +107,7 @@ def tls_storage_path(self, name):
password = models.TextField(blank=True)
last_synced = models.DateTimeField(blank=True, null=True)
connection_limit = models.PositiveIntegerField(default=20)
policy = models.TextField(choices=POLICY_CHOICES, default=POLICY_MODE.IMMEDIATE)
Copy link
Contributor

@daviddavis daviddavis Nov 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a plugin writer to override these choices and default somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They mostly could right now by defining the policy attribute on the subclass with another set of choices. There is one issue in that the helptext is duplicated because our use of choices isn't correct afaict. This PR kind of does the same thing since that problem already exists w/ the TASK_STATES. Other than that though they could do that, and we should fix that too but I'm not sure how yet. I filed it as this issue here: https://pulp.plan.io/issues/4127

Note that these values affect the behavior of the streamer so if a plugin writer starts modifying the CHOICES values they are enabling/disabling features for the streamer. They need to be aware of that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm removing this default because it doesn't work w/ DRF.

@bmbouter
Copy link
Member Author

There is an issue with this because the 'policy' field is not optional. A default of 'immediate' is intended and is applied to the model layer here. I can't see how to set a default as a ChoiceField in the drf docs here.

When I create a remote without setting it I get:

{
    "policy": [
        "This field is required."
    ]
}

Or should this be a required field with no default?

@codecov
Copy link

codecov bot commented Nov 29, 2018

Codecov Report

Merging #3738 into master will increase coverage by 0.09%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3738      +/-   ##
==========================================
+ Coverage   54.25%   54.34%   +0.09%     
==========================================
  Files          64       64              
  Lines        2835     2841       +6     
==========================================
+ Hits         1538     1544       +6     
  Misses       1297     1297
Impacted Files Coverage Δ
pulpcore/pulpcore/constants.py 100% <ø> (ø) ⬆️
pulpcore/pulpcore/app/serializers/repository.py 94.8% <100%> (+0.03%) ⬆️
pulpcore/pulpcore/app/models/repository.py 56.37% <100%> (+1.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1dfca08...3daf3bf. Read the comment docs.

CACHE_ONLY='cache_only',
)

POLICY_CHOICES = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should also have these descriptions in the docs somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree, and an explanation of the whole lazy thing in general. Maybe I should go ahead and write this page? https://docs.pulpproject.org/en/3.0/nightly/workflows/deferred-download.html

What about also me renaming that page to "Lazy Download"? Deferred Download just sounds so formal to me.

@asmacdo what do you think? ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. The only hesitation I have is if this workflow differs plugin to plugin. For now, I think what you propose makes sense, if we get variation, we can move those docs to the plugin_template and each plugin.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asmacdo I wrote this story which I can also do today. https://pulp.plan.io/issues/4218

Can you help me groom that story? Then I can write the docs in another PR very soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking the other PR as assigned so that we can merge this one. Thanks for grooming it @asmacdo

]
pipeline = [self.first_stage]
if self.download_artifacts:
pipeline.extend([QueryExistingArtifacts(), ArtifactDownloader(), ArtifactSaver()])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether we should still have QueryExistingArtifacts() in the non-download case. I think this makes sense for on demand, we might not want to do it for cache only(?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a correctness interpretation of the name download_artifacts not including QueryExistingArtifacts is probably wrong. Query+Replace of Artifacts in the pipeline is not the same as downloading. So I see what you are saying.

My concern with adding it is the extra runtime cost; from our testing AIUI that stage adds a non-trivial amount of extra time to the overall runtime from the DB querying when processing say 70K Artifacts.

I think we should either do 1 of a few things.

a) Add back QueryExistingArtifacts and accept the runtime outcome
b) s/download_artifacts/handle_artifacts/ so that the name is more correct.
c) another idea

Copy link

@gmbnomis gmbnomis Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to find out later that downloading a remote artifact is not necessary? (e.g. when the artifact is requested; I don't know the full picture) Then, not replacing artifacts here would not hurt too much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning toward leaving it the way it is. When artifacts are not downloaded they should not be query/replaced in the pending-artifact either. Mainly because the replacement isn't deterministic (may or may not be replaced) so logic further down the pipeline should not rely on it. Leaving QueryExistingArtifacts in the pipeline without the downloading stage seems to add overhead without adding any value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the renaming (b), I'm not convinced it add clarity. I'm not a fan of names that include "handle" because it's such a vague term.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the handle name is bad. What about include_artifact_stages=True and then it would include/disinclude all 3 of them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the core, or a plugin includes an artifact related stage (in the future) that needs to be left in the pipeline even when artifacts are not downloaded? This naming assumes that will never happen.

The name download_artifacts is toggling a specific behavior that has the side-effect of omitting some of the stages. The renaming to include_artifact_stages changes the toggled behavior to something that has the side-effect of toggling downloading which seems less clear. I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'm leaving the 3 stages being omitted and the name as is.

@@ -11,7 +11,7 @@

class DeclarativeVersion:

def __init__(self, first_stage, repository, mirror=True):
def __init__(self, first_stage, repository, mirror=True, download_artifacts=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 passing (boolean) download_artifacts instead of sync policy name.

@bmbouter bmbouter added the Issue label Nov 30, 2018
@pulp pulp deleted a comment from bmbouter Nov 30, 2018
Copy link
Contributor

@jortel jortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @bmbouter.

@bmbouter
Copy link
Member Author

@jortel I pushed those changes.

Required PR: pulp/pulp_file#132

This includes a serializer that validates input correctly.

This also switches the finalize, write_data, and handle_headers
callbacks to be coroutines themselves. This is required because within
those callbacks the streamer needs to await on coroutines and that can
only be done from withing a coroutine itself.

This also adjusts the Remote.get_downloader() method signature to take
`url` or `remote_artifact` as kwargs instead of `url` as a positional
parameter. This allows plugin writers (and the streamer) to get
downloaders configured with digest and size validation automatically
from remote_artifacts.

https://pulp.plan.io/issues/3763
closes pulp#3763
@bmbouter bmbouter merged commit e4488d0 into pulp:master Nov 30, 2018
@bmbouter bmbouter deleted the add-policy-attribute branch November 30, 2018 20:52
daviddavis pushed a commit to daviddavis/pulp_file that referenced this pull request May 14, 2019
Required PR: pulp/pulp#3738

pulp_file now meaningfully uses the 'policy' attribute when syncing
content.

This is meant to be used with this PR:
pulp/pulp#3738

https://pulp.plan.io/issues/4160
closes #4160
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
7 participants