Add always_download to DeclarativeArtifact #50
Conversation
a2471d6
to
b9e5998
Compare
Codecov Report
@@ Coverage Diff @@
## master #50 +/- ##
==========================================
+ Coverage 42.53% 42.85% +0.32%
==========================================
Files 22 22
Lines 710 707 -3
==========================================
+ Hits 302 303 +1
+ Misses 408 404 -4
Continue to review full report at Codecov.
|
60d8943
to
a923477
Compare
| args: unused positional arguments passed along to :class:`~pulpcore.plugin.stages.Stage`. | ||
| kwargs: unused keyword arguments passed along to :class:`~pulpcore.plugin.stages.Stage`. | ||
| """ | ||
|
|
||
| def __init__(self, max_concurrent_content=200, *args, **kwargs): | ||
| def __init__(self, max_concurrent_content=200, download_artifacts=True, *args, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this is useful, but this seems a bit confusing: ArtifactDownloader(download_artifacts=False). We could avoid this by the plugin writers setting it on all versus some of the DeclarativeContent objects? I think that is almost as easy and would make this all less confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute would be need to be renamed for this change to make sense. Maybe da.no_download and those are just no-opped by these stages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think, it were less confusing if it was named lazy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That name would be clearer. I'm concerned though that we have two mechanisms when we can do it almost as easily with one. A plugin writer who has the "mixed content/metadata" needs and wants to add lazy will have to mark some portion of the DeclarativeArtifacts attributes but not others. Having to additionally specify that functionality should be "disabled" seems like more work and more to maintain when it could be not set by the plugin writer like they did before adding "lazy" support.
Does this concern make sense or maybe I'm misunderstanding the need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking more about the easy plugin with no special needs.
But you are right, that having two mechanisms, when one could do seems odd.
How should the field on d_artifact be named? lazy, no_download, ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to no_download. I think it's the most self-descriptive name. Things that can't be downloaded can't be saved, so this captures the intent the most clearly to me.
As an aside on the "easy plugin" case, they have some other options already. They can either use DeclarativeVersion with the existing download_artifacts parameter, or a custom pipeline that omits the Artifact handling stages. I documented those options here https://docs.pulpproject.org/en/pulpcore-plugin/nightly/reference/lazy-support.html#adding-support-when-using-declarativeversion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to only have one way to do this:
Always include the artifact_stages and rely on the no_download member. So i am going to remove the download_artifacts option from the DeclarativeVersion.
Otherwise we would have the the confusion, what download_artifacts combined with no_download means.
Another option were to rename download_artifacts to include_artifact_stages, but i think, this is not instructive to a user without broad understanding of the stages.
a923477
to
776b505
Compare
|
@bmbouter I now have reduced it to only having a |
|
@mdellweg let me try to share some info on the design considerations that brought us here. With all the info on the table we can make a good decision together. Consider the performance implications for plugins that would include Artifact stages that don't need them. I haven't measured it in practice, but architecturally having three stages in pipelines where they aren't needed will perform slower. My feeling with pipelines is that they are highly re-combinable and probably shouldn't be the same plugin-to-plugin. I believe a design will only work for everyone if it is performing the best for everyone.** I think the majority of plugins are not in the situation that Docker, Debian, and Maven are in, i.e. they want to download some but not all Artifacts during lazy. So having all plugins include these stages when many of them won't need them during lazy sync doesn't seem the best. What do you think about these considerations and how they inform this design? Please make a case for the design you think is the best. **As a short aside on performance and the "one-pipeline" approach, overall I try to avoid subjective choices about which plugins we should optimize the pipeline for. I want to avoid global designs that optimize for some plugins but sub-optimize for others. That is a non-pareto efficient solution. The beauty of the pipeline architecture is that each plugin can use the pipeline that is right for them, creating a pareto efficient solution by avoiding a single global optimization. |
edef5af
to
f182905
Compare
|
@bmbouter |
576797e
to
10053f6
Compare
|
@mdellweg I'm concerned about the feature flipping when we don't necessarily need it. It seems simpler to leave the checking of |
|
@bmbouter Let me try to explain, why i prefer this version: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one nitpicky var name suggestion, but on the whole I think this is the right pattern.
Flags vs Leaving out stages
Simplicity
The complexity of leaving out stages is that a plugin writer is essentially operating on the entire pipeline as a single entity. Instead, the flag design lets the plugins practically ignore lazy-- they create declarative content and artifacts with 1 extra piece of useful metadata. Those declarative artifacts are handled differently by a core stage, which plugin writers don't need to worry about because the effect of the variables is obvious. The key point is that there is only needs to be 1 pipeline per sync task, which means the plugin writer does not need an in depth understanding of all of the core stages.
Plugin similarity
Another major advantage to this approach is that all the plugins will be able to use it. Though some plugins might work just fine by leaving certain stages out it would force us to figure out an entirely different pattern for deb, docker, maven, npm, etc. We should make 1 interface if at all possible.
Performance
I don't think this will have very much of an impact on performance because the affected stages will skip the object without executing anything expensive. Clearly, they will operate a little slower, but I'd bet that the database will still be the bottleneck and therefore these no-ops will not cause a noticeable slowdown.
docs/reference/lazy-support.rst
Outdated
| these stages included, no Artifact downloading will occur. Content unit saving will occur, which | ||
| will correctly create the lazy content units. | ||
| passing the `sync_lazy` parameter to the `ArtifactDownloader` and `ArtifactSaver` stages. | ||
| They will then function selectively on the `support_lazy` flag of `DeclarativeArtifact`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a naming suggestion that I think would be a little more obvious. (Current naming is pretty good though)
s/sync_lazy/defer_downloads/
s/support_lazy/deferrable/
I think this is just a little bit more explicit because it indicates what will happen. ie, if defer_downloads and deferrable are both true, that download will be deferred.
|
Let's make this conversation a bit simpler by removing one of the areas of debate. I'm ok to include stages even when they are not needed. This should allow us all to talk about the user experience and not worry about the performance stuff yet. Including those stages does make it easy to talk about pipelines in a consistent way and custom pipelines can still be built if plugin writers really want better performance. Now that all stages can be included w/ the default pipeline, I want to make a case that the data of the DeclarativeArtifact should drive the pipeline for simplicity and we should avoid "behavior by configuration" aka 'modes'. I don't think feature flipping behaviors of individual stages is helping our user experience; it's making things more confusing. It would be simpler for the Artifact stages respect As an architectural point, if you want a stage to have a significantly different behavior, my belief is that we should have a different stage with a different implementation. The pipeline is a form of functional programming. It's defined by what you include, not how you are procedurally configuring each thing. Note that I'm talking about configuring individual stages not about a factory like DeclarativeVersion which does configure the pipeline with pipeline-wide options. Still, the Also look at the many other features in the pipeline. Notice how we didn't take a 'modes' approach with the "futures resolution" feature. Plugin writer's either use it or they don't. It's always there if they need it. They don't have to 'turn it on'. We should do the same here with |
|
As I read it, @bmbouter's most important point is:
However, either solution violates this concept. The 2 var solution ( From the docs, "The plugin writer needs to specify a first_stage that will create a DeclarativeContent object for each Content unit that should exist in the new RepositoryVersion." IMO, the responsibility of the first stage is to download and parse metadata and should explicitly not be making any decisions about how the rest of the pipeline should behave. (ie, the FirstStage should not need to care whether the sync is lazy or not) The DeclarativeVersion is the object that is responsible for creating the pipeline to act on the declarative content, and I would expect to tell that object to behave differently for alternative scenarios. (We already configure the declarative version to behave differently whether the sync is mirror or not.) This puts the code branching where the differences matter, (ArtifactDownloader and ArtifactSaver) and abstracts that branching so the plugin writer shouldn't need to worry about it. To sum up, FirstStage is "what to sync" the DeclarativeVersion takes care of "how to sync". |
|
So this means we would need two different implementations of ArtifactDownloader? One being sepecific and one downloading everything? |
|
Well my point was not to make two, but more that we should bake this behavioral aspect directly into ArtifactDownloader so that it's always there. Then when (for any reason lazy or otherwise) a plugin writer needs to have an Artifact not downloaded but have its RemoteArtifacts created it can set |
10053f6
to
2a4475b
Compare
|
@mdellweg I can come through and update the docs in a separate piece of work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
This doesn't have to be a breaking change if we don't remove the download_artifact aspect to |
|
It feels like not going all the way when leaving the |
2a4475b
to
a691387
Compare
|
@mdellweg I'll go on to make another PR fixing the docs. Would you be willing to update https://pulp.plan.io/issues/4209 with our final solution? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@bmbouter Seems, i cannot edit the Message body. Only add a comment. So feel free to copy the text upstairs and adjust it. Thanks for all the discussions. |
|
I checked the permissions and I think they are there. I can handle this one. Redmine is confusing in this area though. It's kind of silly. After clicking edit you have to click 'edit' again to show the original body to edit. |
|
@mdellweg ty for leaving this comment https://pulp.plan.io/issues/4209#note-10 |
Required PR: pulp/pulp_file#177
https://pulp.plan.io/issues/4209
re #4209