This repository has been archived by the owner on Dec 7, 2022. It is now read-only.
Introduction
This proposal attempts to provide tooling intended to reduce the effort required for plugin writers to contribute a plugin.
It focuses narrowly on two areas: concurrent downloading, and applying content changes to a repository.
The proposed solution is layered.
Layer 1
Concurrent Downloading.
Plugin writers are free to use any library they want to download files. However, I don't know of any that support both concurrency and validation. The proposed solution is an abstract Download (noun) object that can also be executed concurrently within a Batch. A basic HttpDownload is provided. The Download hierarchy is intended to be extended to support additional protocols such as FTP or BitTorrent in the future and to work nicely with the Batch. Basic file size and checksum validation is included. Additional validations can be supplied by both platform and plugin writers.
Example:
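A minimal sketch of how a Download and a Batch might fit together, built on concurrent.futures as the FAQ describes. Every name and signature here (fetch(), validate(), ValidationError, the concurrency argument) is an assumption made for illustration and not the API in this PR. An in-memory InMemoryDownload stands in for HttpDownload so the sketch runs without a network.

```python
import hashlib
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor, as_completed


class ValidationError(Exception):
    """Raised when downloaded bytes do not match what was expected."""


class Download(ABC):
    """A single download; calling the object performs it (hypothetical API)."""

    def __init__(self, url, size=None, sha256=None):
        self.url = url
        self.size = size        # expected size in bytes, or None to skip
        self.sha256 = sha256    # expected hex digest, or None to skip
        self.content = None

    @abstractmethod
    def fetch(self):
        """Return the bytes found at self.url; protocol specific."""

    def validate(self):
        # Basic size and checksum validation.
        if self.size is not None and len(self.content) != self.size:
            raise ValidationError(f"{self.url}: size mismatch")
        if self.sha256 is not None:
            if hashlib.sha256(self.content).hexdigest() != self.sha256:
                raise ValidationError(f"{self.url}: checksum mismatch")

    def __call__(self):
        self.content = self.fetch()
        self.validate()
        return self


class InMemoryDownload(Download):
    """Stand-in for an HttpDownload so the sketch needs no network."""

    SOURCE = {"mem://a": b"alpha", "mem://b": b"bravo"}

    def fetch(self):
        return self.SOURCE[self.url]


class Batch:
    """Runs downloads concurrently and yields each one as it completes."""

    def __init__(self, downloads, concurrency=4):
        self.downloads = downloads
        self.concurrency = concurrency

    def __call__(self):
        with ThreadPoolExecutor(max_workers=self.concurrency) as pool:
            futures = [pool.submit(d) for d in self.downloads]
            for future in as_completed(futures):
                yield future.result()  # re-raises any download error


downloads = [
    InMemoryDownload("mem://a", size=5),
    InMemoryDownload("mem://b", sha256=hashlib.sha256(b"bravo").hexdigest()),
]
completed = {d.url: d.content for d in Batch(downloads)()}
```

Because the Batch only ever calls each Download, a subclass for another protocol would only need to supply fetch() in this sketch.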
Layer 2
Content Changes To A Repository.
The goal is for the plugin writer to only be concerned with determining what content needs to be added to or deleted from a repository. Once determined, a generator of RemoteContent to be added and a generator of Content to be removed are used to create a ChangeSet. The term remote content refers to content that exists in the remote repository but does not yet exist locally (in Pulp). The ChangeSet is then applied to the repository. The result is a generator of ChangeReport. Each report contains an action (ADDED | DELETED) and the content. The plugin can iterate these reports to determine overall success or failure.
The ChangeSet is capable of detecting the Importer download policy and acting accordingly. When content downloading is deferred, catalog entries are created instead of downloading artifact files. Concurrent downloading is handled by the ChangeSet using the layer1 download lib.
The plugin writer does not need to be concerned with whether a content (unit) already exists. The ChangeSet determines this and acts appropriately.
The concepts of RemoteContent and RemoteArtifact are introduced. The term remote is used because it describes content and artifacts that exist in the remote repository but do not exist in the local Pulp repository.
Example:
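A minimal sketch of the flow described above: the plugin computes what to add and remove, wraps both in generators, and iterates the reports that apply() yields. The Repository class, the error field on ChangeReport, and the constructor arguments are assumptions for illustration; download policy, catalog entries, and artifact downloading are left out entirely.

```python
from collections import namedtuple
from enum import Enum


class Action(Enum):
    ADDED = "ADDED"
    DELETED = "DELETED"


# The `error` field is an assumption; the proposal only names action and content.
ChangeReport = namedtuple("ChangeReport", ["action", "content", "error"])


class Repository:
    """Toy stand-in for a Pulp repository: just a set of content keys."""

    def __init__(self, content=()):
        self.content = set(content)


class ChangeSet:
    def __init__(self, repository, additions=(), removals=()):
        self.repository = repository
        self.additions = additions  # generator of remote content to add
        self.removals = removals    # generator of local content to remove

    def apply(self):
        """Apply the changes lazily, yielding one report per content."""
        for content in self.additions:
            # Adding content that already exists is a no-op for a set.
            self.repository.content.add(content)
            yield ChangeReport(Action.ADDED, content, None)
        for content in self.removals:
            self.repository.content.discard(content)
            yield ChangeReport(Action.DELETED, content, None)


# The plugin's only job: decide what to add and what to remove.
remote = {"zoo-1.0", "zoo-2.0", "bear-1.0"}
repo = Repository({"zoo-1.0", "lion-0.9"})
local = set(repo.content)
additions = (c for c in sorted(remote - local))
removals = (c for c in sorted(local - remote))

reports = list(ChangeSet(repo, additions, removals).apply())
failed = [r for r in reports if r.error is not None]
```

Keeping both inputs and the output as generators means nothing forces the full set of content into memory at once.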
Discussion
The deferred (Lazy) catalog cannot contain everything needed to create a Download object.
The thing that's missing is logic. It may require a specialized Download class with special error handling that knows to get an authentication token (or something).
Here are a few options:
1. Importer.get_artifact_download(). The base class implementation would be sufficient for most cases and would rarely need to be overridden by a plugin writer. I'm leaning this way.
2. DownloadBuilder.
Both options make it the responsibility of the plugin writer, but #2 may keep the Importer API cleaner.
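A rough sketch of what the first option could look like. The (url, path) signature, the placeholder HttpDownload, and the TokenHttpDownload and DockerImporter classes are all hypothetical; the proposal does not define them.

```python
class HttpDownload:
    """Placeholder for the layer1 HttpDownload; holds only what the sketch needs."""

    def __init__(self, url, path):
        self.url = url
        self.path = path


class TokenHttpDownload(HttpDownload):
    """Hypothetical plugin subclass that would add token-aware error handling."""


class Importer:
    def get_artifact_download(self, url, path):
        """Build the download for one artifact.

        The default is assumed to be sufficient for most plugins.
        """
        return HttpDownload(url, path)


class DockerImporter(Importer):
    def get_artifact_download(self, url, path):
        # Only plugins that need extra logic override the factory method.
        return TokenHttpDownload(url, path)


default = Importer().get_artifact_download("http://example.com/a", "/tmp/a")
custom = DockerImporter().get_artifact_download("http://example.com/b", "/tmp/b")
```

The catalog then only needs to store data (URL and path here), while the logic lives in whichever class the importer returns.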
FAQ
Question: Is layer1 just another Nectar?
Answer: Sort of but no. I really hope not. It is far simpler and delegates all of the
heavy lifting to requests and concurrent.futures.
Question: Why have layer1 at all? Why not just use requests?
Answer: Plugin writers are free to use whatever they want. However, requests does not provide
concurrent downloading. Also, requests is only HTTP. Since downloading needs to be part of the
plugin API, we need a formal way for plugins to participate. Supporting The Streamer is one example.
Question: Is the ChangeSet another Step framework?
Answer: No. Unlike the step framework, the ChangeSet provides support for a very specific
part of the importer synchronization workflow.
Question: Does the ChangeSet do progress reporting?
Answer: Yes.
Question: How is the memory footprint constrained?
Answer: Every component is designed to work with or be a generator. Even a component
using a Queue will restrict the queue size.
Question: For plugins that need to import one kind of content to determine additional content
that needs to be imported, how would this work?
Answer: The plugin could use (or chain) multiple ChangeSets. The plugin can iterate the reports
yielded by ChangeSet.apply() to determine the content that needs to be added using a subsequent ChangeSet.
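Chaining could be sketched as follows. The ChangeSet here is a deliberately tiny stand-in that yields (action, content) tuples, and the REFERENCES table and referenced_layers() helper are hypothetical; the point is only that the second ChangeSet is fed by a generator that consumes the first one's reports.

```python
class ChangeSet:
    """Minimal stand-in: adds content to a set and yields (action, content)."""

    def __init__(self, repository, additions):
        self.repository = repository
        self.additions = additions

    def apply(self):
        for content in self.additions:
            self.repository.add(content)
            yield ("ADDED", content)


# Hypothetical index of which layers each manifest refers to.
REFERENCES = {"manifest-1": ["layer-a", "layer-b"], "manifest-2": ["layer-b"]}


def referenced_layers(reports):
    """Lazily derive the second wave of content from first-wave reports."""
    seen = set()
    for _action, manifest in reports:
        for layer in REFERENCES[manifest]:
            if layer not in seen:
                seen.add(layer)
                yield layer


repository = set()
first = ChangeSet(repository, iter(["manifest-1", "manifest-2"]))
second = ChangeSet(repository, referenced_layers(first.apply()))
added = [content for _action, content in second.apply()]
```

Nothing runs until second.apply() is iterated, so both waves stay lazy end to end.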
Question: For docker, downloads need to get and share an auth token. How would this work?
Answer: The Download has a dictionary (context) for sharing information. The Batch ensures
that all downloads share the same context. The first docker request that detects that it needs
a new token can obtain it and put it in the context to be shared with other downloads. The docker
plugin will need to use a subclass of HttpDownload with a custom on_error(), but this should only
require a few lines of code. This is why the plugin is responsible for providing the download object.
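A sketch of the token-sharing idea, under stated assumptions: the send() hook, the retry-once behavior when on_error() returns True, AuthError, and request_token() are all invented for illustration. The Batch here runs downloads sequentially to keep the sketch short; a concurrent Batch would presumably need a lock around the token refresh.

```python
class AuthError(Exception):
    """Stands in for an HTTP 401 response."""


class HttpDownload:
    """Placeholder base: retries once if on_error() says the error was repaired."""

    def __init__(self, url):
        self.url = url
        self.context = {}

    def send(self):
        raise NotImplementedError

    def on_error(self, error):
        return False  # by default, errors are not recoverable

    def __call__(self):
        try:
            return self.send()
        except AuthError as error:
            if self.on_error(error):
                return self.send()
            raise


class Batch:
    """Gives every download the same context dictionary."""

    def __init__(self, downloads):
        self.context = {}
        self.downloads = downloads
        for download in downloads:
            download.context = self.context

    def __call__(self):
        return [download() for download in self.downloads]


TOKEN_REQUESTS = []


def request_token():
    """Hypothetical call to a token service; counts how often it is used."""
    TOKEN_REQUESTS.append(1)
    return "token-%d" % len(TOKEN_REQUESTS)


class DockerDownload(HttpDownload):
    def send(self):
        token = self.context.get("token")
        if token is None:
            raise AuthError("401: token required")
        return (self.url, token)

    def on_error(self, error):
        # The first download to fail obtains the token and shares it.
        self.context["token"] = request_token()
        return True


results = Batch([DockerDownload(u) for u in ("blob-1", "blob-2", "blob-3")])()
```

Only the first download hits the error path; the others find the token already in the shared context.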
Question: How are fatal errors handled within any of this machinery?
Answer: Regardless of the thread or component in which it occurs, an exception is propagated and
raised to the caller.
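One way this kind of propagation can work with concurrent.futures, shown as a sketch rather than the code in this PR: an exception raised in a worker thread is stored on its future and re-raised in the calling thread by result(). FatalError, work(), and run() are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class FatalError(Exception):
    pass


def work(n):
    """Runs in a worker thread; item 3 simulates a fatal failure."""
    if n == 3:
        raise FatalError(f"item {n} failed")
    return n * 2


def run(items):
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(work, n) for n in items]
        # result() re-raises, in this thread, whatever the worker raised.
        return [future.result() for future in futures]


try:
    run([1, 2, 3, 4])
    outcome = "completed"
except FatalError as error:
    outcome = str(error)
```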
Question: What if my plugin is downloading with something other than regular HTTP?
Answer: The Download is a callable and can be implemented to support any protocol. The Batch
will provide batched, concurrent downloading for any kind of Download.
Question: What if the plugin needs to download the file before creating the Content?
For example: In RPM, all RPMs are stored using a SHA256 as part of the unit key even when the
metadata only includes SHA1. This would mean that the ChangeSet would need to download the file (rpm)
and generate the SHA256 before creating the DB records. Or, provide a way for the plugin to participate in this flow. Right?
Answer: Well, this may be flawed to begin with. What about deferred (Lazy) downloading?
When the download policy is deferred the file will not be available to create the SHA256.
I think we need to challenge the practice of normalizing the checksum to SHA256 to start with.
Question: How would mirror lists be handled?
Answer: A subclass of HttpDownload would be provided by the plugin. It would implement resolving the list of mirrors and stash it in the context to be shared with other downloads involved in the same batch download. The last mirror used could also be stored in the context to support round-robin.
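A sketch of the mirror selection only, assuming a shared context dictionary as described above. MirrorDownload, resolve_mirrors(), next_url(), the context keys, and the mirror URLs are hypothetical, and the actual fetching and parsing of the mirror list is replaced by a fixed list.

```python
class MirrorDownload:
    """Hypothetical HttpDownload subclass; only mirror selection is shown."""

    def __init__(self, mirrorlist_url, relative_path, context):
        self.mirrorlist_url = mirrorlist_url
        self.relative_path = relative_path
        self.context = context  # shared by all downloads in the batch

    def resolve_mirrors(self):
        # Placeholder for fetching and parsing the mirror list.
        return ["http://mirror-a.example/", "http://mirror-b.example/"]

    def next_url(self):
        # Resolve the list once, then stash it for the other downloads.
        mirrors = self.context.get("mirrors")
        if mirrors is None:
            mirrors = self.context["mirrors"] = self.resolve_mirrors()
        # Round-robin: advance from the last mirror any download used.
        index = (self.context.get("last_mirror", -1) + 1) % len(mirrors)
        self.context["last_mirror"] = index
        return mirrors[index] + self.relative_path


context = {}
urls = [
    MirrorDownload("http://example.com/mirrorlist", path, context).next_url()
    for path in ("a.rpm", "b.rpm", "c.rpm")
]
```

In a concurrent batch, the read-and-update of last_mirror would presumably need a lock; that is omitted here.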
This includes a sanity test but eventually needs some unit tests and some combination of functional and perhaps smash tests. The sanity test will not be included in the final PR as-is.
TODO
ChangeSet only has logging roughed in.