
Document stages concept
Matthias Dellweg committed Mar 19, 2019
1 parent 25bb749 commit 7463677
Showing 5 changed files with 112 additions and 16 deletions.
2 changes: 1 addition & 1 deletion docs/api-reference/stages.rst
@@ -1,4 +1,4 @@
.. _stages-docs:
.. _stages-api-docs:

pulpcore.plugin.stages
======================
8 changes: 8 additions & 0 deletions docs/plugin-writer/concepts/index.rst
@@ -64,3 +64,11 @@ Tasks are deployed from Views or Viewsets, please see :ref:`kick-off-tasks`.
    tasks/add-remove
    tasks/publish
    tasks/export

Sync Pipeline
-------------

.. toctree::
    :maxdepth: 2

    sync_pipeline/sync_pipeline
67 changes: 67 additions & 0 deletions docs/plugin-writer/concepts/sync_pipeline/sync_pipeline.rst
@@ -0,0 +1,67 @@
.. _stages-concept-docs:

Synchronizing Repositories with the async-Pipeline
==================================================

To accomplish the steps outlined in :ref:`sync-docs` in an efficient way, Pulp provides a
high-level API to construct a pipeline of stages. These stages work in parallel, like an assembly
line, using Python's `async` feature in combination with the `asyncio` library. Each stage takes
designated content units from an incoming queue of type :class:`asyncio.Queue` and performs an
individual task on them before passing them to the outgoing queue that is connected to the next
stage.

The anatomy of a stage is that it inherits from :class:`pulpcore.plugin.stages.Stage` and overrides
its asynchronous callback :meth:`run`.
In :meth:`run`, it can retrieve incoming declarative content individually via the asynchronous
iterator :meth:`self.items` or in batches via :meth:`self.batches`.
It can pass declarative content on to the next stage with :meth:`self.put`.
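
As an illustration, a minimal custom stage could be sketched like this. Only the base class and
the :meth:`run`, :meth:`self.items` and :meth:`self.put` calls come from the description above;
the stage name and the counting logic are made up for this example::

    from pulpcore.plugin.stages import Stage


    class CountingStage(Stage):
        """Hypothetical stage that counts content units while passing them through unchanged."""

        def __init__(self):
            super().__init__()
            self.counter = 0

        async def run(self):
            # Retrieve declarative content one by one from the incoming queue.
            async for d_content in self.items():
                self.counter += 1
                # Pass the unit on to the outgoing queue connected to the next stage.
                await self.put(d_content)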

The sync pipeline is headed by a `first_stage`, which is supposed to download upstream metadata
and iterate over all upstream content references. For each such reference, it creates a
:class:`pulpcore.plugin.stages.DeclarativeContent` that contains a prefilled but unsaved instance
of a subclass of :class:`pulpcore.plugin.content.Content`, as well as a list of
:class:`pulpcore.plugin.stages.DeclarativeArtifact` objects. The latter combine an unsaved
instance of :class:`pulpcore.plugin.content.Artifact` with a URL to retrieve it.
The :class:`pulpcore.plugin.stages.DeclarativeContent` objects, which describe what a content unit
will look like when properly downloaded and saved to the database, are passed one by one to the
next pipeline stage.
The responsibility of providing this `first_stage` lies completely in the plugin's domain, since
this is the part of the pipeline specific to the repository type.
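
A `first_stage` for a hypothetical plugin could be sketched as follows; `ExampleContent`,
`parse_metadata` and the metadata fields are invented for illustration, while
:class:`pulpcore.plugin.stages.DeclarativeArtifact` and
:class:`pulpcore.plugin.stages.DeclarativeContent` are used as described above::

    from pulpcore.plugin.models import Artifact
    from pulpcore.plugin.stages import DeclarativeArtifact, DeclarativeContent, Stage


    class ExampleFirstStage(Stage):
        """Hypothetical first stage turning upstream metadata into declarative content."""

        def __init__(self, remote):
            super().__init__()
            self.remote = remote

        async def run(self):
            # parse_metadata() stands in for downloading and parsing the upstream metadata.
            for entry in parse_metadata(self.remote):
                content = ExampleContent(name=entry["name"])  # unsaved Content subclass instance
                artifact = Artifact()  # unsaved Artifact; digests and size can be prefilled if known
                d_artifact = DeclarativeArtifact(
                    artifact=artifact,
                    url=entry["url"],
                    relative_path=entry["relative_path"],
                    remote=self.remote,
                )
                d_content = DeclarativeContent(content=content, d_artifacts=[d_artifact])
                await self.put(d_content)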

The Pulp plugin API provides the following stages, which also comprise the default pipeline in the
following order:

1. :class:`pulpcore.plugin.stages.QueryExistingContents`
2. :class:`pulpcore.plugin.stages.QueryExistingArtifacts`
3. :class:`pulpcore.plugin.stages.ArtifactDownloader`
4. :class:`pulpcore.plugin.stages.ArtifactSaver`
5. :class:`pulpcore.plugin.stages.ContentSaver`
6. :class:`pulpcore.plugin.stages.RemoveDuplicates`
7. :class:`pulpcore.plugin.stages.ArtifactSaver`
8. :class:`pulpcore.plugin.stages.ResolveContentFutures`
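
In the simplest case, a plugin only provides the `first_stage` and lets
:class:`pulpcore.plugin.stages.DeclarativeVersion` assemble and run this default pipeline.
A minimal sketch, assuming `first_stage` and `repository` already exist::

    from pulpcore.plugin.stages import DeclarativeVersion

    # Build the default pipeline around the plugin's first stage and create a new
    # repository version from the content that makes it through the pipeline.
    DeclarativeVersion(first_stage, repository).create()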

Lazy synchronizing
------------------

See :ref:`lazy-support`.

.. _multi-level-discovery:

Multiple level discovery
------------------------

Plugins like `pulp_deb` and `pulp_docker` use content artifacts to enumerate more content.
To support this pattern, the declarative content can be associated with an
:class:`asyncio.Future` that is resolved when the content reaches the
:class:`pulpcore.plugin.stages.ResolveContentFutures` stage.
By awaiting this Future, one can implement an informational feedback loop into earlier stages.

.. warning::

    In order to prevent deadlocks, be sure that you mark the declarative content with
    `does_batch=False`, and that you do not drop it without resolving the future.

.. hint::

    If you need downloaded artifacts of this content for further discovery, make sure to
    provide `deferred_download=False` to the
    :class:`pulpcore.plugin.stages.DeclarativeArtifact`.
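
The following sketch illustrates the feedback loop described above. The `future` keyword argument
used to attach the Future, as well as `index_content` and `d_artifact`, are assumptions made for
illustration; check the :class:`pulpcore.plugin.stages.DeclarativeContent` documentation for the
exact mechanism::

    import asyncio

    from pulpcore.plugin.stages import DeclarativeContent, Stage


    class ExampleDiscoveryStage(Stage):
        """Hypothetical first stage that uses one content unit to discover more content."""

        async def run(self):
            future = asyncio.Future()  # resolved by the ResolveContentFutures stage
            d_content = DeclarativeContent(
                content=index_content,     # hypothetical content unit referencing more content
                d_artifacts=[d_artifact],  # hypothetical, prepared as in the first stage sketch
                future=future,             # assumed way to associate the Future (hypothetical)
                does_batch=False,          # prevent deadlocks, as the warning above explains
            )
            await self.put(d_content)

            # Awaiting the Future blocks until the unit has passed ResolveContentFutures; its
            # downloaded artifact can then be parsed to emit additional DeclarativeContent.
            saved_content = await future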
4 changes: 3 additions & 1 deletion docs/plugin-writer/concepts/tasks/add-remove.rst
@@ -30,12 +30,14 @@ working directory setup, and database cleanup after encountering failures.
Every action that creates a new RepositoryVersion *must* be asynchronous (defined as a task).
Task reservations are necessary to prevent race conditions.

.. _sync-docs:

Synchronizing
-------------

.. tip::

    Please consider using the high level :ref:`stages-docs` for actual implementations.
    Please consider using the high level :ref:`stages-concept-docs` for actual implementations.

Most plugins will define a synchronize task, which fetches content from a remote repository, and
adds it to a Pulp repository.
47 changes: 33 additions & 14 deletions docs/reference/lazy-support.rst
@@ -8,30 +8,40 @@ downloading their associated Artifacts. By convention, users expect the `Remote.
determine when Artifacts will be downloaded. See the user docs for specifics on the user
expectations there.

.. _lazy-support-with-da:

Adding Support when using DeclarativeVersion
============================================

.. warning::

    This section is outdated.

Plugins like `pulp-file` sync content using `DeclarativeVersion`, which takes an option called
`download_artifacts` that defaults to `True`. Lazy support can be added by specifying
`download_artifacts=False`.
Plugins like `pulp-file` sync content using `DeclarativeVersion`.
Lazy support can be added by specifying `deferred_download=True` at instantiation of
:class:`pulpcore.plugin.stages.DeclarativeArtifact`.

`Remote.policy` can take several values. To easily translate them, consider a snippet like this
one taken from `pulp-file`::

    download = (remote.policy == Remote.IMMEDIATE)  # Interpret policy to download Artifacts or not
    dv = DeclarativeVersion(first_stage, repository, mirror=mirror, download_artifacts=download)

    async def run(self):
        # Interpret download policy
        deferred_download = (self.remote.policy != Remote.IMMEDIATE)
        <...>
        da = DeclarativeArtifact(
            artifact=artifact,
            url=url,
            relative_path=relative_path,
            remote=self.remote,
            deferred_download=deferred_download,
        )
        <...>

.. hint::

    The `deferred_download` flag is used at the artifact level, to support lazy concepts for
    plugins that need some artifacts to be downloaded immediately in all cases.
    See also :ref:`multi-level-discovery`.

Adding Support when using a Custom Stages API Pipeline
======================================================

.. warning::

    This section is outdated.

Plugins like `pulp-rpm` that sync content using a custom pipeline can enable lazy support by
excluding the `QueryExistingArtifacts`, `ArtifactDownloader` and `ArtifactSaver` stages. Without
@@ -47,6 +57,15 @@ this one inspired by `pulp-rpm`::

    stages.extend([QueryExistingArtifacts(), ArtifactDownloader(), ArtifactSaver()])
    stages.extend(the_rest_of_the_pipeline)  # This adds the Content and Association Stages

.. warning::

    Skipping those stages does not work with :ref:`multi-level-discovery`.
    If you need some artifacts downloaded anyway, follow the example in
    :ref:`lazy-support-with-da` and include the artifact stages in the custom pipeline.

.. hint::

    Consider also excluding the `ResolveContentFutures` stage.

What if the Custom Pipeline Needs Artifact Downloading?
=======================================================
@@ -84,7 +103,7 @@ specific Artifact path, which actually matches against a `ContentArtifact` not a
`Artifact` exists, it is served to the client. Otherwise a `RemoteArtifact` allows the `Artifact` to
be downloaded on-demand and served to the client.

If `Remote.policy == Remote.ON_DEMAND` the Artifact is saved on the first download. This causes
If `remote.policy == Remote.ON_DEMAND` the Artifact is saved on the first download. This causes
future requests to serve the already-downloaded and validated Artifact.

.. note::
