This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Introducing asyncio stages #3559

Merged
merged 37 commits into from Aug 3, 2018

Conversation

bmbouter
Member

@bmbouter bmbouter commented Jul 23, 2018

This PR pairs with pulp_file changes here: pulp/pulp_file#102

I use the test script below. It resets my Pulp3 installation, syncs pulp_file, uploads another file, associates it with the repo, and then resyncs with mirror='true'.

It's expected to make 3 versions, and Pulp should have 4 content units in it.
v1 - 3 content units matching the remote pulp_file repo
v2 - 4 content units, including the newly uploaded one
v3 - 3 content units, after the sync with mirror='true'. This verifies that mirror is working.

set -v 

# Reset everything

read -p "Stop the webserver, then press enter"

sudo systemctl stop pulp_resource_manager
sudo systemctl stop pulp_worker@1
sudo systemctl stop pulp_worker@2

pulp-manager reset_db --noinput
pulp-manager migrate
pulp-manager reset-admin-password --password admin
rm -rf /var/lib/pulp/artifact/*

sudo systemctl start pulp_resource_manager
sudo systemctl start pulp_worker@1
sudo systemctl start pulp_worker@2

read -p "Start the webserver, then press enter"

# Create a repository, 'foo'

http POST http://localhost:8000/pulp/api/v3/repositories/ name=foo

export REPO_HREF=$(http :8000/pulp/api/v3/repositories/ | jq -r '.results[] | select(.name == "foo") | ._href')

# Create a remote, 'bar'

http POST http://localhost:8000/pulp/api/v3/remotes/file/ name='bar' url='https://repos.fedorapeople.org/pulp/pulp/demo_repos/test_file_repo/PULP_MANIFEST'

export REMOTE_HREF=$(http :8000/pulp/api/v3/remotes/file/ | jq -r '.results[] | select(.name == "bar") | ._href')

# Sync 'foo' using 'bar'

http POST $REMOTE_HREF'sync/' repository=$REPO_HREF

# Upload an artifact

http --form POST http://localhost:8000/pulp/api/v3/artifacts/ file@/home/vagrant/devel/foo.tar.gz

export ARTIFACT_HREF=$(http 'http://localhost:8000/pulp/api/v3/artifacts/?sha256=2a123bec2a2e1df9cd8705861f61cb6df5453f7eaab424f1984658152211e4a5' | jq -r '.results[] | ._href')

http POST http://localhost:8000/pulp/api/v3/content/file/files/ relative_path=foo.tar.gz artifact="$ARTIFACT_HREF"

export CONTENT_HREF=$(http :8000/pulp/api/v3/content/file/files/ | jq -r '.results[] | select(.relative_path == "foo.tar.gz") | ._href')

http POST $REPO_HREF'versions/' add_content_units:="[\"$CONTENT_HREF\"]"

# Sync 'foo' using 'bar'

http POST $REMOTE_HREF'sync/' repository=$REPO_HREF
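
The version counts expected from the script above can be modeled with plain sets. This is only a sketch of the arithmetic: the unit names and the `sync()` helper are hypothetical, and it assumes a mirror sync drops anything the remote does not list.

```python
# Hypothetical model of the three repository versions; names are illustrative only.
remote = {"1.iso", "2.iso", "3.iso"}   # units listed by the remote (assumed names)
uploaded = {"foo.tar.gz"}              # the unit uploaded and associated by hand


def sync(current, remote_units, mirror):
    """Return the content set of the version produced by a sync (assumed semantics)."""
    if mirror:
        # Mirror: the new version matches the remote exactly.
        return set(remote_units)
    # Additive: keep what is already there and add the remote's units.
    return current | remote_units


v1 = sync(set(), remote, mirror=False)   # first sync
v2 = v1 | uploaded                       # manual add_content_units
v3 = sync(v2, remote, mirror=True)       # resync with mirror
all_content = v1 | v2 | v3               # content units known to Pulp overall

print(len(v1), len(v2), len(v3), len(all_content))
```

The uploaded unit leaves the repository in v3 but still exists as content, which is why the total stays at 4.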

@pep8speaks

pep8speaks commented Jul 23, 2018

Hello @bmbouter! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 03, 2018 at 20:21 Hours UTC

@bmbouter bmbouter force-pushed the introducing-asyncio-stages branch 2 times, most recently from bb1c765 to 07739d5 Compare July 23, 2018 20:13
@codecov

codecov bot commented Jul 23, 2018

Codecov Report

Merging #3559 into master will decrease coverage by 0.19%.
The diff coverage is 14.28%.


@@            Coverage Diff            @@
##           master    #3559     +/-   ##
=========================================
- Coverage    58.2%   58.01%   -0.2%     
=========================================
  Files          59       60      +1     
  Lines        2467     2477     +10     
=========================================
+ Hits         1436     1437      +1     
- Misses       1031     1040      +9
Impacted Files Coverage Δ
__init__.py 0% <ø> (ø)
pulpcore/pulpcore/app/models/progress.py 51.78% <0%> (-2.94%) ⬇️
pulpcore/pulpcore/app/models/content.py 68.57% <50%> (-3.74%) ⬇️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1fe4ca0...631031e.

@@ -138,6 +138,15 @@ def natural_key(self):
"""
return tuple(getattr(self, f) for f in self.natural_key_fields())

def natural_key_dict(self):

Contributor

@jortel jortel Jul 25, 2018

This should be simplified to:

return {f: getattr(self, f) for f in self.natural_key_fields()}

This is simple enough for the caller to just do. I don't think this (1-line) method adds enough value to justify including it.

Member Author

The use case I have in mind is plugin writers wanting to query for a unit they have in memory using filter() or Q(). There are two places where I needed to do this in DeclarativeVersion. I wanted to state its purpose and see what you thought.

Contributor

That is the use case I'd imagined as well, so my opinion hasn't changed.
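
To make the use case in this thread concrete, here is a minimal sketch of what `natural_key_dict()` gives a caller. `FileContentSketch` and its field names are hypothetical stand-ins, not the real model, and the Django query is shown only in a comment because this snippet runs without Django.

```python
class FileContentSketch:
    """Illustrative stand-in for a content model; not the real pulpcore class."""

    def __init__(self, relative_path, digest):
        self.relative_path = relative_path
        self.digest = digest

    def natural_key_fields(self):
        # Assumed field names, chosen only for the example.
        return ("relative_path", "digest")

    def natural_key(self):
        return tuple(getattr(self, f) for f in self.natural_key_fields())

    def natural_key_dict(self):
        # The one-liner suggested in the review.
        return {f: getattr(self, f) for f in self.natural_key_fields()}


unit = FileContentSketch("foo.tar.gz", "2a123b")
lookup = unit.natural_key_dict()

# With a real Django model the dict could be unpacked into a query, e.g.
#   Model.objects.filter(**lookup)   or   Q(**lookup)
print(lookup)
```

Either way the caller ends up with keyword arguments keyed by field name, which is the only difference from the tuple returned by `natural_key()`.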

@bmbouter bmbouter force-pushed the introducing-asyncio-stages branch 3 times, most recently from 4b9d312 to 95c9bda Compare July 25, 2018 19:13
the `artifact` attributes may be incomplete because not all digest information can be computed
until the Artifact is downloaded.

Attributes:

Member

This is causing the generated docs to call these variables instead of parameters. It renders weird:
[screenshot: declarativeartifactdocs]

Member Author

I'm fixing this in the next push by using Sphinx's :no_members:


>>> artifact_downloader(max_concurrent_downloads=42).stage # This is the real stage

in_q data type: A `~pulpcore.plugin.stages.DeclarativeContent` with potentially files missing

Member

The references to other docs are not resolving correctly in the built docs.
[screenshot: artifact_downloader_docs]

Member Author

@bmbouter bmbouter Jul 30, 2018

This problem is all over the docs in this PR. I'm going through and fixing all of them with the next push.

Member

@dkliban dkliban left a comment

Thank you!

if not content:
raise ValueError(_("DeclarativeContent must have a 'content'"))
if d_artifacts:
self.d_artifacts = d_artifacts

Contributor

Suggest:

self.d_artifacts = d_artifacts or []

would be simpler.
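
A quick sketch of what the suggested `or []` form does. `DeclarativeContentSketch` is an illustrative stand-in, not the real class, and it assumes the attribute should always exist on the instance.

```python
class DeclarativeContentSketch:
    """Illustrative stand-in; not the real pulpcore class."""

    def __init__(self, content=None, d_artifacts=None):
        if not content:
            raise ValueError("DeclarativeContent must have a 'content'")
        # None (or an empty list) becomes a fresh list owned by this instance,
        # so the attribute is always present and never shared between objects.
        self.d_artifacts = d_artifacts or []


no_artifacts = DeclarativeContentSketch(content="unit")
with_artifacts = DeclarativeContentSketch(content="unit", d_artifacts=["a1"])

print(no_artifacts.d_artifacts, with_artifacts.d_artifacts)
```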

Member Author

cool will include in next push.

and defaults to 100.

Returns:
A single coroutine that can be used to run, wait, or cancel the entire pipeline with.

Contributor

This probably should document it returns an asyncio.Future. (I think)

Member Author

At this point it's only returning a coroutine. I looked into sphinx's support for indicating something is a coroutine, but they aren't there yet unfortunately. sphinx-doc/sphinx#4777
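
For readers following the coroutine vs. Future distinction in this thread, here is a small standalone sketch using only the standard library. The `pipeline()` function is a placeholder and has nothing to do with the real pipeline code.

```python
import asyncio


async def pipeline():
    # Placeholder body standing in for the real pipeline work.
    await asyncio.sleep(0)
    return "done"


async def main():
    coro = pipeline()                         # calling an async def gives a coroutine object
    is_coro = asyncio.iscoroutine(coro)
    is_future_before = asyncio.isfuture(coro)

    task = asyncio.ensure_future(coro)        # wrapping schedules it as a Task, which is a Future
    is_future_after = asyncio.isfuture(task)

    result = await task
    return is_coro, is_future_before, is_future_after, result


outcome = asyncio.run(main())
print(outcome)
```

A caller who needs to cancel or attach callbacks would wrap the returned coroutine this way; a caller who only awaits it can use the coroutine directly.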

@jortel
Contributor

jortel commented Aug 1, 2018

@bmbouter, Review finished. A few of the comments ended up being repetitive, sorry.

Contributor

@jortel jortel left a comment

Most of the comments are not merge blockers but would improve code quality. However, a few comments regarding API and correctness need to be addressed.

@bmbouter
Member Author

bmbouter commented Aug 2, 2018

@dralley @jortel Now that all stages inherit from BaseStage I think FirstStage should be deleted, and I'll refactor the docs to have plugin writers make their first stage directly from BaseStage. Sound good?

@bmbouter
Member Author

bmbouter commented Aug 2, 2018

@jortel This PR still needs 3 things I'll do tomorrow morning:

  1. remove FirstStage and use BaseStage instead
  2. Handle the artifact collision concern
  3. Switch default_storage() to DefaultStorage()

I'll ping you when it's ready for final review, Friday AM.

await asyncio.gather(*futures)


class EndStage(BaseStage):

Contributor

Big 👍 for making the stages objects! This will unlock a lot of potential.

At the risk of being pedantic :(

Why not just Stage?

The rare cases I see the Base prefix, it strikes me as odd. The class name is a classification of objects and should read naturally using the IsA lexicon.

For example, the natural way to describe a QueryExistingArtifacts is to say:

"A QueryExistingArtifacts IsA Stage".

not

"A QueryExistingArtifacts IsA BaseStage".

I don't think designating this as a base class through naming adds value. This isn't a blocker but, IMHO, it would be more correct in the classic object modeling sense.

Member Author

I think s/BaseStage/Stage/ is a great change. I'll make it on my next push along w/ the last remaining fixes.
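
As a rough illustration of "stages as objects" with a plain `Stage` base class, the sketch below chains three stages with `asyncio.Queue`. Everything here is hypothetical: the `__call__(in_q, out_q)` signature, the `None` end-of-stream marker, and the `run_pipeline()` helper are assumptions made for the example, not the API in this PR.

```python
import asyncio


class Stage:
    """Hypothetical base class: a stage reads from in_q and writes to out_q."""

    async def __call__(self, in_q, out_q):
        raise NotImplementedError


class Source(Stage):
    def __init__(self, items):
        self.items = items

    async def __call__(self, in_q, out_q):
        for item in self.items:
            await out_q.put(item)
        await out_q.put(None)  # None marks end of stream in this sketch


class Upper(Stage):
    async def __call__(self, in_q, out_q):
        while True:
            item = await in_q.get()
            if item is None:
                await out_q.put(None)  # pass the end marker downstream
                break
            await out_q.put(item.upper())


class Collect(Stage):
    def __init__(self):
        self.seen = []

    async def __call__(self, in_q, out_q):
        while True:
            item = await in_q.get()
            if item is None:
                break
            self.seen.append(item)


async def run_pipeline(stages, maxsize=100):
    # Connect each stage's output queue to the next stage's input queue.
    futures = []
    in_q = None
    for stage in stages:
        out_q = asyncio.Queue(maxsize=maxsize)
        futures.append(asyncio.ensure_future(stage(in_q, out_q)))
        in_q = out_q
    await asyncio.gather(*futures)


sink = Collect()
asyncio.run(run_pipeline([Source(["a", "b"]), Upper(), sink]))
print(sink.seen)
```

With this shape, "an Upper IsA Stage" reads naturally, and each stage can carry its own configuration as instance attributes.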

from .content_unit_stages import ContentUnitSaver, QueryExistingContentUnits


class FirstStage:

Contributor

Why is FirstStage still needed?

Member Author

It's not, but I was too tired to remove it last night. It'll be removed in next push.

run. Default is 100.
"""

def __init__(self, max_concurrent_downloads=100, *args, **kwargs):

Contributor

What are the *args, **kwargs for?

If needed, please docstring what can be passed.

Member Author

It's not formally spelled out in Python, but *args and **kwargs should be accepted on all __init__() calls and used when calling super().__init__(*args, **kwargs). Otherwise the parent object can never receive its parameters through the subclass' __init__().

For places where *args and **kwargs are used, I'm going to add a docstring noting that unused params are passed on to Stage, which keeps the docstrings DRY.
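
A minimal sketch of the forwarding pattern described above. The `name` parameter on `Stage` is hypothetical, added only so there is something for the parent to receive.

```python
class Stage:
    def __init__(self, name="stage"):
        # Hypothetical parent parameter, used only for this illustration.
        self.name = name


class ArtifactDownloaderSketch(Stage):
    def __init__(self, max_concurrent_downloads=100, *args, **kwargs):
        # Anything this class does not consume is handed to the parent.
        super().__init__(*args, **kwargs)
        self.max_concurrent_downloads = max_concurrent_downloads


downloader = ArtifactDownloaderSketch(max_concurrent_downloads=42, name="downloads")
print(downloader.name, downloader.max_concurrent_downloads)
```

Without the `*args, **kwargs` pass-through, `name="downloads"` would raise a `TypeError` in the subclass and never reach `Stage`.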


def __init__(self, max_concurrent_downloads=100, *args, **kwargs):
self.max_concurrent_downloads = max_concurrent_downloads
super().__init__(*args, **kwargs)

Contributor

It's better practice to initialize objects in hierarchical order. Please call super().__init__() (first) before setting extended attributes.

Member Author

ok will do in next push
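
One way the ordering can matter, as a hedged sketch: if the parent's `__init__()` sets a default for an attribute, calling it last silently overwrites the subclass's value. The `max_items` attribute is hypothetical and does not come from this PR.

```python
class Stage:
    def __init__(self):
        self.max_items = 10  # hypothetical default set by the parent


class SuperLast(Stage):
    def __init__(self, max_items):
        self.max_items = max_items
        super().__init__()  # parent default overwrites the value set above


class SuperFirst(Stage):
    def __init__(self, max_items):
        super().__init__()  # parent initializes first
        self.max_items = max_items  # subclass value wins


print(SuperLast(50).max_items, SuperFirst(50).max_items)
```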

for unit in self.new_version.content.all():
unit = unit.cast()
self.unit_keys_by_type[type(unit)].add(unit.natural_key())
super().__init__(*args, **kwargs)

Contributor

Same comment as above re: calling super() first.
Same comment as above re: *args, **kwargs.

Member Author

yup will fix in next push.


def __init__(self, new_version, *args, **kwargs):
self.new_version = new_version
super().__init__(*args, **kwargs)

Contributor

Same comment as above re: calling super() first.
Same comment as above re: *args, **kwargs.

Member Author

yup will fix in next push.

'remote': declarative_artifact.remote,
}
rel_path = content_artifact.relative_path
remote_artifact_map[rel_path] = remote_artifact_data

Contributor

Still correlated by rel_path.

Member Author

yup this and the storage backend both are being fixed in next push.

* switch storage backend to DefaultStorage()
* super() is now called first anytime it's used
* FirstStage is removed
* BaseStage -> Stage
* the relative path now handles duplicate rel_paths correctly
* documenting args and kwargs

I did some final hand testing, and I also ensured the docs build and look
good.

https://pulp.plan.io/issues/3844
re pulp#3844
)
declarative_artifact.artifact = new_artifact
to_download_count = to_download_count + 1
pb.done = pb.done + to_download_count

Contributor

Just FYI: using += would be more concise.

'remote': declarative_artifact.remote,
}
content_pk = content_artifact.content.pk
remote_artifact_map[content_pk] = remote_artifact_data

Contributor

After switching to integer PK, the content_artifact.content.pk will be (None) because the model has not yet been saved. Right?

Member Author

This is rebased onto the integer PK already so it uses them currently. The pk is set from L#151.

I verified I cannot use the models themselves because they are unsaved and Django raises an unhashable error. We could use sha512 instead. The only reason I didn't is to save memory. Should I switch it to that?

Member Author

Actually I'm going to switch it.
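
The concern about the map key can be illustrated with plain dicts. This is only a sketch: the records, field values, and URLs are made up, and it shows a silent overwrite in a dict rather than reproducing whatever exception the real code raised.

```python
# Two artifacts that belong to the same content unit (made-up data).
records = [
    {"content_pk": 1, "relative_path": "a.iso", "sha512": "aaa",
     "url": "https://example.com/a.iso"},
    {"content_pk": 1, "relative_path": "a.sig", "sha512": "bbb",
     "url": "https://example.com/a.sig"},
]

# Keyed by the content unit: the second record replaces the first.
by_content = {r["content_pk"]: r for r in records}

# Keyed by the artifact digest: both records survive.
by_digest = {r["sha512"]: r for r in records}

print(len(by_content), len(by_digest))
```

Any key that is unique per artifact avoids the collision; the digest is one option raised in this thread, at the cost of holding longer keys in memory.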

Without this commit, an exception will occur when one content unit has
two or more Artifacts associated with it.

https://pulp.plan.io/issues/3844
re pulp#3844
@bmbouter
Member Author

bmbouter commented Aug 3, 2018

Thank you @jortel , @dkliban, @dralley , and @gmbnomis for all the collaboration and help with this! 💯 I really think the plugin writers are going to get a big benefit from it.

I'm merging it and https://github.com/pulp/pulp_file/pull/102/files

@bmbouter bmbouter merged commit 890ceba into pulp:master Aug 3, 2018
@bmbouter bmbouter deleted the introducing-asyncio-stages branch August 3, 2018 20:38
daviddavis pushed a commit to daviddavis/pulp_file that referenced this pull request May 14, 2019
5 participants