Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add repository_version param as a building context #1687

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

git-hyagi
Copy link
Contributor

closes: #479

Comment on lines 52 to 54
Non-staff users, lacking read access to the `artifacts` endpoint, may encounter restricted
functionality as they are prohibited from listing artifacts uploaded to Pulp and utilizing
them within the build process.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functionality will no longer be limited by the access to the artifacts endpoint.

Comment on lines 66 to 51
TASK_HREF=$(http --form POST :$REPO_HREF'build_image/' -F "containerfile@./Containerfile" \
repo_version=$REPO_VERSION | jq -r '.task')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TASK_HREF=$(http --form POST :$REPO_HREF'build_image/' -F "containerfile@./Containerfile" \
repo_version=$REPO_VERSION | jq -r '.task')
TASK_HREF=$(http --form POST :$REPO_HREF'build_image/' "containerfile@./Containerfile" \
repo_version=$REPO_VERSION | jq -r '.task')

@@ -696,6 +696,7 @@ class ContainerRepositoryViewSet(
"condition": [
"has_model_or_obj_perms:container.build_image_containerrepository",
"has_model_or_obj_perms:container.view_containerrepository",
"has_repository_model_or_obj_perms:file.view_filerepository",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this line, I am receiving an HTTP 500 error.

[pulp]  | pulp [d8943394f02c4759b3deac8f518d2218]: django.request:ERROR: Internal Server Error: /pulp/api/v3/repositories/container/container/01907884-8984-7905-bc76-58b8c747c347/build_image/
[pulp]  | Traceback (most recent call last):
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/django/core/handlers/exception.py", line 55, in inner
[pulp]  |     response = get_response(request)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/django/core/handlers/base.py", line 197, in _get_response
[pulp]  |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
[pulp]  |     return view_func(*args, **kwargs)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/viewsets.py", line 124, in view
[pulp]  |     return self.dispatch(request, *args, **kwargs)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 509, in dispatch
[pulp]  |     response = self.handle_exception(exc)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 469, in handle_exception
[pulp]  |     self.raise_uncaught_exception(exc)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
[pulp]  |     raise exc
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 497, in dispatch
[pulp]  |     self.initial(request, *args, **kwargs)
[pulp]  |   File "/src/pulpcore/pulpcore/app/viewsets/base.py", line 300, in initial
[pulp]  |     super().initial(request, *args, **kwargs)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 415, in initial
[pulp]  |     self.check_permissions(request)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_framework/views.py", line 332, in check_permissions
[pulp]  |     if not permission.has_permission(request, self):
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_access_policy/access_policy.py", line 69, in has_permission
[pulp]  |     allowed = self._evaluate_statements(statements, request, view, action)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_access_policy/access_policy.py", line 113, in _evaluate_statements
[pulp]  |     matched = self._get_statements_matching_conditions(
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_access_policy/access_policy.py", line 262, in _get_statements_matching_conditions
[pulp]  |     passed = self._check_condition(condition, request, view, action)
[pulp]  |   File "/usr/local/lib/python3.9/site-packages/rest_access_policy/access_policy.py", line 286, in _check_condition
[pulp]  |     result = method(request, view, action, arg)
[pulp]  |   File "/src/pulpcore/pulpcore/app/global_access_conditions.py", line 407, in has_repository_model_or_obj_perms
[pulp]  |     return request.user.has_perm(permission) or has_repository_obj_perms(
[pulp]  |   File "/src/pulpcore/pulpcore/app/global_access_conditions.py", line 367, in has_repository_obj_perms
[pulp]  |     plugin_repository = Repository.objects.get(pk=view.kwargs["repository_pk"]).cast()
[pulp]  | KeyError: 'repository_pk'

Reproducer:

pulp user create --username test --password krokodyl1 --email test@test.com
pulp user role-assignment add --username test --role "container.containerdistribution_collaborator" --object ""
pulp user role-assignment add --username test --role "container.containerrepository_content_manager" --object ""

FILE_REPO=$(pulp file repository create --name foo --autopublish | jq -r '.pulp_href')
pulp file content upload --repository foo --file ./setup.py --relative-path setup.py

CONTAINER_REPO=$(pulp container repository create --name build | jq -r '.pulp_href') 

http -a test:krokodyl1 --form POST :5001${CONTAINER_REPO}build_image/ "containerfile@./Containerfile" repo_version=${FILE_REPO}versions/1/

I do not think we can inject the policy from other entities' contexts here (repo_version=FILE_HREF). The policy is available only for the current viewset scope (CONTAINER_REPO). I think we need to run permission checks manually. Similarly to

repositories_by_namespace = get_objects_for_user(
or create a customized policy condition, like in
def has_namespace_or_obj_perms(request, view, action, permission):
.

@mdellweg, any ideas on how to properly add permission checks for entities created from within a different plugin? In our case, we want to check if a user has permissions to read pulp_file repository and build an image inside pulp_container repository.

file_bindings.RepositoriesFileApi.modify(file_repo_with_auto_publish.pulp_href, data).task
)

build_response = container_repository_api.build_image(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind adding a test case that triggers users' permissions checks before building an image?

with user_helpless, pytest.raises(APIException):
    container_repository_api.build_image()
    
with user_manager:
    container_repository_api.build_image()

...

Comment on lines 815 to 826
if "repo_version" in data:
version_href = data["repo_version"]
repo_version_artifacts = RepositoryVersion.objects.get(pk=version_href.pk).artifacts
files = FileContent.objects.filter(
digest__in=repo_version_artifacts.values("sha256")
).values("_artifacts__pk", "relative_path")
if len(files) == 0:
raise serializers.ValidationError(
_("No file found for the specified repository version.")
)
for file in files:
artifacts[str(file["_artifacts__pk"])] = file["relative_path"]
Copy link
Member

@lubosmj lubosmj Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image building is still considered to be a tech-preview. Have you considered changing the current API to better fit our needs? Instead of passing a list of artifacts to the task, we can try passing only the reference to the repository version, after verifying user's permissions. The task will then unpack the repository version and run the suggested queries ^. Running these queries while serializing input data may block api-app during the execution. We can outsource the evaluation to the tasking system freely.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch 2 times, most recently from b947807 to 3bb3329 Compare July 10, 2024 13:50
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I left a couple of small suggestions. Also, I would still ditch the artifacts parameter from the build endpoint. What do you think, @ipanova, should we keep it or not?

Comment on lines 137 to 144
repo_version_artifacts = RepositoryVersion.objects.get(pk=repo_version).artifacts
files = FileContent.objects.filter(
digest__in=repo_version_artifacts.values("sha256")
).values("_artifacts__pk", "relative_path")
if len(files) == 0:
raise ValidationError(_("No file found for the specified repository version."))
for file in files:
artifacts[str(file["_artifacts__pk"])] = file["relative_path"]
Copy link
Member

@lubosmj lubosmj Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like we have the list of artifacts already available from RepositoryVersion.objects.get(pk=repo_version).artifacts. Have you considered iterating over this queryset instead of devising a new dictionary and then running another DB queries for getting the corresponding artifacts in the for loop below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I could not find the relative_path (that will be used in the Containerfile) of each file in the artifacts queryset. Is there a better way to find it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something along these lines:

for content in FileContent.objects.filter(pk__in=repository_version.content).prefetch_related("contentartifact_set"):
    content._artifacts.get(), content.relative_path
    ...

But, you will need to check whether prefetch_related or select_related is working in this scenario. And, whether content._artifacts.get() hits the database or not. The idea is to hit the database fewer times.


Sometimes, there is already implemented the same thing that you are trying accomplish. In this case, skimming through the pulpcore code might be helpful:

  1. https://github.com/pulp/pulpcore/blob/ac08d63534227c07e67e01ca8a0afdf97c038389/pulp_file/app/tasks/publishing.py#L67-L82
  2.  pulpcore/app/models/repository.py:                    content_batch_qs.prefetch_related("contentartifact_set")
     pulpcore/app/serializers/content.py:        Content.objects.prefetch_related("_artifacts").all()
    

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you may want to verify whether the artifacts are in place or not. A user could sync a repository with the on-demand policy and then use it as the build context for pulp-container. We should be careful about this.

Artifact.objects.filter(pk__in=artifacts.keys()).touch()
elif serializer.validated_data.get("repo_version"):
repo_version = serializer.validated_data["repo_version"]
self._check_file_repo_permission(request, repo_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, we may want to declare a new entry inside the access policy, like we did for the repository viewset:

"action": ["sync"],
"principal": "authenticated",
"effect": "allow",
"condition": [
"has_model_or_obj_perms:container.sync_containerrepository",
"has_remote_param_model_or_obj_perms:container.view_containerremote",
"has_model_or_obj_perms:container.view_containerrepository",
],
},

Comment on lines 59 to 60
# workaround for drf-spectacular not allowing artifacts=None and raising an exception
# (a bytes-like object is required, not 'NoneType')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that when I want to build a new image from a Containerfile that does not require any additional artifacts, I cannot use the python bindings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Testing (from main branch) with:

    build_response = container_repository_api.build_image(
        repository.pulp_href, containerfile=containerfile_name, artifacts=None
    )

will fail:

E               TypeError: a bytes-like object is required, not 'NoneType'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so unfortunate. Is there any chance that we provide an invalid openapi schema for the generation?

@@ -0,0 +1 @@
Added a feature to allow using file `repository_version` as a building context of a Containerfile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Added a feature to allow using file `repository_version` as a building context of a Containerfile.
Added support for using the file `repository_version` as the build context for a Containerfile.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch 5 times, most recently from 0e87500 to a988728 Compare July 17, 2024 19:13
Comment on lines 126 to 128
serializer = serializers.OCIBuildImageSerializer(
data=request.data, context={"request": request}
)
Copy link
Member

@lubosmj lubosmj Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we cannot make this more generic, we should rename the function to something like "has_image_build_file_repo_perms".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad!

@@ -0,0 +1 @@
Removed support for using raw artifacts as the build context for a Containerfile.
Copy link
Member

@lubosmj lubosmj Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. You can mention that we replaced "artifacts" by "repo_version" in the feature changelog.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from a988728 to 010d8fb Compare July 18, 2024 15:02
Comment on lines 1 to 2
Replaced `artifacts` by `repository_version` as the parameter to provide the build context
for a Containerfile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of repository_version? pulp-container's one? 😜

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops ... good catch!

@@ -41,12 +41,18 @@ CMD ["cat", "/inside-image.txt"]' >> Containerfile
## Build an OCI image

```bash
TASK_HREF=$(http --form POST :$REPO_HREF'build_image/' containerfile@./Containerfile \
artifacts="{\"$ARTIFACT_HREF\": \"foo/bar/example.txt\"}" | jq -r '.task')
ARTIFACT_SHA256=$(http :$ARTIFACT_HREF | jq -r '.sha256')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We no longer care about artifacts, do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum ... we don't, I kept this artifact just as a sample/doc to populate the repository:
https://github.com/pulp/pulp_container/pull/1687/files/010d8fb88c2361abbe8f4cefc5a6b62ba42308ee#diff-8d94fc0843f51b6f430b79c609efee42205be12f7fee3369c1dd7824eeabe779R47-R48

I'll update it to create the file content "directly".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do. We should forget about using the artifacts endpoint.

Comment on lines 128 to 130
obj = view.get_object() if view.detail else None
context = {"request": request}
serializer = view.serializer_class(instance=obj, data=request.data, context=context)
Copy link
Member

@lubosmj lubosmj Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand this. This is invoking ContainerRepositorySerializer instead of OCIBuildImageSerializer, correct? Is the repo_version always initialized during the serialization when provided? Or, is this function always returning True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help me understand this. This is invoking ContainerRepositorySerializer instead of OCIBuildImageSerializer, correct?

Hum... I'm seeing the view.serializer_class returning OCIBuildImageSerializer in the debugger.

Is the repo_version always initialized during the serialization when provided?

Yes, when we provide a repo_version, the serializer will:
https://github.com/pulp/pulp_container/pull/1687/files/010d8fb88c2361abbe8f4cefc5a6b62ba42308ee#diff-1689ddc49f7d0e21a80ae6006966cfb47766de4a004c7bb1b8e7f442b285384eR752
https://github.com/pulp/pulp_container/pull/1687/files/010d8fb88c2361abbe8f4cefc5a6b62ba42308ee#diff-1689ddc49f7d0e21a80ae6006966cfb47766de4a004c7bb1b8e7f442b285384eR780-R781

    repo_version = RepositoryVersionRelatedField(...)
... 
        if "repo_version" in data:
        |   data["repo_version"] = data["repo_version"].pk

Or, is this function always returning True?

If my tests were valid, it is returning False if the user has no permission to the file repo.

Running the following test returned "You do not have permission to perform this action.":

pulp user create --username test --password krokodyl1 --email test@test.com
pulp user role-assignment add --username test --role "container.containerdistribution_collaborator" --object ""
pulp user role-assignment add --username test --role "container.containerrepository_content_manager" --object ""
echo 'Hello world!' > /tmp/example.txt

FILE_REPO=$(pulp file repository create --name foo --autopublish | jq -r '.pulp_href')
pulp file content upload --repository foo --file /tmp/example.txt --relative-path example4.txt

CONTAINER_REPO=$(pulp container repository create --name build | jq -r '.pulp_href')
mkdir -p /tmp/test
echo 'FROM centos:7
COPY example4.txt /inside-image.txt
CMD ["cat", "/inside-image.txt"]' > /tmp/test/Containerfile

http -a test:krokodyl1 --form POST :5001${CONTAINER_REPO}build_image/ "containerfile@/tmp/test/Containerfile" repo_version=${FILE_REPO}versions/1/

and then it worked (returned "202 Accepted") with:

pulp user role-assignment add --username test --role "file.filerepository_viewer" --object ""
http -a test:krokodyl1 --form POST :5001${CONTAINER_REPO}build_image/ "containerfile@/tmp/test/Containerfile" repo_version=${FILE_REPO}versions/1/

Copy link
Member

@lubosmj lubosmj Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for clarifying this. I thought that it would just take the default repository viewset serializer from

serializer_class = serializers.ContainerRepositorySerializer

Since this works out of the box and it is generic enough, do you think we should move this access policy check to pulpcore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum... considering it could be helpful to other plugins I would vote yes, but considering the need to do a serialization I am not confident that it is a good approach (at least, not yet to be used by other plugins).
Do you think the serialization is fine and we should not be too worried about it?

Copy link
Member

@lubosmj lubosmj Jul 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, something similar is already used in https://github.com/pulp/pulpcore/blob/92785d9e3ed917b8b0bec4c24c3669f594a81018/pulpcore/app/global_access_conditions.py#L146. But, I guess this whole discussion needs someone who has better expertise in this area...

@lubosmj
Copy link
Member

lubosmj commented Jul 23, 2024

This is blocked by pulp/pulpcore#5618.

Comment on lines 119 to 127
def test_build_image_from_repo_version_with_creator_user(
build_image,
containerfile_name,
create_file_and_container_repos_with_sample_data,
delete_orphans_pre,
gen_user,
):
"""Test if a user (with the expected permissions) can build an OCI image."""
container_repo, file_repo = create_file_and_container_repos_with_sample_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the pulp_container's container_repo fixture instead.

def test_build_image(
pulpcore_bindings,
@pytest.fixture
def create_file_and_container_repos_with_sample_data(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def create_file_and_container_repos_with_sample_data(
def populated_file_repo(

You can use the pulpcore's file_repo fixture and upload content to it. There is no need to enable auto-publishing.


def _copy_file_from_artifact(context_path, relative_path, artifact):
dest_path = os.path.join(context_path, relative_path)
dirs = os.path.split(dest_path)[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dirs = os.path.split(dest_path)[0]
dirs = os.path.dirname(dest_path)

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from c4b90a1 to fc6d605 Compare July 24, 2024 12:14
lubosmj
lubosmj previously approved these changes Jul 24, 2024
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me! Let's wait for other members to review this.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from fc6d605 to ef4a129 Compare July 26, 2024 11:29
Comment on lines 1 to 2
Replaced `artifacts` by `repository_version` (i.e., a file plugin repository version HREF)
as the parameter to provide the build context for a Containerfile.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that "artifacts" is removed in turn. You should add a removal snippet.

Remind me: Image building is still tech-preview?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, image building is still tech-preview. Should we add a removal snippet even this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say, yes.

@@ -956,7 +956,7 @@ def build_image(self, request, pk):
"containerfile_pk": str(containerfile.pk),
"tag": tag,
"repository_pk": str(repository.pk),
"artifacts": artifacts,
"repo_version": repo_version.pk,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"repo_version": repo_version.pk,
"repository_version_pk": repository_version.pk,

It's easier to follow around if we don't change the name in the middle.
And maybe this should be build_context_pk instead.

@@ -750,11 +749,10 @@ class OCIBuildImageSerializer(ValidateFieldsMixin, serializers.Serializer):
tag = serializers.CharField(
required=False, default="latest", help_text="A tag name for the new image being built."
)
artifacts = serializers.JSONField(
repository_version = RepositoryVersionRelatedField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we call that build_context?


containerfile_path = os.path.join(working_directory, "Containerfile")
if repo_version:
file_repository = RepositoryVersion.objects.get(pk=repo_version)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming here could also be more consistent:

Suggested change
file_repository = RepositoryVersion.objects.get(pk=repo_version)
build_context = RepositoryVersion.objects.get(pk=build_context_pk)

content_artifacts = ContentArtifact.objects.filter(
content__in=file_repository.content
).order_by("-content__pulp_created")
for content in content_artifacts.select_related("artifact").iterator():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too: using the name "content" is misleading since it's content artifact.
BTW: I think it would be better to loop over actual FileContent to be future proof.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum... even if with FileContent we would need to do more database hits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we add a filter(content__pulp_type="file") to be sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

Comment on lines 9 to 11
Containerfile is the same as for a Dockerfile. The same REST API endpoint also accepts a JSON
string that maps artifacts in Pulp to a filename. Any artifacts passed in are available inside the
string that maps artifacts in Pulp to a filename. Any file passed in are available inside the
build container at `/pulp_working_directory`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that still true? Didn't we say we remove the artifacts parameter and ask for a build_context instead?

@@ -774,11 +773,10 @@ class OCIBuildImageSerializer(ValidateFieldsMixin, serializers.Serializer):
tag = serializers.CharField(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i cannot comment on 10 lines above ...
I think, we should not keep the containerfile_artifact as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum... I could think of 2 ways to do that:

  • provide a new field to pass the pulp_href of the File Content with the Containerfile (not sure if this could be considered an advantage, but with this approach users could use a different repository to store the Containerfile or pass the same as the build_context)
  • use the build_context field and add a logic to validate if the Containerfile is present in the specified File Repository

What do you think of these ideas?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's valuable to be able to use the same build-context with different Container files.

That being said, we could add a containerfile name field to select one file from the build context. Or, yes, we just allow to specify a file_content item for now. I don't know exactly what workflows we need to cover. We still (already) have the option to upload a file on the go, right? So adding more options feels not too bad.

"relative path (name) inside the /pulp_working_directory of the build container "
"executing the Containerfile.",
help_text=_("RepositoryVersion to be used as the build context for container images."),
allow_null=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want this to be a FileRepositoryVersion, right? Can we do something about the queryset here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum... I tried to find a FileRepositoryVersion model, but I couldn't.
I also tried to pass a FileRepository qs, but it didn't work:

    build_context = RepositoryVersionRelatedField(
        required=False,
        help_text=_("RepositoryVersion to be used as the build context for container images."),
        allow_null=True,
        queryset = FileRepository.objects.all()
    )


django.core.exceptions.FieldError: Cannot resolve keyword 'number' into field. Choices are: alternatecontentsourcepath, autopublish, content, core_pulp_exporter, description, distributions, group_roles, manifest, name, next_version, pulp_created, pulp_domain, pulp_domain_id, pulp_id, pulp_labels, pulp_last_updated, pulp_type, pulpimporterrepository, remote, remote_id, repository_ptr, repository_ptr_id, repositorycontent, retain_repo_versions, uploads, user_hidden, user_roles, versions

I also tried to "adapt" the qs like (to do a reverse lookup of the repository version from repository):

        queryset = FileRepository.objects.all().repositoryversion_set,

but django complained:

AttributeError: 'QuerySet' object has no attribute 'repositoryversion_set'

Am I doing something wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's not possible. But maybe again RepositoryVersion.objects.filter(repository__pulp_type="file")?

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch 2 times, most recently from 2b1da87 to 7b030c8 Compare July 30, 2024 19:54
@lubosmj lubosmj dismissed their stale review July 30, 2024 20:32

More changes have been applied since the last review.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from 7b030c8 to d6bb18e Compare July 31, 2024 18:39
Copy link
Member

@lubosmj lubosmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I follow the documentation to try out the examples, step by step (the only change I made was to the base container, I used alpine:3.9), I receive the following error. Do you think we can fix that? The issue is that we are trying to save the manifest that already exists in the database.

TASK_HREF=$(http --form POST :5001$CONTAINER_REPO'build_image/' containerfile_name=MyContainerfile \
build_context=${FILE_REPO}versions/2/ | jq -r '.task')
[pulp]           | pulp [b8bd7f171ad44854bcd899be74b43c39]: pulpcore.tasking.tasks:INFO: Task 01915551-60c7-75f4-8e41-7a2bd80b724c failed (duplicate key value violates unique constraint "container_manifest_digest_d362b1f0_uniq"
[pulp]           | DETAIL:  Key (digest)=(sha256:87989b67cf10c8f3471fb7ec3bb959f92c6aa7d6fd8de6d935907e29d125e4d0) already exists.)
[pulp]           | pulp [b8bd7f171ad44854bcd899be74b43c39]: pulpcore.tasking.tasks:INFO:   File "/src/pulpcore/pulpcore/tasking/tasks.py", line 75, in _execute_task
[pulp]           |     result = func(*args, **kwargs)
[pulp]           | 
[pulp]           |   File "/src/pulp_container/pulp_container/app/tasks/builder.py", line 188, in build_image_from_containerfile
[pulp]           |     repository_version = add_image_from_directory_to_repository(image_dir, repository, tag)
[pulp]           | 
[pulp]           |   File "/src/pulp_container/pulp_container/app/tasks/builder.py", line 77, in add_image_from_directory_to_repository
[pulp]           |     manifest.save()
[pulp]           | 
[pulp]           |   File "/usr/lib64/python3.9/contextlib.py", line 79, in inner
[pulp]           |     return func(*args, **kwds)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django_lifecycle/mixins.py", line 139, in save
[pulp]           |     save(*args, **kwargs)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 814, in save
[pulp]           |     self.save_base(
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 877, in save_base
[pulp]           |     updated = self._save_table(
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 1020, in _save_table
[pulp]           |     results = self._do_insert(
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/base.py", line 1061, in _do_insert
[pulp]           |     return manager._insert(
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/manager.py", line 87, in manager_method
[pulp]           |     return getattr(self.get_queryset(), name)(*args, **kwargs)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/query.py", line 1805, in _insert
[pulp]           |     return query.get_compiler(using=using).execute_sql(returning_fields)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1822, in execute_sql
[pulp]           |     cursor.execute(sql, params)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 67, in execute
[pulp]           |     return self._execute_with_wrappers(
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
[pulp]           |     return executor(sql, params, many, context)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
[pulp]           |     return self.cursor.execute(sql, params)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/utils.py", line 91, in __exit__
[pulp]           |     raise dj_exc_value.with_traceback(traceback) from exc_value
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/django/db/backends/utils.py", line 89, in _execute
[pulp]           |     return self.cursor.execute(sql, params)
[pulp]           | 
[pulp]           |   File "/usr/local/lib/python3.9/site-packages/psycopg/cursor.py", line 97, in execute
[pulp]           |     raise ex.with_traceback(None)

image


```bash
TASK_HREF=$(http --form POST :$REPO_HREF'build_image/' containerfile@./Containerfile \
artifacts="{\"$ARTIFACT_HREF\": \"foo/bar/example.txt\"}" | jq -r '.task')
TASK_HREF=$(http --form POST :$CONTAINER_REPO'build_image/' "containerfile@./Containerfile" \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow use a real port number here? Not sure how we use it across the documentation. Or, at least use an out-of-nowhere environment variable? I got bitten by this.

Comment on lines 10 to 11
string that maps artifacts in Pulp to a filename. Any files passed in (via `build_context`) are
available inside the build container at the path defined in File Content `relative-path`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this quite captures it. It's more like the files in the build_context repository version are available at there relative path in the build context of the container. Yes this sounds like a tautology (which in turn is a sign that we chose a proper name for it). So we should probably focus more on how the parameter build_context defines the actual "build context" the container image is created with.

Note: To make a file actually be present in the running container, you need to first have it in a file repository version, then use the latter as the build_context for the image builder feature. Then still, you need to have the ADD <file> <location in container> statement.

# a "repo" or "repository_version" arguments, so we need to pass "build_context" as
# "repository_version" to be able to validate the permissions
if data.get("build_context", None):
data["repository_version"] = data["build_context"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh bugger!
I guess we cannot pass extra parameters to those functions (they are used to define the access policies).
One other option would be to roll our own permission condition. I'm not yet convinced we should do that.

Comment on lines 782 to 787
def __init__(self, *args, **kwargs):
"""Initializer for OCIBuildImageSerializer."""
super().__init__(*args, **kwargs)
self.fields["containerfile_artifact"].required = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make that function disappear completely now.


def validate(self, data):
"""Validates that all the fields make sense."""
data = super().validate(data)

if bool(data.get("containerfile", None)) == bool(data.get("containerfile_name", None)):
raise serializers.ValidationError(
_("'containerfile' or 'containerfile_name' must be specified.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_("'containerfile' or 'containerfile_name' must be specified.")
_("Exactly one of 'containerfile' or 'containerfile_name' must be specified.")

@@ -96,7 +96,11 @@ def add_image_from_directory_to_repository(path, repository, tag):


def build_image_from_containerfile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have an issue here.
You know tasks can be older than your current installation. We never required draining the task queue as a prerequisite for upgrading the codebase. So i wonder if we need to keep the artifacts parameter around. That however raises the next question, whether we still need to handle it properly or whether we use it to make the task fail with a good error message.
Since we are changing it almost completely, i guess the feature "image building" is in tech-preview. So the reach of this question is kind of limited.
@lubosmj your call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, the safest thing to do would be to leave the old API untouched (including the ViewSet action), introduce a new one, and then deprecate the old one after 2 releases. Otherwise, we would leave artifacts in place forever. Since this functionality is flagged as "tech-preview" and we completely overhauled the API, I call this concern unimportant.

Personally, I would rather implement a safety feature in pulpcore that will detect potential changes of the function signature after unpickling the function arguments: https://github.com/pulp/pulpcore/blob/e2f03715de68736b78942e5fa3aca98de2c318f7/pulpcore/tasking/tasks.py#L70-L75.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's kind of safe, since an old task would just fail to start up on a new codebase. And we'd be like: "Hey yeah, tech preview. Try starting again." 🤷 .
More dangerous are changes, where the task didn't change the signature, but interprets it's parameters in a different way.

Comment on lines 151 to 156
try:
containerfile
except NameError:
raise RuntimeError(
'"' + containerfile_name + '" containerfile not found in build_context!'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of hard to grasp.
I think we can assume that either we have a containerfile_pk or we have both the containerfile_name and the buildcontext.
So can we just add a check that containerfile_name is one of the relative_paths in the build_context?
I think we can even move that check into the serializer helping the user with an early error message.

Comment on lines 160 to 161
with open(containerfile_path, "wb") as dest:
shutil.copyfileobj(containerfile.file, dest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a small optimization possible here in case the containerfile is already on disk as part of the build_context. What if we move this copy operation up to line 125 and carry only the containerfile_path around from there.
Sounds like it could make use of copy_file_from_artifact too.

@@ -696,6 +696,7 @@ class ContainerRepositoryViewSet(
"condition": [
"has_model_or_obj_perms:container.build_image_containerrepository",
"has_model_or_obj_perms:container.view_containerrepository",
"has_repo_or_repo_ver_param_model_or_obj_perms:file.view_filerepository",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe these functions do allow to pass parameters.
(It would need a change in pulpcore first, right?)

containerfile_pk = None
if containerfile := serializer.validated_data.get("containerfile_artifact", None):
try:
containerfile.save()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note at this point. This will raise some warnings in the log stream, I believe and some people are kind of sensitive to the that. We should be prepared to change the logic to "look first, try to save then, except look again."
Don't change anything just yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, i wonder now if the containerfile should be a PulpTemporaryFile instead of an Artifact after all.
@lubosmj WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is good idea. If a user uploads a raw containerfile, we should store it into a PulpTemporaryFile, so it will be accessible from a task easily, no artifacts involved.

Artifact.objects.filter(pk__in=artifacts.keys()).touch()
build_context_pk = None
if build_context := serializer.validated_data.get("build_context", None):
build_context_pk = build_context.pk
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to put the build contexts repository (not the repository version) into the shared_resources for this task.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from d6bb18e to c90a1fc Compare August 21, 2024 20:05
@@ -68,15 +73,13 @@ def add_image_from_directory_to_repository(path, repository, tag):
manifest_digest = calculate_digest(bytes_data)
manifest_text_data = bytes_data.decode("utf-8")

manifest = Manifest(
manifest, _ = Manifest.objects.update_or_create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this needs some explanation. Also i think we cannot do this. Content is immutable by design. (That is like a result of the number one Pulp architecture decisions that content is supposed to be shared.)
Did you mean get_or_create?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review and explaining about Content immutability (I didn't know that)!
I was trying to fix an issue, found by Lubos++, where trying to reuse a Containerfile in different build tasks would fail.
Since Content must be immutable, I agree that get_or_create should be used in these cases.

manifest.save()
tag = Tag(name=tag, tagged_manifest=manifest)
tag.save()
tag, _ = Tag.objects.update_or_create(name=tag, tagged_manifest=manifest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -43,7 +48,7 @@ def get_or_create_blob(layer_json, manifest, path):
artifact=layer_artifact, content=blob, relative_path=layer_json["digest"]
).save()
if layer_json["mediaType"] != MEDIA_TYPE.CONFIG_BLOB_OCI:
BlobManifest(manifest=manifest, manifest_blob=blob).save()
BlobManifest.objects.update_or_create(manifest=manifest, manifest_blob=blob)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And i think it applies here too.

@@ -96,7 +96,11 @@ def add_image_from_directory_to_repository(path, repository, tag):


def build_image_from_containerfile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's kind of safe, since an old task would just fail to start up on a new codebase. And we'd be like: "Hey yeah, tech preview. Try starting again." 🤷 .
More dangerous are changes, where the task didn't change the signature, but interprets it's parameters in a different way.

@git-hyagi git-hyagi force-pushed the build-context-from-repoversion branch from c90a1fc to 71a3fe3 Compare August 26, 2024 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

I can specify a repository_version as the build context for container images
3 participants