Added the SharedAttributeManager feature. #3156

ggainey · 2022-08-31T19:30:49Z

fixes #2824.
[nocoverage]

ggainey · 2022-08-31T19:34:19Z

~~This is VERY draft - the functionality from the associated feature is working with a few exceptions (e.g., /add/ doesn't auto-apply the current managed_attributes, you need to call /apply/ yourself)~~

All paths work. See comments here for security concerns. The code is absolutely in Dire Need of eyes and suggestions.

Tests are a work in progress; the script I currently use to exercise the existing functionality is here:

https://github.com/ggainey/pulp_startup/blob/main/sam/exercise

pulpcore/app/models/base.py

CHANGES/2824.feature

docs/workflows/shared-attribute-management.rst

pulpcore/app/models/base.py

bmbouter · 2022-09-02T15:12:35Z

pulpcore/app/models/base.py

+    # TODO: Need this override because somewhere in tasking "assumes" everything can be cast()
+    # need to go back and find it and open an issue
+    def cast(self):
+        return self


Can we remove this and see what happens? If it is an issue in practice let's solve it now instead of taking an issue on the backlog (to me). Regardless of resolution, knowing what happens will still be required pre-merge.

We fail in tasks/base at lines like https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L68 . There are four uses of cast() in that code that I haven't grokked, tbqh. Can they all be replaced with the construct mdellweg uses at https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L38 ?

If that's the only line, it means that general_create was limited to Master-Detail models. But we can certainly move the call to cast on its own line with a Suppress(AttributeError) context.

There are four places in https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py where we cast() a result - 38, 68, 86, and 104. 104 protects it in what looks to be a reasonable way, which I've now implemented for 68 and 86 (missed 38, as you note below, sorry!). wdyt?
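For readers following along, here is a minimal sketch of the "cast on its own line inside a suppress context" idea discussed above. It assumes the standard-library `contextlib.suppress`; `PlainModel`, `DetailModel`, and `cast_if_possible` are hypothetical stand-ins and not names from pulpcore.

```python
from contextlib import suppress


class PlainModel:
    """Hypothetical stand-in for a model that has no cast() method."""


class DetailModel:
    """Hypothetical stand-in for a Master/Detail model."""

    def cast(self):
        return "detail instance"


def cast_if_possible(instance):
    # Only the cast() call sits inside the suppress block, so an
    # AttributeError raised by any other statement is not masked.
    with suppress(AttributeError):
        instance = instance.cast()
    return instance
```

One caveat with this shape: an AttributeError raised inside a real cast() implementation would also be swallowed, which is an argument for an explicit isinstance check instead.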

pulpcore/app/models/fields.py

bmbouter · 2022-09-02T15:15:54Z

pulpcore/app/serializers/sam.py

+        if not href:
+            raise ValidationError("Must not be empty.")


Is this actually needed? I thought DRF would handle this with the required=True. Can you confirm?

pulpcore/app/tasks/sam.py

bmbouter · 2022-09-02T15:26:45Z

pulpcore/app/serializers/sam.py

+    managed_attributes = DictField(
+        child=JSONField(),
+        help_text=_("A JSON Object of the attributes and values being managed by this object."),
+        required=False,


I imagined we would be requiring this right from the start. What's the use case you imagined?

In my head, I imagined "I know I'm going to want a 'manage my socket params' SAM, but I don't know what the exact attributes are yet; let's get one set up" was something an admin/user might think. wdyt?

bmbouter · 2022-09-02T15:31:16Z

pulpcore/app/serializers/sam.py

+    managed_entities = ListField(
+        child=CharField(),
+        help_text=_("A list of HREFs of the entities being managed."),
+        required=False,


I imagined we would be requiring this right from the start. What's the use case you imagined?

This one I feel more strongly about - "I know what my remote-cnx-info is going to be, I don't have the Remote yet" is def the kind of thing I would do when setting up an instance.

pulpcore/app/tasks/sam.py

pulpcore/app/viewsets/custom_filters.py

pulpcore/app/viewsets/sam.py

bmbouter · 2022-09-02T15:39:30Z

There is discussion on Matrix about having a SAM have its data be write_only=True. I think if the endpoint is correctly RBAC'd that shouldn't be necessary, but I'd like to ask the group here. Not knowing what is in a SAM makes it much less usable, I would hate that as a user.

bmbouter · 2022-09-02T15:40:05Z

I left the bulk of my comments. The other major thing it needs (as chatted about on Matrix) is tests. Overall though, this is really, really great!

mdellweg · 2022-09-06T08:01:43Z

pulpcore/app/tasks/base.py

-    instance = serializer.save().cast()
+    instance = serializer.save()
    resource = CreatedResource(content_object=instance)


I'm not so sure about this. I think this will save a different thing in the content_object. It may not result in a different view to the user, because we call cast again when we create the href from it.

Ah, makes sense - how about this :

instance = serializer.save()
if isinstance(instance, MasterModel):
    instance = instance.cast()

Now that I think of it... The serializer is detail already. So its instance should be detail all the way down. Maybe it is proper to just not cast here.

Publication and Distribution both use the AsyncCreateMixin. Walking through the Distribution call verifies you are correct, sir :)

newswangerd · 2022-09-06T14:48:16Z

pulpcore/app/models/base.py

+
+    name = models.TextField(db_index=True, unique=True)
+    managed_attributes = EncryptedJSONField(blank=True, null=True)
+    managed_entities = ArrayField(models.TextField(), null=True)


This is going to break if you change your API_ROOT for some reason

There are a number of things that will break/not-work if you change API_ROOT on an established system ~~(e.g., tasks record created-resources as pulp-hrefs) That may be something we have to Think Hard about - but I don't think this PR introduces it.~~ Incorrect, that uses GenericRelationModel so it's an actual link to the object.

We could go that route; I was aiming for making SAM be only very-loosely-coupled to what it was/could manage. @bmbouter , what are your thoughts here?

It's true that this approach would break if you changed API_ROOT. It's also (I think) true that there isn't any other part of Pulp that stores hrefs as a foreign key type data structure (I just looked). What would be ideal would be a GenericForeignKey Many-To-Many implementation, but I don't see a way to do that that is supported by Django. So you'd end up basically doing a DIY implementation where the ArrayField stores tuples, with one element being the type (like how GenericForeignKey works) and the second being the pk into that table.

That would be slightly nicer for the user. Much less nice for @ggainey, but yes nicer for the user.

At least we store task resources as href or arbitrary string. So changing API_ROOT without draining the task queue first is terribly dangerous.
But yes, I agree, using GenericForeignKeys is a good choice. Having them in an ArrayField probably means that we cannot have them auto-detach from the SAM on deletion.

I'm saying we can't use GenericForeignKeys because of the ManyToMany requirement. We'd have to DIY that basically which I'm uneasy about...

I think we have a similar many-to-many relationship with ObjectRoles. At least worth exploring.
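To make the trade-off in this thread concrete, here is a rough, framework-free sketch of the DIY idea: store (type, pk) pairs instead of href strings and derive hrefs only when needed. Everything here is hypothetical (the `TABLES` registry, `Manager`, `resolve`, `build_href`, and the path shape); it is not how pulpcore or Django implement generic relations.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "tables", keyed by a type label and then by primary key.
TABLES = {
    "ansible.collectionremote": {
        "591eab6b": "rh-certified",
        "e8ed8b84": "community",
    },
}


@dataclass
class Manager:
    name: str
    # Each entry is a (type_label, pk) pair rather than an href string.
    managed_entities: list = field(default_factory=list)


def resolve(manager):
    """Return the managed objects, skipping references whose row no longer exists."""
    found = []
    for type_label, pk in manager.managed_entities:
        obj = TABLES.get(type_label, {}).get(pk)
        if obj is not None:
            found.append(obj)
    return found


def build_href(api_root, type_label, pk):
    """Derive an href at read time (illustrative path shape, not Pulp's real layout)."""
    app, model = type_label.split(".")
    return f"{api_root}{app}/{model}/{pk}/"
```

Because the stored pairs never contain API_ROOT, changing it only affects the hrefs that are rendered. The skip in `resolve` also shows the cost raised above: nothing detaches a deleted entity automatically, so stale pairs have to be tolerated or cleaned up by hand.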

newswangerd · 2022-09-06T15:20:21Z

I'm having issues with write only fields.

Here's my SAM:

{
    "pulp_href": "/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
    "pulp_created": "2022-09-06T15:17:47.869183Z",
    "name": "test manager",
    "managed_attributes": {
        "download_concurrency": 3,
        "proxy_password": "foo"
    },
    "managed_entities": [
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
    ]
}

When I try to apply it

pulp_1   | pulp []: pulpcore.tasking.pulpcore_worker:INFO: Starting task 6c7429e0-2e8b-4ebd-9aae-f37455696480
pulp_1   | pulp []: pulpcore.app.tasks.sam:WARNING: Attributes {'download_concurrency': 3, 'proxy_password': 'foo'} didn't validate for entity <CollectionRemote: rh-certified>
pulp_1   | pulp []: pulpcore.app.tasks.sam:WARNING: Attributes {'download_concurrency': 3, 'proxy_password': 'foo'} didn't validate for entity <CollectionRemote: community>
pulp_1   | pulp []: pulpcore.tasking.pulpcore_worker:INFO: Task completed 6c7429e0-2e8b-4ebd-9aae-f37455696480

Removing the proxy_password attribute makes the SAM apply correctly.

newswangerd · 2022-09-06T15:22:48Z

It's currently impossible to tell why the SAM failed to apply on the task page. Mine failed and I got this task:

The progress report indicates that I have 2 failed resources, but I have no idea which one or why.

{
    "pulp_href": "/api/automation-hub/pulp/api/v3/tasks/6c7429e0-2e8b-4ebd-9aae-f37455696480/",
    "pulp_created": "2022-09-06T15:18:05.680301Z",
    "state": "completed",
    "name": "pulpcore.app.tasks.sam.update_managed_entities",
    "logging_cid": "",
    "started_at": "2022-09-06T15:18:05.731353Z",
    "finished_at": "2022-09-06T15:18:05.933563Z",
    "error": null,
    "worker": "/api/automation-hub/pulp/api/v3/workers/f1d11a2f-2365-4495-8e33-dcc7075d89e9/",
    "parent_task": null,
    "child_tasks": [],
    "task_group": null,
    "progress_reports": [
        {
            "message": "Updating Managed Entities",
            "code": "sam.apply",
            "state": "completed",
            "total": 2,
            "done": 2,
            "suffix": null
        },
        {
            "message": "Successful Updates",
            "code": "sam.apply_success",
            "state": "completed",
            "total": 2,
            "done": 0,
            "suffix": null
        },
        {
            "message": "Failed Updates",
            "code": "sam.apply_failures",
            "state": "completed",
            "total": 2,
            "done": 2,
            "suffix": null
        }
    ],
    "created_resources": [],
    "reserved_resources_record": [
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
        "/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
    ]
}

ggainey · 2022-09-06T17:30:44Z

It's currently impossible to tell why the SAM failed to apply on the task page. Mine failed and I got this task:

The immediate issue is that the remote-validator is working - the error (which you have no way of seeing) is

{'non_field_errors': [ErrorDetail(string='proxy credentials cannot be specified without a proxy', code='invalid')]}

If you create a remote that has a proxy-url/user/password, you can change just one with a SAM. But you can't (or at least, this code works to make it hard to) create an "impossible" object.

This def points out that logging needs to be WAY better, great test. I'll work on that this afternoon.

newswangerd · 2022-09-06T18:27:31Z

@ggainey so, after some more testing, I think that for this to really be viable in hub we need two major changes:

Some way to validate the data right away when the SAM is created/updated:

When a user updates a SAM via the UI, we need to be able to tell them that there is an issue and how to fix it, or it's going to result in a large number of customer support cases as well as lots of broken remotes. I'm not sure how to do this. One approach would be to get all the serializers for each distinct type in the managed_entities field and see if the data can be applied to that, but that won't work for serializers that have unique required fields like "name", and it doesn't account for invalid combinations of data on individual instances (like the proxy one I found by accident). Another approach would be to check each instance in managed_entities during updates, but that's time consuming and will likely have to go in a task, which the UI can't really parse effectively to give users feedback on which fields need changing.

Some way to mask sensitive data on the API

Encryption at rest is a good start, but we're also not allowed to expose secrets via the API. You might be able to solve this with some kind of configuration that tells the system which keys to redact?

bmbouter · 2022-09-06T18:46:57Z

FWIW, I'm not entirely sure why we have write_only fields at all. I remember there was a lot of discussion about it, but really with RBAC in place, anyone who is authorized to read the credentials should be able to read the credentials. Maybe someone can refresh my memory here. The conclusion I've reached is that we shouldn't have write_only fields at all.

ggainey · 2022-09-06T18:49:42Z

Some way to validate the data right away when the SAM is created/updated:

The problem is that models often validate attributes against each other, using an existing object's values for partial updates. Proxy is a perfect example - you can change, for example, just the password, but only if the remote in question has a proxy-url and proxy-username . How do we do that a-priori?

The code as it is now instantiates the serializer for each entity in turn, and then tells the serializer "this is a partial update" - which means "just proxy-password" works, as long as the remote already has proxy-username and proxy-url.

The change I'm just making now reports validation-errors back in the update-task's "error" field, as a dictionary of {entity-href: validation-error-string(s)}. It looks like this right this second:

...
"error": {
        "/pulp/api/v3/remotes/rpm/rpm/7f1903ed-94ba-4c53-b81f-f79f7f5b8b30/": "Validation errors: {'non_field_errors': [ErrorDetail(string='proxy credentials cannot be specified without a proxy', code='invalid')]}"
    },
...

(I would like that to be a little less ugly, but the important thing is to return the specific error(s) for the specific entity)
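As a rough illustration of the behaviour described in this comment, the sketch below validates a partial update against each entity's existing values and collects failures as {entity-href: errors}. It is plain Python under stated assumptions: `validate_partial` and `apply_to_entities` are hypothetical names, and the single proxy rule stands in for whatever the real serializers check.

```python
def validate_partial(existing, changes):
    """Validate `changes` against the merged state, as a partial update would."""
    merged = {**existing, **changes}
    errors = []
    has_credentials = merged.get("proxy_username") or merged.get("proxy_password")
    if has_credentials and not merged.get("proxy_url"):
        errors.append("proxy credentials cannot be specified without a proxy")
    return errors


def apply_to_entities(entities, changes):
    """Apply `changes` to every entity that validates; return {href: errors} for the rest."""
    failures = {}
    for href, attrs in entities.items():
        errors = validate_partial(attrs, changes)
        if errors:
            failures[href] = errors
        else:
            attrs.update(changes)
    return failures
```

The same attribute set can therefore succeed on one remote and fail on another, which is why validating a SAM up front, without looking at each managed entity, is hard.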

Some way to mask sensitive data on the API

I'm not sure how to "mask the data on the API" at the same time as satisfying #2824 (comment) . Either the admin can see the data (via the API) - or it's protected/hidden.

It's going to take some thinking to figure out how to know what attributes to not-show via the API in a general-purpose feature like this. Maybe a configurable list of attribute-names? And then the managed_attributes gets a serializer that returns a dictionary with "HIDDEN" for any attr in that list?

ggainey · 2022-09-06T19:23:19Z

It's going to take some thinking to figure out how to know what attributes to not-show via the API in a general-purpose feature like this. Maybe a configurable list of attribute-names? And then the managed_attributes gets a serializer that returns a dictionary with "HIDDEN" for any attr in that list?

@newswangerd I just pushed a POC for this - see https://github.com/pulp/pulpcore/pull/3156/files#diff-a455308bb98b78dd42c2c7e60d0faa5293c3e651e465ff992c4ff6907d77a201R125 for the concept. Obviously, the 'sensitive_attributes' here would have to be generalized/config-controlled or something, but this is the approach I'm thinking of.

Output of the API, and what is "actually in the DB", look like this:

$ http :/pulp/api/v3/shared-attribute-managers/
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "managed_attributes": {
                "download_concurrency": 3,
                "proxy_password": "HIDDEN"
            },
            "managed_entities": [
                "/pulp/api/v3/remotes/rpm/rpm/7f1903ed-94ba-4c53-b81f-f79f7f5b8b30/"
            ],
            "name": "SAM",
            "pulp_created": "2022-09-06T18:31:10.108752Z",
            "pulp_href": "/pulp/api/v3/shared-attribute-managers/c8748a9a-3f67-4371-8fde-a666a406419c/"
        }
    ]
}

$ pulpcore-manager shell
In [1]: from pulpcore.app.models.base import SharedAttributeManager
In [2]: a = SharedAttributeManager.objects.first()
In [3]: a.managed_attributes
Out[3]: {'proxy_password': 'blech', 'download_concurrency': 3}
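The masking shown above can be sketched roughly as follows. This is not the code from the linked diff; `SENSITIVE_ATTRIBUTES` and `redact` are hypothetical names, and the set of keys is an assumption standing in for whatever configuration ends up controlling it.

```python
# Assumed, configurable set of attribute names to hide (hypothetical).
SENSITIVE_ATTRIBUTES = {"proxy_password", "password", "client_key"}


def redact(managed_attributes, sensitive=SENSITIVE_ATTRIBUTES, placeholder="HIDDEN"):
    """Return a copy with sensitive values replaced; the stored dict is not modified."""
    return {
        key: placeholder if key in sensitive else value
        for key, value in managed_attributes.items()
    }
```

Redacting on the way out keeps the stored (encrypted) value intact for the apply task, while the API response only ever carries the placeholder.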

[noissue] Required PR: pulp/pulpcore#3156

fixes pulp#2824. [nocoverage]

ggainey · 2022-09-08T19:07:28Z

Closing, see #2824 (comment)

ggainey marked this pull request as draft August 31, 2022 19:31

ggainey force-pushed the shared_config_manager branch from b5aa68c to 91820d2 Compare August 31, 2022 19:36

ggainey requested review from bmbouter, ipanova and newswangerd August 31, 2022 19:37

mdellweg reviewed Aug 31, 2022


ggainey force-pushed the shared_config_manager branch 4 times, most recently from 648ae7d to a2f659a Compare September 1, 2022 19:36

ggainey changed the title ~~DRAFT - Initial POC for SharedAttributeManager entity.~~ Added the SharedAttributeManager feature. Sep 1, 2022

ggainey mentioned this pull request Sep 1, 2022

Add CLI support for shared-attribute-managers pulp/pulp-cli#547

Closed

ggainey marked this pull request as ready for review September 2, 2022 12:58

ggainey force-pushed the shared_config_manager branch from a2f659a to ef2a1fb Compare September 2, 2022 15:01