Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the SharedAttributeManager feature. #3156

Closed
wants to merge 1 commit into from

Conversation

ggainey
Copy link
Contributor

@ggainey ggainey commented Aug 31, 2022

fixes #2824.
[nocoverage]

@ggainey ggainey marked this pull request as draft August 31, 2022 19:31
@ggainey
Copy link
Contributor Author

ggainey commented Aug 31, 2022

This is VERY draft - the functionality from the associated feature is working with a few exceptions (e.g., /add/ doesn't auto-apply the current managed_attributes, you need to call /apply/ yourself)

All paths work. See comments here for security concerns. The code is absolutely in Dire Need of eyes and suggestions.

Test work in progress; currently; script I currently use to test the existing functionality is here:

https://github.com/ggainey/pulp_startup/blob/main/sam/exercise

@ggainey ggainey force-pushed the shared_config_manager branch 4 times, most recently from 648ae7d to a2f659a Compare September 1, 2022 19:36
@ggainey ggainey changed the title DRAFT - Initial POC for SharedAttributeManager entity. Added the SharedAttributeManager feature. Sep 1, 2022
@ggainey ggainey marked this pull request as ready for review September 2, 2022 12:58
Comment on lines 103 to 106
# TODO: Need this override because somewhere in tasking "assumes" everything can be cast()
# need to go back and find it and open an issue
def cast(self):
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this and see what happens? If it is an issue in practice let's solve it now instead of taking an issue on the backlog (to me). Regardless of resolution, knowing what happens will still be required pre-merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We fail in tasks/base at lines like https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L68 . There are four uses of cast() in that code that I haven't grokked, tbqh. Can they all be replaced with the construct mdellweg uses at https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L38 ?

Copy link
Member

@mdellweg mdellweg Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the only line, it means that general_create was limited to Master-Detail models. But we can certainly move the call to cast on it's own line with a Suppress(AttributeError) context.

Copy link
Contributor Author

@ggainey ggainey Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are four places in https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py where we cast() a result - 38, 68, 86, and 104. 104 protects it in what looks to be a reasonable way, which I've implemented 68 and 86 (missed 38 as you note below, sorry!) wdyt?

Comment on lines +48 to +49
if not href:
raise ValidationError("Must not be empty.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually needed? I thought DRF would handle this with the required=True. Can you confirm?

pulpcore/app/tasks/sam.py Outdated Show resolved Hide resolved
pulpcore/app/tasks/sam.py Outdated Show resolved Hide resolved
managed_attributes = DictField(
child=JSONField(),
help_text=_("A JSON Object of the attributes and values being managed by this object."),
required=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined we would be requiring this right from the start. What's the use case you imagined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my head, I imagined "I know I'm going to want a 'manage my socket params' SAM, but I don't know what the exact attributes are yet; let's get one set up" was something an admin/user might think. wdyt?

managed_entities = ListField(
child=CharField(),
help_text=_("A list of HREFs of the entities being managed."),
required=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined we would be requiring this right from the start. What's the use case you imagined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I feel more strongly about - "I know what my remote-cnx-info is going to be, I don't have the Remote yet" is def the kind of thing I would do when setting up an instance.

pulpcore/app/tasks/sam.py Outdated Show resolved Hide resolved
pulpcore/app/tasks/sam.py Outdated Show resolved Hide resolved
@bmbouter
Copy link
Member

bmbouter commented Sep 2, 2022

There is discussion on Matrix about having a SAM have its data be write_only=True. I think if the endpoint is correctly RBAC'd that shouldn't be necessary, but I'd like to ask the group here. Not knowing what is in a SAM makes it much less usable, I would hate that as a user.

@bmbouter
Copy link
Member

bmbouter commented Sep 2, 2022

I left the bulk of my comments. The other major thing it needs (as chatted about on Matrix) is tests. Overall though, this is really, really great!

@ggainey ggainey force-pushed the shared_config_manager branch 2 times, most recently from 6560f60 to 4eee512 Compare September 5, 2022 16:56
Comment on lines -38 to 39
instance = serializer.save().cast()
instance = serializer.save()
resource = CreatedResource(content_object=instance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not so sure about this. I think this will save a different thing in the content_object. It may not result in a different view to the user, because we call cast again when we create the href from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense - how about this :

    instance = serializer.save()
    if isinstance(instance, MasterModel):
        instance = instance.cast()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that i think of it... The serializer is detail already. So it's instance should be detail all the way down. Maybe it is proper to just not cast here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Publication and Distribution both use the AsyncCreateMixin. Walking through the Distribution call verifies you are correct, sir :)


name = models.TextField(db_index=True, unique=True)
managed_attributes = EncryptedJSONField(blank=True, null=True)
managed_entities = ArrayField(models.TextField(), null=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to break if you change your API_ROOT for some reason

Copy link
Contributor Author

@ggainey ggainey Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a number of things that will break/not-work if you change API_ROOT on an established system (e.g., tasks record created-resources as pulp-hrefs) That may be something we have to Think Hard about - but I don't think this PR introduces it. Incorrect, that uses GenericRelationModel so it's an actual link to the object.

We could go that route; I was aiming for making SAM be only very-loosely-coupled to what it was/could manage. @bmbouter , what are your thoughts here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that this approach would break if you changed API_ROOT. It's also (I think) true that there isn't any other part of Pulp that stores hrefs as a foreign key type data structure (I just looked). What would be ideal would be a GenericForiegnKey Many-To-Many implementation, but I don't see a way to do that that is supported by Django. So you'd end up basically doing a DIY implementation where the ArrayField stores tuples with one being an entry of the type (like how GenericForeignKey works) and the second being the pk into that table.

That would be slightly nicer for the user. Much less nice for @ggainey, but yes nicer for the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least we store task resources as href or arbitrary string. So changing API_ROOT without draining the task queue first is terribly dangerous.
But yes, i agree, using GenericForeignKeys is a good choice. Having them in an ArrayField probably means that we cannot have them autodetatch from the SAM on deletion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm saying we can't use GenericForeignKeys because of the ManyToMany requirement. We'd have to DIY that basically which I'm uneasy about...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a similar many-to-many relationship with ObjectRoles. At least worth exploring.

@newswangerd
Copy link
Contributor

I'm having issues with write only fields.

Here's my SAM:

{
    "pulp_href": "/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
    "pulp_created": "2022-09-06T15:17:47.869183Z",
    "name": "test manager",
    "managed_attributes": {
        "download_concurrency": 3,
        "proxy_password": "foo"
    },
    "managed_entities": [
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
    ]
}

When I try to apply it

pulp_1   | pulp []: pulpcore.tasking.pulpcore_worker:INFO: Starting task 6c7429e0-2e8b-4ebd-9aae-f37455696480
pulp_1   | pulp []: pulpcore.app.tasks.sam:WARNING: Attributes {'download_concurrency': 3, 'proxy_password': 'foo'} didn't validate for entity <CollectionRemote: rh-certified>
pulp_1   | pulp []: pulpcore.app.tasks.sam:WARNING: Attributes {'download_concurrency': 3, 'proxy_password': 'foo'} didn't validate for entity <CollectionRemote: community>
pulp_1   | pulp []: pulpcore.tasking.pulpcore_worker:INFO: Task completed 6c7429e0-2e8b-4ebd-9aae-f37455696480

Removing the proxy_password attribute makes the SAM apply correctly.

@newswangerd
Copy link
Contributor

It's currently impossible to tell why the SAM failed to apply on the task page. Mine failed and I got this task:

The progress report indicates that I have 2 failed resources, but I have no idea which one or why.

{
    "pulp_href": "/api/automation-hub/pulp/api/v3/tasks/6c7429e0-2e8b-4ebd-9aae-f37455696480/",
    "pulp_created": "2022-09-06T15:18:05.680301Z",
    "state": "completed",
    "name": "pulpcore.app.tasks.sam.update_managed_entities",
    "logging_cid": "",
    "started_at": "2022-09-06T15:18:05.731353Z",
    "finished_at": "2022-09-06T15:18:05.933563Z",
    "error": null,
    "worker": "/api/automation-hub/pulp/api/v3/workers/f1d11a2f-2365-4495-8e33-dcc7075d89e9/",
    "parent_task": null,
    "child_tasks": [],
    "task_group": null,
    "progress_reports": [
        {
            "message": "Updating Managed Entities",
            "code": "sam.apply",
            "state": "completed",
            "total": 2,
            "done": 2,
            "suffix": null
        },
        {
            "message": "Successful Updates",
            "code": "sam.apply_success",
            "state": "completed",
            "total": 2,
            "done": 0,
            "suffix": null
        },
        {
            "message": "Failed Updates",
            "code": "sam.apply_failures",
            "state": "completed",
            "total": 2,
            "done": 2,
            "suffix": null
        }
    ],
    "created_resources": [],
    "reserved_resources_record": [
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
        "/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
        "/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
    ]
}

@ggainey
Copy link
Contributor Author

ggainey commented Sep 6, 2022

It's currently impossible to tell why the SAM failed to apply on the task page. Mine failed and I got this task:

The immediate issue is that the remote-validator is working - the error (which you have no way of seeing) is

{'non_field_errors': [ErrorDetail(string='proxy credentials cannot be specified without a proxy', code='invalid')]} 

If you create a remote that has a proxy-url/user/password, you can change just one with a SAM. But you can't (or at least, this code works to make it hard to) create an "impossible" object.

This def points out that logging needs to be WAY better, great test. I'll work on that this afternoon.

@newswangerd
Copy link
Contributor

@ggainey so, after some more testing, I think that for this to really be viable in hub we need two major changes:

  1. Some way to validate the data right away when the SAM is created/updated:

When a user updates a SAM on via the UI, we need to be able to tell them that there is an issue and how to fix it, or it's going to result in a large number of customer support cases as well as lots of broken remotes. I'm not sure how to do this. One approach would be to get all the serializers for each distinct type in the managed_resources field and see if the data can be applied to that, but that won't work for serializers that have unique required fields like "name", and it doesn't account for invalid combinations of data on individual instances (like the proxy one I found by accident). Another approach would be to check each instance in managed_resources during updates, but that's time consuming and will likely have to go in a task, which the UI can't really parse effectively to give user's feedback one which fields need changing.

  1. Some way to mask sensitive data on the API

Encryption at rest is a good start, but we're also not allowed to expose secrets via the API. You might be able to solve this with some kind of configuration that tells the system which keys to redact?

@bmbouter
Copy link
Member

bmbouter commented Sep 6, 2022

FWIW, I'm not entirely sure why we have write_only fields at all. I remember there was a lot of discussion about it, but really with RBAC in place, anyone who is authorized to read the credentials should be able to read the credentials. Maybe someone can refresh my memory here. The conclusion I've reached is that we shouldn't have write_only fields at all.

@ggainey
Copy link
Contributor Author

ggainey commented Sep 6, 2022

  1. Some way to validate the data right away when the SAM is created/updated:

The problem is that models often validate attributes against each other, using an existing object's values for partial updates. Proxy is a perfect example - you can change, for example, just the password, but only if the remote in question has a proxy-url and proxy-username . How do we do that a-priori?

The code as it is now, instantiates the serializer for each entity in turn, and then tells the serializer "this is a partial update" - which means "just proxy-password" works, as long as the remote already has proxy-username and proxy-url.

The change I'm just making now reports validation-errors back in the update-task's "error" field, as a dictionary of {entity-href: validation-error-string(s)}. It looks like this right this second:

...
error": {
        "/pulp/api/v3/remotes/rpm/rpm/7f1903ed-94ba-4c53-b81f-f79f7f5b8b30/": "Validation errors: {'non_field_errors': [ErrorDetail(string='proxy credentials cannot be specified without a proxy', code='invalid')]}"
    },
...

(I would like that to be a little less ugly, but the important thing is to return the specific error(s) for the specific entity)

  1. Some way to mask sensitive data on the API

I'm not sure how to "mask the data on the API" at the same time as satisfying #2824 (comment) . Either the admin can see the data (via the API) - or it's protected/hidden.

It's going to take some thinking to figure out how to know what attributes to not-show via the API in a general-purpose feature like this. Maybe a configurable list of attribute-names? And then the managed_attributes gets a serializer that returns a dictionary with "HIDDEN" for any attr in that list?

@ggainey
Copy link
Contributor Author

ggainey commented Sep 6, 2022

It's going to take some thinking to figure out how to know what attributes to not-show via the API in a general-purpose feature like this. Maybe a configurable list of attribute-names? And then the managed_attributes gets a serializer that returns a dictionary with "HIDDEN" for any attr in that list?

@newswangerd I just pushed a POC for this - see https://github.com/pulp/pulpcore/pull/3156/files#diff-a455308bb98b78dd42c2c7e60d0faa5293c3e651e465ff992c4ff6907d77a201R125 for the concept. Obviously, the 'sensitive_attributes' here would have to generalized/config-controlled or something, but this is the approach I'm thinking of.

Output of the API, and what is "actually in the DB", look like this:

$ http :/pulp/api/v3/shared-attribute-managers/
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "managed_attributes": {
                "download_concurrency": 3,
                "proxy_password": "HIDDEN"
            },
            "managed_entities": [
                "/pulp/api/v3/remotes/rpm/rpm/7f1903ed-94ba-4c53-b81f-f79f7f5b8b30/"
            ],
            "name": "SAM",
            "pulp_created": "2022-09-06T18:31:10.108752Z",
            "pulp_href": "/pulp/api/v3/shared-attribute-managers/c8748a9a-3f67-4371-8fde-a666a406419c/"
        }
    ]
}

$ pulpcore-manager shell
In [1]: from pulpcore.app.models.base import SharedAttributeManager
In [2]: a = SharedAttributeManager.objects.first()
In [3]: a.managed_attributes
Out[3]: {'proxy_password': 'blech', 'download_concurrency': 3}

ggainey added a commit to ggainey/pulp_file that referenced this pull request Sep 8, 2022
ggainey added a commit to ggainey/pulp_file that referenced this pull request Sep 8, 2022
@ggainey ggainey marked this pull request as draft September 8, 2022 18:57
@ggainey ggainey closed this Sep 8, 2022
@ggainey
Copy link
Contributor Author

ggainey commented Sep 8, 2022

Closing, see #2824 (comment)

@ggainey ggainey deleted the shared_config_manager branch July 18, 2023 12:46
@ggainey ggainey restored the shared_config_manager branch July 18, 2023 12:50
@ggainey ggainey deleted the shared_config_manager branch February 2, 2024 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

As a user I want to share remote configurations among several remotes.
4 participants