New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the SharedAttributeManager feature. #3156
Conversation
All paths work. See comments here for security concerns. The code is absolutely in Dire Need of eyes and suggestions. Test work in progress; currently; script I currently use to test the existing functionality is here: https://github.com/ggainey/pulp_startup/blob/main/sam/exercise |
b5aa68c
to
91820d2
Compare
648ae7d
to
a2f659a
Compare
a2f659a
to
ef2a1fb
Compare
pulpcore/app/models/base.py
Outdated
# TODO: Need this override because somewhere in tasking "assumes" everything can be cast() | ||
# need to go back and find it and open an issue | ||
def cast(self): | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this and see what happens? If it is an issue in practice let's solve it now instead of taking an issue on the backlog (to me). Regardless of resolution, knowing what happens will still be required pre-merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We fail in tasks/base at lines like https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L68 . There are four uses of cast() in that code that I haven't grokked, tbqh. Can they all be replaced with the construct mdellweg uses at https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py#L38 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the only line, it means that general_create
was limited to Master-Detail models. But we can certainly move the call to cast on it's own line with a Suppress(AttributeError)
context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are four places in https://github.com/pulp/pulpcore/blob/main/pulpcore/app/tasks/base.py where we cast() a result - 38, 68, 86, and 104. 104 protects it in what looks to be a reasonable way, which I've implemented 68 and 86 (missed 38 as you note below, sorry!) wdyt?
if not href: | ||
raise ValidationError("Must not be empty.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this actually needed? I thought DRF would handle this with the required=True
. Can you confirm?
managed_attributes = DictField( | ||
child=JSONField(), | ||
help_text=_("A JSON Object of the attributes and values being managed by this object."), | ||
required=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagined we would be requiring this right from the start. What's the use case you imagined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my head, I imagined "I know I'm going to want a 'manage my socket params' SAM, but I don't know what the exact attributes are yet; let's get one set up" was something an admin/user might think. wdyt?
managed_entities = ListField( | ||
child=CharField(), | ||
help_text=_("A list of HREFs of the entities being managed."), | ||
required=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagined we would be requiring this right from the start. What's the use case you imagined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one I feel more strongly about - "I know what my remote-cnx-info is going to be, I don't have the Remote yet" is def the kind of thing I would do when setting up an instance.
There is discussion on Matrix about having a SAM have its data be |
I left the bulk of my comments. The other major thing it needs (as chatted about on Matrix) is tests. Overall though, this is really, really great! |
6560f60
to
4eee512
Compare
instance = serializer.save().cast() | ||
instance = serializer.save() | ||
resource = CreatedResource(content_object=instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure about this. I think this will save a different thing in the content_object. It may not result in a different view to the user, because we call cast
again when we create the href from it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense - how about this :
instance = serializer.save()
if isinstance(instance, MasterModel):
instance = instance.cast()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that i think of it... The serializer is detail already. So it's instance should be detail all the way down. Maybe it is proper to just not cast
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Publication and Distribution both use the AsyncCreateMixin. Walking through the Distribution call verifies you are correct, sir :)
|
||
name = models.TextField(db_index=True, unique=True) | ||
managed_attributes = EncryptedJSONField(blank=True, null=True) | ||
managed_entities = ArrayField(models.TextField(), null=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to break if you change your API_ROOT
for some reason
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of things that will break/not-work if you change API_ROOT on an established system (e.g., tasks record created-resources as pulp-hrefs) That may be something we have to Think Hard about - but I don't think this PR introduces it. Incorrect, that uses GenericRelationModel so it's an actual link to the object.
We could go that route; I was aiming for making SAM be only very-loosely-coupled to what it was/could manage. @bmbouter , what are your thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that this approach would break if you changed API_ROOT
. It's also (I think) true that there isn't any other part of Pulp that stores hrefs as a foreign key type data structure (I just looked). What would be ideal would be a GenericForiegnKey Many-To-Many implementation, but I don't see a way to do that that is supported by Django. So you'd end up basically doing a DIY implementation where the ArrayField stores tuples with one being an entry of the type (like how GenericForeignKey works) and the second being the pk into that table.
That would be slightly nicer for the user. Much less nice for @ggainey, but yes nicer for the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least we store task resources as href or arbitrary string. So changing API_ROOT
without draining the task queue first is terribly dangerous.
But yes, i agree, using GenericForeignKeys is a good choice. Having them in an ArrayField probably means that we cannot have them autodetatch from the SAM on deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying we can't use GenericForeignKeys because of the ManyToMany requirement. We'd have to DIY that basically which I'm uneasy about...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a similar many-to-many relationship with ObjectRoles. At least worth exploring.
I'm having issues with write only fields. Here's my SAM: {
"pulp_href": "/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
"pulp_created": "2022-09-06T15:17:47.869183Z",
"name": "test manager",
"managed_attributes": {
"download_concurrency": 3,
"proxy_password": "foo"
},
"managed_entities": [
"/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
"/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
]
} When I try to apply it
Removing the |
It's currently impossible to tell why the SAM failed to apply on the task page. Mine failed and I got this task: The progress report indicates that I have 2 failed resources, but I have no idea which one or why. {
"pulp_href": "/api/automation-hub/pulp/api/v3/tasks/6c7429e0-2e8b-4ebd-9aae-f37455696480/",
"pulp_created": "2022-09-06T15:18:05.680301Z",
"state": "completed",
"name": "pulpcore.app.tasks.sam.update_managed_entities",
"logging_cid": "",
"started_at": "2022-09-06T15:18:05.731353Z",
"finished_at": "2022-09-06T15:18:05.933563Z",
"error": null,
"worker": "/api/automation-hub/pulp/api/v3/workers/f1d11a2f-2365-4495-8e33-dcc7075d89e9/",
"parent_task": null,
"child_tasks": [],
"task_group": null,
"progress_reports": [
{
"message": "Updating Managed Entities",
"code": "sam.apply",
"state": "completed",
"total": 2,
"done": 2,
"suffix": null
},
{
"message": "Successful Updates",
"code": "sam.apply_success",
"state": "completed",
"total": 2,
"done": 0,
"suffix": null
},
{
"message": "Failed Updates",
"code": "sam.apply_failures",
"state": "completed",
"total": 2,
"done": 2,
"suffix": null
}
],
"created_resources": [],
"reserved_resources_record": [
"/api/automation-hub/pulp/api/v3/remotes/ansible/collection/591eab6b-db51-4910-b321-114dcd904a0a/",
"/api/automation-hub/pulp/api/v3/shared-attribute-managers/c2b83fe5-1f66-4af6-8131-b0aceb995f4b/",
"/api/automation-hub/pulp/api/v3/remotes/ansible/collection/e8ed8b84-ad77-4a6c-8782-156d4ffd87c8/"
]
} |
The immediate issue is that the remote-validator is working - the error (which you have no way of seeing) is
If you create a remote that has a proxy-url/user/password, you can change just one with a SAM. But you can't (or at least, this code works to make it hard to) create an "impossible" object. This def points out that logging needs to be WAY better, great test. I'll work on that this afternoon. |
@ggainey so, after some more testing, I think that for this to really be viable in hub we need two major changes:
When a user updates a SAM on via the UI, we need to be able to tell them that there is an issue and how to fix it, or it's going to result in a large number of customer support cases as well as lots of broken remotes. I'm not sure how to do this. One approach would be to get all the serializers for each distinct type in the
Encryption at rest is a good start, but we're also not allowed to expose secrets via the API. You might be able to solve this with some kind of configuration that tells the system which keys to redact? |
FWIW, I'm not entirely sure why we have write_only fields at all. I remember there was a lot of discussion about it, but really with RBAC in place, anyone who is authorized to read the credentials should be able to read the credentials. Maybe someone can refresh my memory here. The conclusion I've reached is that we shouldn't have write_only fields at all. |
The problem is that models often validate attributes against each other, using an existing object's values for partial updates. Proxy is a perfect example - you can change, for example, just the password, but only if the remote in question has a proxy-url and proxy-username . How do we do that a-priori? The code as it is now, instantiates the serializer for each entity in turn, and then tells the serializer "this is a partial update" - which means "just proxy-password" works, as long as the remote already has proxy-username and proxy-url. The change I'm just making now reports validation-errors back in the update-task's "error" field, as a dictionary of {entity-href: validation-error-string(s)}. It looks like this right this second:
(I would like that to be a little less ugly, but the important thing is to return the specific error(s) for the specific entity)
I'm not sure how to "mask the data on the API" at the same time as satisfying #2824 (comment) . Either the admin can see the data (via the API) - or it's protected/hidden. It's going to take some thinking to figure out how to know what attributes to not-show via the API in a general-purpose feature like this. Maybe a configurable list of attribute-names? And then the managed_attributes gets a serializer that returns a dictionary with "HIDDEN" for any attr in that list? |
4eee512
to
aa52e7c
Compare
@newswangerd I just pushed a POC for this - see https://github.com/pulp/pulpcore/pull/3156/files#diff-a455308bb98b78dd42c2c7e60d0faa5293c3e651e465ff992c4ff6907d77a201R125 for the concept. Obviously, the 'sensitive_attributes' here would have to generalized/config-controlled or something, but this is the approach I'm thinking of. Output of the API, and what is "actually in the DB", look like this:
|
aa52e7c
to
42bf6e8
Compare
[noissue] Required PR: pulp/pulpcore#3156
[noissue] Required PR: pulp/pulpcore#3156
fixes pulp#2824. [nocoverage]
42bf6e8
to
10d4e72
Compare
Closing, see #2824 (comment) |
fixes #2824.
[nocoverage]