Skip to content

Conversation

@AvlWx2014
Copy link
Contributor

Overview

Fix a bug in Repository#remove_content causing all content to be removed from a repository, even when one or more content type IDs are passed to it.

From the Pulp 2.x docs, the body of an unassociate request should be a criteria object of which type_ids is a field.

The current implementation adds

{"type_ids": [...]}

to the request body, when it should be

{"criteria": {"type_ids": [...]}}

This is also consistent with other parts of the codebase e.g. the Search API, which correctly adds the type_ids field to a parent criteria object.

Approach

Rather than working with raw dicts and hard-coded string keys, my initial approach was to model the criteria object with a new data class called CriteriaDocument that encapsulates necessary fields, and knows how to serialize itself to JSON for the request body.

Eventually I noticed that the search API uses existing Criteria and Matcher classes, as well as some helper functions to do the same thing. In fact, there is an open issue that asks to extend Repository#remove_content to accept and handle Criteria. After seeing this, it made sense to try and use existing APIs to fix this bug and start moving in that direction.

Unfortunately, there seems to be some complex handling of Criteria in the search API in order to properly serialize them for client requests due to the need to incorporate MongoDB query syntax. It is unclear how much of that extra handling of Criteria is necessary to properly extend remove_content to accept Criteria. If most or all of the same handling is needed, then it would be beneficial to extract the business logic to a couple classes, or to possibly rename some existing helper functions (current helpers are named specifically for the search API) to avoid duplicating code and/or sewing confusion (e.g. remove_content using a function called search_for_criteria is a bit confusing).

The updated implementation here prefers a Criteria object passed in using the criteria kwarg, but falls back to the old type_ids kwarg if no Criteria is present. Some changes to both Client and FakeClient were made to work with a Criteria instance rather than a list of content type IDs. test_remove_content.py needed to be updated to reflect the correct expected value of the unassociate request body.

Copy link
Contributor

@rohanpm rohanpm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see one issue with the code changes, but there are some issues with the PR:

  • CI fails - ignore this for now, I think it might be bogus, I'll investigate it a bit.
  • the chain of commits has some ideas you tried and then abandoned. Can you please revise the history to drop some of those commits? e.g. "git rebase -i"
  • formatting changes are being mixed with code changes. Can you please avoid that? Note, "git add -p" is helpful to pick which parts of a file to commit.

With respect to "organize imports" in particular, I'm not confident that what pycharm does will be the same as what other tools and IDEs are doing. If you want to apply some automated sorting of the imports, can you please instead do it by 'isort' and integrating that with pre-commit tool - here's an example, https://github.com/release-engineering/pubtools/blob/main/.pre-commit-config.yaml . It means all the conventions for source code formatting in the repo are under the control of pre-commit, and everyone can follow them regardless of IDE as the exact tools and versions are encoded in the config.

If you want to pursue that I'd suggest opening a separate PR which could be submitted first.

# (more generically named) functions or a class to avoid duplicating code.
# for reference see search_content, _impl.client.Client._search_repo_units,
# _impl.client.Client._search, and _impl.client.search.search_for_criteria
criteria = kwargs.get("criteria")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internally making use of a 'criteria' object seems like a good idea, but I'd rather we don't accept a criteria argument to this function yet. Can we please drop this so the "if criteria is None" code path is the only path?

This is public API, if we support 'criteria' we need to at least document it, also document and test which types do/don't work. I think it's premature to do that if we're not ready to support it "properly" yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I have made the change to drop a criteria kwarg and instead stick with type_ids.

@rohanpm
Copy link
Contributor

rohanpm commented Aug 4, 2021

  • CI fails - ignore this for now, I think it might be bogus, I'll investigate it a bit.

Should be fixed by #106 .

@AvlWx2014 AvlWx2014 force-pushed the bugfix-clear-repo-content-type branch from e08cdae to 9ed4a92 Compare August 6, 2021 12:16
@AvlWx2014
Copy link
Contributor Author

With respect to "organize imports" in particular, I'm not confident that what pycharm does will be the same as what other tools and IDEs are doing. If you want to apply some automated sorting of the imports, can you please instead do it by 'isort' and integrating that with pre-commit tool

I believe PyCharm by default organizes imports alphabetically, and groups them based on PEP-8 guidelines. I would hope other IDEs also do this too, but I agree it makes more sense to use a standard tool (which looks like it does the same thing) so I will take a look at adding isort to the pre-commit config. Happily, it seems isort has tried very hard to be compatible with Black, which is a plus.

@rohanpm
Copy link
Contributor

rohanpm commented Aug 8, 2021

Since #106 was submitted, could you please rebase this to get a clean Travis CI run?

If `type_ids` keyword arg is passed in to `Repository#remove_content` then wrap the list of content type IDs in a `Criteria` object using `_content_type_id` as the field name. This way, when `search_for_criteria` is called, a `PulpSearch` object with its `type_ids` field properly filled out is returned. This can be used to fill out the JSON body of the `unassociate` POST request.
@AvlWx2014 AvlWx2014 force-pushed the bugfix-clear-repo-content-type branch from 94f0b1a to dbc3fdd Compare August 9, 2021 12:56
@rohanpm rohanpm self-requested a review August 10, 2021 04:48
@AvlWx2014
Copy link
Contributor Author

@rohanpm et al - I do not appear to have permissions to perform the merge so feel free to pull the trigger when you think is appropriate

@rohanpm rohanpm merged commit 2f22608 into release-engineering:master Aug 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants