Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear repo content type multi instance #40

Conversation

AvlWx2014
Copy link
Contributor

Overview

Allow the --content-type option to clear-repo to be specified multiple times. Each instance of --content-type still conforms to the current behavior of being a comma-separated list of content types.

Approach

Add a new argparse.Action subclass called SplitAndExtend that can be re-used for any situation like this and allows for delimiters other than "," to be used (though "," is the default). This Action class is now used in ClearRepo#add_args to configure the parser to accept multiple instances of --content-type

The ClearRepo#content_type property accessor has been updated so that it no longer has to do string splitting and instead just returns the accumulated list parsed from the command-line.

Multiple new test cases are added to cover some different situations that could arise when using the new Action, especially if the options list is being built programmatically.

One additional test case is added specifically for the clear-repo command to ensure it behaves as expected with multiple content types.

Closes #25.

@AvlWx2014
Copy link
Contributor Author

I see why the static checks failed. The analysis score regressed, and it's due to the addition of this warning from SplitAndExtend:

pubtools/_pulp/arguments.py:119:4: W1113: Keyword argument before variable positional arguments list in the definition of init function (keyword-arg-before-vararg)

Stylistically, I would prefer the allowed-in-Python3 way of being able to write the init method

def __init__(self, *args, split_on=",", **kwargs)

which is a syntax error in Python 2. However this way

def __init__(self, split_on=",", *args, **kwargs)

caused this warning in static analysis.

I suppose I could forgo a keyword argument, but I'd like to be able to keep it to separate it from the keyword args of the argparse API.

Copy link
Member

@rohanpm rohanpm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The analysis score regressed

FYI the outcome of the test isn't based on whether the score regressed, it's strictly based on whether there were any complaints of severity fatal, error or warning.

The pylint complaint is valid, isn't it? There wouldn't be a way to pass split_on as a keyword argument and also pass more positional arguments?

My suggestion would be one of:

(1) Extract split_on from kwargs

def __init__(self, *args, **kwargs):
    split_on = kwargs.pop('split_on', ',')

Or, (2) don't implement any split_on argument, just support ','. Other characters aren't needed to meet the requirements. They may never be needed; if they later are, maybe we've managed to drop python2 by that time so we'd be able to use py3-only syntax freely.

Anyway, it needs to be made to pass somehow.

A separate issue - maybe you were already planning to remove this, but can we please not have commits like "Trigger update on upstream pull request due to GitHub outage" in PRs? You should be able to get the same effect by just a "git commit --amend" with no content. Maybe adding "--reset-author" (which also resets timestamp) if the former doesn't seem to do the trick.

@AvlWx2014 AvlWx2014 force-pushed the clear-repo-content-type-multi-instance branch from abb3051 to d3ec452 Compare August 11, 2021 12:24
@AvlWx2014
Copy link
Contributor Author

The pylint complaint is valid, isn't it? There wouldn't be a way to pass split_on as a keyword argument and also pass more positional arguments?

I see now that you are correct. I had experimented with what I thought was a recreation of this in a 2.7 interpreter and I was able to call it and print all the arguments, but now I'm questioning my experiment. I have fixed this in the latest commit by taking your first suggestion above using kwargs.pop.

You should be able to get the same effect by just a "git commit --amend" with no content. Maybe adding "--reset-author" (which also resets timestamp) if the former doesn't seem to do the trick.

Thank you for the suggestion! I will use this approach in the future rather than git commit --allow-empty.

# calling filter(None, Iterable) is equivalent to
# filter(lambda item: bool(item), Iterable)
# i.e. yields items which evaluate to True.
split = list(filter(None, split))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this filtering?

# (1)
"value1," => ["value1"]
"a,,b" => ["a", "b"]

# (2)
"value1," => ["value1", ""]
"a,,b" => ["a", "", "b"]

As I understand it, this implements (1). Is there a reason the behavior in (1) is preferred over (2)?

I feel it's the other way around - (1) tries to guess what the caller probably meant, which is dangerous as the guess can be wrong. (2) accurately maintains the data passed in by the caller, who may have had a valid reason for passing an empty string. Generic code like this can't know that "silently ignore empty arguments" is the right thing to do for all use-cases.

If this was inspired by the [x for x in type_strs if x] or None in content_type which is being replaced - I think that code was just a way of making it return None if no argument was provided, it wasn't a goal to filter out empty strings passed by the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct this filtering implements (1) in your snippet. I noted in a comment within __call__ that the intent is to remove empty strings introduced by something like --content-type rpm, as you demonstrated. The motivation is to prevent "" from appearing in the list of content types.

While this code was not inspired by [x for x in type_strs if x] or None, it does have the same effect as the first operand in that expression. So with [x for x in type_strs if x] or None being replaced, removing the filter statement would introduce the possibility of empty strings in the list of content types going forward.

You make a good point that this code can't know that filtering empty arguments is the right thing to do in all cases, so perhaps removing the filtering is the correct thing to do overall. I just want to make it known that this will be new behavior.

Perhaps it makes more sense that if "" needs to be filtered out of the list of content types then that should be implemented in pubtools-pulplib by the Repository or Client for operations where that is the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps removing the filtering is the correct thing to do overall

Yeah, can we please do that? I'm concerned the filtering will have unintended consequences in the long term.

removing the filter statement would introduce the possibility of empty strings in the list of content types going forward

I'm not concerned about that - I'm pretty certain that, in the previous code:

        type_strs = (self.args.content_type or "").split(",")
        return [x for x in type_strs if x] or None

The filtering of empty values was just a shortcut to deal with the fact that the default value calculated by the first line would end up as [""] rather than []. If callers actually pass empty strings through here, I'd say it's acceptable to crash.

Copy link
Contributor Author

@AvlWx2014 AvlWx2014 Aug 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering empty strings has been removed and the test cases updated to reflect this.

def test_split_and_extend_single_delimited_instance():
"""Test a single option instance, with all values in a delimited list."""
# test one instance, all delimited using different delimiters
for delimiter in ",.-/":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also consider using pytest.fixture or pytest.mark.parametrize for delimiter here.

The pytest.fixture approach might not be obvious... it'd look something like:

@pytest.fixture(params=[',', '.', '-', '/'])
def any_delimiter(request):
  return request.param

Then any test which accepts 'any_delimiter' as an argument will be invoked once for each possible value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much cleaner, great suggestion!

tests/shared/test_arguments.py Show resolved Hide resolved
assert args.option == FAKE_OPTION_VALUES


def test_split_and_extend_multiple_instances_with_trailing_delimiters(parser):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filtering of empty arguments seems like it'll affect more than just trailing delimiters, so it'd be nice to see tests for empty values appearing elsewhere as well, e.g. ",a,b" , "a,,b" .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and updated this test to use a variety of cases that would produce empty values in different places and quantities. I am happy to make separate test functions for each instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, if it were me I'd go the opposite direction and have more of these tests combined together and parametrized, such as:

@pytest.mark.parametrize("argv,expected_option", [
  (["--option", "x"], ["x"]),
  (["--option", "x,y"], ["x", "y"]),
  (["--option", "x", "--option", "y"], ["x", "y"]),
  (["--option", "x", "--option", "y,z"], ["x", "y", "z"]),
  (["--option", "x,y", "--option", "a,b"], ["x", "y", "a", "b"]),
  (["--option", "a,,b", "--option", "c,"], ["a", "", "", "b", "c", ""]),
])
def test_split_extend(argv, expected_option, parser, delimiter):
    parser.add_argument("--option", type=str, action=SplitAndExtend, split_on=delimiter)
    sys.argv = ["command"] + argv
    args = parser.parse_args()
    assert args.option == expected_option

i.e. there's just a mapping between input values and expected output values, and the test implementation is the same for all cases (you might only need different test functions if you have e.g. some crashing cases or something else where it doesn't work normally).

I find this easier to read, more maintainable and I think it can achieve greater coverage with less test code needed. This is all quite subjective though and I'm just happy if the relevant cases are tested at all, regardless of the style.

Copy link
Contributor Author

@AvlWx2014 AvlWx2014 Aug 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, so I went ahead and updated the test code to make use of this.

I started down the path of adding a dependency on delimiter so that cases 1, 3, 4, and 5 in your example would use the given delimiter instead of "," e.g. a,,b, a--b, a//b, and a..b. The code started getting messy, so I elected for a second simple test function to test that SplitAndExtend can handle different delimiters for a basic case.

@AvlWx2014 AvlWx2014 force-pushed the clear-repo-content-type-multi-instance branch from 18cbc32 to f16e5fa Compare August 17, 2021 14:28
@rohanpm rohanpm merged commit 0289c44 into release-engineering:master Aug 17, 2021
@@ -73,6 +73,8 @@ def add_args(self):
"--content-type",
help="remove only content of these comma-separated type(s)",
type=str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't notice this until after submit - I wonder does the 'type' argument do anything now?

If it has any effect now, it's probably meant to be list. But maybe it just does nothing once you start using action.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like using type=str isnt needed since values are read in as strings by default, but I don't mind it. It doesn't hurt to be explicit about the type of the parsed values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the point is that parsed.content_type used to return a str, so that was consistent with type, but now it returns a list, so that's inconsistent with type.

Is it correct that type now means "type of elements within list"? Docs don't say anything about how type and action work together as far as I can tell.

If you find for example that setting type=int doesn't cause any tests to fail, I'd take that as evidence that the argument has no effect now and in that case I'd be in favor of removing it, because I think it just misleads the reader into thinking that it has some effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I see what you mean. ClearRepo.content_type returns a list or None as it did before, but ClearRepo.args.content_type now returns a list instead of a comma-separated str.

I did not check for access to ClearRepo.args.content_type directly, but no tests failed with that change so we might be safe.

I'd be a fan of adding an additional property to the ClearRepo API to return the comma-separated string if it's a common need. But I can also understand the case to remove the type argument all together for the reason you mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pubtools-pulp-clear-repo: Allow --content-type to be specified multiple times
2 participants