Clear repo content type multi instance #40

AvlWx2014 · 2021-08-10T15:03:33Z

Overview

Allow the --content-type option to clear-repo to be specified multiple times. Each instance of --content-type still conforms to the current behavior of being a comma-separated list of content types.

Approach

Add a new argparse.Action subclass called SplitAndExtend that can be re-used for any situation like this and allows for delimiters other than "," to be used (though "," is the default). This Action class is now used in ClearRepo#add_args to configure the parser to accept multiple instances of --content-type

The ClearRepo#content_type property accessor has been updated so that it no longer has to do string splitting and instead just returns the accumulated list parsed from the command-line.

Multiple new test cases are added to cover some different situations that could arise when using the new Action, especially if the options list is being built programmatically.

One additional test case is added specifically for the clear-repo command to ensure it behaves as expected with multiple content types.

Closes #25.

AvlWx2014 · 2021-08-10T22:45:49Z

I see why the static checks failed. The analysis score regressed, and it's due to the addition of this warning from SplitAndExtend:

pubtools/_pulp/arguments.py:119:4: W1113: Keyword argument before variable positional arguments list in the definition of init function (keyword-arg-before-vararg)

Stylistically, I would prefer the allowed-in-Python3 way of being able to write the init method

def __init__(self, *args, split_on=",", **kwargs)

which is a syntax error in Python 2. However this way

def __init__(self, split_on=",", *args, **kwargs)

caused this warning in static analysis.

I suppose I could forgo a keyword argument, but I'd like to be able to keep it to separate it from the keyword args of the argparse API.

rohanpm

The analysis score regressed

FYI the outcome of the test isn't based on whether the score regressed, it's strictly based on whether there were any complaints of severity fatal, error or warning.

The pylint complaint is valid, isn't it? There wouldn't be a way to pass split_on as a keyword argument and also pass more positional arguments?

My suggestion would be one of:

(1) Extract split_on from kwargs

def __init__(self, *args, **kwargs):
    split_on = kwargs.pop('split_on', ',')

Or, (2) don't implement any split_on argument, just support ','. Other characters aren't needed to meet the requirements. They may never be needed; if they later are, maybe we've managed to drop python2 by that time so we'd be able to use py3-only syntax freely.

Anyway, it needs to be made to pass somehow.

A separate issue - maybe you were already planning to remove this, but can we please not have commits like "Trigger update on upstream pull request due to GitHub outage" in PRs? You should be able to get the same effect by just a "git commit --amend" with no content. Maybe adding "--reset-author" (which also resets timestamp) if the former doesn't seem to do the trick.

Add an argparse Action subclass to help accumulate multiple instances of a command-line option in to a list.

AvlWx2014 · 2021-08-11T12:58:20Z

The pylint complaint is valid, isn't it? There wouldn't be a way to pass split_on as a keyword argument and also pass more positional arguments?

I see now that you are correct. I had experimented with what I thought was a recreation of this in a 2.7 interpreter and I was able to call it and print all the arguments, but now I'm questioning my experiment. I have fixed this in the latest commit by taking your first suggestion above using kwargs.pop.

You should be able to get the same effect by just a "git commit --amend" with no content. Maybe adding "--reset-author" (which also resets timestamp) if the former doesn't seem to do the trick.

Thank you for the suggestion! I will use this approach in the future rather than git commit --allow-empty.

rohanpm · 2021-08-16T05:27:14Z

pubtools/_pulp/arguments.py

+        # calling filter(None, Iterable) is equivalent to
+        # filter(lambda item: bool(item), Iterable)
+        # i.e. yields items which evaluate to True.
+        split = list(filter(None, split))


Why do we do this filtering?

# (1) "value1," => ["value1"] "a,,b" => ["a", "b"] # (2) "value1," => ["value1", ""] "a,,b" => ["a", "", "b"]

As I understand it, this implements (1). Is there a reason the behavior in (1) is preferred over (2)?

I feel it's the other way around - (1) tries to guess what the caller probably meant, which is dangerous as the guess can be wrong. (2) accurately maintains the data passed in by the caller, who may have had a valid reason for passing an empty string. Generic code like this can't know that "silently ignore empty arguments" is the right thing to do for all use-cases.

If this was inspired by the [x for x in type_strs if x] or None in content_type which is being replaced - I think that code was just a way of making it return None if no argument was provided, it wasn't a goal to filter out empty strings passed by the caller.

You are correct this filtering implements (1) in your snippet. I noted in a comment within __call__ that the intent is to remove empty strings introduced by something like --content-type rpm, as you demonstrated. The motivation is to prevent "" from appearing in the list of content types.

While this code was not inspired by [x for x in type_strs if x] or None, it does have the same effect as the first operand in that expression. So with [x for x in type_strs if x] or None being replaced, removing the filter statement would introduce the possibility of empty strings in the list of content types going forward.

You make a good point that this code can't know that filtering empty arguments is the right thing to do in all cases, so perhaps removing the filtering is the correct thing to do overall. I just want to make it known that this will be new behavior.

Perhaps it makes more sense that if "" needs to be filtered out of the list of content types then that should be implemented in pubtools-pulplib by the Repository or Client for operations where that is the case.

perhaps removing the filtering is the correct thing to do overall

Yeah, can we please do that? I'm concerned the filtering will have unintended consequences in the long term.

removing the filter statement would introduce the possibility of empty strings in the list of content types going forward

I'm not concerned about that - I'm pretty certain that, in the previous code:

type_strs = (self.args.content_type or "").split(",") return [x for x in type_strs if x] or None

The filtering of empty values was just a shortcut to deal with the fact that the default value calculated by the first line would end up as [""] rather than []. If callers actually pass empty strings through here, I'd say it's acceptable to crash.

Filtering empty strings has been removed and the test cases updated to reflect this.

rohanpm · 2021-08-16T05:29:56Z

tests/shared/test_arguments.py

+def test_split_and_extend_single_delimited_instance():
+    """Test a single option instance, with all values in a delimited list."""
+    # test one instance, all delimited using different delimiters
+    for delimiter in ",.-/":


You could also consider using pytest.fixture or pytest.mark.parametrize for delimiter here.

The pytest.fixture approach might not be obvious... it'd look something like:

@pytest.fixture(params=[',', '.', '-', '/']) def any_delimiter(request): return request.param

Then any test which accepts 'any_delimiter' as an argument will be invoked once for each possible value.

This is much cleaner, great suggestion!

tests/shared/test_arguments.py

rohanpm · 2021-08-16T05:34:37Z

tests/shared/test_arguments.py

+        assert args.option == FAKE_OPTION_VALUES
+
+
+def test_split_and_extend_multiple_instances_with_trailing_delimiters(parser):


The filtering of empty arguments seems like it'll affect more than just trailing delimiters, so it'd be nice to see tests for empty values appearing elsewhere as well, e.g. ",a,b" , "a,,b" .

I went ahead and updated this test to use a variety of cases that would produce empty values in different places and quantities. I am happy to make separate test functions for each instead.

Honestly, if it were me I'd go the opposite direction and have more of these tests combined together and parametrized, such as:

@pytest.mark.parametrize("argv,expected_option", [ (["--option", "x"], ["x"]), (["--option", "x,y"], ["x", "y"]), (["--option", "x", "--option", "y"], ["x", "y"]), (["--option", "x", "--option", "y,z"], ["x", "y", "z"]), (["--option", "x,y", "--option", "a,b"], ["x", "y", "a", "b"]), (["--option", "a,,b", "--option", "c,"], ["a", "", "", "b", "c", ""]), ]) def test_split_extend(argv, expected_option, parser, delimiter): parser.add_argument("--option", type=str, action=SplitAndExtend, split_on=delimiter) sys.argv = ["command"] + argv args = parser.parse_args() assert args.option == expected_option

i.e. there's just a mapping between input values and expected output values, and the test implementation is the same for all cases (you might only need different test functions if you have e.g. some crashing cases or something else where it doesn't work normally).

I find this easier to read, more maintainable and I think it can achieve greater coverage with less test code needed. This is all quite subjective though and I'm just happy if the relevant cases are tested at all, regardless of the style.

I agree with you, so I went ahead and updated the test code to make use of this.

I started down the path of adding a dependency on delimiter so that cases 1, 3, 4, and 5 in your example would use the given delimiter instead of "," e.g. a,,b, a--b, a//b, and a..b. The code started getting messy, so I elected for a second simple test function to test that SplitAndExtend can handle different delimiters for a basic case.

rohanpm · 2021-08-17T22:26:13Z

pubtools/_pulp/tasks/clear_repo.py

@@ -73,6 +73,8 @@ def add_args(self):
            "--content-type",
            help="remove only content of these comma-separated type(s)",
            type=str,


Didn't notice this until after submit - I wonder does the 'type' argument do anything now?

If it has any effect now, it's probably meant to be list. But maybe it just does nothing once you start using action.

It looks like using type=str isnt needed since values are read in as strings by default, but I don't mind it. It doesn't hurt to be explicit about the type of the parsed values.

Well, the point is that parsed.content_type used to return a str, so that was consistent with type, but now it returns a list, so that's inconsistent with type.

Is it correct that type now means "type of elements within list"? Docs don't say anything about how type and action work together as far as I can tell.

If you find for example that setting type=int doesn't cause any tests to fail, I'd take that as evidence that the argument has no effect now and in that case I'd be in favor of removing it, because I think it just misleads the reader into thinking that it has some effect.

I think I see what you mean. ClearRepo.content_type returns a list or None as it did before, but ClearRepo.args.content_type now returns a list instead of a comma-separated str.

I did not check for access to ClearRepo.args.content_type directly, but no tests failed with that change so we might be safe.

I'd be a fan of adding an additional property to the ClearRepo API to return the comma-separated string if it's a common need. But I can also understand the case to remove the type argument all together for the reason you mentioned.

AvlWx2014 requested review from rajulkumar and rohanpm as code owners August 10, 2021 15:03

rohanpm requested changes Aug 10, 2021

View reviewed changes

AvlWx2014 added 6 commits August 11, 2021 08:23

Add .idea/ directory to gitignore for PyCharm users

5ce55f6

Add SplitAndExtend Action for argparse

e868182

Add an argparse Action subclass to help accumulate multiple instances of a command-line option in to a list.

Add test cases for new argparse Action

ce5d340

Incorporate SplitAndExtend to support multiple content-type options

41075fe

Add test case for clear-repo using multiple content-type instances

7a3061c

Fix SplitAndExtend constructor to work for Python 2 and 3 syntax

d3ec452

AvlWx2014 force-pushed the clear-repo-content-type-multi-instance branch from abb3051 to d3ec452 Compare August 11, 2021 12:24

rohanpm reviewed Aug 16, 2021

View reviewed changes

AvlWx2014 added 3 commits August 16, 2021 12:25

Update test cases to make use of paramtrization rather than loops

3460b06

Remove filtering of empty strings in SplitAndExtend

8d747c9

Consolidate test_arguments code using parametrization

f16e5fa

AvlWx2014 force-pushed the clear-repo-content-type-multi-instance branch from 18cbc32 to f16e5fa Compare August 17, 2021 14:28

rohanpm approved these changes Aug 17, 2021

View reviewed changes

rohanpm merged commit 0289c44 into release-engineering:master Aug 17, 2021

rohanpm reviewed Aug 17, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clear repo content type multi instance #40

Clear repo content type multi instance #40

AvlWx2014 commented Aug 10, 2021

AvlWx2014 commented Aug 10, 2021

rohanpm left a comment

AvlWx2014 commented Aug 11, 2021

rohanpm Aug 16, 2021

AvlWx2014 Aug 16, 2021

rohanpm Aug 16, 2021

AvlWx2014 Aug 17, 2021 •

edited

rohanpm Aug 16, 2021

AvlWx2014 Aug 16, 2021

rohanpm Aug 16, 2021

AvlWx2014 Aug 16, 2021

rohanpm Aug 16, 2021

AvlWx2014 Aug 17, 2021 •

edited

rohanpm Aug 17, 2021

AvlWx2014 Aug 17, 2021

rohanpm Aug 17, 2021

AvlWx2014 Aug 18, 2021

		assert args.option == FAKE_OPTION_VALUES


		def test_split_and_extend_multiple_instances_with_trailing_delimiters(parser):

Clear repo content type multi instance #40

Clear repo content type multi instance #40

Conversation

AvlWx2014 commented Aug 10, 2021

Overview

Approach

AvlWx2014 commented Aug 10, 2021

rohanpm left a comment

Choose a reason for hiding this comment

AvlWx2014 commented Aug 11, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AvlWx2014 Aug 17, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AvlWx2014 Aug 17, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AvlWx2014 Aug 17, 2021 •

edited

AvlWx2014 Aug 17, 2021 •

edited