-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setting up new flag --no-target-config to skip use of local config files if required #473
Conversation
Correct a number of defects with configuration file handling: * We never said anything one way or the other, but users assumed `--config` could be specified multiple times on the command line. This didn't generate warnings or errors, but only the last option was processed. Mark this option with `multiple=True` * Adjust `read_pyproject_toml()` to accept multiple config file inputs. * Put nearly entire function body into a loop on `value`, which now is a tuple instead of a string; early `return`s now are `continue`s * Function output, which used to be the name of the configuration file, now is a comma-delimited list of files (still a string); nothing actually uses this so it probably doesn't matter * When saving configuration to `ctx.default_map`, we concatenate list-valued members instead of just overwriting them * Also in `read_pyproject_toml()`, `target_args` did not include `"target"` (used by `scan-folder`) and consequently did not discover configuration files in the target folder. This has been fixed and commented a little more liberally to clarify that usage. Existing unit tests still pass (after adjusting to pass a tuple instead of a string) -- new unit tests need to be written but I have verified interactively that we now get the expected behavior.
Add tests to verify the asserted behavior of parameter loading from multiple configuration files
Explicitly describe the behavior when multiple configuration files are specified, and clarify (somewhat) the search behavior
The Good: use correct annotation for `value` parameter to indicate it is a tuple consisting of an indeterminate number of strings, instead of a tuple consisting of exactly one string. The Bad: fix some unit tests where I forgot to convert the `value` parameter into a tuple in the first place. The Ugly: `read_pyproject_toml()` has been systematically abusing click's context `default_map` by treating it like a dict even though it actually is defined as an (immutable) `Mapping`. It's been doing this for a long time, and evidently it works. A proper fix (probably using `params` instead of `default_map`) is not obvious to me -- I tried just switching to params, and it ate my updates. Therefore, we continue the existing abusive practice, document the problem, and add an annotation to silence the (valid) mypy error.
Eliminate an unused variable from one of the new unit tests
After some poking around, it is clear that the path to "local" configuration files lies from the scanner instances, which call `load_config_from_path()` directly, bypassing `read_pyproject_toml()` entirely. The latter appears to be used only for files specified on the command line with `--config`. Folder scanning (using class `FolderScanner`) did not ever look for configuration files. This class has no `load_repo()` method like the git scanners, so clone the relevant logic directly into the `__init__()` method. This appears after the parent init call, because it appears to be most congruent to the path to `load_repo()`, which is defined in the specific classes but called from the `ScannerBase` class init method. Somewhat similarly, pre-commit scanning (using class `GitPreCommitScanner`) did not ever look for configuration files. This class has a `load_repo()` method, but it didn't contain the appropriate logic. Clone logic into that, also.
Now that we know this is called only in response to use of `--config`, we can perform some logic reductions. * Because `ctx.params` is always empty, the `target_args` search always fails and `config_path` always gets set to the current directory. Just do that initially before entering the loop and remove the distracting pointless tests. * Because we are called only when the user specifies a file, the "fail silently on implicit search" logic never is triggered; remove it. We always raise an exception if the (exact as-specified) file is not found. * Similarly, if we barf on a file containing garbage, we know the name of the file, so no name construction is required. No change to unit tests except to eliminate a test that was evaluating behavior for one of the "will never happen" conditions.
Because we now know that `read_pyproject_toml()` only gets called once and does not need to accumulate data across multiple calls, we can clean up the context `default_map` usage. * Assume map is empty (`None`) on entry * Accumulate settings in a local `config_data` dict * Just before exiting, stuff everything into `default_map` as a complete replacement. Neither pylint nor mypy complain about this Replaces long "FIXME" comment about (immutable) mapping updates.
Why is every class doing almost identically the same thing to load configuration? Bleah. Add a `load_config()` method to the base `ScannerBase` class; this does the configuration lookup once, in one place, one way. For all of the derived classes, remove the open-coded crap (generally in `load_repo()` even though it has nothing to do with loading the repository) and instead make a call to `self.load_config()` from the end of the `__init__()` method. Note that we can't simply delegate this to the base class's initializer, because it needs to happen after the files exist, which is after `load_repo()` in the case of remote repositories.
It's not clear this really is a problem, but we had logic that preventing reading a "local" config file if we'd already processed it globally. However that logic was broken, and it got even more broken when we starting allowing multiple global configuration files. First, get rid of return value from `read_pyproject_toml()` -- which was unrecoverable -- in favor of stashing a set of processed configuration files globally. Make these fully-resolved Path objects to avoid any ambiguity in what was or wasn't referenced. While we're at it, junk the return value which provided nothing useful. Also while we're at it, use this ourselves and avoid reusing a file if the user says something stupid like `--config foo --config foo`. I almost added a test to avoid re-reading a file in the case where the user says `--config foo --config foo`, but I didn't do that so that the scenario `--config a --config b --config a` preserves "last reference wins" semantics, which wouldn't be the case if we short-circuited the second read of a because we'd seen it already. Second, replace the test against the global options -- which had the wrong type, and which had what the user typed instead of the resolved name, and test for membership in the processed set instead.
Significant extend and reword (again) the material on configuration processing, hopefully clarifying how configuration discovery works and how options from multiple files are merged.
The call to `load_confg()` must occur *BEFORE* calling `load_repo()`. That's because typically the latter routine will call `filter_submodules()`, which as it hamhandedly generates new filter expressions, causes processing and memoization of some of the configuration settings (i.e. `exclude_path_patterns`) with the result that confguration loads after this point basically get ignored. Move up the configuration load to avoid this problem, and put a hateful FIXME comment in the submodule kludging. I didn't do more because that would have involved fixing a bunch of related unit tests. :-p
2ceaf75
to
d3cc509
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see this tweaked so the flag is "local_config" with default True; more notes below where the definition is made. Obviously other references to it would change with respect to the name and the sense.
tartufo/cli.py
Outdated
@click.option( | ||
"--no-local-config", | ||
is_flag=True, | ||
default=False, | ||
show_default=True, | ||
help="Skipping auto discovery and use of local config file", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would look more like the other options if you structured this as --local-config/--no-local-config
with the default being True
(i.e. --local-config
) -- look at the immediately following quiet definition for an example.
Which config file exactly does this prevent using? For example in this case: I ask because looking at #469 (that I believe this is meant to address) , the command is |
13f5980
to
9292f6f
Compare
It prevents processing of config file in the directory that is being scanned whether it be local or remote |
i.e. config files like tartufo.toml or pyproject.toml | ||
setup in local working directory | ||
setup in the repository being scanned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this flag apply to the scan-folder
subcommand as well, or only scan-local-repo
and scan-remote-repo
? What about pre-commit
? Depending on which it applies to, the wording could potentially be improved here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should apply to both scan-folder
and pre-commit
along with scan-local-repo
and scan-remote-repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I would argue that this should not say the repository being scanned
then, since it applies to scan-folder
too. Maybe just the repository/folder
being scanned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setup in the repository being scanned | |
setup in the repository or folder being scanned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just beating on a dead horse at this point. It's not "config files like tartufo.toml or pyproject.toml", it's exactly "tartufo.toml or pyproject.toml in the target root directory". And I am not necessarily a fan of the phrase "target root directory" and would be happy to see a better term. But it'll never be anything except one of those two files, whichever is found first, in the order listed.
Thanks, yeah I find that wording much more clear. I'm curious if you have thought about there being another (separate) flag that prevents processing the current-directory config as well? For |
As far as I understand there are only two ways of processing config files one in the target directory that is being scanned which is enabled by default and now can be disabled by using |
If you don't specify I have no objections at all with that being addressed in a separate PR though, if it even is something that should have a flag (I'd say yes). Just thought it seemed related enough to mention. (To be clear, I'm just an outside observer anyway, I'm not a maintainer, so even if I did have objections they wouldn't necessarily mean much). |
@@ -87,6 +87,12 @@ Options: | |||
specified multiple times. | |||
--config FILE Read configuration from specified file. | |||
[default: tartufo.toml] | |||
--target-config/--no-target-config BOOL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--target-config/--no-target-config BOOL | |
--target-config/--no-target-config |
I think this better aligns with the usage documentation for eg --scan-filenames / --no-scan-filenames
(in that it doesn't imply that a boolean value is to be specified after the argument name)
repository being scanned | ||
i.e. config files like tartufo.toml or pyproject.toml | ||
setup in the repository being scanned | ||
[default: local-config] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[default: local-config] | |
[default: target-config] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the twiddly stuff, I think this would be a lot clearer if we could filter out the garbage from the "multiple config" PR that's already been merged. Rebasing (against the squash merge) might be nasty -- can you perhaps just cherry-pick your final commits into a separate clean branch and submit a new PR based on that? Whatever is easiest for you.
@@ -2,8 +2,21 @@ vX.X.X - Mar 3 2023 | |||
-------------------- | |||
|
|||
Features: | |||
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `-target-config/--no-target-config` to process or skip using config file in the repository being scanned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `-target-config/--no-target-config` to process or skip using config file in the repository being scanned | |
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `--target-config/--no-target-config` to process or skip using config file in the repository being scanned |
i.e. config files like tartufo.toml or pyproject.toml | ||
setup in local working directory | ||
setup in the repository being scanned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just beating on a dead horse at this point. It's not "config files like tartufo.toml or pyproject.toml", it's exactly "tartufo.toml or pyproject.toml in the target root directory". And I am not necessarily a fan of the phrase "target root directory" and would be happy to see a better term. But it'll never be anything except one of those two files, whichever is found first, in the order listed.
|
||
tartufo --config myconfig.toml scan-local-repo . | ||
|
||
``tartufo`` will look for ``myconfig.toml`` in the current directory, and then | ||
in the parent directory if it does not exist in ``.``, etc., and then look for | ||
either ``tartufo.toml`` or (if not found) ``pyproject.toml`` in the current | ||
directory (only) because that is the target of the scan. Directives in, say, | ||
``tartufo.toml`` would supersede settings in ``myconfig.toml``. | ||
|
||
For purposes of this scan, empty files are considered not to exist. | ||
|
||
However: | ||
|
||
.. code-block:: shell | ||
|
||
tartufo --config tartufo.toml --config myconfig.toml scan-local-repo . | ||
|
||
will cause ``tartufo`` to read ``tartufo.toml`` in the current directory | ||
(assuming it exists, and in parent directories otherwise), and then ``myconfig.toml`` | ||
as above, and then either ``tartufo.toml`` or ``pyproject.toml`` in the current | ||
directory (only) because that is the target of the scan. | ||
|
||
If ``tartufo.toml`` exists in the current directory, the effect is that | ||
``tartufo.toml`` is read first, ``myconfig.toml`` is read second (possibly | ||
overriding directives), and ``tartufo.toml`` is not read again because it was | ||
processed already (and any ``pyproject.toml`` is ignored because ``tartufo.toml`` | ||
was found). | ||
|
||
If ``tartufo.toml`` is not present but exists in the parent directory, then the | ||
effect is that ``../tartufo.toml`` is read first, ``myconfig.toml`` is read next, | ||
and ``pyproject.toml`` (if it exists) would be processed last because it is in | ||
the scan target and ``tartufo.toml`` does not exist in the scan root. | ||
|
||
Note this behavior is not applicable to remote repository scans, because the | ||
remote repository will be cloned to a scratch directory and neither that directory | ||
nor the configuration files within it are available to specify with ``-config``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection, I think this example would be better if we specified some other target directory that is not .
and then replace instances of "current directory" with "target" or similar. What I wrote here (from previous PR) is technically correct, but I think a lot of people see "current directory" and think the current directory is magic, without getting the point that it's true here only because the current directory is also the target directory.
Closing this PR as it was branched out of a feature branch and rebase seems to be difficult now. |
To help us get this pull request reviewed and merged quickly, please be sure to include the following items:
PR Type
What kind of change does this PR introduce?
Backward Compatibility
Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?
Issue Linking
fixes #469
What's new?
--no-target-config
to skip processing of repository's config file if needed.