Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting up new flag --no-target-config to skip use of local config files if required #473

Closed
wants to merge 18 commits into from

Conversation

emayuri-godaddy
Copy link
Contributor

@emayuri-godaddy emayuri-godaddy commented Mar 23, 2023

To help us get this pull request reviewed and merged quickly, please be sure to include the following items:

  • Tests (if applicable)
  • Documentation (if applicable)
  • Changelog entry
  • A full explanation here in the PR description of the work done

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Backward Compatibility

Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?

  • Yes (backward compatible)
  • No (breaking changes)

Issue Linking

fixes #469

What's new?

  • Adding a new flag --no-target-config to skip processing of repository's config file if needed.

rscottbailey and others added 15 commits March 21, 2023 10:16
Correct a number of defects with configuration file handling:

* We never said anything one way or the other, but users assumed
  `--config` could be specified multiple times on the command line.
  This didn't generate warnings or errors, but only the last option
  was processed. Mark this option with `multiple=True`
* Adjust `read_pyproject_toml()` to accept multiple config file inputs.
  * Put nearly entire function body into a loop on `value`, which now
    is a tuple instead of a string; early `return`s now are `continue`s
  * Function output, which used to be the name of the configuration
    file, now is a comma-delimited list of files (still a string);
    nothing actually uses this so it probably doesn't matter
  * When saving configuration to `ctx.default_map`, we concatenate
    list-valued members instead of just overwriting them
* Also in `read_pyproject_toml()`, `target_args` did not include
  `"target"` (used by `scan-folder`) and consequently did not discover
  configuration files in the target folder. This has been fixed and
  commented a little more liberally to clarify that usage.

Existing unit tests still pass (after adjusting to pass a tuple instead
of a string) -- new unit tests need to be written but I have verified
interactively that we now get the expected behavior.
Add tests to verify the asserted behavior of parameter loading from
multiple configuration files
Explicitly describe the behavior when multiple configuration files are
specified, and clarify (somewhat) the search behavior
The Good: use correct annotation for `value` parameter to indicate it is
a tuple consisting of an indeterminate number of strings, instead of a
tuple consisting of exactly one string.

The Bad: fix some unit tests where I forgot to convert the `value`
parameter into a tuple in the first place.

The Ugly: `read_pyproject_toml()` has been systematically abusing
click's context `default_map` by treating it like a dict even though it
actually is defined as an (immutable) `Mapping`. It's been doing this
for a long time, and evidently it works. A proper fix (probably using
`params` instead of `default_map`) is not obvious to me -- I tried just
switching to params, and it ate my updates. Therefore, we continue the
existing abusive practice, document the problem, and add an annotation
to silence the (valid) mypy error.
Eliminate an unused variable from one of the new unit tests
After some poking around, it is clear that the path to "local"
configuration files lies from the scanner instances, which call
`load_config_from_path()` directly, bypassing `read_pyproject_toml()`
entirely. The latter appears to be used only for files specified on the
command line with `--config`.

Folder scanning (using class `FolderScanner`) did not ever look for
configuration files. This class has no `load_repo()` method like the git
scanners, so clone the relevant logic directly into the `__init__()`
method. This appears after the parent init call, because it appears to
be most congruent to the path to `load_repo()`, which is defined in the
specific classes but called from the `ScannerBase` class init method.

Somewhat similarly, pre-commit scanning (using class
`GitPreCommitScanner`) did not ever look for configuration files. This
class has a `load_repo()` method, but it didn't contain the appropriate
logic. Clone logic into that, also.
Now that we know this is called only in response to use of `--config`,
we can perform some logic reductions.

* Because `ctx.params` is always empty, the `target_args` search always
  fails and `config_path` always gets set to the current directory. Just
  do that initially before entering the loop and remove the distracting
  pointless tests.
* Because we are called only when the user specifies a file, the "fail
  silently on implicit search" logic never is triggered; remove it. We
  always raise an exception if the (exact as-specified) file is not
  found.
* Similarly, if we barf on a file containing garbage, we know the name
  of the file, so no name construction is required.

No change to unit tests except to eliminate a test that was evaluating
behavior for one of the "will never happen" conditions.
Because we now know that `read_pyproject_toml()` only gets called once
and does not need to accumulate data across multiple calls, we can clean
up the context `default_map` usage.

* Assume map is empty (`None`) on entry
* Accumulate settings in a local `config_data` dict
* Just before exiting, stuff everything into `default_map` as a complete
  replacement. Neither pylint nor mypy complain about this

Replaces long "FIXME" comment about (immutable) mapping updates.
Why is every class doing almost identically the same thing to load
configuration? Bleah.

Add a `load_config()` method to the base `ScannerBase` class; this does
the configuration lookup once, in one place, one way.

For all of the derived classes, remove the open-coded crap (generally in
`load_repo()` even though it has nothing to do with loading the
repository) and instead make a call to `self.load_config()` from the end
of the `__init__()` method. Note that we can't simply delegate this to
the base class's initializer, because it needs to happen after the files
exist, which is after `load_repo()` in the case of remote repositories.
It's not clear this really is a problem, but we had logic that
preventing reading a "local" config file if we'd already processed it
globally. However that logic was broken, and it got even more broken
when we starting allowing multiple global configuration files.

First, get rid of return value from `read_pyproject_toml()` -- which was
unrecoverable -- in favor of stashing a set of processed configuration
files globally. Make these fully-resolved Path objects to avoid any
ambiguity in what was or wasn't referenced.

While we're at it, junk the return value which provided nothing useful.
Also while we're at it, use this ourselves and avoid reusing a file if
the user says something stupid like `--config foo --config foo`.

I almost added a test to avoid re-reading a file in the case where the
user says `--config foo --config foo`, but I didn't do that so that the
scenario `--config a --config b --config a` preserves "last reference
wins" semantics, which wouldn't be the case if we short-circuited the
second read of a because we'd seen it already.

Second, replace the test against the global options -- which had the
wrong type, and which had what the user typed instead of the resolved
name, and test for membership in the processed set instead.
Significant extend and reword (again) the material on configuration
processing, hopefully clarifying how configuration discovery works and
how options from multiple files are merged.
The call to `load_confg()` must occur *BEFORE* calling `load_repo()`.
That's because typically the latter routine will call
`filter_submodules()`, which as it hamhandedly generates new filter
expressions, causes processing and memoization of some of the
configuration settings (i.e. `exclude_path_patterns`) with the result
that confguration loads after this point basically get ignored.

Move up the configuration load to avoid this problem, and put a hateful
FIXME comment in the submodule kludging. I didn't do more because that
would have involved fixing a bunch of related unit tests. :-p
Copy link
Contributor

@rbailey-godaddy rbailey-godaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see this tweaked so the flag is "local_config" with default True; more notes below where the definition is made. Obviously other references to it would change with respect to the name and the sense.

tartufo/cli.py Outdated
Comment on lines 255 to 261
@click.option(
"--no-local-config",
is_flag=True,
default=False,
show_default=True,
help="Skipping auto discovery and use of local config file",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would look more like the other options if you structured this as --local-config/--no-local-config with the default being True (i.e. --local-config) -- look at the immediately following quiet definition for an example.

@jsimmons1-godaddy
Copy link

jsimmons1-godaddy commented Mar 24, 2023

Which config file exactly does this prevent using?

For example in this case: tartufo --no-local-config scan-local-repo foo/bar/baz/ , assuming that there is a tartufo.toml both in my current directory, and in the foo/bar/baz directory, which config is now being skipped that would have been used before?

I ask because looking at #469 (that I believe this is meant to address) , the command is tartufo --config /path/to/config scan-local-repo . , it's only using the config file in the current directory because . , the current directory, is the repo that scan-local-repo was given. So if this is meant to prevent using the tartufo.toml from the target, I think the wording of the flag and documentation should be different.

@emayuri-godaddy
Copy link
Contributor Author

Which config file exactly does this prevent using?

For example in this case: tartufo --no-local-config scan-local-repo foo/bar/baz/ , assuming that there is a tartufo.toml both in my current directory, and in the foo/bar/baz directory, which config is now being skipped that would have been used before?

I ask because looking at #469 (that I believe this is meant to address), the command is tartufo --config /path/to/config scan-local-repo . , it's only using the config file in the current directory because . , the current directory, is the repo that scan-local-repo was given. So if this is meant to prevent using the tartufo.toml from the target, I think the wording of the flag and documentation should be different.

It prevents processing of config file in the directory that is being scanned whether it be local or remote
In your example tartufo --no-local-config scan-local-repo foo/bar/baz/ it should skip processing of toml file in foo/bar/baz.
I agree the --local/--no-local-config might be a little miss leading. We are planning on re-naming it to --target-config/--no-target-config and updating documentation to be more clear.

i.e. config files like tartufo.toml or pyproject.toml
setup in local working directory
setup in the repository being scanned

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this flag apply to the scan-folder subcommand as well, or only scan-local-repo and scan-remote-repo ? What about pre-commit ? Depending on which it applies to, the wording could potentially be improved here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should apply to both scan-folder and pre-commit along with scan-local-repo and scan-remote-repo

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I would argue that this should not say the repository being scanned then, since it applies to scan-folder too. Maybe just the repository/folder being scanned?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
setup in the repository being scanned
setup in the repository or folder being scanned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just beating on a dead horse at this point. It's not "config files like tartufo.toml or pyproject.toml", it's exactly "tartufo.toml or pyproject.toml in the target root directory". And I am not necessarily a fan of the phrase "target root directory" and would be happy to see a better term. But it'll never be anything except one of those two files, whichever is found first, in the order listed.

@jsimmons1-godaddy
Copy link

jsimmons1-godaddy commented Mar 24, 2023

It prevents processing of config file in the directory that is being scanned whether it be local or remote
In your example tartufo --no-local-config scan-local-repo foo/bar/baz/ it should skip processing of toml file in foo/bar/baz.
I agree the --local/--no-local-config might be a little miss leading. We are planning on re-naming it to --target-config/--no-target-config and updating documentation to be more clear.

Thanks, yeah I find that wording much more clear.

I'm curious if you have thought about there being another (separate) flag that prevents processing the current-directory config as well? For tartufo --no-target-config scan-local-repo foo/bar/baz/ , one that would skip processing the tartufo.toml in the current directory in which this is run (or in parent directories of the current directories if it hasn't found one yet).

@emayuri-godaddy emayuri-godaddy changed the title Setting up new flag --no-local-config to skip use of local config files if required Setting up new flag --no-target-config to skip use of local config files if required Mar 24, 2023
@emayuri-godaddy
Copy link
Contributor Author

As far as I understand there are only two ways of processing config files one in the target directory that is being scanned which is enabled by default and now can be disabled by using --no-target-config and the other option is to provide config file using --config which traverses the directory to find required file. The traversal of the directory when --config is provided will be disabled as part of another issue #475

@jsimmons1-godaddy
Copy link

As far as I understand there are only two ways of processing config files one in the target directory that is being scanned which is enabled by default and now can be disabled by using --no-target-config and the other option is to provide config file using --config which traverses the directory to find required file. The traversal of the directory when --config is provided will be disabled as part of another issue #475

If you don't specify --config at all, it will assume you mean tartufo.toml or pyproject.toml in the current directory, where you're running this command (or parent directories, if it doesn't find either there). Which I don't believe there is currently any way to turn that off, even after this PR.

I have no objections at all with that being addressed in a separate PR though, if it even is something that should have a flag (I'd say yes). Just thought it seemed related enough to mention. (To be clear, I'm just an outside observer anyway, I'm not a maintainer, so even if I did have objections they wouldn't necessarily mean much).

@@ -87,6 +87,12 @@ Options:
specified multiple times.
--config FILE Read configuration from specified file.
[default: tartufo.toml]
--target-config/--no-target-config BOOL
Copy link

@jsimmons1-godaddy jsimmons1-godaddy Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--target-config/--no-target-config BOOL
--target-config/--no-target-config

I think this better aligns with the usage documentation for eg --scan-filenames / --no-scan-filenames (in that it doesn't imply that a boolean value is to be specified after the argument name)

repository being scanned
i.e. config files like tartufo.toml or pyproject.toml
setup in the repository being scanned
[default: local-config]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[default: local-config]
[default: target-config]

Copy link
Contributor

@rbailey-godaddy rbailey-godaddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the twiddly stuff, I think this would be a lot clearer if we could filter out the garbage from the "multiple config" PR that's already been merged. Rebasing (against the squash merge) might be nasty -- can you perhaps just cherry-pick your final commits into a separate clean branch and submit a new PR based on that? Whatever is easiest for you.

@@ -2,8 +2,21 @@ vX.X.X - Mar 3 2023
--------------------

Features:
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `-target-config/--no-target-config` to process or skip using config file in the repository being scanned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `-target-config/--no-target-config` to process or skip using config file in the repository being scanned
* [#473](https://github.com/godaddy/tartufo/pull/473) - Introduces new flag `--target-config/--no-target-config` to process or skip using config file in the repository being scanned

i.e. config files like tartufo.toml or pyproject.toml
setup in local working directory
setup in the repository being scanned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just beating on a dead horse at this point. It's not "config files like tartufo.toml or pyproject.toml", it's exactly "tartufo.toml or pyproject.toml in the target root directory". And I am not necessarily a fan of the phrase "target root directory" and would be happy to see a better term. But it'll never be anything except one of those two files, whichever is found first, in the order listed.

Comment on lines +48 to +84

tartufo --config myconfig.toml scan-local-repo .

``tartufo`` will look for ``myconfig.toml`` in the current directory, and then
in the parent directory if it does not exist in ``.``, etc., and then look for
either ``tartufo.toml`` or (if not found) ``pyproject.toml`` in the current
directory (only) because that is the target of the scan. Directives in, say,
``tartufo.toml`` would supersede settings in ``myconfig.toml``.

For purposes of this scan, empty files are considered not to exist.

However:

.. code-block:: shell

tartufo --config tartufo.toml --config myconfig.toml scan-local-repo .

will cause ``tartufo`` to read ``tartufo.toml`` in the current directory
(assuming it exists, and in parent directories otherwise), and then ``myconfig.toml``
as above, and then either ``tartufo.toml`` or ``pyproject.toml`` in the current
directory (only) because that is the target of the scan.

If ``tartufo.toml`` exists in the current directory, the effect is that
``tartufo.toml`` is read first, ``myconfig.toml`` is read second (possibly
overriding directives), and ``tartufo.toml`` is not read again because it was
processed already (and any ``pyproject.toml`` is ignored because ``tartufo.toml``
was found).

If ``tartufo.toml`` is not present but exists in the parent directory, then the
effect is that ``../tartufo.toml`` is read first, ``myconfig.toml`` is read next,
and ``pyproject.toml`` (if it exists) would be processed last because it is in
the scan target and ``tartufo.toml`` does not exist in the scan root.

Note this behavior is not applicable to remote repository scans, because the
remote repository will be cloned to a scratch directory and neither that directory
nor the configuration files within it are available to specify with ``-config``.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reflection, I think this example would be better if we specified some other target directory that is not . and then replace instances of "current directory" with "target" or similar. What I wrote here (from previous PR) is technically correct, but I think a lot of people see "current directory" and think the current directory is magic, without getting the point that it's true here only because the current directory is also the target directory.

@emayuri-godaddy
Copy link
Contributor Author

Closing this PR as it was branched out of a feature branch and rebase seems to be difficult now.
The new PR for this work #480

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tartufo looks for config in current directory even when using --config
5 participants