Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE REQUEST] Add a skip_cache argument to file.managed to suppress caching #64373

Open
nf-brentsaner opened this issue May 27, 2023 · 3 comments
Labels
Feature new functionality including changes to functionality and code refactors, etc. Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged Performance security issues and PRs for the Security Working Group

Comments

@nf-brentsaner
Copy link

nf-brentsaner commented May 27, 2023

Is your feature request related to a problem? Please describe.
I'd argue this is technically a bug, as the way file.managed uses hash comparison is kind of unexpected.

So, file.managed will always download a remote file to the cache to compare checksums.
This is very much not ideal for very large files, and the following scenario is impossible to define via file.managed:

  1. A file, foo.bar.dat, should be located at /srv/http/foo.bar.dat. This is stateful.
  2. This file should have a checksum of abc012.... This is stateful.
  3. The file exists and has that checksum. This is a state.

Being that source_hash is required for non-saltfs (salt://) files, a source is already provided (unless skip_verify is explicitly provided). However, this is not used in any way for name if it exists on the filesystem already and is instead used solely for the cache for transmit integrity validation.

The option should be present to have it available for both transmit and local validation, thus:

  • decreasing the need for unnecessary source fetches
    • and reducing state run times
  • giving the opportunity to bypass a local disk's cache directory for a destination that resides on e.g. network storage

Describe the solution you'd like
I propose a new boolean argument, skip_cache, with a default value of False or None.

IF:

  • skip_cache is provided/True, the cache will be bypassed entirely for all operations for this name/source pair (unless a salt:// source).
  • The name does not exist on the filesystem, the file will be fetched to the destination, but removed if it does not match source_hash.
  • name does exist on the filesystem but does not match source_hash, the file will be downloaded directly to name.

Describe alternatives you've considered
There are no alternatives; I am managing many multiple-gigabyte ISO files and stuck in a hellscape of an ever-growing minion cache, please help.

Additional context
N/A

Please Note

If this feature request would be considered a substantial change or addition, this should go through a SEP process here https://github.com/saltstack/salt-enhancement-proposals, instead of a feature request.

Unnecessary; this change should be completely backwards-compat with all existing usages of file.managed.

@nf-brentsaner nf-brentsaner added Feature new functionality including changes to functionality and code refactors, etc. needs-triage labels May 27, 2023
@nf-brentsaner
Copy link
Author

ADDENDUM: This argument should/must be ignored if template is provided/not None.

@dwoz dwoz added Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged and removed needs-triage labels May 27, 2023
@OrangeDog
Copy link
Contributor

OrangeDog commented Jun 14, 2023

What if skip_hash is True but source_hash is not provided?

the file will be downloaded directly to name

What if, after it's overwritten the current file, it turns out the source_hash doesn't match? There's no way to rollback and you've probably just trojaned yourself.

There are very complex problems here, if you're trying to take a security feature and use it for performance instead.

There are no alternatives

- keep_source: False ?

The only part of this that I think is a good idea, is checking whether the current file already matches source_hash, and no templating is specified, and then do nothing. That (I think) can safely be the default behaviour, with no new arguments needed.

@OrangeDog OrangeDog added Performance security issues and PRs for the Security Working Group labels Jun 14, 2023
@nf-brentsaner
Copy link
Author

nf-brentsaner commented Jun 22, 2023

Just seeing this now, sorry.

What if skip_hash is True but source_hash is not provided?

Dunno, did you mean skip_verify? Unless you mean the suggested skip_cache, in which case it'd behave just as if source_hash was None but skip_verify is False, no? skip_verify would be the operative in that, I'd imagine.

What if, after it's overwritten the current file, it turns out the source_hash doesn't match? There's no way to rollback and you've probably just trojaned yourself.

Presumably it'd be a fail state, and the file could be deleted in such a condition (possibly controlled by another new argument; say, skip_cache_safe that defaults to True). This lets implementation determine the risk - if it's being fetched from an infra/internal resource, that risk is pretty low. That's why skip_verify is an option.

There are very complex problems here, if you're trying to take a security feature and use it for performance instead.

I am trying to not make Salt store a file twice whatsoever (and am trying to avoid huge many-hundreds-of-gebibytes files being written to the cache whatsoever).

  • keep_source: False ?+

Nope. It still downloads to the cache, it just gets deleted after the move to name. There is no way to avoid downloading a file to cache entirely and downloading direct to name instead. And, further, keep_source: False STILL then RE-DOWNLOADS the file each and every time, even if the source_hash matches on name.

The only part of this that I think is a good idea, is checking whether the current file already matches source_hash, and no templating is specified, and then do nothing. That (I think) can safely be the default behaviour, with no new arguments needed.

Sure, as long as the file is not re-downloaded to the cache if the source_hash DOES match name hash. That is the core issue here - there is no way to avoid downloading huge files when there is no need to, even with keep_source: False (though the option to selectively skip caching entirely solves other problems, like the salt cache by default being in /var, which is recommended to exist on its own partition/separate disk, or may not exist on storage capable of handling large files in the first place, etc.).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature new functionality including changes to functionality and code refactors, etc. Pending-Discussion The issue or pull request needs more discussion before it can be closed or merged Performance security issues and PRs for the Security Working Group
Projects
None yet
Development

No branches or pull requests

3 participants