Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] saltutil.refresh_grains clearing out pillar cache in salt 3005.x+ #64081

Closed
cheburakshu opened this issue Apr 14, 2023 · 8 comments · Fixed by #64176
Closed

[BUG] saltutil.refresh_grains clearing out pillar cache in salt 3005.x+ #64081

cheburakshu opened this issue Apr 14, 2023 · 8 comments · Fixed by #64176
Assignees
Labels
Bug broken, incorrect, or confusing behavior Pillar Regression The issue is a bug that breaks functionality known to work in previous releases. severity-high 2nd top severity, seen by most users, causes major problems VMware

Comments

@cheburakshu
Copy link

salt 3005 and later are clearing out the pillar cache and refreshing the grains, as saltutil.refresh_grains calls saltutil.refresh_pillar internally, and refresh_pillar is called with clean_cache=True by default. Request to change the signature of refresh_pillar to clean_cache=False.

Below output is from salt v3005.0 salt-call

[root@rocky-minion-75c9d897b7-zd68f /]# salt-call --local saltutil.refresh_grains -l debug
[DEBUG   ] Missing configuration file: /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/minion.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/minion.conf
[DEBUG   ] Using cached minion ID from /etc/salt/minion_id: rocky-minion-75c9d897b7-zd68f
[WARNING ] Insecure logging configuration detected! Sensitive data may be logged.
[DEBUG   ] Configuration file path: /etc/salt/minion
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Missing configuration file: /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Including configuration from '/etc/salt/minion.d/minion.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/minion.conf
[DEBUG   ] The functions from module 'core' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'disks' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'extra' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'lvm' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'mdadm' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'minion_process' are being loaded by dir() on the loaded module
[DEBUG   ] The functions from module 'opts' are being loaded by dir() on the loaded module
[DEBUG   ] Override  __utils__: <module 'salt.loaded.int.grains.zfs' from '/app/salt/salt/grains/zfs.py'>
[DEBUG   ] The functions from module 'zfs' are being loaded by dir() on the loaded module
[DEBUG   ] Elapsed time getting FQDNs: 0.011265039443969727 seconds
[DEBUG   ] Could not determine init system from command line: (python3 /app/salt/scripts/salt-minion)
[DEBUG   ] The `lspci` binary is not available on the system. GPU grains will not be available.
[DEBUG   ] The functions from module 'zfs' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded zfs.is_supported
[DEBUG   ] Determining pillar cache
[DEBUG   ] The functions from module 'jinja' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] The functions from module 'yaml' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] The functions from module 'jinja' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] The functions from module 'yaml' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] The functions from module 'saltutil' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded saltutil.refresh_grains
[DEBUG   ] The functions from module 'direct_call' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded direct_call.execute
[DEBUG   ] The functions from module 'event' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded event.fire
[DEBUG   ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_85aff00540_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_85aff00540_pull.ipc
[DEBUG   ] Sending event: tag = pillar_refresh; data = {'clean_cache': True, '_stamp': '2023-04-13T22:13:08.694398'}
[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] The functions from module 'nested' are being loaded by dir() on the loaded module
[DEBUG   ] LazyLoaded nested.output
local:
    True```
@cheburakshu cheburakshu added Bug broken, incorrect, or confusing behavior needs-triage labels Apr 14, 2023
@Ch3LL Ch3LL added the severity-high 2nd top severity, seen by most users, causes major problems label Apr 14, 2023
@Ch3LL Ch3LL added this to the Chlorine v3007.0 milestone Apr 14, 2023
@Ch3LL Ch3LL added Pillar and removed needs-triage labels Apr 14, 2023
@anilsil anilsil added the VMware label Apr 18, 2023
@cmcmarrow cmcmarrow assigned cmcmarrow and unassigned Ch3LL and MKLeb Apr 26, 2023
@cmcmarrow
Copy link
Contributor

We will add an arg to clean pillar cache in saltutil.refresh_grains

@cheburakshu
Copy link
Author

Right now saltutil.refresh_pillar cleans out the pillar cache without taking into account pillar_cache_ttl in the salt master config (https://docs.saltproject.io/en/latest/ref/configuration/master.html#pillar-cache-ttl).
May be we can add note to the documentation reg the new behavior when using refresh_pillar

@cmcmarrow
Copy link
Contributor

cmcmarrow commented Apr 27, 2023

People expect clean_cache=True to make it false now will causal breaks for many salt users. Looking at git blame real fast this behavior has been in place as of 6/2021 and probably even earlier in salts development life.

What we can do is add clean_pillar_cache=True to saltutil.refresh_grains so you can use saltutil.refresh_grains with the desired behavior VIA saltutil.refresh_grains(clean_pillar_cache=False).

As for documentations there are many places in salt were the pillar cache is cleared. pillar_cache_ttl does not stop the cache from being cleared. All it does is refresh the pillar catch after a set period of time.

How does this sound

If and only if a master has set pillar_cache: True, the cache TTL controls the amount of time, in seconds, before the cache is considered invalid by a master and a fresh pillar is recompiled and stored. 
The cache TTL does not prevent pillar cache from being refreshed before TTL expires. 

@cmcmarrow cmcmarrow added Feature new functionality including changes to functionality and code refactors, etc. and removed Bug broken, incorrect, or confusing behavior labels Apr 27, 2023
@cheburakshu
Copy link
Author

cheburakshu commented Apr 27, 2023

this behavior started happening after this PR: #60975 (salt 3005).

The specific bug that the above PR addressed was to clear the cache, but the cache is always cleared, even when the intent of the user isn't to clear the cache (see force_refresh that existed before the PR was merged). Ideally, clearing the pillar cache (a very resource intensive op to rebuild) should be done with user intent, respecting the pillar_cache_ttl, not by default.

I believe this is a bug, but could be addressed with a documentation update, if users would be fine expending compute.

This is a regression which also affects state.apply, not a "Feature" request 🤷 .

@cmcmarrow cmcmarrow added Bug broken, incorrect, or confusing behavior Regression The issue is a bug that breaks functionality known to work in previous releases. and removed Feature new functionality including changes to functionality and code refactors, etc. labels Apr 27, 2023
@cmcmarrow
Copy link
Contributor

cmcmarrow commented Apr 28, 2023

@cheburakshu now that you bring up state.apply I believe to see the full scope of the problem now. This is a performance problem that many will probably see. I add it clear_pillar_cache to serval functions in saltutil and made the default false this should fix salt.apple and saltutil.refresh_grains.

What do you think of that solution?

@cheburakshu
Copy link
Author

yeah, the proposal sounds good to me 👍

@garethgreenaway
Copy link
Contributor

Fix available in upcoming 3006.1 release, closed via #64176

@tysonnorris
Copy link

Will this be backported to a 3005.x release?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior Pillar Regression The issue is a bug that breaks functionality known to work in previous releases. severity-high 2nd top severity, seen by most users, causes major problems VMware
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants