Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Corrupted /var/cache/salt/master/salt_vault_token #59361

Closed
RomanKrasavtsev opened this issue Jan 26, 2021 · 8 comments · Fixed by #62195
Closed

[BUG] Corrupted /var/cache/salt/master/salt_vault_token #59361

RomanKrasavtsev opened this issue Jan 26, 2021 · 8 comments · Fixed by #62195
Assignees
Labels
Bug broken, incorrect, or confusing behavior severity-high 2nd top severity, seen by most users, causes major problems Vault VMware

Comments

@RomanKrasavtsev
Copy link

RomanKrasavtsev commented Jan 26, 2021

Description
After a certain amount of time, Salt incorrectly overwrites /var/cache/salt/master/salt_vault_token

Setup

/etc/salt/master.d/vault.conf

vault:
  url: https://vault-url
  auth:
    uses: 0
    ttl: 300
    token_backend: disk
    method: token
    token: s.0yg..REPLACED_TOKEN
  policies:
    - saltmaster

Steps to Reproduce the behavior

  $ salt "*" state.apply
...
 Data failed to compile:
----------
    Pillar failed to render with the following messages:
----------
    Rendering SLS 'base' failed. Please see master log for details.
----------
    Rendering SLS 'base.consul' failed. Please see master log for details.
...
ERROR: Minions returned with non-zero exit code

less /var/log/salt/master

---
2021-01-26 14:58:19,408 [salt.pillar      :1208][CRITICAL][1669] Pillar render error: Rendering SLS 'base' failed. Please see master log for details.
2021-01-26 14:58:19,408 [salt.pillar      :1208][CRITICAL][1669] Pillar render error: Rendering SLS 'base.consul' failed. Please see master log for details.
2021-01-26 14:58:19,408 [salt.pillar      :1208][CRITICAL][1669] Pillar render error: Rendering SLS 'staging.REPLACED_PILLAR' failed. Please see master log for details.
...skipping...
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 12, in top-level template code
  File "/usr/lib/python3/dist-packages/salt/modules/vault.py", line 199, in read_secret
    version2 = __utils__["vault.is_v2"](path)
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 406, in is_v2
    path_metadata = _get_secret_path_metadata(path)
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 473, in _get_secret_path_metadata
    cache_content = _read_cache_file()
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 244, in _read_cache_file
    return salt.utils.json.load(contents)
  File "/usr/lib/python3/dist-packages/salt/utils/json.py", line 82, in load
    return kwargs.pop("_json_module", json).load(fp, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 564 (char 563)

; line 12

---

Expected behavior
Salt should regenerate vault token.

File
As you can see the file was incorrectly overwritten:

/var/cache/salt/master/salt_vault_token

{"url": "https://vault-URL", "token": "token", "verify": null, "uses": 0, "lease_duration": 300, "issued": 1611319989, "unlimited_use_token": true, "vault_secret_path_metadata": {"secret/all/saltmaster/noldap": {"accessor": "kv_1d13e977", "config": {"default_lease_ttl": 0, "force_no_cache": false, "max_lease_ttl": 0}, "description": "", "external_entropy_access": false, "local": false, "options": {"version": "2"}, "path": "secret/", "seal_wrap": false, "type": "kv", "uuid": "c6a2b5eb-a0c1-0965-180d-c3e30cca5ff6"}}}80d-c3e31cca5ff6"}}}

Versions Report

salt --versions-report (Provided by running salt --versions-report. Please also mention any differences in master/minion versions.)
Salt Version:
          Salt: 3002.2

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.6.1
     docker-py: Not Installed
         gitdb: 2.0.3
     gitpython: 2.1.8
        Jinja2: 2.10
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.4.7
        pygit2: Not Installed
        Python: 3.6.8 (default, Jan 14 2019, 11:02:34)
  python-gnupg: 0.4.1
        PyYAML: 3.12
         PyZMQ: 17.1.2
         smmap: 2.0.3
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.2.5

System Versions:
          dist: ubuntu 18.04 Bionic Beaver
        locale: UTF-8
       machine: x86_64
       release: 4.15.0-54-generic
        system: Linux
       version: Ubuntu 18.04 Bionic Beaver

Workaround
Manually remove /var/cache/salt/master/salt_vault_token and salt regenerates it.

@RomanKrasavtsev RomanKrasavtsev added the Bug broken, incorrect, or confusing behavior label Jan 26, 2021
@welcome
Copy link

welcome bot commented Jan 26, 2021

Hi there! Welcome to the Salt Community! Thank you for making your first contribution. We have a lengthy process for issues and PRs. Someone from the Core Team will follow up as soon as possible. In the meantime, here’s some information that may help as you continue your Salt journey.
Please be sure to review our Code of Conduct. Also, check out some of our community resources including:

There are lots of ways to get involved in our community. Every month, there are around a dozen opportunities to meet with other contributors and the Salt Core team and collaborate in real time. The best way to keep track is by subscribing to the Salt Community Events Calendar.
If you have additional questions, email us at core@saltstack.com. We’re glad you’ve joined our community and look forward to doing awesome things with you!

@furkalor
Copy link

We have faced the same issue. File /var/cache/salt/master/salt_vault_token is constantly growing and sometimes JSON structure inside it became broken:

"path": "infra-kv/", "seal_wrap": false, "type": "kv", "uuid": "64850942-d1c6-18a7-4785-929955f74b7e"}}}e"}}}

You can see excess symbols ( }}}e ) and this is the reason why salt goes broken:

  File "/usr/lib/python3.6/json/decoder.py", line 347, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 141624 (char 141623)
[INFO    ] Runner completed: 20210412084350734796

@sagetherage sagetherage added severity-high 2nd top severity, seen by most users, causes major problems and removed needs-triage labels Apr 27, 2021
@sagetherage sagetherage added this to the Silicon milestone Apr 27, 2021
@sagetherage sagetherage added the Silicon v3004.0 Release code name label Apr 27, 2021
@AliakseiKorneu
Copy link

We had exactly the same issue. It's extremely hard to apply states to nodes that extract pillar items from the Vault. Removing of /var/cache/salt/master/salt_vault_token file can help sometimes, but it's not a permanent solution.

Logs:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/salt/utils/templates.py", line 500, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3/dist-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 31, in top-level template code
  File "/etc/salt/pillar/defaults/***.sls", line 8, in top-level template code
    {{ salt["sdb.get"]("sdb://vlt/***/ca_cert") | indent(6) }}
  File "/usr/lib/python3/dist-packages/jinja2/sandbox.py", line 438, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/modules/sdb.py", line 26, in get
    return salt.utils.sdb.sdb_get(uri, __opts__, __utils__, strict)
  File "/usr/lib/python3/dist-packages/salt/utils/sdb.py", line 60, in sdb_get
    return loaded_db[fun](query, profile=profile)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/sdb/vault.py", line 89, in get
    version2 = __utils__["vault.is_v2"](path)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 1235, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2268, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/contextvars/__init__.py", line 38, in run
    return callable(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/loader.py", line 2283, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 411, in is_v2
    path_metadata = _get_secret_path_metadata(path)
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 484, in _get_secret_path_metadata
    cache_content = _read_cache_file()
  File "/usr/lib/python3/dist-packages/salt/utils/vault.py", line 246, in _read_cache_file
    return salt.utils.json.load(contents)
  File "/usr/lib/python3/dist-packages/salt/utils/json.py", line 82, in load
    return kwargs.pop("_json_module", json).load(fp, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 347, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 11969 (char 11968)

@AliakseiKorneu
Copy link

AliakseiKorneu commented Jul 5, 2021

As a temporary solution we put immutable flag on file /var/cache/salt/master/salt_vault_token

@sagetherage sagetherage modified the milestones: Silicon, Approved Aug 12, 2021
@dwoz dwoz added Phosphorus v3005.0 Release code name and version and removed Silicon v3004.0 Release code name labels Aug 30, 2021
@waynew
Copy link
Contributor

waynew commented Oct 21, 2021

Discussed this a bit during open hour, dropping some info here: based on your output it looks like Salt is possibly hitting a race condition here. We should be doing an atomic write to this file instead (i.e. open a tmp file, write to that, and then rename that to the salt_vault_token file)

A bit weird, though, as I'm not able to actually produce the error just by using threading with the same filehandle. Race condition might be extra "fun" 😬

@waynew
Copy link
Contributor

waynew commented Oct 21, 2021

@dwoz a thought I had for when you look into this: we might be able to just allow an atomic=True keyword in utils.files.fopen - use the atomic opener if so, otherwise use the default open. That would avoid having to add extra imports or change stuff around to take advantage of the functionality. Might not be a good idea, just something to consider.

@vanveele
Copy link

vanveele commented Nov 2, 2021

Wouldn't it be enough in this case to gracefully handle the symptom and move on since it is only a cache file? For example, just catch the exception and wipe the file.

modified from https://github.com/saltstack/salt/blob/master/salt/utils/vault.py#L249

def _read_cache_file():
    """
    Return contents of cache file
    """
    try:
        cache_file = os.path.join(__opts__["cachedir"], "salt_vault_token")
        with salt.utils.files.fopen(cache_file, "r") as contents:
            return salt.utils.json.load(contents)
    except FileNotFoundError:
        return {}
    except JSONDecodeError:
        del_cache()
        return {}

@anilsil anilsil added the VMware label Apr 6, 2022
@rumenv
Copy link

rumenv commented Apr 13, 2022

How we can work around this issue? The salt is unusable at the moment - we either have:
2022-04-13 13:03:46,478 [salt.pillar :1220][CRITICAL][76394] Pillar render error: Failed to load ext_pillar vault: HTTPSConnectionPool(host='vault.tools.starsops.com', port=443): Max retries exceeded with url: / v1/secret/us-east-2/common (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fb81dbdb1c0>: Failed to establish a new connection: [Errno 110] Connection timed out')) if we do not use token_backend: disk or if we use it, we have the issue described here?

@dwoz dwoz removed the Phosphorus v3005.0 Release code name and version label Apr 18, 2022
@dwoz dwoz removed this from the Phosphorus v3005.0 milestone Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug broken, incorrect, or confusing behavior severity-high 2nd top severity, seen by most users, causes major problems Vault VMware
Projects
None yet
Development

Successfully merging a pull request may close this issue.