Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammar caching causes EOFError and race condition when used as pre-commit hook #1164

Closed
a-gardner1 opened this issue Oct 10, 2023 · 5 comments · Fixed by #1243
Closed

Grammar caching causes EOFError and race condition when used as pre-commit hook #1164

a-gardner1 opened this issue Oct 10, 2023 · 5 comments · Fixed by #1243

Comments

@a-gardner1
Copy link

Thanks to pre-commit#851, hooks can now be executed in parallel for each file. This feature is enabled by default.

When a large number of files need to be formatted at once, this concurrency introduces a race condition in wherein one process can conclude that the pickled grammar file does not exist and start to create it. Meanwhile, another process sees the newly created pickle file before it is done being written, concludes that it is newer than the raw unpickled grammar, and then attempts to load it.
The result is the following stack trace:

Traceback (most recent call last):
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/bin/yapf", line 5, in <module>
    from yapf import run_main
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/__init__.py", line 41, in <module>
    from yapf.yapflib import yapf_api
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/yapflib/yapf_api.py", line 38, in <module>
    from yapf.pyparser import pyparser
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/pyparser/pyparser.py", line 44, in <module>
    from yapf.yapflib import format_token
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/yapflib/format_token.py", line 23, in <module>
    from yapf.pytree import pytree_utils
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf/pytree/pytree_utils.py", line 30, in <module>
    from yapf_third_party._ylib2to3 import pygram
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pygram.py", line 29, in <module>
    python_grammar = driver.load_grammar(_GRAMMAR_FILE)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pgen2/driver.py", line 252, in load_grammar
    g.load(gp)
  File "~/.cache/pre-commit/repo73fphdkg/py_env-python3.11/lib/python3.11/site-packages/yapf_third_party/_ylib2to3/pgen2/grammar.py", line 95, in load
    d = pickle.load(f)
        ^^^^^^^^^^^^^^
EOFError: Ran out of input

Since the process writing the file is not impacted by this error, the pickled grammar will get cached. Thus, subsequent runs will succeed. However, if one is using yapf in a continuous integration context, this would cause a failed pipeline with potentially high probability (depending on the number of files) with no obvious recourse.

A workaround is to add require_serial to the yapf hook in your project's .pre-commit-config.yaml, but this comes at the cost of losing the advantages of pre-commit#851. I believe the logic in _load_grammar could be refactored to avoid the race condition with some more careful checks.

@kamahen
Copy link
Contributor

kamahen commented Oct 10, 2023

The problem seems to be in Grammar.dump() in file yapf/third_party/yapf_third_party/_ylib2to3/pgen2/grammar.py - it opens the file directly instead of creating a temporary name, writing to the file, then renaming it (this works on Unix, where rename is an atomic operation (as are open, close, linke, etc.), so it's safe if two processes do the same thing at the same time). I don't know if this technique works on Windows.

Something like this:

def dump(self, filename):
  """Dump the grammar tables to a pickle file."""
  with tempfile.NamedTemporaryFile(mode='wb') as f:
    pickle.dump(self.__dict__, f, pickle.HIGHEST_PROTOCOL)
    os.link(f.name, filename)

@a-gardner1
Copy link
Author

Agreed, I think that would work on Unix systems. To my knowledge, there is no documented, guaranteed atomic write operation on Windows. The python-atomicwrites library provides a best-effort atomic write operation for Windows; there may or may not be something better out there.

@kamahen
Copy link
Contributor

kamahen commented Oct 11, 2023

There's an alternative method that runs a small risk of leaving a temporary file lying around, but might work better on Windows [this code is untested; and it probably should have a try/except to deal with cleanup on error]:

  with tempfile.NamedTemporaryFile(mode='wb', delete=False, delete_on_close=False) as f:
    tmp_file_name = f.name
    pickle.dump((self.__dict__, f, pickle.HIGHEST_PROTOCOL)
  os.rename(tmp_file_name, filename)

@jwwangchn
Copy link

Same problem, how to solve it?

pbchekin added a commit to intel/intel-xpu-backend-for-triton that referenced this issue Jan 7, 2024
pbchekin added a commit to intel/intel-xpu-backend-for-triton that referenced this issue Jan 8, 2024
pbchekin added a commit to intel/intel-xpu-backend-for-triton that referenced this issue Jan 8, 2024
* GitHub workflow for new runners

* Ignore first yapf failure

See google/yapf#1164.

* run test_line_info.py separately

* triton-runner-base:0.0.2

* Include cmake/llvm-hash.txt to a cache key for packages

* Run pre-commit checks in parallel

* Use pip cache for pre-commit checks

* Redirect first failing yapf to /dev/null

* Use jobs.<job_id>.defaults.run to initialize oneapi
pbchekin added a commit to intel/intel-xpu-backend-for-triton that referenced this issue Jan 10, 2024
We have to run yapf twice because in a clean environment the first run
most likely fails because of the race condition in yapf, see
google/yapf#1164. If, however, the first run
was successful and yapf has modified files, then we need to reset the
changes, so the second run can detect them again.

Note that the whole block (ignore the first run and reset the tree) will
not be necessary and will be removed after fixed
google/yapf#1164.
whitneywhtsang pushed a commit to intel/intel-xpu-backend-for-triton that referenced this issue Jan 10, 2024
We have to run yapf twice because in a clean environment the first run
most likely fails because of the race condition in yapf, see
google/yapf#1164. If, however, the first run
was successful and yapf has modified files, then we need to reset the
changes, so the second run can detect them again.

Note that the whole block (ignore the first run and reset the tree) will
not be necessary and will be removed after fixed
google/yapf#1164.
@whlook
Copy link

whlook commented Feb 27, 2024

Same problem, how to solve it?

similar issue and the way to fix it: #1204

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants