Fix No such file or directory: /tmp/mdkatex/... #17

Merged
merged 7 commits into from
Jun 24, 2024

Conversation

tovrstra
Contributor

This fixes issue #16. (So far, I have not run into the error again with this fix.)

I also had to correct the requirements file to make pip install -e . work.
See https://pip.pypa.io/en/stable/reference/requirement-specifiers/#examples
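
For readers wondering what was wrong with the requirement: pip's error output quoted further down in this thread points at `Markdown>=3.0<3.3`, where the two version clauses are not separated by a comma. The sketch below is only a rough illustration of that rule, not the real parser. `clauses_look_valid` and its regular expression are hypothetical helpers that check a heavily simplified subset of the specifier grammar.

```python
import re

# Simplified assumption: a clause is one comparison operator followed by a
# version string. The real requirement grammar is richer than this.
_CLAUSE_RE = re.compile(r"^(~=|===|==|!=|<=|>=|<|>)\s*[0-9A-Za-z.*+!_-]+$")


def clauses_look_valid(version_spec: str) -> bool:
    """Return True if every comma-separated clause looks like '<op><version>'."""
    clauses = [clause.strip() for clause in version_spec.split(",")]
    return all(_CLAUSE_RE.match(clause) is not None for clause in clauses)


if __name__ == "__main__":
    print(clauses_look_valid(">=3.0<3.3"))   # missing comma between clauses
    print(clauses_look_valid(">=3.0,<3.3"))  # clauses separated by a comma
```

With the comma in place, the requirement would read `Markdown>=3.0,<3.3; python_version < "3.6"`.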

@mbarkhau
Owner

I'm quite sure this change has the side effect of eliminating any caching between runs.

If the issue is that files are being accessed concurrently, I think the more appropriate change would be to make sure all files are first written with a suffix f"{temp_filename}_tmp_{unique_nonce}" and then do an atomic rename to temp_filename.
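
A minimal sketch of the write-then-rename idea described above, not the code that ended up in the repository. The function name `write_atomic` and the use of `secrets.token_hex` for the nonce are assumptions made for illustration.

```python
import os
import secrets
import tempfile
from pathlib import Path


def write_atomic(target: Path, data: str) -> None:
    """Write to a uniquely named sibling file, then rename it onto target."""
    nonce = secrets.token_hex(8)
    tmp_path = target.with_name(f"{target.name}_tmp_{nonce}")
    try:
        tmp_path.write_text(data, encoding="utf-8")
        # os.replace is atomic when both paths are on the same filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp_path, target)
    finally:
        # After a successful replace the temp file no longer exists;
        # after a failure this removes the partial file.
        tmp_path.unlink(missing_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / "example.html"
        write_atomic(target, "<span>x</span>")
        print(target.read_text(encoding="utf-8"))
```

Because the temporary file sits next to the target, the rename never crosses a filesystem boundary, and the cache file keeps its stable name between runs.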

@tovrstra
Contributor Author

I hadn't realized that caching was used in this way. I agree that this change breaks caching.

Is the goal to enable caching within one process, or also between different invocations?

@mbarkhau
Owner

Yes, the idea is to cache across multiple invocations and separate processes.

@tovrstra
Contributor Author

Sorry for coming back late to this.

To cache safely between multiple runs, one could use two separate directories: (i) one for storing cached results and (ii) one for running _write_tex2html. When _write_tex2html completes successfully, it can copy its result to the cache directory. Typically, such cached results are stored in a subdirectory of ~/.cache/ instead of under /tmp.
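
The two-directory scheme could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `cached_render` and its `render` callback are hypothetical stand-ins for the real `_write_tex2html` call, and the SHA-256 cache key is assumed.

```python
import hashlib
import os
import secrets
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cached_render(tex: str, render: Callable[[str, Path], None], cache_dir: Path) -> str:
    """Return cached HTML for tex, rendering in a private work dir on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(tex.encode("utf-8")).hexdigest()
    cached = cache_dir / f"{key}.html"
    try:
        return cached.read_text(encoding="utf-8")
    except FileNotFoundError:
        pass  # cache miss: fall through and render

    # (ii) Render in a work directory that no other process touches.
    with tempfile.TemporaryDirectory() as work_dir:
        out_path = Path(work_dir) / "out.html"
        render(tex, out_path)
        html = out_path.read_text(encoding="utf-8")
        # (i) Copy into the cache under a unique name, then rename, so that
        # other processes never observe a partially copied file.
        staged = cache_dir / f"{key}.html.{secrets.token_hex(8)}"
        shutil.copyfile(out_path, staged)
        os.replace(staged, cached)
    return html


if __name__ == "__main__":
    def fake_render(tex: str, out_path: Path) -> None:
        # Placeholder for the real KaTeX invocation.
        out_path.write_text(f"<b>{tex}</b>", encoding="utf-8")

    with tempfile.TemporaryDirectory() as demo_dir:
        print(cached_render("x^2", fake_render, Path(demo_dir) / "cache"))
```

In real use, `cache_dir` would be something like a subdirectory of `~/.cache/`; the demo uses a temporary directory so that it leaves nothing behind.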

- make file write operations atomic
- use "cache" instead of "tmp", as that better reflects
  the nature of the files (whereas /tmp/ is only incidentally
  the directory we use)
@mbarkhau
Owner

mbarkhau commented Apr 9, 2024

@tovrstra Is it possible for you to verify that my most recent changes fix the issue?

@tovrstra
Contributor Author

Sorry for the delay. I've repeated the build process, which involves about 1200 Markdown to PDF conversions (feedback with equations for my students). These conversions run in parallel on a machine with 16 cores. With the latest version of this pull request, I get the following error:

...
  File "/.../venv/lib64/python3.12/site-packages/markdown/core.py", line 354, in convert
    self.lines = prep.run(self.lines)
                 ^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/extension.py", line 268, in run
    return list(self._iter_out_lines(lines))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/extension.py", line 257, in _iter_out_lines
    marker_tag = self._make_tag_for_inline(code.inline_text)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/extension.py", line 214, in _make_tag_for_inline
    math_html = md_inline2html(inline_text, self.ext.options)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/extension.py", line 119, in md_inline2html
    return tex2html(inline_text, options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/extension.py", line 83, in tex2html
    result = wrapper.tex2html(tex, options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.../pkgs/markdown-katex/src/markdown_katex/wrapper.py", line 281, in tex2html
    _cleanup_cache_dir()
  File "/.../pkgs/markdown-katex/src/markdown_katex/wrapper.py", line 288, in _cleanup_cache_dir
    mtime = fpath.stat().st_mtime
            ^^^^^^^^^^^^
  File "/usr/lib64/python3.12/pathlib.py", line 840, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/mdkatex/f920f6800f2e6d5be7baf1a9de0359d6c1bf6f4b5702053ec213b12481bb111e.tex_tmp_0b5556b35d01ccd492ff252035b63855cf070838'

I cannot share this specific test case due to privacy concerns. (The traceback is also edited.) To facilitate fixing the issue, a simple example would be better. I'll see if I can cook up something that can be shared.

@mbarkhau
Owner

Perfect, I appreciate the help.

When multiple threads are running, they may both
clean up the cache directory at the same time, and
one thread will provoke FileNotFoundError in the
other thread.
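
One way to make such a cleanup tolerate concurrent deletions is to treat a vanished file as already cleaned up. The sketch below is an assumption about the general approach, not a copy of the actual commit; `cleanup_cache_dir` and `max_age_seconds` are illustrative names.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_cache_dir(cache_dir: Path, max_age_seconds: float) -> int:
    """Delete files older than max_age_seconds and return how many were removed."""
    removed = 0
    now = time.time()
    for fpath in list(cache_dir.iterdir()):
        try:
            mtime = fpath.stat().st_mtime
            if now - mtime > max_age_seconds:
                fpath.unlink()
                removed += 1
        except FileNotFoundError:
            # Another thread or process removed or renamed the file between
            # the directory listing and this call; nothing is left to do.
            continue
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        old_file = Path(demo_dir) / "old.html"
        new_file = Path(demo_dir) / "new.html"
        old_file.write_text("a", encoding="utf-8")
        new_file.write_text("b", encoding="utf-8")
        os.utime(old_file, (0, 0))  # pretend the file is very old
        print(cleanup_cache_dir(Path(demo_dir), max_age_seconds=3600))
```

The same `try`/`except` covers both `stat()` and `unlink()`, since either call can race with another worker.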
@mbarkhau
Owner

With my most recent commit, I'm reasonably sure that your issue is fixed. A test would of course be nice regardless, but I'll take your word for it if you say everything is working now.

@tovrstra
Contributor Author

Thanks for the additional commits! These do indeed fix the issue as far as my testing goes.

I also created a Python script that reproduces the bug (before your latest commits). To see the error message, you need to remove all files under /tmp/mdkatex before running the script:

#!/usr/bin/env python

import concurrent.futures
import random

import markdown


TEXT_SNIPPETS = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
""".split()


EQ_SNIPPETS = r"""
x y \sin(\theta) \hat{H} \vec{B} \int_0^\infty f(x)\,dx
\{i^2\}_{i=1}^N e^{2it\pi\omega} \tan(x)
""".split()


def generate_md() -> str:
    parts = []
    for _ in range(100):
        parts.append(random.choice(TEXT_SNIPPETS))
        parts.append(f"$`{random.choice(EQ_SNIPPETS)}`$")
    return " ".join(parts)


def convert(md: str) -> str:
    md_ctx = markdown.Markdown(
        extensions=[
            "fenced_code",
            "markdown_katex",
            "tables",
        ],
        extension_configs={
            "markdown_katex": {"insert_fonts_css": True}
        },
    )
    return md_ctx.convert(md)


def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=30) as executor:
        mds = [generate_md() for _ in range(30)]
        for html in executor.map(convert, mds):
            print(html)
            print()


if __name__ == '__main__':
    main()

It may be possible to simplify the script further and still get the error.

@tovrstra
Contributor Author

@mbarkhau Would it be possible to make a release with this fix? With the current version of pip (24.1), markdown-katex cannot be installed, due to issues fixed in this PR. I get the following error when installing markdown-katex with the latest pip:

  WARNING: Ignoring version 202112.1034 of markdown-katex since it has invalid metadata:
  Requested markdown-katex from https://files.pythonhosted.org/packages/a2/18/f54ce298ddda160e9443fd68e47c2a677ea6320ddbe08e10cd40d54c2df4/markdown_katex-202112.1034-py2.py3-none-any.whl (from stepup-reprep==1.2.1->-r requirements.in (line 8)) has invalid metadata: Expected matching RIGHT_PARENTHESIS for LEFT_PARENTHESIS, after version specifier
      Markdown (>=3.0<3.3) ; python_version < "3.6"
               ~~~~~~^
  Please use pip<24.1 if you need to use this version.
  WARNING: Ignoring version 202109.1033 of markdown-katex since it has invalid metadata:
  Requested markdown-katex from https://files.pythonhosted.org/packages/1c/d8/3a38317d1ad5b3bd1428167e28a9dd353c1fffc29408500ec27f5061cf16/markdown_katex-202109.1033-py2.py3-none-any.whl (from stepup-reprep==1.2.1->-r requirements.in (line 8)) has invalid metadata: Expected matching RIGHT_PARENTHESIS for LEFT_PARENTHESIS, after version specifier
      Markdown (>=3.0<3.3) ; python_version < "3.6"
               ~~~~~~^
  Please use pip<24.1 if you need to use this version.
    error: subprocess-exited-with-error
    
    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [3 lines of output]
        error in markdown-katex setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected end or semicolon (after version specifier)
            Markdown>=3.0<3.3;python_version<"3.6"
                    ~~~~~^
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.

Thanks!

@mbarkhau mbarkhau merged commit 3365450 into mbarkhau:master Jun 24, 2024
2 checks passed
@mbarkhau
Owner

I just pushed v202406.1035

@tovrstra
Contributor Author

Thank you so much! I've tested it once more with the pip-installed version. It all works.
