(Regression) pandoc leaves temporary directory behind #9460

Closed
sboukortt opened this issue Feb 14, 2024 · 13 comments
@sboukortt

sboukortt commented Feb 14, 2024

As mentioned in #2288 (comment), pandoc 3.1.11.1 on Windows fails to clean up the tex2pdf temporary directory it creates when converting a document that includes PDF graphics to PDF. (I haven’t tested whether other image formats also trigger the bug.) I have offered to bisect the issue and am in the process of doing so. I can already confirm that 3.1.2 did not exhibit the problem: it cleaned up all temporary files, as expected.

I’ll update this once I find the revision that introduced the regression.

I also plan to test whether the issue still reproduces on HEAD, and if it doesn’t, I’ll likely “reverse-bisect” the revision that fixed it, if only for my own curiosity.

sboukortt added the bug label Feb 14, 2024
@sboukortt
Author

sboukortt commented Feb 14, 2024

It seems that HEAD (e43ab9e) leaves more files behind: input.tex, input.log, and input.pdf, in addition to the input.aux that 3.1.11.1 also left.

@jgm
Owner

jgm commented Feb 14, 2024

Can anybody else running pandoc on Windows reproduce this? It seems to be Windows-specific.

@sboukortt
Author

sboukortt commented Feb 14, 2024

Interestingly, my attempt at creating a minimal reproducible example is failing. Not sure what it is about my not-minimal example that is causing the issue to occur… But maybe finding the culprit revision will shed some light on it.

@sboukortt
Author

> It seems that HEAD (e43ab9e) leaves more files behind: input.tex, input.log, and input.pdf, in addition to the input.aux that 3.1.11.1 also left.

A bisection between 3.1.11.1 and HEAD suggests that this increase was introduced by 2dd98b9, which doesn’t seem so informative at first glance.

(I did that bisection first because I expected it to be faster.)

My next step is going to be the bisection between 3.1.2 and 3.1.11.1 (likely tomorrow as it’s getting a bit late here).

@jgm
Owner

jgm commented Feb 14, 2024

One thing to check is how deterministic your results are. Are the same files always left behind? Are files always left behind with that version, or just sometimes? Does 3.1.2 never leave them behind?
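
One way to make that check repeatable is to snapshot the temporary directory before and after each run and compare. The following is only a sketch: the helper names are hypothetical, and the `tex2pdf` name prefix is an assumption based on the "tex2pdf temporary directory" mentioned above.

```haskell
import Data.List (isPrefixOf, sort, (\\))
import System.Directory (listDirectory)

-- | Sorted names of the entries currently in a directory.
snapshot :: FilePath -> IO [FilePath]
snapshot dir = sort <$> listDirectory dir

-- | Entries that appeared between two snapshots and whose names start
-- with the given prefix (assumed here to be "tex2pdf").
newLeftovers :: String -> [FilePath] -> [FilePath] -> [FilePath]
newLeftovers prefix before after =
  filter (prefix `isPrefixOf`) (after \\ before)
```

Taking a `snapshot` of the system temp directory, running pandoc, taking a second `snapshot`, and passing both to `newLeftovers "tex2pdf"` over several runs would show whether the same directories are left behind every time.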

@sboukortt
Author

It seems quite deterministic as far as I can tell. The first bisection did provide a clue after all, at least as far as creating a repro case is concerned: adding a section name does the trick, and it actually makes the image unnecessary. Even without the image, pandoc still leaves input.aux behind. (But the image makes for another one, just to be sure.)

Here is therefore a repro case:

tmpandoc.zip

$ cd tmpandoc
$ pandoc document.md -o document.pdf

@jgm
Owner

jgm commented Feb 14, 2024

Also: what --pdf-engine are you using (or just the default)? OK: the default.

@jgm
Owner

jgm commented Feb 14, 2024

My hunch is that this may have to do with lazy IO; I note that there are a couple of readFileLazy calls in the runTeXProgram code. I would expect that withSystemTempDir would still clean up, but maybe there is a bug in the Windows implementation? We could try replacing all the lazy IO with strict IO in this context. I remember doing that before for another issue on Windows.
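
To illustrate the pattern in isolation, here is a minimal sketch using only `base`, `bytestring`, `directory`, and `filepath`. It is not pandoc's code: `withScratchDir` and `roundTripStrict` are hypothetical names, and `withScratchDir` is a simplified stand-in for a scoped temporary directory. The idea is that a strict read finishes and closes its handle before the cleanup runs, whereas a lazy read can keep the handle open until the contents are consumed, which on Windows may block deletion.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import System.Directory
  ( createDirectory, doesDirectoryExist, getTemporaryDirectory
  , removeDirectoryRecursive )
import System.FilePath ((</>))

-- | Hypothetical stand-in for a scoped temp directory: create it, run the
-- action, and always attempt to delete it afterwards.
withScratchDir :: String -> (FilePath -> IO a) -> IO a
withScratchDir name action = do
  tmp <- getTemporaryDirectory
  let dir = tmp </> name
  bracket (createDirectory dir >> pure dir) removeDirectoryRecursive action

-- | Write a file into the scratch directory and read it back strictly.
-- 'BS.readFile' closes the handle before returning, so no file in the
-- directory is still open when the cleanup runs. Returns the bytes and
-- whether the directory still exists after the scope has ended.
roundTripStrict :: String -> BS.ByteString -> IO (BL.ByteString, Bool)
roundTripStrict name payload = do
  (bytes, dir) <- withScratchDir name $ \dir -> do
    let fp = dir </> "input.aux"
    BS.writeFile fp payload
    strictBytes <- BS.readFile fp
    pure (BL.fromStrict strictBytes, dir)
  stillThere <- doesDirectoryExist dir
  pure (bytes, stillThere)
```

Swapping `BS.readFile` for `BL.readFile` in this sketch would return a lazy value whose handle may still be open when `removeDirectoryRecursive` runs; whether that actually prevents cleanup depends on the platform.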

@jgm
Owner

jgm commented Feb 15, 2024

If you want to try it, you could, in Text.Pandoc.PDF.hs, remove the readFileLazy from the imports from Text.Pandoc.Class and add this function definition:

readFileLazy :: (PandocMonad m, MonadIO m) => FilePath -> m BL.ByteString
readFileLazy fp = BL.fromStrict <$> readFileStrict fp

Then recompile and see if the problem is still there.
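
For anyone who wants to see the effect of that shim outside pandoc, a plain-`IO` analogue might look like the sketch below. The names `readFileLazyStrictly` and `readThenDelete` are hypothetical, and `BS.readFile` stands in for pandoc's `readFileStrict`. The result type stays lazy, so callers do not change, but the file is read in full and its handle closed before the function returns.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import System.Directory (removeFile)

-- | Plain-IO analogue of the shim: lazy result type, strict read underneath.
readFileLazyStrictly :: FilePath -> IO BL.ByteString
readFileLazyStrictly fp = BL.fromStrict <$> BS.readFile fp

-- | Read a file, delete it immediately, and only then use the contents.
-- This is safe because the contents are already fully in memory and the
-- handle is closed by the time 'removeFile' runs.
readThenDelete :: FilePath -> IO Int
readThenDelete fp = do
  contents <- readFileLazyStrictly fp
  removeFile fp
  pure (fromIntegral (BL.length contents))
```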

@jgm
Owner

jgm commented Feb 15, 2024

Commit bd8e317 adds a readFileLazy (Dec. 15, 2023; pandoc 3.1.11).

jgm added a commit that referenced this issue Feb 15, 2024
My hunch is that this is causing an improperly cleaned-up temp directory on Windows (#9460), but this will have to be confirmed.
@jgm
Owner

jgm commented Feb 15, 2024

OK, try with current HEAD, which incorporates the idea I had above.

@sboukortt
Author

That does seem to fix it, thanks!

git bisect found 3c17869 as the proximate cause, but it seems plausible that it only surfaced the issue, not caused it.

@jgm
Owner

jgm commented Feb 15, 2024

Great, I'm glad to have found this. I remember running into a similar issue with lazy IO on Windows over a decade ago.

jgm closed this as completed Feb 15, 2024