How to reuse caches between two jobs? #252

Closed
hmemcpy opened this issue Feb 13, 2020 · 5 comments

@hmemcpy hmemcpy commented Feb 13, 2020

(sorry for raising it as an issue, this is more of a question)

I have two nearly-identical documents with slightly different settings (one with 11pt font, and the other with 10pt), say book-reader.tex and book-print.tex. They import a code snippet with minted using \inputminted.

I want to reuse the caches produced from the first book in the subsequent books.

I've configured the cachedir to output to the same directory, but it seems that this produces two .pygtex files:

$ ls
FCA5498F9EAF9E235804E47AA988230D31F590DAF3C4999269D16864F0C9105B.pygtex
FCA5498F9EAF9E235804E47AA988230D796938E665113EC976EFD1EDB1C66E95.pygtex
...

Their contents are identical, but the file names are different (the first half of the hash is the same, but the second half differs).

After reading the source, I understand that the jobname is taken into account when computing the filename hash. So my question is: is there a way to configure the filename so that it hashes just the content, regardless of the jobname?

N.B. I tried using \finalizecache but it doesn't work for my situation, because some books use a macro that imports snippets in a different language instead.

@muzimuzhi muzimuzhi commented Feb 14, 2020

minted stores the value of \jobname in \minted@jobname with some special characters substituted, see

minted/source/minted.dtx

Lines 1654 to 1662 in 5d72859

% \begin{macro}{\minted@jobname}
% At various points, temporary files and directories will need to be named after the main |.tex| file. The typical way to do this is to use |\jobname|. However, if the file name contains spaces, then |\jobname| will contain the name wrapped in quotes (older versions of MiKTeX replace spaces with asterisks instead, and \texttt{XeTeX} apparently \href{http://tex.stackexchange.com/a/93829/10742}{allows double quotes within file names}, in which case names are wrapped in single quotes). While that is perfectly fine for working with \LaTeX\ internally, it causes problems with |\write18|, since quotes will end up in unwanted locations in shell commands. It would be possible to strip the wrapping quotation marks when they are present, and maintain any spaces in the file name. But it is simplest to create a ``sanitized'' version of |\jobname| in which spaces and asterisks are replaced by underscores, and double quotes are stripped. Single quotes are also replaced, since they can cause quoted string errors, or become double quotes in the process of being passed to the system through |\write18|.
% \begin{macrocode}
\StrSubstitute{\jobname}{ }{_}[\minted@jobname]
\StrSubstitute{\minted@jobname}{*}{_}[\minted@jobname]
\StrSubstitute{\minted@jobname}{"}{}[\minted@jobname]
\StrSubstitute{\minted@jobname}{'}{_}[\minted@jobname]
% \end{macrocode}
% \end{macro}

Caches can be reused as long as \minted@jobname is defined identically in the different root tex files, provided the minted config, the Pygments config, and the included code files are also identical.

For example, with the following directory structure,

.
├── code.py
├── main1.tex
└── main2.tex

and the tex file contents

% main1.tex
\documentclass{article}
\usepackage[cachedir=cache]{minted}

\begin{document}
\inputminted{python}{code.py}
\end{document}

% main2.tex
\documentclass{article}
\usepackage[cachedir=cache]{minted}

\makeatletter
\def\minted@jobname{main1}
\makeatother

\begin{document}
\inputminted{python}{code.py}
\end{document}

By changing the \minted@jobname to main1 in file main2.tex, the caches are reused.

@hmemcpy hmemcpy commented Feb 14, 2020

This is great, thank you! I've done something similar, by setting the -jobname argument from the command line to the same name. The book I'm working on loads a lot of snippets from external files, so it can be translated to some other languages.

One last question before I close the issue - is there any reason not to commit the caches to git? The books are built on CI, and while I managed to shave off a few minutes by reusing the cache, the initial build time is still significant.

Thanks!

@muzimuzhi muzimuzhi commented Feb 14, 2020

> I've done something similar, by setting the -jobname argument from the command line to the same name.

Oh this is even nicer.

> is there any reason not to commit the caches to git?

The minted caches are generated files. Some may argue that generated files, unless they are used for tests, should not be committed to git. I am open to this. In your case, apart from enlarging the git repo, you would have the extra work of updating the corresponding caches each time a code snippet changes.

Maybe the minted options finalizecache and frozencache are useful in your case: run with finalizecache=true for the first time, then use frozencache for all the following runs.
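
A minimal sketch of that two-step workflow, assuming minted 2.x (the version current when this thread was written) and reusing the placeholder names cache and code.py from the example earlier in the thread. A frozen cache is used without running Pygments, so it would need to be finalized again whenever a snippet changes.

```latex
% book.tex: illustrative file name, not taken from the thread
\documentclass{article}

% Step 1: compile once with -shell-escape so that Pygments runs
% and the cache in ./cache is finalized.
\usepackage[cachedir=cache, finalizecache=true]{minted}

% Step 2: for all later runs, replace the line above with this one.
% With a frozen cache, -shell-escape and Pygments are not needed.
% \usepackage[cachedir=cache, frozencache=true]{minted}

\begin{document}
\inputminted{python}{code.py}
\end{document}
```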

> The books are built on CI, and while I managed to shave off a few minutes by reusing the cache, the initial build time is still significant.

I observed that build #395 runs xelatex 5 times and takes 7+ minutes, and build #400 runs xelatex 9 times and takes 13+ minutes, so the average time per xelatex run is not that much for generating a book of 300+ pages.

By the way, replacing -pdf with -pdfxe in your latexmk config might save some time: with -pdfxe, latexmk generates the PDF file only for the last run and produces an intermediate xdv file for the other runs, which saves the time of the xdv-to-pdf conversion.

Generating a new format file to speed up LaTeX is a more LaTeX-specific but less commonly used technique; see https://tex.stackexchange.com/q/49388 and https://tex.stackexchange.com/q/51972 for help.

@hmemcpy hmemcpy commented Feb 14, 2020

Ooh, those are very nice tips, thank you! I completely forgot that Travis CI etc. support caching, so I've enabled caching on the server; hopefully this will help.

The reason the latest build takes (currently) 13+ minutes is that I've changed it to build all editions (it currently builds 6 PDFs in total :))
I've also done something nasty: to prevent minted from cleaning the caches of unused files between runs, I've overridden \minted@cleancache to do nothing.
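
Such an override could look roughly like the following. This is a sketch, not taken from the book's sources: it assumes minted 2.x, where \minted@cleancache is an internal macro that takes no arguments, and internal macros can change between releases. The cachedir name cache is the placeholder used earlier in the thread.

```latex
\usepackage[cachedir=cache]{minted}

\makeatletter
% Assumption: minted 2.x defines \minted@cleancache without arguments.
% \renewcommand raises an error if the macro does not exist, so the
% override cannot silently stop working after a minted upgrade.
\renewcommand{\minted@cleancache}{}
\makeatother
```

With the cleanup disabled, cache files for snippets that are no longer used stay in the directory until they are deleted by hand.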

Anyway, I am closing this issue. Thank you very much for your help! I will play with the settings you suggested. 🙏

@hmemcpy hmemcpy closed this Feb 14, 2020

@hmemcpy hmemcpy commented Feb 14, 2020

Sorry, last bump! I just tried it with -pdfxe, and it now takes HALF THE TIME on the CI! 6 minutes vs 13! And a large chunk of it is setting up TeXLive with Nix!

Thank you again so much :)
