Change default cache directory? Remove default cache directory from .gitignore? #412

Open
vincerubinetti opened this issue Feb 8, 2021 · 6 comments

@vincerubinetti
Collaborator

vincerubinetti commented Feb 8, 2021

For as long as I've used Manubot, I've known it has a cache of some sort, but I don't know how it works under the hood. Does it just keep things in memory, or in some OS-level cache folder?

Anyway, for very large manuscripts or other large citation jobs, it would be useful if Manubot could check for and create something like a .manubot-cache file. I think it'd be most convenient if that file lived in the current working directory, or wherever Manubot was called from. That way, the file could be tracked with git, and in scenarios like GitHub Actions, Manubot could read it to skip regenerating citations.
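A minimal sketch of the proposed behavior, assuming a JSON file named .manubot-cache in the working directory that maps citekeys to CSL items. The file name comes from the proposal above; `get_citations`, `load_cache`, `save_cache`, and the `generate` callback are hypothetical and are not part of Manubot's API.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical file name from the proposal; Manubot does not necessarily
# read or write a file with this name.
CACHE_FILENAME = ".manubot-cache"


def load_cache(directory: Path) -> Dict[str, dict]:
    """Return cached CSL items keyed by citekey, or an empty dict if no cache exists."""
    path = directory / CACHE_FILENAME
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_cache(directory: Path, cache: Dict[str, dict]) -> None:
    """Write the cache as sorted, indented JSON so git diffs stay readable."""
    path = directory / CACHE_FILENAME
    path.write_text(json.dumps(cache, indent=2, sort_keys=True) + "\n", encoding="utf-8")


def get_citations(
    citekeys: Iterable[str], directory: Path, generate: Callable[[str], dict]
) -> Dict[str, dict]:
    """Look up each citekey in the cache and call `generate` only on a miss."""
    cache = load_cache(directory)
    results = {}
    for key in citekeys:
        if key not in cache:
            cache[key] = generate(key)  # stand-in for an expensive network lookup
        results[key] = cache[key]
    save_cache(directory, cache)
    return results
```

On a second run (for example, a CI job that checks out the committed cache file), only citekeys that are new since the last commit trigger `generate`.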

Lab Website Template basically implemented its own version of this method of caching.

vincerubinetti changed the title from "Cache citations a file?" to "Cache citations in a file?" on Feb 8, 2021
@vincerubinetti
Collaborator Author

Never mind, I just realized the problem is probably that the default cache directory is /output, which is in .gitignore by default.

So perhaps I should change this issue to the question: should we change the default cache directory, and/or should we include the cache file in git by default?

Also, this issue should probably be transferred to the manubot repo.

@agitter
Member

agitter commented Feb 8, 2021

Is manubot/manubot#258 similar to what you're looking for?

The pull request linked in that issue added a new type of caching.

@vincerubinetti
Collaborator Author

Is there documentation for manubot-bibliography-cache yet? I'm having trouble understanding how it's different from --cache-directory and the files in there.

Either way, I'm not as picky about the cache format as that person was. I honestly didn't know about the --cache-directory option, which is why I made this issue. That should solve the problem.

vincerubinetti changed the title from "Cache citations in a file?" to "Change default cache directory? Remove default cache directory from .gitignore?" on Feb 8, 2021
@agitter
Member

agitter commented Feb 8, 2021

There is some documentation of manubot-bibliography-cache here: https://manubot.github.io/manubot/reference/manubot/pandoc/cite_filter/

@agitter
Member

agitter commented Feb 8, 2021

To follow up: for large manuscripts, I believe the best way to cache references is to copy the output references.json to a manual references file in the content directory (as in greenelab/covid19-review#847). Manubot will look for files matching the content/manual-references*.* pattern.
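The copy step could be scripted roughly as below. This is a sketch that assumes a rootstock-style layout (built references at output/references.json, sources in content/); the destination name manual-references-cache.json and both function names are hypothetical choices, not something Manubot prescribes.

```python
import shutil
from pathlib import Path
from typing import List


def freeze_references(
    output_refs: Path, content_dir: Path, name: str = "manual-references-cache.json"
) -> Path:
    """Copy the built references file into the content directory.

    The default name still matches manual-references*.* but avoids
    overwriting a hand-written manual-references.json.
    """
    destination = content_dir / name
    shutil.copyfile(output_refs, destination)
    return destination


def find_manual_references(content_dir: Path) -> List[Path]:
    """List files matching the same manual-references*.* pattern Manubot searches for."""
    return sorted(content_dir.glob("manual-references*.*"))
```

For example, `freeze_references(Path("output/references.json"), Path("content"))` after a successful build, followed by committing the new file.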

manubot-bibliography-cache is more appropriate for people using the Manubot Pandoc filter directly, i.e., outside of a rootstock-derived manuscript repository.

@dhimmel
Member

dhimmel commented Feb 15, 2021

The manubot process --cache-directory option specifies a cache directory, which is used to write binary cache files. We set this to a directory that is ignored by git because the files are binary and are not intended to be cached forever. The two things that use --cache-directory (as far as I remember) are the requests cache (a sqlite db) and OpenTimestamps. These caches are meant to speed up subsequent runs on the same system or CI provider.
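To make the distinction concrete, here is a toy analogue of the kind of file that lives in the cache directory: a SQLite database mapping URLs to raw response bytes. It uses only Python's standard-library sqlite3 module and is not how Manubot or the requests cache actually store responses; `ResponseCache`, `responses.sqlite`, and the `fetch` callback are hypothetical names.

```python
import sqlite3
from pathlib import Path
from typing import Callable


class ResponseCache:
    """Toy URL -> response-body cache stored in a SQLite file."""

    def __init__(self, cache_directory: Path):
        cache_directory.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(str(cache_directory / "responses.sqlite"))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (url TEXT PRIMARY KEY, body BLOB)"
        )

    def get(self, url: str, fetch: Callable[[str], bytes]) -> bytes:
        """Return the cached body for `url`, calling `fetch` only on a miss."""
        row = self.conn.execute(
            "SELECT body FROM responses WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return row[0]
        body = fetch(url)
        with self.conn:  # commits the insert when the block exits cleanly
            self.conn.execute(
                "INSERT INTO responses (url, body) VALUES (?, ?)", (url, body)
            )
        return body

    def close(self) -> None:
        self.conn.close()
```

The database is an opaque binary file that only pays off when reused on the same machine or CI cache, which is why keeping it out of git is a reasonable default.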

manubot-bibliography-cache is a cache of CSL JSON/YAML formatted references that is intended to be tracked with git. It's a newer feature based on a user suggestion in manubot/manubot#258. Currently, rootstock does not use manubot-bibliography-cache, but perhaps it should. The main benefit is that the entire cache of references is visible to and editable by the user. It combines manual references and cached references into a single file.
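A rough sketch of what combining manual and cached references into a single file could look like, assuming both are lists of CSL JSON items with an `id` field. The precedence rule (manual entries override cached ones) and the function names are assumptions for illustration, not a description of how manubot-bibliography-cache resolves duplicates.

```python
import json
from typing import Iterable, List


def combine_references(manual: Iterable[dict], cached: Iterable[dict]) -> List[dict]:
    """Merge CSL items by "id"; a manual item replaces a cached one with the same id."""
    combined = {item["id"]: item for item in cached}
    combined.update({item["id"]: item for item in manual})
    # Sort by id so the written file is stable between runs.
    return [combined[key] for key in sorted(combined)]


def dump_bibliography(items: List[dict]) -> str:
    """Serialize to indented CSL JSON, which diffs cleanly under git."""
    return json.dumps(items, indent=2, ensure_ascii=False) + "\n"
```

Keeping one sorted, human-readable file is what makes the whole reference set reviewable and hand-editable in a pull request.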

It's not entirely clear to me how we would enable manubot-bibliography-cache for rootstock. But I think if we found a nice implementation, it would simplify manual references and reference caching.
