
Support for offline/disconnected operations? #824

Closed
lowell80 opened this issue Sep 6, 2018 · 6 comments
lowell80 commented Sep 6, 2018

Has anyone given thought to using 'pre-commit' on a system without Internet connectivity? I haven't been able to find anything on the topic thus far, so I figured posting a question here was the next logical step.

The use case

I'm working with a client who has content managed in git. Content changes are made both within the client environment and externally by my consulting group. The internal and external repos are kept in sync using the git bundle feature and file copies. It's awkward, but it works.
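For context, the git bundle round trip described above can be sketched as follows. This is a self-contained toy example with throwaway temp directories, not the actual client setup:

```shell
# Self-contained sketch of the bundle round trip (uses throwaway temp dirs).
set -eu
work=$(mktemp -d)

# "External" repo with some history.
git init -q "$work/external"
git -C "$work/external" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"

# Export the entire history into a single bundle file (the transfer artifact).
git -C "$work/external" bundle create "$work/project.bundle" --all

# On the disconnected side, a bundle file is a valid clone/fetch source.
git clone -q "$work/project.bundle" "$work/internal"
git -C "$work/internal" log --oneline   # shows the transferred history
```

For subsequent syncs, the existing internal clone can `git fetch` directly from a newer bundle file instead of re-cloning.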

I'm using pre-commit to run a handful of standard hooks and a few custom ones (in my own public repo) when I'm on the external side, but my goal is to run the same hooks on the internal side as well. The technical challenge is that without internet access, the public hook repos can't be fetched.

I'm only using 2 hook repos (using language: python only) and they don't change that often. So I'm looking at ways to cache the repos offline or prebuild the virtual environment.

Current workaround

I actually came up with a workaround myself and considered writing it up as a blog post or contributing it to the docs somewhere, but the approach is a bit kludgy and tedious. I suspect that a few minor changes to the core code base would make the process more practical (or at least less painful).

Setup:

  • Make two pre-commit config files, one for online and one for offline operation. In theory, the only difference between the two is the repo URL: the offline version points at the local (cached) git bundle file. For this to work, the .pre-commit-config.yaml file is replaced with a symlink (and ignored via .gitignore). So I keep two files in git, .pre-commit-config.yaml-bundle and .pre-commit-config-online, symlink .pre-commit-config.yaml to one of them, and try to keep them in sync.

Initial installation:

  • Run pip download to capture all the requirements for installing pre-commit, as well as any external requirements of the hook repositories.
  • For each hook repo: do a bare clone and export the entire history into a single git bundle file.
  • Transfer all bundles and pip-downloaded files from the external (open) machine to the client-facing (internal) machine.
  • Install pre-commit from the local download folder: pip install --find-links=/tmp/mydownloads --no-index pre-commit
  • Enable hooks on the repo. This requires setting PIP_FIND_LINKS=file:///tmp/mydownloads and PIP_NO_INDEX=1 and running pre-commit install --install-hooks so that all necessary hook repos are set up immediately (because those environment variables may not be set later).

Updates are more or less the same as the installation.

Points of frustration:

  1. Maintaining two .pre-commit-config.yaml files is error prone.
  2. Pre-commit provides no way to pass in pip command-line args.

Alternatives

The above approach may be as good as it gets without modifying some part of pre-commit.
But given that it's open source, here are two alternate approaches I've considered.

  1. Create a repo substitution mapping: a global (or user home-level) config file that would allow the "repo" path to be swapped out at runtime. This could be a 1:1 mapping, or possibly a dynamic rewrite of some kind (like URL-encoding the 'repo' value). This would allow me to substitute the real repo URL with a path to the locally cached bundle file, for example. And of course, this functionality would only be applied on the offline hosts.
  2. Add migration functionality for cached repos: essentially, a mechanism to export the ~/.cache/pre-commit/repoXXXXX directory structure and ship it off to another machine. The import process on the disconnected machine would extract it and create the necessary entry in the db.db tracking system. There is certainly some potential for compatibility issues between machines, but that's already true in general and a fairly acceptable risk.

Another option that just occurred to me as I'm writing this is using git's url.<base>.insteadOf config trick to handle the URL rewrite, essentially implementing option #1 above without any changes to pre-commit. Hmm, I need to test that out, but it seems promising. In any case, I'd like to get feedback from others more knowledgeable than me.
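The url.<base>.insteadOf idea could be tried with something like the following; the bundle path and repo URL here are hypothetical stand-ins:

```shell
# Hypothetical rewrite: any clone/fetch of the public hooks repo is served
# from a local bundle file instead, with no change to .pre-commit-config.yaml.
git config --global \
    url./bundles/pre-commit-hooks.bundle.insteadOf \
    https://github.com/pre-commit/pre-commit-hooks
```

With this in place, git transparently substitutes the bundle path whenever anything (including pre-commit) asks it to clone the original URL.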

I know this is outside of the typical use case, but it is possible to make this work. Any comments, suggestions, or ideas are welcome. I'm willing to code some stuff if need be, but wanted to check in with the community first.

Thanks so much for your time and consideration. Pre-commit has been so helpful to me in the relatively short time I've used it so far!

@asottile
Member

asottile commented Sep 6, 2018

In theory, as long as your $XDG_CACHE_DIR is consistent and your host is consistent, you can zip up the ${XDG_CACHE_DIR:-~/.cache}/pre-commit directory and move it to another machine.

If all of the repositories are set up, pre-commit will not access the network.

At lyft we do this via docker images where we essentially bake by doing (not exactly since we use source-to-image, but you can imagine):

WORKDIR /code
COPY .pre-commit-config.yaml .
RUN : \
    && git init . \
    && git add --all :/ \
    && pre-commit install-hooks

# ...

COPY . .

We then later run this image with docker run --net=none mysrv:precommit pre-commit run --all-files

In general though, the contents of ~/.cache/pre-commit aren't portable across machines unless those machines have identical package setups

Note also that pre-commit is not just an installer of python packages and so a pass through of "pip" options is insufficient.

@asottile
Member

asottile commented Sep 6, 2018

But yeah to answer your question, if the host and remote machines are running at the same path you can essentially do:

export XDG_CACHE_DIR="/constant/path/to/cache"  # or if your user has the same name
pre-commit install-hooks
zip -r cache.zip "$XDG_CACHE_DIR"

# scp / rsync / whatever to remote machine
unzip ...

But anyway, yeah, I don't have any interest in implementing something like this in core, especially given how special-cased and error prone it is.

@lowell80
Author

lowell80 commented Sep 7, 2018

I'll give that approach a try. I don't think I'll hurt too much by moving XDG_CACHE_DIR to a static location. (Not 100% sure what all uses that directory, but there isn't much running on the destination server.) One issue I have is that I'm running on a Mac and the server (disconnected) is on Linux. So even trivial things like case sensitivity and how git does checkout could come into play, let alone binaries, and so on. I may need to fire up a VM.

I think I'll also look further into the git url.<base>.insteadOf option as well. May end up being less data to transfer across into the restricted zone, and while being more work, it seems like a slightly cleaner approach.

I agree with your thoughts on the pip install pass-through options not being sufficient. (I do tend to forget that there's more than just Python supported.) The fact that pip's behavior can be controlled via PIP_* environment variables is probably good enough. I'm thinking that it may make sense to just permanently set these in the user's profile / bashrc scripts.

@asottile
Member

asottile commented Sep 7, 2018

docker might be another choice for a build environment. And yeah the PIP_ environment variables are a good point. pre-commit doesn't touch those so you should be good to use them.

ah yeah url.<base>.insteadOf is probably an idea too.

Even if you don't set / change XDG_CACHE_DIR the important bit would be to make sure that the paths are consistent since virtualenvs contain absolute symlinks and full paths in shebangs. Moving from macos to linux also won't work as the compile targets are pretty different.

If setting XDG_CACHE_DIR is problematic, there's an undocumented (but imo unlikely to change) environment variable that you can use to point pre-commit directly at a directory you control: PRE_COMMIT_HOME.

So some full solution ideas:

redirect git / pip / etc.

  • tell git to clone from /path/to/repo/mirrors/... using url.<base>.insteadOf (can probably set this in /etc/gitconfig)
  • tell pip to install from /path/to/wheelhouse/... with PIP_NO_INDEX + PIP_FIND_LINKS (can probably set these in /etc/environment so interactive sessions get them)
  • The other languages might be more difficult to redirect, but if you're just working with python this should be sufficient
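A sketch of that system-wide redirection, assuming hypothetical mirror and wheelhouse paths (both commands need root):

```shell
# Hypothetical system-wide setup for the redirects described above.

# Rewrite the public hook-repo URL to a local mirror for every user
# (writes to /etc/gitconfig):
git config --system \
    url./path/to/repo/mirrors/pre-commit-hooks.insteadOf \
    https://github.com/pre-commit/pre-commit-hooks

# Point pip at the local wheelhouse for every login session:
cat >> /etc/environment <<'EOF'
PIP_NO_INDEX=1
PIP_FIND_LINKS=file:///path/to/wheelhouse
EOF
```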

prebuild the pre-commit cache on a remote machine and copy that in

  • XDG_CACHE_DIR=... pre-commit install-hooks or PRE_COMMIT_HOME=... pre-commit install-hooks (the latter will be less likely to interfere with other programs)
  • tar / zip / whatever the cache directory
  • untar / unzip / whatever the cache directory on the remote in the same location
  • source and destination have to be mostly equivalent machines, probably use docker or a VM or something to make sure they're the same

use a containerization solution

  • moving docker images around is much more likely to be portable, you can install the hooks into a docker image and run with --net=none in the DMZ

@asottile
Member

closing this for now -- @lowell80 if there's anything additional please comment and I can reopen!

thanks again for the issue 🎉

@boholder

Note for those who come here via search:
If you're going to use this solution (ensure the same OS and the same pre-commit cache path via PRE_COMMIT_HOME or XDG_CACHE_DIR):

prebuild the pre-commit cache on a remote machine and copy that in

Make sure the Python versions running the pre-commit tool on the online and offline computers are the same, down to the patch version.
I've tested that installing with 3.11.0 and running (pre-commit run) on 3.11.5 makes the tool start installing environments all over again.
After aligning the Python versions, this solution worked for me.
I could observe that the contents of the two cache directories were different (different repo hashes).
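A quick way to compare the interpreters on the two machines, per the note above:

```shell
# Run this on both the online and offline machines and compare the output;
# the full major.minor.patch versions should match exactly.
python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
```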
