
Support for offline/disconnected operations? #824

Closed
lowell80 opened this issue Sep 6, 2018 · 6 comments
lowell80 commented Sep 6, 2018

Has anyone given thought to using 'pre-commit' on a system without Internet connectivity? I haven't been able to find anything on the topic thus far, so I figured posting a question here was the next logical step.

The use case

I'm working with a client who has content managed in git. Content changes are made both within the client environment and externally by my consulting group. The internal and external repos are kept in sync using the git bundle feature and file copies. It's awkward, but it works.
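For context, the git bundle round trip described above can be sketched as follows. This is a self-contained toy example with throwaway temp directories, not the actual client setup:

```shell
# Self-contained sketch of the bundle round trip (uses throwaway temp dirs).
set -eu
work=$(mktemp -d)

# "External" repo with some history.
git init -q "$work/external"
git -C "$work/external" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"

# Export the entire history into a single bundle file (the transfer artifact).
git -C "$work/external" bundle create "$work/project.bundle" --all

# On the disconnected side, a bundle file is a valid clone/fetch source.
git clone -q "$work/project.bundle" "$work/internal"
git -C "$work/internal" log --oneline   # shows the transferred history
```

For subsequent syncs, the existing internal clone can `git fetch` directly from a newer bundle file instead of re-cloning.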

I'm using pre-commit to run a handful of standard hooks and a few custom ones (in my own public repo) when I'm on the external side, but my goal is to run the same hooks on the internal side as well. The technical challenge is that without internet access, the public hook repos can't be fetched.

I'm only using 2 hook repos (using language: python only) and they don't change that often. So I'm looking at ways to cache the repos offline or prebuild the virtual environment.

Current workaround

I actually came up with a workaround myself and considered writing it up as a blog post or contributing it to the docs somewhere, but the approach is a bit kludgy and tedious. I suspect that a few minor changes to the core code base would make the process more practical (or at least less painful).

Setup:

  • Make two pre-commit config files, one for online and one for offline operation. In theory, the only difference between the two is the repo URL: the offline version points at the local (cached) git bundle file. For this to work, the .pre-commit-config.yaml file is replaced with a symlink (and ignored via .gitignore). So I keep two files in git, .pre-commit-config.yaml-bundle and .pre-commit-config-online, symlink .pre-commit-config.yaml to one of them, and try to keep them in sync.

Initial installation:

  • Run pip download to capture all the requirements for installing pre-commit, as well as any external requirements of the hook repositories.
  • For each hook repo: do a bare clone and export the entire history into a single git bundle file.
  • Transfer all bundles and pip-downloaded files from the external (open) machine to the client-facing (internal) machine.
  • Install pre-commit from the local download folder: pip install --find-links=/tmp/mydownloads --no-index pre-commit
  • Enable hooks on the repo. This requires setting PIP_FIND_LINKS=file:///tmp/mydownloads and PIP_NO_INDEX=1 and running pre-commit install --install-hooks so that all necessary hook repos are set up immediately (because those environment variables may not be set later).

Updates are more or less the same as the installation.

Points of frustration:

  1. Maintaining two .pre-commit-config.yaml files is error prone.
  2. Pre-commit provides no way to pass in pip command-line args.

Alternatives

The above approach may be as good as it gets without modifying some part of pre-commit.
But given that it's open source, here are two alternate approaches I've considered.

  1. Create a repo substitution mapping: a global (or user home-level) config file that would allow the "repo" path to be swapped out at runtime. This could be a 1:1 mapping, or possibly a dynamic rewrite of some kind (like URL-encoding the 'repo' value). This would allow me to substitute the real repo URL with a path to the locally cached bundle file, for example. And of course, this functionality would only be applied on the offline hosts.
  2. Add migration functionality for cached repos: essentially, a mechanism to export the ~/.cache/pre-commit/repoXXXXX directory structure and ship it off to another machine. The import process on the disconnected machine would extract it and create the necessary entry in the db.db tracking system. There is certainly some potential for compatibility issues between machines, but that's already true in general and a fairly acceptable risk.

Another option that just occurred to me as I'm writing this is using git's url.<base>.insteadOf config trick to handle the URL rewrite, essentially implementing option #1 above without any changes to pre-commit. Hmm, I need to test that out, but it seems promising. In any case, I'd like to get feedback from others more knowledgeable than me.
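The url.<base>.insteadOf idea could be tried with something like the following; the bundle path and repo URL here are hypothetical stand-ins:

```shell
# Hypothetical rewrite: any clone/fetch of the public hooks repo is served
# from a local bundle file instead, with no change to .pre-commit-config.yaml.
git config --global \
    url./bundles/pre-commit-hooks.bundle.insteadOf \
    https://github.com/pre-commit/pre-commit-hooks
```

With this in place, git transparently substitutes the bundle path whenever anything (including pre-commit) asks it to clone the original URL.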

I know this is outside of the typical use case, but it is possible to make this work. Any comments, suggestions, or ideas are welcome. I'm willing to code some stuff if need be, but wanted to check in with the community first.

Thanks so much for your time and consideration. Pre-commit has been so helpful to me in the relatively short time I've used it so far!

@asottile
Member

asottile commented Sep 6, 2018

In theory, as long as your $XDG_CACHE_DIR is consistent and your host is consistent, you can zip up the ${XDG_CACHE_DIR:-~/.cache}/pre-commit directory and move it to another machine.

If all of the repositories are set up, pre-commit will not access the network.

At lyft we do this via docker images where we essentially bake by doing (not exactly since we use source-to-image, but you can imagine):

WORKDIR /code
COPY .pre-commit-config.yaml .
RUN : \
    && git init . \
    && git add --all :/ \
    && pre-commit install-hooks

# ...

COPY . .

We then later run this image with docker run --net=none mysrv:precommit pre-commit run --all-files

In general though, the contents of ~/.cache/pre-commit aren't portable across machines unless those machines have identical package setups

Note also that pre-commit is not just an installer of python packages and so a pass through of "pip" options is insufficient.

@asottile
Member

asottile commented Sep 6, 2018

But yeah to answer your question, if the host and remote machines are running at the same path you can essentially do:

export XDG_CACHE_DIR="/constant/path/to/cache"  # or if your user has the same name
pre-commit install-hooks
zip -r cache.zip "$XDG_CACHE_DIR"

# scp / rsync / whatever to remote machine
unzip ...

But anyway, yeah, I don't have any interest in implementing something like this in core, especially given how special-cased and error prone it is.

@lowell80
Author

lowell80 commented Sep 7, 2018

I'll give that approach a try. I don't think I'll hurt too much by moving XDG_CACHE_DIR to a static location. (Not 100% sure what all uses that directory, but there isn't much running on the destination server.) One issue I have is that I'm running on a Mac and the server (disconnected) is on Linux. So even trivial things like case sensitivity and how git does checkout could come into play, let alone binaries, and so on. I may need to fire up a VM.

I think I'll also look further into the git url.<base>.insteadOf option as well. May end up being less data to transfer across into the restricted zone, and while being more work, it seems like a slightly cleaner approach.

I agree with your thoughts on the pip install pass-through options not being sufficient. (I do tend to forget that there's more than just Python supported.) The fact that pip's behavior can be controlled via PIP_* environment variables is probably good enough. I'm thinking that it may make sense to just permanently set these in the user's profile / bashrc scripts.

@asottile
Member

asottile commented Sep 7, 2018

docker might be another choice for a build environment. And yeah the PIP_ environment variables are a good point. pre-commit doesn't touch those so you should be good to use them.

ah yeah url.<base>.insteadOf is probably an idea too.

Even if you don't set / change XDG_CACHE_DIR the important bit would be to make sure that the paths are consistent since virtualenvs contain absolute symlinks and full paths in shebangs. Moving from macos to linux also won't work as the compile targets are pretty different.

If setting XDG_CACHE_DIR is problematic, there's an undocumented (but imo unlikely to change) environment variable that you can use to point pre-commit directly at a directory you control: PRE_COMMIT_HOME.

So some full solution ideas:

redirect git / pip / etc.

  • tell git to clone from /path/to/repo/mirrors/... using url.<base>.insteadOf (can probably set this in /etc/gitconfig)
  • tell pip to install from /path/to/wheelhouse/... with PIP_NO_INDEX + PIP_FIND_LINKS (can probably set these in /etc/environment so interactive sessions get them)
  • The other languages might be more difficult to redirect, but if you're just working with python this should be sufficient
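A sketch of that system-wide redirection, assuming hypothetical mirror and wheelhouse paths (both commands need root):

```shell
# Hypothetical system-wide setup for the redirects described above.

# Rewrite the public hook-repo URL to a local mirror for every user
# (writes to /etc/gitconfig):
git config --system \
    url./path/to/repo/mirrors/pre-commit-hooks.insteadOf \
    https://github.com/pre-commit/pre-commit-hooks

# Point pip at the local wheelhouse for every login session:
cat >> /etc/environment <<'EOF'
PIP_NO_INDEX=1
PIP_FIND_LINKS=file:///path/to/wheelhouse
EOF
```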

prebuild the pre-commit cache on a remote machine and copy that in

  • XDG_CACHE_DIR=... pre-commit install-hooks or PRE_COMMIT_HOME=... pre-commit install-hooks (the latter will be less likely to interfere with other programs)
  • tar / zip / whatever the cache directory
  • untar / unzip / whatever the cache directory on the remote in the same location
  • source and destination have to be mostly equivalent machines, probably use docker or a VM or something to make sure they're the same

use a containerization solution

  • moving docker images around is much more likely to be portable, you can install the hooks into a docker image and run with --net=none in the DMZ

@asottile
Member

closing this for now -- @lowell80 if there's anything additional please comment and I can reopen!

thanks again for the issue 🎉

@boholder

Note for those who come here via search:
If you're going to use this solution (ensure the same OS and the same pre-commit cache path via PRE_COMMIT_HOME or XDG_CACHE_DIR):

prebuild the pre-commit cache on a remote machine and copy that in

Make sure the Python versions running the pre-commit tool on the online and offline computers are the same, down to the patch version.
I've tested that installing with 3.11.0 and running (pre-commit run) on 3.11.5 makes the tool start installing environments all over again.
After aligning the Python versions, this solution worked for me.
I could observe that the contents of the two cache directories were different (different repo hashes).
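A quick way to compare the interpreters on the two machines, per the note above:

```shell
# Run this on both the online and offline machines and compare the output;
# the full major.minor.patch versions should match exactly.
python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])'
```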
