Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup other directories in .shiv #88

Closed
rsyring opened this issue Dec 16, 2018 · 11 comments
Closed

Cleanup other directories in .shiv #88

rsyring opened this issue Dec 16, 2018 · 11 comments

Comments

@rsyring
Copy link

rsyring commented Dec 16, 2018

I'd like to be able to have a shiv option that would instruct the bootstrap code to clean out old directories in .shiv for the current app. Maybe:

shiv --delete-other-builds

With that option, when the first bootstrap of a build finished, the boostrap code would essentially run:

rm -rf ~/.shiv/the_app_name_*

Excluding the folder that contains the files for the currently running app.

@lorencarvalho
Copy link
Contributor

hey @rsyring!

The lack of robustness in how shiv handles it's cache directory is something I think about regularly. I think there's a lot of room for improvement but it's a dangerous area because I think that folks (myself included) make a lot of assumptions about environments.

I like your idea a lot, how would you feel about this being implemented as an environment variable as opposed to a build-time argument? The reason I prefer environment variable is that it makes it opt-in on the runtime rather than locked in at build time.

Let me know what you think!

@rsyring
Copy link
Author

rsyring commented Dec 18, 2018

@lorencarvalho thanks for the quick reply. I would prefer that to nothing. However, I could see that be pretty annoying to communicate to users who might be using a binary, especially if they are non-technical. That is, in some scenarios, mine included, I actually want to lock it in at build time.

Maybe allow both:

shiv --delete-other-builds=always
shiv --delete-other-builds=never
shiv --delete-other-builds=environ

I think the first two options are obvious. The last one would be what you proposed and dependent at run time on some pre-determined environment var: SHIV_CLEAN_BUILDS=1. The presence of the variable triggers the cleaning functionality regardless of value.

Feedback welcome. Thanks again.

@lorencarvalho
Copy link
Contributor

hey @rsyring,

Sorry for the delay in getting back to you -- been thinking on how best to solve this. How would you feel about a generic --preamble option? This is something that pex has that is easy to implement and allows the pyz creator to supply some python code that would get executed at invocation time, effectively a hook. You could use this hook to do the kind of cleaning up that you are interested in. Here's an example of a proof-of-concept I have locally:

(shiv) linux ~/src/shiv git:power-tools ± $ echo "print('\n hello \n world \n')" > preamble.py
(shiv) linux ~/src/shiv git:power-tools ± $ shiv requests -o r --preamble preamble.py
Collecting requests
  Using cached https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl
Collecting idna<2.9,>=2.5 (from requests)
  Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests)
  Using cached https://files.pythonhosted.org/packages/60/75/f692a584e85b7eaba0e03827b3d51f45f571c2e793dd731e598828d380aa/certifi-2019.3.9-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting urllib3<1.25,>=1.21.1 (from requests)
  Using cached https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl
Installing collected packages: idna, certifi, chardet, urllib3, requests
Successfully installed certifi-2019.3.9 chardet-3.0.4 idna-2.8 requests-2.21.0 urllib3-1.24.1
(shiv) linux ~/src/shiv git:power-tools ± $ ./r

 hello
 world

Python 3.7.0 (default, Aug 11 2018, 03:49:56)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import requests
>>> requests
<module 'requests' from '/home/lcarvalh/.shiv/r_45497010-ac3e-4b04-9a15-ea607093fba1/site-packages/requests/__init__.py'>

Currently the point at which this preamble is executed doesn't have access to the packed environment, but that's just a matter of improving the implementation. Let me know what you think!

@rsyring
Copy link
Author

rsyring commented Apr 1, 2019

@lorencarvalho Thanks for the write up. I'd be happy to use the preamble to do the environment cleanup. Although, I still think there is a good argument for having this particular feature built-in since it's how shiv works that creates the "mess" shiv should make it easy to also clean things up.

Although, I'm sympathetic to the concern you have about making environment assumptions, I'd like to point out:

  • You already make some assumptions about where to put the shiv expanded files and what to name them. You are just building on these assumptions when you remove them.
  • No one has to use the flag. If there is a special environment need, write a preamble for the customized cleanup routine.
  • You can extend the "No one has to use it" idea further by using an environment variable to always override the cleanup activity: SHIV_NO_CLEAN_BUILDS=1 That way, the end user still has final control of that feature if they need it.
  • Mark this feature experimental initially and figure out if it does actually bring a bunch of edge cases that make it a problem to maintain.

Thanks for your reply, not trying to be difficult. If you decide to go the preamble route, maybe we could give an example preamble which does cleanup.

@diefans
Copy link

diefans commented Apr 22, 2020

Is there any progress with this issue? Saw that shiv was changing cache signatures, which makes it harder to follow up...

I am using following for cleanup:

(cd ~/.shiv && for pyz in $(ls -1d * | grep "_" | sed 's/_[^\_]*//' | uniq); do for oldpyz in $(ls -1drt ${pyz}_* | head -n -1); do rm -rf $oldpyz; done; done)

@lorencarvalho
Copy link
Contributor

hey @diefans, yes we did change the dir names recently (from being a uuid to a sha256 of the pyz's contents), sorry about that!

I don't think we'll be implementing cleanup of the ~/.shiv directory in shiv itself, but I would certainly be open to an additional entry point (similar to the shiv-info utility that the shiv package provides, maybe a shiv-cleanup entry point would be welcome?)

@rsyring
Copy link
Author

rsyring commented Apr 27, 2020

FWIW, I still feel like if shiv is creating the problem, shiv is responsible for providing a good solution for doing the cleanup. And when I say shiv, I really mean the binaries that shiv is building.

Keep in mind that we are creating the binaries often b/c we want to ship them to systems where we don't have full python dev environments. So some kind of shiv-cleanup command doesn't help me when I'm shipping the binary to a system that doesn't have shiv installed on it.

Was there any particular objection to something like:

Maybe allow both:

shiv --delete-other-builds=always
shiv --delete-other-builds=never
shiv --delete-other-builds=environ

I think the first two options are obvious. The last one would be what you proposed and dependent at run time on some pre-determined environment var: SHIV_CLEAN_BUILDS=1. The presence of the variable triggers the cleaning functionality regardless of value.

Default to "never" and mark this feature as experimental and see how it pans out? If it works well, maybe you set the default to "always" eventually.

FWIW, I created a script to do the cleanup:

import json
import logging
import pathlib
import shutil
import os

from shiv.bootstrap import cache_path, current_zipfile
from shiv.bootstrap.environment import Environment

log = logging.getLogger(__name__)


def simulate_bootstrap():
    # get a handle of the currently executing zip file
    archive = current_zipfile()

    # create an environment object (a combination of env vars and json metadata)
    env = Environment.from_json(archive.read("environment.json").decode())

    # Store information into the environment for "shiv aware" applications
    os.environ['_SHIV_INFO_JSON'] = '''{{
        "cache_path": "{cache_dpath}",
        "environ": {env_json_str}
    }}'''.strip().format(
        cache_dpath=cache_path(archive, env.root, env.build_id),
        env_json_str=env.to_json()
    )


def cleanup_shivs():
    # Simulate what shiv.bootstrap:bootstrap() could do to help put pertient information in the
    # environment to help "shiv aware" applications save time by not having to reprocess what shiv
    # has already done.  Not needed if shiv does the cleanup.
    simulate_bootstrap()

    shiv_info_json = os.environ.get('_SHIV_INFO_JSON')
    if not shiv_info_json:
        log.debug('Not running from shiv bootstrap')
        return

    shiv_info = json.loads(shiv_info_json)
    env = shiv_info['environ']
    cache_dpath = pathlib.Path(shiv_info['cache_path'])

    build_id = env['build_id']

    dname_prefix = cache_dpath.name[0:-64]
    dname_length = len(cache_dpath.name)
    cache_root_dpath = cache_dpath.parent

    for dpath in cache_root_dpath.iterdir():
        dir_name = dpath.name

        if build_id in dir_name \
                or len(dir_name) != dname_length \
                or dir_name[0:-64] != dname_prefix \
                or not dpath.is_dir():
            continue

        log.debug(f'Deleting {dpath} and lock file')
        shutil.rmtree(dpath)

        lock_fname = f'.{dpath.stem}_lock'
        pathlib.Path(cache_root_dpath, lock_fname).unlink(missing_ok=True)

If you are interested, I can do some legwork to get this tested, documented, and put up a PR.

Thanks.

@lorencarvalho
Copy link
Contributor

Hey @rsyring,
sorry that it's taken me literal months to revisit this issue, but I think the --preamble idea I posed earlier will satisfy this request, take a look at my diff https://github.com/linkedin/shiv/compare/preamble?expand=1

I included an example script that cleans up old execution dirs:

(venv) darwin ~/src/shiv $ cat hello.py 
#!/usr/bin/env python3

import shutil

current = site_packages.parent
cache_path = current.parent
name, build_id = current.name.split('_')

if __name__ == "__main__":
    for path in cache_path.iterdir():
        if path.name.startswith(f"{name}_") and not path.name.endswith(build_id):
            print(f"deleting {path}")
            shutil.rmtree(path)

(venv) darwin ~/src/shiv git:preamble ± $ shiv -c http httpie setuptools --preamble hello.py -o http.pyz -q
(venv) darwin ~/src/shiv git:preamble ± $ ./http.pyz 
deleting /Users/lcarvalh/.shiv/http_22c7a1d09432cd2ee573ef360f7db8da966bd38ed3e6d8fb0a695d2fb562611d
deleting /Users/lcarvalh/.shiv/http_af27dd58fb73b60b31de79476d65ffe1e31a1f7f1efd4003fe4e864dc3512a6c
deleting /Users/lcarvalh/.shiv/http_905979cc227c81ac567e77271ede9f2394edb2182755eefb55aabae12e150c55
deleting /Users/lcarvalh/.shiv/http_edf9df1d0d243def92569ec0ae99705d26f9172fd0cb051ccdc2198acdd6f002
deleting /Users/lcarvalh/.shiv/http_4ae89d7b5745993c647e0df69e4846af2570729aa59a552f877f83a4e9c3781e
deleting /Users/lcarvalh/.shiv/http_27a78fe3ed4e9ef3c8e83d0f58d4f258f6704c66e3a784c451715729565b494c
deleting /Users/lcarvalh/.shiv/http_4925643d60a2c442c663cdd067af44ed3d9c1baaeb891111293fe706170a43b4
deleting /Users/lcarvalh/.shiv/http_41c4395c1f212521f2279831db24eeccbd1597023bd7834066b2398ad80c439c
deleting /Users/lcarvalh/.shiv/http_02eddecbc4ac31a08d0eb280294fa2698d9f6dd1ac7267058fcbef0d9643c0fa
deleting /Users/lcarvalh/.shiv/http_98e8bf10a09c5c9f4382b7e6373c3e1aa68fa2b7c27f128138e957b0057ea1eb
deleting /Users/lcarvalh/.shiv/http_77119414c923305efc4deeb168a88aaa8e6cb7dad28b7e337166a39d191dbf5e
deleting /Users/lcarvalh/.shiv/http_742c182a86c95b5a71b834367454b2abf86336f653069e3dceb5b3fc7cad1951
usage: http [--json] [--form] [--compress] [--pretty {all,colors,format,none}]
            [--style STYLE] [--unsorted] [--sorted]
            [--format-options FORMAT_OPTIONS] [--print WHAT] [--headers]
            [--body] [--verbose] [--all] [--history-print WHAT] [--stream]
            [--output FILE] [--download] [--continue]
            [--session SESSION_NAME_OR_PATH | --session-read-only SESSION_NAME_OR_PATH]
            [--auth USER[:PASS]] [--auth-type {basic,digest}] [--ignore-netrc]
            [--offline] [--proxy PROTOCOL:PROXY_URL] [--follow]
            [--max-redirects MAX_REDIRECTS] [--max-headers MAX_HEADERS]
            [--timeout SECONDS] [--check-status] [--path-as-is]
            [--verify VERIFY] [--ssl {ssl2.3,ssl3,tls1,tls1.1,tls1.2}]
            [--ciphers CIPHERS] [--cert CERT] [--cert-key CERT_KEY]
            [--ignore-stdin] [--help] [--version] [--traceback]
            [--default-scheme DEFAULT_SCHEME] [--debug]
            [METHOD] URL [REQUEST_ITEM [REQUEST_ITEM ...]]
http: error: the following arguments are required: URL

Theoretically you could even pack another pyz into the pyz (though I haven't tested this). Let me know what you think, if it works for you I will add some tests and cut a PR.

@luislew
Copy link
Contributor

luislew commented Aug 8, 2020

Hey @rsyring,
sorry that it's taken me literal months to revisit this issue, but I think the --preamble idea I posed earlier will satisfy this request, take a look at my diff https://github.com/linkedin/shiv/compare/preamble?expand=1

We're currently testing out shiv at NerdWallet (in place of pex), and two of the biggest outstanding issues for us with shiv are:

  • Lack of preamble support
  • Unchecked growth of ~/.shiv

It hadn't occurred to me that a preamble could solve both problems 💡 We'd love to see this feature added, thanks!

@lorencarvalho
Copy link
Contributor

@luislew that's awesome to hear! say hi to @borgstrom for me :)

I went ahead and cut a PR with this feature, take it for a spin and let me know what you think :)

@lorencarvalho
Copy link
Contributor

Closing this issue, the --preamble feature solves this pretty nicely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants