Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pexrc support #128

Merged
merged 5 commits into from Oct 9, 2015
Merged

pexrc support #128

merged 5 commits into from Oct 9, 2015

Conversation

lorencarvalho
Copy link
Contributor

Hi!

Big fan of pex. We are starting to face an issue with our deployments where setting custom PEX_ROOT and PEX_MODULE in some entry scripts. I'm not a huge fan of having to write a wrapper for our pex files and I'd rather drop a .pexrc with my deployment so the pex file itself can see these vars (rather than having to explicitly set them prior to invoking pex).

This PR just adds the ability to drop a ~/.pexrc or .pexrc (relative to the pex itself, in that precedence). Variables() reads this file and updates the environment before looking in os.environ, so anything in .pexrc is overwritten by runtime environment variables.

Let me know what you think, thanks for looking!

@@ -268,7 +268,8 @@ Tailoring PEX execution at build time

There are a few options that can tailor how PEX environments are invoked. These can be found
by running ``pex --help``. Every flag mentioned here has a corresponding environment variable
that can be used to override the runtime behavior.
that can be used to override the runtime behavior which can be set directly in your environment,
or sourced from a ``~/.pexrc`` file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't mention anything about the .pexrc file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true 💀

@sholsapp
Copy link
Contributor

@sixninetynine I like this change. It solves deployment problems that I have too when I have PEX files that invoke other PEX files.

@jsirois
Copy link
Member

jsirois commented Jun 12, 2015

I'm trying to understand the use case a bit better.

Could you not just do this:

$ PEX_ROOT=/tmp/a a.pex

Or the equivalent in your launching environment, say passing an env dict to subprocess.Popen if the launching environment is python (launching a pex from a pex), etc.

@lorencarvalho
Copy link
Contributor Author

@jsirois

This is exactly the behavior I'm trying to avoid.

One use case might be doing local dev vs. production deploy. I want to set PEX_ALWAYS_CACHE=true when I'm iterating on my code because otherwise I'd have to manually blow away my ~/.pex dir since the versions will always be the same every build. Once I deploy I don't need to do this anymore, so having that in my local ~/.pexrc but not on prod servers would be a nice clean way of solving this.

In addition to this, right now I'm having to have a wrapper script in front of my pex invocation setting the environment variables in the way you describe, I'd much rather be able to call the pex itself and have it pick up the vars from a file rather than wrapping.. It's cleaner, no need for a shell script or exec, just call pex directly.

@jsirois
Copy link
Member

jsirois commented Jun 12, 2015

@sixninetynine Makes sense. The use case you describe could just be solved by modifying your shell startup to include that env, but I can see where you might want to run some pexes in production mode on your machine (tools), but dev in a repo or 2 in --always-write-cache mode.

@sholsapp
Copy link
Contributor

@jsirois

Another use case that is hard to accommodate with existing solution is invoking PEX from PEX. Consider this example.

# Build a.pex, don't specify an entry point
pex ... -o a.pex
# Invoke a.pex, specify the entry point
PEX_MODULE=entry_a a.pex

# Build b.pex, burn in an entry point
pex ... -e entry_b -o b.pex

# Run some function of a.pex that subprocesses to b.pex
PEX_MODULE=entry_a a.pex

This last case should fail because entry_a -- specified by the original commands environment -- is not present in b.pex.

This is a contrived example, but I'm finding in my environment that is composed of many tool writers, that this is happening quite frequently. I think that if I had the ability to set ~/.pexrc and .pexrc I could solve the environment inheritance problem between these two types of pex files. Then my launching environment could always drop the right environment variables into the rc file instead of passing them along as environments.

What do you all call PEX files that have a -e burned into them versus one that requires PEX_MODULE? :)

@jsirois
Copy link
Member

jsirois commented Jun 12, 2015

@sholsapp Given an environment where both styles of pex are allowed, your example also makes sense.

Thanks to both of you for spelling these cases out.

(I call a pex that requires PEX_MODULE a python interpreter! (you could also pass -m [module]))

@sholsapp
Copy link
Contributor

@jsirois @wickman so any thoughts on this one? We see these types of interactions quite often. If using an rc file isn't the right or desired approach, do you have any pointers to solve these types of problems?

@wickman
Copy link
Contributor

wickman commented Jun 23, 2015

I've pondered this off-and-on. Can you elaborate more on the sorts of things that would go into pexrc? Some of them make sense e.g. PEX_ALWAYS_CACHE, PEX_VERBOSE, PEX_ROOT, PEX_PYTHON, PEX_PATH and so on. Others that dictate the entry point i.e. PEX_SCRIPT and PEX_MODULE seem like they would only be problematic as the pexrc would taint subprocesses. Perhaps explicitly disallow entry-point altering variables within pexrc?

The other thing that concerns me is the choice of precedence. It almost seems backwards to me. I think we'd want to prefer environment -> .pexrc -> ~/.pexrc. If I've set PEX_ANYTHING in the environment, I'd for sure not want it to be overridden by any files on disk. Also .pexrc in the cwd seems a bit too magical and prone to error. I've got a bunch of tools in my $PATH that are built as pexes and I'm not sure I'd want their behavior to be altered if I happened to land in a repo or directory with a .pexrc. Perhaps the .pexrc should be relative to the currently-invoked pex binary itself?

@lorencarvalho
Copy link
Contributor Author

Hiya @wickman, here's my thoughts

Some of them make sense e.g. PEX_ALWAYS_CACHE, PEX_VERBOSE, PEX_ROOT, PEX_PYTHON, PEX_PATH and so on. Others that dictate the entry point i.e. PEX_SCRIPT and PEX_MODULE seem like they would only be problematic as the pexrc would taint subprocesses. Perhaps explicitly disallow entry-point altering variables within pexrc?

The way I have it implemented currently is that it will just self._environ.update() anything parsed from the pexrc (meaning: any environment variable). This is kind of selfish of me, because our use case requires a couple other environment variables to deploy/run our pex'd apps (BASEDIR, for example).

For PEX_MODULE/PEX_SCRIPT, it would actually be super beneficial for our use case since we give users the option of building a "thin pex" a la, a virtualenv+interpreter pex, vs building a "fat pex" or as @sholsapp put it, a pex with an entry-point burned in. For the former case, we need to specify PEX_MODULE at runtime, and we currently do so via a bash script that invokes the interpreter with the variables set, such as PEX_MODULE=foo /path/to/pex $@. I'd much rather be able to have my deployment system just drop the pex and a relative .pexrc then just execute the pex file itself, and avoid this bash wrapping nonsense altogether.

The other thing that concerns me is the choice of precedence. It almost seems backwards to me. I think we'd want to prefer environment -> .pexrc -> ~/.pexrc. If I've set PEX_ANYTHING in the environment, I'd for sure not want it to be overridden by any files on disk.

I think perhaps I wasn't clear, the precedence is how you describe it, environment variables beats relative .pexrc beats ~/.pexrc. Because yeah, I agree, a file overriding an environment variable would be wacky as hell.

Also .pexrc in the cwd seems a bit too magical and prone to error. I've got a bunch of tools in my $PATH that are built as pexes and I'm not sure I'd want their behavior to be altered if I happened to land in a repo or directory with a .pexrc. Perhaps the .pexrc should be relative to the currently-invoked pex binary itself?

Annnnd this is a bug in my PR. It was my intention to have the 'relative .pexrc' be relative to the pex file itself (like you said). Fixed & updated PR 10d215e.

@wickman
Copy link
Contributor

wickman commented Jun 23, 2015

OK, this is looking better. Two last things:

  1. Please add a test. I'd like to see at least two: a) ensure key-value parsing works correctly; this probably means splitting out a _kv_from_string() method and then calling that _from_rc, and b) ensuring that rcfiles are parsed in the order that you expect.

  2. Add a short-circuit, e.g. PEX_IGNORE_RCFILES. This just means in init after self._environ is initialized from the environment, just do if not self.PEX_IGNORE_RCFILES: self._environ.update(...from_rc...).

@lorencarvalho
Copy link
Contributor Author

@wickman,

Hey, got a chance to look into these. I added a few tests, one for k/v parsing, as well as one that makes sure the precedence works and one to make sure rc file parsing works. I cleaned up _from_rc and added _get_kv in variables.py.

As far a short-circuit goes, I totally agree with the sentiment, but I'm not sure how to implement it cleanly. Right now I parse the rc files first, in order for them to be overwritten by the environment variables. If there were to be a short circuit env variable, we'd have to blow away the whole self._environ object and re-build it without the _from_rc, ya know?

For example, my status quo:

  def __init__(self, environ=None, rc='~/.pexrc'):
    self._environ = self._from_rc(rc)
    self._environ.update(environ.copy() if environ else os.environ)

In order to short-circuit, I'd have to do something (gross and horrible) along the lines of:

  def __init__(self, environ=None, rc='~/.pexrc'):
    tmp_environ = self._from_rc(rc)
    other_tmp_environ = environ.copy() if environ else os.environ
    if not other_temp_environ['PEX_IGNORE_RCFILES']:
      self._environ = other_tmp_environ
    else:
      self._environ = temp_environ.update(other_tmp_environ)

(obviously those variable names are garbage)
I'd rather not create an _initialize_env method or something magical (I tend to avoid magic on initialization of objects). Thoughts or suggestions?

@wickman wickman added this to the 1.1 milestone Jul 6, 2015
@lorencarvalho
Copy link
Contributor Author

@wickman @jsirois @Yasumoto

Would you mind if I rebased this on my fork and submitted a new PR? Pretty ugly commit history here :neckbeard:

edit: annnnnnnnnd I accidentally closed it.

@Yasumoto
Copy link
Contributor

I believe if you also rebase on master you can push to clean up this PR too

@lorencarvalho
Copy link
Contributor Author

@Yasumoto thanks for the tip!

@c17r
Copy link

c17r commented Sep 20, 2015

This would greatly help with a deployment situation I have. Right now I have the pex file called ._ex and a small shell script called .pex to set some variables and call the actual pex. Hacky but works.

Is there a sense of when this might land in master and be in a release?

@sholsapp
Copy link
Contributor

sholsapp commented Oct 2, 2015

Hey @wickman and @sixninetynine are there any outstanding issues with this PR that we need to be aware of/help out with? What do you you think of what @c17r asked about when this might land in master and be in a release?

@wickman
Copy link
Contributor

wickman commented Oct 2, 2015

I don't think there's a reason other than lack of free time that this can't be merged and released. I'm extra super busy at the moment with Periscope so I'd love to recruit one or two other hapless souls to manage releases. @jsirois and @Yasumoto are both folks I trust handing over the keys to do this; I just need to set up the proper group permissions on pypi (the permissions are already there on github since it's under the pantsbuild organization.)

@jsirois
Copy link
Member

jsirois commented Oct 2, 2015

I'd be happy to help review and release. I'm john.sirois (PGP key 67B5C626) on pypi.

@wickman
Copy link
Contributor

wickman commented Oct 2, 2015

thanks @jsirois, i've added you as a maintainer.

@jsirois
Copy link
Member

jsirois commented Oct 7, 2015

I'll be devoting this Friday 10/9 to getting this change in and the 1.1 release released. Github shows the branch as conflicting, so please spruce it up before Friday, I'll take a last look then and then get the ball rolling.

@jsirois jsirois mentioned this pull request Oct 7, 2015
* upstream/master:
  Migrate to the new travis-ci infra.
  [docs] update header in index.rst
  Add docs, change default behavior to use namesake command as pex.
  bdist_pex: Nicer output filename
  Don't choke if pkg has no console_scripts
  Allows --pex-args to take an argument
  Initial implementation of bdist_pex.
  Fix missed mock of safe_mkdir.  Pin pytest to 2.5.2.
  Add pex-identifying User-Agent to requests sessions.
  Fix the docs release headers.
  Normalize all names in ResolvableSet.  Fixes pex-tool#147.
  Release 1.0.3
  Fix pex-tool#139: PEX_SCRIPT fails for scripts from not-zip-safe eggs.
  Fix a logging typo when determining the minimum sys.path
  Remove unnecessary stderr print on SystemExit().code == None
  Bump the pre-release version and update the change log.
  Accomodate OSX `Python` python binaries.
  Release 1.0.2
  Address pex-tool#141.  Update version.py to reflect 1.0.2.dev0.
  Fix from_env to capture only explicitly set values.
pexrc.write('HELLO=FORTYTWO')
pexrc.flush()
v = Variables(environ={'PEX_IGNORE_RC_FILES': True, 'HELLO': 42}, rc=pexrc.name)
assert v._get_int('HELLO') == 42
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not show ignore worked since 42 also wins by precedence if not ignored.

Fixing this test and optionally adding 2 more tests, one for .pexrc and one for ~/.pexrc .pexrc precedence would be great if you're around and have time to do this today. If not I'll merge this in at 1pm Mountain and follow up with these before release later in the afternoon.

All the rest lgtm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - this test is doubly broken atm. The IGNORE env var name is typoed and the value is invalid - must be a string. Its an accident ints work as environ= values, the intent is int([str]) in the parse routine and it happens that int(int) works just fine.

So this works:

$ git diff -U1
diff --git a/tests/test_variables.py b/tests/test_variables.py
index 124ec6d..447efb5 100644
--- a/tests/test_variables.py
+++ b/tests/test_variables.py
@@ -106,4 +106,4 @@ def test_rc_ignore():
     pexrc.flush()
-    v = Variables(environ={'PEX_IGNORE_RC_FILES': True, 'HELLO': 42}, rc=pexrc.name)
-    assert v._get_int('HELLO') == 42
+    v = Variables(environ={'PEX_IGNORE_RCFILES': 'True'}, rc=pexrc.name)
+    assert 'HELLO' not in v._environ

Since the fixed up test works - ie the prod code works - I'll merge this in and follow up with a fix to make the test valid.

jsirois added a commit that referenced this pull request Oct 9, 2015
@jsirois jsirois merged commit 6949d2a into pex-tool:master Oct 9, 2015
@sholsapp
Copy link
Contributor

sholsapp commented Oct 9, 2015

@jsirois Thanks a lot for your awesome support. 😄 I'm excited to use this in our deployment system.

@lorencarvalho lorencarvalho deleted the pexrc branch October 10, 2015 03:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants