[mrg] Per spec configuration #888

choldgraf · 2019-07-03T19:25:08Z

In #887 we talked about how it'd be cool to have per-repo configuration. This is a first attempt at implementing it.

The basic idea is that we operate similar to how "banned" and "quota increase" functionality works, with one difference:

Instead of a list of specification regexes to match to "quota increase" or "banned", we have a dictionary where keys are specification regexes, and values are dictionaries of "key:value" pairs that can override configuration on a repo-specific basis.

For example, you could do something like:

config:
  GitHubRepoProvider:
    # Keys are regexes that will match "^<org-name>/<repo-name>.*";
    # values override configuration for matching repositories
    spec_configuration_override:
      "^ines/spacy-binder.*":
        quota: 1337
      "^binder-examples/requirements.*":
        banned: true

Questions to answer

I agree w/ @betatim that "spec_configuration_override" isn't great, any suggestions for other names?
What's a clean way to document which key:value configuration pairs are meaningful in this case? We'll need to add checks for them one by one in the code, and for now I guess we'll just start with a quota: <int> key/value. In the future, should we just expose this with good documentation for what the accepted key/values are? (we could also imagine some key/values being repo-provider-specific)

Need to add

Docs
Tests
Better naming

betatim · 2019-07-04T05:09:35Z

binderhub/builder.py

@@ -424,9 +424,14 @@ def escape(s):

    async def launch(self, kube, provider):
        """Ask JupyterHub to launch the image."""
+        # Load the spec-specific configuration if it has been overridden
+        spec_config = provider.spec_configuration_override
+
        # check quota first
        if provider.has_higher_quota():


Should we retire this at the same time as we introduce the new functionality?

Probably good to have a deprecation transition, in which case include a warning if the old way is used.

I'm fine w/ a transition (though we've still only ever had one release of BinderHub so the "higher_quota" never actually made it into a release lol)

True on paper, but in practice we release several times a week and people actually use BinderHub, so I think we should be nice to people. The "higher quota" thing has only existed for a week or two, so :-/

warnings.warn("XXX is deprecated, use YYY instead", DeprecationWarning)

is the generic Python warnings thing. It isn't ideal (most people won't see deprecation warnings except in their tests). Does traitlets have a tool for deprecations?

Due to the invisible-by-default deprecation warnings, I think that application-level deprecation warnings should actually not use DeprecationWarning, so they are visible by default. It's a tricky choice since deprecation warnings are always safely ignored "for now" but if folks never see them, they may as well not exist. As for a transition, I was mostly thinking of ourselves at mybinder.org, since it would allow us to decouple "adopt the new way" from "bump binderhub" PRs without config changes.
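A minimal sketch of an application-level deprecation warning that is visible by default. It assumes `FutureWarning` as the category, which the Python `warnings` documentation describes as the one for deprecations aimed at end users of an application, whereas `DeprecationWarning` is ignored by default unless triggered by code in `__main__` (Python 3.7+). `warn_deprecated` is a hypothetical helper, not an existing BinderHub function.

```python
import warnings


def warn_deprecated(old, new):
    """Warn that config option `old` is deprecated in favor of `new` (hypothetical helper)."""
    # FutureWarning is shown by default; DeprecationWarning would be hidden
    # from most users, which defeats the purpose of an app-level notice.
    # stacklevel=2 attributes the warning to the caller, not to this helper.
    warnings.warn(f"{old} is deprecated, use {new} instead", FutureWarning, stacklevel=2)
```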

FWIW, I don't think the deprecation needs to be a blocker here, as the old way still works. Cleanly deprecating the old way can be a follow-up PR.

Does traitlets have a tool for deprecations?

Traitlets doesn't have any helpers for deprecations, but the way traitlets works makes writing deprecations handy: the deprecation handling can go in observers and be completely removed from the bodies of methods, etc. For example:

@observe('old_way')
def _old_way_changed(self, change):
    warnings.warn("old way is deprecated as of...new way is...")
    self.new_way = modified_form(change.new)
    # never need to look directly at self.old_way after this

...

def method(self):
    only_use(self.new_way)

The simplest version of this can lead to undefined behavior (which wins?) if both the old and new way are specified in the same config, but I think that's not a big deal if the old-way warning is visible.

I think using the traitlets observe together with printing a warning (via warnings.warn()) is a good way to do this.
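A runnable sketch of the observer approach, assuming a hypothetical `ExampleProvider` with placeholder options `old_way` and `new_way` (not real BinderHub settings). Setting the old option, via config or a constructor argument, emits a warning and forwards the value, so the rest of the class only ever reads `new_way`.

```python
import warnings

from traitlets import Integer, observe
from traitlets.config import Configurable


class ExampleProvider(Configurable):
    """Hypothetical class: a deprecated option forwarded to its replacement."""

    # Replacement option: methods only ever read this one.
    new_way = Integer(0, help="Replacement option.").tag(config=True)

    # Deprecated option: kept only so existing configs keep working.
    old_way = Integer(None, allow_none=True, help="Deprecated, use new_way.").tag(config=True)

    @observe("old_way")
    def _old_way_changed(self, change):
        warnings.warn(
            "ExampleProvider.old_way is deprecated, use ExampleProvider.new_way instead",
            FutureWarning,
        )
        # Forward the value; nothing needs to look at self.old_way after this.
        self.new_way = change.new

    def effective_value(self):
        return self.new_way
```

If both options are set in the same config, which one wins depends on notification order, which is the "which wins?" caveat raised above.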

betatim · 2019-07-04T05:13:00Z

binderhub/builder.py

        # check quota first
        if provider.has_higher_quota():
            quota = self.settings.get('per_repo_quota_higher')
+        elif "per_repo_quota" in spec_config:
+            quota = spec_config.get('per_repo_quota')


Suggested change

-            quota = spec_config.get('per_repo_quota')
+            quota = spec_config.get('quota')

(also on the line above)

Though you could write this as:

quota = spec_config.get('quota', self.settings.get('per_repo_quota'))

which is a well-used pattern in Python for saying "get this from the dictionary, and if it doesn't exist, use this thing as the default"
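A small sketch of that idiom, with `resolve_quota` as a hypothetical helper and plain dicts standing in for `spec_config` and `self.settings`:

```python
def resolve_quota(spec_config, settings):
    """Per-spec 'quota' if present, otherwise the global 'per_repo_quota' setting."""
    # dict.get(key, default) returns `default` only when `key` is missing.
    return spec_config.get("quota", settings.get("per_repo_quota"))
```

Two details worth knowing: the default expression is evaluated even when `'quota'` is present, and a key explicitly set to `None` returns `None` rather than the default.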

binderhub/repoproviders.py

betatim · 2019-07-04T05:19:29Z

How about spec_config as the name? With a comment in the traitlet's docstring to say that you only need this to override settings per spec, and that if no entry is found the defaults are used.

For documentation of which keys/options are valid, I'd put a pointer in the traitlet's docs that says "repo providers are free to define their own options, see each repo provider for a list of valid keys and their meaning". Then add appropriate docs to the repo providers/base repo provider.
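A sketch of what such a documented traitlet could look like, assuming a hypothetical `ExampleRepoProvider` and the list-of-`{pattern, config}` shape that the discussion settles on further down; the help text wording is illustrative only.

```python
from traitlets import List
from traitlets.config import Configurable


class ExampleRepoProvider(Configurable):
    """Hypothetical stand-in for a RepoProvider subclass."""

    spec_config = List(
        help="""
        List of dictionaries, each with a 'pattern' (a regex matched against the spec)
        and a 'config' dictionary of overrides for matching specs.

        You only need this to override settings per spec;
        if no entry matches, the defaults are used.

        Repo providers are free to define their own options; see each repo provider
        for a list of valid keys and their meaning.
        """,
    ).tag(config=True)
```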

betatim · 2019-07-04T05:32:13Z

binderhub/repoproviders.py

@@ -111,6 +121,21 @@ def has_higher_quota(self):
                return True
        return False

+    def spec_configuration_override(self):


What do you think of naming this spec_config() and taking care of finding and populating all the default values in this function? Then in the builder we wouldn't need to have any checks for "do we override this or not, what is the default". We have one place that is already ready to use.

The reason to call it spec_config is that it would be for one particular spec and not a whole bunch any more. Maybe we could even call it repo_config? Depends a bit on what words the builder uses (does it refer to building a spec or a repo). But this might be getting a bit OCD on naming :)

TL;DR: should we compute the configuration (combining overrides and defaults) for the spec here?

I like it - that seems reasonable to me, lemme give it a stab and see how it looks

see the latest push for an attempt at what this would look like, is this the kinda thing you had in mind?

1399d58#diff-c5688934f1e6dc3e932b6c84c1bbbd5dR133

In this case, spec_config would be defined at the helm chart config level, and repo_config is a method of RepoProvider that returns a dictionary of configuration values for that repository (which might have been updated from the spec_config setting)

minrk · 2019-07-04T13:10:00Z

binderhub/builder.py

Probably good to have a deprecation transition, in which case include a warning if the old way is used.

minrk · 2019-07-04T13:14:40Z

binderhub/repoproviders.py

+        for spec, config in self.spec_configuration_override:
+            # Ignore case, because most git providers do not
+            # count DS-100/textbook as different from ds-100/textbook
+            if re.match(spec, self.spec, re.IGNORECASE):


FWIW, re.match has to compile the regex (or fetch it from re's internal cache of recently used patterns) on each call, which means we are doing this for every regex on every repo provider. Compiling them once on application load would make this a bit more efficient.

Related to this: the order of a dict can't be relied on here (YAML mappings are unordered, and Python only guarantees dict insertion order from 3.7), but overrides might want to have a priority if more than one pattern matches the repo. With a dict, the behavior is undefined; with a list it would be consistent. So perhaps a list of tuples is more appropriate than a dict?
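A sketch of compiling the patterns once at load time and keeping them in a list of `(pattern, config)` tuples so that matching order is deterministic. `compile_spec_config` and `matching_configs` are hypothetical helper names; the case-insensitive matching mirrors the `re.IGNORECASE` used in the diff above.

```python
import re


def compile_spec_config(spec_config):
    """Compile each (pattern, config) entry once, preserving list order (hypothetical)."""
    compiled = []
    for pattern, config in spec_config:
        # Ignore case, because most git providers do not count
        # DS-100/textbook as different from ds-100/textbook.
        compiled.append((re.compile(pattern, re.IGNORECASE), config))
    return compiled


def matching_configs(compiled, spec):
    """Yield the override dict of every pattern matching `spec`, in list order."""
    for pattern, config in compiled:
        if pattern.match(spec):
            yield config
```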

mmmm, I am not sure how to do this :-) feel free to suggest the code that would still allow for config patterns like the one in #888 (comment) and I'll give it a shot

Good point about the random order! I had pondered that one might have multiple patterns and if they should be additive or "last one wins" or what, then decided that for a first pass we should ignore that. However the random order thing is something we need to address (or inform the user that they have more than one pattern that matches and that this is an error (for now)).

I think what we want is a list like:

specs = [{"pattern": "some-pattern", "quota": 10}, {"pattern": "some-other-pattern", "quota": 12}]

in YAML it would look like:

# not super sure about the first one but the second two should work
- {pattern: "yet another pattern", quota: 33}
- pattern: some-pattern
  quota: 10
- pattern: some-other-pattern
  quota: 12

Ah I like this pattern because it’s more explicit! @minrk does it look good to you as well?

yes, that looks great to me! The only question is that 'pattern' is in the same namespace as the config overrides. This is nice since it's more concise, but it could allow collisions or confusion. The more rigorous, but slightly more tedious, option would be to explicitly separate the match from the override config:

- pattern: some-pattern
  config:
    quota: 10
    other_config: "x"

This would allow adding other siblings to pattern if we e.g. had some other options that should influence the selection rather than the overridden config. I don't have strong feelings, but I've occasionally regretted not separating things like this in JupyterHub in the past (see spawner options in the REST API).

You can see the difference in code where 'pattern' needs to be handled specially:

# copy because we need to modify the dict
update_config = item.copy()
# remove pattern before updating the config, because pattern is not part of the config:
pattern = update_config.pop('pattern')
if pattern.match(...):
    config.update(update_config)

vs:

pattern = item['pattern']
update_config = item['config']
if pattern.match(spec):
    config.update(update_config)

betatim · 2019-07-09T09:28:37Z

FWIW, I don't think the deprecation needs to be a blocker here, as the old way still works. Cleanly deprecating the old way can be a follow-up PR.

That is a good point. Let's follow that suggestion. Being able to bump binderhub on mybinder.org without having to also change the config makes life easier (and lets us use henchbot!).

choldgraf · 2019-07-09T21:52:46Z

See the latest push for the following updates:

The API should now follow what @minrk suggested, how does that look?
A buncha tests added (let's see if they pass!)
Some minor docs added

binderhub/repoproviders.py

minrk

Left a few comments in-line, but this is looking good to me. I think the only decision left is what do we want to happen when two patterns match:

both config overrides are applied, later in the list takes priority for any multi-matches (current behavior)
only first match is applied (break after match)
only last match is applied (break after match, reverse iteration order)
both applied, but earlier entries have higher priority (reverse iteration order)

I think any choice is reasonable (current behavior is most powerful, but I suspect folks who don't think carefully about the implementation could expect items higher in their yaml config to have higher priority), but we should mention which one we chose in the docs.

For example, the current behavior:

If multiple patterns match a given repository, all matching overrides will be applied in the order they appear in the list, meaning that overrides in the last item in the list will have highest priority.
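A sketch contrasting the current behavior (every match applied in list order, so the last one wins) with the second option above (stop at the first match). `apply_overrides` and its `first_match_only` flag are hypothetical; the entries assume the explicit `{pattern, config}` shape proposed earlier.

```python
import re


def apply_overrides(spec, defaults, spec_config, first_match_only=False):
    """Combine `defaults` with the 'config' of entries whose 'pattern' matches `spec`.

    first_match_only=False: apply every match in list order; for keys set more
    than once, the last matching entry wins.
    first_match_only=True: apply only the first matching entry.
    """
    config = dict(defaults)  # copy so the defaults are never mutated
    for item in spec_config:
        if re.match(item["pattern"], spec, re.IGNORECASE):
            config.update(item["config"])
            if first_match_only:
                break
    return config
```

Iterating over `reversed(spec_config)` instead gives the third and fourth options from the list.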

binderhub/repoproviders.py

minrk · 2019-07-16T14:30:08Z

binderhub/tests/test_repoproviders.py

+
+    provider = GitHubRepoProvider(
+        spec='jupyterhub/zero-to-jupyterhub-k8s/v0.4',
+        config=[base_config]


spec_config?

betatim · 2019-07-16T17:12:22Z

I'd go with option (1) for the moment in terms of priority/when to stop.

choldgraf · 2019-07-16T18:09:45Z

thanks for the feedback on tests etc

The latest commit:

Fixes the tests so they pass
Adds a test for default behavior w/o a matching spec
Adds a note to documentation about configuration override

jupyterhub/binderhub#888

betatim · 2019-07-17T05:59:12Z

Nice work! Tests, docs and a new feature!

What is the etiquette for marking conversations as "resolved"? Can I do it in a PR that isn't mine? Should the author do it? When I am the author, I tend to mark things as resolved (if they don't auto-collapse) when I have implemented the feedback or the discussion has somehow ended. In this PR there were several people involved, so it felt weird to do that, but I also found myself scrolling through each discussion several times trying not to miss new comments.

choldgraf · 2019-07-17T15:33:55Z

Re: "resolved" that's a good question, I'm generally happy for people to resolve things even if I'm the author on the PR, as it often means that my to-do list has shrunk :-)

as with the other team compass guidelines, I trust the judgment of other folks on the project so I'm happy for them to take the initiative there

per repo configuration

63e1d6d

betatim reviewed Jul 4, 2019

binderhub/repoproviders.py (outdated, resolved)

betatim reviewed Jul 4, 2019

betatim changed the title from "per repo configuration" to "[WIP] Per spec configuration" Jul 4, 2019

minrk reviewed Jul 4, 2019

adding more generic configuration per repo

1399d58

choldgraf force-pushed the repo_config branch from 9c74f11 to d0585fb July 9, 2019 21:50

updating the repo-specific configuration

2922ee8

choldgraf force-pushed the repo_config branch from d0585fb to 2922ee8 July 9, 2019 21:53

choldgraf commented Jul 9, 2019

binderhub/repoproviders.py (resolved)

minrk reviewed Jul 16, 2019

fixing tests and adding docs about override behavior

9c3045e

choldgraf changed the title from "[WIP] Per spec configuration" to "[mrg] Per spec configuration" Jul 17, 2019

betatim merged commit 137ec19 into jupyterhub:master Jul 17, 2019

yuvipanda pushed a commit to jupyterhub/helm-chart that referenced this pull request Jul 17, 2019

[binderhub] Automatic update for commit 9c3045e

ab17734

jupyterhub/binderhub#888

henchbot mentioned this pull request Jul 17, 2019

binderhub: 5ca42ec...9c3045e jupyterhub/mybinder.org-deploy#1085

Merged

henchbot mentioned this pull request Jul 28, 2019

binderhub: 9c3045e...f5cf428 jupyterhub/mybinder.org-deploy#1095

Merged

choldgraf added the enhancement label Oct 8, 2019
