New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pip installable plugins #2552

Merged
merged 104 commits into from Sep 19, 2018

Conversation

Projects
None yet
7 participants
@jbeezley
Copy link
Member

jbeezley commented Jan 4, 2018

This adds a module meant to replace the functional contents of utilities/plugin_utilities.py for the girder 3.0 migration. This module is not yet used to maintain backwards compatibility. I have added it as an independent PR to collect feedback before moving on with the larger effort to replace core girder code with this new module, which will require backwards incompatible changes and migration of existing plugins.

I don't know how useful this is as a standalone PR, but it is the most I can achieve before I start breaking compatibility. Alternatively, I can make this a WIP and start pushing breaking changes here so others can get a better idea for where this is headed.

Update: All changes (including breaking) have been added to this PR. See the comments below.

:param plugins: A list of plugins to include (defaults to all plugins)
:param dependencyGetter: A function taking a plugin object and returning a list of dependencies
"""
dependencyGetter = dependencyGetter or (lambda p: p.dependencies)

This comment has been minimized.

@kotfic

kotfic Jan 5, 2018

Contributor

This is just a nitpick, but seeing as this is a private function I would move the default value logic for plugins and dependencyGetter into the public functions. I think that also makes it more clear what implementation difference between getToposortedPlugins and getToposortedWebDependencies is.

This comment has been minimized.

@jbeezley

jbeezley Jan 8, 2018

Author Member

I wanted to avoid exposing the dependencyGetter argument publicly because it exposes too much of the internal implementation that we may want to replace.

For the plugins argument, do you mean duplicate the

if plugins is None:
    plugins = allPlugins()

in the public functions? I could do that, but I don't see a real benefit.

This comment has been minimized.

@kotfic

kotfic Jan 9, 2018

Contributor

I was not suggesting exposing dependencyGetter in getToposortedPlugins or getToposortedWebDependencies. Only that you might move the default functionality defined in _getToposortedPlugins into getToposortedPlugins:

def getToposortedPlugins(plugins=None):
     """Return a toposorted list of plugins.
 
     By default this function will return a toposorted list of all detected plugins.  If a list of
     plugin names is provided, it will only return a minimal list of plugins required for the
     given top-level plugins.
     """
     return _getToposortedPlugins(
         plugins,
         dependencyGetter=lambda p: p.dependencies
    )

Equally you might consider moving the "defaulting" functionality of plugins into the public functions. That way _getToposortedPlugin accepts plain arguments rather than keyword arguments and is only responsible for doing its work. This makes testing the "defaulting" behavior explicitly apart of the public API and an easier target for testing. But also... it is just a nitpick 😄

for each plugin and if it raises any exception, an error message will be recorded and
the walk cancelled.
:param func: A function taking the plugin configuration object.

This comment has been minimized.

@kotfic

kotfic Jan 5, 2018

Contributor

It wasn't immediately obvious to me why this was constructed this way. It might be nice to have either extended docs, or a comment that explains why (in addition to what) this 'func' is. e.g. blah blah blah a closure that collects blah blah blah


def getPlugin(name):
"""Return a plugin configuration object or None if the plugin is not found."""
_findPlugins()

This comment has been minimized.

@kotfic

kotfic Jan 5, 2018

Contributor

Does it make sense to have _findPlugins return _pluginRegistry? that way access to the global state is isolated to _findPlugins?

@property
def description(self):
"""Return the plugin description defaulting to the classes docstring."""
return self.__class__.__doc__ or ''

This comment has been minimized.

@brianhelba

brianhelba Jan 5, 2018

Member

Could this be taken from the description in setup.py?

@property
def url(self):
"""Return a url reference to the plugin (usually a readthedocs page)."""
return self.URL

This comment has been minimized.

@brianhelba

brianhelba Jan 5, 2018

Member

Could this be taken from the URL in setup.py?

import rest # register new rest endpoints
"""
URL = ''
DEPENDENCIES = []

This comment has been minimized.

@brianhelba

brianhelba Jan 5, 2018

Member

Could this be expressed as ordinary install_requires in setup.py?

This comment has been minimized.

@jbeezley

jbeezley Jan 8, 2018

Author Member

It's possible, but the implementation would be somewhat complicated. The problem is that there isn't a one-to-one relationship between python packages and girder plugins. A plugin can depend on a package that isn't a plugin, a package can contain multiple plugins, and a plugin can depend on a package that contains a plugin without actually depending on the plugin itself.

It would be possible as a default behavior to examine the entrypoints and get the packages that contain them to generate this list automatically for the typical use case. Plugins requiring more complicated behavior could override it. I'm undecided if doing this is a bad idea or not.

This comment has been minimized.

@brianhelba

brianhelba Jan 9, 2018

Member

After in-person discussion with @jbeezley today, I expressed the opinion that:

A plugin can depend on a package that isn't a plugin

This hopefully won't be an issue if we're able to delegate dependency tree resolution to pip, then only look at what ends up getting installed and having a girder plugin entry point (not necessarily caring how it was resolved).

a package can contain multiple plugins

I think this should not be allowed. It needlessly complicates the situation, without offering any unique benefit (that I can think of, but I'd be happy to hear use cases).

a plugin can depend on a package that contains a plugin without actually depending on the plugin itself

Currently, this is staticWebDependencies, and we only support it for plugins' web components. In the future, we could continue to support that same functionality by having a plugin depend on another in only its package.json (allowing it to require / ES6-import from the other), but not reference the other with a pip dependency. However, for both the Python and web worlds, it is probably best if developers reorganize their code to distribute common components in a non-Girder-requiring library package which all plugins can depend on directly.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch 2 times, most recently from 267eed2 to 55dec72 Jan 10, 2018

@danlamanna

This comment has been minimized.

Copy link
Member

danlamanna commented Jan 11, 2018

Related: #2385

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 12, 2018

I'm working on deriving plugin dependency information from setup.py, and I'm finding it a little awkward with the current approach. I'm toying with the idea of making it the responsibility of plugins themselves to load dependencies as necessary. This would look something like:

# in plugin1 that depends on plugin2
def load(info):
    girder.plugin.findPlugin(plugin2).load(info)
   # load self

The load functions would include a mechanism preventing themselves from being called twice. This would allow plugins more flexibility in handling errors and even open the possibility for optional dependencies. It would also remove a lot code in this PR related topological sorting and error handling.

The main issue with this approach is that it removes the ability to get a static list of toposorted plugins which is required for our web client build as it exists now. Depending on where we go with it, this may be a non-issue. I think the approach ultimately will be that the new equivalent of girder-install web will build all installed plugins rather than attempt to decipher a minimal list of requirements. I also don't think it will require a topological sorting of the plugins because it won't involve multiple invocations of webpack for each plugin as we do now. @brianhelba does this match your vision for the refactored client build?

@danlamanna

This comment has been minimized.

Copy link
Member

danlamanna commented Jan 12, 2018

I'm working on deriving plugin dependency information from setup.py, and I'm finding it a little awkward with the current approach.

What is awkward exactly?

The main issue with this approach is that it removes the ability to get a static list of toposorted plugins which is required for our web client build as it exists now.

Doesn't this also prevent the UI from knowing which dependent plugins to enable?

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 12, 2018

What is awkward exactly?

It increases the complexity of _walkPluginTree logic because it not only needs to look for a list of plugins but also reference the packages those plugins are contained in. This add a whole new layer of failure conditions that need to be handled. The more I think about it, it seems like duplication of python's package dependency logic for no good reason. Removing this logic also reduces the size of this PR significantly.

Doesn't this also prevent the UI from knowing which dependent plugins to enable?

The UI doesn't really need to know the dependent plugins. It would only enable the single plugin in the database. When the server reloads, dependent plugins would load automatically. When rerendering the plugin admin page, it would get the list of loaded plugins rather than attempting to get the list of "enabled" plugins in the database. This could also satisfy @brianhelba's desire to distinguish between plugins that were explicitly enabled and those that were enabled by the dependency mechanism.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch from 55dec72 to 690335a Jan 12, 2018

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 12, 2018

I pushed a change that implements the new behavior. I think it is dramatically better. The only complicated bit is a function wrapper to the load method that handles logging and ensuring the load method is only called once.

I still have to finish the testing, but a plugin descriptor class would now look like this:

from girder import plugin

class Cats(plugin.GirderPlugin):
   def load(self, info):
        plugin.getPlugin('Pets').load(info)  # load dependency

        try:
            plugin.getPlugin('Food').load(info)  # try to load an optional dependency
        except Exception:
             pass

        # get loaded plugins to generate optional behavior
        pluginList = plugin.loadedPlugins()

        # register events and endpoints as usual...

For now, all plugin information is determined through the python package, but we might need to add a cross reference to an npm package name for the client build.

@brianhelba

This comment has been minimized.

Copy link
Member

brianhelba commented Jan 13, 2018

I like where @jbeezley is going, but let me restate some things to help clarify my own thinking. It seems like there are 3 cases here:

  1. A plugin depends on another's for assets (code, files, etc.), but doesn't need the other to actually run
  2. A plugin depends on another to also run, but the load ordering doesn't matter
  3. A plugin depends on another to run, and requires the other to load first

To satisfy 1, we:

  • In general, refactor common assets into a non-Girder-plugin library package, and depend on that.
  • In the specific case of client-side plugins (what we now call staticWebDependencies), depend on the other in package.json only. However, the library package approach should be encouraged where possible.

To satisfy 2, we:

  • To the extent necessary, do the steps in 1
  • Add the other plugin as a requirement in setup.py.

To satisfy 3, we:

  • Do the steps in 2
  • Add plugin.getPlugin('OtherPlugin').load(info) to the depending plugin's load. This has the added benefit of allowing the actual order of dependant plugin loading to be explicitly controlled, which I believe was non-deterministic before.
  • Open question: how do we ensure a load order on the client side?
    • If we had a more traditional JS workflow, we'd just have the depending plugin import OtherPlugin; at the top of it's on entry file, forcing the dependent plugin to run first (Girder core would also have something like import OtherPlugin; / require('OtherPlugin');, but whichever ran second would be a no-op.
    • Since we rely on the HTML <script> tag ordering to determine load order, I'm not sure how we get the load order information to whatever builds the HTML (which is currently a Python-Mako loop, but will hopefully be Webpack in the future).

Note, a side effect of this is that we no longer distinguish between a plugin being "installed" and "enabled". If it's installed, its entry point triggers, and it gets enabled.


To address @jbeezley 's earlier question about my vision for the client build, my strong hope is that the client-side build process can something look like:

  • girder-init-web: A Python script to generate a package.json script, containing pip-installed plugins as dependencies.
  • npm install yarn install
  • npm run build yarn run build: Use Grunt+Webpack to build JS and CSS files.

As a further dream extension, we could consider the merits of both using Webpack to generate HTML files (allowing cache-busting, hot reloading, and other goodies). Relatedly, we could get rid of code-splitting and the Dll*Plugin (allowing case 3 from above to be more easily satisfied), but I admittedly don't understand the current rationale for code-splitting well enough to judge whether this would be an improvement (let's discuss this bit in-person, since I don't want to derail this thread).

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 13, 2018

Open question: how do we ensure a load order on the client side?

That's a good question. I'm tempted to make the claim that it girder shouldn't have to resolve the load order on the client in the same way that it isn't necessary on the server. My plan for this branch is to minimize the changes to the client, which will probably mean adding a hacky solution just to get it to work. I suspect this will involve explicitly resolving the load order in some way, but it should much simpler than what I was doing before. I think I can get away with tracking the order from load wrapper in the metaclass.

Note, a side effect of this is that we no longer distinguish between a plugin being "installed" and "enabled". If it's installed, its entry point triggers, and it gets enabled.

I'm not against this, but I don't think it is necessary to accomplish the goals of this PR. It may be controversial, so we would have to discuss doing so with the larger group.

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 13, 2018

This is about where I want it now. I think I'm going to start on breaking changes from here on.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch 2 times, most recently from 71223fb to e647d3b Jan 20, 2018

@jbeezley jbeezley changed the title Add a registry for loading and querying plugins [WIP] Pip installable plugins Jan 21, 2018

@jbeezley jbeezley force-pushed the pip-installable-plugins branch from e647d3b to 698c17f Jan 21, 2018

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Jan 21, 2018

I'm relatively happy with the current state of the plugin registry in regards to the server side infrastructure. I have migrated the jobs and worker plugins to installable packages. As of now, the biggest missing piece is related to the client build. I have started experimenting by manually constructing a webpack.config.js and package.json in a staging directory. Once I settle on a solution, I'll describe it here.

For the server components, I think things have progressed enough that we can start farming out tasks to interested parties. These are a few things I think could be started now before issues with the client are solved.

  1. Finish migrating the other plugins' server code.
  2. Fix testing server components for plugins.
  3. Think about deployment, release, and CI infrastructure. This is a little premature, but it would be useful to have someone think about answers to questions like:
    1. How does this break ansible roles across the company? Is there a simple migration strategy?
    2. Do these changes allow (or force) us to redesign how we do CI testing?
    3. How will we handle releases now that there are many more steps involved?
@danlamanna

This comment has been minimized.

Copy link
Member

danlamanna commented Jan 26, 2018

I branched off of this to see how the server-side testing for plugins could work, see pip-installable-plugins...pip-installable-plugins-pytest. Running pytest from the jobs directory there should work.

@zachmullen

This comment has been minimized.

Copy link
Member

zachmullen commented Jan 26, 2018

@danlamanna I notice you actually got the model singletons to reload via

for m in model_base._modelSingletons:
        del m.__class__._instance

When I do that I get a ton of weird errors that make it seem like the database is not getting dropped between test runs... did you encounter that also?

@jbeezley jbeezley force-pushed the pip-installable-plugins branch 3 times, most recently from 6781aef to e76423d Jan 27, 2018

@danlamanna

This comment has been minimized.

Copy link
Member

danlamanna commented Jan 29, 2018

@zachmullen

This comment has been minimized.

Copy link
Member

zachmullen commented Jan 29, 2018

I was talking about the Girder test suite, which was breaking for me when I added the model singleton reset logic. I haven't tracked down exactly why, but the symptoms seem to indicate that after the models are reinitialized, all the tests are pointing to the same database rather than getting their own unique one.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch from e76423d to 8f3082a Feb 1, 2018

@jbeezley

This comment has been minimized.

Copy link
Member Author

jbeezley commented Feb 1, 2018

I just got an extremely preliminary, but minimally viable client build process working with plugins. You can try the following (probably best in a fresh virtualenv):

pip install -e git://github.com/jbeezley/girder_plugin_jobs.git#egg=girder_plugin_jobs -e git://github.com/girder/girder.git@pip-installable-plugins#egg=girder
girder-build
girder-serve

That will install a ported version of the jobs plugin I put in an external repo to show that this all works without copying plugin or symlinking the code to any specific place.

There are a couple of technical details of note:

  1. This creates a staging directory and static directory under <sys.prefix>/share/girder. When executing girder-build it generates a package.json and symlinks the build infrastructure to the staging area. It then runs npm install and grunt. No npm install step is done inside grunt any more. In a development environment, the client code is symlinked from the source directory, but in a production environment it will be fetched from npm.
  2. The static files are served from under the built path in the sys.prefix rather than from inside girder's python package.
  3. There is some ugliness in symlinking node packages as usual. I found that including the node flag --preserve-symlinks to the grunt command makes it work as expected.

There are at least a couple of issues I need to address:

  1. There needs to be a way to pass information from a plugin's client code to girder's build system to answer questions like "What is the entrypoint?" "What is the name of the plugin (as opposed to the package name)?" "Is there a webpack helper function?". I'm thinking of adding a girder key in package.json for this purpose.
  2. I want to remove grunt or at least reduce the webpack task to a single config with multiple entrypoints. This will probably mean dropping the DLL plugin in favor of more traditional code splitting, but I haven't explored it much. This will have the benefit that running webpack as a standalone script will work (including webpack watch 🎆).

I think my next step will be to try and port over candela, which has more complicated build requirements, to see if there are any specific issues with this approach.

@zachmullen

This comment has been minimized.

Copy link
Member

zachmullen commented Feb 1, 2018

@jbeezley that sounds great! One change I would suggest is that any reference to grunt should be replaced with a call to an npm script, e.g. npm run build or npm run girder_build. Whenever we do the web client rewrite, we'll be abandoning grunt if all goes well.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch from eb429eb to f15ab17 Feb 1, 2018

jbeezley added some commits Jun 26, 2018

Add remote_worker setup.py to the cache checksum
This is not ideal.  For one, the cache should be invalidated if any
plugin setup.py changes.  Second, we should not be caching the
virtualenv's containing editable installs.

The issue this fixes is related to
    pypa/pip#4537

The downgrade of celery from #2761 triggers a behavior (maybe a bug?) in
pip that only occurs when the "egg-info" directory for an editable
install does *not* exist.  A second invocation of `pip install` will
run successfully.

To reproduce this problem outside of CI, in a fresh virtualenv:
```
pip install -e . -e ./plugins/jobs -e ./plugins/remote_worker
pip install celery==4.2
rm -fr ./plugins/remote_worker/*.egg-info
pip install -e ./plugins/remote_worker
```

One alternative option is to turn off caching of the virtualenv.
Automatically inspect npm package names for plugins
This change removes the NPM_PACKAGE_NAME property on the GirderPlugin
class.  Instead, plugin developers must explicitly set the path to the
web client package and the npm package name will be inferred.  Advanced
usage (such as fetching the package from npm) is still supported by
overriding the `npmPackages` method.

@jbeezley jbeezley force-pushed the pip-installable-plugins branch from c31b517 to 79d36f8 Sep 19, 2018

@danlamanna
Copy link
Member

danlamanna left a comment

best pr ever

@jbeezley jbeezley merged commit c565081 into master Sep 19, 2018

11 checks passed

ci/circleci: py2_coverage Your tests passed on CircleCI!
Details
ci/circleci: py2_integrationTests Your tests passed on CircleCI!
Details
ci/circleci: py2_serverInstall_serverTest Your tests passed on CircleCI!
Details
ci/circleci: py3_coverage Your tests passed on CircleCI!
Details
ci/circleci: py3_integrationTests Your tests passed on CircleCI!
Details
ci/circleci: py3_serverInstall Your tests passed on CircleCI!
Details
ci/circleci: py3_serverTest Your tests passed on CircleCI!
Details
ci/circleci: py3_webBuild_webTest Your tests passed on CircleCI!
Details
codecov/project/python_client 84.64% (target 80%)
Details
codecov/project/server 85.37% (target 80%)
Details
codecov/project/web_client 80.09% (target 75%)
Details

@jbeezley jbeezley deleted the pip-installable-plugins branch Sep 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment