
Improve import times #3055

Merged: 5 commits merged into master from import_times on Oct 8, 2018
Conversation

@philippjfr (Member) commented Oct 6, 2018

This PR attempts to improve import times by deferring imports of:

  • xarray
  • dask
  • datashader and therefore numba (these were accidentally imported automatically in element/graphs.py)

Here are the current import timings:

  • Before this PR (with iris): 2.5 seconds
  • Before this PR (without iris): 1.5 seconds
  • After this PR: 0.55 seconds

In addition to some utilities to load the backends, this PR works by declaring two new methods on Interfaces. The first, Interface.loaded, is called to check whether an interface's dependencies have been loaded; if not, the interface is skipped. This defaults to True for backward compatibility. The second, Interface.applies, is used to check whether the Interface applies to the supplied data object. By default this method simply checks whether the data is one of the Interface.types, maintaining backward compatibility; however, interfaces with heavy dependencies can override this method to defer the import.
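The pattern described above can be sketched roughly as follows. Only the loaded/applies method names come from this PR; the class bodies (including the XArrayInterface example) are illustrative assumptions, not code from the diff:

```python
import sys


class Interface:
    """Base data interface; a sketch of the deferral pattern described above."""
    types = ()

    @classmethod
    def loaded(cls):
        # Default: assume dependencies are loaded (backward compatible).
        return True

    @classmethod
    def applies(cls, obj):
        # Default: a plain isinstance check against the declared types.
        return isinstance(obj, cls.types)


class XArrayInterface(Interface):
    """Heavy-dependency interface that defers importing xarray."""

    @classmethod
    def loaded(cls):
        # Only consider this interface once the user has imported xarray.
        return 'xarray' in sys.modules

    @classmethod
    def applies(cls, obj):
        if not cls.loaded():
            return False
        import xarray as xr  # deferred: only runs once xarray is loaded
        return isinstance(obj, (xr.Dataset, xr.DataArray))
```

Because applies only touches the heavy library after loaded has confirmed it is in sys.modules, merely importing the host package never pulls in xarray.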

@philippjfr philippjfr force-pushed the import_times branch 5 times, most recently from 505f89e to 17188b5 Compare October 7, 2018 19:27
@philippjfr (Member Author)

Ready to review and merge.

@philippjfr (Member Author)

Just going to note that this PR shaved about 10 minutes off running nbsmoke on all our current notebook examples, user guides, etc. (244 notebooks in total). Unfortunately it's still slightly too long a job for Travis.

@philippjfr (Member Author)

Also please take a second to appreciate the coverage of 88.888% 😃

@jlstevens (Contributor)

... unfortunately it's still slightly too long a job for Travis.

So roughly how long does it take?

from ..ndmapping import OrderedDict
from ..spaces import HoloMap, DynamicMap
from ..util import (basestring, dimension_range as d_range, get_param_values,
                    isfinite, process_ellipses, unique_iterator, wrap_tuple)
@philippjfr (Member Author)

I had some weird issues where importing via from .. import util picked up the wrong utilities, hence I did this.

@jlstevens (Contributor)

I see. I would prefer to figure out the issue rather than switching to the unqualified version...

@philippjfr (Member Author)

So roughly how long does it take?

Travis jobs time out at 50 minutes; by that time it's about 90% of the way through the notebooks.

if 'dask' in sys.modules:
    import dask.dataframe as dd
else:
    dd = None
@jlstevens (Contributor) commented Oct 8, 2018

Could also do:

dd = None
if 'dask' in sys.modules:
    import dask.dataframe as dd

Saves a line, and it is maybe even more explicit that dd always exists...

@jlstevens (Contributor)

This isn't a strong opinion but I do think my suggested version is slightly better. Don't you?

@philippjfr (Member Author)

Basically have no preference here.

array_types += (da.Array,)
return array_types

def get_dask_array():
@jlstevens (Contributor)

I think get_dask_array_module or dask_array_module or even da_module or get_da_module would be clearer. You aren't getting the data type, you are getting a module...
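A sketch of what the renamed helper might look like, assuming the sys.modules-based deferral used elsewhere in this PR; the surrounding get_array_types function is a hypothetical reconstruction from the diff context above, not the PR's actual code:

```python
import sys

import numpy as np


def dask_array_module():
    """Return the dask.array module if dask is already imported, else None."""
    if 'dask' in sys.modules:
        import dask.array as da
        return da
    return None


def get_array_types():
    """Collect the array types the data interfaces should recognize."""
    array_types = (np.ndarray,)
    da = dask_array_module()
    if da is not None:
        # Only reached once the user has imported dask themselves.
        array_types += (da.Array,)
    return array_types
```

The naming point stands either way: the function returns a module object, not an array, so a name like dask_array_module describes it accurately.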

@philippjfr (Member Author)

Sounds good.

@@ -210,7 +208,7 @@ class Dataset(Element):

     def __init__(self, data, kdims=None, vdims=None, **kwargs):
         if isinstance(data, Element):
-            pvals = util.get_param_values(data)
+            pvals = get_param_values(data)
@jlstevens (Contributor)

Why remove the namespace qualification? Does it affect the import times? (It shouldn't!)

@philippjfr (Member Author) commented Oct 8, 2018

See my comment above, I can have another look at this but:

I had some weird issues where importing via from .. import util picked up the wrong utilities, hence I did this.

             return dim.range
         elif dim in self.dimensions() and data_range and len(self):
             lower, upper = self.interface.range(self, dim)
         else:
             lower, upper = (np.NaN, np.NaN)
         if not dimension_range:
             return lower, upper
-        return util.dimension_range(lower, upper, dim.range, dim.soft_range)
+        return d_range(lower, upper, dim.range, dim.soft_range)
@jlstevens (Contributor)

I think I prefer the old version... I don't see why the name d_range should be introduced for a single use (at least in this PR diff...).

@philippjfr (Member Author)

Again, see the comment above.

@jlstevens (Contributor)

Well, really it is two things:

  1. For some reason you have problems with the qualified import (the root problem, which would ideally be fixed)
  2. You renamed dimension_range to d_range to avoid a local variable name clash.

This PR would be greatly simplified if 1 can be fixed!

@philippjfr (Member Author)

Having nightmares with this, but I should be able to fix it.


    datatype = 'xarray'

    @classmethod
    def loaded(cls):
@jlstevens (Contributor)

This method makes sense but based on the way it is used, shouldn't it be called available instead of loaded? I thought the idea was that you can check sys.modules for availability and it is only loaded once actually imported...

@philippjfr (Member Author)

Slightly confused here: what is the difference between the semantics of available and loaded in your mind? loaded currently checks whether the dependency has been loaded; it does not check whether the library is importable (which is my interpretation of "available"), because there is no way to check whether a library is importable without importing it.
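For concreteness, the two semantics being debated could be sketched like this. This is a hypothetical illustration, not code from the PR; as an aside, importlib.util.find_spec can locate an installed top-level module without fully importing it, though it still pays a filesystem cost:

```python
import importlib.util
import sys


def loaded(module_name):
    """'loaded' semantics: True once the user has actually imported it."""
    return module_name in sys.modules


def available(module_name):
    """'available' semantics: True if the module is installed and importable,
    whether or not it has been imported yet."""
    return importlib.util.find_spec(module_name) is not None
```

The PR's Interface.loaded implements the first of these, which is why the name fits.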

@jlstevens (Contributor)

I understand it now, loaded is correct.

@jlstevens (Contributor)

I've made a few suggestions and everything is clearer now that I understand you expect that the user will populate sys.modules (unless they use an interface with a literal constructor).

The main issue is the removal of the qualified namespace. Restoring that would greatly simplify this PR and keep things cleaner overall imho.

Otherwise, I'm very happy with the import time speed up and am happy to merge.

@jlstevens (Contributor)

Travis jobs time out at 50 minutes, by that time it's about 90% through the notebooks.

I think things would be better if we could at least nbsmoke the user/getting-started guides. If we skip the elements (or maybe only do those on doc builds?), can we get it to around 20 minutes (i.e. around how long it takes to run test builds now)?

@philippjfr (Member Author)

I think things would be better if we could at least nbsmoke the user/getting-started guides. If we skip the elements (or maybe only do those on doc builds?), can we get it to around 20 minutes (i.e. around how long it takes to run test builds now)?

I think we can probably get them all working in a travis cron job.

@jlstevens (Contributor)

All looks good now, thanks! I'll merge when the tests pass.

@jlstevens jlstevens merged commit d6aa608 into master Oct 8, 2018
@jbednar (Member) commented Oct 8, 2018

Travis jobs time out at 50 minutes; by that time it's about 90% of the way through the notebooks.

Can we split them into two separate jobs in the .travis.yml, or is Travis counting the entire time of all the CI targets?

@philippjfr (Member Author)

Can we split them into two separate jobs in the .travis.yml, or is Travis counting the entire time of all the CI targets?

We can do that but that'll probably wait until we've set up the new architecture. Don't really want to make the travis.yml more complex only to then have to rewrite it all.

philippjfr added a commit that referenced this pull request Oct 8, 2018
@philippjfr philippjfr deleted the import_times branch November 12, 2018 18:01

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024