Improve import times #3055
Ready to review and merge.
Just going to note that this PR shaved about 10 minutes off running nbsmoke on all our current notebook examples, user guides, etc. (244 notebooks in total). Unfortunately, it's still slightly too long a job for Travis.
Also please take a second to appreciate the coverage of 88.888% 😃
So roughly how long does it take?
holoviews/core/data/__init__.py
Outdated
from ..ndmapping import OrderedDict
from ..spaces import HoloMap, DynamicMap
from ..util import (basestring, dimension_range as d_range, get_param_values,
                    isfinite, process_ellipses, unique_iterator, wrap_tuple)
I had some weird issues with `from .. import util` picking up the wrong utilities, hence I did this.
I see. I would prefer to figure out the issue rather than switching to the unqualified version...
Travis jobs time out at 50 minutes; by that time it's about 90% through the notebooks.
holoviews/core/util.py
Outdated
if 'dask' in sys.modules:
    import dask.dataframe as dd
else:
    dd = None
Could also do:
dd = None
if 'dask' in sys.modules:
    import dask.dataframe as dd
Saves a line and is maybe even more explicit that `dd` always exists...
This isn't a strong opinion but I do think my suggested version is slightly better. Don't you?
Basically have no preference here.
holoviews/core/data/util.py
Outdated
array_types += (da.Array,)
return array_types


def get_dask_array():
I think `get_dask_array_module` or `dask_array_module` or even `da_module` or `get_da_module` would be clearer. You aren't getting the data type, you are getting a module...
Sounds good.
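A minimal sketch of the kind of module getter being discussed, using the suggested name `dask_array_module`. The bodies below are illustrative assumptions, not the code from this PR: the helper only looks in `sys.modules`, so calling it never triggers the dask import itself, and `get_array_types` with its `base_types` argument is a hypothetical stand-in for the function shown in the diff.

```python
import sys


def dask_array_module():
    """Return the dask.array module if it is already loaded, else None.

    Illustrative helper: it only inspects sys.modules and never imports.
    """
    return sys.modules.get('dask.array')


def get_array_types(base_types=(list,)):
    """Build the tuple of supported array types without importing dask.

    `base_types` is a hypothetical placeholder for whatever array types
    are always supported.
    """
    array_types = tuple(base_types)
    da = dask_array_module()
    if da is not None:
        # dask.array was imported elsewhere, so its Array type is usable.
        array_types += (da.Array,)
    return array_types
```

Returning the module (rather than a type) is what motivates the naming suggestion: callers get `None` when dask has not been imported and can branch on that.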
holoviews/core/data/__init__.py
Outdated
@@ -210,7 +208,7 @@ class Dataset(Element):

     def __init__(self, data, kdims=None, vdims=None, **kwargs):
         if isinstance(data, Element):
-            pvals = util.get_param_values(data)
+            pvals = get_param_values(data)
Why remove the namespace qualification? Does it affect the import times? (It shouldn't!)
See my comment above. I can have another look at this, but:
I had some weird issues with `from .. import util` picking up the wrong utilities, hence I did this.
holoviews/core/data/__init__.py
Outdated
             return dim.range
         elif dim in self.dimensions() and data_range and len(self):
             lower, upper = self.interface.range(self, dim)
         else:
             lower, upper = (np.NaN, np.NaN)
         if not dimension_range:
             return lower, upper
-        return util.dimension_range(lower, upper, dim.range, dim.soft_range)
+        return d_range(lower, upper, dim.range, dim.soft_range)
I think I prefer the old version... I don't see why the name `d_range` should be introduced for a single use (at least in this PR diff...)
Again, see the comment above.
Well, really it is two things:
- For some reason you have problems with the qualified import (the root problem, which would ideally be fixed).
- You renamed to `d_range` to avoid a local variable name clash.
This PR would be greatly simplified if the first of these can be fixed!
Having nightmares with this, but I should be able to fix it.
    datatype = 'xarray'

    @classmethod
    def loaded(cls):
This method makes sense, but based on the way it is used, shouldn't it be called `available` instead of `loaded`? I thought the idea was that you can check `sys.modules` for availability and it is only loaded once actually imported...
Slightly confused here: what is the difference between the semantics of `available` and `loaded` in your mind? `loaded` currently checks whether the dependency has been loaded; it does not check whether the library is importable (which is my interpretation of "available"), because there is no way to check whether a library is importable without importing it.
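The two meanings can be sketched side by side. `is_loaded` mirrors the `sys.modules` check described here. `is_importable` is a hypothetical helper, not part of this PR, built on Python 3's `importlib.util.find_spec`, which can locate a top-level module without executing it. That route is only a partial answer: for dotted names `find_spec` imports the parent package, and it does not exist on Python 2.

```python
import importlib.util
import sys


def is_loaded(name):
    """True if the module has already been imported in this process."""
    return name in sys.modules


def is_importable(name):
    """True if a top-level module can be located, without executing it.

    Hypothetical helper for illustration; Python 3.4+ only.
    """
    return importlib.util.find_spec(name) is not None
```

Under this reading, `loaded` answers "has someone already paid the import cost?", which is the question that matters for keeping import times low.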
I understand it now, `loaded` is correct.
I've made a few suggestions and everything is clearer now that I understand you expect that the user will populate The main issue is the removal of the qualified namespace. Restoring that would greatly simplify this PR and keep things cleaner overall imho. Otherwise, I'm very happy with the import time speed up and am happy to merge. |
I think things would be better if we could at least nbsmoke the user/getting started guides. If we skip the elements (or maybe only do those on doc builds?), can we get it to around 20 minutes (i.e. around how long it takes to run test builds now)?
I think we can probably get them all working in a travis cron job.
All looks good now, thanks! I'll merge when the tests pass.
Can we split them into two separate jobs in the .travis.yml, or is Travis counting the entire time of all the CI targets?
We can do that, but that'll probably wait until we've set up the new architecture. Don't really want to make the travis.yml more complex only to then have to rewrite it all.
This PR attempts to improve import times by deferring imports of:
element/graphs.py)
Here are the current import timings:
In addition to some utilities to load the backends, this PR works by declaring two new methods on `Interface`s. The first method, `Interface.loaded`, is called to check whether an interface's dependencies have been loaded; if not, the interface is skipped. It defaults to true for backward compatibility. The second method, `Interface.applies`, is used to check whether the Interface applies to the supplied data object. By default this method simply checks whether the data is one of the `Interface.types`, maintaining backward compatibility. However, interfaces with heavy dependencies can override this method to defer the import.
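A simplified sketch of how the two methods could fit together. Only the names `Interface`, `loaded`, `applies` and `types` come from the description above; the class bodies, the `isinstance`-based default, `DictInterface`, `XArrayInterface` and `select_interface` are illustrative assumptions, not the HoloViews implementation.

```python
import sys


class Interface(object):
    """Simplified stand-in for a data interface (illustrative only)."""

    types = ()

    @classmethod
    def loaded(cls):
        # Default: no heavy dependency, so always considered loaded.
        return True

    @classmethod
    def applies(cls, obj):
        # Default (assumed here): check the object against declared types.
        return isinstance(obj, cls.types)


class DictInterface(Interface):
    types = (dict,)


class XArrayInterface(Interface):

    @classmethod
    def loaded(cls):
        # Only true once something else has imported xarray.
        return 'xarray' in sys.modules

    @classmethod
    def applies(cls, obj):
        if not cls.loaded():
            return False
        import xarray as xr  # already in sys.modules, so this is cheap
        return isinstance(obj, (xr.Dataset, xr.DataArray))


def select_interface(obj, interfaces):
    """Return the first loaded interface that applies to obj, else None."""
    for interface in interfaces:
        if interface.loaded() and interface.applies(obj):
            return interface
    return None
```

With this arrangement, `select_interface({'a': 1}, [XArrayInterface, DictInterface])` skips the xarray interface unless xarray has already been imported, and the import of xarray never happens as a side effect of the check.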