
Improve import times #3055

Merged: 5 commits merged into master from import_times on Oct 8, 2018
Conversation

@philippjfr (Member) commented Oct 6, 2018

This PR attempts to improve import times by deferring imports of:

  • xarray
  • dask
  • datashader and therefore numba (these were accidentally imported automatically in element/graphs.py)

Here are the current import timings:

  • Before this PR (with iris): 2.5 seconds
  • Before this PR (without iris): 1.5 seconds
  • After this PR: 0.55 seconds

In addition to some utilities to load the backends, this PR works by declaring two new methods on Interfaces. The first, Interface.loaded, is called to check whether an interface's dependencies have been loaded; if not, the interface is skipped. This defaults to True for backward compatibility. The second, Interface.applies, is used to check whether the Interface applies to the supplied data object. By default this method simply checks whether the data is one of the Interface.types, maintaining backward compatibility; however, interfaces with heavy dependencies can override this method to defer the import.
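The pattern described above can be sketched roughly as follows. Only the loaded/applies method names come from this PR; the class bodies (including the XArrayInterface example) are illustrative assumptions, not code from the diff:

```python
import sys


class Interface:
    """Base data interface; a sketch of the deferral pattern described above."""
    types = ()

    @classmethod
    def loaded(cls):
        # Default: assume dependencies are loaded (backward compatible).
        return True

    @classmethod
    def applies(cls, obj):
        # Default: a plain isinstance check against the declared types.
        return isinstance(obj, cls.types)


class XArrayInterface(Interface):
    """Heavy-dependency interface that defers importing xarray."""

    @classmethod
    def loaded(cls):
        # Only consider this interface once the user has imported xarray.
        return 'xarray' in sys.modules

    @classmethod
    def applies(cls, obj):
        if not cls.loaded():
            return False
        import xarray as xr  # deferred: only runs once xarray is loaded
        return isinstance(obj, (xr.Dataset, xr.DataArray))
```

Because applies only touches the heavy library after loaded has confirmed it is in sys.modules, merely importing the host package never pulls in xarray.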

@philippjfr philippjfr force-pushed the import_times branch 5 times, most recently from 505f89e to 17188b5 Compare October 7, 2018 19:27
@philippjfr (Member Author)

Ready to review and merge.

@philippjfr (Member Author)

Just going to note that this PR shaved about 10 minutes off running nbsmoke on all our current notebook examples, user guides, etc. (244 notebooks in total). Unfortunately it's still slightly too long a job for Travis.

@philippjfr (Member Author)

Also please take a second to appreciate the coverage of 88.888% 😃

@jlstevens (Contributor)

... unfortunately it's still slightly too long a job for Travis.

So roughly how long does it take?

from ..ndmapping import OrderedDict
from ..spaces import HoloMap, DynamicMap
from ..util import (basestring, dimension_range as d_range, get_param_values,
                    isfinite, process_ellipses, unique_iterator, wrap_tuple)
@philippjfr (Member Author)

I had some weird issues where importing via from .. import util picked up the wrong utilities, hence I did this.

@jlstevens (Contributor)

I see. I would prefer to figure out the issue rather than switching to the unqualified version...

@philippjfr (Member Author)

So roughly how long does it take?

Travis jobs time out at 50 minutes; by that time it's about 90% of the way through the notebooks.

if 'dask' in sys.modules:
    import dask.dataframe as dd
else:
    dd = None
@jlstevens (Contributor) commented Oct 8, 2018

Could also do:

dd = None
if 'dask' in sys.modules:
    import dask.dataframe as dd

Saves a line, and it is maybe even more explicit that dd always exists...

@jlstevens (Contributor)

This isn't a strong opinion but I do think my suggested version is slightly better. Don't you?

@philippjfr (Member Author)

Basically have no preference here.

array_types += (da.Array,)
return array_types

def get_dask_array():
@jlstevens (Contributor)

I think get_dask_array_module or dask_array_module or even da_module or get_da_module would be clearer. You aren't getting the data type, you are getting a module...
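A sketch of what the renamed helper might look like, assuming the sys.modules-based deferral used elsewhere in this PR; the surrounding get_array_types function is a hypothetical reconstruction from the diff context above, not the PR's actual code:

```python
import sys

import numpy as np


def dask_array_module():
    """Return the dask.array module if dask is already imported, else None."""
    if 'dask' in sys.modules:
        import dask.array as da
        return da
    return None


def get_array_types():
    """Collect the array types the data interfaces should recognize."""
    array_types = (np.ndarray,)
    da = dask_array_module()
    if da is not None:
        # Only reached once the user has imported dask themselves.
        array_types += (da.Array,)
    return array_types
```

The naming point stands either way: the function returns a module object, not an array, so a name like dask_array_module describes it accurately.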

@philippjfr (Member Author)

Sounds good.

@@ -210,7 +208,7 @@ class Dataset(Element):

     def __init__(self, data, kdims=None, vdims=None, **kwargs):
         if isinstance(data, Element):
-            pvals = util.get_param_values(data)
+            pvals = get_param_values(data)
@jlstevens (Contributor)

Why remove the namespace qualification? Does it affect the import times? (It shouldn't!)

@philippjfr (Member Author) commented Oct 8, 2018

See my comment above, I can have another look at this but:

I had some weird issues where importing via from .. import util picked up the wrong utilities, hence I did this.

             return dim.range
         elif dim in self.dimensions() and data_range and len(self):
             lower, upper = self.interface.range(self, dim)
         else:
             lower, upper = (np.NaN, np.NaN)
         if not dimension_range:
             return lower, upper
-        return util.dimension_range(lower, upper, dim.range, dim.soft_range)
+        return d_range(lower, upper, dim.range, dim.soft_range)
@jlstevens (Contributor)

I think I prefer the old version... I don't see why the name d_range should be introduced for a single use (at least in this PR diff...).

@philippjfr (Member Author)

Again, see the comment above.

@jlstevens (Contributor)

Well, really it is two things:

  1. For some reason you have problems with the qualified import (the root problem, which would ideally be fixed)
  2. You renamed dimension_range to d_range to avoid a local variable name clash.

This PR would be greatly simplified if 1 can be fixed!

@philippjfr (Member Author)

Having nightmares with this, but I should be able to fix it.


    datatype = 'xarray'

    @classmethod
    def loaded(cls):
@jlstevens (Contributor)

This method makes sense but based on the way it is used, shouldn't it be called available instead of loaded? I thought the idea was that you can check sys.modules for availability and it is only loaded once actually imported...

@philippjfr (Member Author)

Slightly confused here: what is the difference between the semantics of available and loaded in your mind? loaded currently checks whether the dependency has been loaded; it does not check whether the library is importable (which is my interpretation of "available"), because there is no way to check whether a library is importable without importing it.
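For concreteness, the two semantics being debated could be sketched like this. This is a hypothetical illustration, not code from the PR; as an aside, importlib.util.find_spec can locate an installed top-level module without fully importing it, though it still pays a filesystem cost:

```python
import importlib.util
import sys


def loaded(module_name):
    """'loaded' semantics: True once the user has actually imported it."""
    return module_name in sys.modules


def available(module_name):
    """'available' semantics: True if the module is installed and importable,
    whether or not it has been imported yet."""
    return importlib.util.find_spec(module_name) is not None
```

The PR's Interface.loaded implements the first of these, which is why the name fits.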

@jlstevens (Contributor)

I understand it now, loaded is correct.

@jlstevens (Contributor)

I've made a few suggestions and everything is clearer now that I understand you expect that the user will populate sys.modules (unless they use an interface with a literal constructor).

The main issue is the removal of the qualified namespace. Restoring that would greatly simplify this PR and keep things cleaner overall imho.

Otherwise, I'm very happy with the import time speed up and am happy to merge.

@jlstevens (Contributor)

Travis jobs time out at 50 minutes, by that time it's about 90% through the notebooks.

I think things would be better if we could at least nbsmoke the user/getting-started guides. If we skip the elements (or maybe only do those on doc builds?), can we get it to around 20 minutes (i.e. around how long it takes to run test builds now)?

@philippjfr (Member Author)

I think things would be better if we could at least nbsmoke the user/getting-started guides. If we skip the elements (or maybe only do those on doc builds?), can we get it to around 20 minutes (i.e. around how long it takes to run test builds now)?

I think we can probably get them all working in a travis cron job.

@jlstevens (Contributor)

All looks good now, thanks! I'll merge when the tests pass.

@jlstevens jlstevens merged commit d6aa608 into master Oct 8, 2018
@jbednar (Member) commented Oct 8, 2018

Travis jobs time out at 50 minutes; by that time it's about 90% of the way through the notebooks.

Can we split them into two separate jobs in the .travis.yml, or is Travis counting the entire time of all the CI targets?

@philippjfr (Member Author)

Can we split them into two separate jobs in the .travis.yml, or is Travis counting the entire time of all the CI targets?

We can do that but that'll probably wait until we've set up the new architecture. Don't really want to make the travis.yml more complex only to then have to rewrite it all.

philippjfr added a commit that referenced this pull request Oct 8, 2018
@philippjfr philippjfr deleted the import_times branch November 12, 2018 18:01

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 24, 2024