Speed up NdMapping.groupby #349

philippjfr · 2015-12-11T04:24:38Z

This PR refactors the NdMapping.groupby operation into a separate function and provides an alternative implementation based on pandas, which is significantly faster for large datasets. You can see the linear performance scaling of the old implementation and might just make out the sublinear scaling of pandas, which becomes very significant for large datasets (>10000 items). This is a temporary workaround until we come up with a general data API for NdMapping types, which is being discussed in #347.
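For readers unfamiliar with the operation, here is a minimal sketch of grouping a tuple-keyed mapping by a subset of its key dimensions, once in pure Python and once with pandas. It is not the code from this PR: the function names, the `items`/`key_dims`/`group_dims` arguments and the `_value` column are assumptions made for illustration, and the sketch assumes no key dimension is itself called `_value`.

```python
from collections import OrderedDict

try:
    import pandas as pd
except ImportError:  # pandas is optional in this sketch
    pd = None


def groupby_python(items, key_dims, group_dims):
    """Group (key_tuple, value) pairs by the dimensions in group_dims."""
    group_idx = [key_dims.index(d) for d in group_dims]
    rest_idx = [i for i in range(len(key_dims)) if i not in group_idx]
    groups = OrderedDict()
    for key, value in items:
        group_key = tuple(key[i] for i in group_idx)
        inner_key = tuple(key[i] for i in rest_idx)
        groups.setdefault(group_key, []).append((inner_key, value))
    return groups


def groupby_pandas(items, key_dims, group_dims):
    """Same grouping, delegating the split to DataFrame.groupby."""
    group_idx = [key_dims.index(d) for d in group_dims]
    rest_idx = [i for i in range(len(key_dims)) if i not in group_idx]
    # One row per item: the key dimensions as columns plus the value.
    df = pd.DataFrame([key for key, _ in items], columns=key_dims)
    df["_value"] = [value for _, value in items]
    groups = OrderedDict()
    # sort=False keeps groups in order of first appearance.
    for group_key, frame in df.groupby(list(group_dims), sort=False):
        if not isinstance(group_key, tuple):
            # Some pandas versions yield a scalar for a single grouper.
            group_key = (group_key,)
        rows = frame.itertuples(index=False, name=None)
        groups[group_key] = [
            (tuple(row[i] for i in rest_idx), row[-1]) for row in rows
        ]
    return groups


items = [((0, "a"), 1.0), ((0, "b"), 2.0), ((1, "a"), 3.0), ((1, "b"), 4.0)]
by_label = groupby_python(items, ["x", "label"], ["label"])
```

Both helpers return an ordered mapping from group key to the remaining `(inner_key, value)` pairs, so one can be swapped for the other depending on whether pandas is installed.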

jlstevens · 2015-12-11T13:40:54Z

holoviews/core/util.py

+    import pandas
+    ndmapping_groupby = ndmapping_groupby_pandas
+except:
+    ndmapping_groupby = ndmapping_groupby_python


I would make this a parameterized function (i.e. a class) that uses one of the two possible methods in __call__. My only other comment is that we need some docstrings here...
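A rough sketch of the shape being suggested, for illustration only: a callable class that picks one of two methods in `__call__`. A plain Python class stands in for a ParameterizedFunction so the example has no dependency on param, and the class name, the `use_pandas` option and the "group by first key element" behaviour are all assumptions, not the code that was merged.

```python
from collections import OrderedDict

try:
    import pandas as pd
except ImportError:  # fall back to pure Python when pandas is missing
    pd = None


class GroupByFirst:
    """Group (key_tuple, value) pairs by the first element of each key.

    Hypothetical stand-in for a parameterized function with two backends.
    """

    def __init__(self, use_pandas=None):
        # Default to pandas only when it could be imported.
        self.use_pandas = (pd is not None) if use_pandas is None else use_pandas

    def __call__(self, items):
        if self.use_pandas and pd is None:
            raise ImportError("pandas was requested but is not installed")
        method = self._groupby_pandas if self.use_pandas else self._groupby_python
        return method(list(items))

    def _groupby_python(self, items):
        groups = OrderedDict()
        for key, value in items:
            groups.setdefault(key[0], []).append((key[1:], value))
        return groups

    def _groupby_pandas(self, items):
        # Let pandas split the row positions, then look the items back up.
        df = pd.DataFrame({
            "group": [key[0] for key, _ in items],
            "pos": list(range(len(items))),
        })
        groups = OrderedDict()
        for group, frame in df.groupby("group", sort=False):
            groups[group] = [(items[i][0][1:], items[i][1]) for i in frame["pos"]]
        return groups


items = [(("a", 0), 1.0), (("b", 0), 2.0), (("a", 1), 3.0)]
grouped = GroupByFirst(use_pandas=False)(items)
```

Keeping both backends behind one callable means calling code does not need to know which implementation is active, and the choice can be overridden per instance.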

jlstevens · 2015-12-11T13:48:40Z

Definitely a very valuable PR: it cleans up groupby on MultiDimensionalMapping, moves useful functionality into util and most importantly offers a major performance improvement (for people with pandas installed) with very little code.

I'm happy to implement my suggestion of turning ndmapping_groupby into a parameterized function once the tests are passing...

philippjfr · 2015-12-12T03:38:46Z

I'm now done with this, I'd be happy if you refactored it into ParameterizedFunction, then we can merge.

Just plotted the improvement factor as a function of samples, huge difference for large N:

jlstevens · 2015-12-12T11:40:21Z

Great! The key thing is that all the tests are passing now...

My only comment now is whether you are happy for me to make this into a parameterized function? Or do you object to having a single parameterized function for groupby?

Shouldn't take long to do and I am happy to change it if you are busy.

philippjfr · 2015-12-12T12:55:20Z

> My only comment now is whether you are happy for me to make this into a parameterized function? Or do you object to having a single parameterized function for groupby?

Must have missed my comment somehow:

> I'm now done with this, I'd be happy if you refactored it into [a] ParameterizedFunction, then we can merge.

jlstevens · 2015-12-12T13:30:59Z

Sorry yes - I skimmed your reply too quickly. I'll do the final refactor now.

jlstevens · 2015-12-12T14:54:30Z

If the tests pass now, I'll go ahead and merge.

jbednar · 2015-12-12T16:22:18Z

Excellent!

Added faster pandas based NdMapping.groupby helper function

b42241a

philippjfr added this to the v1.4.1 milestone Dec 11, 2015

jlstevens reviewed Dec 11, 2015

philippjfr added 5 commits December 12, 2015 02:03

Moved OrderedDict conditional import into core.util

166dcad

Fixes and cleanup of NdElement

7023605

Fixed sorting in NdMapping pandas groupby implementation

214ed15

Fixes for pure Python NdElement groupby

4698695

Disabled item_checks in NdMapping.groupby

c22a10b

philippjfr and others added 2 commits December 12, 2015 13:44

Small fix to ndmapping_groupby_pandas function

a76f744

Avoids hardcoding the 'Index' dimension used for NdElement types

Refactored ndmapping_groupby into a parameterized function

e6b9320

jlstevens added a commit that referenced this pull request Dec 12, 2015

Merge pull request #349 from ioam/ndmapping_groupby

6b9fe08

Significant speed up of NdMapping.groupby

jlstevens merged commit 6b9fe08 into master Dec 12, 2015

jlstevens deleted the ndmapping_groupby branch December 12, 2015 15:17
