Computation of the H-statistic takes unusually long in h2o compared to other implementations.

I benchmarked on a 10,000-row dataset generated with sklearn-gbmi (the package we reference): the computation takes ~30 seconds with sklearn-gbmi but 30 minutes with h2o. The results differ slightly, so I wonder whether sklearn-gbmi is doing some sampling. It could be useful to expose a sampling parameter, even if that makes the results slightly less stable; people typically aren't looking for the exact H value anyway, just a rough measure of interaction strength.
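To illustrate why row subsampling helps, here is a minimal sketch of Friedman's pairwise H-statistic computed by brute force, with an optional `sample` knob like the one requested above. The helpers `pd_values` and `h_statistic` are hypothetical names for this sketch (not the h2o or sklearn-gbmi API); the empirical partial-dependence step costs O(n²) model predictions, which is exactly what subsampling cuts down.

```python
import time
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

def pd_values(model, X, cols):
    """Centered empirical partial dependence of `model` on `cols`,
    evaluated at each row of X (O(n^2) predictions overall)."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        Xi = X.copy()
        Xi[:, cols] = X[i, cols]   # clamp `cols` to row i's values
        pd[i] = model.predict(Xi).mean()
    return pd - pd.mean()

def h_statistic(model, X, j, k, sample=None, seed=0):
    """Friedman-Popescu pairwise H^2 for features j and k.
    `sample` rows are drawn without replacement first; this is the
    sampling parameter the issue proposes exposing."""
    if sample is not None and sample < X.shape[0]:
        rng = np.random.default_rng(seed)
        X = X[rng.choice(X.shape[0], sample, replace=False)]
    pd_jk = pd_values(model, X, [j, k])
    pd_j = pd_values(model, X, [j])
    pd_k = pd_values(model, X, [k])
    den = np.sum(pd_jk ** 2)
    return float(np.sum((pd_jk - pd_j - pd_k) ** 2) / den) if den > 0 else 0.0

# Friedman #1 data has a genuine x0*x1 interaction term.
X, y = make_friedman1(n_samples=10_000, random_state=0)
gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

t0 = time.time()
h2 = h_statistic(gbm, X, 0, 1, sample=200)  # 200 rows instead of 10,000
print(f"H^2(x0, x1) ~ {h2:.3f} in {time.time() - t0:.1f}s")
```

With `sample=200` the three partial-dependence passes need 600 small predictions instead of 30,000 full-size ones, so the estimate is noisy but orders of magnitude faster, which matches the "rough strength of interaction" use case.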
* Initial speed up
* Additional speedups
* adjust test duration to reflect further improvements
* Increase maximum duration (jenkins is sometimes slow)
* rename test file to move it to large stage
This takes way too long: