DAAL-based RandomForestRegressor is known to give non-deterministic results for the same inputs

Consequently, the `check_fit_idempotent` test, new in sklearn 0.21 and run for all estimators,
fails for daal4py.sklearn.ensemble.RandomForestRegressor.

The failure is small:

```
Mismatch: 15%
Max absolute difference: 5.9604645e-08
Max relative difference: 3.3125482e-07
 x: array([ 0.316434,  0.428736, -0.397625,  0.521435,  0.150143,  0.119337,
           -0.55916 , -0.38844 ,  0.509067, -2.236972,  0.015745,  0.138834,
           -0.053688, -0.949258, -0.595061, -0.044984,  0.319812, -1.284615,
           -1.002518,  0.529771], dtype=float32)
 y: array([ 0.316434,  0.428736, -0.397625,  0.521435,  0.150143,  0.119337,
           -0.55916 , -0.38844 ,  0.509067, -2.236972,  0.015745,  0.138834,
           -0.053688, -0.949258, -0.595061, -0.044984,  0.319812, -1.284615,
           -1.002518,  0.529771], dtype=float32)
```

and goes away if the comparison tolerance is slightly relaxed.
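
To illustrate why a slightly looser tolerance is enough, here is a minimal sketch using only the standard library. It assumes the check compares with NumPy's default relative tolerance of 1e-7 and the rule documented for `numpy.testing.assert_allclose`, `|actual - desired| <= atol + rtol * |desired|`. The helper names (`to_float32`, `next_float32`, `within_tolerance`) and the relaxed tolerance of four float32 epsilons are illustrative choices, not part of sklearn or daal4py. The reported maximum absolute difference, 5.9604645e-08, equals 2**-24, which is the spacing between adjacent float32 values in [0.5, 1).

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def next_float32(value):
    """Return the next float32 above a positive, finite float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def within_tolerance(actual, desired, rtol, atol=0.0):
    # Rule documented for numpy.testing.assert_allclose:
    # |actual - desired| <= atol + rtol * |desired|
    return abs(actual - desired) <= atol + rtol * abs(desired)


FLOAT32_EPS = 2.0 ** -23  # machine epsilon of float32

# One of the predicted values shown above, and its float32 neighbour.
desired = to_float32(0.521435)
actual = next_float32(desired)

abs_diff = actual - desired
strict = within_tolerance(actual, desired, rtol=1e-7)
relaxed = within_tolerance(actual, desired, rtol=4 * FLOAT32_EPS)

print(abs_diff, strict, relaxed)
```

Two float32 predictions that differ only in the last bit already exceed a relative tolerance of 1e-7 in this range, while a tolerance of a few float32 epsilons accepts them.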

Intel(R) DAAL team is working on a fix.

For now, `check_fit_idempotent` is skipped for RandomForestRegressor.
oleksandr-pavlyk committed May 29, 2019
1 parent c719c36 commit 7ab470b
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions tests/test_estimators.py
@@ -1,6 +1,7 @@
import unittest

from sklearn.utils.estimator_checks import check_estimator
import sklearn.utils.estimator_checks

from daal4py.sklearn.neighbors import KNeighborsClassifier
from daal4py.sklearn.ensemble import RandomForestClassifier
@@ -15,7 +16,20 @@ def test_RandomForestClassifier(self):
        check_estimator(RandomForestClassifier)

    def test_RandomForestRegressor(self):
        # check_fit_idempotent is known to fail with DAAL's decision
        # forest regressor, due to different partitioning of data
        # between threads from run to run.
        # Hence skip that test
        def dummy(*args, **kwargs):
            pass
        try:
            saved = sklearn.utils.estimator_checks.check_fit_idempotent
            sklearn.utils.estimator_checks.check_fit_idempotent = dummy
        except AttributeError:
            saved = None
        check_estimator(RandomForestRegressor)
        if saved is not None:
            sklearn.utils.estimator_checks.check_fit_idempotent = saved


if __name__ == '__main__':