
[MRG] FIX Avoid accumulating forest predictions in non-threadsafe manner #9830

Merged
merged 2 commits into scikit-learn:master from jnothman:threadsafe on Oct 16, 2017

Conversation

@jnothman
Member

@jnothman jnothman commented Sep 26, 2017

Resolved #9393. See #9734.

Ping @ogrisel.

@jnothman jnothman changed the title FIX Avoid accumulating forest predictions in non-threadsafe manner [MRG] FIX Avoid accumulating forest predictions in non-threadsafe manner Sep 26, 2017
@jnothman
Member Author

@jnothman jnothman commented Sep 26, 2017

The problem with this is that it makes RF prediction more memory intensive by a factor of n_estimators / n_jobs (assuming n_samples for prediction is much larger than tree size).

I could do this summation over an iterator instead, which with joblib would only be used in the n_jobs=1 case... WDYT?
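
For illustration, a minimal sketch (not scikit-learn code) of the two memory profiles, assuming a hypothetical list of fitted per-tree estimators and an input X: summing over a generator keeps only one per-tree prediction alive at a time, whereas materialising a list holds all n_estimators of them at once:

import numpy as np

def predict_via_list(estimators, X):
    # All n_estimators prediction arrays exist simultaneously before summing.
    all_preds = [est.predict(X) for est in estimators]
    return np.sum(all_preds, axis=0) / len(estimators)

def predict_via_iterator(estimators, X):
    # Only one per-tree prediction array is alive at any moment.
    total = sum(est.predict(X) for est in estimators)
    return total / len(estimators)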

@jnothman jnothman added this to the 0.19.1 milestone Sep 26, 2017
@@ -378,13 +378,16 @@ def feature_importances_(self):
# ForestClassifier or ForestRegressor, because joblib complains that it cannot
# pickle it when placed there.

def accumulate_prediction(predict, X, out):

@amueller

amueller Sep 27, 2017
Member

The problem is that "out" is shared among threads, right?

@amueller

amueller Sep 27, 2017
Member

If this is the case I don't understand the memory increase. Shouldn't out be n_samples x n_classes and the memory increase therefore n_samples x n_classes x n_jobs?
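
For context, the pattern being discussed looks roughly like this (a simplified sketch, not the exact scikit-learn code): with joblib's threading backend every call receives the same out array, so all workers add into shared memory:

from joblib import Parallel, delayed
import numpy as np

def accumulate_prediction(predict, X, out):
    # "out" is the same array object in every call, so with the threading
    # backend each worker adds its tree's prediction into shared memory.
    out += predict(X)

# Hypothetical usage with a fitted forest and input X:
# out = np.zeros(X.shape[0])
# Parallel(n_jobs=4, backend="threading")(
#     delayed(accumulate_prediction)(tree.predict, X, out)
#     for tree in forest.estimators_)
# out /= len(forest.estimators_)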

@jnothman
Member Author

@jnothman jnothman commented Sep 27, 2017

At master, up to n_jobs prediction matrices are in memory at once. The accumulation is done, albeit badly, in each thread.

At this PR, all prediction matrices are returned from Parallel, i.e. n_estimators separate matrices. The accumulation is done afterwards.

A mutex lock on the accumulation would be sufficient to solve this, but we've tended to avoid them. Alternatively, a map that returns an iterator rather than a list (or a list of promises) would suffice to give roughly the current memory consumption with correctness assured.
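
A minimal sketch of the lock-based alternative mentioned above, assuming a shared out array and joblib's threading backend (illustrative only, not the final scikit-learn code):

import threading
import numpy as np
from joblib import Parallel, delayed

def accumulate_prediction(predict, X, out, lock):
    prediction = predict(X)   # the expensive part runs outside the lock
    with lock:                # only the cheap in-place add is serialised
        out += prediction

# Illustrative usage with a fitted forest and input X:
# lock = threading.Lock()
# out = np.zeros(X.shape[0])
# Parallel(n_jobs=4, backend="threading")(
#     delayed(accumulate_prediction)(tree.predict, X, out, lock)
#     for tree in forest.estimators_)
# out /= len(forest.estimators_)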

@lesteve
Member

@lesteve lesteve commented Sep 28, 2017

Before I forget: I am guessing this is partly reverting #8672. We should take a look at that PR to try to understand the motivations behind it.

@jnothman
Member Author

@jnothman jnothman commented Sep 28, 2017

@lesteve
Member

@lesteve lesteve commented Sep 28, 2017

Chatting with @ogrisel about this one, he thinks adding a threading.Lock is probably both the best and the simplest option. The summation of probabilities should not be a bottleneck, so the lock should not impact performance.

Just curious, can we actually reproduce the failure outside the Debian testing framework (mips is the failing architecture, I think)?

@jnothman
Member Author

@jnothman jnothman commented Sep 30, 2017

I tried to reproduce it the other day and failed. Our best chance? Lots of random features, max_features=1, very few samples, many estimators, large n_jobs. Let's use threading.Lock then.

@jnothman jnothman force-pushed the jnothman:threadsafe branch from 9e1b199 to f55fd89 Sep 30, 2017
@jnothman
Member Author

@jnothman jnothman commented Sep 30, 2017

Done

@lesteve
Member

@lesteve lesteve commented Oct 3, 2017

The changes look fine, but I am a bit uncomfortable merging this somewhat blindly, without having managed to reproduce the failure either in the scikit-learn tests or in a simpler snippet where you update a single numpy array with parallel summations.

Also, maybe it would be a good idea to run a quick-and-dirty benchmark to make sure that the lock does not impact performance?

Alternatively, a map that returns an iterator rather than a list (or a list of promises) would suffice to give roughly the current memory consumption with correctness assured.

FYI, I tried a little while ago to have Parallel return a generator. It was kind of working, except when closing the pool before consuming all the results, in which case it hung and I never figured out why. I may have another go at it at some point. My branch is here if you want more details.

@jmschrei
Member

@jmschrei jmschrei commented Oct 3, 2017

My understanding is that the issue arises when multiple trees try to add their individual predictions to the single out array at the same time, so that some updates are overwritten. Is this correct? It seems weird to me that the GIL does not prevent this from happening; do you know why that is the case?

@ogrisel
ogrisel approved these changes Oct 3, 2017
Member

@ogrisel ogrisel left a comment

It seems like a reasonable fix. GIL contention should not be too visible, as adding arrays is probably orders of magnitude faster than computing the predictions themselves.

@ogrisel
Member

@ogrisel ogrisel commented Oct 3, 2017

@jmschrei I don't see how the GIL would protect the += operation on numpy arrays.
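
One way to see the problem: the in-place add is a read-modify-write, and numpy may release the GIL during the element-wise loop, so two threads can interleave and silently drop one tree's contribution. A conceptual sketch (not a reliable reproducer; as the thread shows, this is hard to trigger in practice):

import numpy as np

out = np.zeros(4)
prediction = np.ones(4)

# Conceptually, "out += prediction" does the following for each element:
#   1. read the current value of out[i]
#   2. add prediction[i]
#   3. write the result back to out[i]
# If numpy releases the GIL during this loop, two threads can both perform
# step 1 with the same stale value, and whichever writes last overwrites
# (and thus loses) the other thread's addition.
out += prediction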

@jnothman
Member Author

@jnothman jnothman commented Oct 3, 2017

@jnothman
Member Author

@jnothman jnothman commented Oct 16, 2017

This reproduces the contention, showing that at master, n_jobs=4 is inconsistent and n_jobs=1 is consistent. Unfortunately, it seems to show that the current PR is also inconsistent and I cannot fathom why:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
import sklearn.ensemble
print(sklearn.ensemble.__path__)

X = np.random.rand(10, 100)
y = np.random.rand(10) * 100
rfr = RandomForestRegressor(n_estimators=1000, max_features=1, n_jobs=4).fit(X, y)
ys = []
for i in range(100):
    if i % 10 == 0:
        print(i)
    ys.append(rfr.set_params(n_jobs=4).predict(X))

n_failures = sum(np.any(np.diff(ys, axis=0), axis=1))
if n_failures:
    print('Broke up to %d times!' % n_failures)
else:
    print('Consistent!')
@jnothman
Member Author

@jnothman jnothman commented Oct 16, 2017

The answer is that the test is finding instability due to summation order, not threading contention. The differences are minuscule, both at master and in this PR. So the issue remains without a reliable test.

What I can say is that the effect of locking on performance is negligible.
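
A possible way to separate summation-order noise from genuine lost updates in the snippet above (illustrative only, reusing the ys list collected by the reproduction script): exact equality trips on either effect, while a tolerance-based comparison should only fail when contributions are actually dropped:

import numpy as np

runs = np.asarray(ys)   # ys as collected by the script above, shape (n_runs, n_samples)
exact = int(np.sum(np.any(runs[1:] != runs[0], axis=1)))
beyond_tol = int(np.sum(~np.all(np.isclose(runs[1:], runs[0], rtol=1e-7), axis=1)))
print('runs differing exactly: %d (may be mere summation-order noise)' % exact)
print('runs differing beyond tolerance: %d (would suggest lost updates)' % beyond_tol)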

@lesteve
Member

@lesteve lesteve commented Oct 16, 2017

Thanks a lot @jnothman, let's merge this one!

@lesteve lesteve merged commit 4a4b711 into scikit-learn:master Oct 16, 2017
6 checks passed
ci/circleci Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.17%)
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +3.82% compared to 4e6b403
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
lgtm analysis: Python No alert changes
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 17, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
5 participants