potentially running too many iterations in Ransac #8251

Closed
aivision2020 opened this Issue Jan 31, 2017 · 11 comments

@aivision2020
Contributor

aivision2020 commented Jan 31, 2017

In skimage/measure/fit.py, the number of iterations is controlled by max_trials. This is compared against _dynamic_max_trials, which is computed from the largest inlier group found so far. However, this comparison is made only when sample_inlier_num is larger than the biggest group found so far. When a large group is found early (but not large enough to stop), best_inlier_num is updated, but max_trials is not; the tighter bound is only checked again the next time a larger group is found.

The lines (line 926):

            if (
                best_inlier_num >= stop_sample_num
                or best_inlier_residuals_sum <= stop_residuals_sum
                or num_trials
                    >= _dynamic_max_trials(best_inlier_num, num_samples,
                                           min_samples, stop_probability)
            ):
                break

should be

            max_trials = min(max_trials,
                             _dynamic_max_trials(best_inlier_num, num_samples,
                                                 min_samples, stop_probability))
            if (
                best_inlier_num >= stop_sample_num
                or best_inlier_residuals_sum <= stop_residuals_sum
                or num_trials > max_trials
            ):
                break
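For context, the bound that _dynamic_max_trials computes is the standard RANSAC stopping bound: the number of trials N needed so that, with probability stop_probability, at least one sample of size min_samples is drawn entirely from inliers. A minimal standalone paraphrase (illustrative only, not the scikit-image or scikit-learn source):

```python
import math

def dynamic_max_trials(n_inliers, n_samples, min_samples, probability):
    """Standard RANSAC trial bound: N = log(1 - p) / log(1 - w**min_samples),
    where w is the current inlier ratio. Illustrative paraphrase of the
    formula, not the library implementation."""
    inlier_ratio = n_inliers / float(n_samples)
    nom = 1 - probability                      # acceptable failure probability
    denom = 1 - inlier_ratio ** min_samples    # P(one sample is contaminated)
    if denom == 0:
        return 0                # every point is an inlier; any sample works
    if nom == 0:
        return float('inf')    # certainty requested; never stop early
    return math.ceil(math.log(nom) / math.log(denom))

# With 80 inliers out of 100 points and 2-point samples, the bound is 5:
print(dynamic_max_trials(80, 100, 2, 0.99))  # -> 5
```

The point of the proposed change is that this bound shrinks every time a larger inlier group is found, so the budget should be re-capped on every iteration rather than only compared when a new best appears.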
@lesteve
Member

lesteve commented Feb 1, 2017

in skimage/measure/fit.py

I am confused, why are you mentioning skimage?

@lesteve
Member

lesteve commented Feb 1, 2017

It seems like the code you mention has some similarity with the scikit-learn code though, so maybe your issue is still relevant.

@lesteve
Member

lesteve commented Feb 1, 2017

I'm looking at the file:
https://github.com/scikit-image/scikit-image/blob/master/skimage/measure/fit.py

If you are using the scikit-image RANSAC code, you should open an issue in scikit-image. This is the scikit-learn tracker. They are two separate projects.

@lesteve
Member

lesteve commented Feb 1, 2017

As I tried to say though, the RANSAC code in scikit-learn and scikit-image seems to have some similarity, so it could well be that some of your comment applies to the scikit-learn code as well.

@aivision2020
Contributor

aivision2020 commented Feb 1, 2017

Ah, I see. Sorry.
So, looking at
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ransac.py
line 418:

        # break if sufficient number of inliers or score is reached
        if (n_inliers_best >= self.stop_n_inliers
                or score_best >= self.stop_score
                or self.n_trials_
                   >= _dynamic_max_trials(n_inliers_best, n_samples,
                                          min_samples,
                                          self.stop_probability)):
            break

should be something like:

        # break if sufficient number of inliers or score is reached
        self.max_trials = min(self.max_trials,
                              _dynamic_max_trials(n_inliers_best, n_samples,
                                                  min_samples, self.stop_probability))
        if (n_inliers_best >= self.stop_n_inliers
                or score_best >= self.stop_score
                or self.n_trials_ >= self.max_trials):
            break
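To see the effect of this control-flow change without touching scikit-learn, here is a self-contained toy 2-point line-fitting RANSAC (all names illustrative, not the library's internals) that re-caps the trial budget from the current best inlier count on every iteration, as the snippet above proposes:

```python
import numpy as np

def ransac_line(X, y, max_trials=100, stop_probability=0.99,
                residual_threshold=1.0, seed=0):
    """Toy line RANSAC with the per-iteration budget cap proposed above."""
    rng = np.random.RandomState(seed)
    n_samples = len(X)
    best_inliers = np.zeros(n_samples, dtype=bool)
    n_trials = 0
    while n_trials < max_trials:
        n_trials += 1
        # minimal sample: fit a line through two distinct points
        i, j = rng.choice(n_samples, 2, replace=False)
        slope = (y[j] - y[i]) / (X[j] - X[i])
        intercept = y[i] - slope * X[i]
        inliers = np.abs(y - (slope * X + intercept)) < residual_threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
        # the proposed fix: shrink the budget on *every* iteration, not only
        # when the break condition happens to see a new best inlier count
        w = best_inliers.sum() / n_samples
        if w == 1.0:
            break
        bound = np.log(1 - stop_probability) / np.log(1 - w ** 2)
        max_trials = min(max_trials, int(np.ceil(bound)))
    return best_inliers, n_trials

# 80 exact inliers on y = 2x, 20 random outliers
X = np.arange(100, dtype=float)
y = 2.0 * X
y[80:] = np.random.RandomState(42).rand(20)
inliers, n_trials = ransac_line(X, y)
print(inliers.sum(), n_trials)  # stops near the dynamic bound, well under 100
```

With the cap applied every iteration, once an 80-inlier model is found the budget immediately collapses to the dynamic bound, so the loop cannot keep spinning until a larger group happens to be found.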
@lesteve
Member

lesteve commented Feb 1, 2017

I am sorry but I have to ask, are you using the scikit-learn RANSAC code or the scikit-image RANSAC code?

@aivision2020
Contributor

aivision2020 commented Feb 1, 2017

I was using scikit-image, but I looked at the scikit-learn code and it has the same issue (I didn't write a unit test for it, though).

@aivision2020
Contributor

aivision2020 commented Feb 1, 2017

A unit test:

    from sklearn import linear_model
    from sklearn.linear_model.ransac import _dynamic_max_trials
    import numpy as np

    X = np.linspace(0, 100, 100)[:, None]
    print(X.shape)
    Y = 2 * X
    Y[80:] = np.random.rand(20, 1)
    Y[:80] += np.random.rand(80, 1)

    for i in range(10):
        clt_ransac = linear_model.RANSACRegressor(linear_model.LinearRegression(),
                                                  max_trials=100)
        clt_ransac.fit(X, Y)
        yhat_ransac = clt_ransac.predict(X)
        inlier_mask = clt_ransac.inlier_mask_
        print('*************************')
        print('actual     trials:', clt_ransac.n_trials_)
        print('should be dynamic:', _dynamic_max_trials(np.sum(inlier_mask), 100,
                                                        2, clt_ransac.stop_probability))

Output:

    *************************
    actual     trials: 5
    should be dynamic: 5.0
    *************************
    actual     trials: 11
    should be dynamic: 5.0
    *************************
    actual     trials: 11
    should be dynamic: 5.0
    *************************
    actual     trials: 10
    should be dynamic: 5.0
    *************************
    actual     trials: 16
    should be dynamic: 5.0
    *************************
    actual     trials: 7
    should be dynamic: 5.0
    *************************
    actual     trials: 100
    should be dynamic: 5.0
    *************************
    actual     trials: 8
    should be dynamic: 5.0
    *************************
    actual     trials: 9
    should be dynamic: 5.0
    *************************
    actual     trials: 27
    should be dynamic: 5.0

Notice there are more trials than needed. In one run it was unlucky enough to hit the maximum number of iterations.

After the fix:

    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80
    *************************
    actual     trials: 6
    should be dynamic: 5.0
    80

The changes: I changed the loop to a while loop

    self.n_trials_ = 0
    while self.n_trials_ <= self.max_trials:
        self.n_trials_ += 1

and the break criteria

    self.max_trials = min(self.max_trials,
                          _dynamic_max_trials(n_inliers_best, n_samples,
                                              min_samples, self.stop_probability))

    # break if sufficient number of inliers or score is reached
    if (n_inliers_best >= self.stop_n_inliers
            or score_best >= self.stop_score):
        break
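As a sanity check on the numbers in that output (pure Python, independent of sklearn): with 80 of 100 points being inliers and 2-point samples, the standard stopping bound at the default stop_probability=0.99 works out to 5 trials, which matches the "should be dynamic: 5.0" lines. The fixed runs stopping at 6 rather than 5 is presumably because the shrunken budget only takes effect on the iteration after the 80-inlier model is first found.

```python
import math

w = 80 / 100   # inlier ratio recovered by the good model
p = 0.99       # sklearn's default stop_probability
m = 2          # min_samples for a line

# N = log(1 - p) / log(1 - w**m): trials needed so that, with probability p,
# at least one all-inlier sample of size m has been drawn
raw = math.log(1 - p) / math.log(1 - w ** m)
print(raw, math.ceil(raw))  # -> ~4.5, 5
```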
@jnothman
Member

jnothman commented Feb 1, 2017

I'm sorry that I do not know this algorithm well from a theoretical standpoint.

The description of this feature in the stop_probability docstring says "requires to generate at least N samples". Your unit test would have us generate exactly N samples, would it not? Your _dynamic_max_trials calculations are made after the best inliers are found. The previous iterations might have yielded fewer inliers and thus a higher _dynamic_max_trials, hence requiring further iteration.

However, I think you may be right that, because of the various ways a trial can fail, we are surpassing the required number of trials calculated on the basis of the previous best inlier set. Could you please submit a pull request with your changes?

Also, I have added some formatting to your last comment. Please do this yourself in the future. (Click edit to see what I did.)

@aivision2020
Contributor

aivision2020 commented Feb 2, 2017