New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The depth formula in iforest is incorrect #8549

Closed
liuxinuestc opened this Issue Mar 7, 2017 · 6 comments

Comments

Projects
None yet
5 participants
@liuxinuestc

liuxinuestc commented Mar 7, 2017

Description

In the paper of iforest the formula of average path length of unsuccessful search in BST as

avgPathLen = 2. * (np.log(n_samples_leaf   - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

but in code sklearn.ensemble.iforest._average_path_length the formula is

2 * (np.log(n_samples_leaf ) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

Steps/Code to Reproduce

Actual Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

-->

Expected Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf - 1.0) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask] - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

Versions

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Mar 7, 2017

Member

Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages code snippets. Bonus points if you use syntax highlighting with py for python snippets and pytb for tracebacks.

Member

lesteve commented Mar 7, 2017

Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages code snippets. Bonus points if you use syntax highlighting with py for python snippets and pytb for tracebacks.

@ngoix

This comment has been minimized.

Show comment
Hide comment
@ngoix

ngoix Mar 8, 2017

Contributor

You're right a -1 is missing inside the log: np.log(n_samples_leaf) has to be replaced by np.log(n_samples_leaf - 1.)
@liuxinuestc do you want to open a PR?

Contributor

ngoix commented Mar 8, 2017

You're right a -1 is missing inside the log: np.log(n_samples_leaf) has to be replaced by np.log(n_samples_leaf - 1.)
@liuxinuestc do you want to open a PR?

@liuxinuestc

This comment has been minimized.

Show comment
Hide comment
@liuxinuestc

liuxinuestc Mar 9, 2017

I'm just a new user in github, and I'm a learner. I don't know much about PR. But recently I have designed an unsupervised DT algorithm, I want to implement this algorithm on GitHub, and for everyone to use. @ngoix

liuxinuestc commented Mar 9, 2017

I'm just a new user in github, and I'm a learner. I don't know much about PR. But recently I have designed an unsupervised DT algorithm, I want to implement this algorithm on GitHub, and for everyone to use. @ngoix

@liuxinuestc

This comment has been minimized.

Show comment
Hide comment
@liuxinuestc

liuxinuestc Mar 9, 2017

thanks for your suggestions~~!! @lesteve

liuxinuestc commented Mar 9, 2017

thanks for your suggestions~~!! @lesteve

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Mar 9, 2017

Member

I don't know much about PR

Read about it then, maybe this is a good start. If you are new to Github I would encourage you to browse https://opensource.guide and https://guides.github.com/. For scikit-learn specific guidance, have a look at http://scikit-learn.org/stable/developers/contributing.html#contributing.

Member

lesteve commented Mar 9, 2017

I don't know much about PR

Read about it then, maybe this is a good start. If you are new to Github I would encourage you to browse https://opensource.guide and https://guides.github.com/. For scikit-learn specific guidance, have a look at http://scikit-learn.org/stable/developers/contributing.html#contributing.

@pzbw

This comment has been minimized.

Show comment
Hide comment
@pzbw

pzbw Mar 12, 2017

Contributor

I'm working on this issue.

Contributor

pzbw commented Mar 12, 2017

I'm working on this issue.

@raghavrv raghavrv added Bug and removed Need Contributor labels Mar 12, 2017

@raghavrv raghavrv changed the title from the depth formula in iforest is incorrect to The depth formula in iforest is incorrect Mar 12, 2017

pzbw pushed a commit to pzbw/scikit-learn that referenced this issue Mar 12, 2017

@raghavrv raghavrv closed this Mar 14, 2017

raghavrv added a commit that referenced this issue Mar 14, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

herilalaina added a commit to herilalaina/scikit-learn that referenced this issue Mar 26, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

massich added a commit to massich/scikit-learn that referenced this issue Apr 26, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

Sundrique added a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017

[MRG+1] FIX Correct depth formula in iforest (#8576)
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment