Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The depth formula in iforest is incorrect #8549

Closed
liuxinuestc opened this issue Mar 7, 2017 · 6 comments
Closed

The depth formula in iforest is incorrect #8549

liuxinuestc opened this issue Mar 7, 2017 · 6 comments
Labels

Comments

@liuxinuestc
Copy link

@liuxinuestc liuxinuestc commented Mar 7, 2017

Description

In the paper of iforest the formula of average path length of unsuccessful search in BST as

avgPathLen = 2. * (np.log(n_samples_leaf   - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

but in code sklearn.ensemble.iforest._average_path_length the formula is

2 * (np.log(n_samples_leaf ) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

Steps/Code to Reproduce

Actual Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask]) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

-->

Expected Results

if isinstance(n_samples_leaf, INTEGER_TYPES):
        if n_samples_leaf <= 1:
            return 1.
        else:
            return 2. * (np.log(n_samples_leaf - 1.0) + 0.5772156649) - 2. * (
                n_samples_leaf - 1.) / n_samples_leaf

average_path_length[not_mask] = 2. * (
            np.log(n_samples_leaf[not_mask] - 1.) + 0.5772156649) - 2. * (
                n_samples_leaf[not_mask] - 1.) / n_samples_leaf[not_mask]

Versions

@lesteve
Copy link
Member

@lesteve lesteve commented Mar 7, 2017

Readability counts, a lot! Please use triple back-quotes aka fenced code blocks to format error messages code snippets. Bonus points if you use syntax highlighting with py for python snippets and pytb for tracebacks.

@ngoix
Copy link
Contributor

@ngoix ngoix commented Mar 8, 2017

You're right a -1 is missing inside the log: np.log(n_samples_leaf) has to be replaced by np.log(n_samples_leaf - 1.)
@liuxinuestc do you want to open a PR?

@liuxinuestc
Copy link
Author

@liuxinuestc liuxinuestc commented Mar 9, 2017

I'm just a new user in github, and I'm a learner. I don't know much about PR. But recently I have designed an unsupervised DT algorithm, I want to implement this algorithm on GitHub, and for everyone to use. @ngoix

@liuxinuestc
Copy link
Author

@liuxinuestc liuxinuestc commented Mar 9, 2017

thanks for your suggestions~~!! @lesteve

@lesteve
Copy link
Member

@lesteve lesteve commented Mar 9, 2017

I don't know much about PR

Read about it then, maybe this is a good start. If you are new to Github I would encourage you to browse https://opensource.guide and https://guides.github.com/. For scikit-learn specific guidance, have a look at http://scikit-learn.org/stable/developers/contributing.html#contributing.

@pzbw
Copy link
Contributor

@pzbw pzbw commented Mar 12, 2017

I'm working on this issue.

@raghavrv raghavrv added Bug and removed Need Contributor labels Mar 12, 2017
@raghavrv raghavrv changed the title the depth formula in iforest is incorrect The depth formula in iforest is incorrect Mar 12, 2017
pzbw pushed a commit to pzbw/scikit-learn that referenced this issue Mar 12, 2017
@raghavrv raghavrv closed this Mar 14, 2017
raghavrv added a commit that referenced this issue Mar 14, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue #8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
herilalaina added a commit to herilalaina/scikit-learn that referenced this issue Mar 26, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
massich added a commit to massich/scikit-learn that referenced this issue Apr 26, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
Sundrique added a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017
* Fixed depth formula in iforest

* Added non-regression test for issue scikit-learn#8549

* reverted some whitespace changes

* Made changes to what's new and whitespace changes

* Update whats_new.rst

* Update whats_new.rst

* fixed faulty whitespace

* faulty whitespace fix and change to whats new

* added constants to iforest average_path_length and the according non regression test

* COSMIT

* Update whats_new.rst

* Corrected IsolationForest average path formula and added integer array equiv test

* changed line to under 80 char

* Update whats_new.rst

* Update whats_new.rst

* reran tests

* redefine np.euler_gamma

* added import statement for euler_gammma in iforest and test_iforest

* changed np.euler_gamma to euler_gamma

* fix small formatting issue

* fix small formatting issue

* modified average_path_length tests

* formatting fix + removed redundant tests

* fix import error

* retry remote server error

* retry remote server error

* retry remote server error

* re-added some iforest tests

* re-added some iforest tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.