Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FSelector filters: Chi Squared was calculated instead of Information Gain #2816

Merged
merged 1 commit into from Jul 28, 2021

Conversation

jokokojote
Copy link
Contributor

Fixed FSelector filter bug: Chi Squared was calculated instead of Information Gain due to wrong filter name.

Code to reproduce / investigate the problem the long way:

Gain ratio is different for FSelector and FSelectorRcpp:

generateFilterValuesData(
  makeClassifTask(data = iris, target = "Species"),
  method = c(
    "FSelector_gain.ratio",
    "FSelectorRcpp_gain.ratio"
    )
)
FilterValues:
Task: iris
           name    type                   filter     value
1:  Petal.Width numeric     FSelector_gain.ratio 0.9432359
2: Petal.Length numeric     FSelector_gain.ratio 0.9346311
3: Sepal.Length numeric     FSelector_gain.ratio 0.6288067
4:  Sepal.Width numeric     FSelector_gain.ratio 0.4922162
5:  Petal.Width numeric FSelectorRcpp_gain.ratio 0.8713692
6: Petal.Length numeric FSelectorRcpp_gain.ratio 0.8584937
7: Sepal.Length numeric FSelectorRcpp_gain.ratio 0.4196464
8:  Sepal.Width numeric FSelectorRcpp_gain.ratio 0.2472972

Docs look like it should be the same:

Indeed Gain Ratio is the same using base packages:

FSelector::gain.ratio(formula = Species~., data = iris)
FSelectorRcpp::information_gain(formula = Species~., data = iris, type = "gainratio")
            attr_importance
Sepal.Length       0.4196464
Sepal.Width        0.2472972
Petal.Length       0.8584937
Petal.Width        0.8713692

    attributes importance
1 Sepal.Length  0.4196464
2  Sepal.Width  0.2472972
3 Petal.Length  0.8584937
4  Petal.Width  0.8713692

mlr Docs hint that there is maybe a problem with the FSelector Chi Squared call, as it has the same description as the Info Gain:
listFilterMethods() (or see filters docs)

Indeed the results are the same:

generateFilterValuesData(
  makeClassifTask(data = iris, target = "Species"),
  method = c(
    "FSelector_gain.ratio",
    "FSelector_chi.squared"
  )
)
FilterValues:
Task: iris
           name    type                filter     value
1:  Petal.Width numeric  FSelector_gain.ratio 0.9432359
2: Petal.Length numeric  FSelector_gain.ratio 0.9346311
3: Sepal.Length numeric  FSelector_gain.ratio 0.6288067
4:  Sepal.Width numeric  FSelector_gain.ratio 0.4922162
5:  Petal.Width numeric FSelector_chi.squared 0.9432359
6: Petal.Length numeric FSelector_chi.squared 0.9346311
7: Sepal.Length numeric FSelector_chi.squared 0.6288067
8:  Sepal.Width numeric FSelector_chi.squared 0.4922162

After the fix it seems to work correctly now (Info Gain is the same for FSelector and FSelectorRcpp):

FilterValues:
Task: iris
            name    type                   filter     value
 1:  Petal.Width numeric FSelectorRcpp_gain.ratio 0.8713692
 2: Petal.Length numeric FSelectorRcpp_gain.ratio 0.8584937
 3: Sepal.Length numeric FSelectorRcpp_gain.ratio 0.4196464
 4:  Sepal.Width numeric FSelectorRcpp_gain.ratio 0.2472972
 5:  Petal.Width numeric     FSelector_gain.ratio 0.8713692
 6: Petal.Length numeric     FSelector_gain.ratio 0.8584937
 7: Sepal.Length numeric     FSelector_gain.ratio 0.4196464
 8:  Sepal.Width numeric     FSelector_gain.ratio 0.2472972
 9:  Petal.Width numeric    FSelector_chi.squared 0.9432359
10: Petal.Length numeric    FSelector_chi.squared 0.9346311
11: Sepal.Length numeric    FSelector_chi.squared 0.6288067
12:  Sepal.Width numeric    FSelector_chi.squared 0.4922162

@jokokojote
Copy link
Contributor Author

As requested I would like to request a review by @pat-s @larskotthoff @mllg @berndbischl (not sure if I should link only one or all). Thank you.

@larskotthoff
Copy link
Sponsor Member

Thanks for this -- yes, this is simply a typo.

Copy link
Member

@pat-s pat-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Ugly bug - might be worth getting out a patch release soon.

@pat-s pat-s merged commit 4b3430d into mlr-org:main Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants