Feature selection based on f_score: Dummy columns #2359

amelio-vazquez-reina · 2013-08-13T01:19:48Z

I found that feature_selection based on f_classif (i.e. F-test) can break if one feature has a constant value (e.g. all zeros).

The easiest way to reproduce this is to append constant (all-zero) columns to X in the example plot_feature_selection.py; see the dummy array below:

import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Some noisy data not correlated
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))

# Add the noisy data to the informative features
X = np.hstack((iris.data, E))

# Append three constant (all-zero) dummy columns
dummy = np.zeros((X.shape[0], 3))
X = np.hstack((X, dummy))
y = iris.target

f_classif already throws a warning whenever multiple columns (multiple features) are duplicates of each other. It may be a good idea to also warn the user when a feature is constant across all instances.
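To make the suggestion concrete, here is a rough sketch of what such a check could look like. It is not scikit-learn's implementation: the helper names constant_feature_indices and warn_on_constant_features are hypothetical, the warning text is only illustrative, and the data is synthetic noise standing in for the iris example above. It assumes X has at least one row.

```python
import warnings

import numpy as np


def constant_feature_indices(X):
    """Return indices of columns that hold the same value in every row.

    Hypothetical helper for illustration; assumes X has at least one row.
    """
    X = np.asarray(X)
    # A column is constant if every entry equals the entry in the first row.
    return np.flatnonzero(np.all(X == X[0], axis=0))


def warn_on_constant_features(X):
    """Emit a UserWarning listing constant columns, then return their indices."""
    idx = constant_feature_indices(X)
    if idx.size:
        warnings.warn("Features %s are constant." % idx, UserWarning)
    return idx


# Synthetic stand-in for the data in the report: 20 noisy columns
# followed by 3 all-zero dummy columns.
rng = np.random.default_rng(0)
noise = rng.uniform(0, 0.1, size=(150, 20))
dummy = np.zeros((noise.shape[0], 3))
X_demo = np.hstack((noise, dummy))

constant_idx = constant_feature_indices(X_demo)
```

Running warn_on_constant_features(X_demo) before scoring would flag the three dummy columns, so the user knows which features carry no information before the F-test divides by a zero within-group variance.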

agramfort · 2013-08-15T13:51:20Z

Fine with me. Pull request very welcome.

arjoly added the Enhancement label Jul 18, 2014

MechCoder mentioned this issue Oct 8, 2014

[MRG] FIX: Raise warnings in f_classif a given feature is constant throughout #3744

Merged

agramfort closed this as completed in #3744 Oct 8, 2014
