I found that feature selection based on f_classif (i.e. the F-test) can break if a feature has a constant value (e.g. all zeros).
The easiest way to reproduce this is to add all-zero columns to X in the example plot_feature_selection.py; see the dummy columns below:
import numpy as np
from sklearn import datasets
iris = datasets.load_iris()
# Some noisy data not correlated
E = np.random.uniform(0, 0.1, size=(len(iris.data), 20))
# Three constant (all-zero) features
dummy = np.zeros((len(iris.data), 3))
# Add the noisy data and the constant features to the informative features
X = np.hstack((iris.data, E, dummy))
y = iris.target
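To illustrate why a constant column is a problem for an F-test, here is a minimal sketch that computes the one-way ANOVA F statistic for a single feature by hand. The helper name `one_way_f` is hypothetical and this is not scikit-learn's implementation; it only shows that for a constant feature both the between-class and the within-class sums of squares are zero, so the ratio is 0/0.

```python
import numpy as np


def one_way_f(x, y):
    """One-way ANOVA F statistic for one feature x grouped by labels y.

    Hypothetical helper for illustration only, not scikit-learn code.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    grand_mean = x.mean()
    # Between-class and within-class sums of squares
    ss_between = sum(
        x[y == c].size * (x[y == c].mean() - grand_mean) ** 2 for c in classes
    )
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    df_between = len(classes) - 1
    df_within = x.size - len(classes)
    # Silence the floating-point warnings so the 0/0 case returns NaN quietly
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.float64(ss_between / df_between) / np.float64(ss_within / df_within)


y_demo = np.repeat([0, 1, 2], 5)
constant = np.zeros(15)
informative = np.array([0, 1, 0, 1, 0, 5, 6, 5, 6, 5, 10, 11, 10, 11, 10], dtype=float)

f_constant = one_way_f(constant, y_demo)        # 0/0, not a usable score
f_informative = one_way_f(informative, y_demo)  # finite and positive
```

A NaN score cannot be ranked meaningfully against the other features, which is presumably where the downstream selection step goes wrong.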
f_classif already emits a warning when multiple columns (features) are duplicates of each other. It may be a good idea to also warn the user when a feature is constant across all instances.
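A rough sketch of what such a check could look like, assuming it runs on a dense array before the F-test is computed. The function names `constant_feature_indices` and `warn_on_constant_features` and the warning text are hypothetical, not part of scikit-learn's API.

```python
import warnings

import numpy as np


def constant_feature_indices(X):
    """Return the indices of columns whose value is the same in every row."""
    X = np.asarray(X)
    # A column is constant exactly when its minimum equals its maximum
    return np.flatnonzero(X.max(axis=0) == X.min(axis=0))


def warn_on_constant_features(X):
    """Warn if X contains constant columns and return their indices."""
    idx = constant_feature_indices(X)
    if idx.size:
        warnings.warn(
            "Features %s are constant across all instances." % idx.tolist(),
            UserWarning,
        )
    return idx


# Four random columns followed by three all-zero columns
rng = np.random.RandomState(0)
X_demo = np.hstack((rng.uniform(size=(10, 4)), np.zeros((10, 3))))
constant_idx = constant_feature_indices(X_demo)
```

The caller could then either drop the flagged columns before scoring or just surface the warning and leave the decision to the user.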