Number of selected features #40

Closed
PelFritz opened this issue Mar 25, 2021 · 1 comment

Comments

@PelFritz

Hello, I just tried this tool on a metabolomics dataset of mine. Interestingly, HSIC Lasso selects just 76 of the 2035 available metabolites, and the R-squared score when I use these selected metabolites is just 0.18. In comparison, Lasso on the original 2035 metabolites obtains an R-squared of about 0.60. My assumption is that the number of selected features is too small. I used SVR (kernel='rbf') from sklearn after feature selection with HSIC Lasso.
Is there a way to increase the number of features HSIC Lasso selects?

@hclimente
Collaborator

Hi,

Something to consider is that R^2 is a linear measure of association. Since Lasso only searches for linear relationships between features and outcome, it's not surprising that the features it selects have a higher R^2. On the other hand, HSIC Lasso captures both linear and non-linear associations, so another measure might be more informative.
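To make the point about linear versus non-linear association concrete, here is a minimal sketch on synthetic data (not the metabolomics data from this issue). It compares a linear R^2, taken here as the squared Pearson correlation, with a biased empirical HSIC estimate using Gaussian kernels. The helper names, the fixed bandwidth `sigma=1.0`, and the toy relationship `y = x^2` are all assumptions made for illustration. This is not how pyHSICLasso computes its scores internally.

```python
import math

def pearson_r2(x, y):
    """Squared Pearson correlation: a purely linear measure of association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def gaussian_gram(v, sigma=1.0):
    """Gram matrix of a Gaussian kernel (bandwidth sigma is an assumed value)."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in v] for a in v]

def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(gaussian_gram(y, sigma))
    # For symmetric matrices, trace(Kc @ Lc) equals the sum of elementwise products.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / n ** 2

# Synthetic toy data: a symmetric grid and two targets.
x = [i / 10 for i in range(-10, 11)]
y_lin = [2 * v for v in x]    # purely linear dependence on x
y_quad = [v * v for v in x]   # purely non-linear dependence on x

r2_lin = pearson_r2(x, y_lin)
r2_quad = pearson_r2(x, y_quad)
hsic_quad = hsic(x, y_quad)

print("linear target:    R^2  =", r2_lin)
print("quadratic target: R^2  =", r2_quad)
print("quadratic target: HSIC =", hsic_quad)
```

On this toy example the quadratic target has essentially zero linear R^2 even though it is fully determined by `x`, while the HSIC estimate stays strictly positive. A feature like this could be picked by a dependence-based selector and still contribute little to a linear score.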

As far as I understand your case, HSIC Lasso is selecting only 76 features despite you requesting a higher number. Regarding that, @myamada0321, do you have ideas about how to force HSIC Lasso to recover more features?
