Feature request: KFOCI #10

Open
lang-benjamin opened this issue Dec 20, 2023 · 6 comments

Comments

@lang-benjamin

There is an R package, KPC, that implements a more general and improved version of FOCI called KFOCI (Kernel FOCI), proposed by Huang et al. The improvement over existing methods in certain settings is quite remarkable, so I believe it would be a great addition to have functions similar to the ones for FOCI. What do you think?
For categorical variables, it may even be possible to partially avoid creating dummy variables: as long as the categories have a natural order, they could be passed as integer-coded variables.
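
To make the integer-coding idea concrete, here is a minimal sketch contrasting it with dummy coding. It is written in plain Python only to stay self-contained; the variable `severity` and its levels are made-up examples, and nothing here is taken from KPC or from this package.

```python
def encode_ordered(values, levels):
    """Map each value to its 1-based position in the ordered list `levels`."""
    position = {level: i + 1 for i, level in enumerate(levels)}
    return [position[v] for v in values]


def encode_dummies(values, levels):
    """One 0/1 indicator per level, dropping the first level as reference."""
    return [[1 if v == level else 0 for level in levels[1:]] for v in values]


# Hypothetical ordered categorical variable.
severity = ["low", "high", "medium", "low"]
levels = ["low", "medium", "high"]

print(encode_ordered(severity, levels))  # [1, 3, 2, 1]
print(encode_dummies(severity, levels))  # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

The integer coding keeps a single column per variable, which only makes sense when the level order is meaningful; unordered categories would still need dummies.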

@matloff
Owner

matloff commented Dec 20, 2023 via email

@lang-benjamin
Author

Great, happy to hear that! Some more food for thought: if binary or categorical variables are included, calling KFOCI involves randomness (due to tie-breaking in the k-NN graph). So it could make sense to call KFOCI multiple times on the same data set and then condense or visualize the results. For condensing, some sort of stability selection could be done, e.g. as proposed in Section 2.3 of https://onlinelibrary.wiley.com/doi/10.1002/sim.8955. That proposal is made in a slightly different context, but it sounds generic and could be applicable to KFOCI (and FOCI) as well. Unfortunately, I do not know how this "stable set" would behave; it may not be a good idea, because it could violate the nice property of Theorem 7 from Huang et al. Any thoughts?
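
A rough sketch of what I mean by condensing repeated runs, using a plain selection-frequency threshold. This is an assumption on my side and not necessarily the exact procedure from Section 2.3 of the paper linked above. `toy_selector` is a hypothetical stand-in for one randomized KFOCI call, and `threshold` is a made-up tuning parameter.

```python
import random
from collections import Counter


def selection_frequencies(select_fn, n_runs, seed=0):
    """Call a randomized selector n_runs times; return each variable's selection share."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_runs):
        counts.update(set(select_fn(rng)))  # count each variable once per run
    return {var: c / n_runs for var, c in counts.items()}


def stable_set(freqs, threshold):
    """Variables selected in at least `threshold` of the runs."""
    return sorted(v for v, f in freqs.items() if f >= threshold)


def toy_selector(rng):
    """Hypothetical stand-in for one KFOCI call: x1 is always selected,
    while a tie between x2 and x3 is broken at random."""
    return ["x1", rng.choice(["x2", "x3"])]


freqs = selection_frequencies(toy_selector, n_runs=200)
print(freqs)
print(stable_set(freqs, threshold=0.9))
```

In this toy setup only the variable that survives every tie-break ends up in the stable set; the two tied variables each appear in roughly half of the runs.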

@matloff
Owner

matloff commented Dec 22, 2023 via email

@lang-benjamin
Author

Thanks, point taken! I appreciate your comment; it goes exactly in the direction I was aiming for.

@matloff
Owner

matloff commented Dec 24, 2023 via email

@lang-benjamin
Author

I think it is fair to say that KFOCI performs better than FOCI. Still, I found that it lacks 'power' for linear or monotone relationships (this is in line with other observations, e.g. in "A survey of some recent developments in measures of association"). Possible mitigation strategies might be to decrease the number of neighbors K for the KNN graph (e.g. n/40 instead of n/20), or to combine the KFOCI result with the variables selected by ncvreg::cv.ncvreg (although in this case the resulting set of variables may be harder to interpret, as there will no longer be a clear ordering among the combined selected variables).
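
One way to soften the interpretability issue when combining the two selections would be to keep the KFOCI order for the variables KFOCI picked and to flag the rest as coming only from the other method. A minimal sketch, assuming both selections are available as lists of variable names; the function name and the labels are hypothetical.

```python
def combine_selections(ordered_kfoci, other_selected):
    """Keep the KFOCI order first, then append (alphabetically, i.e. unordered
    in terms of importance) the variables found only by the other method."""
    seen = set(ordered_kfoci)
    extra = sorted(v for v in set(other_selected) if v not in seen)
    return [(v, "kfoci") for v in ordered_kfoci] + [(v, "other_only") for v in extra]


# Hypothetical selections from one KFOCI run and one penalized-regression fit.
combined = combine_selections(["x3", "x1"], ["x1", "x2", "x5"])
print(combined)
# [('x3', 'kfoci'), ('x1', 'kfoci'), ('x2', 'other_only'), ('x5', 'other_only')]
```

The labels make it explicit that only the first block carries an ordering.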

I also found that the algorithm in Kormaksson et al. (with r = 0.5) works quite well when applied to multiple independent runs of KFOCI on the same data set. One disadvantage is that the ordering of the variables gets lost; this may, however, be resolved by saving the ranks from each run and then investigating the tuples of ranks via cdparcoord::discparcoord.
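
A minimal sketch of the rank bookkeeping I have in mind, assuming each run returns its selected variables in order. Giving unselected variables the rank p + 1 is my own convention for illustration, and the runs below are made up. The resulting tuples are the kind of discrete data a parallel-coordinates plot could then display.

```python
from collections import Counter


def rank_table(runs, variables):
    """One tuple per run: the rank of each variable in that run's ordered
    selection. Unselected variables get rank len(variables) + 1 (a convention
    assumed here)."""
    missing = len(variables) + 1
    table = []
    for selected in runs:
        position = {v: i + 1 for i, v in enumerate(selected)}
        table.append(tuple(position.get(v, missing) for v in variables))
    return table


# Hypothetical ordered selections from three independent runs.
runs = [["x1", "x3"], ["x1", "x2"], ["x3", "x1"]]
variables = ["x1", "x2", "x3"]

table = rank_table(runs, variables)
print(table)  # [(1, 4, 2), (1, 2, 4), (2, 4, 1)]

# Frequency of each distinct rank tuple across runs.
print(Counter(table).most_common())
```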
