[FEATURE][ML] A measure of strength of relationship between RVs plus better seeding of feature sample probabilities for boosted tree #488
Good job on implementing MICe! 👏 I have a number of comments which aim to improve readability.
…lculation of k in main search when there are duplicates
Thanks for the review @valeriy42! I think I've addressed all your comments. Can you take another look?
LGTM. Good job!
This implements the (refined) "maximal information coefficient" (MICe) measure of the strength of the relationship between two random variables and uses it to initialise the feature sample probabilities for the boosted tree. It also adds a mechanism that, when there are insufficient training data, restricts the features used to those with the strongest relationship with the dependent variable.
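
For readers who want a feel for the idea, here is a minimal Python sketch of the seeding step. It is not the code in this PR: it swaps in a simple binned mutual information estimate as a stand-in for MICe, and the function names, the `bins` parameter and the `rows_per_feature` threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Plug-in mutual information (nats) after rank-based binning.

    A crude stand-in for MICe, used only to produce a dependence score.
    """
    n = len(x)

    def rank_bins(values):
        # Assign each value a bin from its rank; tied values share a bin.
        order = sorted(range(n), key=lambda i: values[i])
        labels = [0] * n
        for rank, i in enumerate(order):
            prev = order[rank - 1] if rank > 0 else None
            if prev is not None and values[i] == values[prev]:
                labels[i] = labels[prev]
            else:
                labels[i] = rank * bins // n
        return labels

    bx, by = rank_bins(x), rank_bins(y)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def initial_feature_probabilities(features, target, rows_per_feature=10):
    """Seed feature sample probabilities from dependence scores.

    rows_per_feature is a hypothetical threshold: with fewer rows than
    rows_per_feature * number_of_features, only the strongest features
    are kept.
    """
    scores = {
        name: max(binned_mutual_information(column, target), 0.0)
        for name, column in features.items()
    }
    max_features = max(1, len(target) // rows_per_feature)
    kept = sorted(scores, key=scores.get, reverse=True)[:max_features]
    total = sum(scores[name] for name in kept)
    if total == 0.0:
        # No feature shows any dependence, so fall back to uniform.
        return {name: 1.0 / len(kept) for name in kept}
    return {name: scores[name] / total for name in kept}


if __name__ == "__main__":
    target = [i * i for i in range(40)]
    features = {"signal": list(range(40)), "constant": [1.0] * 40}
    print(initial_feature_probabilities(features, target))
    print(initial_feature_probabilities(features, target, rows_per_feature=40))
```

The first call keeps both features and weights them by score; the second simulates too little training data and keeps only the feature most strongly related to the target.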