[ENH] Classifier CMI test #85
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
Codecov Report
```
@@            Coverage Diff             @@
##             main      #85      +/-   ##
==========================================
+ Coverage   88.24%   88.35%   +0.10%
==========================================
  Files          26       28       +2
  Lines        1650     1760     +110
  Branches      267      276       +9
==========================================
+ Hits         1456     1555      +99
- Misses        114      123       +9
- Partials       80       82       +2
==========================================
```
The binary p-value is weird. The reference paper says that the CMI estimate itself is a substitute for the p-value. Perhaps return the CMI estimate? It also says you could do a type of bootstrap estimate of the p-value by applying this method to many permuted datasets. Maybe explain this in the docstrings?
Other than that, LGTM.
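The permutation-based p-value the reviewer mentions could be sketched roughly as follows. This is only an illustration, not the PR's implementation: `cmi_estimate` is a hypothetical callable standing in for whatever CMI estimator the PR provides, and note that a strictly valid *conditional* test should permute `x` within strata of `z`, whereas the marginal shuffle below is the simplest approximation.

```python
import numpy as np

def permutation_pvalue(x, y, z, cmi_estimate, n_permutations=100, seed=None):
    """Permutation-based p-value for a CMI statistic (illustrative sketch).

    ``cmi_estimate`` is any callable returning a scalar CMI estimate for
    (x, y, z).  Shuffling ``x`` breaks its dependence on ``y`` while
    preserving its marginal distribution, giving a null sample of the
    statistic.
    """
    rng = np.random.default_rng(seed)
    observed = cmi_estimate(x, y, z)
    null_stats = np.empty(n_permutations)
    for i in range(n_permutations):
        # Marginal shuffle; a strict conditional test would permute
        # x within strata (or nearest-neighbor groups) of z instead.
        null_stats[i] = cmi_estimate(rng.permutation(x), y, z)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + n_permutations)
```

With `n_permutations=100`, the smallest achievable p-value is 1/101, so the permutation count bounds the test's resolution.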
Yeah, sorry, this is still a draft, so it's a work in progress (WIP). I'll ping you when this is ready for a more in-depth review. I need to finish the actual implementation and add some unit tests.
Now I just need to add a unit test.
The CMI value isn't bounded in [0, 1], so they binarize it using a threshold. Alternatively, similar to the CCIT, we can compute a null distribution. However, I found this isn't very robust, so I'm noting it in the docs.
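For reference, the thresholding described above amounts to something like the sketch below. The function name and the `threshold` default are illustrative, not the values used in this PR:

```python
def binarize_cmi(cmi_val, threshold=0.0):
    """Map an unbounded CMI estimate to a binary accept/reject p-value.

    CMI is nonnegative in theory, but finite-sample estimates can dip
    below zero.  Values above ``threshold`` are treated as evidence of
    dependence (p = 0); everything else as independence (p = 1).
    """
    return 0.0 if cmi_val > threshold else 1.0
```

This is why the resulting "p-value" is binary: it encodes only the accept/reject decision, not the strength of evidence.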
LGTM, one comment. The CMI value is calculated as follows:

```python
val = hxyz - (hxz + hyz - hz).mean()
```
Doesn't this have an asymptotic chi-squared distribution under the null hypothesis? If so, should there be an option to calculate the p-value that way?
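For discrete data with a plug-in entropy estimator, the classical result is that the G-statistic 2·N·Î (with Î in nats) is asymptotically chi-squared under the null, with (|X|−1)(|Y|−1)·|Z| degrees of freedom. Whether that limit carries over to this classifier-based estimator is exactly the reviewer's open question, but the option could look like the hypothetical sketch below (function name and signature are assumptions, not part of the PR):

```python
from scipy.stats import chi2

def chi2_pvalue(cmi_nats, n_samples, dof):
    """Asymptotic p-value for a CMI estimate, assuming the classical
    G-test limit: under H0 (X indep. of Y given Z), 2 * N * I_hat is
    approximately chi-squared with ``dof`` degrees of freedom.

    ``dof`` must be supplied by the caller, e.g.
    (n_levels_x - 1) * (n_levels_y - 1) * n_levels_z for discrete data.
    """
    g_stat = 2.0 * n_samples * cmi_nats
    # Survival function = 1 - CDF, the upper-tail probability.
    return chi2.sf(g_stat, dof)
```

The caveat is that the chi-squared limit is derived for plug-in estimators on discrete data; for a classifier-based estimate the null distribution may be different, which would argue for the permutation approach instead.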
Addresses the CMI part of #17
Closes: #61
Changes proposed in this pull request:
- `utils.py`
Note: in the publication, they also support using a neural network approach, but that is outside the scope of this PR.