[ENH] Add the classifier CI test #28
Signed-off-by: Adam Li <adam2392@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #28 +/- ##
==========================================
+ Coverage 68.04% 71.47% +3.43%
==========================================
Files 13 16 +3
Lines 679 852 +173
Branches 126 142 +16
==========================================
+ Hits 462 609 +147
- Misses 175 193 +18
- Partials 42 50 +8
There might be a misconception with the idea of the functional form. There is no need to ask for model parameters in the CI function. For instance:
Alternatively, the function can accept a factory that generates a user-specific ML model, instead of taking the model parameters as inputs. Ultimately, you only gain something from the class-based approach when you plan to reuse the object and want to avoid passing the same parameters repeatedly (which is of course a valid argument). However, we should then make sure that these tests are stateless.
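A minimal sketch of the factory idea (not the code from this PR): the CI function takes a zero-argument `clf_factory` and builds a fresh model on every call, so nothing persists between tests. Everything here is an assumption for illustration: the names `OneNearestNeighbor`, `clf_factory` and `seed`, the use of a plain dict instead of a DataFrame, the restriction to marginal (unconditional) independence, and the simplified held-out-accuracy statistic, which is not the CCIT algorithm itself.

```python
from __future__ import annotations

import math
import random
from typing import Callable, Protocol, Sequence


class Classifier(Protocol):
    """Minimal fit/predict interface the CI function relies on."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> object: ...
    def predict(self, X: Sequence[Sequence[float]]) -> list: ...


class OneNearestNeighbor:
    """Toy stand-in for a user-supplied model (hypothetical, stdlib only)."""

    def fit(self, X, y):
        self._X, self._y = list(X), list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            self._y[min(range(len(self._X)), key=lambda i: sq_dist(row, self._X[i]))]
            for row in X
        ]


def classifier_ci_test(
    data: dict[str, list[float]],
    x: str,
    y: str,
    clf_factory: Callable[[], Classifier],
    seed: int = 0,
) -> tuple[float, float]:
    """Return (held-out accuracy, p-value) for the null hypothesis x independent of y."""
    rng = random.Random(seed)  # local RNG: no shared state between calls
    xs, ys = list(data[x]), list(data[y])
    n = len(xs)
    shuffled = ys[:]
    rng.shuffle(shuffled)
    # Label 1: observed (x, y) pairs. Label 0: pairs with y shuffled,
    # which breaks any dependence between the two columns.
    rows = list(zip(xs, ys)) + list(zip(xs, shuffled))
    labels = [1] * n + [0] * n
    order = list(range(2 * n))
    rng.shuffle(order)
    train, held_out = order[:n], order[n:]
    clf = clf_factory()  # fresh model per call
    clf.fit([rows[i] for i in train], [labels[i] for i in train])
    preds = clf.predict([rows[i] for i in held_out])
    correct = sum(int(p == labels[i]) for p, i in zip(preds, held_out))
    m = len(held_out)
    # One-sided binomial tail: chance of at least `correct` hits out of m
    # if the classifier were guessing with accuracy 0.5.
    pvalue = sum(math.comb(m, k) for k in range(correct, m + 1)) / 2**m
    return correct / m, pvalue
```

A call would then look like `_, pvalue = classifier_ci_test(data, "x", "z", OneNearestNeighbor)`, with the class itself serving as the factory; a `lambda` or `functools.partial` can carry model hyperparameters.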
Agree with @bloebp here. Classes make sense when something requires a longer life-span, e.g. because you want to pass it around and it gets invoked at a different place than where it's constructed. This is typically also the case in bigger systems where you want to separate wiring of objects from invoking them. Another case is, when something requires multiple steps to build, such as a graph object. But in scenarios where we instantiate a class and then immediately invoke the object as suggested by the unit test:

    ci_estimator = ClassifierCITest(clf, random_state=rng)
    _, pvalue = ci_estimator.test(df, {"x"}, {"x1"})
    assert pvalue > 0.05
    _, pvalue = ci_estimator.test(df, {"x"}, {"z"})
    assert pvalue < 0.05

the benefit of a class seems really questionable. Seems like this could be written as:

    _, pvalue = classifier_ci_test(df, {"x"}, {"x1"}, clf)
    assert pvalue > 0.05
    _, pvalue = classifier_ci_test(df, {"x"}, {"z"}, clf)
    assert pvalue < 0.05

And this would save a lot of bookkeeping code in the implementation too. What do you think?
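If the main appeal of the class is not having to pass the same configuration at every call, `functools.partial` can provide that on top of a plain function, keeping wiring separate from invoking without a stateful object. The sketch below is only an illustration under assumptions: `permutation_ci_test` and `abs_covariance` are hypothetical placeholders (a simple permutation test with a covariance score standing in for the classifier), not the test implemented in this PR.

```python
import random
from functools import partial


def abs_covariance(xs, ys):
    """Hypothetical dependence score standing in for a fitted model."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs))


def permutation_ci_test(data, x, y, score, n_perm=200, seed=0):
    """Placeholder stateless test: permutation p-value of score(x, y)."""
    rng = random.Random(seed)  # local RNG, so repeated calls do not interact
    xs, ys = list(data[x]), list(data[y])
    observed = score(xs, ys)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(ys)  # shuffles the local copy only
        if score(xs, ys) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Wiring: bind the configuration once, where the test is set up.
ci_test = partial(permutation_ci_test, score=abs_covariance, seed=0)

# Invoking: call sites only pass the data and the variable names.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(50)]
data = {
    "x": x,
    "x1": [rng.gauss(0, 1) for _ in range(50)],  # drawn independently of x
    "z": [2 * v for v in x],                     # deterministic function of x
}
_, pvalue_indep = ci_test(data, "x", "x1")
_, pvalue_dep = ci_test(data, "x", "z")
```

Because the bound callable holds only its arguments, calling it in any order with the same inputs gives the same result, which is the statelessness property asked for above.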
Okay, this makes a lot of sense! Let me start another issue to track the refactoring of the CI tests into functions. Also tagging #26, since we discussed moving these altogether from dodiscover + dowhy to another repo. Is it okay if we leave them as is in this repo, so I can convert them all at once? If so, I will address the documentation points raised in the review, and then refactor class -> functions in the next PR.
Could we expand this method to work with a neural-net based classifier using Keras or PyTorch? It probably feels like overkill, but it would be a first step toward telling the broader community that this repo is using cutting-edge "deep" methods. We could also workshop an example with multidimensional variables later (e.g. pixels), though not in this PR.
I've addressed the comments and now just need a PR approval to move forward. Just a note: the poetry.lock update from adding flaky accounts for over 500 LOC. The actual LOC diff for the CCIT implementation and unit test is only around 300-400 LOC.
I think this is at a good place and we can get it merged. If we need to make changes later we can.
@robertness can you re-approve this PR? I had to merge in changes from main and resolve conflicts, which dismissed your approval. Not sure why tho... it seems slightly redundant, since one fairly often needs to rebase or merge in changes from main. I guess it's to guard against conflicting changes.
LGTM
* Adding classifier CI test (ccit) and unit test
* Adds a simulation module for simulating non-linear additive noise models
* Add flaky to the unit test suite CCIT
* Address typing issues with regards to adding sklearn and pytorch NN modules as the "classifier"

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Chris Trevino <darthtrevino@gmail.com>
Addresses one of: #17
Changes proposed in this pull request:
A note on class/function design for CI tests: