
[ENH] Add the classifier CI test #28

Merged

@adam2392 merged 21 commits into py-why:main from ccit on Sep 22, 2022
Conversation

@adam2392 (Collaborator) commented Aug 24, 2022

Addresses part of #17

Changes proposed in this pull request:

  • Implements the proposed CCIT test, a nonparametric conditional independence test that allows multivariate X, Y, and Z
  • Adds a simulation function from the CCIT paper
  • The nearest-neighbor sampling procedure proposed to generate samples from the null hypothesis $X \perp Y | Z$ (sketched below) is possibly of independent interest beyond just the CCIT function. I remember seeing the same algorithm in the CCMI paper too, which I am interested in for my projects. I will probably PR that sometime soon too.
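For intuition, a minimal sketch of that nearest-neighbor null-sampling idea, assuming the 1-NN variant from the CCIT paper (illustrative only, not this PR's implementation): keep each (X_i, Z_i) pair but swap in the Y value from the row whose Z is closest to Z_i, which approximately breaks any dependence between X and Y beyond what Z explains.

    from sklearn.neighbors import NearestNeighbors

    def sample_null(X, Y, Z):
        """Approximate samples from the null X _||_ Y | Z via 1-NN in Z."""
        # Ask for 2 neighbors: the closest neighbor of each point is itself.
        _, idx = NearestNeighbors(n_neighbors=2).fit(Z).kneighbors(Z)
        nn_idx = idx[:, 1]  # nearest neighbor other than the point itself
        return X, Y[nn_idx], Z  # rows are (X_i, Y_{nn(i)}, Z_i)

A classifier that cannot distinguish the original rows from these swapped rows (accuracy near 0.5) is then evidence for conditional independence.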

A note on class/function design for CI tests:

  • I'm more of a fan of a class-based API for CI tests vs. a function-based one, and this PR helps demonstrate why: an sklearn classifier has so many parameters, and ideally a user can instantiate it separately from the CI estimator parameters.

Before submitting

  • I've read and followed all steps in the Making a pull request section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@codecov-commenter commented Aug 24, 2022

Codecov Report

Merging #28 (7a56182) into main (5b7465b) will increase coverage by 3.43%.
The diff coverage is 85.14%.

@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   68.04%   71.47%   +3.43%     
==========================================
  Files          13       16       +3     
  Lines         679      852     +173     
  Branches      126      142      +16     
==========================================
+ Hits          462      609     +147     
- Misses        175      193      +18     
- Partials       42       50       +8     
Impacted Files                 Coverage Δ
dodiscover/ci/typing.py        75.00% <75.00%> (ø)
dodiscover/ci/clf_test.py      83.20% <83.20%> (ø)
dodiscover/ci/simulate.py      93.75% <93.75%> (ø)
dodiscover/ci/kernel_test.py   90.34% <100.00%> (+0.13%) ⬆️


@bloebp (Member) commented Aug 31, 2022

I'm more of a fan of a class-based API for CI tests vs. a function-based one, and this PR helps demonstrate why: an sklearn classifier has so many parameters, and ideally a user can instantiate it separately from the CI estimator parameters.

There might be a misconception about the functional form: there is no need to ask for model parameters in the CI function. For instance:

def my_ci_test(X, Y, Z, ml_model):
    ...
    ml_model.fit(...)
    ...

# The model is configured once, outside the CI test function.
mdl = MyComplexMLModel(**millions_of_parameters)

p_value = my_ci_test(X, Y, Z, mdl)

You can also accept a factory that generates a user-specific ML model, instead of expecting the model parameters as inputs to the function.

Ultimately, you would only gain something from the class-based approach when you plan to reuse the object and want to save some parameter passing (which is, of course, a valid argument). However, we should then make sure that these tests are stateless.
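For concreteness, a minimal sketch of that factory variant, reusing the hypothetical names from the example above:

    def my_ci_test(X, Y, Z, make_model):
        # A fresh, fully configured model per call keeps the test stateless.
        ml_model = make_model()
        ml_model.fit(...)
        ...

    p_value = my_ci_test(X, Y, Z, lambda: MyComplexMLModel(**millions_of_parameters))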

@petergtz (Member) commented

Ultimately, you would only gain something from the class-based approach when you plan to reuse the object and want to save some parameter passing (which is, of course, a valid argument). However, we should then make sure that these tests are stateless.

Agree with @bloebp here. Classes make sense when something requires a longer life-span, e.g. because you want to pass it around and it gets invoked somewhere other than where it is constructed. This is typically also the case in bigger systems where you want to separate the wiring of objects from invoking them. Another case is when something requires multiple steps to build, such as a graph object.

But in scenarios where we instantiate a class and then immediately invoke the object, as in the unit test:

    ci_estimator = ClassifierCITest(clf, random_state=rng)

    _, pvalue = ci_estimator.test(df, {"x"}, {"x1"})
    assert pvalue > 0.05
    _, pvalue = ci_estimator.test(df, {"x"}, {"z"})
    assert pvalue < 0.05

the benefit of a class seems really questionable. It seems like this could be written as:

    _, pvalue = classifier_ci_test(df, {"x"}, {"x1"}, clf)
    assert pvalue > 0.05
    _, pvalue = classifier_ci_test(df, {"x"}, {"z"}, clf)
    assert pvalue < 0.05

And this would save a lot of bookkeeping code in the implementation too.

What do you think?

@adam2392 (Collaborator, Author) commented Aug 31, 2022

And this would save a lot of bookkeeping code in the implementation too.

What do you think?

Okay, this makes a lot of sense! Let me open another issue to track refactoring the CI tests into functions. Also tagging #26, since we discussed moving these altogether from dodiscover + dowhy to another repo.

Is it okay if we leave them as-is in this repo, so I can convert them all at once?

If so, I will fix some of the documentation issues that were raised in the review, and then refactor the classes into functions in the next PR.

@robertness (Collaborator) left a comment

Could we expand this method to work with a neural-net-based classifier using Keras or PyTorch? It may feel like overkill, but it would be a first step toward telling a story to the broader community that this repo uses cutting-edge "deep" methods. We could also workshop an example with multidimensional variables later (e.g. pixels), though not in this PR.
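One possible route, assuming the test accepts any classifier exposing sklearn's fit/predict interface, would be to wrap a PyTorch module with skorch; a rough sketch (not part of this PR, layer sizes are placeholders):

    import torch.nn as nn
    from skorch import NeuralNetClassifier

    class MLP(nn.Module):
        def __init__(self, n_features=10, n_hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, n_hidden),
                nn.ReLU(),
                nn.Linear(n_hidden, 2),  # two classes: original vs. null sample
            )

        def forward(self, X):
            return self.net(X)

    # skorch exposes sklearn's fit/predict API around the torch module, so the
    # wrapped network could plug in wherever an sklearn-style classifier is expected.
    clf = NeuralNetClassifier(MLP, criterion=nn.CrossEntropyLoss, max_epochs=20, lr=0.01)

A Keras model could presumably be wrapped in a similar way with scikeras.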

@adam2392 (Collaborator, Author) commented Sep 20, 2022

I've addressed the comments and now just need a PR approval to move forward. Just a note that the poetry.lock update accounts for over 500 LOC of the diff, from adding flaky; the actual CCIT implementation and unit tests are only around 300-400 LOC.

robertness previously approved these changes Sep 21, 2022

@robertness (Collaborator) left a comment

I think this is in a good place and we can get it merged. If we need to make changes later, we can.

@adam2392 (Collaborator, Author) commented
@robertness can you re-approve this PR? I had to merge in changes from main and resolve conflicts, which dismissed your approval.

Not sure why, though... it seems slightly redundant, since one fairly often needs to rebase/merge in changes from main. I guess it's there to protect against conflicting changes.

@robertness (Collaborator) left a comment

LGTM

@adam2392 merged commit 0771188 into py-why:main on Sep 22, 2022
@adam2392 deleted the ccit branch on September 22, 2022, 20:24
darthtrevino pushed a commit that referenced this pull request Sep 27, 2022
* Add classifier CI test (CCIT) and unit test
* Add a simulation module for simulating nonlinear additive noise models
* Add flaky to the CCIT unit test suite
* Address typing issues with regard to adding sklearn and pytorch NN modules as the "classifier"

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Chris Trevino <darthtrevino@gmail.com>