[REFACTOR] Fix types for CI tests and refactor types of inputs for CI tests #25

adam2392 · 2022-08-23T16:26:16Z

Signed-off-by: Adam Li adam2392@gmail.com

Just adds the necessary types for each CI test.

Changes proposed in this pull request:

Before submitting

I've read and followed all steps in the Making a pull request
section of the CONTRIBUTING docs.
I've updated or added any relevant docstrings following the syntax described in the
Writing docstrings section of the CONTRIBUTING docs.
If this PR fixes a bug, I've added a test that will fail without my fix.
If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

All GitHub Actions jobs for my pull request have passed.


adam2392 · 2022-08-23T16:27:28Z

This should address some issues raised in the OG PR #18. Lmk wdyt.

codecov-commenter · 2022-08-23T16:31:27Z

Codecov Report

Merging #25 (a29a4a7) into main (dec2cbf) will increase coverage by 16.83%.
The diff coverage is 77.07%.

@@             Coverage Diff             @@
##             main      #25       +/-   ##
===========================================
+ Coverage   60.00%   76.83%   +16.83%     
===========================================
  Files           1        8        +7     
  Lines           5      367      +362     
  Branches        0       59       +59     
===========================================
+ Hits            3      282      +279     
- Misses          2       66       +64     
- Partials        0       19       +19

Impacted Files	Coverage Δ
dodiscover/ci/oracle.py	`56.52% <56.52%> (ø)`
dodiscover/_protocol.py	`60.86% <60.86%> (ø)`
dodiscover/ci/g_test.py	`61.98% <61.98%> (ø)`
dodiscover/ci/base.py	`83.33% <83.33%> (ø)`
dodiscover/ci/kernel_test.py	`90.20% <90.20%> (ø)`
dodiscover/ci/fisher_z_test.py	`96.87% <96.87%> (ø)`
dodiscover/typing.py	`100.00% <100.00%> (ø)`


bloebp · 2022-08-23T17:59:01Z

dodiscover/ci/base.py

-        self, df: pd.DataFrame, x_var: Any, y_var: Any, z_covariates: Any = None
+        self,
+        df: pd.DataFrame,
+        x_var: Column,


Oh sorry, I just realized you expect the column name of the df here and not the data itself (was thinking of the API as here https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L25). But then you would need to allow multiple names, similar to z_covariates: Set[Column], since your test can involve multivariate inputs (the independence test itself then would need to throw an error if it doesn't support it. But most of them do, like HSIC or KCI).

For now, and for simplicity's sake, I left the X/Y variables as a single univariate column in the dataframe. We can definitely extend this later to support multivariate inputs as you said (and raise an error otherwise).

Why would we already limit it now? For instance, this would not allow verifying a local Markov condition, where you need to check independence between a target and multiple non-descendant nodes given its parents.
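To make the local Markov case concrete, here is a minimal sketch of how such a query could be assembled from a DAG. The names `descendants` and `local_markov_query` and the dict-of-children graph encoding are hypothetical and are not part of dodiscover or dowhy; the point is only that the Y side naturally ends up with more than one column.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable


def descendants(children: Dict[Node, Set[Node]], node: Node) -> Set[Node]:
    """Return every node reachable from `node` by following directed edges."""
    seen: Set[Node] = set()
    stack = list(children.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, ()))
    return seen


def local_markov_query(
    children: Dict[Node, Set[Node]], target: Node
) -> Tuple[Set[Node], Set[Node], Set[Node]]:
    """Hypothetical helper: build (x_vars, y_vars, z_covariates) for one target.

    The local Markov condition says the target is independent of its
    non-descendants (excluding its parents) given its parents.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    parents = {p for p, cs in children.items() if target in cs}
    non_descendants = nodes - descendants(children, target) - parents - {target}
    return {target}, non_descendants, parents


# Toy DAG, encoded as node -> set of children (an assumption of this sketch).
dag = {"A": {"B"}, "F": {"B"}, "B": {"D"}, "C": {"D"}, "D": {"E"}}
x_vars, y_vars, z_covariates = local_markov_query(dag, "D")
```

For target `D`, the Y side contains both `A` and `F`, so a univariate-only signature could not express this query as a single test.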

Wouldn't that be possible by just running each test multiple times?

But yeah, I guess I'm just not 100% sure how to extend these tests to handle multivariate input, or which ones are capable of it. That was my main bottleneck tbh. I was hoping someone in the community with more expertise could help.

E.g., do the FisherZ and G tests handle multivariate input, or should we just raise an error?

> Wouldn't that be possible by just running each test multiple times?

No, testing variables pairwise vs jointly is not the same. For instance, in case of an XOR Z := X ⊕ Y, the independence test would indicate that Z and X or Z and Y are independent, but it would correctly indicate that Z and (X, Y) are dependent. See for instance this code snippet:

    import numpy as np
    from dowhy.gcm import independence_test

    X = np.random.choice(2, 500)
    Y = np.random.choice(2, 500)
    Z = np.bitwise_xor(X, Y)

    print(independence_test(X, Z))  # 0.65
    print(independence_test(np.column_stack([X, Y]), Z))  # 0.0

Therefore, you (unfortunately) need to add all variables.
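The same point can be checked exactly, without sampling noise or any independence-test library, by enumerating the four equally likely (X, Y) outcomes and computing mutual information. This is only an illustrative sketch; `mutual_information` is a hypothetical helper written for this example and is not a stand-in for any of the tests discussed in this thread.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """Empirical mutual information (in bits) between the two entries of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (count / n) * log2((count / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), count in joint.items()
    )


# All four (X, Y) outcomes are equally likely, and Z is their XOR.
samples = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

mi_x_z = mutual_information([(x, z) for x, y, z in samples])
mi_y_z = mutual_information([(y, z) for x, y, z in samples])
mi_xy_z = mutual_information([((x, y), z) for x, y, z in samples])

print(mi_x_z, mi_y_z, mi_xy_z)
```

Each pairwise mutual information is zero, while the joint pair (X, Y) carries one full bit about Z, which is why running the pairwise tests repeatedly cannot replace a single joint test.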

I am more familiar with the kernel tests (where multivariate inputs are not a problem), see for instance:

HSIC: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L338

KCI: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L232

RCIT: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L493

RIT: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L431

Regression based: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/regression.py#L14

Oh yeah, that's true. Good example! Hmm, okay, how about I raise an error on multivariate input for the Fisher Z and G^2 tests for now, but allow it in general, because I think my implementation of KCI is essentially the same. I'll modify the typing now. In that case, x_vars and y_vars will have type Set[Column].

^ Now that you bring up those tests, I think it might also make sense to refactor a copy into dodiscover (for now), with the goal of eventually offloading them. That way dodiscover can run structure learning w/o needing to import everything in dowhy. WDYT?
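A rough sketch of what that input check could look like, assuming set-valued arguments. The helper `validate_ci_inputs`, its `allow_multivariate` flag, and the `Column = Hashable` stand-in are all hypothetical and are not the code merged in this PR.

```python
from typing import Hashable, Iterable, Optional, Set

# Assumption: a stand-in for the `Column` alias in dodiscover/typing.py.
Column = Hashable


def validate_ci_inputs(
    columns: Iterable[Column],
    x_vars: Set[Column],
    y_vars: Set[Column],
    z_covariates: Optional[Set[Column]] = None,
    allow_multivariate: bool = True,
) -> Set[Column]:
    """Hypothetical helper: check CI-test arguments and return the conditioning set."""
    z_vars = set() if z_covariates is None else set(z_covariates)
    available = set(columns)

    if not x_vars or not y_vars:
        raise ValueError("x_vars and y_vars must each name at least one column.")

    missing = (x_vars | y_vars | z_vars) - available
    if missing:
        raise ValueError(f"Columns not found in the data: {sorted(map(str, missing))}")

    if x_vars & y_vars or x_vars & z_vars or y_vars & z_vars:
        raise ValueError("x_vars, y_vars and z_covariates must be disjoint.")

    # Tests that only handle one X and one Y column opt out of multivariate input.
    if not allow_multivariate and (len(x_vars) > 1 or len(y_vars) > 1):
        raise ValueError("This test supports only one X and one Y column.")

    return z_vars
```

A univariate-only test would call this with `allow_multivariate=False`, while a kernel-based test could keep the default and accept several columns on either side.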

Maybe instead of creating a copy, we can already discuss the creation of a separate package. I would expect that we get more tests over time, and independence testing is a crucial feature for various functionalities (not only causal discovery). Here, I would also prefer to use the HSIC and KCI implementations from the causal-learn package, since they are well maintained and optimized. I already prepared a wrapper for that previously, but couldn't use it due to the license. Now, this is possible.

Let's keep it as it is for now and pick this up in our next meeting. If we all agree, I think we can create the new package rather quickly and move the parts over. This also gives us a chance to revisit the API (somehow, I missed the previous discussion on it).

Okay, sounds good. For the purposes of moving #20 and #21 along, do you mind if I merge this in? When we pull out the CI tests, it'll be easy to just move the internal folder to another repo. But for now, I can then resolve some of the typing issues.


adam2392 · 2022-08-24T00:11:00Z

I will merge this for the sake of consolidating the skeleton PR. Since we might move the CI tests anyway, these can easily be changed when we move them to a new PR.

Moreover, if there are any high-level design issues, I can refactor them at that point.

… tests (py-why#25) - Improve type checking for CI tests - add fingerprint for circleCI Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 added 3 commits August 23, 2022 12:20

Fix types for cit

439c772

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix type check

a55aeb1

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix type check

4dadb6b

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 requested review from darthtrevino, bloebp and robertness August 23, 2022 16:27

adam2392 added 2 commits August 23, 2022 13:12

Merge branch 'main' into types

861cf4c

Merge

a58411e

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 mentioned this pull request Aug 23, 2022

[ENH] Skeleton learning method #20

Merged

5 tasks

adam2392 added 2 commits August 23, 2022 13:17

Fix deploy key

e3314dd

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix CI

aca682b

Signed-off-by: Adam Li <adam2392@gmail.com>

bloebp reviewed Aug 23, 2022


adam2392 changed the title ~~[ENH] Fix types for CI tests~~ [REFACTOR] Fix types for CI tests and refactor types of inputs for CI tests Aug 23, 2022

Adding fix to types

d9f6f9f

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 added the No Changelog Needed label Aug 23, 2022

fix style

a29a4a7

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 merged commit b56a106 into py-why:main Aug 24, 2022

adam2392 deleted the types branch August 24, 2022 00:11

adam2392 mentioned this pull request Aug 24, 2022

Tracking CI tests and moving them to a new repo? #26

Closed
