Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REFACTOR] Fix types for CI tests and refactor types of inputs for CI tests #25

Merged
merged 9 commits into from
Aug 24, 2022

Conversation

adam2392
Copy link
Collaborator

Signed-off-by: Adam Li adam2392@gmail.com

Just adds the necessary types for each CI test.

Changes proposed in this pull request:

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

This should address some issues raised in the OG PR #18. Lmk wdyt.

@codecov-commenter
Copy link

codecov-commenter commented Aug 23, 2022

Codecov Report

Merging #25 (a29a4a7) into main (dec2cbf) will increase coverage by 16.83%.
The diff coverage is 77.07%.

@@             Coverage Diff             @@
##             main      #25       +/-   ##
===========================================
+ Coverage   60.00%   76.83%   +16.83%     
===========================================
  Files           1        8        +7     
  Lines           5      367      +362     
  Branches        0       59       +59     
===========================================
+ Hits            3      282      +279     
- Misses          2       66       +64     
- Partials        0       19       +19     
Impacted Files Coverage Δ
dodiscover/ci/oracle.py 56.52% <56.52%> (ø)
dodiscover/_protocol.py 60.86% <60.86%> (ø)
dodiscover/ci/g_test.py 61.98% <61.98%> (ø)
dodiscover/ci/base.py 83.33% <83.33%> (ø)
dodiscover/ci/kernel_test.py 90.20% <90.20%> (ø)
dodiscover/ci/fisher_z_test.py 96.87% <96.87%> (ø)
dodiscover/typing.py 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 mentioned this pull request Aug 23, 2022
5 tasks
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
self, df: pd.DataFrame, x_var: Any, y_var: Any, z_covariates: Any = None
self,
df: pd.DataFrame,
x_var: Column,
Copy link
Member

@bloebp bloebp Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I just realized you expect the column name of the df here and not the data itself (was thinking of the API as here https://github.com/py-why/dowhy/blob/main/dowhy/gcm/independence_test/kernel.py#L25). But then you would need to allow multiple names, similar to z_covariates: Set[Column], since your test can involve multivariate inputs (the independence test itself then would need to throw an error if it doesn't support it. But most of them do, like HSIC or KCI).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now and simplicity sake, I left the X/Y variables as a univariate variable column in the dataframe. We can definitely extend later and involve multivariate inputs as you said (and error otw).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would already limit it for now? For instance, this would not allow to verify a local Markov condition where you would need to check independence between a target and multiple non-descendant nodes given its parents.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be possible by just running each test multiple times?

But yeah I guess I'm just not 100% sure how to extend these tests for handling multivariate input slash which ones are capable. That was my main bottleneck tbh. Was hoping someone with more expertise in the community there can help.

E.g. Does FisherZ, G test handle multivariate, or should we just raise an error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be possible by just running each test multiple times?

No, testing variables pairwise vs jointly is not the same. For instance, in case of an XOR Z := X ⊕ Y, the independence test would indicate that Z and X or Z and Y are independent, but it would correctly indicate that Z and (X, Y) are dependent. See for instance this code snippet:

import numpy as np

from dowhy.gcm import independence_test

X = np.random.choice(2, 500)
Y = np.random.choice(2, 500)
Z = np.bitwise_xor(X, Y)

print(independence_test(X, Z))
>>> 0.65
print(independence_test(np.column_stack([X, Y]), Z))
>>> 0.0

Therefore, you (unfortunately) need to add all variables.

I am more familiar with the kernel tests (where multivariate inputs are not a problem), see for instance:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah that's true. Good example! Hmm okay how about I enforce an error currently for multivariate input for these Fisher/G^2, but allow it in general because I think my implementation of KCI is essentially the same. I'll modify the typing now. In that case, x_vars and y_vars will have type Set[Column].

^ Now that you bring up those tests, I think it might also make sense to refactor a copy into dodiscover (for now). With the goal of eventually offloading them in general. Just so dodiscover can run structure learning w/o needing to import everything in dowhy. WDYT?

Copy link
Member

@bloebp bloebp Aug 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of a creating a copy, we can already discuss the creation of a separate package. I would expect that we get more tests over time and independence testing is a crucial feature for various functionalities (not only causal discovery). Here, I would also prefer to utilize the HSIC and KCI implementations from the causal-learn package, since they are well maintained and optimized. I already prepared a wrapper for that previously, but couldn't use it due to the license. Now, this is possible.

Lets for now keep it as it is and pick this up in our next meeting. If we all agree, I think we can create the new package rather quickly and move the parts over. This also gives a chance to revisit the API (somehow, I missed the previous discussion on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay sounds good. For the purposes of moving the #20 and #21 along, do you mind if I merge this in? When we pull out the CI tests, it'll be easy to just move the internal folder to another repo. But for now, I can then resolve some of the typing issues.

@adam2392 adam2392 changed the title [ENH] Fix types for CI tests [REFACTOR] Fix types for CI tests and refactor types of inputs for CI tests Aug 23, 2022
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

I will merge this for the sake of consolidating the skeleton PR. Since we might move the CI tests anyways, these can easily be changed when we move them to a new PR.

Moreover, if there's any high level design issues, then I can refactor them then.

@adam2392 adam2392 merged commit b56a106 into py-why:main Aug 24, 2022
@adam2392 adam2392 deleted the types branch August 24, 2022 00:11
adam2392 added a commit to adam2392/dodiscover that referenced this pull request Aug 24, 2022
… tests (py-why#25)

- Improve type checking for CI tests
- add fingerprint for circleCI

Signed-off-by: Adam Li <adam2392@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants