Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] CI tests for constraint-based structure learning #16

Merged
merged 10 commits into from
Aug 18, 2022

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Aug 16, 2022

Signed-off-by: Adam Li adam2392@gmail.com

Follow up to: #14 to lead to #11

Changes proposed in this pull request:

  • Adds conditional independence testing to the submodule dodiscover.ci in a consistent API.
  • Also removed pylint from the style checking as it is too stringent.

Needs to update the networkx algorithms that will work on mixedEdgeGraphs.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
pyproject.toml Show resolved Hide resolved
dodiscover/ci/_script.py Outdated Show resolved Hide resolved
dodiscover/ci/fisher_z_test.py Outdated Show resolved Hide resolved
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@robertness
Copy link
Collaborator

Looking at this now.

Makefile Outdated
@echo "Building documentation"
make -C doc/ clean
make -C doc/ html-noplot
cd doc/ && make view
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline.
Does the makefile need to be commited?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This is just for my dev sake rn. I'm still wrangling upstream dependencies. I.e. pywhy-graphs. I'll remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually jk I think it would be nice to have this. It makes the docs building a bit easier. If someone knows how to add a multi-step procedure to the poetry command build_docs, then maybe not tho...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's something scripted like this I would generally model this under a bash file and invoke the bash file via a poetry script.

@adam2392
Copy link
Collaborator Author

I'll get these CI green and then work on the PC part.

This turned out to be a lot of work since we're trying to keep integration with networkx proposal, and pywhy-graphs. But I think it'll be worth it once we see the PC algo implemented with these in mind :).

import networkx as nx


class GraphProtocol(Protocol):
Copy link
Collaborator

@robertness robertness Aug 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. What is the rationale for using duck typing instead of subclassing? Is it normal to use protocols with methods that all need to be overridden?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protocols just are Python's way of defining an abstract interface. So in the code, note that pywhy-graphs is never imported. And does not need to be. This is in line w/ discussions we had awhile back where the Dev team requested that we don't explicitly have to rely on pywhy's implementation of causal graphs.

This GraphProtocol class then allows a user to define their own graph class, as long as it adheres to the API, then it "should" still work. In addition, this provides mypy type checking w/o having to import pywhy-graphs explicitly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add an explicit example:

if someone from say ananke wants to make their graph implementation work with the implementation of the algorithms here that rely on a GraphProtocol, then they just need to modify their API to have the functions specified.

I personally am okay w/ being opinionated and requiring users to use pywhy-graphs directly and providing a transformation module for converting graphs from common formats, but I think we can always swap that easily using ctrl+f+replace


@abstractmethod
def test(
self, df: pd.DataFrame, x_var: Any, y_var: Any, z_covariates: Any = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are x_var, y_var and z_covariates of the Any type? Categoricals, integers, and floats right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can be anything that a pandas dataframe column supports. I wasn't 100% sure if Categoricals, integers, and floats are sufficient. Are they?


def test(
self, df: pd.DataFrame, x_var: Any, y_var: Any, z_covariates: Any = None
) -> Tuple[float, float]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to create types for the test stat and the p-value as this will be used in multiple places.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't they just floats tho? With pvalue being constrained between 0 and 1

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
dodiscover/ci/g_test.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@robertness robertness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My only issues are with some of the type hints.

@adam2392 can you comment here on why some flavors are left out? For example:

  • mutual information and asymptotic chi-square test for discrete variables, perhaps with a James-Stein estimator
  • Monte Carlo permutation testing and sequential Monte Carlo permutation testing
  • Jonckheere-Terpstra test for ordinal variables

@adam2392
Copy link
Collaborator Author

Will address the code comments tonight.

Re the additional CI tests: just never implemented them myself :p. It would be nice to get a community contribution for those. Or one of us can tackle it when there's time?

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@codecov-commenter
Copy link

Codecov Report

Merging #16 (bbb128b) into main (7533e90) will increase coverage by 16.90%.
The diff coverage is 77.15%.

@@             Coverage Diff             @@
##             main      #16       +/-   ##
===========================================
+ Coverage   60.00%   76.90%   +16.90%     
===========================================
  Files           1        7        +6     
  Lines           5      342      +337     
  Branches        0       55       +55     
===========================================
+ Hits            3      263      +260     
- Misses          2       63       +61     
- Partials        0       16       +16     
Impacted Files Coverage Δ
dodiscover/ci/oracle.py 47.36% <47.36%> (ø)
dodiscover/_protocol.py 60.86% <60.86%> (ø)
dodiscover/ci/g_test.py 61.01% <61.01%> (ø)
dodiscover/ci/base.py 91.66% <91.66%> (ø)
dodiscover/ci/kernel_test.py 92.64% <92.64%> (ø)
dodiscover/ci/fisher_z_test.py 96.55% <96.55%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@adam2392
Copy link
Collaborator Author

Gonna merge this for now to move forward w/ the next step. Left some comments open w/ my responses tho.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants