Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add PC algorithm #18

Closed
wants to merge 13 commits into from
Closed

[ENH] Add PC algorithm #18

wants to merge 13 commits into from

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Aug 18, 2022

Fixes #11
Closes: #10

Changes proposed in this pull request:

  • PC algorithm implementation with separate skeleton discovery stage (has all the features of causal-learn I think(?), except for missing-value support)
  • Adds the abstract class for Constraint Based learning algorithms
  • Adds the Context class
  • Adds an example showing how to run PC algorithm
  • Adds a metric submodule for comparing graphs

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

adam2392 commented Aug 18, 2022

Here are the docs output:

https://output.circle-artifacts.com/output/job/42854568-1d11-48d3-b89a-660449f4b9c0/artifacts/0/dev/index.html

Although not sure how to manage the versions and what they look like. It seemed that 0.0.0 was needed to be hard coded right? Do we just manually increment that when we make release?
@darthtrevino

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2022

Codecov Report

Merging #18 (28c1fa6) into main (a67b95c) will decrease coverage by 0.66%.
The diff coverage is 73.45%.

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
- Coverage   76.90%   76.23%   -0.67%     
==========================================
  Files           7       13       +6     
  Lines         342      787     +445     
  Branches       55      159     +104     
==========================================
+ Hits          263      600     +337     
- Misses         63      138      +75     
- Partials       16       49      +33     
Impacted Files Coverage Δ
dodiscover/constraint/pcalg.py 65.41% <65.41%> (ø)
dodiscover/constraint/utils.py 68.75% <68.75%> (ø)
dodiscover/context.py 69.56% <69.56%> (ø)
dodiscover/_protocol.py 61.53% <73.91%> (+0.66%) ⬆️
dodiscover/constraint/skeleton.py 75.88% <75.88%> (ø)
dodiscover/constraint/_classes.py 81.15% <81.15%> (ø)
dodiscover/metrics.py 91.66% <91.66%> (ø)
dodiscover/ci/g_test.py 62.71% <0.00%> (+1.69%) ⬆️
dodiscover/ci/oracle.py 100.00% <0.00%> (+52.63%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Collaborator Author

Signed-off-by: Adam Li <adam2392@gmail.com>
@bloebp
Copy link
Member

bloebp commented Aug 18, 2022

This PR is quite big. Would it be possible to have each of the listed changes as a separate PR? Otherwise it is almost impossible that there is a proper review. Optimally, a PR should not have more than ~250 lines (preferably less). Except this is simply moving stuff over that has been review before already.

@adam2392
Copy link
Collaborator Author

That's doable! The challenge here is that the CI is still evolving as well. Sorry some files are changing there.

If you're okay with the skeleton, abstract class, etc. being PRed without tests first that's fine.

Then the final PR for the PC algo would have the corresponding tests?

@bloebp
Copy link
Member

bloebp commented Aug 18, 2022

You can also choose another PR (i.e. the branch) as a reference ('based on'), then if you make changes in, e.g., the CI stuff, it would be reflected here as well. Also, if there are inconsistencies, you can simply resolve them once the PR is approved. E.g., if the PR is about the PC-algorithm, we don't care about the details of the CI tests as long as we assume that there is 'something'. Then once the PR for the PC-algorithm is approved, you can simply adjust/rebase it to the latest changes of the CI tests.

Wouldn't it be possible to add tests for the classes independently of the PC algo? Just some basic tests (if they are necessary at all), like adding/deleting edges behave as expected etc. And then you can add further ones with the PC commit.

EquivalenceClassProtocol

Constraint-Based Structure Learning
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these have gone in the previous PR? Perhaps they can go into a separate PR?

}
# Add pygraphviz png scraper, if available
try:
from pygraphviz.scraper import PNGScraper
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could graphviz visualiation be its own PR?

@@ -28,6 +28,7 @@ Changelog

- |Feature| Implement continuous integration and repository docs, by `Adam Li`_ (:pr:`15`)
- |Feature| Implement conditional independence tests under the ``dodiscover.ci`` submodule, by `Adam Li`_ (:pr:`16`)
- |Feature| Implement PC algorithm and skeleton for constraint-based structure learning algorithms under the ``dodiscover.constraint`` submodule, by `Adam Li`_ (:pr:`18`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam2392 are you updating this change log manually? Isn't this something we could handle with semantic commits? @darthtrevino

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah rn it's just a per PR changelog. What's semantic commits? :0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://python-semantic-release.readthedocs.io/ - this library will parse git commit messages for changelog information

https://pypi.org/project/semversioner/ - this project will let you batch up change documents that are applied into the changelog via their CLI.

I kind of prefer the latter approach; commit-message parsing can be flaky if contributors aren't aware of the requirement. We use similar tooling in our JS projects, and we require that every PR has a semver-impact document associated with it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit-message parsing can be flaky

I've certainly been guilty of crummy semantic commit messages. I'm inclined to go with whatever you think is best @darthtrevino . I'd like to employ what CI tech that can help these PR's can get leaner. The leaner they are, the easier it is for me to onboard new contributors.

@adam2392
Copy link
Collaborator Author

This PR is quite big. Would it be possible to have each of the listed changes as a separate PR? Otherwise it is almost impossible that there is a proper review. Optimally, a PR should not have more than ~250 lines (preferably less). Except this is simply moving stuff over that has been review before already.

@bloebp @robertness and @darthtrevino I've now broken up the PR a bit. I'm leaving this open to demonstrate the PC example for discussion, but for the code itself:

@adam2392
Copy link
Collaborator Author

^ Apologies for the gigantic PRs. I am more used to doing things in bite-sized chunks, but getting a repo setup sometimes takes a bit of wrangling. Will be better about it :p

Hopefully the breakups now make reviewing more streamlined and easy.

Copy link
Collaborator

@darthtrevino darthtrevino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; just a question about pygraphviz install

- run:
name: Check poetry package versions
command: |
poetry run pip install --upgrade --progress-bar off pygraphviz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems weird - why not add pygraphviz as a listed dependency in pyproject.toml? In general, my approach in dowhy has been to avoid pip entirely and model all dependencies inside of `pyproject.toml'. When we have these tacit installs it makes setting up a dev environment kind of tricky.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I added it to the pyproject.toml and updated the poetry.lock file in #19

@@ -28,6 +28,7 @@ Changelog

- |Feature| Implement continuous integration and repository docs, by `Adam Li`_ (:pr:`15`)
- |Feature| Implement conditional independence tests under the ``dodiscover.ci`` submodule, by `Adam Li`_ (:pr:`16`)
- |Feature| Implement PC algorithm and skeleton for constraint-based structure learning algorithms under the ``dodiscover.constraint`` submodule, by `Adam Li`_ (:pr:`18`)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://python-semantic-release.readthedocs.io/ - this library will parse git commit messages for changelog information

https://pypi.org/project/semversioner/ - this project will let you batch up change documents that are applied into the changelog via their CLI.

I kind of prefer the latter approach; commit-message parsing can be flaky if contributors aren't aware of the requirement. We use similar tooling in our JS projects, and we require that every PR has a semver-impact document associated with it.

@adam2392
Copy link
Collaborator Author

Closed in favor of #30

@adam2392 adam2392 closed this Aug 30, 2022
@adam2392 adam2392 deleted the pc2 branch August 30, 2022 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement PC algorithm following API
5 participants