-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PC tutorial using ASIA data #67
Conversation
Oh interesting, so our PC algo incorrectly orients L -> S <- B, whereas it should be the other way around. This means that the algorithm incorrectly found that Are the alpha levels and CI tests the same used in dodiscover and R's pcstable? Re constraints:
WDYT? |
I used .05 in bnlearn's pcstable. What's the default here? |
So I once created an algorithm that would modify the CPDAG to account for edges fixed by interventions, constraints, and graph priors. Do you think we could use something like that here? |
How about the ALARM network? I think that was the first successful use case of the PC algo. |
Can you elaborate? How would this change things? |
+1
Also 0.05. Hmm perhaps there is a bug in the implementation of our CI test and/or the PC algo itself. Are you able to check through the separating sets? Cuz the skeleton learned looks the same in both, so this is good. Therefore the error must be in what the separating sets are or the orientation phase of the PC algo itself.
What does this do?
If we add background knowledge, the returned graph is no longer necessarily a Markov equivalence class of the DAG. It is an esoteric point for now, but it's something to note I would say. For example, say you get the true CPDAG:
and In this simple setup, assuming the conditional independences learned were correct, then you would automatically have |
So for score-based algorithms, one learns a DAG and then converts it to a PDAG. In bnlearn, you convert to a PDAG using cpdag. I notice now the algo has a wlbl argument which allows you to apply constraints that would force some edges to stay oriented. This is what I was thinking of. My algorithm extended from constraints to causal priors and interventions as well, but those extensions don't matter quite yet. |
Ok, I tried imposing the constraint and it doesn't seem to work. Is this a bug or did I make an error?
|
Created an issue to address the need to convert characters to ints in the data: #69 |
1 similar comment
Created an issue to address the need to convert characters to ints in the data: #69 |
I did not thoroughly test adding constraints. However, I think this is a bug in the implementation, or perhaps just a miscommunication of how the constraints are applied. This is related to #46, which we should probably revisit. I think the issue could be that inside the skeleton learning, we have the following function:
|
I'm going to create a bug issue since we have reproducibility with my above code, and link to issue 46. |
@adam2392 I cleaned up the narrative to discuss the less than ideal results. I created an issue to do another notebook that demonstrates the use of constraints, once that issue is fixed. Can you approve? |
Hi @robertness I will try to get to this before EOY. So my hypothesis is that there is a runtime-issue that is created when you assume edge-constraints before the skeleton is discovered. |
Approved. Chatting with Robert, it looks like the notebook itself is working correctly, even though it is uncovering issues in the library. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertness I have a few comments that would just make the overall maintenance easier:
- is it possible to use
bnlearn
's API to pull theasia.csv
, so that way it is not merged in with the source code?
# Example of interactive plotting
import bnlearn as bn
# Load example dataset
df = bn.import_example(data='asia')
Ref: https://erdogant.github.io/bnlearn/pages/html/Plot.html?highlight=asia
You would just delete asia.csv
file and then add bnlearn
to the doc dev list of dependencies: [tool.poetry.group.docs.dependencies]
inside the pyproject.toml
file.
- I fixed the CI, so now there are some spelling issues caught in the notebook:
examples/notebooks/example-pc-algo.ipynb:155: distinquish ==> distinguish
[42](https://github.com/py-why/dodiscover/actions/runs/3728444238/jobs/6323464519#step:7:43)
examples/notebooks/example-pc-algo.ipynb:178: implemention ==> implementation
- If you move the notebook to
doc/tutorial/markovian
, then this will cleanly separate simple example scripts and more involved Jupyter notebook tutorials. I would classify this notebook as a tutorial.
Lmk if you have any questions, or think something could be changed.
Okay sounds good to me. I think the notebook itself is fine then. I left some minor comments to make sure the tutorials sections are relatively lightweight/clean. They should be easily resolved and then I'll approve and merge! I'll work on debugging moving the edge constraints to post-skeleton-discovery. |
Codecov Report
@@ Coverage Diff @@
## main #67 +/- ##
=======================================
Coverage 82.13% 82.13%
=======================================
Files 20 20
Lines 1304 1304
Branches 228 229 +1
=======================================
Hits 1071 1071
Misses 152 152
Partials 81 81
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@adam2392 I moved to docs, removed the asia data, and added bnlearn as a dependency. |
4c8a83b
to
ac070eb
Compare
965cee8
to
fb03346
Compare
Signed-off-by: Robert Ness <robertness@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now works. A few notes:
- I fixed the CI. There were some installation issues that just needed to be iterated on based on the error messages that were provided.
- I updated the notebook and in turn found a minor bug to fix in the PC algorithm orientation phase.
- In the future, @robertness can you push to a branch on your fork and then start a PR from the fork rather than a branch on the main repo?
Unfortunately, unsure why the Windows build is failing. It seems in general poetry has difficulty with Windows... @darthtrevino any ideas?
Will use my own fork in future. Thanks @adam2392 |
* Create PC algo tutorial * Add updated poetry lock file * Fix notebook and update docs for CI. Fix code spell Signed-off-by: Robert Ness <robertness@gmail.com> Co-authored-by: Adam Li <adam2392@gmail.com> Signed-off-by: Adam Li <adam2392@gmail.com>
Fixes issue 66.
Changes proposed in this pull request:
Adds a tutorial for the PC algo on a public dataset.
How to review this PR
Here is the learned CPDAG on the ASIA network.
Here is the comparable result from
pc.stable
in the bnlearn package.So our performance is similar to bnlearn's implementation. But it doesn't reconstruct the graph very well. Here is the ground truth network for reference:
So it is not doing as well on this data, and therefore the tutorial is not telling a compelling story. I suspect we could improve the causal discovery and the narrative if we add some constraints. Any suggestions?
Before submitting
section of the
CONTRIBUTING
docs.Writing docstrings section of the
CONTRIBUTING
docs.After submitting