PC tutorial using ASIA data #67

robertness · 2022-12-08T20:48:36Z

Changes proposed in this pull request:
Adds a tutorial for the PC algo on a public dataset.

How to review this PR

Here is the learned CPDAG on the ASIA network.

Here is the comparable result from pc.stable in the bnlearn package.

So our result is similar to bnlearn's implementation, but neither reconstructs the graph very well. Here is the ground truth network for reference:

So it is not doing as well on this data, and therefore the tutorial is not telling a compelling story. I suspect we could improve the causal discovery and the narrative if we add some constraints. Any suggestions?

Before submitting

I've read and followed all steps in the Making a pull request section of the CONTRIBUTING docs.
I've updated or added any relevant docstrings following the syntax described in the Writing docstrings section of the CONTRIBUTING docs.
If this PR fixes a bug, I've added a test that will fail without my fix.
If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

All GitHub Actions jobs for my pull request have passed.

adam2392 · 2022-12-08T22:12:25Z

Oh interesting, so our PC algo incorrectly orients the collider L -> S <- B, whereas the true structure is the fork L <- S -> B. This means that the algorithm incorrectly found that $L \perp B$ and $L \not\perp B \mid S$.
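
To make the failure mode concrete, here is a minimal sketch of the unshielded-collider step of PC in plain Python. It is not dodiscover's implementation: the function name `orient_colliders`, the dictionary encoding of the skeleton, and the hand-written separating sets are all assumptions for illustration. For a non-adjacent pair (L, B) with a common neighbor S, the triple is oriented as L -> S <- B exactly when S is absent from the separating set recorded for L and B.

```python
from itertools import combinations


def orient_colliders(skeleton, sepsets):
    """Return the directed edges implied by unshielded colliders.

    skeleton: dict mapping each node to the set of its neighbors.
    sepsets:  dict mapping frozenset({a, b}) to the set that separated a and b.
    """
    directed = set()
    for mid, nbrs in skeleton.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b in skeleton[a]:
                continue  # shielded triple: a and b are adjacent
            if mid not in sepsets[frozenset((a, b))]:
                # mid did not separate a and b, so it must be a collider
                directed.add((a, mid))
                directed.add((b, mid))
    return directed


# Hypothetical three-node fragment: L - S - B, with L and B non-adjacent.
skeleton = {"L": {"S"}, "S": {"L", "B"}, "B": {"S"}}

# Separating set contains S: no collider is oriented.
correct = orient_colliders(skeleton, {frozenset(("L", "B")): {"S"}})

# Separating set is empty (L and B judged marginally independent):
# S is wrongly oriented as a collider.
faulty = orient_colliders(skeleton, {frozenset(("L", "B")): set()})
```

In this toy setup the only difference between the two runs is the separating set stored for (L, B), which is why inspecting the separating sets is a natural first debugging step.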

Are the alpha levels and CI tests the same used in dodiscover and R's pcstable?
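
For readers comparing the two setups, a minimal sketch of what a G-square test at a fixed alpha computes may help. This is plain Python for the marginal case with two binary variables only; it is not the CI test shipped in dodiscover or in bnlearn, and the function name `g_square_2x2` and the toy data are assumptions.

```python
import math


def g_square_2x2(xs, ys):
    """Marginal G-square independence test for two binary (0/1) sequences.

    Returns (statistic, p_value). A 2x2 table has 1 degree of freedom, so the
    chi-square tail probability reduces to erfc(sqrt(G / 2)).
    """
    n = len(xs)
    counts = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for x, y in zip(xs, ys):
        counts[(x, y)] += 1
    row = {i: counts[(i, 0)] + counts[(i, 1)] for i in (0, 1)}
    col = {j: counts[(0, j)] + counts[(1, j)] for j in (0, 1)}

    stat = 0.0
    for (i, j), observed in counts.items():
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = row[i] * col[j] / n
            stat += 2.0 * observed * math.log(observed / expected)
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    return stat, math.erfc(math.sqrt(stat / 2.0))


alpha = 0.05  # significance level; an illustrative choice here

# Hypothetical data: one pair is perfectly dependent, the other perfectly balanced.
xs = [0] * 50 + [1] * 50
dependent_p = g_square_2x2(xs, xs)[1]
independent_p = g_square_2x2(xs, [0, 1] * 50)[1]

reject_dependent = dependent_p < alpha      # independence rejected
reject_independent = independent_p < alpha  # independence not rejected
```

Two implementations can agree on alpha and still disagree on individual decisions if they differ in the test statistic, the degrees-of-freedom adjustment, or the handling of empty cells.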

Re constraints:

We can impose the constraint that S must cause L and B because smoking causes lung cancer and not the other way around. We should note that the constraint may make the resulting graph not a valid CPDAG.
Another thing we could do is just note this as an imperfection of causal discovery algorithms when the data is unfaithful. IIRC, the ASIA dataset is unfaithful. This is seen because it is very hard to detect the edge between E and X.
Another thing we can do is also run the ConservativePC to demonstrate that the CPDAG there is more "robust".

WDYT?

robertness · 2022-12-08T22:45:01Z

> Are the alpha levels and CI tests the same used in dodiscover and R's pcstable?

I used .05 in bnlearn's pcstable. What's the default here?

robertness · 2022-12-08T22:48:31Z

> We can impose the constraint that S must cause L and B because smoking causes lung cancer and not the other way around. We should note that the constraint may make the resulting graph not a valid CPDAG.

So I once created an algorithm that would modify the CPDAG to account for edges fixed by interventions, constraints, and graph priors. Do you think we could use something like that here?

robertness · 2022-12-08T22:50:59Z

> Another thing we could do is just note this as an imperfection of causal discovery algorithms when data is unfaithful. I think IIRC, the ASIA dataset is unfaithful. This is seen because it is very hard to detect the edge between E and X.

How about the ALARM network? I think that was the first successful use case of the PC algo.

robertness · 2022-12-08T22:52:08Z

> Another thing we can do is also run the ConservativePC to demonstrate that the CPDAG there is more "robust"

Can you elaborate? How would this change things?

adam2392 · 2022-12-08T23:14:23Z

> How about the ALARM network? I think that was the first successful use case of the PC algo.

+1

> I used .05 in bnlearn's pcstable. What's the default here?

Also 0.05. Hmm, perhaps there is a bug in the implementation of our CI test and/or the PC algo itself. Are you able to check through the separating sets? The learned skeleton looks the same in both, which is good, so the error must be either in the separating sets or in the orientation phase of the PC algo itself.
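
One way to check through the separating sets is to dump them from both runs and diff them pair by pair. The sketch below is plain Python with made-up values; `diff_sepsets` is a hypothetical helper, and the two dictionaries are not actual output from dodiscover or bnlearn.

```python
def diff_sepsets(ours, theirs):
    """Return {pair: (ours_set, theirs_set)} for pairs whose separating sets differ.

    Both arguments map frozenset({a, b}) to the separating set found for a and b.
    A pair missing from one side is reported with None on that side.
    """
    diffs = {}
    for pair in sorted(set(ours) | set(theirs), key=sorted):
        a, b = ours.get(pair), theirs.get(pair)
        if a != b:
            diffs[pair] = (a, b)
    return diffs


# Hypothetical separating sets from two implementations with the same skeleton.
ours = {frozenset(("L", "B")): set(), frozenset(("A", "S")): set()}
theirs = {frozenset(("L", "B")): {"S"}, frozenset(("A", "S")): set()}

mismatches = diff_sepsets(ours, theirs)
```

If the diff comes back empty while the oriented graphs still differ, that would point at the orientation phase rather than at the CI tests.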

> So I once created an algorithm that would modify the CPDAG to account for edges fixed by interventions, constraints, and graph priors. Do you think we could use something like that here?

What does this do?

> Can you elaborate? How would this change things?

If we add background knowledge, the returned graph is no longer necessarily a Markov equivalence class of the DAG. It is an esoteric point for now, but it's something to note I would say.

For example, say you get the true CPDAG:

X - Y - Z

with $X \not\perp Z$ and $X \perp Z \mid Y$. If you then apply prior knowledge to say $X \rightarrow Y$, the graph $X \rightarrow Y - Z$ is not a CPDAG for the CI statements written.

In this simple setup, assuming the conditional independences learned were correct, then you would automatically have $X \rightarrow Y \rightarrow Z$, since a collider is not possible. But in general, I think this problem is open on how to systematically combine prior knowledge w/ causal orientation rules.
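
The propagation in this simple setup is what is usually called Meek's first orientation rule: if a -> b, b - c, and a and c are non-adjacent, then orient b -> c. A minimal sketch in plain Python follows; `apply_meek_rule_1` is a hypothetical helper (not a dodiscover function), and the tuple/frozenset encoding of edges is chosen only for illustration.

```python
def apply_meek_rule_1(directed, undirected):
    """Repeatedly orient b - c as b -> c when a -> b and a, c are non-adjacent.

    directed:   set of (tail, head) tuples.
    undirected: set of frozenset({u, v}) pairs.
    Returns new (directed, undirected) sets; the inputs are not modified.
    """
    directed, undirected = set(directed), set(undirected)

    def adjacent(u, v):
        return (
            (u, v) in directed
            or (v, u) in directed
            or frozenset((u, v)) in undirected
        )

    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for edge in list(undirected):
                if b not in edge:
                    continue
                (c,) = edge - {b}
                if c != a and not adjacent(a, c):
                    # Orienting c -> b would create a new collider a -> b <- c,
                    # which the CI statements rule out, so b -> c is forced.
                    undirected.discard(edge)
                    directed.add((b, c))
                    changed = True
    return directed, undirected


# CPDAG X - Y - Z, after background knowledge fixes X -> Y.
directed, undirected = apply_meek_rule_1(
    directed={("X", "Y")},
    undirected={frozenset(("Y", "Z"))},
)
```

This covers only the first rule on a three-node chain; it does not address the open question of combining arbitrary prior knowledge with the full set of orientation rules.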

robertness · 2022-12-09T19:53:18Z

> What does this do?

So for score-based algorithms, one learns a DAG and then converts it to a PDAG. In bnlearn, you convert to a PDAG using cpdag. I notice now the algo has a wlbl argument which allows you to apply constraints that would force some edges to stay oriented. This is what I was thinking of. My algorithm extended from constraints to causal priors and interventions as well, but those extensions don't matter quite yet.
I suppose in a constraint-based algo like the PC algo, we would have the constraints force certain edges to be oriented when they would otherwise be undirected. But we're already doing this via the constraints in the context object, correct?

robertness · 2022-12-09T20:08:37Z

> We can impose the constraint that S must cause L and B because smoking causes lung cancer and not the other way around. We should note that the constraint may make the resulting graph not a valid CPDAG.

Ok, I tried imposing the constraint and it doesn't seem to work. Is this a bug or did I make an error?

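import networkx as nx
from dodiscover import PC, make_context
from dodiscover.ci import GSquareCITest

# `data` is the ASIA DataFrame loaded earlier in the notebook.
# Background knowledge: force S -> L and S -> B.
included_edges = nx.DiGraph([('S', 'L'), ('S', 'B')])
context = make_context().variables(data=data).edges(include=included_edges).build()

ci_estimator = GSquareCITest(data_type="discrete")
pc = PC(ci_estimator=ci_estimator)

def convert_to_int(df):
    # Encode "yes"/"no" strings as 1/0. Note: this modifies df in place.
    for var in df.columns:
        df[var] = [1 if x == "yes" else 0 for x in df[var]]
    return df
data_mod = convert_to_int(data)

pc.fit(data_mod, context)
graph = pc.graph_

draw(graph)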

robertness · 2022-12-09T20:19:34Z

Created an issue to address the need to convert characters to ints in the data: #69

adam2392 · 2022-12-09T22:36:28Z

> We can impose the constraint that S must cause L and B because smoking causes lung cancer and not the other way around. We should note that the constraint may make the resulting graph not a valid CPDAG.

> Ok, I tried imposing the constraint and it doesn't seem to work. Is this a bug or did I make an error?

I did not thoroughly test adding constraints. However, I think this is a bug in the implementation, or perhaps just a miscommunication of how the constraints are applied. This is related to #46, which we should probably revisit.

I think the issue could be that inside the skeleton learning, we have the following function:

                    # ignore fixed edges
                    if (x_var, y_var) in self.context.included_edges.edges:
                        continue
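
To illustrate what skipping fixed edges during skeleton learning implies, here is a toy skeleton search in plain Python. It is a sketch under stated assumptions, not dodiscover's skeleton learner: `learn_skeleton`, the `included` argument, and the hand-written CI oracle are all hypothetical. The point it shows is that an edge listed as fixed is never tested, so it can never be removed and never receives a separating set.

```python
from itertools import combinations


def learn_skeleton(nodes, is_independent, included=frozenset()):
    """Toy PC-style skeleton search that never tests edges in `included`.

    is_independent(x, y, cond) -> bool is a user-supplied CI oracle.
    included is a set of frozenset({x, y}) pairs that must stay in the graph.
    """
    adj = {n: set(nodes) - {n} for n in nodes}
    sepsets = {}
    size = 0
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            if frozenset((x, y)) in included:
                continue  # fixed edge: skipped, so no separating set is recorded
            candidates = sorted((adj[x] | adj[y]) - {x, y})
            for cond in combinations(candidates, size):
                if is_independent(x, y, set(cond)):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepsets[frozenset((x, y))] = set(cond)
                    break
        size += 1
    return adj, sepsets


def oracle(x, y, cond):
    # Hypothetical oracle for the fork L <- S -> B: only L and B are
    # independent, and only once S is in the conditioning set.
    return {x, y} == {"L", "B"} and "S" in cond


nodes = ["B", "L", "S"]
free_adj, free_sepsets = learn_skeleton(nodes, oracle)

# Fixing an edge that the data would otherwise remove keeps it in the graph.
fixed_adj, fixed_sepsets = learn_skeleton(
    nodes, oracle, included={frozenset(("L", "B"))}
)
```

Whether this interaction is what causes the behavior reported in this thread is not established here; the sketch only shows how a skip inside the skeleton loop changes which edges and separating sets the orientation phase later sees.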

robertness · 2022-12-10T21:08:09Z

> I did not thoroughly test adding constraints. However, I think this is a bug in the implementation, or perhaps just a miscommunication of how the constraints are applied. This is related to #46, which we should probably revisit.

> I think the issue could be that inside the skeleton learning, we have the following function:

I'm going to create a bug issue since we have reproducibility with my above code, and link to issue 46.

robertness · 2022-12-10T21:33:04Z

@adam2392 I cleaned up the narrative to discuss the less than ideal results. I created an issue to do another notebook that demonstrates the use of constraints, once that issue is fixed. Can you approve?

adam2392 · 2022-12-12T16:31:24Z

> @adam2392 I cleaned up the narrative to discuss the less than ideal results. I created an issue to do another notebook that demonstrates the use of constraints, once that issue is fixed. Can you approve?

Hi @robertness I will try to get to this before EOY. So my hypothesis is that there is a runtime-issue that is created when you assume edge-constraints before the skeleton is discovered.

emrekiciman · 2022-12-18T06:49:47Z

Approved. Chatting with Robert, it looks like the notebook itself is working correctly, even though it is uncovering issues in the library.

emrekiciman

Approved.

adam2392

@robertness I have a few comments that would just make the overall maintenance easier:

Is it possible to use bnlearn's API to pull the asia.csv, so that it is not merged in with the source code?

# Load the ASIA example dataset through bnlearn instead of a checked-in CSV
import bnlearn as bn

# Load example dataset
df = bn.import_example(data='asia')

Ref: https://erdogant.github.io/bnlearn/pages/html/Plot.html?highlight=asia

You would just delete the asia.csv file and then add bnlearn to the docs dependency group, [tool.poetry.group.docs.dependencies], inside the pyproject.toml file.

I fixed the CI, so now there are some spelling issues caught in the notebook:

examples/notebooks/example-pc-algo.ipynb:155: distinquish ==> distinguish
examples/notebooks/example-pc-algo.ipynb:178: implemention ==> implementation

If you move the notebook to doc/tutorial/markovian, then this will cleanly separate simple example scripts and more involved Jupyter notebook tutorials. I would classify this notebook as a tutorial.

Lmk if you have any questions, or think something could be changed.

adam2392 · 2022-12-19T04:36:20Z

> Approved. Chatting with Robert, it looks like the notebook itself is working correctly, even though it is uncovering issues in the library.

Okay sounds good to me. I think the notebook itself is fine then. I left some minor comments to make sure the tutorials sections are relatively lightweight/clean. They should be easily resolved and then I'll approve and merge!

I'll work on debugging moving the edge constraints to post-skeleton-discovery.

codecov-commenter · 2022-12-19T04:37:33Z

Codecov Report

Merging #67 (45eeb1a) into main (4d9a788) will not change coverage.
The diff coverage is 60.00%.

❗ Current head 45eeb1a differs from pull request most recent head f34981d. Consider uploading reports for the commit f34981d to get more accurate results

@@           Coverage Diff           @@
##             main      #67   +/-   ##
=======================================
  Coverage   82.13%   82.13%           
=======================================
  Files          20       20           
  Lines        1304     1304           
  Branches      228      229    +1     
=======================================
  Hits         1071     1071           
  Misses        152      152           
  Partials       81       81

| Impacted Files | Coverage Δ |
|---|---|
| dodiscover/constraint/pcalg.py | `79.80% <60.00%> (ø)` |


robertness · 2022-12-28T00:34:27Z

@adam2392 I moved to docs, removed the asia data, and added bnlearn as a dependency.

Signed-off-by: Robert Ness <robertness@gmail.com>

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392

This now works. A few notes:

I fixed the CI. There were some installation issues that just needed to be iterated on based on the error messages that were provided.
I updated the notebook and in turn found a minor bug to fix in the PC algorithm orientation phase.
In the future, @robertness can you push to a branch on your fork and then start a PR from the fork rather than a branch on the main repo?

Unfortunately, unsure why the Windows build is failing. It seems in general poetry has difficulty with Windows... @darthtrevino any ideas?

robertness · 2023-01-04T17:32:27Z

Will use my own fork in future. Thanks @adam2392

* Create PC algo tutorial
* Add updated poetry lock file
* Fix notebook and update docs for CI. Fix code spell

Signed-off-by: Robert Ness <robertness@gmail.com>
Co-authored-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>

robertness requested a review from adam2392 December 8, 2022 20:48

robertness changed the title ~~FCI tutorial using ASIA data~~ PC tutorial using ASIA data Dec 8, 2022

This was referenced Dec 10, 2022

Constraints aren't working with PC algorithm #70

Open

Demonstrate use of edge constraints in a causal discovery algorithm tutorial #71

Open

robertness requested a review from emrekiciman December 18, 2022 01:11

emrekiciman previously approved these changes Dec 18, 2022

adam2392 requested changes Dec 19, 2022

adam2392 dismissed emrekiciman’s stale review via c91bdb7 December 19, 2022 04:32

adam2392 mentioned this pull request Dec 29, 2022

Update deps #74

Merged

5 tasks

robertness force-pushed the tutorial branch 3 times, most recently from 4c8a83b to ac070eb Compare December 31, 2022 03:16

robertness force-pushed the tutorial branch 5 times, most recently from 965cee8 to fb03346 Compare December 31, 2022 04:04

Create PC algo tutorial

d8d85af

Signed-off-by: Robert Ness <robertness@gmail.com>

robertness force-pushed the tutorial branch from fb03346 to d8d85af Compare December 31, 2022 04:10

adam2392 added 6 commits January 2, 2023 19:48

Add updated poetry lock file

596690b

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix index

2098cbe

Signed-off-by: Adam Li <adam2392@gmail.com>

fix circle ci

af58427

Signed-off-by: Adam Li <adam2392@gmail.com>

Try again

9156e56

Signed-off-by: Adam Li <adam2392@gmail.com>

Try again

f52f1f3

Signed-off-by: Adam Li <adam2392@gmail.com>

Fix notebook and update docs for CI

45eeb1a

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 previously approved these changes Jan 3, 2023

Fix code spell

f34981d

Signed-off-by: Adam Li <adam2392@gmail.com>

adam2392 dismissed their stale review via f34981d January 3, 2023 04:17

adam2392 approved these changes Jan 3, 2023

adam2392 merged commit e94e038 into main Jan 3, 2023

adam2392 deleted the tutorial branch January 3, 2023 04:21
