Skip to content

Conversation

@zhi-yi-huang
Copy link
Collaborator

@zhi-yi-huang zhi-yi-huang commented Aug 13, 2022

Updates

  • Fixed FCI bugs
  • Refactored unit tests for FCI
  • Fixed GeneralGraph bugs
  • Refactored DAG2PAG
  • Refactored unit tests for PC

Description

  • Fixed the bug that line 85-86 in the code as of commit d75b215 would cause the recursion error.
  • Fix the bug that the dictionary object previous in function getPossibleDsep is not updated.
  • The variable dpath is used to record whether there is a directed path between two variables. But the previous implementation does not determine whether the edges are fully directed or not.
  • In GeneralGraph's remove_edge function, not every call of function remove_edge needs to reconstitute dpath, it only has an effect on dpath when a directed edge is removed. In order to reduce unnecessary reconstruction, so I made a modification to the remove_edge function.
  • We added a random graph (erdos renyi graph) to the test.

Test Plan

python -m unittest tests.TestFCI # should pass

image

python -m unittest tests.TestPC # should pass

image

Copy link
Contributor

@tofuwen tofuwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your awesome work!!! :)

if self.existsSemidirectedPath(node_r, node_x, graph) or self.existsSemidirectedPath(node_r, node_b, graph):
if self.existOnePathWithPossibleParents(previous, node_r, node_x, node_b, graph):
return True
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the code logic got changed here.....

Is it because our original implementation has bug?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this will cause "RecursionError: maximum recursion depth exceeded while calling a Python object". And I compared the code in Tetrad and it seems that these two lines of code are not needed.

tests/TestFCI.py Outdated
from causallearn.utils.GraphUtils import GraphUtils
from causallearn.utils.PCUtils.BackgroundKnowledge import BackgroundKnowledge

sys.path.append("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't do this.

Could you check discussion here and make changes correspondingly?

#59

tests/TestFCI.py Outdated
Comment on lines 76 to 79
ground_truth_dag.add_directed_edge(ground_truth_nodes[0], ground_truth_nodes[1])
ground_truth_dag.add_directed_edge(ground_truth_nodes[0], ground_truth_nodes[2])
ground_truth_dag.add_directed_edge(ground_truth_nodes[1], ground_truth_nodes[3])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[3])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do a for loop here, instead of manually add every edge.

e.g.

for node_1, node2 in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    ground_truth_dag.add_directed_edge(ground_truth_nodes[node_1], ground_truth_nodes[node_1])

tests/TestFCI.py Outdated
Comment on lines 102 to 111
ground_truth_dag.add_directed_edge(ground_truth_nodes[7], ground_truth_nodes[0])
ground_truth_dag.add_directed_edge(ground_truth_nodes[7], ground_truth_nodes[1])
ground_truth_dag.add_directed_edge(ground_truth_nodes[8], ground_truth_nodes[3])
ground_truth_dag.add_directed_edge(ground_truth_nodes[8], ground_truth_nodes[4])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[5])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[6])
ground_truth_dag.add_directed_edge(ground_truth_nodes[5], ground_truth_nodes[1])
ground_truth_dag.add_directed_edge(ground_truth_nodes[6], ground_truth_nodes[3])
ground_truth_dag.add_directed_edge(ground_truth_nodes[3], ground_truth_nodes[0])
ground_truth_dag.add_directed_edge(ground_truth_nodes[1], ground_truth_nodes[4])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, maybe use for loop

tests/TestFCI.py Outdated
Comment on lines 130 to 133
ground_truth_dag.add_directed_edge(ground_truth_nodes[0], ground_truth_nodes[2])
ground_truth_dag.add_directed_edge(ground_truth_nodes[1], ground_truth_nodes[2])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[3])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[4])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

tests/TestFCI.py Outdated
Comment on lines 152 to 143
ground_truth_dag.add_directed_edge(ground_truth_nodes[7], ground_truth_nodes[0])
ground_truth_dag.add_directed_edge(ground_truth_nodes[7], ground_truth_nodes[5])
ground_truth_dag.add_directed_edge(ground_truth_nodes[8], ground_truth_nodes[0])
ground_truth_dag.add_directed_edge(ground_truth_nodes[8], ground_truth_nodes[6])
ground_truth_dag.add_directed_edge(ground_truth_nodes[9], ground_truth_nodes[3])
ground_truth_dag.add_directed_edge(ground_truth_nodes[9], ground_truth_nodes[4])
ground_truth_dag.add_directed_edge(ground_truth_nodes[9], ground_truth_nodes[6])
ground_truth_dag.add_directed_edge(ground_truth_nodes[0], ground_truth_nodes[1])
ground_truth_dag.add_directed_edge(ground_truth_nodes[0], ground_truth_nodes[2])
ground_truth_dag.add_directed_edge(ground_truth_nodes[1], ground_truth_nodes[2])
ground_truth_dag.add_directed_edge(ground_truth_nodes[2], ground_truth_nodes[4])
ground_truth_dag.add_directed_edge(ground_truth_nodes[5], ground_truth_nodes[6])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@tofuwen
Copy link
Contributor

tofuwen commented Sep 5, 2022

btw, could you update your description to contain more details on the bug you fixed?

We should write a better descriptions in general (i.e. for every possible PRs) for people who check this PR later.

@jdramsey
Copy link
Collaborator

jdramsey commented Sep 5, 2022 via email

@tofuwen
Copy link
Contributor

tofuwen commented Sep 16, 2022

Yeah, I share the same concern as Joe.

Could you please ask the owner (or whoever familiar with the code) about the context why we need
"if self.existOnePathWithPossibleParents(previous, node_r, node_x, node_b, graph):
return True")?

Someone added this previously before, so I guess very likely it's for fixing some bugs. And by removing this line, it's likely you are introducing new bugs. So we better get to know the context and figure out the solutions instead of directly deleting it.

@tofuwen
Copy link
Contributor

tofuwen commented Sep 27, 2022

@zhi-yi-huang any updates regarding this PR? :) Sorry for the late reply though hhhh

@zhi-yi-huang
Copy link
Collaborator Author

zhi-yi-huang commented Oct 8, 2022

Sorry, I was a little busy a while ago. The latest code has been pushed to GitHub. Special thanks to @jdramsey @chenweiDelight @wean2016 for helping to fix FCI and Fas algorithm.

@tofuwen
Copy link
Contributor

tofuwen commented Oct 26, 2022

Thanks for the awesome work, @zhi-yi-huang !!!! This is great and really non-trivial, you fixed a tons of bugs that makes the package significantly better!

One final thing: since you changed GeneralGraph.py, could you run all other tests to make sure your changes doesn't break any other tests? After all tests passed, I think we are ready to push this PR.

cc @kunwuz I think we should work on enabling auto tests for all our PR. Could you do some research to see how we can enable this for our packages? :)

@kunwuz
Copy link
Collaborator

kunwuz commented Oct 26, 2022

Thanks for the awesome work, @zhi-yi-huang !!!! This is great and really non-trivial, you fixed a tons of bugs that makes the package significantly better!

One final thing: since you changed GeneralGraph.py, could you run all other tests to make sure your changes doesn't break any other tests? After all tests passed, I think we are ready to push this PR.

cc @kunwuz I think we should work on enabling autotests for all our PR. Could you do some research to see how we can enable this for our packages? :)

Sure, I will explore it in the near future. Btw, since I'm not very familiar with that, any suggestions (perhaps based on previous experiences) are super welcome. For this PR, maybe we could push it after finishing some necessary tests that @zhi-yi-huang and @tofuwen think could be necessary.

@tofuwen
Copy link
Contributor

tofuwen commented Oct 26, 2022

yeah, for this PR, after @zhi-yi-huang run all other tests, we can push it.

@tofuwen
Copy link
Contributor

tofuwen commented Nov 7, 2022

@MarkDana could you help review the failed PC tests in this PR? :)

@MarkDana
Copy link
Collaborator

MarkDana commented Nov 7, 2022

@tofuwen Oh thanks for reminding! Will do it later tomorrow.

@MarkDana
Copy link
Collaborator

MarkDana commented Nov 9, 2022

@zhi-yi-huang Seems that this pr contains changes of the following aspects (correct me if I have misunderstandings):

  1. Fixed FCI and Fas bugs in FCI.py and Fas.py (of which the latter will affect PC).
  2. Graph operations in GeneralGraph.py and DAG2PAG.py (both will affect PC).
  3. Other utils that do not affect, or unrelated to PC, e.g., DepthChoiceGenerator.py, BackgroundKnowledge.py, and TestFCI.py.
  4. Generated FCI benchmark graphs for future testing.

The problem that causes failed tests on PC should be in 2 I guess:

  • Everything else same as the current main + Fas.py of this pr -> TestPC passes.
  • Everything else same as the current main + GeneralGraph.py of this pr -> TestPC fails.

I didn't check details into your modified graph operations, but maybe you could start from this. Also, just a kind reminder - after bugs about graphs are fixed, we need to regenerate all benchmark files (e.g., bnlearn_discrete_10000_alarm_fci_chisq_0.05.txt).

Thanks a lot 🍺!

@tofuwen
Copy link
Contributor

tofuwen commented Dec 6, 2022

@zhi-yi-huang any updates? :) Did you find the bug? Hopefully to merge this PR in the near future to make causal-learn better!

@zhi-yi-huang
Copy link
Collaborator Author

The problem that causes failed tests on PC should be in GeneralGraph.py. I had added TestGeneralGraph.py in branch pc and pc_with_new_generalgraph of repo. The only difference between the code on these two branches is the main difference in GenralGraph.py.

The original GenralGraph.py would cause issue #82. I have reproduced it in TestGeneralGraph.py. The root cause of this issue is the function reconstitute_dpath which lead to errors in determining ancestor nodes as shown in TestGeneralGraph.py.

After updating to the GeneralGraph.py on this commit 3cfd99e, the above issues would disappear. The result is shown in TestGeneralGraph.py.

@tofuwen
Copy link
Contributor

tofuwen commented Dec 10, 2022

@zhi-yi-huang Thanks so much for your awesome! I can definitely see you spent lots of efforts on this PR and this is definitely not a trivial task.

Just to make sure, the current PR can pass ALL of our tests, right? (NOT just PC test and your new test)

cc @MarkDana to review PC related

@MarkDana
Copy link
Collaborator

Hi @zhi-yi-huang Thanks for your efforts! Could you please commit your changes (e.g., GeneralGraph.py, TestGeneralGraph.py) in your repo to this pull request? Then I'll review it before it's finally merged. Thanks! :)

@zhi-yi-huang
Copy link
Collaborator Author

There are some tests that fail:

  1. All the tests in TestANM.py fail to pass. The p_value in the assertion is inconsistent with what is expected.
  2. The test test_skeleton_discovery in TestBackgroundKnowledge.py fail to pass. It feeds back as "AttributeError: 'str' object has no attribute 'method'". It is the ci_test method in class CausalGraph that is the cause.
  3. The test test_pc_with_mv_fisherz_MCAR_data_assertion in TestMVPC_mv_fisherz.py fail to pass. It feeds back as "AssertionError: A test deletion fisher-z test showed no overlapping data involving variables. Please check the input data.".

These tests do not pass in the main branch code either. And it doesn't look like the changed in this PR affects these tests.

GeneralGraph.py has already updated in this PR. I'll commit the change TestGeneralGraph.py later.

@kunwuz
Copy link
Collaborator

kunwuz commented Jan 15, 2023

Hi all, since this PR has already been for a while, please let me know if there is anything else before we can merge this.

@MarkDana
Copy link
Collaborator

H @kunwuz! @zhi-yi-huang identified some flaws in the original GeneralGraph class, which led to the issue in #82. Now @zhi-yi-huang has fixed them in GeneralGraph, and is regenerating all the benchmarks used for tests, as with the dependency on GeneralGraph, our original testing benchmarks might also be wrong. As soon as @zhi-yi-huang finishes this part and commits the new benchmarks, this pr is ready to go.

Thanks so much everyone (especially @zhi-yi-huang ) for your efforts!

@kunwuz kunwuz mentioned this pull request Feb 1, 2023
zhi-yi-huang and others added 8 commits February 3, 2023 01:23
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: wean2016 <weanyq@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: wean2016 <weanyq@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: wean2016 <weanyq@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: wean2016 <weanyq@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
Signed-off-by: ZhiyiHuang <huangzhiyi.chn@gmail.com>
@MarkDana
Copy link
Collaborator

MarkDana commented Feb 4, 2023

Hi @zhi-yi-huang Awesome! I just reviewed this pr and it is good to me. As an additional check, all modifications on the benchmark CPDAG results are in the form -1 -> 1 in adjacency matrices, i.e., to change some false Meeks-oriented edges back to undirected. This aligns with your observations.

@kunwuz I think this pr is ready to go!! Thanks everyone (especially @zhi-yi-huang) for your efforts!! Thanks 🍺🍺

@kunwuz kunwuz merged commit 56a410c into py-why:main Feb 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants