
Conversation

@MarkDana (Collaborator) commented Jun 30, 2022

Updated files:

tests/TestPC.py, plus some .txt files (data, graphs, benchmark results, etc.) added under tests/TestData for the tests to load.

Test plan:

To test all unit tests:

cd tests/
python -m unittest TestPC

Or to run some specific unit test:

python -m unittest TestPC.TestPC.test_pc_with_fisher_z

The expected result is that all tests pass.

What these unit tests can do:

  • Contain tests for PC over loaded datasets (linear Gaussian, discrete, and linear Gaussian with missing values), and for PC over data simulated from specific graphs.
  • For developers: make sure that modified code remains logically consistent with the benchmark (i.e., returns the same results as the benchmark as of commit 94d1536).
  • For users: show how to run PC and give a quick look at the PC algorithm's performance (e.g., SHD to the true graph); see the minimal sketch below.
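
As a rough illustration of the usage these tests exercise (a minimal sketch, not the actual contents of tests/TestPC.py; the chain DAG, sample size, and assertions below are made up, and causal-learn's pc(data, alpha, indep_test) interface and cg.G.graph adjacency encoding are assumed):

import unittest
import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.cit import fisherz

class TestPCSketch(unittest.TestCase):
    def test_pc_simulated_linear_gaussian(self):
        np.random.seed(42)  # fix the seed so the simulated data (and hence the test) is reproducible
        n = 10000
        # Hypothetical chain DAG X0 -> X1 -> X2 with linear Gaussian mechanisms.
        X0 = np.random.normal(size=n)
        X1 = 0.8 * X0 + np.random.normal(size=n)
        X2 = 0.8 * X1 + np.random.normal(size=n)
        data = np.column_stack([X0, X1, X2])

        cg = pc(data, 0.05, fisherz)

        # Both true edges should be recovered; nonzero entries in the returned
        # adjacency-matrix encoding indicate an edge between the two variables.
        adj = cg.G.graph
        self.assertTrue(adj[0, 1] != 0 and adj[1, 0] != 0)  # X0 - X1
        self.assertTrue(adj[1, 2] != 0 and adj[2, 1] != 0)  # X1 - X2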

TODO:

  • Design a more comprehensive fuzzing analysis, including
    • For code: traverse every possible execution path of PC.
    • For data: design datasets for corner cases (e.g., very large cardinalities/sample sizes).
  • Incorporate TestMVPC*.py.
  • Incorporate PC with background knowledge.
  • Explain why (and how) users should choose the parameters (stable, uc_rule, and uc_priority) as in this test file; see the sketch after this list.
  • Reorganize the files in the tests/ folder.
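
Until that explanation is written, for reference, the parameters in question are passed to pc roughly as follows (a sketch only; the data and the specific values are placeholders, not a recommendation):

import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.cit import fisherz

data = np.random.normal(size=(1000, 5))   # placeholder data; the tests load real data from tests/TestData

cg = pc(data, alpha=0.05, indep_test=fisherz,
        stable=True,     # order-independent (stable) skeleton phase
        uc_rule=0,       # how unshielded colliders are oriented
        uc_priority=2)   # how conflicts between candidate colliders are resolved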

@tofuwen (Contributor) commented Jun 30, 2022

Thanks @MarkDana for the great work! It's great progress toward making causal-learn more reliable! :) And it looks so much better than the original one. Now I am much more confident about refactoring other files. :)

Some general questions:

  1. I am a little worried about the correctness of the current PC algorithm. Is it possible to run Tetrad on some datasets as well and compare it with the current version of PC in causal-learn? We don't need to run all of them --- just one or two as an example would be good. I think we shouldn't expect the results returned by the two packages to differ significantly?
  2. Is it possible to add more simple test cases to make sure our PC code is correct? Your test_pc_simulate_linear_gaussian_with_fisher_ is great (and exactly what I envision as tests :) ), but it only covers the linear Gaussian case? Maybe add another simple test, like a discrete case? :)

Some nits:

  1. For the files under true_dag_graph, are those the "true graphs"? (We have true graph information? Where did we get that?)
  2. Roughly how long does each test take to finish?

Some general process nits that may make the review process more efficient:

  1. When pushing really large PRs like this one, it is usually better to split them into several small PRs; that makes review much easier. In this specific case, we could have three PRs: one for the .txt files under "true graph" and "benchmark_returned_results", one for the simple test code changes (where we know PC must return the true graph, e.g. test_pc_simulate_linear_gaussian_with_fisher_), and one for the complex benchmark test code changes.

Regarding huge code changes, one industry rule of thumb is that a PR with >200 significant lines gets auto-rejected. Of course we don't need to follow this strictly --- but in general, keeping each PR small and limited to one small change makes review and tracking easier. :)

  2. When making big changes like this, we can go through the design together first. :) That way we make sure people are aligned and we don't waste time changing things back and forth. Of course, this PR looks great --- this is just for the sake of later PRs.

  3. Could you include a test plan section that lists which tests you ran and the results (e.g. a result screenshot)? Example: support user input variables name to PC #42

Thanks again for the great work! I actually didn't think you would be so productive --- I thought the PR would only contain simple simulation tests (which would already be great)! But you also finished the benchmark tests --- great!!!

@MarkDana (Collaborator, Author)

Hi @tofuwen, thanks so much for your comments. Learned a lot :)

Regarding your questions:

I am a little worried about the correctness of current PC algorithm.

Me too. So it's very likely that the benchmark result files will require frequent changes in the near future (as stated in the comment block here). I am still thinking of a convenient way to update those benchmark files and the corresponding MD5 verifications whenever that happens (e.g. here).
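
For what it's worth, the check I have in mind is just hashing each benchmark file and comparing it against a stored digest (a minimal sketch; the file path and digest below are placeholders):

import hashlib

BENCHMARK_FILE = "benchmark_returned_results/some_benchmark.txt"   # placeholder path
EXPECTED_MD5 = "d41d8cd98f00b204e9800998ecf8427e"                   # placeholder digest

with open(BENCHMARK_FILE, "rb") as f:
    actual_md5 = hashlib.md5(f.read()).hexdigest()

assert actual_md5 == EXPECTED_MD5, \
    "Benchmark file changed: re-verify the results and update the stored MD5."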

Is it possible to run Tetrad on some dataset as well and compare it.

Great suggestion! I'll do it soon. There already seem to be some results returned by the Tetrad Java code (e.g. tetrad_graph_discrete_10.txt) in TestData, but I don't know the respective configurations that generated them. @jdramsey @kunwuz Maybe you happen to know? Thanks!
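
Once the configurations are known, the comparison itself could be as simple as the following (a sketch; the dataset path is a placeholder and the Tetrad-output parser is hypothetical, since its format/configuration is exactly what is asked about above):

import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.cit import fisherz

data = np.loadtxt("TestData/some_dataset.txt")   # placeholder path
cg = pc(data, 0.05, fisherz)
causal_learn_adj = cg.G.graph                    # causal-learn's adjacency-matrix encoding

# Parsing e.g. tetrad_graph_discrete_10.txt into the same encoding is omitted here:
# tetrad_adj = parse_tetrad_graph("tetrad_graph_discrete_10.txt")   # hypothetical helper
# print("entries that differ:", int(np.sum(causal_learn_adj != tetrad_adj)))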

Is it possible to add more simple test cases

Cool! I'll do so. And we may also need more complex test cases - PC may just happen to return the correct graph on simple cases.

For files under true_dag_graph, are those the "true graph"? (we have true graph information? Where did we get that?)

I found them at e.g. graph.10.txt. But, similar to the above, I don't know the respective code used to generate the datasets. @jdramsey @kunwuz How about we also include the code that generates e.g. data_linear_10.txt from graph.10.txt?
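
In case it helps as a starting point, the generator I have in mind looks roughly like this (a sketch; it assumes the true DAG has already been parsed into a binary adjacency matrix with nodes in topological order, and the edge-weight and noise choices are placeholders):

import numpy as np

def simulate_linear_gaussian(adj, n, seed=0):
    # Simulate n samples from a linear Gaussian SEM whose DAG is adj,
    # where adj[i, j] == 1 means an edge i -> j and nodes are in topological order.
    rng = np.random.default_rng(seed)
    d = adj.shape[0]
    weights = adj * rng.uniform(0.5, 2.0, size=adj.shape)   # placeholder edge weights
    data = np.zeros((n, d))
    for j in range(d):
        data[:, j] = data @ weights[:, j] + rng.normal(size=n)
    return data

# Placeholder 3-node chain; in practice adj would be parsed from graph.10.txt.
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])
data = simulate_linear_gaussian(adj, n=10000)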

How long it takes to finish each test roughly?

Except for test_pc_load_bnlearn_discrete_datasets, each should take less than 20s. test_pc_load_bnlearn_discrete_datasets should take around one minute. TODO: I'll update the exact times later.

Could you include a test plan section which contains which tests you run and the results? (The result screenshot)

It's a nice suggestion for general unit tests, but for the tests included in this PR, if I understand correctly, we only need each test to pass - there is no need for a human to verify or compare results, since that is done by assertions.

Thanks so much for your greeeatt work👍!

@kunwuz (Collaborator) commented Jun 30, 2022

Thanks so much for your hard and elegant work, and thanks Yewen for the review and brilliant comments! Regarding the datasets/results you mentioned, if I remember correctly, @chenweiDelight may be familiar with their detailed configuration.

@tofuwen (Contributor) commented Jul 1, 2022

  1. Yeah, you are perfectly right that "we may also need more complex test cases". It's always better to have more tests! :) But that may need additional work --- and adding some simple tests to cover more code paths (e.g. different independence tests; different variable types) is already great, and can already make sure that, at least at the code level, our code is most likely correct (and can catch obvious bugs). :)
  2. Regarding the test plan section, we still need to include it in every PR, saying what you did to test the PR. For this PR, the test plan section can be as simple as listing all the test commands you ran (i.e. python -m unittest TestPC.TestPC.test_pc_load_linear_10_with_fisher_z) and including a screenshot of your terminal showing that the tests passed (the screenshot is optional though --- you can just state in your test plan that all the tests passed).
  3. I also commented on fixing the random seed in your PR; maybe you missed it? See the small sketch below. :) @MarkDana
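
To be concrete, the seed-fixing suggestion in point 3 amounts to something like this at the top of the test class (a minimal sketch; the class name and seed value are placeholders):

import unittest
import numpy as np

class TestPCSimulatedSketch(unittest.TestCase):
    def setUp(self):
        # Fix the random seed before every test so the simulated datasets,
        # and therefore the PC results, are identical across runs.
        np.random.seed(42)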
