@MarkDana
Collaborator

Updated files:

  • TestPC.py: added test_pc_with_citest_local_checkpoint as a usage example of saving and loading local cache checkpoints. Other constraint-based methods (e.g., FCI and CDNOD) can be used in the same way.
  • PC.py, FCI.py, CDNOD.py: added kwargs to the pc function to pass additional parameters to CIT (and possibly more in the future).
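
The kwargs pass-through described above can be pictured with a minimal sketch. The names make_ci_test and pc_sketch are hypothetical stand-ins, not causal-learn's actual functions; only the cache_path keyword comes from this PR.

```python
def make_ci_test(data, method, **kwargs):
    """Hypothetical stand-in for constructing a CI test object."""
    # Collect whatever options the caller forwarded, e.g. cache_path.
    return {"method": method, "n_rows": len(data), "options": dict(kwargs)}


def pc_sketch(data, alpha, indep_test, **kwargs):
    """Hypothetical stand-in for a constraint-based search entry point."""
    # The search function does not interpret kwargs itself; it only forwards them.
    ci_test = make_ci_test(data, indep_test, **kwargs)
    return {"alpha": alpha, "ci_test": ci_test}


result = pc_sketch([[0.1, 0.2], [0.3, 0.4]], 0.05, "kci", cache_path="./cache.json")
print(result["ci_test"]["options"])
```

With this shape, new CI test options can be added later without changing the signature of the search function.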

Why we need this:

  • Typical use case 1: I run pc(uc_rule=0, ...), and after the long run and a look at the results, I just want to fine-tune some params (e.g., pc(uc_rule=2, ...)). Then I'd have to wait another 6 hrs, even though most of the CI tests (those in skeleton discovery) are the same.
  • Typical use case 2: sometimes, because of slow speed or some random error, I interrupt a running PC. I then need to run it again, but I've already spent hours on it and want to resume from the point where it stopped.
  • In both cases the CI test results can usually be reused, which saves almost all of the running time, especially when the graph is large or KCI is used and CI tests account for almost all of the time.
  • So we designed a user-specified feature (off by default) that writes the CITest cache from runtime (which we already have from Pc fci fast haoyue #6) to a user-specified local path (which we implemented in Refactor CITs in oop way #62).
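
Why caching helps can be seen in a toy sketch. It assumes an expensive CI test function and an in-memory dictionary keyed by the tested variables; CachedCITest and slow_ci_test are hypothetical names for illustration, not part of causal-learn.

```python
class CachedCITest:
    """Memoize p-values so repeated queries skip the expensive test."""

    def __init__(self, test_fn):
        self.test_fn = test_fn
        self.cache = {}
        self.n_computed = 0  # how many times the expensive test actually ran

    def __call__(self, x, y, cond_set=()):
        # Normalize so (x, y) and (y, x), and any ordering of cond_set, share one entry.
        x, y = sorted((x, y))
        key = (x, y, tuple(sorted(cond_set)))
        if key not in self.cache:
            self.cache[key] = self.test_fn(x, y, key[2])
            self.n_computed += 1
        return self.cache[key]


def slow_ci_test(x, y, cond_set):
    # Placeholder for an expensive test such as KCI; returns a fake p-value.
    return 1.0 / (1 + x + y + len(cond_set))


ci = CachedCITest(slow_ci_test)
queries = [(0, 1, ()), (1, 0, ()), (0, 2, (1,)), (2, 0, (1,))]
first_run = [ci(*q) for q in queries]   # e.g. a first search
second_run = [ci(*q) for q in queries]  # a later search asking the same questions
print(ci.n_computed)
```

The second pass triggers no new computations, which is the effect the local checkpoint is meant to carry across separate runs.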

How to use this:

Just refer to test_pc_with_citest_local_checkpoint in TestPC.py:

For example, to use a local cache, instead of directly calling

cg = pc(data, 0.05, kci)

as before, now we use

citest_cache_file = "./TestData/citest_cache_linear_10_first_500_kci.json"    # .json file
cg = pc(data, 0.05, kci, cache_path=citest_cache_file)

If citest_cache_file does not exist at that local path, a new file will be created. Otherwise, the cache is first loaded from the JSON file into the CIT class and used during runtime. Note that 1) the data hash and parameters hash are checked at loading to ensure consistency, and 2) during runtime, the cache is saved to the local file every 30 seconds.
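
The load, check, and periodic-save behavior can be sketched roughly as follows. This is an illustration under assumptions, not the CIT class's actual code: LocalCheckpointCache is a hypothetical name, SHA-256 is a placeholder hash, and the parameters hash is left out for brevity.

```python
import hashlib
import json
import os
import tempfile
import time


class LocalCheckpointCache:
    """Hypothetical JSON-backed cache with consistency checks and throttled saves."""

    def __init__(self, path, data_bytes, method_name, save_interval=30.0):
        self.path = path
        self.save_interval = save_interval
        self.last_saved = time.monotonic()
        data_hash = hashlib.sha256(data_bytes).hexdigest()
        self.store = {"data_hash": data_hash, "method_name": method_name}
        if os.path.exists(path):
            with open(path) as f:
                loaded = json.load(f)
            # Refuse a cache that was produced from different data or a different test.
            if loaded.get("data_hash") != data_hash:
                raise ValueError("cache was built from different data")
            if loaded.get("method_name") != method_name:
                raise ValueError("cache was built with a different CI test")
            self.store = loaded

    def put(self, key, p_value):
        self.store[key] = p_value
        # Write to disk only if enough time has passed since the last save.
        if time.monotonic() - self.last_saved >= self.save_interval:
            self.save()

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.store, f, indent=2)
        self.last_saved = time.monotonic()


path = os.path.join(tempfile.mkdtemp(), "citest_cache.json")
cache = LocalCheckpointCache(path, b"toy data", "kci", save_interval=0.0)
cache.put("0;1", 0.29)  # save_interval=0.0 forces an immediate write in this demo
resumed = LocalCheckpointCache(path, b"toy data", "kci")
print(resumed.store["0;1"])
```

Throttling the writes keeps disk I/O small compared with the CI tests themselves, while an interrupted run loses at most one interval of results.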

Test plan:

python -m unittest TestPC.TestPC.test_pc_with_citest_local_checkpoint

The test should pass, and the console will show:

First pc run takes 125.663s.
Second pc run takes 27.316s.

A file "./TestData/citest_cache_linear_10_first_500_kci.json" is also saved. It will look like:

{
  "data_hash": "41b2bb03b69f7aa5910437ce481cd09a",
  "method_name": "kci",
  "parameters_hash": "99914b932bd37a50b983c5e7c90ae93b",
  "0;1": 0.2909559404988158,
  "0;2": 0.0,
  "0;3": 0.0,
  "0;4": 0.00021451118990445384,
  "0;5": 0.0,
  "0;6": 6.658751283694642e-08,
  "0;7": 1.7363888105137448e-13,
  "0;8": 0.0,
  "0;9": 2.3744259060709538e-05,
  "0;10": 0.5269525537421107,
  "0;11": 0.32119609314891906,

...

  "1;12|0": 0.0,
  "1;12|3": 0.0,
  "1;12|4": 0.0,
  "1;12|7": 0.0,
  "1;12|17": 0.0,
  "1;12|19": 0.0,
  "3;12|1": 1.1102230246251565e-16,
  "3;12|6": 7.784628497375934e-11,
  "3;12|9": 4.080547788554156e-09,
  "3;12|15": 9.85247894380592e-09,
  "4;12|1": 0.0009963076474257537,
  "4;12|6": 4.189646983976392e-05,
  "4;12|7": 0.025117609530545315,
  "4;12|9": 0.06133924708469907,
  "4;12|11": 0.02263991680291122,
  "4;12|15": 0.013199259718645107,
  "4;12|17": 0.0019548269683874464,
  "6;12|3": 3.385422775448177e-08,
  "6;12|4": 2.5775825918117334e-11,
  "6;12|7": 7.315127963369861e-07,
  "6;12|9": 7.233954915086827e-07,
  "6;12|15": 1.267475481236957e-07,
  "6;12|17": 2.8023321030357096e-07,
  "6;12|19": 3.471212206562768e-11,
  "7;12|1": 9.992007221626409e-16,
  "7;12|4": 4.387676888484293e-10,
  "7;12|6": 1.062029930665176e-08,
  "7;12|9": 2.457501979691301e-10,
  "7;12|11": 2.1808488348540322e-10
}
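
A saved cache like the one above can be inspected with a short script. This is a sketch: parse_cache_key is a hypothetical helper, and the '.' separator for several conditioning variables is an assumption, since the excerpt only shows single-variable conditioning sets. The three p-values are copied from the listing.

```python
import json


def parse_cache_key(key):
    """Split an 'X;Y' or 'X;Y|S' key into (x, y, conditioning_set)."""
    pair, _, cond = key.partition("|")
    x, y = (int(v) for v in pair.split(";"))
    # Assumption: several conditioning variables would be joined by '.'.
    cond_set = tuple(int(v) for v in cond.split(".")) if cond else ()
    return x, y, cond_set


sample = json.loads(
    '{"method_name": "kci", "0;1": 0.2909559404988158, '
    '"4;12|9": 0.06133924708469907, "1;12|0": 0.0}'
)
metadata_keys = {"data_hash", "method_name", "parameters_hash"}
alpha = 0.05
# Keep the tests whose p-value exceeds alpha, i.e. independence was not rejected.
independent = [
    parse_cache_key(k) for k, p in sample.items()
    if k not in metadata_keys and p > alpha
]
print(independent)
```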

TODO:

Update the related examples/tutorials in the docs.

@kunwuz
Collaborator

kunwuz commented Jul 27, 2022

Wow, that's really awesome! Please let me know what you would like to include in the documentation. I believe this is a 'revolution' for causal discovery packages :)

@MarkDana
Collaborator Author

@kunwuz wow thanks. Just a small patch lol.

Sure I'll send you doc updates later (e.g., calls to CIT https://causal-learn.readthedocs.io/en/latest/independence_tests_index/fisherz.html also need to get updated). thx:))

Contributor

@tofuwen tofuwen left a comment

Great work, this is really awesome!!

(We had more **kwargs, probably need to change more in later refactor lol)

@tofuwen
Contributor

tofuwen commented Jul 29, 2022

@kunwuz You can merge this PR

@kunwuz kunwuz merged commit b198dc1 into py-why:main Jul 29, 2022