@MarkDana commented Dec 18, 2021

optimize speed for PC and FCI

change overview

I mainly did two things:

  • Add a cache to avoid repeated CI tests, which are especially common and waste much time in FCI.
    • This applies to both PC and FCI, and to all CI tests (see the cache sketch below).
  • Use matrix operations to replace the for-loops in the conditioning-set subsetting of discrete CI tests.
    • This applies to all constraint-based methods, and to the Chi2 and G2 tests on discrete data (see the vectorization sketch below).

These two optimizations are faithful to the original code (same calculation, same procedure, same result, no approximation); the only difference is speed. And the larger the data, the greater the speedup (the number of conditioning subsets grows exponentially with the number of nodes).
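For concreteness, here are minimal sketches of the two changes. The names below (`CachedCITest`, `ci_test`, `contingency_tables`, `cardinalities`) are illustrative placeholders, not the exact classes/functions added in this PR.

The cache memoizes p-values keyed by the (order-normalized) variable pair and the frozen conditioning set, so a test that PC/FCI requests more than once is computed only once:

```python
import numpy as np


class CachedCITest:
    """Memoization wrapper around any CI test -- a sketch, not the PR's exact code."""

    def __init__(self, ci_test, data):
        self.ci_test = ci_test      # any callable: (data, x, y, condition_set) -> p-value
        self.data = data
        self.pvalue_cache = {}

    def __call__(self, x, y, condition_set=()):
        # The test is symmetric in (x, y) and order-insensitive in the conditioning
        # set, so normalize the key before the lookup.
        key = (min(x, y), max(x, y), frozenset(condition_set))
        if key not in self.pvalue_cache:
            self.pvalue_cache[key] = self.ci_test(self.data, x, y, condition_set)
        return self.pvalue_cache[key]
```

The matrix-operation change removes the per-row Python loop that splits the samples by every configuration of the conditioning set S: each row's configuration of S is encoded as one integer via a single matrix product (mixed-radix encoding), and all contingency tables are then built at once with `np.bincount`:

```python
import numpy as np


def contingency_tables(data, x, y, S, cardinalities):
    """Build one |X| x |Y| count table per configuration of S, without for-loops.

    data: (n_samples, n_vars) integer-coded array; S: list of column indices;
    cardinalities: integer array of per-variable category counts. Illustrative only.
    """
    cx, cy = int(cardinalities[x]), int(cardinalities[y])
    if len(S) == 0:
        codes, n_configs = np.zeros(data.shape[0], dtype=np.int64), 1
    else:
        cards = cardinalities[S]
        radix = np.concatenate(([1], np.cumprod(cards[:-1]))).astype(np.int64)  # mixed-radix weights
        codes, n_configs = data[:, S] @ radix, int(np.prod(cards))
    flat = (codes * cx + data[:, x]) * cy + data[:, y]   # one joint code per sample
    counts = np.bincount(flat, minlength=n_configs * cx * cy)
    return counts.reshape(n_configs, cx, cy)             # fed into the Chi2 / G2 statistic
```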

result overview

  1. Make sure the new CITest is exactly the same as the old one:
python -m unittest TestPC.TestPC.test_new_old_gsq_chisq_equivalent
  2. Test PC speed on discrete datasets:
  • Run:
python -m unittest TestPC.TestPC.test_bnlearn_discrete_datasets
  • Data: the discrete datasets in the original tests/ folder are not large enough, so I used some datasets from bnlearn with sample size 10,000. See ./TestData/bnlearn_discrete_10000.
  • Competitors:
    • Tetrad (Java version), used via the Python interface py-causal.
    • pcalg (R).
    • causal-learn-old: the latest version as of 12/13/2021, commit 9b8f06f.
    • causal-learn-new (this pull request).
  • Note:
    • To test causal-learn-old, please check out the current main branch, copy the bnlearn data folder and TestPC.py over, and run the same command.
    • This is just a rough run. I'm sure that causal-learn-old and causal-learn-new run exactly the same procedure, but I'm not sure whether the internal parameters/operations are the same in Tetrad and pcalg.
  • Result (produced on my M1max):
| data (#nodes/#edges) | pcalg-R (sec) | Tetrad-java (sec) | causal-learn-old (sec) | causal-learn-new (sec) | ~x times faster than before |
| --- | --- | --- | --- | --- | --- |
| cancer (5/4) | 1.540 | 0.327 | 0.037 | 0.011 | 3 |
| earthquake (5/4) | 1.583 | 0.326 | 0.043 | 0.011 | 4 |
| survey (6/6) | 2.970 | 0.334 | 0.075 | 0.013 | 6 |
| asia (8/8) | 2.999 | 0.678 | 0.134 | 0.023 | 6 |
| sachs (11/17) | 3.096 | 2.225 | 4.495 | 0.142 | 32 |
| child (20/25) | 56.050 | 18.118 | 14.298 | 0.619 | 23 |
| insurance (27/52) | 118.203 | 29.115 | 25.032 | 1.377 | 18 |
| water (32/66) | 1.553 | 2.839 | 4.276 | 0.316 | 14 |
| alarm (37/46) | 110.337 | 7.908 | 14.123 | 0.857 | 16 |
| barley (48/84) | 493.209 | 766.548 | 97.113 | 3.430 | 28 |
| hailfinder (56/66) | 757.147 | 5.843 | 18.300 | 0.875 | 21 |
| hepar2 (70/123) | / | 83.793 | 282.980 | 9.508 | 30 |
| win95pts (76/112) | / | 17.492 | 73.937 | 3.395 | 22 |
| munin1 (186/273) | / | 2258.087 | 8580.979 | 145.942 | 59 |
| andes (223/338) | / | 191.619 | 1456.823 | 27.463 | 53 |
  3. Test FCI speed on discrete datasets:
  • Similar to the above:
python -m unittest TestFCI.TestFCI.test_bnlearn_discrete_datasets
  • Result:
| data (#nodes/#edges) | pcalg-R (sec) | Tetrad-java (sec) | causal-learn-old (sec) | causal-learn-new (sec) | ~x times faster than before |
| --- | --- | --- | --- | --- | --- |
| cancer (5/4) | 1.942 | 0.514 | 0.054 | 0.002 | 23 |
| earthquake (5/4) | 1.717 | 0.339 | 0.055 | 0.002 | 32 |
| survey (6/6) | 2.528 | 0.244 | 0.071 | 0.003 | 21 |
| asia (8/8) | 3.165 | 0.600 | 0.153 | 0.031 | 5 |
| sachs (11/17) | 44.944 | 2.090 | 6.014 | 0.085 | 71 |
| child (20/25) | 88.366 | 4.191 | 19.557 | 0.687 | 28 |
| insurance (27/52) | 219.912 | 7.686 | 56.823 | 1.764 | 32 |
| water (32/66) | 1.376 | 1.513 | 6.619 | 0.364 | 18 |
| alarm (37/46) | 169.848 | 3.253 | 23.864 | 0.980 | 24 |
| barley (48/84) | 665.854 | 152.248 | 275.160 | 4.902 | 56 |
| hailfinder (56/66) | / | / | 38.710 | 1.704 | 23 |
| hepar2 (70/123) | / | 64.398 | 597.187 | 10.733 | 56 |
| win95pts (76/112) | / | 9.538 | 138.113 | 4.093 | 34 |
| munin1 (186/273) | / | 611.075 | >6 hrs | 278.110 | >78 |
| andes (223/338) | / | 86.151 | 3083.325 | 39.021 | 79 |
  4. Test FCI on a continuous dataset:
python -m unittest TestFCI.TestFCI.test_large_continuous_dataset

On ./data_linear_10.txt, the old FCI takes 4.41656 s and the new FCI takes 1.87639 s. This difference comes from the cache alone, since the matrix-operation change only affects discrete CI tests.

Also check out

Thanks to Wei for the earlier optimization on FCI at commit 9b8f06f. Comparing release 0.1.2.0 with 0.1.1.9, FCI is also dozens of times faster.

@kunwuz merged commit 1e6aa46 into py-why:main Dec 19, 2021