TestPC: added simple simulation for discrete and kci test #63

MarkDana · 2022-07-19T14:00:09Z

Updated files:

tests/TestPC.py: added test_pc_simulate_linear_nongaussian_with_kci and test_pc_simulate_discrete_with_chisq.
tests/utils_simulate_data.py: utilities for simulation. Currently contains forward sampling (for discrete variables) and a linear SEM (for continuous variables).

What this update can do:

Ensure PC's correctness on simple simulated data:
- Previously we had tests for PC over loaded datasets and over data simulated from specific graphs.
- For the latter, we expect PC to return a fully correct CPDAG when the given graph is simple.
- So in addition to the linear Gaussian case (using fisherZ) added last time, we now add a discrete case (using chisq) and a linear non-Gaussian (exponential noise) case (using kci).
- The same simple example graph is used in all cases: a graph with 5 nodes and 7 edges.
Some utils for data simulation from a DAG [and TODOs, with lower priority]:
- simulate_linear_continuous_data: simulates data by linear mixing of exogenous noises. TODOs:
  - Currently all noise terms share the same distribution (and variance).
  - Only Gaussian and exponential noise components are supported.
  - The parameters of the noise generators are fixed.
- simulate_discrete_data: uses pgmpy's forward sampling for the discrete case. TODOs:
  - Currently the maximum product of cardinalities is fixed (for speed).
  - The alpha parameters for Dirichlet sampling are also fixed.
- Overall TODOs:
  - Check against the existing simulations in Tetrad to see whether the parameters here are reasonable.
  - Also incorporate nonlinear SEMs.
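
For readers unfamiliar with "linear mixing of exogenous noises", here is a minimal sketch of the idea. It is not the code in tests/utils_simulate_data.py: the function name simulate_linear_sem, the adjacency convention (entry [i, j] nonzero means i -> j), the edge-weight range, and the 3-node chain are all assumptions made for illustration. It only mirrors the properties listed above: every noise term shares one distribution and scale, and only Gaussian and exponential noise are handled.

```python
import numpy as np


def simulate_linear_sem(adjacency, n_samples, noise="exponential", seed=0):
    """Hypothetical helper: sample from a linear SEM defined on a DAG.

    adjacency[i, j] != 0 is read as an edge i -> j (assumed convention).
    Returns the data matrix (n_samples x n_nodes) and the edge weights used.
    """
    rng = np.random.default_rng(seed)
    adjacency = np.asarray(adjacency, dtype=float)
    n_nodes = adjacency.shape[0]

    # Assumed weight range: magnitude in [0.5, 2.0] with a random sign.
    magnitudes = rng.uniform(0.5, 2.0, size=adjacency.shape)
    signs = rng.choice([-1.0, 1.0], size=adjacency.shape)
    weights = np.where(adjacency != 0, magnitudes * signs, 0.0)

    # All noise terms share the same distribution and scale.
    if noise == "gaussian":
        noises = rng.normal(0.0, 1.0, size=(n_samples, n_nodes))
    elif noise == "exponential":
        noises = rng.exponential(1.0, size=(n_samples, n_nodes))
    else:
        raise ValueError(f"unsupported noise type: {noise}")

    # Structural equations in matrix form: X = X W + E, so X = E (I - W)^{-1}.
    # For a DAG, (I - W) is always invertible.
    mixing = np.linalg.inv(np.eye(n_nodes) - weights)
    return noises @ mixing, weights


# Illustrative 3-node chain 0 -> 1 -> 2 (not the 5-node graph used in the tests).
chain = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
data, weights = simulate_linear_sem(chain, n_samples=1000, noise="exponential")
```

The resulting data array has one column per node and could be passed to a constraint-based search with a suitable independence test; which sample size is enough for a correct CPDAG depends on the test and the graph.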

Test plan:

python -m unittest TestPC.TestPC.test_pc_simulate_linear_nongaussian_with_kci
# should pass, but takes ~17 minutes (for 5 nodes, sample size = 2500)
# if you reduce the sample size (e.g., to 2000) for speed, the returned graph will not be fully correct.

# if you need to run forward sampling, please install pgmpy
pip install pgmpy
python -m unittest TestPC.TestPC.test_pc_simulate_discrete_with_chisq  # should pass within one second
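
As a rough illustration of what forward sampling does for the discrete case, the sketch below draws each node after its parents, using conditional probability tables sampled from a Dirichlet distribution with a fixed alpha. It deliberately uses only the standard library and is not pgmpy's API or the PR's simulate_discrete_data; the function name forward_sample, the dict-based graph format, and the example graph are assumptions.

```python
import itertools
import random


def forward_sample(parents, cardinality, n_samples, alpha=1.0, seed=0):
    """Hypothetical helper: ancestral (forward) sampling on a discrete DAG.

    parents maps each node to its list of parents; the dict keys are assumed
    to be given in topological order. cardinality maps node -> number of states.
    """
    rng = random.Random(seed)

    # Build a random CPT per node: one Dirichlet(alpha) draw per parent
    # configuration, obtained by normalising independent gamma variates.
    cpts = {}
    for node, pa in parents.items():
        table = {}
        for config in itertools.product(*(range(cardinality[p]) for p in pa)):
            gammas = [rng.gammavariate(alpha, 1.0) for _ in range(cardinality[node])]
            total = sum(gammas)
            table[config] = [g / total for g in gammas]
        cpts[node] = table

    # Sample each node given the already-sampled values of its parents.
    samples = []
    for _ in range(n_samples):
        row = {}
        for node, pa in parents.items():
            config = tuple(row[p] for p in pa)
            probs = cpts[node][config]
            row[node] = rng.choices(range(cardinality[node]), weights=probs)[0]
        samples.append(row)
    return samples, cpts


# Illustrative 3-node graph A -> B, A -> C, B -> C (not the graph used in the tests).
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}
cardinality = {"A": 2, "B": 3, "C": 2}
samples, cpts = forward_sample(parents, cardinality, n_samples=500)
```

Root nodes have a single empty parent configuration, so their table reduces to one marginal distribution.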

tofuwen

Thanks @MarkDana for your awesome work!! :)

This is great!!

BTW, I think we can add some PC benchmark results (for speed and scale) to our docs now, as PC is currently the best-tested method :) (thanks to you!)

cc @kunwuz to merge this PR

MarkDana · 2022-07-21T08:30:23Z

@tofuwen Thanks! I will work on the PC benchmarks soon.

And @kunwuz this is ready to go. Thx :))

MarkDana added 2 commits July 19, 2022 21:17

TestPC: added simple simulation for discrete and kci test

7154852

Created some utils for data simulation (other tests may use this))

032bfcc

tofuwen approved these changes Jul 21, 2022

kunwuz merged commit 970ac64 into py-why:main Jul 21, 2022
