@MarkDana
Collaborator

Updated files:

  • tests/TestPC.py: test_pc_simulate_linear_nongaussian_with_kci and test_pc_simulate_discrete_with_chisq.
  • tests/utils_simulate_data.py: utilities for simulation. It currently contains forward sampling (for discrete variables) and a linear SEM (for continuous variables).

What this update can do:

  • Ensure PC's correctness on simple simulated data:
    • Previously we had tests for PC over loaded datasets and over data simulated from specific graphs.
    • For the latter, we expect PC to return an entirely correct CPDAG when the given graph is simple.
    • In addition to the linear Gaussian case (using fisherZ) covered last time, we now add a discrete case (using chisq) and a linear non-Gaussian (exponential) case (using kci).
    • All of these tests use the same simple example graph: 5 nodes and 7 edges.
  • Some utils for data simulation from a DAG [and TODOs, with lower priority]:
    • simulate_linear_continuous_data: Simulates data by linear mixing of exogenous noises. TODOs:
      • Currently all noise terms have the same distribution (and variance).
      • Only Gaussian and exponential noise components are supported.
      • The parameters passed to the noise generators are fixed.
    • simulate_discrete_data: Uses pgmpy's forward sampling for the discrete case. TODOs:
      • Currently the maximum product of cardinalities is fixed (for speed).
      • The alpha parameters for Dirichlet sampling are also fixed.
    • Overall TODOs:
      • Check against existing simulations in Tetrad to see whether the parameters here are reasonable.
      • Also incorporate nonlinear SEMs.
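
For readers unfamiliar with "linear mixing of exogenous noises", here is a minimal sketch of the idea. It is not the code in tests/utils_simulate_data.py: the function name `simulate_linear_sem`, its parameters, and the unit noise scale are all assumptions made for illustration. Each variable is a weighted sum of its parents plus its own noise term, so in matrix form X = XW + E, which gives X = E(I - W)^{-1}.

```python
import numpy as np


def simulate_linear_sem(weights, n_samples, noise="exponential", seed=0):
    """Hypothetical helper: sample from a linear SEM over a DAG.

    weights[i, j] != 0 means an edge i -> j with that coefficient.
    The weight matrix is assumed to describe an acyclic graph.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    d = weights.shape[0]

    # All noise terms share one distribution and scale (an assumption here).
    if noise == "gaussian":
        exogenous = rng.normal(loc=0.0, scale=1.0, size=(n_samples, d))
    elif noise == "exponential":
        exogenous = rng.exponential(scale=1.0, size=(n_samples, d))
    else:
        raise ValueError(f"unsupported noise type: {noise!r}")

    # Row-wise X = X W + E, hence X = E (I - W)^{-1}.
    mixing = np.linalg.inv(np.eye(d) - weights)
    return exogenous @ mixing


# Illustrative 3-node chain 0 -> 1 -> 2 (not the 5-node graph used in the tests).
chain = np.array([
    [0.0, 1.5, 0.0],
    [0.0, 0.0, -0.8],
    [0.0, 0.0, 0.0],
])
data = simulate_linear_sem(chain, n_samples=1000, noise="exponential")
```

Solving for X in closed form avoids an explicit topological ordering; for a DAG, I - W is always invertible.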

Test plan:

python -m unittest TestPC.TestPC.test_pc_simulate_linear_nongaussian_with_kci 
# should pass, but takes ~17 min (for 5 nodes, samplesize=2500)
# if you reduce the sample size (e.g., to 2000) for speed, the returned graph will not be entirely correct.

# forward sampling requires pgmpy; install it before running the discrete test
pip install pgmpy
python -m unittest TestPC.TestPC.test_pc_simulate_discrete_with_chisq  # should pass within one second
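
For context on what forward sampling does in the discrete case, the following is a rough NumPy-only sketch. It does not use pgmpy and is not the implementation in this PR. The function `forward_sample`, its arguments, and the symmetric Dirichlet prior are illustrative assumptions. Nodes are visited in topological order, and each node's value is drawn from a conditional distribution selected by the values already sampled for its parents.

```python
import numpy as np


def forward_sample(parents, cardinalities, n_samples, alpha=1.0, seed=0):
    """Hypothetical helper: forward-sample a discrete Bayesian network.

    parents: dict mapping node -> list of parent nodes, with keys listed
             in topological order (assumed, not checked).
    cardinalities: dict mapping node -> number of states.
    """
    rng = np.random.default_rng(seed)
    samples = {}
    for node, node_parents in parents.items():
        k = cardinalities[node]
        parent_cards = tuple(cardinalities[p] for p in node_parents)
        n_configs = int(np.prod(parent_cards)) if node_parents else 1

        # One Dirichlet-distributed probability row per parent configuration.
        cpt = rng.dirichlet(alpha * np.ones(k), size=n_configs)

        # Map each sample's parent values to a row index of the table.
        if node_parents:
            config = np.ravel_multi_index(
                tuple(samples[p] for p in node_parents), parent_cards
            )
        else:
            config = np.zeros(n_samples, dtype=int)

        # Inverse-CDF sampling: count how many CDF entries lie below u.
        cdf = np.cumsum(cpt[config], axis=1)
        u = rng.random(n_samples)
        draws = (u[:, None] > cdf).sum(axis=1)
        samples[node] = np.minimum(draws, k - 1)  # guard against rounding

    return np.column_stack([samples[n] for n in parents])


# Illustrative 3-node network A -> B, A -> C, B -> C (not the tested graph).
structure = {"A": [], "B": ["A"], "C": ["A", "B"]}
cards = {"A": 2, "B": 3, "C": 2}
discrete_data = forward_sample(structure, cards, n_samples=500)
```

The table for each node has one row per parent configuration, so its size grows with the product of the parents' cardinalities.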

Contributor

@tofuwen tofuwen left a comment

Thanks @MarkDana for your awesome work!! :)

This is great!!

BTW, I think we can now add some PC benchmark results (for speed and scale) to our docs, as PC is currently the best-tested method :) (thanks to you!)

cc @kunwuz to merge this PR

@MarkDana
Collaborator Author

@tofuwen Thanks! I will work on the PC benchmarks soon.

And @kunwuz this is ready to go. Thx :))

@kunwuz kunwuz merged commit 970ac64 into py-why:main Jul 21, 2022