Rewrite CITests as a class && re-use covariance matrix for fisherz #46
I have some general design comments about this PR, how about we find some time to discuss?
I didn't spend time reviewing cit.py. I guess it's mainly copy and paste, and we already have CIT tests?
I think the PR is generally good to push, with only some nit comments.
Thanks for the great work! It not only fixes the old issues but also significantly improves the code quality. :)
    self.data = None  # store the data
    self.test = None  # store the name of the conditional independence test
    self.corr_mat = None  # store the correlation matrix of the data
For the attributes you deleted, did you make sure that no code references them anymore? :)
I verified this with PyCharm -> Find Usages, and there are no other usages in the project.
Is this a reliable way to find usages, or do you have any other recommendations?
I don't know. Maybe try it on other variables to see whether it does a good job? Normally this kind of thing should rely on simple tests (if we had 100% test coverage, hhh) && a smart IDE.
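As a complement to the IDE search, a small script can scan the source for remaining attribute accesses. This is only a sketch: `find_attribute_uses` and the sample source are hypothetical, and it relies solely on the standard-library `ast` module.

```python
import ast


def find_attribute_uses(source: str, names: set) -> list:
    """Return sorted (line, attribute) pairs for every `obj.<name>` access."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # ast.Attribute covers both reads and writes such as `self.data = ...`.
        if isinstance(node, ast.Attribute) and node.attr in names:
            hits.append((node.lineno, node.attr))
    return sorted(hits)


# Hypothetical source text; in practice, read each project file instead.
sample = """
class CausalGraph:
    def __init__(self):
        self.labels = {}

def run(cg):
    return cg.corr_mat
"""
hits = find_attribute_uses(sample, {"data", "corr_mat"})
for line, attr in hits:
    print(f"line {line}: .{attr}")
```

A static scan like this cannot see dynamic access such as `getattr(cg, "data")`, so tests remain the stronger check.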
    self.labels = {}
    self.prt_m = {}  # store the parents of missingness indicators
    self.mvpc = False
    self.cardinalities = None  # only works when self.data is discrete, i.e. self.test is chisq or gsq
same as above
    - def cdnod(data: ndarray, c_indx: ndarray, alpha: float = 0.05, indep_test=fisherz, stable: bool = True,
    + def cdnod(data: ndarray, c_indx: ndarray, alpha: float = 0.05, indep_test: str = fisherz, stable: bool = True,
Standard coding style here:
no spaces around = in function declarations.
Check the examples in https://google.github.io/styleguide/pyguide.html
Cool! Fixed.
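For reference, PEP 8 and the linked Google guide both distinguish two cases for default values: no spaces around `=` when the parameter is unannotated, and spaces around `=` when a type annotation is present. The function names below are made up purely for illustration.

```python
# No annotation: no spaces around "=" for the default value.
def scale(values, factor=2.0):
    return [v * factor for v in values]


# With an annotation: spaces around "=" for the default value.
def scale_typed(values: list, factor: float = 2.0) -> list:
    return [v * factor for v in values]
```

Both forms behave identically at runtime; the difference is purely one of formatting.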
      if mvpc:  # missing value PC
    -     if indep_test == fisherz:
    -         indep_test = mv_fisherz
    +     if indep_test == fisherz: indep_test = mv_fisherz
hmmm, should have two lines here
√
tests/TestPC.py
    import unittest
    import hashlib
    import numpy as np
    np.random.seed(42)
How about we have two tests here?
Also, please do not fix the random seed for the whole file.
How about one test with the seed fixed and one without? And only fix the seed inside the fixed-seed test function.
√
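The two-test layout suggested above could look roughly like this. It is a sketch under assumptions: `noisy_estimate` is a hypothetical stand-in for a randomized routine such as `mvpc`, and it uses a local `np.random.default_rng` generator instead of the global `np.random.seed`, so the seed cannot leak into other tests in the file.

```python
import unittest

import numpy as np


def noisy_estimate(rng: np.random.Generator) -> float:
    """Hypothetical randomized routine: the mean of 500 noisy draws around 1.0."""
    return float(rng.normal(loc=1.0, scale=0.1, size=500).mean())


class TestNoisyEstimate(unittest.TestCase):
    def test_with_fixed_seed(self):
        # The seed lives inside this test only, so the result is reproducible
        # here without affecting any other test.
        first = noisy_estimate(np.random.default_rng(42))
        second = noisy_estimate(np.random.default_rng(42))
        self.assertEqual(first, second)

    def test_without_fixed_seed(self):
        # No seed: assert only a property that should hold for any draw.
        value = noisy_estimate(np.random.default_rng())
        self.assertAlmostEqual(value, 1.0, delta=0.1)
```

The seeded test pins exact reproducibility, while the unseeded one checks a tolerance wide enough that it should not flake.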
Great, after fixing all the nits above, I think this PR is ready to go!
Updated files:
- `cit.py`: main change: integrate the CITests functions into a class.
- `PC.py`, `FCI.py`, `CDNOD.py`, `Fas.py`, `SkeletonDiscovery.py`: adjusted accordingly for the new CIT class.
- `GraphClass.py`: remove unnecessary attributes (e.g., data, cache) in the `CausalGraph` class.
- `TestPC.py`: there is randomness in `mvpc`, and thus we need two tests: with and without randomness.
- `TestFCI.py`, `TestMVPC_mv_fisherz_test.py`: minor issues (e.g., file directory).

Why this update is needed:
- With `pc` and `fisherz`, the correlation matrix over the same dataset `data` is re-computed at every call to `fisherz`, which wastes a lot of time. Thanks @kunwuz for raising this speedup!
- If we add an `if indep_test == fisherz` line and pass the pre-computed `corr_mat` to `fisherz` at every possible usage, the code will be very lengthy and ugly: `indep_test` is used everywhere in constraint-based methods, and parameter passing will be complicated.
- [...] `data`, `cache`, and: `corr_mat` for `fisherz`, `cardinalities` for `gsq`/`chisq`, user-specified parameters for `kci`, etc.

How to use the CIT class
[...] though they may notice that `causallearn.utils.cit.fisherz` is now the string `"fisherz"`, instead of the function as before.

[...] while before we wrote code as:
Test plan:
To ensure the new CIT class is correct and does not change the logic of the original code (as of commit 94d1536):
Speedup gain:
E.g., `TestPC.TestPC.test_pc_with_fisher_z` takes `24.812s` in a run of the original code. Now it takes `3.748s`.

Todo:
- `cit.py`: L113: `np.corr` numerical issues with `nan` values.
- Removed `data`, `cache` in the `CausalGraph` class. We still have the `test` attribute (a `CIT` object), to adapt to the `cg.ci_test` methods in the original code. But, for simplicity, should `test` be part of a graph class, eventually?
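The motivation given under "Why this update is needed" (compute the correlation matrix once, then reuse it on every test call) can be sketched as follows. This is not the actual `causallearn.utils.cit` code: `FisherZSketch` and its interface are hypothetical, and the p-value uses the textbook Fisher z-transform of the partial correlation.

```python
import math

import numpy as np


class FisherZSketch:
    """Hypothetical CIT-style class; not the causallearn API."""

    def __init__(self, data: np.ndarray):
        self.data = data
        self.sample_size = data.shape[0]
        # Computed once here and reused by every call below.
        self.corr_mat = np.corrcoef(data, rowvar=False)

    def __call__(self, x: int, y: int, cond_set: tuple = ()) -> float:
        """Return the p-value for X independent of Y given cond_set."""
        idx = [x, y, *cond_set]
        sub_corr = self.corr_mat[np.ix_(idx, idx)]
        inv = np.linalg.inv(sub_corr)
        # Partial correlation of X and Y given the conditioning set.
        r = -inv[0, 1] / math.sqrt(inv[0, 0] * inv[1, 1])
        r = min(max(r, -0.999999), 0.999999)  # keep log() finite
        z = 0.5 * math.log((1 + r) / (1 - r))
        stat = math.sqrt(self.sample_size - len(cond_set) - 3) * abs(z)
        # Two-sided p-value under the standard normal: 2 * (1 - Phi(stat)).
        return math.erfc(stat / math.sqrt(2))


# Synthetic chain X -> Z -> Y, so X and Y are dependent marginally
# and independent given Z.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
z = x + rng.normal(size=2000)
y = z + rng.normal(size=2000)
test = FisherZSketch(np.column_stack([x, y, z]))
p_marginal = test(0, 1)
p_conditional = test(0, 1, (2,))
```

Because the state lives on the object, callers pass only variable indices, which avoids threading a pre-computed matrix through every function that takes `indep_test`.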