-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NaN problem while normailze the data #55
Changes from 7 commits
2ee93c6
7f40937
67f6ca7
ffe75f9
3603458
c58c816
709999d
04e2ba5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
from math import log, sqrt | ||
from collections.abc import Iterable | ||
|
||
import numpy as np | ||
from scipy.stats import chi2, norm | ||
|
@@ -59,9 +60,18 @@ def _unique(column): | |
} | ||
|
||
def kci(self, X, Y, condition_set): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently X and Y here can be int / Iterable? This doesn't sound like a good design --- if possible, we better make it every variable type-checked. Why not enforce X and Y here to be a list of Int? This is the most general one right? So we don't need those lines just to do type-checking --- it's usually a good idea to make code concise. But I think this can be changed in your later PR, @MarkDana There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes let me handle this in the later pr, or in the new KCI subclass. |
||
if type(X) == int: | ||
X = [X] | ||
Comment on lines
+63
to
+64
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Better with an And then |
||
elif type(X) != list: | ||
Y = list(X) | ||
if type(Y) == int: | ||
Y = [Y] | ||
elif type(Y) != list: | ||
Y = list(Y) | ||
|
||
if len(condition_set) == 0: | ||
return self.kci_ui.compute_pvalue(self.data[:, [X]], self.data[:, [Y]])[0] | ||
return self.kci_ci.compute_pvalue(self.data[:, [X]], self.data[:, [Y]], self.data[:, list(condition_set)])[0] | ||
return self.kci_ui.compute_pvalue(self.data[:, X], self.data[:, Y])[0] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. great, thanks! QQ:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tofuwen There is no special design in the old code for supporting only one variable. I just aligned with other tests (used in constraint-based methods), and forgot that KCI can take in multivariate unconditional variables. I was just about to fix it. So thanks so much for your work @cogito233 ! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we'll need to handle the case for iterable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also for the cache key in https://github.com/cmu-phil/causal-learn/blob/ffe75f95c4003fa7e9d7d5f3bbec4ace90ed3a41/causallearn/utils/cit.py#L339, we'll need to handle There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I have no ideas about how to change the assert in the case of Maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wow cool. You're so productive! Thanks for all these! I see your point. This line was intended for correctness checks in constraint-based methods. As for the citest itself, is How about we force There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
return self.kci_ci.compute_pvalue(self.data[:, X], self.data[:, Y], self.data[:, list(condition_set)])[0] | ||
|
||
def fisherz(self, X, Y, condition_set): | ||
""" | ||
|
@@ -326,7 +336,23 @@ def __call__(self, X, Y, condition_set=None, *args): | |
else: | ||
assert len(args) == 2, "Arguments other than skel and prt_m are provided for mc_fisherz." | ||
if condition_set is None: condition_set = tuple() | ||
assert X not in condition_set and Y not in condition_set, "X, Y cannot be in condition_set." | ||
|
||
if type(X) == int and type(Y) == int: | ||
assert X not in condition_set and Y not in condition_set, "X, Y cannot be in condition_set." | ||
else: | ||
if isinstance(X, Iterable): | ||
assert len(set(condition_set).intersection(X)) == 0, "X cannot be in condition_set." | ||
elif isinstance(X, int): | ||
assert X not in condition_set, "X cannot be in condition_set." | ||
else: | ||
raise Exception("Undefined type of X, X should be int or Iterable") | ||
if isinstance(Y, Iterable): | ||
assert len(set(condition_set).intersection(Y)) == 0, "Y cannot be in condition_set." | ||
elif isinstance(Y, int): | ||
assert Y not in condition_set, "Y cannot be in condition_set." | ||
else: | ||
raise Exception("Undefined type of Y, Y should be int or Iterable") | ||
|
||
i, j = (X, Y) if (X < Y) else (Y, X) | ||
cache_key = (i, j, frozenset(condition_set)) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.var(data_x)
does not support multi-dimdata_x
with some dims being constant (while others not), or all dims are constant but with different values:var=0
is not enough:stats.zscore
?This operation is safe, since we would (expect to) check any raw
data_x
(before kci) so it does not contain nan values, the nan returned in the normalized array is only due to constant (not original nan values).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ensure this in code, i.e. ensure raw data_x doesn't contain nan value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet. I'll do this soon after this pr is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for your awesome contributions, @cogito233 , @MarkDana and @tofuwen ! @cogito233 Please let us know if you think the current PR is ready to go so we will solve the remaining issues in a new PR. Or we could include these in this PR if you would like to. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is already fixed. I guess maybe we can merge now~
Many thanks to all of your(@kunwuz, @MarkDana, @tofuwen ) help~ Since I am at the first time contributing to a community codebase and lack experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much! @tofuwen @MarkDana When you have time, could you please make a final check on the current PR and let me know when it's ready to be merged? Many thanks!