-
Notifications
You must be signed in to change notification settings - Fork 231
Added some unit tests for GIN #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| if indep_test_method == 'kci': | ||
| indep_test = KCI_UInd() | ||
| else: | ||
| raise NotImplementedError((f"Independent test method {indep_test_method} is not implemented.")) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems previously we supported other test method?
Also, if we only support KCI, please indicate this clearly in your error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires kernel-based independence tests such as KCI, HSIC. But they are called differently, and I wonder if I need to write a function to eliminate their differences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so will the old code support HSIC? Basically my point is that, we should never regress (i.e. previously supported functions are not supported anymore).
if the old code base only supports KCI, then it's fine
| _, _, v = np.linalg.svd(cov_m) | ||
| omega = v.T[:, -1] | ||
| return np.dot(omega, data[:, X].T) | ||
| return np.dot(data[:, X], omega.T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we change this? will it return previous result transpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduce the overhead of data matrix transposition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this change the function return result?
The current return value is the old value transpose, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, omega seems to be a 1-dimensional array and transposing the matrix is an invalid operation. The output result is unchanged. np.dot operation on N-dimensional array with 1-dimensional array is in accordance with the algorithm logic. Please refer to numpy.dot
| L1 = np.random.uniform(-1, 1, size=sample_size) | ||
| L2 = np.random.uniform(1.2, 1.8) * L1 + np.random.uniform(-1, 1, size=sample_size) | ||
| X1 = np.random.uniform(1.2, 1.8) * L1 + 0.2 * np.random.uniform(-1, 1, size=sample_size) | ||
| X2 = np.random.uniform(1.2, 1.8) * L1 + 0.2 * np.random.uniform(-1, 1, size=sample_size) | ||
| X3 = np.random.uniform(1.2, 1.8) * L2 + 0.2 * np.random.uniform(-1, 1, size=sample_size) | ||
| X4 = np.random.uniform(1.2, 1.8) * L2 + 0.2 * np.random.uniform(-1, 1, size=sample_size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you change the data generation parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous data generation parameters were set according to the paper, but generating according to the paper does not seem to be fully identifiable. So I changed the data generation parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the old data not fully identifiable? We can prove it not identifiable or just our algorithm fail to do so? Sorry causal n00b lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an independence test error caused by the data.
tests/TestGIN.py
Outdated
| data = (data - np.mean(data, axis=0)) / np.std(data, axis=0) | ||
| g, k = GIN(data) | ||
| print(g, k) | ||
| _, k = GIN(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please give better naming (instead of naming it "k")
Basically you should never use not-meaningful variable name like "k" (only exception is well-known i and j for loops)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I will update it later. I see that 'k' is often used in papers to indicate causal order, e.g. LiNGAM, so I just used k.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm, makes sense haha. But it would still be good to name it clearly in our codebase (you can add comment to say this variable corresponds to k in paper)
tests/TestGIN.py
Outdated
| data = (data - np.mean(data, axis=0)) / np.std(data, axis=0) | ||
| g, k = GIN(data) | ||
| print(g, k) | ||
| _, k = GIN(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
tests/TestGIN.py
Outdated
| g, k = GIN(data) | ||
| print(g, k) | ||
| _, k = GIN(data) | ||
| k = [sorted(k_i) for k_i in k] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously you use i, here you use k_i.
It would be better if you give consistent naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because 'i' means index, so change the i to k_i, I forgot to unify it when I changed it before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it --- please make them consistent
tofuwen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work!
Almost done --- we should do a small refactor to make code more concise.
Remember the principle: never copy and paste and re-use code whenever possible. :)
tests/TestGIN.py
Outdated
| ground_truth = [[0, 1], [2, 3]] | ||
| assert len(causal_order) == len(ground_truth) | ||
| for i in range(len(causal_order)): | ||
| assert np.isclose(causal_order[i], ground_truth[i]).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to use np.isclose() right?
causal_order must equal to ground_truth, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first one, I don't understand. For the second one, yes.
tests/TestGIN.py
Outdated
| for i in range(len(causal_order)): | ||
| assert np.isclose(causal_order[i], ground_truth[i]).all() | ||
|
|
||
| def test_case1_hsic(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function looks almost exactly the same as last function.
Please consider re-use the code, instead of copy and paste (basically you should never copy and paste code)
You can refer to #59 to make your test clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion, I will refer to modify the test code.
tests/TestGIN.py
Outdated
| for i in range(len(causal_order)): | ||
| assert np.isclose(causal_order[i], ground_truth[i]).all() | ||
|
|
||
| def test_case2_kci(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for test case 2.
Basically you can test both kci and hsic in a single function, and the only difference seems to be GIN(data, indep_test_method) indep_test_method this parameter.
Check #59
tests/TestGIN.py
Outdated
| assert np.isclose(causal_order[i], ground_truth[i]).all() | ||
|
|
||
|
|
||
| def test_case3_kci(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
|
BTW, I really like the three tests you designed. I think that's great. :) Also a small nit: when you made later changes, you can update your description and test plan instead of commenting a new one --- people will generally read the first one instead of scroll down to check all the conversions. This makes this PR clearer for other people. :) |
|
@zhi-yi-huang do you mind addressing the comments? After that, I think we can merge this PR --- very close now |
|
Sorry, I missed the email from GitHub. |
tofuwen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work, it looks much better now!
Finally one small thing need to be addressed
tests/TestGIN.py
Outdated
| def validate_result(ground_truth, estimated_result): | ||
| assert len(ground_truth) == len(estimated_result) | ||
| for i in range(len(estimated_result)): | ||
| assert np.isclose(estimated_result[i], ground_truth[i]).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we use "isclose()" instead of "==" here?
With integer comparison, you should expect "==", right?
I think "isclose()" is meant to compare floating point
|
This is awesome! I think this PR is ready to be merged. :) cc @kunwuz |

Description
The GIN algorithm returns the graph and the causal order. In this algorithm, the causal order and the causal graph correspond to each other, so only the causal order in the result is asserted.
Updates
Test Plan