Conversation

Collaborator

@zhi-yi-huang zhi-yi-huang commented Jul 18, 2022

Description

The GIN algorithm returns both the causal graph and the causal order. Because the two correspond to each other in this algorithm, the tests assert only the causal order.

Updates

  • Fixed the independence test problem in GIN
  • Added assertions to the existing test cases
  • Removed the plot function from the existing GIN test cases
  • Added tests for the GIN algorithm using the HSIC independence test
  • Updated docstring
  • Removed default parameter of indep_test
  • Refactored the test code

Test Plan

python -m unittest tests.TestGIN # should pass

Comment on lines 44 to 48
if indep_test_method == 'kci':
    indep_test = KCI_UInd()
else:
    raise NotImplementedError((f"Independent test method {indep_test_method} is not implemented."))

Contributor

It seems we previously supported other test methods?

Also, if we only support KCI, please indicate this clearly in your error message.

Collaborator Author

GIN requires a kernel-based independence test such as KCI or HSIC. Their call interfaces differ, though, so I wonder whether I should write a wrapper function to hide the differences.

Contributor

So did the old code support HSIC? My point is that we should never regress, i.e. drop functionality that was previously supported.

If the old codebase only supported KCI, then it's fine.
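
A minimal sketch of the wrapper idea raised in this thread: a registry maps each method name to a function with one shared signature, and an unknown name produces an error that lists what is supported. The names `INDEP_TESTS`, `get_indep_test`, `fake_kci`, and `fake_hsic` are hypothetical, and the two stand-in functions only return a constant p-value; they are not the library's KCI or HSIC implementations.

```python
from typing import Callable, Dict, Sequence

PValueFn = Callable[[Sequence[float], Sequence[float]], float]


# Hypothetical stand-ins: each takes two samples and returns a p-value.
# They are placeholders for real kernel-based tests (e.g. KCI, HSIC),
# whose actual call signatures are not shown in this thread.
def fake_kci(x: Sequence[float], y: Sequence[float]) -> float:
    return 0.5


def fake_hsic(x: Sequence[float], y: Sequence[float]) -> float:
    return 0.5


# One registry, one shared signature: callers never see the differences.
INDEP_TESTS: Dict[str, PValueFn] = {
    "kci": fake_kci,
    "hsic": fake_hsic,
}


def get_indep_test(method: str) -> PValueFn:
    """Return the p-value function registered under `method`."""
    try:
        return INDEP_TESTS[method]
    except KeyError:
        supported = ", ".join(sorted(INDEP_TESTS))
        raise NotImplementedError(
            f"Independence test method '{method}' is not implemented. "
            f"Supported methods: {supported}."
        ) from None
```

With this shape, adding another test means adding one registry entry, and the error message stays accurate without being edited by hand.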

  _, _, v = np.linalg.svd(cov_m)
  omega = v.T[:, -1]
- return np.dot(omega, data[:, X].T)
+ return np.dot(data[:, X], omega.T)
Contributor

Why did we change this? Will it return the transpose of the previous result?

Collaborator Author

To reduce the overhead of transposing the data matrix.

Contributor

Will this change the function's return value?

The current return value is the transpose of the old one, right?

Collaborator Author

Sorry, omega is a 1-dimensional array, so transposing it has no effect. The output is unchanged: np.dot of an N-dimensional array with a 1-dimensional array is a sum product over the last axis of both, which matches the algorithm's logic. Please refer to the numpy.dot documentation.
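
The equivalence described above can be checked on a small random example. This is an illustrative sketch, not the PR's code: the array sizes and the names `samples`, `omega`, `old`, and `new` are assumptions, with `omega` standing in for `v.T[:, -1]` and `samples` standing in for `data[:, X]`.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(5, 3))  # 5 samples, 3 variables (illustrative sizes)
omega = rng.normal(size=3)         # 1-D weight vector, like v.T[:, -1]

# Transposing a 1-D array is a no-op: same shape, same values.
assert omega.T.shape == omega.shape == (3,)

old = np.dot(omega, samples.T)     # previous form: transposes the data matrix
new = np.dot(samples, omega.T)     # new form: no data transpose needed

# Both are 1-D arrays with one entry per sample, and the entries agree.
assert old.shape == new.shape == (5,)
assert np.allclose(old, new)
```

Since both results are 1-D, there is no row or column orientation for a transpose to change.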

Comment on lines +14 to +19
L1 = np.random.uniform(-1, 1, size=sample_size)
L2 = np.random.uniform(1.2, 1.8) * L1 + np.random.uniform(-1, 1, size=sample_size)
X1 = np.random.uniform(1.2, 1.8) * L1 + 0.2 * np.random.uniform(-1, 1, size=sample_size)
X2 = np.random.uniform(1.2, 1.8) * L1 + 0.2 * np.random.uniform(-1, 1, size=sample_size)
X3 = np.random.uniform(1.2, 1.8) * L2 + 0.2 * np.random.uniform(-1, 1, size=sample_size)
X4 = np.random.uniform(1.2, 1.8) * L2 + 0.2 * np.random.uniform(-1, 1, size=sample_size)
Contributor

Why did you change the data generation parameters?

Collaborator Author

The previous data generation parameters were set according to the paper, but data generated that way does not seem to be fully identifiable, so I changed the parameters.

Contributor

Why is the old data not fully identifiable? Can we prove that it is not identifiable, or does our algorithm just fail to identify it? Sorry, causal n00b lol

Collaborator Author

It is most likely an error in the independence test caused by the data.

tests/TestGIN.py Outdated
  data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)
- g, k = GIN(data)
- print(g, k)
+ _, k = GIN(data)
Contributor

Please use a better name than "k".

Basically, you should never use a non-meaningful variable name like "k" (the only exception is the well-known i and j in loops).

Collaborator Author

@zhi-yi-huang zhi-yi-huang Jul 19, 2022

Thanks for the suggestion, I will update it later. I see that 'k' is often used in papers to denote the causal order, e.g. in LiNGAM, so I just used k.

Contributor

Hmm, makes sense haha. But it would still be good to name it clearly in our codebase (you can add a comment saying that this variable corresponds to k in the paper).

tests/TestGIN.py Outdated
  data = (data - np.mean(data, axis=0)) / np.std(data, axis=0)
- g, k = GIN(data)
- print(g, k)
+ _, k = GIN(data)
Contributor

same

tests/TestGIN.py Outdated
- g, k = GIN(data)
- print(g, k)
+ _, k = GIN(data)
+ k = [sorted(k_i) for k_i in k]
Contributor

Previously you used i; here you use k_i.

It would be better to name them consistently.

Collaborator Author

Because 'i' suggests an index, I changed i to k_i, but I forgot to apply the rename everywhere.

Contributor

got it --- please make them consistent

@zhi-yi-huang
Collaborator Author

Updates

  • Added tests for the GIN algorithm using the HSIC independence test
  • Updated docstring
  • Removed default parameter of indep_test
  • Fixed tests

Test Plan

python -m unittest tests.TestGIN # should pass

Contributor

@tofuwen tofuwen left a comment

Thanks for the great work!

Almost done --- we should do a small refactor to make the code more concise.
Remember the principle: never copy and paste; reuse code whenever possible. :)

tests/TestGIN.py Outdated
ground_truth = [[0, 1], [2, 3]]
assert len(causal_order) == len(ground_truth)
for i in range(len(causal_order)):
    assert np.isclose(causal_order[i], ground_truth[i]).all()
Contributor

You don't need to use np.isclose(), right?

causal_order must be equal to ground_truth, right?

Collaborator Author

I don't understand the first question. For the second one, yes.

tests/TestGIN.py Outdated
for i in range(len(causal_order)):
    assert np.isclose(causal_order[i], ground_truth[i]).all()

def test_case1_hsic(self):
Contributor

This function looks almost exactly the same as the previous one.

Please consider reusing the code instead of copying and pasting (basically, you should never copy and paste code).

You can refer to #59 to make your test clearer.

Collaborator Author

Thank you for the suggestion; I will refer to it when modifying the test code.

tests/TestGIN.py Outdated
for i in range(len(causal_order)):
    assert np.isclose(causal_order[i], ground_truth[i]).all()

def test_case2_kci(self):
Contributor

Same for test case 2.

Basically, you can test both kci and hsic in a single function; the only difference seems to be the indep_test_method parameter in GIN(data, indep_test_method).

Check #59
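
One possible way to fold the KCI and HSIC variants of a test case into a single function is `unittest`'s `subTest`. The sketch below assumes a hypothetical stand-in `fake_gin` that returns a fixed result, so it runs without the library; it is not necessarily the structure used in #59.

```python
import unittest


def fake_gin(data, indep_test_method):
    """Hypothetical stand-in for GIN; returns (graph, causal_order)."""
    if indep_test_method not in ("kci", "hsic"):
        raise NotImplementedError(indep_test_method)
    # Fixed, unsorted clusters so the sorting step below has something to do.
    return None, [[1, 0], [3, 2]]


class TestGINSketch(unittest.TestCase):
    def test_case1(self):
        data = None  # placeholder; the real tests generate synthetic data
        ground_truth = [[0, 1], [2, 3]]
        for method in ("kci", "hsic"):
            # subTest reports each method separately if one of them fails.
            with self.subTest(indep_test_method=method):
                _, causal_order = fake_gin(data, method)
                causal_order = [sorted(cluster) for cluster in causal_order]
                self.assertEqual(causal_order, ground_truth)
```

The data setup and ground truth are written once, and only the method name varies between runs.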

tests/TestGIN.py Outdated
assert np.isclose(causal_order[i], ground_truth[i]).all()


def test_case3_kci(self):
Contributor

same

@tofuwen
Contributor

tofuwen commented Jul 21, 2022

BTW, I really like the three tests you designed. I think that's great. :)

Also, a small nit: when you make later changes, you can update your description and test plan instead of posting a new comment --- people will generally read the first one instead of scrolling down to check the whole conversation. This makes the PR clearer for other people. :)

@tofuwen
Contributor

tofuwen commented Jul 22, 2022

@zhi-yi-huang do you mind addressing the comments? After that, I think we can merge this PR --- very close now

@zhi-yi-huang
Collaborator Author

Sorry, I missed the email from GitHub.

Contributor

@tofuwen tofuwen left a comment

Thanks for the great work, it looks much better now!

Finally, one small thing needs to be addressed.

tests/TestGIN.py Outdated
def validate_result(ground_truth, estimated_result):
    assert len(ground_truth) == len(estimated_result)
    for i in range(len(estimated_result)):
        assert np.isclose(estimated_result[i], ground_truth[i]).all()
Contributor

Why do we use "isclose()" instead of "==" here?

With integer comparison, you should expect "==", right?

I think "isclose()" is meant for comparing floating-point values.
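
A sketch of what an exact comparison could look like, assuming the causal order is a list of lists of integer indices as in the test cases above. This is an illustration, not necessarily the version that was merged.

```python
def validate_result(ground_truth, estimated_result):
    """Check that two causal orders (lists of integer index lists) match exactly."""
    assert len(ground_truth) == len(estimated_result)
    for expected, estimated in zip(ground_truth, estimated_result):
        # Integer indices either match or they do not; no tolerance is needed.
        assert list(estimated) == list(expected)


# Passes silently when the orders are identical.
validate_result([[0, 1], [2, 3]], [[0, 1], [2, 3]])
```

Exact equality also catches clusters of different lengths, which a tolerance-based elementwise comparison is not designed to check.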

@tofuwen
Contributor

tofuwen commented Jul 25, 2022

This is awesome! I think this PR is ready to be merged. :)

cc @kunwuz

@kunwuz kunwuz merged commit b7bd990 into py-why:main Jul 25, 2022