Skip to content

Conversation

@tofuwen
Copy link
Contributor

@tofuwen tofuwen commented Jun 28, 2022

Description

This PR adds node_names parameters to PC and change the underlying functions accordingly. Now users can specify node_names when running PC.

This PR is intended to fix the following issue that raised by user: #38

Test Plan

Run the following code to make sure we get expected results.

data_path = "tests/TestData/data_linear_10.txt"
data = np.loadtxt(data_path, skiprows=1)  # Import the file at data_path as data
names = [f"tofu{i+1}" for i in range(data.shape[1])]
cg = pc(data, 0.05, fisherz, False, 0, -1, node_names=names)  # Run PC and obtain the estimated graph (CausalGraph object)
cg.draw_pydot_graph()

Screen Shot 2022-06-28 at 2 54 13 PM

Copy link
Collaborator

@MarkDana MarkDana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yewen! This is looking good. I ran the test example and it is correct. But I have some comments:

Now node_names parameter is added to the functions pc => mvpc_alg, pc_alg => skeleton_discovery, in order to pass it to the CausalGraph instance created in skeleton_discovery. However, this seems to be over coupling - node_names is redundant for functions other than pc.

Is it possible that we add a function in CausalGraph class that changes its nodes names, and here we only have the node_names parameter in pc. After the CausalGraph instance (with nodes names "X%d") is returned by mvpc_alg or pc_alg, we then change its names before it is returned to the user. This seems can be adapted to FCI, CD-NOD, and GES more easily, I guess?

Thanks Yewen for your work :))

@tofuwen
Copy link
Contributor Author

tofuwen commented Jun 28, 2022

Hi @MarkDana,

Thanks for your awesome review! :)

  1. IMHO, the CausalGraph constructor needs to take node_names as its input parameters --- it makes more sense to initial the nodes with their correct names. We shouldn't initial with wrong names and change it later.
  2. mvpc_alg and pc_alg are only used in pc, so I think adding a parameter int those functions is fine? Regarding skeleton_discovery, it is used in more places --- but any function that return a CausalGraph should return a correct graph (i.e. logically it should contain a CausalGraph that shouldn't need more post-process like changing node_names)
  3. change node_names in graph may have side effects --- if I remember correctly, sometimes some code will use names as node identifiers. Without good tests, I am afraid doing so will have some troubles. (And I still think initially with wrong names and change it later doesn't seem to logically consistent.) :)

Hope this makes sense and we can discuss more! :)

@MarkDana
Copy link
Collaborator

Hi Yewen, I agree with your opinion - it makes more sense to initial with correct names.

Then I think this pr is ready to be merged. @kunwuz Thanks :))

@kunwuz kunwuz merged commit 94d1536 into py-why:main Jun 28, 2022
@ztz1989
Copy link

ztz1989 commented Mar 24, 2023

Hi @tofuwen and @MarkDana,

Could you do the same for GES? I have a ground truth graph built with txt2generalgraph; the node names in the file do not follow the "X%d" format. So it is hard to use the estimation functions to compute the accuracy and SHD.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants