Fix examples/Discovery_LUCAS.ipynb #15

Closed
jayavanth opened this issue May 1, 2019 · 7 comments
Comments

@jayavanth

The example notebook calls:

Cgnn.predict(data, graph=ugraph, nb_runs=16, train_epochs=1500, test_epochs=1000)

The CGNN predict function no longer accepts nb_runs, train_epochs, or test_epochs. These have to be passed to the constructor instead, like this:

Cgnn = CGNN(nb_runs=16, train_epochs=1500, test_epochs=1000)
Cgnn.predict(data, graph=ugraph)

@diviyank
Collaborator

diviyank commented May 7, 2019

Yes, I should fix the example! Thanks for the feedback!

@gkericks

I'm not sure this is related, but I am looking for an explanation of how the NUM_LUCAS.csv file was generated and can't find it. Do you have that listed somewhere?

@diviyank
Collaborator

Hi,
Actually, NUM_LUCAS.csv was generated using the cdt.generators.AcyclicGraphGenerator class, by feeding it a ground-truth graph. But yes, it doesn't make much sense to call it LUCAS, since it doesn't have much to do with the true dataset except for the variable names and the graph structure. I should change that; I will add it in the next version.
Best.
Diviyan
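
The generation process described in this reply (sampling synthetic data from a known ground-truth graph) can be illustrated with a rough standard-library sketch. This is not the cdt generator and does not use its API: the linear-Gaussian mechanism, the edge-weight range, the function names `sample_from_dag` and `standardize`, and the three-node toy graph are all assumptions made for illustration.

```python
import random
import statistics


def sample_from_dag(parents, n_samples, seed=0):
    """Draw samples from a linear-Gaussian model over a DAG.

    `parents` maps each node to the list of its parents and must list
    the nodes in topological order (parents before children).
    """
    rng = random.Random(seed)
    # One fixed random weight per edge (assumed mechanism, not cdt's).
    weights = {
        (p, child): rng.uniform(0.5, 2.0)
        for child, ps in parents.items()
        for p in ps
    }
    columns = {node: [] for node in parents}
    for _ in range(n_samples):
        row = {}
        for node, ps in parents.items():  # dicts keep insertion order
            # Each node is a weighted sum of its parents plus Gaussian noise.
            row[node] = sum(weights[(p, node)] * row[p] for p in ps) + rng.gauss(0.0, 1.0)
        for node, value in row.items():
            columns[node].append(value)
    return columns


def standardize(values):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Toy ground-truth graph; the variable names are only illustrative.
toy_graph = {"Smoking": [], "Lung_cancer": ["Smoking"], "Coughing": ["Lung_cancer"]}
data = {name: standardize(col) for name, col in sample_from_dag(toy_graph, 500).items()}
```

Data produced this way shares only the variable names and graph structure with the source graph, which matches the point made above about NUM_LUCAS.csv versus the true LUCAS dataset.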

@gkericks

@Diviyan-Kalainathan Thanks for the quick reply!

Okay, so from that I see now that the example is about recreating the answer graph using only examples sampled from it. The original LUCAS data is all binary, whereas this new dataset assumes Gaussians at every node (the sampled data looks standardized). That being said, what constraints on the data input are there for effectively using your library?

I have a causal problem I am trying to solve and like most real-world data, the input is of mixed types. Some numerical, some categorical. Would you still recommend your library for exploring the dependencies or should I be looking for a different technique? I apologize in advance if that is already covered in your README and I just missed it.

@diviyank
Collaborator

Hi,
There are no constraints on the data input for the library. Instead, it depends on the algorithms from the package. For example, SAM and CGNN accept only numerical data, whereas PC can accept categorical data. For mixed types, I don't know of an algorithm or statistical test that is quite efficient; I think your best bet would be to discretize your data and use an algorithm/test for categorical data (PC/GES).

Best regards,
Diviyan
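
One way to read the discretization suggestion above is to bin each numerical column and integer-code each categorical column before handing the table to a method for categorical data. The following is a minimal standard-library sketch under that assumption; the helper names `discretize` and `encode_categories`, the choice of equal-frequency bins, and the sample columns are hypothetical and are not part of the library.

```python
import statistics


def discretize(values, n_bins=4):
    """Map numeric values to equal-frequency bin indices 0..n_bins-1."""
    cuts = statistics.quantiles(values, n=n_bins)  # n_bins - 1 cut points
    # A value's bin index is the number of cut points it exceeds.
    return [sum(v > c for c in cuts) for v in values]


def encode_categories(values):
    """Map category labels to integer codes in sorted label order."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# Hypothetical mixed-type columns.
age = [23, 35, 41, 52, 58, 64, 70, 77]
smoker = ["no", "yes", "no", "yes", "yes", "no", "yes", "no"]

age_binned = discretize(age, n_bins=4)
smoker_coded = encode_categories(smoker)
```

The number of bins is a judgment call: fewer bins lose more information, while more bins leave fewer samples per cell for a categorical independence test.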

diviyank added a commit that referenced this issue Jun 10, 2019
diviyank added a commit that referenced this issue Jun 11, 2019
@diviyank
Collaborator

It should be fixed now. Sorry for the delay, but we really wanted to fix all the dataset management issues before addressing this one.
Please keep me updated.
Best,
Diviyan

@diviyank
Collaborator

diviyank commented Jul 8, 2019

I will be closing this issue, as it should be solved. Don't hesitate to reopen it if the bug still persists in the latest version.
Best,
Diviyan

@diviyank diviyank closed this as completed Jul 8, 2019