GRAPL roadmap #18

Open
AlxndrMlk opened this issue Jun 28, 2023 · 5 comments

@AlxndrMlk
Contributor

Hi @max-little

Do we have a roadmap for the package?

I believe that a documentation page with some additional educational materials and literature references would be a great next step towards broadening the adoption of GRAPL in the community.

What are your thoughts on this?

Additionally, are there any algorithms that you have thought about adding to the next release?

Alex

@max-little
Owner

Hi @AlxndrMlk

Thanks for raising this!

Yes, agreed, it would be valuable to have some educational materials. Structural causal modelling and inference is not a "mainstream" topic in, e.g., data science, so I imagine the materials would have to cover some of the basics of the topic itself, with worked examples using GRAPL. Let me know what you think, and we can brainstorm a design.

Re: new algorithms for an updated release, there are a few obvious ones that have yet to be implemented and would definitely be worth adding to the list:

  • m-connection/separation (mconn, closely following the function dconn in admg.py)
  • test for m-separation (ismsep, closely following the function isdsep in admg.py)
  • latent projection (latproj in admg.py)

For definitions, see Nested Markov Properties for Acyclic Directed Mixed Graphs. Let me know if you need advice on how to implement these.
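
To make the m-separation item concrete, here is a minimal, self-contained sketch. It is not GRAPL code and does not use the GRAPL API: the function names, the edge-list representation, and the `("latent", i)` node labels are all assumptions made for illustration. It relies on the standard reduction in which each bidirected edge is replaced by a fresh latent common parent, after which m-separation in the ADMG can be checked as d-separation in the resulting DAG (here via the ancestral moral graph).

```python
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_m_separated(directed, bidirected, xs, ys, zs):
    """Hypothetical helper: True if xs and ys are m-separated given zs.

    directed   -- iterable of (a, b) pairs meaning a -> b
    bidirected -- iterable of (a, b) pairs meaning a <-> b
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Build the parent map of a DAG in which every bidirected edge
    # a <-> b is replaced by a fresh latent node u with u -> a, u -> b.
    parents = {}
    for a, b in directed:
        parents.setdefault(b, set()).add(a)
    for i, (a, b) in enumerate(bidirected):
        u = ("latent", i)  # illustrative label, assumed not to clash
        parents.setdefault(a, set()).add(u)
        parents.setdefault(b, set()).add(u)

    # Restrict to the ancestral set of xs, ys and zs.
    keep = ancestors(parents, xs | ys | zs)

    # Moralise: link each node to its parents, and link co-parents.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = parents.get(child, set())
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)

    # Delete zs and test whether any node of ys is reachable from xs.
    seen = set(xs - zs)
    stack = list(seen)
    while stack:
        v = stack.pop()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in zs and w not in seen:
                seen.add(w)
                stack.append(w)
    return True


# Example ADMG: A -> B <-> C -> D
directed = [("A", "B"), ("C", "D")]
bidirected = [("B", "C")]
print(is_m_separated(directed, bidirected, {"A"}, {"D"}, set()))
print(is_m_separated(directed, bidirected, {"A"}, {"D"}, {"B"}))
```

In the example, B is a collider on the only path between A and D, so the path is blocked when nothing is conditioned on and opens once B is conditioned on. An implementation that follows dconn in admg.py would presumably work on paths directly rather than through this reduction.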

Best
Max

@AlxndrMlk
Contributor Author

Hey @max-little

Perhaps we could start with a documentation page, e.g. using Read the Docs (https://readthedocs.org/).

An example docs page: https://pandas-datareader.readthedocs.io/en/latest/

What are your thoughts?

@max-little
Owner

Thanks for the suggestion, @AlxndrMlk.

I think improved documentation is definitely valuable, but Read The Docs is far too heavyweight for such a small codebase (as yet). I know this system provides good templates but without a lot of work it just ends up being full of distracting boilerplate. I for one simply don't have the time to maintain something on that scale.

For now, I think a better approach is a few carefully targeted notebooks. What is needed here, in my opinion, are short tutorial notebooks on the basics of nonparametric causal inference (CI), showing how GRAPL can be used to learn about the subject by assisting reasoning through computational experiments.

Given that, the main issue would be designing the tutorials. How about the following scheme:

  1. DAGs - what are they, what are they for, GRAPL language representation
  2. Determining node relationships in DAGs (e.g. parent, child, ancestors, derived/extended relationships etc.)
  3. Nonparametric distributions - relationship to DAGs
  4. Manipulating DAGs (e.g. subgraphs, do-interventions)
  5. Basic causal inference in DAGs (e.g. admissible sets, how to find these with GRAPL)

At least as a start. Call it "Chapter 1"?
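
As a rough idea of what the notebooks for items 2 and 4 might contain, here is a minimal plain-Python sketch. It deliberately avoids the GRAPL language and API: the edge-set representation and all function names below are assumptions for illustration only, and a real tutorial would use GRAPL's own graph objects instead.

```python
def parents(edges, node):
    """Nodes with a directed edge into `node`."""
    return {a for a, b in edges if b == node}


def children(edges, node):
    """Nodes that `node` has a directed edge into."""
    return {b for a, b in edges if a == node}


def ancestors(edges, node):
    """All nodes with a directed path into `node` (excluding `node`)."""
    found, stack = set(), [node]
    while stack:
        for p in parents(edges, stack.pop()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def do(edges, targets):
    """Graph after intervening on `targets`: every edge pointing
    into an intervened node is removed, all other edges are kept."""
    targets = set(targets)
    return {(a, b) for a, b in edges if b not in targets}


# Confounded DAG: Z -> X, Z -> Y, X -> Y
dag = {("Z", "X"), ("Z", "Y"), ("X", "Y")}
print(parents(dag, "Y"))    # X and Z
print(ancestors(dag, "Y"))  # X and Z
print(do(dag, {"X"}))       # the edge Z -> X is gone
```

Comparing `dag` with `do(dag, {"X"})` is the kind of small computational experiment the notebooks could build on: after the intervention, Z no longer influences X, so the only remaining path from X to Y is the causal one.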

Max

@AlxndrMlk
Contributor Author

Hi @max-little

Thank you for sharing the ideas.

I understand your concern regarding readthedocs.

I am wondering what would bring the most value for the users.

I believe that the very basics of graphical models and causal inference are already covered elsewhere (e.g. Brady Neal's YouTube series or my book).

Describing GRAPL representations I think would be very helpful.

And perhaps describing the already implemented algorithms at least at high level.

What are your thoughts?

Alex

@max-little
Owner

Hi @AlxndrMlk

It's worth emphasising that all the algorithms implemented here are described in the relevant literature. For instance, the Tian factorization for ADMGs is described in Tian's papers on the topic, so the user could simply read these papers. However, it's possible that the descriptions in these papers are at a fairly sophisticated technical level and could be made more accessible. In my opinion this would be best implemented as tutorial-style notebooks, as they are inherently interactive.
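
As an example of how a notebook might make one step of this more accessible: the Tian factorization starts by partitioning the nodes of an ADMG into c-components, i.e. sets of nodes connected to each other through paths made only of bidirected edges. The sketch below shows just that partitioning step in plain Python. The function name and the edge-list representation are assumptions for illustration, not GRAPL's implementation.

```python
def c_components(nodes, bidirected):
    """Partition `nodes` into c-components: the connected components
    of the graph formed by the bidirected edges alone."""
    adj = {v: set() for v in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)

    seen, components = set(), []
    for v in nodes:
        if v in seen:
            continue
        # Depth-first search over bidirected edges starting at v.
        comp, stack = {v}, [v]
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        components.append(comp)
    return components


# Front-door graph: X -> M -> Y with X <-> Y.
# Directed edges play no role in this step, so only X <-> Y is passed.
print(c_components(["X", "M", "Y"], [("X", "Y")]))
```

For the front-door graph this groups X with Y and leaves M on its own. The factorization then assigns one factor to each component, which is the part a tutorial would need to walk through in more detail.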

Best
Max
