[ENH] Add required/excluded edge directions in the orientation phase of PC/FCI #46
Would this be the right place to include the knowledge tiers functionality that TETRAD currently has? E.g. I group variables into X groups, where variables in group i can only point to groups i+1 or greater. In some sense you could implement tiers by excluding a whole lot of edges, but maybe a nicer API could be exposed to do this instead. This would also require directional excluded edges, so I would vote for doing so in the orientation phase.
Also, is there a technical reason that would prevent us from implementing both kinds of prior knowledge? As a concrete example, I've done some structure learning applied to cardiac data. When talking to physicians, the kinds of knowledge I was able to elicit were grouping variables into tiers (e.g. pre, intra, and post operation), and also ruling out certain edge directions (due to medical domain knowledge). I don't think we ruled out adjacencies between any variables in that particular example, but in general I could see this happening if the domain knowledge was weaker.
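To make the tiers idea concrete, here is a minimal sketch of how ordered tiers could be expanded into excluded directed edges. The function name `tiers_to_excluded_edges`, the `forbid_within_tier` flag, and the variable names are all hypothetical; this is not TETRAD's implementation or this project's API.

```python
from itertools import product


def tiers_to_excluded_edges(tiers, forbid_within_tier=False):
    """Expand ordered tiers into the directed edges they forbid.

    `tiers` is a list of sets of variable names, earliest tier first.
    A variable in a later tier may not point to a variable in an earlier
    tier, so every (later, earlier) pair becomes an excluded directed edge.
    Whether edges inside a tier are allowed is an assumption controlled
    by `forbid_within_tier`.
    """
    excluded = set()
    for i, earlier in enumerate(tiers):
        for later in tiers[i + 1:]:
            # Forbid later -> earlier; earlier -> later stays allowed.
            excluded.update(product(later, earlier))
        if forbid_within_tier:
            # Forbid both directions between distinct members of one tier.
            excluded.update((u, v) for u, v in product(earlier, earlier) if u != v)
    return excluded


# Hypothetical variable names, loosely following the pre/intra/post example.
tiers = [{"age"}, {"bypass_time"}, {"icu_stay", "mortality"}]
excluded = tiers_to_excluded_edges(tiers)
```

With this reading, the tiers are only a convenience layer: the learning algorithm would still just see a set of excluded directed edges.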
I think the idea of placing edges into knowledge tiers would be nice, and could potentially be represented graphically (in code) perhaps as a Cluster-DAG? Or perhaps,
You mean including directional excluded edge information in the orientation phase and not skeleton phase?
No. Not really imo. We can really come up with an arbitrary API for all sorts of background knowledge to stick into the learning algorithm, but the one thing to be cognizant of is that the resulting object may not be a CPDAG/PAG, and hence the resulting equivalence-class ID/estimation algorithms cannot be used on those objects off the shelf. But for the purposes of just structure learning, yeah sure. I think we just stuck in what was common so far in #30
Regarding your first point the list of tuples of sets would probably work in the first instance. You could have an Regarding your second point, yes. Regarding your third point, that's probably true in general but I could probably see instances where you do end up with a CPDAG/PAG anyway. Then I assume the downstream algorithms would have the responsibility of checking whether their input is a valid CPDAG/PAG and failing if it is not? Happy to try implementing this issue.
Yes you're right, I just opted for a graph structure to be very explicit about what edges are passed in. The nice thing with an explicit graph structure, is that you have access to the networkx API to do any type of error checking, or utility functions on the
True, but I suppose what we really want then is a utility function somewhere to check the validity of a CPDAG/PAG? However, I'm not aware of a solution to this problem. Do you need the dataset? Is it something you can check just based on the structure? Either way, I'm pretty keen on the ability for the
There are a few separate things being discussed here. The first is the inclusion/exclusion of edges at the orientation phase. What should the API look like? Should it be separate from the inclusion/exclusion at the skeleton phase, which is what is implemented right now? Would this be a kwarg inside the structure learning algorithms themselves? e.g.
or would this be applied somewhere else? Secondly, you are proposing knowledge tiers, which are just "sets" of included/excluded edges, I think... This might be doable by just extending the inclusion/exclusion edges for the skeleton/orientation phase into a new data structure. That is, a list of tuples of sets? Then the following would be a "knowledge tier"
whereas this would just be the normal inclusion/exclusion of edges
Misc. Ideally I can merge in #30 so that you can build on the PC algo implementation.
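The snippets from this comment are not reproduced above, so the following is only one hypothetical reading of "a list of tuples of sets": each entry is a `(sources, targets)` pair, expanded by Cartesian product into directed edges. The name `expand_edge_spec` and the variable names are assumptions made for illustration.

```python
from itertools import product


def expand_edge_spec(spec):
    """Expand a list of (sources, targets) set pairs into directed edges.

    Every source is paired with every target; self-loops are dropped.
    """
    edges = set()
    for sources, targets in spec:
        edges.update((u, v) for u, v in product(sources, targets) if u != v)
    return edges


# A tier-like entry: multi-element sets on both sides.
tier_like = [({"x1", "x2"}, {"y1", "y2"})]

# Plain inclusion/exclusion: singleton sets, one edge per entry.
plain = [({"a"}, {"b"}), ({"c"}, {"d"})]
```

Under this assumption a single data structure covers both cases, and a plain edge is just the degenerate tier with one variable on each side.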
Hmm good point. I'll keep that in mind.
Yeah I am not totally sure either. Maybe a good reason to leave this out in the first pass.
Is this the kind of paper you have in mind when talking about orientation and knowledge: Perkovic et al. 2017? I think this paper talks about background knowledge consisting of directed edges, so this seems like an implementation on the orientation phase. Is there a good paper I should read on orientation in the skeleton phase?
Is it insane to support certain edges being applied at different phases?
Yes, knowledge tiers would probably be implemented by just having a utility function that creates the right set of included/excluded edges. I should also review the TETRAD implementation to see how that is implemented. This issue seemed useful: Joe Ramsey gives the following (somewhat cryptic) note on PC's background knowledge:
Note that the pcalg implementation can fail if background knowledge is inconsistent with the CPDAG, but it seems like the TETRAD implementation doesn't. And if I understand correctly, I think both would be considered orientation phase?
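The two behaviours mentioned here (fail on inconsistent knowledge vs. carry on) could be exposed as a flag. Below is a rough sketch assuming the partially oriented graph is held as a set of directed edges plus a set of undirected edges. The function name and the `strict` flag are hypothetical, and this is not the actual logic of pcalg or TETRAD. It also does not re-run the orientation rules after applying the knowledge, which a real implementation would likely need to do.

```python
def orient_with_required_edges(directed, undirected, required, strict=True):
    """Apply required directed edges to a partially oriented graph.

    directed   -- iterable of (u, v) pairs already oriented u -> v
    undirected -- iterable of (u, v) pairs that are adjacent but unoriented
    required   -- iterable of (u, v) pairs that must be oriented u -> v
    strict     -- if True, raise when knowledge conflicts with the graph;
                  if False, skip conflicting edges and report them

    A required edge conflicts if the pair is already oriented the other
    way, or (an assumption of this sketch) if the pair is not adjacent,
    since the orientation phase does not add adjacencies.
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}
    conflicts = []
    for u, v in required:
        pair = frozenset((u, v))
        if (u, v) in directed:
            continue  # already oriented as required
        if pair in undirected:
            undirected.remove(pair)
            directed.add((u, v))
        else:
            conflicts.append((u, v))
    if conflicts and strict:
        raise ValueError(f"background knowledge conflicts with the graph: {conflicts}")
    return directed, undirected, conflicts
```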
Summary
Required and excluded edges are not taken into account at the orientation phase (i.e. setting edge directions) of constraint-based structure learning algos.
There are a few challenges though to tackle:
Misc. reference
"Required edges and excluded edges are built into the
LearnSkeleton class right now, meaning they do not account for directionality. However, these can also be taken into account at the orientation phase... There currently is a tradeoff that theory has yet to answer:
^ the nuanced challenge here is that the resulting object is NOT a CPDAG/PAG, and thus would not be expected to work with the downstream ID/Estimation algorithms that assume an input CPDAG/PAG. I'm leaning towards passing in a warning str to the graph object that when printed out, warns users to not use them in ID/estimation if they set prior edges that might break the conditional-independence statements.
Originally posted by @adam2392 in #30 (comment)"
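As a rough illustration of the "warning str on the graph object" idea from the quote above, here is a minimal sketch. `AnnotatedGraph` is a hypothetical stand-in, not the project's graph class, and the warning text is only an example.

```python
class AnnotatedGraph:
    """Hypothetical stand-in for a learned graph that carries a warning."""

    def __init__(self, edges, prior_edges_used=False):
        self.edges = list(edges)
        # Only set a warning when prior edge knowledge was applied.
        self.warning = (
            "Prior edge knowledge was applied during learning; this graph may "
            "not be a valid CPDAG/PAG. Do not pass it to ID/estimation "
            "algorithms without checking."
            if prior_edges_used
            else ""
        )

    def __str__(self):
        body = f"Graph with {len(self.edges)} edges"
        # Append the warning so it shows up whenever the graph is printed.
        return f"{body}\nWARNING: {self.warning}" if self.warning else body


print(AnnotatedGraph([("a", "b")], prior_edges_used=True))
```

The warning only surfaces on printing, so downstream ID/estimation code would still need its own check if it should refuse such graphs.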