-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Influencing node coordinates to achieve desired horizontal ordering #60
Comments
Ah, I've figured out how to set the rank accessor. New code: function layering(dag) {
layeringSimplex()(dag)
dag.descendants().forEach(node => {
node.rank = node.layer;
node.layer = undefined;
if (!node.data.isUnion) {
node.rank += 1;
}
});
layeringSimplex().rank(n => n.rank)(dag)
} Now the outcome is |
I think I'm asking for the impossible - something cannot both be a descendant of node on level X and drawn on level X. |
I don't think you're trying the impossible, I just think you're going about it in the wrong way. I think it starts with the way you're trying to get around the relative nature of rank by first trying to layer without it. That simply won't work. Let's think about this from first principles. First, I see two things you could try an guarantee.
I don't think all of those are possible, so let's just stick with only 1. The second thing to deal with is how you draw the lines connecting them. The sugiyama layout is designed to compute how to draw lines so that they curve around other nodes. It seems like you probably want this. If so the solution to that is to create your own dummy node that is the descendant of parents with children, and all children are a descendant of that node. I think this mostly complicates things, so let's ignore it going forward. Given that we're ignoring the edge drawing, and we want parents to have the same rank, we have a problem in that how do we know which ranks to make each and which to not. This can't be solved with the current rank accessor. The easiest solution would be to copy the simplex source code, and remove d3-dag/src/sugiyama/layering/simplex.ts Lines 114 to 117 in b1b2107
That way only nodes with the same rank will have the same layer, but no other constraints will be enforced. I've been thinking about how to support this use case more generally. It seems like I could do something like the rank accessor, but it's just an equality class accessor. That is probably worth adding, but I'm generally curious on your overall thoughts as to what work / what you're looking for. Hope this helps! |
Hi Erik, thanks so much for your thoughtful reply. d3-dag is an awesome library, but beyond that your willingness to put time into support really makes a huge difference. When I posted my questions on Friday I was on the clock for a client, and hence perhaps not explaining everything very well. Now that it's the weekend, and I'm doing this for my own edification, I will take more time to explain everything. I would also be very happy to write an example/tutorial after we solve this. If that sounds useful, let me know the best format/platform for that. I've found it relatively easy to draw generations on the same level by just ensuring that my data is very consistent. The rank method would work well for that if I needed an extra guarantee to overcome data inconsistencies. I was actually trying to use rank as a hack to solve a different problem related to horizontal ordering of nodes. Here's a subsection of a tree that I've outputted using your library: Note that Judith is a descendant of Anne and William. Judith married Thomas, and had three children. Thomas also married Margaret and had a child. This relationship is made much harder to see clearly by the fact that Judith's brother Hamnet is drawn between her and her spouse Thomas. I have been looking for ways to influence the horizontal placement of nodes to avoid situations like this. My goal with rank was to draw the union node (the black dot) on the same level as the two spouses. I (perhaps naively) thought this would work as the decrossing algorithm would be much more likely to place spouses alongside each other. So the key thing here is to take a fact from the the data ("X and Y produced children together") and use it to influence the placement of nodes adjacent to each other. I hope this makes sense, and thanks again for taking the time to reply. |
Yeah, that makes sense. I was thinking about doing this, and had a solution to the equality constraint that I just pushed, but the problem I ran into is how to organize horizontal nodes. The problem is in the decrossing step, where we essentially want to add an extra step that says, "for the purposes of decrossing these two nodes should be next to to each other." The problem I ran into (and I'm curious as to your thoughts on) is exactly how to do this succinctly. The way the commit I just pushed works is that everything can have an optional "equality group" that says that everything in the group needs to have the same layer. That concept doesn't seem to work here in that you need to be able to say: margaret and thomas are a pair, as are thomas an judith. Things could have "adjacency sets" e.g. multiple groups they belong to, but adjacency sets seem complicated, and it's not clear how sets that have more than two members should be handled, or nodes that belong to more than two sets, as it wouldn't be possible to actually constrain them. If I'm missing something obvious, or you have ideas for an appropriate api, throw them out. Then there's the problem of implementation. This depends somewhat on the api.
As for a write up, if this all works out, probably just making your own observable with dummy data would be great, but I've also just linked to other projects that serve as an example. |
@gareth-lloyd I'm still not sure of a great to indicate that two nodes should be adjacent, however, it seems like a generally good property in the decrossing phase would be to minimize the index distance between nodes with a common parent or child. I think this may be possible with decross-opt / decross-twolayer-opt, but it should generally be what happens with twolayer-mean and twolayer-median. In the past this was a problem because the twolayer methods only did one downward pass, but the master branch is updated so it does a down and and uppass by default, and the number of passes can be tweaked. I think in your case, this should probably be sufficient to produce a pleasing layout. |
Thanks so much for this. I will try it out and let you know how it goes. |
@gareth-lloyd also, I just figured out how to add the minimization to the two "opt" methods. It makes the optimization more expensive, so you have to enable it with e.g. Note unfortunately, none of this is documented yet, because I haven't pushed the update, so feel free to ping here if things aren't clear. |
I've been trying to experiment with these changes, but I keep hitting a bug that seems to come up from the
From the stack trace, I can see it's happening during layering simplex. Here's what I did:
It seems to fail for me on any data, but here's a concrete example.
|
wow! thanks for taking the time to post this. I switched master to esbuild, but don't really have tests set up for the bundle. It seems that rollup is setting window to an empty object, and that was allowing the wrapper around |
Awesome, thanks. I'll try again. |
I'm still loving this library and having generally great results. However, I do keep hitting trees that crash the decrossing step. It doesn't seem to be purely related to the size of the graph - I can quite easily find graphs which are acceptable on the "medium" setting for which the layout algorithm will never complete. I've sketched a minimal reproduction of this issue. When you run Here's an image of this algo's output: Running either of the other two functions will cause the process to use maximum processor, and it has never returned a result (I've waited max 10 minutes). import {
coordQuad,
dagConnect,
sugiyama,
layeringSimplex,
decrossTwoLayer,
twolayerOpt,
decrossOpt,
} from "d3-dag";
function setUpGraph() {
const visibleNodeDirectory = new Map();
VISIBLE_NODES.forEach((i) => visibleNodeDirectory.set(i.id, i));
const edges = [];
for (let node of visibleNodeDirectory.values()) {
for (let id of node.children) {
// Add the relationship if the node is in the visible set.
if (visibleNodeDirectory.has(id)) {
edges.push([node.id, id]);
}
}
}
return dagConnect()(edges);
}
/*
* LAYS OUT SUCCESSFULLY
*/
export function thisWillWork() {
const dag = setUpGraph();
let treeFn = sugiyama()
.layering(layeringSimplex())
.nodeSize(() => [100, 100])
.decross(decrossTwoLayer().order(twolayerOpt().large("medium")))
.coord(coordQuad());
console.log("Attempting layout");
treeFn(dag);
console.log("Layout succeeded")
return dag;
}
/*
* NEVER COMPLETES
* I have attempted this on the 0.7.0 release and the latest code on master
*/
export function thisWillNotWorkWithV070() {
const dag = setUpGraph();
let treeFn = sugiyama()
.layering(layeringSimplex())
.nodeSize(() => [100, 100])
.decross(decrossOpt().large("medium"))
.coord(coordQuad());
console.log("Attempting layout");
treeFn(dag);
console.log("This will never be logged")
return dag;
}
/*
* NEVER COMPLETES
* I have attempted this on the latest code on master
*/
export function thisWillNotWorkWithLatestCommits() {
const dag = setUpGraph();
let treeFn = sugiyama()
.layering(layeringSimplex())
.nodeSize(() => [100, 100])
.decross(decrossTwoLayer().order(twolayerOpt().dist(true).large("medium")))
.coord(coordQuad());
console.log("Attempting layout");
treeFn(dag);
console.log("This will never be logged")
return dag;
}
const VISIBLE_NODES = [
{ id: "@P9@", children: ["@P10@-@P9@"] },
{ id: "@P10@-@P9@", children: ["@P18@", "@P17@", "@P3@"] },
{ id: "@P10@", children: ["@P10@-@P9@"] },
{ id: "@P18@", children: ["@P18@-@P58@"] },
{ id: "@P18@-@P58@", children: ["@P59@", "@P57@", "@P56@", "@P55@"] },
{ id: "@P58@", children: ["@P18@-@P58@"] },
{ id: "@P59@", children: ["@P59@-@P73@"] },
{ id: "@P57@", children: ["@P57@-@P79@", "@P57@-@P80@"] },
{ id: "@P56@", children: ["@P56@-@P84@"] },
{ id: "@P55@", children: ["@P55@-@P88@"] },
{ id: "@P17@", children: ["@P17@-@P54@"] },
{ id: "@P17@-@P54@", children: ["@P53@", "@P52@"] },
{ id: "@P54@", children: ["@P17@-@P54@"] },
{ id: "@P53@", children: ["@P53@-@P65@"] },
{ id: "@P52@", children: ["@P52@-@P69@"] },
{ id: "@P3@", children: ["@P2@-@P3@"] },
{ id: "@P2@-@P3@", children: ["@P1@", "@P11@"] },
{ id: "@P2@", children: ["@P2@-@P3@"] },
{ id: "@P1@", children: ["@P1@-@P6@"] },
{ id: "@P11@", children: ["@P11@-@P14@"] },
{
id: "@P19@-@P20@",
children: ["@P588@", "@P9@", "@P586@", "@P587@", "@P590@", "@P589@"],
},
{ id: "@P19@", children: ["@P19@-@P20@"] },
{ id: "@P20@", children: ["@P19@-@P20@"] },
{ id: "@P588@", children: [] },
{ id: "@P586@", children: [] },
{ id: "@P587@", children: ["@P587@-@P631@"] },
{ id: "@P587@-@P631@", children: [] },
{ id: "@P631@", children: ["@P587@-@P631@"] },
{ id: "@P590@", children: ["@P590@-@P604@", "@P590@-@P605@"] },
{ id: "@P590@-@P604@", children: [] },
{ id: "@P604@", children: ["@P590@-@P604@"] },
{ id: "@P590@-@P605@", children: ["@P607@", "@P606@"] },
{
id: "@P605@",
children: ["@P590@-@P605@", "@P605@-@P765@", "@P605@-@P764@"],
},
{ id: "@P607@", children: ["@P607@-@P768@"] },
{ id: "@P606@", children: ["@P606@-@P772@"] },
{ id: "@P589@", children: ["@P589@-@P608@"] },
{ id: "@P589@-@P608@", children: [] },
{ id: "@P608@", children: ["@P589@-@P608@"] },
{
id: "@P591@-@P592@",
children: ["@P19@", "@P598@", "@P597@", "@P596@", "@P595@"],
},
{ id: "@P591@", children: ["@P591@-@P592@"] },
{ id: "@P592@", children: ["@P591@-@P592@"] },
{ id: "@P598@", children: [] },
{ id: "@P597@", children: ["@P597@-@P599@"] },
{ id: "@P597@-@P599@", children: ["@P601@", "@P600@"] },
{ id: "@P599@", children: ["@P597@-@P599@"] },
{ id: "@P601@", children: [] },
{ id: "@P600@", children: [] },
{ id: "@P596@", children: [] },
{ id: "@P595@", children: [] },
{
id: "@P584@-@P585@",
children: [
"@P609@",
"@P613@",
"@P20@",
"@P612@",
"@P611@",
"@P614@",
"@P610@",
],
},
{ id: "@P584@", children: ["@P584@-@P585@"] },
{ id: "@P585@", children: ["@P584@-@P585@"] },
{ id: "@P609@", children: ["@P609@-@P619@"] },
{ id: "@P609@-@P619@", children: [] },
{ id: "@P619@", children: ["@P609@-@P619@"] },
{ id: "@P613@", children: [] },
{ id: "@P612@", children: ["@P612@-@P620@"] },
{
id: "@P612@-@P620@",
children: ["@P625@", "@P624@", "@P623@", "@P622@", "@P621@", "@P626@"],
},
{ id: "@P620@", children: ["@P612@-@P620@"] },
{ id: "@P625@", children: [] },
{ id: "@P624@", children: [] },
{ id: "@P623@", children: [] },
{ id: "@P622@", children: ["@P622@-@P789@"] },
{ id: "@P621@", children: [] },
{ id: "@P626@", children: [] },
{ id: "@P611@", children: ["@P611@-@P627@"] },
{ id: "@P611@-@P627@", children: ["@P628@", "@P629@", "@P630@"] },
{ id: "@P627@", children: ["@P611@-@P627@"] },
{ id: "@P628@", children: ["@P628@-@P756@"] },
{ id: "@P629@", children: ["@P629@-@P754@"] },
{ id: "@P630@", children: ["@P630@-@P758@"] },
{ id: "@P614@", children: [] },
{ id: "@P610@", children: [] },
]; |
If I understand what you're saying correctly. This works for decross two-layer, but you want it to work for decross opt? If that's what you're asking, I think you're sol. Optimal decross minimization is an np-complete problem. The way it's implemented is currently extra bad (I think I talk about this more in another issue), but essentially there's no way around it. If you want to try an implement a faster optimal decrossing, I'm definitely open to it, but it's never going to be a scalable solution. If this is the route you want to go,almost all of this is implemented as a binary optimization, so a binary optimization solver would probably be better. An arbitrary non-convex optimizer would allow you to model the space more compactly, so maybe that's a better approach. One alternative is to improve the two-layer decrossing. One technique from the original paper talks about just doing greedy swapping after an initial good layout (e.g. median). I think this probably makes sense, and have considered implementing it, but I'm not sure how to make it reasonably fast or implement it ellegantly, so I also haven't bothered. As a final note, the medium and large flags are just there to allow you to shoot yourself in the foot if you want to. "small" should finish, "medium" should run but not finish, "large" should just crash. At least roughly, they are very rough estimates of how long it might take to finish. |
Thanks for your reply. It's been a long time since my CS degree, and I never went that deep on these concepts in the first place. I have some research to do before I can properly process all this! Let me tell you what I'm seeing, and try to relate it to your response as best I can. I'm building a family tree explorer. You start with a default view and you can choose to expand roots and branches to look around the whole data set. Say I start with 40 nodes in the view. By expanding a couple's offspring, I might add 4 nodes here, and by expanding an individual's parents I might add a couple more. Obviously the algorithm takes longer to lay out more nodes, but I've seen it handle 70-80 nodes in a few seconds or less. At some (unpredictable) point, I will add some relationships into the layout that cause the algorithm never to finish. Even if it had been laying out the tree in <1s before, adding just a few more nodes can suddenly switch it to never completing. I've uncovered relatively small data sets (<50 nodes) that cause this behaviour. I guess I was analysing this situation with reference to more conventional algorithms that run over their input in polynomial time. If such an algorithm returned a result when n=50 and never returned when n=52, I would look for bugs such as infinite loops caused by certain configurations of input. If I understand correctly, you're telling me that there's no such bug. Instead, the results of the "opt" layouts is actually non-deterministic, and we can't know ahead of time if a certain input has any solution. And if it does not, nothing will ever be produced by the algorithm. Thanks for explaining the purpose of the "medium" and "large" options. I will do more research and perhaps try to come up with a new algorithm that works on the limited subset of graphs representing valid family trees. EDIT: if the above is a correct reading, please consider this issue closed. |
Yeah, that's basically correct. It's not that no solution exists, but it's that computing it can take exponential time. It's not clear exactly what might cause it, but yes it is totally expected that somehow adding two nodes will cause it to "never" complete. The medium hook was meant to catch you before you get that far. An obvious alternate would be to try both with a timeout, and if opt takes too long to go with something simpler. I don't want to discourage you looking, but if a family tree can basically be any dag, then I don't think you'll find anything simpler. Two questions out of curiosity:
|
Thanks to a hint in a previous issue from @erikbrinkman, I've tried the following:
Running this has not altered the layout at all - as if the ranks are invalid and being ignored.
The text was updated successfully, but these errors were encountered: