Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Influencing node coordinates to achieve desired horizontal ordering #60

Closed
gareth-lloyd opened this issue May 7, 2021 · 15 comments
Closed

Comments

@gareth-lloyd
Copy link

Thanks to a hint in a previous issue from @erikbrinkman, I've tried the following:

  function layering(dag) {
    // Run layering once so that everything has a level.
    layeringSimplex()(dag);

    dag.descendants().forEach(node => {
      // Give everything a rank, but unions have a rank one less than their level.
      node.rank = node.layer;
      if (node.data.isUnion) {
        node.rank -= 1;
      }
    });
    // Re-run layering to allow ranks to take effect.
    layeringSimplex()(dag)
  }

  let treeFn = sugiyama()
    .nodeSize(nodeSize)
    .layering(layering)
    .decross(decrossOpt())
    .coord(coordQuad());
treeFn(dag);

Running this has not altered the layout at all - as if the ranks are invalid and being ignored.

@gareth-lloyd
Copy link
Author

Ah, I've figured out how to set the rank accessor.

New code:

  function layering(dag) {
    layeringSimplex()(dag)
    dag.descendants().forEach(node => {
      node.rank = node.layer;
      node.layer = undefined;
      if (!node.data.isUnion) {
        node.rank += 1;
      }
    });
    layeringSimplex().rank(n => n.rank)(dag)
  }

Now the outcome is Error: could not find a feasbile simplex layout, check that rank accessors are not ill-defined.

@gareth-lloyd
Copy link
Author

gareth-lloyd commented May 7, 2021

I think I'm asking for the impossible - something cannot both be a descendant of node on level X and drawn on level X.

@erikbrinkman
Copy link
Owner

I don't think you're trying the impossible, I just think you're going about it in the wrong way. I think it starts with the way you're trying to get around the relative nature of rank by first trying to layer without it. That simply won't work.

Let's think about this from first principles. First, I see two things you could try an guarantee.

  1. parents are both on the same level
  2. children are all on the same level
  3. children are exactly on level below parents

I don't think all of those are possible, so let's just stick with only 1.

The second thing to deal with is how you draw the lines connecting them. The sugiyama layout is designed to compute how to draw lines so that they curve around other nodes. It seems like you probably want this. If so the solution to that is to create your own dummy node that is the descendant of parents with children, and all children are a descendant of that node. I think this mostly complicates things, so let's ignore it going forward.

Given that we're ignoring the edge drawing, and we want parents to have the same rank, we have a problem in that how do we know which ranks to make each and which to not. This can't be solved with the current rank accessor. The easiest solution would be to copy the simplex source code, and remove

// inequality constraint
constraints[cons] = { min: 1 };
low[cons] = -1;
high[cons] = 1;

That way only nodes with the same rank will have the same layer, but no other constraints will be enforced.

I've been thinking about how to support this use case more generally. It seems like I could do something like the rank accessor, but it's just an equality class accessor. That is probably worth adding, but I'm generally curious on your overall thoughts as to what work / what you're looking for. Hope this helps!

@gareth-lloyd
Copy link
Author

Hi Erik, thanks so much for your thoughtful reply. d3-dag is an awesome library, but beyond that your willingness to put time into support really makes a huge difference.

When I posted my questions on Friday I was on the clock for a client, and hence perhaps not explaining everything very well. Now that it's the weekend, and I'm doing this for my own edification, I will take more time to explain everything. I would also be very happy to write an example/tutorial after we solve this. If that sounds useful, let me know the best format/platform for that.

I've found it relatively easy to draw generations on the same level by just ensuring that my data is very consistent. The rank method would work well for that if I needed an extra guarantee to overcome data inconsistencies. I was actually trying to use rank as a hack to solve a different problem related to horizontal ordering of nodes.

Here's a subsection of a tree that I've outputted using your library:

Screenshot 2021-05-09 at 17 08 57

Note that Judith is a descendant of Anne and William. Judith married Thomas, and had three children. Thomas also married Margaret and had a child.

This relationship is made much harder to see clearly by the fact that Judith's brother Hamnet is drawn between her and her spouse Thomas. I have been looking for ways to influence the horizontal placement of nodes to avoid situations like this.

My goal with rank was to draw the union node (the black dot) on the same level as the two spouses. I (perhaps naively) thought this would work as the decrossing algorithm would be much more likely to place spouses alongside each other.

So the key thing here is to take a fact from the the data ("X and Y produced children together") and use it to influence the placement of nodes adjacent to each other.

I hope this makes sense, and thanks again for taking the time to reply.

@gareth-lloyd gareth-lloyd reopened this May 9, 2021
@erikbrinkman
Copy link
Owner

Yeah, that makes sense. I was thinking about doing this, and had a solution to the equality constraint that I just pushed, but the problem I ran into is how to organize horizontal nodes. The problem is in the decrossing step, where we essentially want to add an extra step that says, "for the purposes of decrossing these two nodes should be next to to each other."

The problem I ran into (and I'm curious as to your thoughts on) is exactly how to do this succinctly. The way the commit I just pushed works is that everything can have an optional "equality group" that says that everything in the group needs to have the same layer. That concept doesn't seem to work here in that you need to be able to say: margaret and thomas are a pair, as are thomas an judith. Things could have "adjacency sets" e.g. multiple groups they belong to, but adjacency sets seem complicated, and it's not clear how sets that have more than two members should be handled, or nodes that belong to more than two sets, as it wouldn't be possible to actually constrain them. If I'm missing something obvious, or you have ideas for an appropriate api, throw them out.

Then there's the problem of implementation. This depends somewhat on the api.

  1. The first would be a breaking change. Before decrossing, we could group nodes into "meta nodes" that capture all nodes that need to be next to each other. Then all current decrossing algorithms would probably work (but the constraint that there's only one edge between nodes would now break, so that might cause issues).
  2. The second would be just like with rank, applying a configuration option to each individual decrossing operator. This might be a little more work and would allow only partial support which maybe isn't great, but it does open up one nice feature. Since the decrossing still "knows" about the independent nodes, methods like "decross-opt" will will work. Thus you added two constraints "margaret close to thomas" and "thomas close to judith" decross opt will then naturally put thomas in the center without needed anything hacky. The downside is that for family trees all the optimization is top down, so I don't think any current layering besides decross-opt will actually do a good job in this scenario.

As for a write up, if this all works out, probably just making your own observable with dummy data would be great, but I've also just linked to other projects that serve as an example.

@gareth-lloyd gareth-lloyd changed the title Trying to figure out 'rank' on layeringSimplex Influencing node coordinates to achieve desired horizontal ordering May 11, 2021
@erikbrinkman
Copy link
Owner

@gareth-lloyd I'm still not sure of a great to indicate that two nodes should be adjacent, however, it seems like a generally good property in the decrossing phase would be to minimize the index distance between nodes with a common parent or child. I think this may be possible with decross-opt / decross-twolayer-opt, but it should generally be what happens with twolayer-mean and twolayer-median. In the past this was a problem because the twolayer methods only did one downward pass, but the master branch is updated so it does a down and and uppass by default, and the number of passes can be tweaked. I think in your case, this should probably be sufficient to produce a pleasing layout.

@gareth-lloyd
Copy link
Author

Thanks so much for this. I will try it out and let you know how it goes.

@erikbrinkman
Copy link
Owner

@gareth-lloyd also, I just figured out how to add the minimization to the two "opt" methods. It makes the optimization more expensive, so you have to enable it with e.g. decrossOpt().dist(true). So if you're checking out / cloning master, that should work too.

Note unfortunately, none of this is documented yet, because I haven't pushed the update, so feel free to ping here if things aren't clear.

@gareth-lloyd
Copy link
Author

gareth-lloyd commented May 19, 2021

I've been trying to experiment with these changes, but I keep hitting a bug that seems to come up from the javascript-lp-solver dependency.

TypeError: Cannot set property 'lastSolvedModel' of undefined

From the stack trace, I can see it's happening during layering simplex.

Here's what I did:

  • Checked out master at ab464f3d33ac78025732f1ac8ff3d1264339f7fc
  • Build the project
  • npm pack
  • Installed the library referencing the .tgz file produced by previous step

It seems to fail for me on any data, but here's a concrete example.

import {
  coordQuad,
  dagConnect,
  decrossOpt,
  layeringSimplex,
  sugiyama,
} from "d3-dag";

export function minimalRepro() {
  const data = [
    {id: '1', children: ['2', '3']},
    {id: '2', children: []},
    {id: '3', children: []},
  ];
  const linkages = [];
  for (let node of data) {
    for (let id of node.children) {
      linkages.push([node.id, id]);
    }
  }

  let dag = dagConnect()(linkages);

  let treeFn = sugiyama()
    .nodeSize(() => [20, 20])
    .layering(layeringSimplex())
    .decross(decrossOpt())
    .coord(coordQuad());

  // Throws "TypeError: Cannot set property 'lastSolvedModel' of undefined"
  treeFn(dag);
}

Screenshot 2021-05-19 at 11 59 18

@erikbrinkman
Copy link
Owner

wow! thanks for taking the time to post this. I switched master to esbuild, but don't really have tests set up for the bundle. It seems that rollup is setting window to an empty object, and that was allowing the wrapper around javascript-lp-solver to work. I'd still like to figure out how to get esbuild working, and maybe add some bundled tests in the process, but in the meantime, I added a branch "rollup" that just reverts that change. I tested that building it that way allows your example to run, so hopefully that will get you working. (note, you don't need to pack it, you can add the directory e.g. npm add path/to/d3-dag after it's built.).

@gareth-lloyd
Copy link
Author

Awesome, thanks. I'll try again.

@gareth-lloyd
Copy link
Author

gareth-lloyd commented May 25, 2021

I'm still loving this library and having generally great results.

However, I do keep hitting trees that crash the decrossing step. It doesn't seem to be purely related to the size of the graph - I can quite easily find graphs which are acceptable on the "medium" setting for which the layout algorithm will never complete.

I've sketched a minimal reproduction of this issue.

When you run thisWillWork, it uses decrossTwoLayer to prove that the data is valid.

Here's an image of this algo's output:

Screenshot 2021-05-25 at 18 35 42

Running either of the other two functions will cause the process to use maximum processor, and it has never returned a result (I've waited max 10 minutes).

import {
  coordQuad,
  dagConnect,
  sugiyama,
  layeringSimplex,
  decrossTwoLayer,
  twolayerOpt,
  decrossOpt,
} from "d3-dag";

function setUpGraph() {
  const visibleNodeDirectory = new Map();
  VISIBLE_NODES.forEach((i) => visibleNodeDirectory.set(i.id, i));

  const edges = [];
  for (let node of visibleNodeDirectory.values()) {
    for (let id of node.children) {
      // Add the relationship if the node is in the visible set.
      if (visibleNodeDirectory.has(id)) {
        edges.push([node.id, id]);
      }
    }
  }

  return dagConnect()(edges);
}

/*
 * LAYS OUT SUCCESSFULLY
*/
export function thisWillWork() {
  const dag = setUpGraph();

  let treeFn = sugiyama()
    .layering(layeringSimplex())
    .nodeSize(() => [100, 100])
    .decross(decrossTwoLayer().order(twolayerOpt().large("medium")))
    .coord(coordQuad());

  console.log("Attempting layout");
  treeFn(dag);
  console.log("Layout succeeded")
  return dag;
}

/*
 * NEVER COMPLETES
 * I have attempted this on the 0.7.0 release and the latest code on master
*/
export function thisWillNotWorkWithV070() {
  const dag = setUpGraph();

  let treeFn = sugiyama()
    .layering(layeringSimplex())
    .nodeSize(() => [100, 100])
    .decross(decrossOpt().large("medium"))
    .coord(coordQuad());

  console.log("Attempting layout");
  treeFn(dag);
  console.log("This will never be logged")
  return dag;
}

/*
 * NEVER COMPLETES
 * I have attempted this on the latest code on master
*/
export function thisWillNotWorkWithLatestCommits() {
  const dag = setUpGraph();

  let treeFn = sugiyama()
    .layering(layeringSimplex())
    .nodeSize(() => [100, 100])
    .decross(decrossTwoLayer().order(twolayerOpt().dist(true).large("medium")))
    .coord(coordQuad());

  console.log("Attempting layout");
  treeFn(dag);
  console.log("This will never be logged")
  return dag;
}

const VISIBLE_NODES = [
  { id: "@P9@", children: ["@P10@-@P9@"] },
  { id: "@P10@-@P9@", children: ["@P18@", "@P17@", "@P3@"] },
  { id: "@P10@", children: ["@P10@-@P9@"] },
  { id: "@P18@", children: ["@P18@-@P58@"] },
  { id: "@P18@-@P58@", children: ["@P59@", "@P57@", "@P56@", "@P55@"] },
  { id: "@P58@", children: ["@P18@-@P58@"] },
  { id: "@P59@", children: ["@P59@-@P73@"] },
  { id: "@P57@", children: ["@P57@-@P79@", "@P57@-@P80@"] },
  { id: "@P56@", children: ["@P56@-@P84@"] },
  { id: "@P55@", children: ["@P55@-@P88@"] },
  { id: "@P17@", children: ["@P17@-@P54@"] },
  { id: "@P17@-@P54@", children: ["@P53@", "@P52@"] },
  { id: "@P54@", children: ["@P17@-@P54@"] },
  { id: "@P53@", children: ["@P53@-@P65@"] },
  { id: "@P52@", children: ["@P52@-@P69@"] },
  { id: "@P3@", children: ["@P2@-@P3@"] },
  { id: "@P2@-@P3@", children: ["@P1@", "@P11@"] },
  { id: "@P2@", children: ["@P2@-@P3@"] },
  { id: "@P1@", children: ["@P1@-@P6@"] },
  { id: "@P11@", children: ["@P11@-@P14@"] },
  {
    id: "@P19@-@P20@",
    children: ["@P588@", "@P9@", "@P586@", "@P587@", "@P590@", "@P589@"],
  },
  { id: "@P19@", children: ["@P19@-@P20@"] },
  { id: "@P20@", children: ["@P19@-@P20@"] },
  { id: "@P588@", children: [] },
  { id: "@P586@", children: [] },
  { id: "@P587@", children: ["@P587@-@P631@"] },
  { id: "@P587@-@P631@", children: [] },
  { id: "@P631@", children: ["@P587@-@P631@"] },
  { id: "@P590@", children: ["@P590@-@P604@", "@P590@-@P605@"] },
  { id: "@P590@-@P604@", children: [] },
  { id: "@P604@", children: ["@P590@-@P604@"] },
  { id: "@P590@-@P605@", children: ["@P607@", "@P606@"] },
  {
    id: "@P605@",
    children: ["@P590@-@P605@", "@P605@-@P765@", "@P605@-@P764@"],
  },
  { id: "@P607@", children: ["@P607@-@P768@"] },
  { id: "@P606@", children: ["@P606@-@P772@"] },
  { id: "@P589@", children: ["@P589@-@P608@"] },
  { id: "@P589@-@P608@", children: [] },
  { id: "@P608@", children: ["@P589@-@P608@"] },
  {
    id: "@P591@-@P592@",
    children: ["@P19@", "@P598@", "@P597@", "@P596@", "@P595@"],
  },
  { id: "@P591@", children: ["@P591@-@P592@"] },
  { id: "@P592@", children: ["@P591@-@P592@"] },
  { id: "@P598@", children: [] },
  { id: "@P597@", children: ["@P597@-@P599@"] },
  { id: "@P597@-@P599@", children: ["@P601@", "@P600@"] },
  { id: "@P599@", children: ["@P597@-@P599@"] },
  { id: "@P601@", children: [] },
  { id: "@P600@", children: [] },
  { id: "@P596@", children: [] },
  { id: "@P595@", children: [] },
  {
    id: "@P584@-@P585@",
    children: [
      "@P609@",
      "@P613@",
      "@P20@",
      "@P612@",
      "@P611@",
      "@P614@",
      "@P610@",
    ],
  },
  { id: "@P584@", children: ["@P584@-@P585@"] },
  { id: "@P585@", children: ["@P584@-@P585@"] },
  { id: "@P609@", children: ["@P609@-@P619@"] },
  { id: "@P609@-@P619@", children: [] },
  { id: "@P619@", children: ["@P609@-@P619@"] },
  { id: "@P613@", children: [] },
  { id: "@P612@", children: ["@P612@-@P620@"] },
  {
    id: "@P612@-@P620@",
    children: ["@P625@", "@P624@", "@P623@", "@P622@", "@P621@", "@P626@"],
  },
  { id: "@P620@", children: ["@P612@-@P620@"] },
  { id: "@P625@", children: [] },
  { id: "@P624@", children: [] },
  { id: "@P623@", children: [] },
  { id: "@P622@", children: ["@P622@-@P789@"] },
  { id: "@P621@", children: [] },
  { id: "@P626@", children: [] },
  { id: "@P611@", children: ["@P611@-@P627@"] },
  { id: "@P611@-@P627@", children: ["@P628@", "@P629@", "@P630@"] },
  { id: "@P627@", children: ["@P611@-@P627@"] },
  { id: "@P628@", children: ["@P628@-@P756@"] },
  { id: "@P629@", children: ["@P629@-@P754@"] },
  { id: "@P630@", children: ["@P630@-@P758@"] },
  { id: "@P614@", children: [] },
  { id: "@P610@", children: [] },
];

@erikbrinkman
Copy link
Owner

If I understand what you're saying correctly. This works for decross two-layer, but you want it to work for decross opt? If that's what you're asking, I think you're sol. Optimal decross minimization is an np-complete problem. The way it's implemented is currently extra bad (I think I talk about this more in another issue), but essentially there's no way around it. If you want to try an implement a faster optimal decrossing, I'm definitely open to it, but it's never going to be a scalable solution. If this is the route you want to go,almost all of this is implemented as a binary optimization, so a binary optimization solver would probably be better. An arbitrary non-convex optimizer would allow you to model the space more compactly, so maybe that's a better approach.

One alternative is to improve the two-layer decrossing. One technique from the original paper talks about just doing greedy swapping after an initial good layout (e.g. median). I think this probably makes sense, and have considered implementing it, but I'm not sure how to make it reasonably fast or implement it ellegantly, so I also haven't bothered.

As a final note, the medium and large flags are just there to allow you to shoot yourself in the foot if you want to. "small" should finish, "medium" should run but not finish, "large" should just crash. At least roughly, they are very rough estimates of how long it might take to finish.

@gareth-lloyd
Copy link
Author

gareth-lloyd commented May 26, 2021

Thanks for your reply. It's been a long time since my CS degree, and I never went that deep on these concepts in the first place. I have some research to do before I can properly process all this!

Let me tell you what I'm seeing, and try to relate it to your response as best I can.

I'm building a family tree explorer. You start with a default view and you can choose to expand roots and branches to look around the whole data set.

Say I start with 40 nodes in the view. By expanding a couple's offspring, I might add 4 nodes here, and by expanding an individual's parents I might add a couple more. Obviously the algorithm takes longer to lay out more nodes, but I've seen it handle 70-80 nodes in a few seconds or less.

At some (unpredictable) point, I will add some relationships into the layout that cause the algorithm never to finish. Even if it had been laying out the tree in <1s before, adding just a few more nodes can suddenly switch it to never completing. I've uncovered relatively small data sets (<50 nodes) that cause this behaviour.

I guess I was analysing this situation with reference to more conventional algorithms that run over their input in polynomial time. If such an algorithm returned a result when n=50 and never returned when n=52, I would look for bugs such as infinite loops caused by certain configurations of input.

If I understand correctly, you're telling me that there's no such bug. Instead, the results of the "opt" layouts is actually non-deterministic, and we can't know ahead of time if a certain input has any solution. And if it does not, nothing will ever be produced by the algorithm.

Thanks for explaining the purpose of the "medium" and "large" options.

I will do more research and perhaps try to come up with a new algorithm that works on the limited subset of graphs representing valid family trees.

EDIT: if the above is a correct reading, please consider this issue closed.

@erikbrinkman
Copy link
Owner

Yeah, that's basically correct. It's not that no solution exists, but it's that computing it can take exponential time. It's not clear exactly what might cause it, but yes it is totally expected that somehow adding two nodes will cause it to "never" complete. The medium hook was meant to catch you before you get that far. An obvious alternate would be to try both with a timeout, and if opt takes too long to go with something simpler.

I don't want to discourage you looking, but if a family tree can basically be any dag, then I don't think you'll find anything simpler.

Two questions out of curiosity:

  1. In practice do you find the two-layer median decrossing to be subpar? It's a new option, but have you tried adding more iterations to the median?
  2. It seems like a number of people like building incremental dags. The way this is designed wasn't really to support that, but it seems like a thing people vaguely want. Would you find an api, that functioned something like "add node to layout" useful? I'm still not sure what it would look like, but it's something I'm thinking about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants