The implementation of _fruchterman_reingold significantly deviates from the original algorithm with adverse effects for every graph that is not fully symmetric #4885
Comments
Thanks @paulbrodersen. This hasn't garnered any discussion, likely because there's a lot to digest! Any chance you'd be able to open a PR with suggested changes and/or some of the above examples as test cases?
Sorry, I now have a newborn (5 weeks) and won't have time for this for the foreseeable future. You can have a look at my implementation of the FR algorithm in netgraph here.
As mentioned on the mailing list recently, I will take over this issue.
@991jo Feel free to tag me here or shoot me an email if you have any questions.
Okay, quick progress report and a couple of observations: I implemented the FR algorithm with the forces as intended by FR, handled nodes in the same position by slightly randomizing their positions, and added a simple frame implementation. Regarding the example with the completely unconnected graph: I will come up with several other examples instead, and I will make a comparison between the different frame implementations.
A more general issue with the border is the combination of the value of k. However, FR write that they used
So basically both implementations assume When we are doing this and apply a boundary, the graph is usually too big for the frame. If I reduce the edge length by a factor of e.g. 0.2, the results are small enough to fit into the frame and the results are as expected.
Choosing the strength of the forces means reducing Another example for large values of If we set For the chain to straighten out we can remove the boundary and give it ~500 iterations. The netgraph package also has no boundary. @paulbrodersen: Did you come to the same conclusion as I did, that a boundary is mostly useless when the graph has to be small enough to almost never touch it? If so I would propose to just ignore the boundary, set
There is an (inelastic) boundary in netgraph: https://github.com/paulbrodersen/netgraph/blob/master/netgraph/_node_layout.py#L469 Basically, on each iteration, I compute candidate node positions but only update positions that result in valid positions as defined by the When I last looked at the implementation in networkx, it did not have a frame. The origin/scale parameters are only used to rescale the final node layout. As I explain above, I think that is a flaw, as for asymmetric graphs like my ball-and-chain example, this results in solutions that are far from the intended optimum (i.e. an even distribution of nodes on the canvas within the constraints imposed by the graph structure). I like your current test cases but I would urge you to include some highly asymmetric graphs as well.
I just made a draft PR for the code I changed. One open question that should be discussed is the sparse implementation that is used for graphs with more than 500 nodes. Another question is how to handle fixed positions and the random generation of the positions of the remaining vertices.
I would raise an error whenever fixed positions fall outside the dimensions specified by origin/scale.
I misread the description of the fixed vertices. The fixed vertices don't get initial positions by the user, they are randomly generated at the beginning but stay fixed in that position.
The docs say that the input
Wow -- Thanks for this gallery of examples and proposal for moving forward. I think the proposed approach would allow people to stay with the current function, but offer an option that might help a few graphs with long strands of hair. The doc_string description should explain when someone might want to use this. To reduce keyword input pollution, could we combine the boundary and C values into a single keyword argument that is None by default (indicating no boundary) and turns on the boundary with C-value given when the input is not None. Are there parts of this I'm missing? I'm not actually sure how much better the layouts are with the restricted domain. The chain of the ball and chain is bent more as C increases. Does it also show more of the ball? I can't see that very well... Maybe I'm not looking for the right thing. I guess I usually don't expect a picture of a ball to show much useful information... :)
Preamble
A pdf of the original paper can be found here. There is a listing or pseudocode of the algorithm on page 1132. I believe that the implementation in networkx is largely based on this pseudocode. Before anyone raises objections to my points below, I would like to point out that the text of the paper significantly deviates from this pseudocode, and that I assume that the more detailed text -- not the pseudocode -- describes the intended algorithm, and thus constitutes ground truth.
Summary of the algorithm
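The aims of the Fruchterman-Reingold algorithm are (1) to distribute vertices evenly within a given frame, while (2) having uniform edge lengths. It achieves this goal by simulating two forces between nodes: a repulsive force that acts between all pairs of vertices, and an attractive force that acts only between connected vertices.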
For small distances (distance < k), the repulsive force is stronger than any attraction; for large distances, attraction dominates.
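As a minimal sketch of the two forces: the paper gives the attraction as f_a(d) = d^2 / k and the repulsion as f_r(d) = -k^2 / d, with k = C * sqrt(area / |V|). The function names below are hypothetical and only the magnitudes are modelled; direction handling is left out.

```python
import math


def attractive_force(d, k):
    """Magnitude of the attraction between connected vertices: d^2 / k."""
    return d * d / k


def repulsive_force(d, k):
    """Magnitude of the repulsion between any two vertices: k^2 / d."""
    return k * k / d


def optimal_distance(area, n_vertices, c=1.0):
    """k = C * sqrt(area / |V|); C is a tunable constant (assumed 1 here)."""
    return c * math.sqrt(area / n_vertices)


k = optimal_distance(area=1.0, n_vertices=100)
for d in (0.5 * k, k, 2.0 * k):
    # Below k repulsion wins, at k the two balance, above k attraction wins.
    print(f"d={d:.3f} repulsion={repulsive_force(d, k):.3f} "
          f"attraction={attractive_force(d, k):.3f}")
```

The two magnitudes are equal exactly at d = k, which is why k acts as the preferred edge length.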
In the absence of connections, vertices only repel each other. In the presence of a frame, the system reaches equilibrium when all vertices are maximally far apart from their nearest neighbours (as only short distances matter for repulsion). This occurs when the vertices are distributed uniformly (aim (1)) over the area of the frame. In this case, the average distance between them will be sqrt(Area / |V|) (assuming rectangular frames). If two vertices are connected, the repulsive and attractive forces balance when the two vertices are k apart. This promotes uniform edge lengths (aim (2)). If k is furthermore approximately sqrt(Area / |V|), then both forces act in concert such that all neighbouring nodes will be approximately k apart. If k is smaller, the average distance between nearest neighbour nodes will be between k and sqrt(Area / |V|), depending on connection density.
Significant deviations
The implementation in networkx differs in two seemingly innocuous but significant details:
1. Application of the temperature
As in many other annealing algorithms, the Fruchterman-Reingold algorithm has a temperature parameter that influences the magnitude of displacements and decreases over time. This is supposed to dampen oscillations, and to eventually lock a single solution into place. Specifically, and from the paper:
"The idea is that the displacement of a vertex is limited to some maximum value, and that this maximum value decreases over time [...]."
However, in the implementation in networkx, all displacement magnitudes are set to the current temperature t, and the previously computed displacements are only used to determine the direction. Dividing the displacement vector by its length results in a unit vector. All of these unit vectors are then rescaled by t.
Furthermore, the cooling scheme in the paper is not linear, but 2-part: first, a rapid decrease in temperature ("quenching") followed by further updates at low but constant temperature ("simmering"). However, in my own experiments, the cooling scheme does not have a large effect (whereas how the temperature is applied does matter).
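The difference between the two ways of applying the temperature can be sketched as follows. This is an illustration of the behaviour described in this issue, not a copy of the networkx source; both function names are hypothetical, and vertices are plain 2D tuples.

```python
import math


def step_capped(displacement, t):
    """Paper's rule: keep the displacement but cap its length at t."""
    dx, dy = displacement
    length = math.hypot(dx, dy)
    if length == 0.0:
        return (0.0, 0.0)
    scale = min(length, t) / length
    return (dx * scale, dy * scale)


def step_normalised(displacement, t):
    """Behaviour described above: every step has length exactly t."""
    dx, dy = displacement
    length = math.hypot(dx, dy)
    if length == 0.0:
        return (0.0, 0.0)
    return (dx * t / length, dy * t / length)


t = 0.1
weak = (0.003, 0.004)   # net force of length 0.005, well below t
strong = (3.0, 4.0)     # net force of length 5.0, well above t

for name, disp in (("weak", weak), ("strong", strong)):
    capped = math.hypot(*step_capped(disp, t))
    normalised = math.hypot(*step_normalised(disp, t))
    print(f"{name}: capped step={capped:.3f}, normalised step={normalised:.3f}")
```

With capping, a vertex that is almost in equilibrium barely moves. With normalisation, it moves just as far as a vertex under a strong net force, so the magnitude of the forces no longer influences the layout.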
2. Enforcement of the frame
In the paper, on each iteration, for vertices with displacements that push them outside the frame, the frame negates all components of the displacement normal to the frame (see Fig. 6 for a much better and easier to parse pictorial explanation). In my experiments, this approach actually results in issues, as the vertices then often slide into the corners. When multiple vertices slide into the same corner, the repulsion terms lead to numerically unstable behaviour. Personally, I much prefer the approach in Fig. 3, inelastic collision, where vertices colliding with the frame simply do not move until all components of the displacement are pushing the vertex off the frame.
However, the implementation in networkx simply ignores the frame (given by the scale and origin parameters) until the very end, when the positions are rescaled to be within the frame iff there are no fixed nodes.
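One simple way to enforce a frame on every iteration is to compute candidate positions and accept only those that stay inside the frame. The sketch below assumes an axis-aligned box whose lower corner is `origin` and whose side lengths are `scale`; that mapping, and both function names, are assumptions made for illustration and do not reflect how networkx interprets its parameters.

```python
def inside_frame(position, origin, scale):
    """True if every coordinate lies within [origin, origin + scale]."""
    return all(o <= p <= o + s for p, o, s in zip(position, origin, scale))


def apply_inelastic_frame(positions, displacements, origin, scale):
    """Move each vertex only if its candidate position stays in the frame."""
    updated = []
    for pos, delta in zip(positions, displacements):
        candidate = tuple(p + d for p, d in zip(pos, delta))
        if inside_frame(candidate, origin, scale):
            updated.append(candidate)
        else:
            updated.append(pos)  # blocked by the frame: stay put this round
    return updated


positions = [(0.5, 0.5), (0.875, 0.5)]
displacements = [(0.25, 0.0), (0.25, 0.0)]
new_positions = apply_inelastic_frame(
    positions, displacements, origin=(0.0, 0.0), scale=(1.0, 1.0)
)
print(new_positions)
```

The first vertex moves freely; the second would leave the unit square, so it keeps its old position until a later displacement yields a valid candidate.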
When does this matter?
The spring layout in networkx often results in node placements that seem extremely plausible:
Graph is a cube. The plot shows a cube. All is well that ends well. Case closed.
However, there are counter-examples that clearly show that there is something rotten in the state of Denmark:
The Fruchterman-Reingold algorithm is a force-directed algorithm. All nodes repel each other with a force proportional to the inverse distance. As outlined above, for a fully unconnected graph, the equilibrium configuration should be attained when the nodes are evenly distributed within the frame. Clearly, this is not the case in networkx. Notably, other implementations of the FR algorithm show a different behaviour, e.g. see this SO post.
So what is happening here? The net force acting on each vertex pushes it away from the centroid of all other vertices. If the number of vertices is large enough, all of these centroids are approximately in the same location. As the displacement length is the same for all nodes, namely the current temperature, this results in a uniform push outwards, and the ring structure seen above. Crucially, the rings are fairly symmetric such that the centroids remain in place. On subsequent iterations, the ring hence simply expands.
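The ring expansion can be reproduced with a small, idealised sketch: vertices placed on a regular ring, inverse-distance repulsion only, and every step rescaled to length t as described above. All names are hypothetical, and the setup is a simplification rather than the networkx code.

```python
import math


def ring(n, radius):
    """n vertices evenly spaced on a circle around the origin."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]


def net_repulsion(positions, k):
    """Sum of k^2 / d repulsion vectors acting on each vertex."""
    forces = []
    for i, (xi, yi) in enumerate(positions):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            dist = math.hypot(dx, dy)
            magnitude = k * k / dist
            fx += dx / dist * magnitude
            fy += dy / dist * magnitude
        forces.append((fx, fy))
    return forces


def expand_once(positions, k, t):
    """One iteration in which every vertex moves exactly t along its net force."""
    moved = []
    for (x, y), (fx, fy) in zip(positions, net_repulsion(positions, k)):
        length = math.hypot(fx, fy)
        moved.append((x + fx * t / length, y + fy * t / length))
    return moved


positions = ring(n=12, radius=1.0)
for _ in range(3):
    positions = expand_once(positions, k=0.1, t=0.1)
radii = [math.hypot(x, y) for x, y in positions]
print(min(radii), max(radii))
```

By symmetry the net force on each vertex points radially outwards, so the ring grows by t per iteration no matter how small k (and hence the repulsion) is. Without a frame, nothing stops this expansion.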
Apart from this clearly pathological case, and more generally, I think these two issues affect every graph that is not symmetric with uniform degree distribution. For example, observe the layout of a ball-and-chain graph:
Repulsion from the 5 nodes in the chain should not be sufficient to push all 30 nodes of the ball into one corner. However, as the absolute magnitude of the repulsion is irrelevant (temperature), and expansion is ultimately limitless (no frame), that is exactly what happens.
I am still struggling to find good example graphs that are sufficiently asymmetric to show the effect while being simple enough that anyone can have some intuition for what the graph should look like given a force-directed layout (within a frame!). However, I hope that these examples can get the ball rolling.
I have a different implementation of the FR algorithm here. It differs from the original algorithm in exactly how the frame is applied but otherwise follows the paper closely (it also handles fixed nodes a bit more efficiently, but that is another conversation).
Using my implementation, I get the following layout: