Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draw MultiDiGraph edges and labels qa7008 #7010

Merged
merged 32 commits into from Feb 6, 2024
Merged

Draw MultiDiGraph edges and labels qa7008 #7010

merged 32 commits into from Feb 6, 2024

Conversation

dgrigonis
Copy link
Contributor

@dgrigonis dgrigonis commented Oct 11, 2023

Following on my Q&A

All tests running. Fairly minimal changes. See discussion for my concerns regarding rad argument sourcing.

Everything is working as expected, except self-loop labels are not positioned properly.

Minimal Example:

import networkx as nx
import matplotlib.pyplot as plt
from netgraph import MultiGraph

G = nx.MultiDiGraph()
for i, (u, v) in enumerate([('a', 'b'), ('a', 'b'), ('a', 'b'), ('b', 'a'),
                            ('b', 'c'), ('b', 'c'), ('c', 'a'), ('c', 'c')]):
    G.add_edge(u, v, weight=round(i / 3, 2))


pos = nx.spring_layout(G, seed=7)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_labels(G, pos, font_size=20)
nx.draw_networkx_edges(G, pos, connectionstyle=[
    'arc3,rad=0.1', 'arc3,rad=0.2', 'arc3,rad=0.3'
])

nx.draw_networkx_edge_labels(G, pos, rad=[0.1, 0.2, 0.3])
plt.show()

Negative radians also work, and makes a nicer graph in this case.

rad = [-0.1, -0.25, -0.35]
nx.draw_networkx_edges(G, pos, connectionstyle=[f'arc3,rad={r}' for r in rad])
nx.draw_networkx_edge_labels(G, pos, rad=rad)

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 12, 2023

Implemented correct label drawing for self-loops.

The above example looks like:

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 13, 2023

Some notes for future work:

FancyArrowPatch.get_path().vertices returns CODES in value space from which positioning of the label can be inferred by finding central (or percentile) point of bezier curve.

Arc3 & Angle3 are simple 3-point curves, while Arc and Angle (more general classes) would require some conditions, but also possible.

It is very difficult to keep edge drawing and label drawing in sync due to the fact that there are parameters that are calculated from input sequence of edges, which means that edge input sets of label and edge drawing must be the same, while in practice, one might not want to draw labels for all of the edges (and possibly vice versa).

So to achieve complete sync:

  1. draw_networkx_edges should remove dependency of such parameters (such as h, which represents height of the graph and is calculated from edges provided to the function. I would argue that this it is currently not robust anyway, as it doesn't take into account nodes, that were not provided to the function)
  2. Factor out matpotlib patch construction for curved arcs and self loops
  3. Reconstruct the same patches in draw_networkx_edge_labels using the same factored out methods
  • draw_networkx_edge_labels then should accept argument connectionstyle instead of rad (same as draw_networkx_edges). Otherwise, draw_networkx_edges can accept separate rad, which is in line with one of the ways to initialise: ConnectionStyle("Arc3", rad=0.2)
  1. Infer label positions from FancyArrowPatch.get_path().vertices. Also, Draw Edge Labels in Multi Graph with curved edges #3813
    has made a nice new class for Curved text, which can be used for this.

However, currently all is working correctly for straight lines, Arc3 objects of curved lines and self-loops, which I think is sufficient for most of the cases.

@dgrigonis
Copy link
Contributor Author

I am happy with the result. Simple and does what it needs to.

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 22, 2023

import networkx as nx
import matplotlib.pyplot as plt

G = nx.MultiDiGraph()
for i, (u, v) in enumerate([('a', 'b'), ('a', 'b'), ('a', 'b'), ('b', 'a'),
                            ('b', 'c'), ('b', 'c'), ('c', 'a'), ('c', 'c')]):
    G.add_edge(u, v, weight=round(i / 3, 2))


pos = nx.spring_layout(G, seed=7)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_labels(G, pos, font_size=20)
connectionstyle=['arc3,rad=-0.1', 'arc3,rad=-0.2', 'arc3,rad=-0.3']
nx.draw_networkx_edges(G, pos, connectionstyle=connectionstyle)
kwds = dict(label_pos=0.3, bbox=dict(alpha=0))
nx.draw_networkx_edge_labels(G, pos, connectionstyle=connectionstyle, **kwds)
plt.show()

All done from previous comment.
Closes #3813

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 22, 2023

To sum up the latest work.

The main goal was to factor out FancyArrowPatch, because #3813 has developed a CurvedArrowText class which takes it as an argument. So both functions (draw_edges and draw_edge_labels) now use the same FancyArrowFactory class, which is outside function body.

Currently, in draw_edge_labels straight line labels are drawn using the same method as curved ones, while draw_edges function uses LineCollection for undirected graphs or if explicitly asked. I have decided to not separate it in draw_edge_labels function as it makes code simpler and there is minimal decrease in performance. Edge drawing is much more expensive than labels.

To give some perspective. Run times:

0.069 - Fancy Arrows (with labels)
0.059 - Fancy Arrows (without labels)
0.049 - Line Collection (with labels)
0.041 - Line Collection (without labels)

So having only one exception for self-loops makes things simpler.

Self loops get the data point from constructed FancyArrowPatch as well, however, it only gets necessary data point from ConnectionStyle's Path object and then goes on using simple text without angles.

Although things are not perfect and there is scope for making things better, functions are now better structured and any further work should be much easier.

Possible discrepancies can still arise due to one of the following:

  1. If min_source_margin and min_target_margin are set to different value than 0 in draw_edges, as these are not present in label drawing function. I did not include these to not overcrowd draw_edge_labels with new parameters.
  2. If the input edges differ of 2 functions. Reason being that self-loop height is calculated from distance between the lowest and the highest input node, so if inputs differ, this value could be different too.
  3. If more exotic ConnectionStyle functions are used for drawing edges, such as Arc or Angle. These have more than 1 bezier curve in their Path, which is not currently supported by CurvedArrowText in label drawing function. However, given structure is in place, incorporating these shouldn't be very difficult if there is a need for it.

But these are unlikely scenarios.

Last chart to show how things look:

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very nice!
I see you have changed one test now that multiedge labels are possible.
Can we add tests for the new features too? We don't have a good way to see if they look nice, but I guess I'm looking for some cases that exercise the code and make sure it doesn't blow up. A "smoke test" for curved text maybe? The examples you put into the PR comments would probably be a good start.

The imports inside __init__ methods are probably better to move into the class definitions (before def __init__) rather than inside a method. self.mpl will still work that way. I think that still protects them from the case when someone doesn't have numpy or scipy installed also.

From your comment in the #7008 discussion

One question that I am sure I can not make myself is about rad input. It is either:
a) Strictly Number | list | tuple, where i'th item is rad for i`th edge_id
b) Input is a scalar, which is accumulated as many times as needed

For edge lines, it is a bit trickier as it is part of connectionstyle argument. Here it is either:
a) connectionstyle is strictly str | list | tuple, where i'th item is applied to i'th edge_id
b) extra rad argument is introduced and it is appended to connection style. Here the above question regarding numeric input remains as in labels.

I think a) and a) for both is safer.

I think choices a) and a) for rad and connectionstyle are good.

Sometimes it is hard for me to see "what is missing" in a PR like this. Was any functionality removed? Thanks for the detailed comments... might have made the response time longer than you wanted, and I can't promise I've internalized everything. But the labels and multiedges look quite good! And I can at least read the code. :}

We should also think about examples for the doc_strings, and/or maybe an example in the examples gallery (in the repo's examples/ directory). You've created a bunch of content in your posts that would be very good to get into the docs somehow.

networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/tests/test_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Show resolved Hide resolved
Co-authored-by: Dan Schult <dschult@colgate.edu>
@dschult
Copy link
Member

dschult commented Oct 22, 2023

I guess I should also ask you to merge with the main branch so you can get the circle-ci tests to pass and we can see what the documentation looks like in the "artifact" test (by clicking on the details for that task).
Thanks!

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 22, 2023

You mean merge to upstream main? Do I have permissions to do it? Do I need to do commit squash beforehand? Or merge main into my branch?

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 22, 2023

If I move imports into class body, they do not protect someone who doesn't have numpy installed. This is probably the first time I have done this, but I couldn't think of a better way.

Also, rad inputs are in sync now, they are sourced as part of connectionstyle argument in both functions.

@dschult
Copy link
Member

dschult commented Oct 22, 2023

Yes -- you are right that moving imports into the class body don't work. Sorry-- I meant to take that out of my post.

And by "merge main", I meant to merge it into this branch:

git ch <this-branch>
git merge main

I think it should just "fast-forward" the merge (leaving your commits the same but basically rebasing them onto the latest version of main) which will capture the new settings that allow the docs to build.

@dgrigonis
Copy link
Contributor Author

I am not sure about the docs, I already had it in sync before and they were failing. Some help here would be appreciated.

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Oct 22, 2023

I have reverted:

if len(connectionstyle) > 1:
    ...

Even if it is 1, this check shows if maximum number of edges per pair is greater than length of this input. See tests, where it is expected to fail on length == 1.

Also, wrote some tests, tested the key argument here - again - connectionstyle and it's valid inputs for 3 graphs: DiGraph, MultiGraph and MultiDiGraph. And tested for correct return values, including check for CurvedArrowText.__name__. Couldn't do a proper isinstance as it is inside function's body.

I don't think there is more to test here apart from visual experimentation/validation.

@rossbar rossbar added the Visualization Related to graph visualization or layout label Nov 8, 2023
@dschult
Copy link
Member

dschult commented Dec 5, 2023

Let's see if we can get this reviewed over the next week or two. :}

@dgrigonis
Copy link
Contributor Author

bump

@dschult
Copy link
Member

dschult commented Feb 1, 2024

We have talked about making an effort to get this merged in the last two community meetings. No promises of course, but we do really want this in -- we just haven't made the time to work through it yet. My laptop wrist-rest now has a sticky note with "7010" written on it -- so that ought to help right? :}

@dgrigonis
Copy link
Contributor Author

Thanks for the update. Good to know it isn't completely forgotten. :)

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how the multiedges look with this PR. :) As always, the pictures start to look like hairballs with more than N edges (N~50 or so I guess). But they still look better with this system. This is a great addition to NetworkX and I really appreciate this work.

The only blocker to merging that I have found so far involves figuring out the connectionstyle for each multiedges in ConnectionStyleFactory. This leads to the documentation failure in the CI tests I think/hope. There are two problems that I think could be avoided by deciding which connectionstyle to use in FancyArrowFactory and pass that into .curved() or self_loop() rather than passing in the edge key.

Issues:

  • The connectionstyles argument can be a string or a container of strings. But the code only works if it is a container of strings. If it is a string, it creates a bunch of one letter self.base_connection_styles. I think a fix would be a quick check isinstance(connectionstyles, str) and if so, put it into a list.
  • In general, edge_key can be any hashable -- not just an integer. The modulo arithmetic edge_key % self.n at line 494 (used to pick out the connectionstyle) doesn't work well when edge_key is not a number. Specifically, when edge_key is a string, it leads to a TypeError for modulo.
  • I'm not sure the connectionstyle input is designed to be used modulo edge keys. With most MultiGraphs, the edge keys are integers e..g 0, 1, 2 and these values are reused (eg. not unique) throughout the graph. Each node-pair typically starts from 0 and increments. This means that when connectionstyle is a list, the styles do not correspond to the edges in edge_list. Rather they correspond to edge_keys in the multigraph -- and there wouldn't be a way to make the 2nd edge between (1,2) a different style from the second edge between (2,3).

I think the solution to this is probably to figure out the connectionstyle in FancyArrowFactory before calling curved() or self_loop() and doing so based on the order of edges in edgelist and connectionstyle. Maybe dict(zip(edgelist, connectionstyle)) but I haven't thought about that much.

I also have some silly questions/comments (mostly questions) about how to work with multiedge labels.

  • When drawing the labels for multiedges I need to provide the same connectionstyle that I used for drawing the edges. Otherwise the labels get positioned as if the edges are straight. Am I understanding this correctly? It took me a bit to figure out that connectionstyle was needed. :}
  • It looks like label_pos must be a scalar and not a list of values. Is this true? It's fine to have it that way. I am just asking if I am missing anything.

Thanks again!

@dgrigonis
Copy link
Contributor Author

  1. Hm, I guess this is only true for FancyArrowFactory connectionstyle argument as this sort of check is done in both main functions, so high level code should be working correctly. Or am I missing something???? Either way, I will add this check to the Factory too.
  2. Haven't thought about that. I think your suggestion will work here well. I am thinking of doing it in main function and pass edge_indices (as opposed to edge_keys) to Factory. I would rather leave Factory more generic class, which receives all arguments nicely prepared.
  3. That was just more info about what is wrong with 2. Completely agree, my edge_key approach just doesn't work here.

Last 2 bullets:

  1. Yes, exactly the point there. Otherwise there is no way of knowing where those arrows are. That was the whole point of factoring out FancyArrowFactory - so that the code can be reused in both functions so they can be synchronized more easily.
  2. Yes, label_pos is a scalar now. Could allow a list of length = len(edgelist). Do you think this could be useful? This one could be done very easily later if the need came up.

@dgrigonis
Copy link
Contributor Author

Fixed those 2. I think tests pass, just something/someone cancelled one of them.

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that CurvedArrowText or ConnectionStyleFactory might be useful outside their definition function/class? If so we should make them available by moving them outside. But I can see why they might not be. And of course we can always do that later if it is requested.

For imports of collections and itertools, I think we can leave out the abbreviations in this module. The two lines where they are used are short and can easily fit the whole name, presumably easing confusion. (For the Scientific-Python packages we have been working to standardize 2-or-3-letter abbreviations. But I don't know of an effort to do that for the builtin libraries. We have used itertools as it elsewhere in networkx. But I'd prefer since we only use them twice to leave them unabbreviated.)

There is a keyword arg for draw_networkx_edge_labels named rotate that is no longer used, should we make the default for labels_horizontal be not rotate instead of False when calling CurvedArrowText? Also, can you walk me through the logic of why rotation=0 for the self-loop case? It looks like we now avoid the whole trans_angle cruft from before, so that's nice. But I'd like to understand how that works.

BTW we can change the name of the public facing kwarg from rotate to something else if you like (with a deprecation cycle) but let's do that in another PR.

I think I'm done with requests/questions now. :}

Comment on lines 905 to 908
if G.is_multigraph():
edgelist = list(G.edges(data=False, keys=True))
else:
edgelist = list(G.edges(data=False))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, G.edges without the () is the same as what you prescribe here:

  • MultiGraph.edges yields 3-tuples (u, v, ekey)
  • Graph.edges yields 2-tuples (u, v)
    I'm not saying you have to change it, but those are for .edges. Calling the attribute as a method yields 2-tuples for both Graph and MultiGraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Useful to know, thank you. I will change it and add a comment regarding these workings.

@dgrigonis
Copy link
Contributor Author

dgrigonis commented Feb 5, 2024

Do you think that CurvedArrowText or ConnectionStyleFactory might be useful outside their definition function/class? If so we should make them available by moving them outside. But I can see why they might not be. And of course we can always do that later if it is requested.

Ok, so ConnectionStyleFactory has a low chance of being reused. CurvedArrowText is much more likely. However, I can not move this one to global namespace as it inherits from matplotlib class. It is not perfect (having local class definitions), but I think it's ok to leave it as it is, as even CurvedArrowText is not very likely to be re-used. For the future, possible way to factor out to global ns without introducing hard matplotlib dependency, would be to make it lazy class definition using similar approach to lazyasd.

For imports of collections and itertools, I think we can leave out the abbreviations in this module. The two lines where they are used are short and can easily fit the whole name, presumably easing confusion. (For the Scientific-Python packages we have been working to standardize 2-or-3-letter abbreviations. But I don't know of an effort to do that for the builtin libraries. We have used itertools as it elsewhere in networkx. But I'd prefer since we only use them twice to leave them unabbreviated.)

Ok, removing those.

I have standardised a large part of these for myself. I changed it to itl because it is very often used as a variable name for iterables/iterators. I standardised to mostly 3-or-4 letter abbreviations for standard library.

There is a keyword arg for draw_networkx_edge_labels named rotate that is no longer used, should we make the default for labels_horizontal be not rotate instead of False when calling CurvedArrowText? Also, can you walk me through the logic of why rotation=0 for the self-loop case? It looks like we now avoid the whole trans_angle cruft from before, so that's nice. But I'd like to understand how that works.

Good catch. This is how it should be.

Regarding self-loops. Previously, there was only 1 self-loop possible. Which was at the top of the node. So horizontal text was the solid choice that fit all cases. Now, as more of them are possible, it is a choice. And neither of options are ideal. But having 0-rotation is a slightly more reliable one IMO.

If rotation is non-0: Bottom label will be upside down & if text is long enough, side label text might cross with top and bottom labels of self-loops.

If rotation is 0: Top and bottom labels will have no issues. Side labels might overlap each other.

BTW we can change the name of the public facing kwarg from rotate to something else if you like (with a deprecation cycle) but let's do that in another PR.

I think rotate is quite appropriate here - I have no issues with it.

I think I'm done with requests/questions now. :}

Copy link
Member

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

Thanks!! There is a lot of potential in the drawing module. This is a big step in the right direction.

Copy link
Contributor

@rossbar rossbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spearheading the review @dschult - this has been on my list for a long time as well but I've been daunted by the magnitude of the change and the fact that the visualization is not particularly well-tested.

That said, this is clearly a huge improvement for multigraph visualization! I've just finished another read through and am very +1 on this feature. Rather than clutter this up with more comments, I vote to put it in and deal with any follow-up in separate PRs.

Thanks so much @dgrigonis for the work + iterations and apologies on my part for not getting to this sooner!

networkx/drawing/nx_pylab.py Outdated Show resolved Hide resolved
networkx/drawing/nx_pylab.py Show resolved Hide resolved
@rossbar rossbar merged commit 1fa2414 into networkx:main Feb 6, 2024
39 checks passed
@jarrodmillman jarrodmillman added this to the 3.3 milestone Feb 6, 2024
@dgrigonis
Copy link
Contributor Author

Great news, thank you for support on this - it was a pleasant experience. :)

@dgrigonis dgrigonis deleted the draw_mdg_edges_and_labels_qa7008 branch February 6, 2024 19:02
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
Adds drawing of multiedges and edge labels for multigraphs.
Multiedge styles are specified using connectionstyle and fully
support a subset of simple matplotlib curve-patch styles (i.e.
3-point arc).

Includes significant reorganization of FancyArrowPatch generation.

Co-authored-by: Dan Schult <dschult@colgate.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: Enhancements Visualization Related to graph visualization or layout
Development

Successfully merging this pull request may close these issues.

None yet

5 participants