Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic summary stat plotting #26

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

camillescott
Copy link
Contributor

@camillescott camillescott commented Apr 3, 2018

WIP: Addresses #7.

Currently supports weighted and unweighted adjacency matrices. For now, uses a seaborn heatmap. I've added a decorator for an optional dependency, and the function imports seaborn and pandas at function scope; this way, they don't have to have them installed, and will get a warning and return value of None if they call the function without them. Looking for thoughts on whether this is an antipattern, and general thoughts on the default style I've chosen.

Some example plots:

Regular adjacency matrix:
Imgur

Weighted:
weighted

@camillescott camillescott added enhancement New feature or request question Further information is requested labels Apr 3, 2018
@dschult
Copy link
Member

dschult commented Apr 3, 2018

The figures look good. It seems like plt.matshow
could do much of this -- unless seaborn is using an algorithm to order the nodes.
Does anyone know matshow well enough to say whether it could do more than the seaborn heatmap for showing matrices?

@camillescott
Copy link
Contributor Author

camillescott commented Apr 3, 2018 via email

@dschult
Copy link
Member

dschult commented Apr 3, 2018

It look like they also cluster the nodes using scipy clustering algorithms, so its more than just showing the matrix. Very cool... :)

@jarrodmillman
Copy link
Member

It might make sense to use a diverging colormap for the weighted adjacency matrix. For example, this is what I used for one of my slides in the intro talk:

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

@NelleV
Copy link
Contributor

NelleV commented Apr 3, 2018

I'm concerned about adding seaborn as a dependency. If seaborn already supports this well, then why not just let users depend on it on their own. Adjancency matrices are not hard to plot.

@jarrodmillman
Copy link
Member

@NelleV They aren't hard to plot. However, it was the plot that I spent the most time on (granted I didn't have many plots) and I wasn't very happy with the final result. We should at least give them a couple of examples (in the gallery) even if we don't provide a helper function. For example, one with the names on the sides (like you provide me for my talk) would be nice to include in the gallery. It would have saved me a significant amount of time, while preparing my talk.

Is there any particular difficulty installing seaborn? For example, does it need a compiler or special libraries? (Like you, I use Linux so I am not always aware of installation difficulties.) Or is your concern more a general reluctance to having too many dependencies? What do you think about adding extra dependencies to some of the examples? Another option would be to make the seaborn dependency optional (e.g., matplotlib is optional for networkx).

Thoughts?

@jarrodmillman
Copy link
Member

Alternatively, we might also add any desired functionality to matplotlib. I ended up using seaborn because I couldn't easily figure out how to get the plots I wanted using matplotlib.

Thoughts?

@camillescott
Copy link
Contributor Author

My reasons here were:

  1. Adjacency matrices aren't that hard to plot, but for that matter neither are heatmaps; and yet, people still prefer seaborn. If we can take a very commonly used plot and bring it from 20ish lines of boilerplate down to 1, I think that's a win.
  2. Seaborn is easy to install and well maintained (it has no weird dependencies or build reqs), and a lot of people using Python for sciencey things already use it.
  3. There's no sense reinventing the wheel if there's already a good implementation that meets our needs.

One argument I could see against (3) is if we want to more easily support linking our interactivity support in with the adjacency matrix.

@NelleV
Copy link
Contributor

NelleV commented Apr 3, 2018

I just don't see the point in adding a functionality that both matplotlib and seaborn has, and this seems sufficiently different from what we already support to at least wait before including it in the package. I think we should first focus on plotting graphs well before adding other types of plots, and that includes adjacency plots.

I'm also in general concerned about adding dependencies, in particular one such as seaborn, which relies on very few contributors. The cost in terms of maintenance is always higher than what one would expect and seaborn is not a standard dependency.

Adding seaborn as a dependency in the documentation seems more reasonable to me, if it is only a question of showing to people how to do this using the package (although it might be even better to add this example to the seaborn gallery).

@jarrodmillman
Copy link
Member

I would prefer adding the example to grave. When I started trying to figure out how to plot this, I first looked to see if it was supported by networkx drawing, then I looked at matplotlib, and then finally (after over a day of trying to get a plot I liked) I looked at the seaborn documentation.

Including something in the library or in the examples would have saved me an hour or two. To produce the adjacency matrix for my talk involved Stefan writing a prototype using matplotlib, you improving it, and then me completely redoing everything in seaborn. And at the end of that process, I had plots I like less than the one in this PR.

Given that it took as much time as it took Stefan, you, and me to produce something less desirable than this, I suspect either we aren't very familiar with matplotlib and seaborn as others or it is not clear how one should plot nice adjacency matrices for graphs.

I don't do a lot of graph plots, but the first thing I normally look at is the adjacency matrix. It seems like a useful plot to make easy for people to figure out how to use.

I don't share your concern about adding seaborn as a dependency. But if it is a widely shared concern, I am happy to defer to others.

@camillescott
Copy link
Contributor Author

I think Jarrod has provided a nice example for my reasoning here. Also note that this implementation doesn't require seaborn to be added to grave's requirements; it fails gracefully and warns if it isn't found -- though I still am not sure if this is an antipattern.

@NelleV
Copy link
Contributor

NelleV commented Apr 3, 2018

My 2 cent is that this is doing something orthogonal to the current scope of the library, and that there are plenty of features we should first be adding to the plot_network function before tackling this. It's also going to break very easily very fast (considering Matplotlib's internal), and is really hard to test. It relies on a library which is currently barely maintained, and adds not one, but two hidden dependencies (pandas is not currently a dependency of grave).

I still don't think this is a feature nice enough to justify the cost of maintenance this will put on us.

If we are to add this, I think we should not have hidden dependencies in our code. Pandas and seaborn should then be explicit dependencies.

@camillescott
Copy link
Contributor Author

As a compromise, let's just say for now that we'll leave this open for a month or so and see where things are; if there's been sufficient movement in our core functionality and there's still support for summary stat plotting, we'll discuss again.

@tacaswell
Copy link
Collaborator

I had this open to post something just now as well, I am 👍 on @camillescott 's suggestion.

This sort of thing is in the list of things to support, but I agree with @NelleV 's concerns about dependencies (and hidden dependencies).

When this gets re-addressed we should loop @ericmjl into the discussion.

@jarrodmillman
Copy link
Member

We should also discuss whether we will allow optional dependencies (I am not sure what hidden dependencies are, but they sound bad). I don't have any problems with optional dependencies. For example, in NetworkX by default pip will install just the core dependencies. In order to install the optional dependencies as well requires:

$ pip install networkx[all]

The bigger question is what the scope of Grave is and who it is for. For me, the most interesting graph plots are the ones we are talking about whether to support (e.g., adjacency matrices and other summary statistics plots). Without them, Grave will be of limited interest to me. Perhaps some of this functionality could be added to Matplotlib, if it is out of scope for Grave.

The other thing that Grave could do (which would be useful for me) is to allow us to remove the plotting code from NetworkX. I am also interested in whether that is in scope for Grave.

Base automatically changed from master to main March 2, 2021 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Development

Successfully merging this pull request may close these issues.

5 participants