Layouts meta-issue #736

bdpedigo · 2021-03-19T18:52:45Z

This issue is to organize some thoughts on changes I'd like to see for our layout code. Happy to have feedback.

Is your feature request related to a problem? Please describe.

Layouts should run on directed networks (see [BUG] Layouts fail for directed graphs #655). If the user wants to color by automatically detected community, then I'd suggest that we just symmetrize the graph internally (only for the community detection step) and proceed. I don't think this is overstepping our bounds as to what the user wanted, personally, because this is typically the notion of community for directed graphs.
Automatic layouts should also be possible using ASE/LSE. I have had a lot of success generating layouts that I like using ASE as opposed to n2v so I think "algorithm" could just be a parameter.
Allow passing dictionaries of arguments to stage 1 (n2v/ASE/LSE) and stage 2 (umap/tsne) (see Add more configuration options to automatic layouts #612)
Should have an interface to the layouts code that feels like what we have in .plot and seaborn. I think we could refactor the code such that we can still have all-in-one functions that eat graphs and spit out plots like layouts.save_graph, but there's no reason we can't also let people interact with some of these functions in a way that's more like matplotlib/seaborn.
NoOverlap should be an option the user can turn on or off.
- Addressed in Optional layout steps #758
Layouts should run on adjacency matrices also

Provide references (if applicable)

I have some code here which is closer to the flavor of what I'd want to see for graph plotting code that works with seaborn/matplotlib. Just a reference, not necessarily how we should do anything here. One thing I never figured out for that code which would be nice is automatic scaling of edge/node widths/sizes etc. based on the size of the graph, so that an initial shot at an embedding at least looks reasonable.

The text was updated successfully, but these errors were encountered:

dfrancisco1998 · 2021-03-19T18:56:51Z

sorry. Commented on wrong one... but also interested in this one

daxpryce · 2021-03-19T18:58:22Z

https://media.giphy.com/media/Fwn1auHR2LUGI/giphy.gif

daxpryce · 2021-03-22T18:32:17Z

@bryantower what do you think about adding ASE/LSE instead of n2v?

@bdpedigo I think we should clarify what an "automatic layout" should have at the conceptual level. Automatic layout, to me, should be something that just takes a graph and spits out a reasonable and useful expression of it in 2d space with some colors that mean something, sizes that mean something, relative locations that mean something, and ... I think that's it. Having used our layouts without no overlaps turned on, it was hard to get anything useful out of it? Maybe there's a use that I'm not anticipating, though (e.g. maybe you want to handle overlaps by having a low alpha level so that super hot areas stack and look more vibrantly colored?)

It's worth a conversation! I don't think I'm the best person to discuss all the ramifications of having a graph with tons of overlaps and anything principled. Maybe @natoverse or our network visuals team would have some ideas on what it means to specifically want to opt into having overlaps?

bryantower · 2021-03-22T18:59:14Z

I like the idea of the embedding type being specified with a parameter. I am sure we could come up with a reasonable way to encode the parameters required for each of the embedding types.
I also think that having no-overlap as an option is reasonable.
I think that having reasonable defaults for all of this stuff so you can just get a decent layout is important.

bdpedigo · 2021-03-22T19:02:48Z

Definitely down to discuss further, especially because @dfrancisco1998 is interested in working on some of this in the near futurre.

@bdpedigo I think we should clarify what an "automatic layout" should have at the conceptual level.

Yes, good point. I think in the above I mostly just meant "layout" so I will clarify my language above. For my use cases, I usually have node or edge metadata that I also want to understand, as well as a bunch of other considerations. So I think what I am more or less talking about is having many of the constituent parts for "automatic layout" actually exposed as functions so that someone who wants to do a deep dive/customize can play with them, but then of course we'll still have what you call "automatic layout" which is a 1-shot, reasonable default pipeline.

Having used our layouts without no overlaps turned on, it was hard to get anything useful out of it?

Are you saying you're surprised I want the no overlaps = off option? Reasons for this are 1. removing overlaps is often much of the time it takes to do the whole pipeline in my experience with no-overlap stuff, 2. UMAP tends to produce layouts that don't overlap a crazy amount in my experience, and 3. (this is somewhat of a minor anticipated use case) if I see something weird in an embedding/layout I usually try turning off things, especially steps with randomness, and this is one of the things. To be clear, I think for a final product removing overlaps will often be best, but I don't think that's a reason to always force it to be on cause of the reasons above.

Also I'd just add that I've been able to get layouts I'm fairly happy with using ASE/UMAP and without no-overlap (tho I think the no-overlap code would improve this picture, that code isn't actually exposed in graspologic right now, so I haven't done it). see example below (please ignore the squigly lines on the border for the purposes of graph layout conversation).

daxpryce · 2021-03-22T19:09:44Z

That's interesting that umap without overlaps doesn't have a lot of overlaps for you - in our experience it's quite the opposite! But that is good to know - if it isn't rife with overlaps for you, then it would absolutely make sense to disable it.

daxpryce · 2021-03-23T17:23:50Z

so I think this is something we def. could/should do - do you want to take a stab at doing it and add @bryantower and me to the PR specifically? it'll probably get done faster that way

edit: mostly because I don't know what sorts of sensible defaults ase/lse should have to get a pretty good layout right out of the box

bryantower · 2021-03-23T17:49:51Z

One option is to not provide defaults for ASE / LSE. The user would have to provide the proper configs for those embedding algorithms.

daxpryce · 2021-03-23T17:54:57Z

Maybe a better route overall would be to offer a semi-automatic method; you bring the embedding, after that we'll take over and do the umap/tsne/nooverlap (optional)/coloring/sizing/rendering?

daxpryce · 2021-03-23T17:55:51Z

At the least this would be something we could just extract out real fast and then you can bring your own embeddings that you probably have or will be generating anyway later in whatever workflow you have. Then we can take the next step and rope ASE/LSE into the automatic portion?

dfrancisco1998 · 2021-04-06T18:11:41Z

@dwaynepryce I am currently trying to make a layouts tutorial (before I start addressing some of the meta-issues just to gain familiarity with the layouts ode base). Do you have a dataset in mind that would be cool for this?

daxpryce · 2021-04-06T18:14:14Z

uhm... maybe in the shortest term, run with https://raw.githubusercontent.com/microsoft/graspologic/dev/tests/test_data/large-graph.csv ? We took some statistical characteristics of an actual graph based on actual data to use an SBM to fabricate a synthetic graph. I'm not actually sure how it'll look, but let's start with that and see if it looks reasonable or not?

dfrancisco1998 · 2021-04-06T18:33:20Z

@dwaynepryce I assume its node_name1, node_name2, weight?

daxpryce · 2021-04-06T18:34:02Z

Yep!

Edit: More specifically:
source,target,weight\n

dfrancisco1998 · 2021-04-08T14:44:23Z

So I am running TSNE on the file you sent to me like this
from graspologic.layouts import layout_tsne

tupl = layout_tsne(g, perplexity = 3, n_iter = 250, random_seed = 23)
But TSNE is taking a long time to complete. Are there any parameters i could change to fix this?

Did not take longer than 7 min but any way to optimize runtime would be appreciated!

daxpryce · 2021-04-08T18:05:56Z

@bryantower is on spring break and I'm about to check out for the week too, but a quick check of the docs says that you may want a larger perplexity for bigger graphs (though I'm not sure that particular sbm really qualifies)

That said, you should expect this to take a fair amount of time - especially with the no overlap portion on!

bdpedigo · 2021-04-08T18:50:56Z

reducing n_iter may also help, though hard to say if it will look nice.

How big is the graph (number of nodes and number of edges)?

dfrancisco1998 · 2021-04-08T21:19:18Z

There are 7183 nodes and 151981 edges. I think I need a min of 250 for n_iter . I will try to increase perplexity and see if that changes runtime. It really only takes about 7 min for either UMAP or TSNE, but it is still a bit.

dfrancisco1998 · 2021-04-14T17:30:42Z

@dwaynepryce I am trying to use git to upload the layouts tutorial and the file is too large. Do you happen to have a smaller graph csv?

bdpedigo · 2021-04-14T17:34:41Z

@dfrancisco1998 are you mainly just working on #613 ? if so, can we move this conversation there instead (since it's the specific thing you're working on for the time being)?

bdpedigo · 2021-05-06T18:39:52Z

just posting a link to some recent layouts I did using UMAP on ASE/LSE on a brain connectome with ~25,000 nodes, I thought the ASE/LSE stuff looked good! Have not yet compared to n2v, but I thought it is worth looking at

https://docs.neurodata.io/notebooks/pedigo/graspologic/connectome/2021/05/06/hemibrain-layout.html

diane-lee-01 · 2021-06-02T14:08:15Z

interested in this!

bdpedigo · 2021-06-02T14:30:07Z

Hi @dlee0156 - here's a high level overview of what we are doing for graph layouts conceptually.

Input

network (could be networkx object, could be a sparse or dense adjacency)
(optional) metadata. This could be labels for nodes, labels for edges, etc.

High-level algorithm

Do some kind of network embedding (typically Adjacency/laplacian spectral embedding or Node2vec)
Run UMAP/TSNE on the network embedding. This yields a 2-dimensional representation for each node
(Optional) Run no-overlap
Plot the nodes:
- We often size each node by its degree
- We may color the nodes either by some sort of node metadata or we might run Leiden to predict communities and color by that
Plot the edges:
- We often color each node by the incident source node color, but could also color by something else
- Sometimes if there are far to many edges, I will subsample them for plotting
- Usually have to play with line thickness/alpha values to make it look nice
(Experimental) Annotations
- Sometimes I put labels on top of known or predicted groups of nodes
- Might want to highlight particular nodes or edges

bdpedigo · 2021-06-02T14:40:45Z

@dlee0156 this is my janky prototype code for layouts:
https://github.com/bdpedigo/giskard/blob/e88dbdb67c1c373f946f7295a598abaf375bd579/giskard/plot/graph_layout.py#L25

and this is the code in graspologic currently: https://github.com/microsoft/graspologic/blob/2125f27bc3f2739f4f2c784d5b700417df63c5d7/graspologic/layouts/auto.py#L112

bdpedigo · 2021-06-07T16:40:17Z

@dwaynepryce @bryantower how do we feel about making a graph undirected (by averaging i -> j and j -> i edge weights) JUST for the purposes of running Leiden for coloring nodes in the layout process? We could also throw a warning if we have to do this? Leiden is one of the rare cases where I think it is actually a sensible behavior to symmetrize the network prior to running and this is always what I want when I hand it a directed graph.

daxpryce · 2021-06-07T16:42:32Z

Is there any benefit to averaging instead of just aggregating? I think we do the latter now, so I'm curious what averaging gets us instead

bdpedigo · 2021-06-07T16:47:18Z

I see we talked about this #655

my aggregate you mean sum?

bdpedigo · 2021-06-07T16:49:18Z

since sum is literally avg * 2 I don't expect there to be any difference in modularity, leiden, etc. We could do either, I just said average because that's what we do in symmetrize currently but sum could also be an option

daxpryce · 2021-06-07T17:39:55Z

my aggregate you mean sum?

Yep!

I have no problem averaging, summing is simpler when it comes to multigraphs but I don't even think that applies here (off the top of my head, I think we balk at a multigraph in general in what we currently have)

bdpedigo · 2021-06-07T17:45:55Z

Do you think we need to warn the user that this is happening or just do it automatically and explain in the docs?

bdpedigo · 2021-06-07T17:46:22Z

@dlee0156 I think this confirms what we were thinking/talking about?

daxpryce · 2021-06-07T17:47:50Z

Do you think we need to warn the user that this is happening or just do it automatically and explain in the docs?

We probably should warn. I don't like automatically inferring/doing things without warning someone - just in case our "we think this is probably safe" ends up being not safe, but the user won't ever know we're trying to be too clever for our own good

bdpedigo added enhancement New feature or request ndd tasks that may be appropriate for Neuro Data Design course discussion graspologic team discussion labels Mar 19, 2021

bdpedigo assigned dfrancisco1998 Mar 19, 2021

bdpedigo unassigned dfrancisco1998 Jun 2, 2021

bdpedigo assigned diane-lee-01 Jun 2, 2021

diane-lee-01 mentioned this issue Jun 30, 2021

Making layouts/auto.py work for directed graphs #807

Merged

4 tasks

bdpedigo added this to New feature in bdpedigo Jan 28, 2022

bdpedigo unassigned diane-lee-01 Jan 28, 2022

bdpedigo closed this as completed May 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Layouts meta-issue #736

Layouts meta-issue #736

bdpedigo commented Mar 19, 2021 •

edited

dfrancisco1998 commented Mar 19, 2021

daxpryce commented Mar 19, 2021

daxpryce commented Mar 22, 2021

bryantower commented Mar 22, 2021

bdpedigo commented Mar 22, 2021

daxpryce commented Mar 22, 2021

daxpryce commented Mar 23, 2021 •

edited

bryantower commented Mar 23, 2021

daxpryce commented Mar 23, 2021

daxpryce commented Mar 23, 2021

dfrancisco1998 commented Apr 6, 2021

daxpryce commented Apr 6, 2021

dfrancisco1998 commented Apr 6, 2021

daxpryce commented Apr 6, 2021 •

edited

dfrancisco1998 commented Apr 8, 2021 •

edited

daxpryce commented Apr 8, 2021

bdpedigo commented Apr 8, 2021

dfrancisco1998 commented Apr 8, 2021

dfrancisco1998 commented Apr 14, 2021

bdpedigo commented Apr 14, 2021

bdpedigo commented May 6, 2021

diane-lee-01 commented Jun 2, 2021

bdpedigo commented Jun 2, 2021

bdpedigo commented Jun 2, 2021 •

edited

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

Layouts meta-issue #736

Layouts meta-issue #736

Comments

bdpedigo commented Mar 19, 2021 • edited

Is your feature request related to a problem? Please describe.

Provide references (if applicable)

dfrancisco1998 commented Mar 19, 2021

daxpryce commented Mar 19, 2021

daxpryce commented Mar 22, 2021

bryantower commented Mar 22, 2021

bdpedigo commented Mar 22, 2021

daxpryce commented Mar 22, 2021

daxpryce commented Mar 23, 2021 • edited

bryantower commented Mar 23, 2021

daxpryce commented Mar 23, 2021

daxpryce commented Mar 23, 2021

dfrancisco1998 commented Apr 6, 2021

daxpryce commented Apr 6, 2021

dfrancisco1998 commented Apr 6, 2021

daxpryce commented Apr 6, 2021 • edited

dfrancisco1998 commented Apr 8, 2021 • edited

daxpryce commented Apr 8, 2021

bdpedigo commented Apr 8, 2021

dfrancisco1998 commented Apr 8, 2021

dfrancisco1998 commented Apr 14, 2021

bdpedigo commented Apr 14, 2021

bdpedigo commented May 6, 2021

diane-lee-01 commented Jun 2, 2021

bdpedigo commented Jun 2, 2021

bdpedigo commented Jun 2, 2021 • edited

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

bdpedigo commented Jun 7, 2021

daxpryce commented Jun 7, 2021

bdpedigo commented Mar 19, 2021 •

edited

daxpryce commented Mar 23, 2021 •

edited

daxpryce commented Apr 6, 2021 •

edited

dfrancisco1998 commented Apr 8, 2021 •

edited

bdpedigo commented Jun 2, 2021 •

edited