Improve simplification of networks #120

clhunsen · 2018-05-02T12:07:33Z

Description

Currently, the code in the function simplify.network() is quite complex: To preserve an edge for each existing relation, we need to obtain the complete edge list, split it, construct networks from the separate different edge lists again, simplify these, and merge the separate networks in the end (by obtaining their edge lists...).

We need to find a way to apply the EDGE.ATTR.HANDLING on the raw edge lists (i.e., data.frames). Maybe, using the existing list when constructing an appropriate sqldfstatement is a worth a try. Also, trying to use "unique" as a value in the EDGE.ATTR.HANDLING list may solve the issue right away. 😉

https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L1192-L1233

https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L44-L63

Versions

This affects the upcoming version v3.2 as it contains the code of PR #115.

The text was updated successfully, but these errors were encountered:

maxloeffler · 2024-06-26T09:54:35Z

I've come to look at this issue. After inspection, I find that the described issue is indeed present, i.e., we do indeed need to perform the described steps to simplify a network properly. Im unsure what the performance impact of this convolution is though.

Towards the proposed solutions: I don't think we can use "unique" in the EDGE.ATTR.HANDLING, for the reason that I am just unsure what the expected outcome would be. Same goes for the sqldf suggestion. The underlying issue is that the igraph::simplify function (that we use) has no notion of grouping over certain attributes. Therefore, we have to pre-group ourselves. In what way we perform pre-grouping can only have minor impacts on performance and readability I suppose.

Possible solutions I see are: Reimplementing the igraph::simplify function with a notion of grouping. This approach may not be a fan favorite 😂. Then, we could also go to the drawing board and figure out how to include grouping into custom lambda functions that we then set as values into the EDGE.ATTR.HANDLING. This may work, but I have not yet thought about the specifics.

maxloeffler · 2024-07-01T19:03:57Z

Preamble: In our last meeting, we discussed, that any form of performing simplification on the edge list directly involves us reimplementing igraphs simplification logic, which as enticing it may sound for the performing of our code, is definitely not something we want to do. Nevertheless, we do want to reduce the amount of times, where we have to obtain, edit, set, obtain, edit, set ... the edge lists of the individual networks. A good step into that direction is to replace the call to merge.networks with something implemented by the igraph people, in the hope that they may implemented it in a more performant way.

So I have been searching the docs for a function to merge networks natively. igraph::union does not do what we want, as it creates duplicate columns in the edges for every column that appears multiple times in the input networks, i.e., the unified network would have columns type_1, type_2, type_3 ... kind_1, kind_2, kind_3, etc. which would require us to once again iterate the edge list and merge together these artifacts. However, I found that the igraph::disjoint_union function does what we need. The function assumes that input nets are disjoint and creates an output network in which there are copies of every vertex of all input nets with a single nicely formated edgelist. This means in our case, that if we merge 4 networks and all have these vertices A, B, C, then the disjoint union will have vertices A, B, C, A, B, C, A, B, C, A, B, C but the edge list will appear "correctly". We can then contract all vertices that are effectively equivalent through igraph::contract and then achieve the same output as with merge.networks (all tests ☑️).
I am not here to judge for the efficiency of this solution. Under the assumption, that the igraph people did a nice job in the implementation of disjoint_union and contract, then we may achieve a significant benefit, otherwise, Im not sure. Also I don't have the big testing data on me to evaluate performance. Nevertheless, this solution does reduces code duplication and would make merge.networks and merge.network.data mostly obsolete.

My code (util-network.R line 1720ff):

## merge the simplified networks
network = igraph::disjoint_union(networks)
vertex.count = igraph::vcount(networks[[1]])
network = igraph::contract(network,
                           rep(1:vertex.count, length(networks)),
                           vertex.attr.comb = "first")

bockthom · 2024-08-01T18:16:31Z

My code (util-network.R line 1720ff):

## merge the simplified networks
network = igraph::disjoint_union(networks)
vertex.count = igraph::vcount(networks[[1]])
network = igraph::contract(network,
                           rep(1:vertex.count, length(networks)),
                           vertex.attr.comb = "first")

I have tested the code for merging networks suggested by @MaLoefUDS on three different projects.
First of all: In all cases, the resulting networks of using the previous code and of using the suggested new code are equal. So, the suggested code seems to be a valid replacement of the merge.networks function.

Then, I checked the performance. I've chosen a rather naive approach having only 0 or 1 repetitions – consequently, my measurements are not really reliable. In particular, I did not measure how long the actual merging step takes, but I measured the time of executing the complete simplify.network function. Nevertheless, here are my results:

size of the resulting network	old impl	new suggestion	performance improvement
39275 edges	14.34056 s	12.48395 s	1.85661 s
133698 edges	32.26324 s	28.59812 s	3.66512 s
133698 edges	30.15977 s	28.11135 s	2.04842 s
312671 edges	74.86754 s	76.97657 s	-2.10903 s
312671 edges	76.93353 s	77.89671 s	-0.96318 s

So, the performance improvement seems to be marginal or even non-existent. For smaller projects, we have observed an improvement of around 2 seconds, but for the largest project in my test set, the performance has slightly worsened.

@MaLoefUDS @hechtlC: Do you think it is worth to repeat with more projects (even larger projects) and more repetitions, to find out whether there is a real performance improvement on average, or should we just stop the experiment here as the result of my naive approach are not really promising?

And another issue I've spotted: The new code produces the following warning:

Warning message:
In igraph::disjoint_union(networks) :
  Duplicate vertex names in disjoint union

As we actually want to reduce the number of warnings and not increase it, I dislike this warning...

So, in general, I am not sure if should continue here or just stop because there is not really an observable improvement... What are your thoughts?

maxloeffler · 2024-08-06T09:30:51Z

The preliminary differences show no clear pattern, so I guess we can say that igraph does not implement disjoint_union with their secret backend sauce and that our current approach is in general not noticeably worse. From my side, we can conclude here, but im open for objections.

bockthom · 2024-08-07T20:24:19Z

As we cannot see any promising performance improvement from our preliminary performance checks, there is agreement that we don't follow up on the idea of replacing the implementation of merge.networks.

As there were no other promising ideas on how to significantly improve the simplification of networks, I'll close this issue.
Nevertheless, many thanks go to @MaLoefUDS for thinking about this issue and for experimenting with possible ideas.

clhunsen added the enhancement label May 2, 2018

clhunsen added this to the Future milestone May 2, 2018

bockthom assigned maxloeffler Jun 19, 2024

bockthom closed this as not planned Won't fix, can't repro, duplicate, stale Aug 7, 2024

bockthom added the closed/wontfix label Aug 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve simplification of networks #120

Improve simplification of networks #120

clhunsen commented May 2, 2018 •

edited

Loading

maxloeffler commented Jun 26, 2024

maxloeffler commented Jul 1, 2024 •

edited

Loading

bockthom commented Aug 1, 2024

maxloeffler commented Aug 6, 2024

bockthom commented Aug 7, 2024

Improve simplification of networks #120

Improve simplification of networks #120

Comments

clhunsen commented May 2, 2018 • edited Loading

Description

Versions

maxloeffler commented Jun 26, 2024

maxloeffler commented Jul 1, 2024 • edited Loading

bockthom commented Aug 1, 2024

maxloeffler commented Aug 6, 2024

bockthom commented Aug 7, 2024

clhunsen commented May 2, 2018 •

edited

Loading

maxloeffler commented Jul 1, 2024 •

edited

Loading