Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve simplification of networks #120

Closed
clhunsen opened this issue May 2, 2018 · 5 comments
Closed

Improve simplification of networks #120

clhunsen opened this issue May 2, 2018 · 5 comments

Comments

@clhunsen
Copy link
Collaborator

clhunsen commented May 2, 2018

Description

Currently, the code in the function simplify.network() is quite complex: To preserve an edge for each existing relation, we need to obtain the complete edge list, split it, construct networks from the separate different edge lists again, simplify these, and merge the separate networks in the end (by obtaining their edge lists...).

We need to find a way to apply the EDGE.ATTR.HANDLING on the raw edge lists (i.e., data.frames). Maybe, using the existing list when constructing an appropriate sqldfstatement is a worth a try. Also, trying to use "unique" as a value in the EDGE.ATTR.HANDLING list may solve the issue right away. 😉

https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L1192-L1233

https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L44-L63

Versions

This affects the upcoming version v3.2 as it contains the code of PR #115.

@clhunsen clhunsen added this to the Future milestone May 2, 2018
@maxloeffler
Copy link
Contributor

I've come to look at this issue. After inspection, I find that the described issue is indeed present, i.e., we do indeed need to perform the described steps to simplify a network properly. Im unsure what the performance impact of this convolution is though.

Towards the proposed solutions: I don't think we can use "unique" in the EDGE.ATTR.HANDLING, for the reason that I am just unsure what the expected outcome would be. Same goes for the sqldf suggestion. The underlying issue is that the igraph::simplify function (that we use) has no notion of grouping over certain attributes. Therefore, we have to pre-group ourselves. In what way we perform pre-grouping can only have minor impacts on performance and readability I suppose.

Possible solutions I see are: Reimplementing the igraph::simplify function with a notion of grouping. This approach may not be a fan favorite 😂. Then, we could also go to the drawing board and figure out how to include grouping into custom lambda functions that we then set as values into the EDGE.ATTR.HANDLING. This may work, but I have not yet thought about the specifics.

@maxloeffler
Copy link
Contributor

maxloeffler commented Jul 1, 2024

Preamble: In our last meeting, we discussed, that any form of performing simplification on the edge list directly involves us reimplementing igraphs simplification logic, which as enticing it may sound for the performing of our code, is definitely not something we want to do. Nevertheless, we do want to reduce the amount of times, where we have to obtain, edit, set, obtain, edit, set ... the edge lists of the individual networks. A good step into that direction is to replace the call to merge.networks with something implemented by the igraph people, in the hope that they may implemented it in a more performant way.

So I have been searching the docs for a function to merge networks natively. igraph::union does not do what we want, as it creates duplicate columns in the edges for every column that appears multiple times in the input networks, i.e., the unified network would have columns type_1, type_2, type_3 ... kind_1, kind_2, kind_3, etc. which would require us to once again iterate the edge list and merge together these artifacts. However, I found that the igraph::disjoint_union function does what we need. The function assumes that input nets are disjoint and creates an output network in which there are copies of every vertex of all input nets with a single nicely formated edgelist. This means in our case, that if we merge 4 networks and all have these vertices A, B, C, then the disjoint union will have vertices A, B, C, A, B, C, A, B, C, A, B, C but the edge list will appear "correctly". We can then contract all vertices that are effectively equivalent through igraph::contract and then achieve the same output as with merge.networks (all tests ☑️).
I am not here to judge for the efficiency of this solution. Under the assumption, that the igraph people did a nice job in the implementation of disjoint_union and contract, then we may achieve a significant benefit, otherwise, Im not sure. Also I don't have the big testing data on me to evaluate performance. Nevertheless, this solution does reduces code duplication and would make merge.networks and merge.network.data mostly obsolete.

My code (util-network.R line 1720ff):

## merge the simplified networks
network = igraph::disjoint_union(networks)
vertex.count = igraph::vcount(networks[[1]])
network = igraph::contract(network,
                           rep(1:vertex.count, length(networks)),
                           vertex.attr.comb = "first")

@bockthom
Copy link
Collaborator

bockthom commented Aug 1, 2024

My code (util-network.R line 1720ff):

## merge the simplified networks
network = igraph::disjoint_union(networks)
vertex.count = igraph::vcount(networks[[1]])
network = igraph::contract(network,
                           rep(1:vertex.count, length(networks)),
                           vertex.attr.comb = "first")

I have tested the code for merging networks suggested by @MaLoefUDS on three different projects.
First of all: In all cases, the resulting networks of using the previous code and of using the suggested new code are equal. So, the suggested code seems to be a valid replacement of the merge.networks function.

Then, I checked the performance. I've chosen a rather naive approach having only 0 or 1 repetitions – consequently, my measurements are not really reliable. In particular, I did not measure how long the actual merging step takes, but I measured the time of executing the complete simplify.network function. Nevertheless, here are my results:

size of the resulting network old impl new suggestion performance improvement
39275 edges 14.34056 s 12.48395 s 1.85661 s
133698 edges 32.26324 s 28.59812 s 3.66512 s
133698 edges 30.15977 s 28.11135 s 2.04842 s
312671 edges 74.86754 s 76.97657 s -2.10903 s
312671 edges 76.93353 s 77.89671 s -0.96318 s

So, the performance improvement seems to be marginal or even non-existent. For smaller projects, we have observed an improvement of around 2 seconds, but for the largest project in my test set, the performance has slightly worsened.

@MaLoefUDS @hechtlC: Do you think it is worth to repeat with more projects (even larger projects) and more repetitions, to find out whether there is a real performance improvement on average, or should we just stop the experiment here as the result of my naive approach are not really promising?

And another issue I've spotted: The new code produces the following warning:

Warning message:
In igraph::disjoint_union(networks) :
  Duplicate vertex names in disjoint union

As we actually want to reduce the number of warnings and not increase it, I dislike this warning...

So, in general, I am not sure if should continue here or just stop because there is not really an observable improvement... What are your thoughts?

@maxloeffler
Copy link
Contributor

The preliminary differences show no clear pattern, so I guess we can say that igraph does not implement disjoint_union with their secret backend sauce and that our current approach is in general not noticeably worse. From my side, we can conclude here, but im open for objections.

@bockthom
Copy link
Collaborator

bockthom commented Aug 7, 2024

As we cannot see any promising performance improvement from our preliminary performance checks, there is agreement that we don't follow up on the idea of replacing the implementation of merge.networks.

As there were no other promising ideas on how to significantly improve the simplification of networks, I'll close this issue.
Nevertheless, many thanks go to @MaLoefUDS for thinking about this issue and for experimenting with possible ideas.

@bockthom bockthom closed this as not planned Won't fix, can't repro, duplicate, stale Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants