Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knotty Centrality by Shanahan and Wildie #10

Closed
wants to merge 4 commits into from

Conversation

swederik
Copy link

Hello!

This is an implementation of the Knotty Centrality measure by Murray Shanahan and Mark Wildie.
I have ported it from their MATLAB code in the hopes of getting it into NetworkX.

"Attempts to find the sub-graph of G with the highest value for
knotty-centrality. Carries out a series of exhaustive searches on
subsets of the nodes ranked by "indirect" betweenness centrality, then
carries out a phase of hill-climbing to see whether the sub-graph can
be improved by adding further nodes."

The relevant reference is here:

Shanahan M, Wildie M (2012) Knotty-Centrality: Finding the Connective Core of a Complex Network. PLoS ONE 7(5): e36579. doi:10.1371/journal.pone.0036579

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0036579#pone.0036579.s003

and it returns results similar to the rich club coefficient.

UPDATE: I can now confirm that this works as intended...

Testing script and data can be found here:
https://www.dropbox.com/sh/irsau7ojcgtv3z9/ZZvOfckfEX

@bjedwards
Copy link
Member

I know this has been up for a while, but I'll make a few comments if the authors are still interested in getting it incorporated.

The algorithm seems sound, though I haven't read the original paper in depth. I'll make a few inline comments on the code. I think this could probably be included in networkx/algorithms/centrality/knotty.py instead of just algorithms. Your test script looks good. It could be converted into the type of tests we have for other centrality measures, if you take a look in networkx/algorithms/centrality/tests.

nodes, kc = _find_knotty_centre(G, compact)
g = G.subgraph(nodes)

return g, nodes, kc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just return g or nodes. One can be found from the other easily. I would advocate for just nodes

@bjedwards
Copy link
Member

It looks like you use numpy for gml reading and writing, but networkx has a gml reader. Would it be possible to include some simpler examples?

@swederik
Copy link
Author

Your comments are great! I'll fix it as soon as I get a chance.

@hagberg
Copy link
Member

hagberg commented Jan 10, 2015

Needs update or closing.

@swederik
Copy link
Author

Sorry for never getting back to this, my Ph.D. got in the way.

I made the changes suggested above by @bjedwards but when testing I keep getting:

RuntimeError: maximum recursion depth exceeded

from the _best_perm function. I didn't get that error two years ago, so I guess the system recursion depth needs to be changed (it's default 999 apparently). I suppose it needs to be written iteratively rather than recursively.

@hagberg hagberg marked this pull request as draft June 25, 2020 21:03
Base automatically changed from master to main March 4, 2021 18:20
@MridulS MridulS marked this pull request as ready for review February 1, 2023 11:17
@MridulS
Copy link
Member

MridulS commented Feb 2, 2023

@swederik Sometimes getting back to code after 8 years is the charm :D

I went through the PR, made the changes to be usable in the current version, rebased to main and did some minor clean ups. The resulting changes are here main...MridulS:networkx:knotty_cent_review

I tried to run the code but I'm getting the RuntimeError: maximum recursion depth exceeded too. I have tested by increasing the recursion depth limit too but it's stuck in a _best_perm recursion loop. Do let me know if you have the bandwidth or interest to pick this up again and we can get this merged in :)

# value of knotty-centrality

if choices:
choices2 = [choices[i] for i in range(0, len(choices))]
Copy link
Contributor

@boothby boothby Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the original paper, this should be

choices2 = choices[1:]

And the above will remedy the endless recursion. But I question the name of the function and its description. It's clearly iterating over combinations, not permutations.

it would be more pythonic (and not recursive) to write

def _best_perm(given, choices, G, CIJ, compact, BC):
    best_nodes = given
    best_kc = _compute_knotty_centrality(G, CIJ, given, compact, BC)
    for r in range(1, len(choices)+1):
        for subset in combinations(given, r):
            nodes = np.hstack((given, subset))
            kc = _compute_knotty_centrality(G, CIJ, nodes, compact, BC)
            if kc > best_kc:
                best_kc = kc
                best_nodes = nodes
    return best_nodes, best_kc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @boothby!! That indeed fixes the recursion error.

Just to check the algorithm I did a naive implementation and the results don't seem to match up:

from itertools import chain, combinations


def knotty_centrality_naive(G):
    graph_subsets = _graph_powerset(set(G))
    bc = nx.betweenness_centrality(G)
    bc_sum = sum(bc.values())
    bc = {n: val/bc_sum for n, val in bc.items()}
    current_max = 0
    current_max_set = {}
    count = 0
    for subset in graph_subsets:
        kc_val = _knotty_cent(G, subset, bc)
        count += 1
        if kc_val > current_max:
            current_max = kc_val
            current_max_set = subset
            print(current_max, current_max_set)
    return current_max, current_max_set, count


def _graph_powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(2, len(s)+1))
    
def _knotty_cent(G, subset, bc):
    subset_graph = G.subgraph(subset)
    edges = subset_graph.number_of_edges()
    nodes = subset_graph.number_of_nodes()
    bc_subset = sum(bc[i] for i in subset)
    return (edges*bc_subset) / ((nodes - 1) * (nodes))

Gives the highest knotty_centrality of 0.333 but the implementation in the PR gives 0.666.

And interestingly the paper states that knotty centrality is for directed graphs, @swederik do you remember by any chance what was the motivation to put in the error for a directed graph?

Copy link
Contributor

@boothby boothby Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the paper, they define the knotty centrality for subgraphs which have at least two nodes, as you have implemented in your naive version. It appears that the implementation here does not respect that condition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was probably because I was not working with directed graphs at the time and did not test it. I ported over the MATLAB code after meeting the authors at the Brain Connectivity Workshop conference at the time. I don't recall if I ever ended up using it in an actual publication.

return nodes, kc

def _compute_knotty_centrality(G, CIJ, nodes, compact, BC):
# Returns knotty-centrality of the subgraph of CIJ comprising only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing CIJ around seems largely wasteful here -- it looks like it's only being used to get the size n = len(G), after a quite expensive "binarise" step which redundant because that happens at the top level.

In retrospect, doing the powerset in a recursive (or pseudo-recursive to avoid blowing the stack limit) manner would be good, because BCtot and density could be done in constant / n time respectively instead of n and n^2 time per call.

@boothby
Copy link
Contributor

boothby commented Feb 5, 2023

On naming: the function _compute_knotty_centrality does what I would expect a function named knotty_centrality would do. The function knotty_centrality, according to the reference, finds the "connective core" or "knotty center" of a graph; the subset of nodes with maximum knotty centrality. I think that it would make sense to expose a function named knotty_centrality to compute the knotty centrality measure for a given subgraph, as well as a function knotty_center to find a knotty center.

@swederik
Copy link
Author

swederik commented Feb 7, 2023

Hi guys,

Thanks so much for putting in the effort to get this merged. I am astonished that this PR still exists after eleven years! I'm not working on this topic anymore (I started this PR during my PhD when working on brain networks) and I am currently on paternity leave so I can't really help too much with this, but I think you are all amazing for picking it up and running with it, and I apologize if I was a terrible programmer 11 years ago :-)

I'll tag @mpshanahan here in case he's interested in helping, though I have no idea if he is active on Github.

edit: I've checked the Allow Edits by Maintainers checkbox so you are free to push to the PR branch if you want.

@MridulS
Copy link
Member

MridulS commented Mar 9, 2023

@swederik I have created a new issue which deals with clean up of this PR #6475 and if anyone is interested they can pick it up from here. Thanks again for submitting your work and sorry for the long delay :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

5 participants