Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Closeness Centrality #19007

Closed
sagetrac-borassi mannequin opened this issue Aug 10, 2015 · 53 comments
Closed

Refactor Closeness Centrality #19007

sagetrac-borassi mannequin opened this issue Aug 10, 2015 · 53 comments

Comments

@sagetrac-borassi
Copy link
Mannequin

sagetrac-borassi mannequin commented Aug 10, 2015

Now, the closeness centrality algorithm simply calls the corresponding NetworkX routine (only for undirected graphs).

This ticket will move this routine to file generic_graph (closeness centrality is defined also in this case ![1]), and it will modify input/output variables as in routine shortest_path. Then, it will call our shortest path new routines instead of NetworkX routines.

This ticket is a first step towards the inclusion of the algorithm in ![2].

![1] !http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6816651&tag=1

![2] !http://arxiv.org/pdf/1507.01490v1.pdf

Depends on #18931
Depends on #18876
Depends on #18910

CC: @nathanncohen @dcoudert

Component: graph theory

Keywords: Closeness centrality

Author: Michele Borassi

Branch/Commit: 1551c6f

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/19007

@sagetrac-borassi sagetrac-borassi mannequin added this to the sage-6.9 milestone Aug 10, 2015
@sagetrac-borassi sagetrac-borassi mannequin added the p: major / 3 label Aug 10, 2015
@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 10, 2015

Author: Michele Borassi

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 10, 2015

Changed keywords from none to Closeness centrality

@sagetrac-borassi

This comment has been minimized.

@sagetrac-borassi sagetrac-borassi mannequin changed the title refactor_centrality_closeness Refactor Closeness Centrality Aug 10, 2015
@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 13, 2015

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 13, 2015

Dependencies: #18931, #18876, #18910

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 13, 2015

comment:3

Hello!

This is the first version of the refactoring of closeness_centrality. Let me know if you like it!

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 13, 2015

Commit: 5c85793

@dcoudert
Copy link
Contributor

comment:4

I don't really understand what you are doing in this commit
https://github.com/sagemath/sagetrac-mirror/commits/5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no?
Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 14, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

5853eb7Merged with beta2

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 14, 2015

Changed commit from 5c85793 to 5853eb7

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 14, 2015

Changed commit from 5853eb7 to a19addb

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 14, 2015

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

a19addbRewrote closeness centrality according to new standards

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 14, 2015

comment:7

Hello!

Thank you for the comment, I will try to address the main issues raised.

I have put everything in a single commit, which simply removes the closeness centrality routine from graph.py and rewrites it in generic_graph.py. Now the ticket should be much easier to review. Since closeness centrality highly depends on shortest paths, the new routine follows all conventions in ticket #18931.

About the creation of a new file: we already have file centrality.pyx, containing fast algorithms for betweenness centrality. However, these routines are called from a centrality_betweenness routine in file generic_graph.py, which should be used by the user. In this ticket, I just copied this policy.

In my opinion, if we want to create a new file, we should at least move also betweenness centrality and degree centrality: in my opinion, this goes beyond the scope of this ticket. Do you agree?

Best,

Michele

Replying to @dcoudert:

I don't really understand what you are doing in this commit
https://github.com/sagemath/sagetrac-mirror/commits/5c85793ccb8234ac668173620523f45dbae4b985

In the ticket description you say that you will move everything in generic_graph, but it might be more readable/clean to create a new file for centrality measures, no?
Also, you moved only references to the methods, right? (or the modifications are hidden in the merge with #18931).

+            - :meth:`~sage.graphs.graph.Graph.centrality_degree`
+            - :meth:`~centrality_betweenness`

David.

@dcoudert
Copy link
Contributor

comment:8

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

I assume you want first to have `v_vert = iter(list(vert))

The default of calling distances = self.shortest_path_all_pairs(by_weight,algorithm, weight_function, check_weight)[0] is that you get a dictionary of dictionary, that is a huge data structure for large graphs (lost of memory and long to create). At least the second part of this method should be cythonized to get the array of distances, as is done for centrality_betweenness.

David.

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Aug 14, 2015

comment:9

Something is wrong here

+            try:
+                v_iter = iter(vert)
+            except TypeError:
+                v_iter = [v_iter]
+                v_iter = iter(vert)

+1.

In some cases a vertex can be a pair, and thus is iterable. This would react badly with this code.

sage: g=graphs.KneserGraph(5,2)
sage: v=g.vertices()[0]
sage: g.centrality_closeness(v)
<crash>

Nathann

@dcoudert
Copy link
Contributor

comment:10

Right. My previous proposal is not a good solution.
Something like that might also be unsafe if some labels in vert are not in G, but this can be checked when trying to associate computed values and vertices.

v_vert = iter([vert]) if vert in G else iter(vert)

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 16, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

7dfa2d2Corrected v_iter, cythonized Johnson

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 16, 2015

Changed commit from a19addb to 7dfa2d2

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 16, 2015

comment:12

Hello!

I hope you spent a great Assumption day!

With this new commit, (hopefully) I solved the problems with v_vert and I cythonized the Johnson algorithm. In my opinion, it is not worth cythonizing Floyd-Warshall, because it is much slower than Johnson, especially on sparse graphs.

Thank you very much!

Michele

@dcoudert
Copy link
Contributor

comment:13

Hello,

I'm trying to understand the behavior of the method on digraphs.
You wrote that the centrality of vertices of degree 0 is not reported. Does it means that for digraphs you don't report the centrality of vertices with out-degree 0?
Also, what's the expected behavior on strongly connected digraphs?

David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

7a4059aImproved documentation

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Changed commit from 7dfa2d2 to 7a4059a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

5c67f79Small mistake

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Changed commit from 7a4059a to 5c67f79

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 17, 2015

comment:19

Done!

However, I had to do some "magic" to avoid duplicate references... Hope it is ok!

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Changed commit from 48514ad to cf9cf87

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 17, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

cf9cf87Trailing whitespaces

@dcoudert
Copy link
Contributor

comment:21

Yep, I add to make doc-clean && make doc to get it compiled.

I just saw in the doc of centrality_degree in graph.py, in section See also, that the link to centrality_closeness is broken. In fact, the code is :meth:`centrality_closeness`. Could you check that?
In fact we should also move this methode to centrality.pyx, but in another ticket.

After that I think it will be ok.

David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Changed commit from cf9cf87 to 90638aa

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

90638aaCorrected link

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 18, 2015

comment:23

Done!

@dcoudert
Copy link
Contributor

comment:24

Hello,

do you understand why the patchbot reports some errors?
I have no error when testing this patch on my computer (install, long tests, trials, docbuild html et pdf,...).

Best,
David.

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 18, 2015

comment:25

What is the patchbot? How can I find these errors?

@dcoudert
Copy link
Contributor

comment:26

on the top of this page, you can see a blue disk (or red or green, etc.) with a link to http://patchbot.sagemath.org/ticket/19007/
It reports on automatic tests of your patch. Tests are performed on all modules.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

73dd6d5Solve patchbot - first attempt

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Changed commit from 90638aa to 73dd6d5

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

f1c7fd5Solve patchbot - second attempt

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 18, 2015

Changed commit from 73dd6d5 to f1c7fd5

@dcoudert
Copy link
Contributor

comment:29

Note that the patchbot needs a lot of time: it is not tested instantaneously and it takes some hours of computations...

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 18, 2015

comment:30

Ok, I will wait until it answers... In my opinion, the problem is that we need to import centrality.pyx in generic_graph at startup, and this slows down the startup time. I tried to change the imports, hoping the total time will decrease. If any of the two attempts work, we are fine, otherwise I probably have to mail sage-devel...

@dcoudert
Copy link
Contributor

comment:31

Well, generic graph is becoming really really big with lots of tests.
For instance, do we really need the tests with for i in range(100): . Can't we have only one test we randomly chosen number of edges? That should be enough.

David.

@sagetrac-borassi
Copy link
Mannequin Author

sagetrac-borassi mannequin commented Aug 19, 2015

comment:32

I'm afraid that all the attempts I have made were unsuccessful. The point is that importing module centrality.pyx at startup is expensive, and I need it to import it to add method centrality_closeness to GenericGraph. Hence, the startup time becomes 2 seconds (too much).

In my opinion, this might be a reason why it is better to leave method centrality_closeness in generic_graph, and put only Cython-related methods inside centrality.pyx, in order to use lazy imports. May I move method centrality_closeness back?

If we still want to move centrality_closeness and related methods to centrality.pyx, I think we should open another ticket, in order to find a way to use lazy imports (I have no idea how, though).

@dcoudert
Copy link
Contributor

comment:33

ok, move it back to generic graph.
David.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 19, 2015

Changed commit from f1c7fd5 to 1551c6f

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Aug 19, 2015

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1551c6fUndo commit moving closeness_centrality to centrality.pyx. Corrected links.

@dcoudert
Copy link
Contributor

Reviewer: David Coudert

@dcoudert
Copy link
Contributor

comment:35

For me the patch is good to go, but I prefer to wait for the next patchbot answer.
David.

@dcoudert
Copy link
Contributor

comment:36

Since the patchbot reports no error and that the patch passes all tests/docbuild/etc. on my computer, I set it to positive review.
Thanks.
David.

@vbraun
Copy link
Member

vbraun commented Aug 21, 2015

Changed branch from u/borassi/refactor_centrality_closeness to 1551c6f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants