Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Sage to compute clustering coefficient #18834

Closed
nathanncohen mannequin opened this issue Jul 1, 2015 · 28 comments
Closed

Use Sage to compute clustering coefficient #18834

nathanncohen mannequin opened this issue Jul 1, 2015 · 28 comments

Comments

@nathanncohen
Copy link
Mannequin

nathanncohen mannequin commented Jul 1, 2015

Will have to be rebased+adapted over #18811.

sage: g = graphs.CompleteGraph(500)
sage: %time _=g.clustering_coeff(implementation="boost")
CPU times: user 7.62 s, sys: 8 ms, total: 7.62 s
Wall time: 7.63 s
sage: %time _=g.clustering_coeff()
CPU times: user 340 ms, sys: 0 ns, total: 340 ms
Wall time: 328 ms
sage: %time _=g.clustering_coeff(implementation="dense_copy")
CPU times: user 56 ms, sys: 4 ms, total: 60 ms
Wall time: 61.3 ms

Depends on #18811

CC: @sagetrac-borassi @dcoudert

Component: graph theory

Author: Nathann Cohen

Branch/Commit: 8c25063

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/18834

@nathanncohen nathanncohen mannequin added this to the sage-6.8 milestone Jul 1, 2015
@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 1, 2015

Branch: u/ncohen/18834

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 1, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

5a40d92trac #18834: Use Sage to compute clustering coefficient

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 1, 2015

Commit: 5a40d92

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 1, 2015

Author: Nathann Cohen

@nathanncohen nathanncohen mannequin added the s: needs review label Jul 1, 2015
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 1, 2015

Changed commit from 5a40d92 to dba89c2

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 1, 2015

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

283c61bFirst draft
18bd01aApplied Nathann's suggestions
99bd994Few corrections in the reference manual
40df844trac #18811: Review
6c8d49bMore reviewer comments
dba89c2trac #18834: Use Sage to compute clustering coefficient

@nathanncohen

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 2, 2015

Changed commit from dba89c2 to e25d6a4

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 2, 2015

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

c365d02Removed "vertices is None" from boost_clustering_coeff
582ce34Changed for v in G.vertices() => for v in G
e25d6a4trac #18834: Use Sage to compute clustering coefficient

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 2, 2015

comment:7

rebased on top of #18811.

@dcoudert
Copy link
Contributor

dcoudert commented Jul 3, 2015

comment:8

For me the patch is good to go (and the boost method is pretty fast in fact).
However, the patch could be rebased on beta7 to solve the patchbot issue.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 3, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

6274bfftrac #18834: Merged with 6.8.beta7

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 3, 2015

Changed commit from e25d6a4 to 6274bff

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 3, 2015

comment:10

Done! But what do you mean by 'boost is pretty fast'?? Sage is 100x faster O_o

@dcoudert
Copy link
Contributor

dcoudert commented Jul 3, 2015

comment:11

On a clique that's for sure, but on other kinds of graphs...

sage: g = graphs.RandomBarabasiAlbert(1000,2)
sage: %timeit _=g.clustering_coeff(implementation="boost")
1000 loops, best of 3: 1.9 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
100 loops, best of 3: 7.34 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="sparse_copy")
100 loops, best of 3: 13.9 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="networkx")
100 loops, best of 3: 19.3 ms per loop

sage: g = graphs.RandomBarabasiAlbert(10000,2)
sage: %timeit _=g.clustering_coeff(implementation="boost")
10 loops, best of 3: 38 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
10 loops, best of 3: 143 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="sparse_copy")
10 loops, best of 3: 148 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="networkx")
1 loops, best of 3: 248 ms per loop

@dcoudert
Copy link
Contributor

dcoudert commented Jul 3, 2015

comment:12

Also, I have an issue:

sage: g = graphs.RandomGNP(1000,.1)
sage: N = 1000
sage: g = graphs.RandomGNP(N, log(N)/N)
sage: g.order(), g.size()
(1000, 3567)
sage: %timeit _=g.clustering_coeff(implementation="boost")
100 loops, best of 3: 2.78 ms per loop
sage: %timeit _=g.clustering_coeff(implementation="dense_copy")
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-31-cd519bf674aa> in <module>()
----> 1 get_ipython().magic(u'timeit _=g.clustering_coeff(implementation="dense_copy")')

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
   2305         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2306         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2307         return self.run_line_magic(magic_name, magic_arg_s)
   2308 
   2309     #-------------------------------------------------------------------------

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
   2226                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2227             with self.builtin_trap:
-> 2228                 result = fn(*args,**kwargs)
   2229             return result
   2230 

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, line, cell)
   1034             number = 1
   1035             for _ in range(1, 10):
-> 1036                 time_number = timer.timeit(number)
   1037                 worst_tuning = max(worst_tuning, time_number / number)
   1038                 if time_number >= 0.2:

/Users/dcoudert/sage/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in timeit(self, number)
    130         gc.disable()
    131         try:
--> 132             timing = self.inner(it, self.timer)
    133         finally:
    134             if gcold:

<magic-timeit> in inner(_it, _timer)

/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/graphs/generic_graph.pyc in clustering_coeff(self, nodes, weight, implementation, return_vertex_weights)
  12437             from sage.rings.integer import Integer
  12438             return {v:Integer(count)/((self.degree(v)*(self.degree(v)-1))/2)
> 12439                     for v,count in triangles_count(self).iteritems()}
  12440 
  12441     def cluster_transitivity(self):

/Users/dcoudert/sage/local/lib/python2.7/site-packages/sage/graphs/generic_graph.pyc in <dictcomp>((v, count))
  12437             from sage.rings.integer import Integer
  12438             return {v:Integer(count)/((self.degree(v)*(self.degree(v)-1))/2)
> 12439                     for v,count in triangles_count(self).iteritems()}
  12440 
  12441     def cluster_transitivity(self):

/Users/dcoudert/sage/src/sage/structure/element.pyx in sage.structure.element.RingElement.__div__ (/Users/dcoudert/sage/src/build/cythonized/sage/structure/element.c:17310)()
   1991         if have_same_parent_c(self, right):
   1992             return (<RingElement>self)._div_(<RingElement>right)
-> 1993         return coercion_model.bin_op(self, right, div)
   1994 
   1995     cpdef RingElement _div_(self, RingElement right):

/Users/dcoudert/sage/src/sage/structure/coerce.pyx in sage.structure.coerce.CoercionModel_cache_maps.bin_op (/Users/dcoudert/sage/src/build/cythonized/sage/structure/coerce.c:8195)()
    994         try:
    995             xy = self.canonical_coercion(x,y)
--> 996             return PyObject_CallObject(op, xy)
    997         except TypeError as err:
    998             if xy is not None:

/Users/dcoudert/sage/src/sage/structure/element.pyx in sage.structure.element.RingElement.__div__ (/Users/dcoudert/sage/src/build/cythonized/sage/structure/element.c:17293)()
   1990         """
   1991         if have_same_parent_c(self, right):
-> 1992             return (<RingElement>self)._div_(<RingElement>right)
   1993         return coercion_model.bin_op(self, right, div)
   1994 

/Users/dcoudert/sage/src/sage/rings/integer.pyx in sage.rings.integer.Integer._div_ (/Users/dcoudert/sage/src/build/cythonized/sage/rings/integer.c:11621)()
   1728         # This is vastly faster than doing it here, since here
   1729         # we can't cimport rationals.
-> 1730         return the_integer_ring._div(self, right)
   1731 
   1732     def __floordiv__(x, y):

/Users/dcoudert/sage/src/sage/rings/integer_ring.pyx in sage.rings.integer_ring.IntegerRing_class._div (/Users/dcoudert/sage/src/build/cythonized/sage/rings/integer_ring.c:4894)()
    420         cdef rational.Rational x = rational.Rational.__new__(rational.Rational)
    421         if mpz_sgn(right.value) == 0:
--> 422             raise ZeroDivisionError('Rational division by zero')
    423         mpz_set(mpq_numref(x.value), left.value)
    424         mpz_set(mpq_denref(x.value), right.value)

ZeroDivisionError: Rational division by zero

I'll send you the graph by mail if you want to investigate.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 3, 2015

comment:13

On a clique that's for sure, but on other kinds of graphs...

Oh.. Wow..

HMmmm.. It seems that those graphs are stuffed with degree-2 vertices. I wonder how they do that O_o

Nathann

@dcoudert
Copy link
Contributor

dcoudert commented Jul 3, 2015

comment:14

I assume they iterate over the edges, which is what I would do on a sparse graph.

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 3, 2015

comment:15

They probably do a dichotomic search where I do a linear search. I'll try to see tomorrow evening (I will sleep in an airport) if I can do something about that.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 4, 2015

Changed commit from 6274bff to a5b7092

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 4, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

a5b7092trac #18834: Division by zero

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 5, 2015

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8c25063trac #18834: Division by zero

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 5, 2015

Changed commit from a5b7092 to 8c25063

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 5, 2015

comment:18

A more correct fix.

About the difference in timings:

sage: g=graphs.RandomBarabasiAlbert(10000,2)
sage: %timeit _ = g.clustering_coeff()
10 loops, best of 3: 96.8 ms per loop
sage: from sage.graphs.base.static_sparse_graph import triangles_count
sage: %timeit _ = triangles_count(g)
10 loops, best of 3: 70 ms per loop

It seems that a nontrivial part of the difference in timing is taken by the construction of the.... final dictionary. Profiling the method does not show a major cost of the actual algorithm.. So well, it seems that this graph is so easy to deal with that returning the result is not negligible :-P

Nathann

@dcoudert
Copy link
Contributor

dcoudert commented Jul 6, 2015

comment:19

For me the patch is now good to go!

@dcoudert
Copy link
Contributor

dcoudert commented Jul 6, 2015

Reviewer: David Coudert

@nathanncohen
Copy link
Mannequin Author

nathanncohen mannequin commented Jul 7, 2015

comment:20

Thanks !

Nathann

@vbraun
Copy link
Member

vbraun commented Jul 7, 2015

Changed branch from u/ncohen/18834 to 8c25063

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants