Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster matroid 3 connectivity #18539

Closed
chaoxu mannequin opened this issue May 28, 2015 · 72 comments
Closed

faster matroid 3 connectivity #18539

chaoxu mannequin opened this issue May 28, 2015 · 72 comments

Comments

@chaoxu
Copy link
Mannequin

chaoxu mannequin commented May 28, 2015

Implement the efficient 3 connectivity algorithm.

R. E. Bixby, W. H. Cunningham, Matroids, Graphs, and 3-Connectivity. In Graph theory and related topics (Proc. Conf., Univ. Waterloo, Waterloo, ON, 1977), 91-103

Depends on #18448

CC: @sagetrac-Stefan @sagetrac-yomcat @sagetrac-Rudi

Component: matroid theory

Author: Chao Xu

Branch/Commit: 482ce85

Reviewer: Michael Welsh, Rudi Pendavingh

Issue created by migration from https://trac.sagemath.org/ticket/18539

@chaoxu chaoxu mannequin added this to the sage-6.8 milestone May 28, 2015
@chaoxu chaoxu mannequin added the p: major / 3 label May 28, 2015
@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented May 28, 2015

@chaoxu

This comment has been minimized.

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented May 28, 2015

Commit: c4f38eb

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented May 28, 2015

New commits:

c4f38ebbasic algorithm framework for further modification

@chaoxu chaoxu mannequin added t: enhancement labels May 28, 2015
@chaoxu chaoxu mannequin self-assigned this May 28, 2015
@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 28, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

5167f3dremove excess code from copy & paste

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 28, 2015

Changed commit from c4f38eb to 5167f3d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 30, 2015

Changed commit from 5167f3d to 313de5d

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 30, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

f9c59fcMerge branch 'develop' into t/18539/faster_matroid_3_connectivity
313de5dintermediate code, seems to be a correct implementation

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 30, 2015

Changed commit from 313de5d to 23cfab1

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented May 30, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

23cfab1complete special cases in stage 0

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented May 30, 2015

comment:8

Running the following code shows the new implementation matches the current behavior on named matroids.

def test(M): 
	return M.is_3connected_beta()==M.is_3connected()
results = []
for M in matroids.named_matroids.__dict__.values():
    if callable(M):
        results.append(test(M()))
all(results)

I will generate some larger examples to test the running time and start optimizing.

For the base case, I need to access Uniform(r,n). A direct import seems to cause circular dependency. If I use

from sage.matroids.catalog import Uniform
# some code that calls Uniform(r,n)

then the error ImportError: cannot import name Uniform appears during runtime.
Somehow using from sage.matroids.catalog import * and write sage.matroids.catalog.Uniform(r,n) for every call works out. I wonder if there are better ways.

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented May 30, 2015

comment:9

Replying to @chaoxu:

For the base case, I need to access Uniform(r,n). A direct import seems to cause circular dependency. If I use

from sage.matroids.catalog import Uniform
# some code that calls Uniform(r,n)

then the error ImportError: cannot import name Uniform appears during runtime.
Somehow using from sage.matroids.catalog import * and write sage.matroids.catalog.Uniform(r,n) for every call works out. I wonder if there are better ways.

A matroid of rank 1 is uniform if and only if it has no loops, and dually a matroid of corank 1 is uniform iff it has no coloops. So you could also replace your test

     if (self.size() <= 1 or
            self.is_isomorphic(sage.matroids.catalog.Uniform(1,2)) or
            self.is_isomorphic(sage.matroids.catalog.Uniform(1,3)) or
            self.is_isomorphic(sage.matroids.catalog.Uniform(2,3))):
            return True

with this simpler one

     if self.size() <= 1:
            return True
     if self.size() <= 3 and self.full_rank()==1 and not self.loops():
            return True
     if self.size() <= 3 and self.full_corank()==1 and not self.coloops():
            return True

@sagetrac-Stefan
Copy link
Mannequin

sagetrac-Stefan mannequin commented May 30, 2015

comment:10

While I agree that Rudi's method is better (isomorphism testing is VERY expensive), let's look at the import issue too. Was your import inside the method instead of global? You might want to look at the way it was done in, for instance, the _minor method in matroid.pyx. I think you can't do it better than that.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Changed commit from 23cfab1 to 0033334

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

0033334Optimization: Clever choice of basis and fundamental cocircuits.

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented Jun 1, 2015

comment:12

Thanks Rudi, I have incorporated your suggestions. It's much faster now.

One can use the following code to test the running time of the current implementation and the original one.

def bench(beta=False):
	for M in matroids.named_matroids.__dict__.values():
  		if callable(M):
  			if beta:
				M().is_3connected_beta()
			else:
				M().is_3connected()
sage: %time bench(False)
CPU times: user 298 ms, sys: 10.5 ms, total: 309 ms
Wall time: 301 ms
sage: %time bench(True)
CPU times: user 36.9 ms, sys: 1.83 ms, total: 38.8 ms
Wall time: 37.5 ms

I also benchmarked on 3-connected binary matroids listed here. The old algorithm takes 925 ms, the new one takes 51 ms.

For correctness, I will try to see if the original and the new implementation agrees for all matroids with at most 9 elements. (data from Database of Matroids).

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 1, 2015

all matroids on n<= 9 elements with rank <= n/2

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 1, 2015

comment:13

Attachment: matroids9_low_rank.sobj.gz

Replying to @chaoxu:

One can use the following code to test the running time of the current implementation and the original one.

sage: %time bench(False)
CPU times: user 298 ms, sys: 10.5 ms, total: 309 ms
Wall time: 301 ms
sage: %time bench(True)
CPU times: user 36.9 ms, sys: 1.83 ms, total: 38.8 ms
Wall time: 37.5 ms

I also benchmarked on 3-connected binary matroids listed here. The old algorithm takes 925 ms, the new one takes 51 ms.

That really is very nice work. Cheers!

What part of your code is currently using most of the time on these test cases?

For correctness, I will try to see if the original and the new implementation agrees for all matroids with at most 9 elements. (data from Database of Matroids).

I have attached the set of matroids on at most 9 elements, perhaps that will save you some time translating their data. If you want I can also send the Sage code I used to generate these matroids, just let me know.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

e4917cefix bug.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Changed commit from 0033334 to e4917ce

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

84a81fdreduce number of candidate cocircuits

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 1, 2015

Changed commit from e4917ce to 84a81fd

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented Jun 1, 2015

comment:16

Replying to @sagetrac-Rudi:

I have attached the set of matroids on at most 9 elements, perhaps that will save you some time translating their data.

Thanks, that's really helpful. I found a bug and now it have been fixed. Now it works for all test cases. I'm pretty convinced of the correctness.

What part of your code is currently using most of the time on these test cases?

For the named cases, _corank and minor operation taking the most of the time.
For the small 3 connected binary matroids, most time was spent on is_circuit, is_cosimple, is_simple.
For matroids with large rank (I generated a random rank 500 binary matroid), components take the most time.
Here is an time breakdown for a rank 500 binary matroid.

         136529 function calls in 18.426 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      501    6.947    0.014   11.109    0.022 matroid.pyx:4477(components)
   125515    4.162    0.000    4.162    0.000 matroid.pyx:1351(circuit)
        1    3.968    3.968    3.968    3.968 matroid.pyx:4441(is_cosimple)
      500    3.216    0.006    3.246    0.006 matroid.pyx:3438(minor)
        1    0.027    0.027   14.356   14.356 matroid.pyx:4807(_is_3connected_beta)
     1000    0.026    0.000    0.044    0.000 matroid.pyx:1772(fundamental_cocircuit)
        1    0.026    0.026    0.026    0.026 matroid.pyx:4408(is_simple)
     1000    0.014    0.000    0.014    0.000 matroid.pyx:820(_is_basis)
      500    0.013    0.000    0.013    0.000 {method '_max_coindependent' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
      500    0.005    0.000    0.030    0.000 utilities.py:184(sanitize_contractions_deletions)
        1    0.005    0.005    0.005    0.005 matroid.pyx:2040(coloops)
      500    0.005    0.000    0.005    0.000 {method '_max_independent' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
     1000    0.003    0.000    0.017    0.000 matroid.pyx:1887(is_basis)
     1000    0.003    0.000    0.003    0.000 {method 'issuperset' of 'frozenset' objects}
     1000    0.002    0.000    0.002    0.000 {method 'difference' of 'frozenset' objects}
     1000    0.002    0.000    0.002    0.000 {method 'union' of 'frozenset' objects}
      500    0.001    0.000    3.248    0.006 matroid.pyx:3631(delete)
      500    0.000    0.000   11.060    0.022 matroid.pyx:4518(is_connected)
      500    0.000    0.000    0.000    0.000 {method 'isdisjoint' of 'frozenset' objects}
     1000    0.000    0.000    0.000    0.000 {method 'groundset' of 'sage.matroids.basis_exchange_matroid.BasisExchangeMatroid' objects}
        1    0.000    0.000    0.000    0.000 matroid.pyx:1811(loops)
        1    0.000    0.000   18.431   18.431 <string>:1(<module>)
        1    0.000    0.000   18.431   18.431 {method 'is_3connected_beta' of 'sage.matroids.matroid.Matroid' objects}
        1    0.000    0.000   18.431   18.431 matroid.pyx:4740(is_3connected_beta)
        3    0.000    0.000    0.000    0.000 matroid.pyx:1216(size)
        1    0.000    0.000    0.000    0.000 matroid.pyx:1236(rank)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

I will get it to return a separator and write some docs for the next update.

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 2, 2015

comment:17

Replying to @chaoxu:

What part of your code is currently using most of the time on these test cases?

For the named cases, _corank and minor operation taking the most of the time.
For the small 3 connected binary matroids, most time was spent on is_circuit, is_cosimple, is_simple.
For matroids with large rank (I generated a random rank 500 binary matroid), components take the most time.
Here is an time breakdown for a rank 500 binary matroid.

Thanks for sending that data. The many calls to circuits() are probably happening in the Matroids.components() routine. I think I may have written that routine, and looks as if I did not really optimize it for speed.

  • the method calls .circuit() in it's innermost loop, and then the use of _circuit() is preferred. The underscored versions of Matroid methods do the same as the original, but without the sanity checking of the input (in this case, testing if the input is a set of elements from the ground set).
  • circuit() is used to get a fundamental circuit, and I think we later created that optimized method for that: fundamental_circuit

So the following tweaks to components() may make a difference.

        B = self.basis()
        components = []
        for e in self.groundset() - B:
            C = set(self._fundamental_cocircuit(B,e))
            components2 = []
            for comp in components:
                if (C & comp):
                    C.update(comp)
                else:
                    components2.append(comp)
            components2.append(frozenset(C))
            components = components2
        components.extend([frozenset([e]) for e in self.loops()])
        return components

You could also consider rewriting a bit more, use the above when r(M) <= r^*(M) and the dual version (circuits) in the other case. That way the inner loop always has the least number of iterations. And if this routine really stays a bottleneck, a bitpacked version is always an option.

Since your routine is_3connected_beta also computes fundamental cocircuits, you would also gain a bit by using the unguarded _fundamental_cocircuit. It is perhaps wasteful to compute those fundamental cocircuits again in components and indirectly, in is_connected. You could consider rewriting these routines so that they take a set of fundamental circuits and/or a set of fundamental cociruits as input. Note that if Y is a fundamental cocircuit as in step 2, then the fundamental cocircuits of self.delete(Y) are obtained by deleting Y from the other fundamental cocircuits of self. So you could even avoid creating self.delete(Y) in step 2.

It is very, very strange indeed that a single call to is_cosimple takes that much time. I cannot explain how it can take any more time than is_simple. I'll look into it.

A quicker method to create the parallel class of e is:

parallel_class = M._closure([e])

The minor-taking/deletion is also expensive in your method. Improving the inefficiency of minor itself is probably more involved, and this would really deserve it's own ticket. It may be possible to reduce the number of times you construct a minor, e.g. by testing the rank of the minor before actually constructing it.

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 2, 2015

comment:18

Hi Chao,

somehow the inefficiency of components() became a major earworm for me today, so I coded a more efficient version for BasisExchangeMatroid (ticket #18591) to get it out of my system. The real inefficiency turned out to be in the set union operations, and calculating with bitsets rather than python sets works miracles for that.

I also felt a bit bad for luring you into my bad habits of premature code optimization, but by just using my patch you can stay focussed on testing correctness and enhancing functionality, which is probably better.

I realized why is_cosimple() is that much slower than is_simple() for BinaryMatroids (and Ternary and Quaternary). There is a specialized ultra fast method to get a fundamental cocircuit for such matroids, essentially copying the row of a bitpacked matrix (which is a bitset) into another bitset, which is the internal representation of the cocircuit. For circuits you are back to touching each of the groundset elements one by one, which is way slower. This asymmetry makes closure() faster that coclosure, and in turn is_simple faster than is_cosimple. I still need to see what is a clean way to solve that.

Rudi

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 17, 2015

comment:42

Hi Chao,

I think the code as such looks great and works great. Repair these last few issues and I'll give a positive review:

The docstring of is_3connected only mentions BC under ..ALGORITHM:. Please also describe the other method.

When _is_3connected_BC throws NotImplementedError, give a description of the problem , e.g. NotImplementedError("The Bixby-Cunningham algorithm does not return a separation")

In the docstring of _is_3connected_CE and _is_3connected_BC, there are several X and one T that need to have single accolades (now double). You can check the appearance of the docstring in the notebook by entering M._is_3connected_BC? on a matroid M.

Finally, I don't believe keeping&depricating the option separation is necessary in this case. This option was introduced in 6.8 beta and so never saw an official release. So if we make sure it's gone before 6.8 is released we should be fine.

If the others agree you can perhaps still change this. Otherwise I'll give the positive review after you settled the above 3 issues.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 17, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

183b216updated docs

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 17, 2015

Changed commit from 6a3484d to 183b216

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 17, 2015

comment:44
Error building the documentation.
Traceback (most recent call last):
  File "/Users/rudi/Documents/Development/sage-6.7/src/doc/common/builder.py", line 1626, in <module>
    getattr(get_builder(name), type)()
  File "/Users/rudi/Documents/Development/sage-6.7/src/doc/common/builder.py", line 292, in _wrapper
    getattr(get_builder(document), 'inventory')(*args, **kwds)
  File "/Users/rudi/Documents/Development/sage-6.7/src/doc/common/builder.py", line 503, in _wrapper
    x.get(99999)
  File "/Users/rudi/Documents/Development/sage-6.7/local/lib/python/multiprocessing/pool.py", line 558, in get
    raise self._value
OSError: [matroids ] docstring of sage.matroids.matroid.Matroid.is_3connected:33: WARNING: Bullet list ends without a blank line; unexpected unindent.

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 17, 2015

Changed reviewer from Michael Welsh to Michael Welsh, Rudi Pendavingh

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 17, 2015

Changed commit from 183b216 to 2372476

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 17, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

2372476indent problem

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 17, 2015

comment:46

Code compiles, tests succeed, docs build.

Positive review!

@vbraun
Copy link
Member

vbraun commented Jun 17, 2015

comment:47

merge conflict, probably #18448

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 18, 2015

Changed commit from 2372476 to 34be00b

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 18, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

34be00bMerge branch 'develop' into t/18539/faster_matroid_3_connectivity

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented Jun 18, 2015

comment:50

What can I do at this point to resolve the merge conflict?

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 18, 2015

comment:51

Hi Chao,

I am also still learning git so perhaps I'm not the right person to give expert advice. But take a look at the way a merge conflict is resolved in this recent ticket #17492. The way to go seems to be to merge the conflicting ticket #18448 into your code. You can do so by a git trac pull 18448 while working on your ticket (this is destructive so if you want to be safe, perform a merge of the two patches in a separate branch). The automatic merge will partially fail and you have to repair some of the source (probably matroid.pyx). Places where the automerger could not decide what to do will be indicated in the source with <<< and >>>.

I doubt that the manual merge will be very difficult in this case, the issue is probably that the new code in both tickets are more or less in the same location. If you want, I will attempt the merge for you.

Just merging develop into your branch as you just did may not suffice, since 18448 is not yet part of develop.

Cheers,
Rudi

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 18, 2015

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

edaf4c7Improved docstrings, added .is_binary() for BinaryMatroid and RegularMatroid
7ad1f5bDocstring improvements
503b1e7Finishing touch docstrings
63767a1One more ALGORITHM block
8967234Fixed latex in docstring
29e84daMerge branch 'u/Rudi/add_test_if_a_matroid_is_binary' of trac.sagemath.org:sage into u/Rudi/add_test_if_a_matroid_is_binary
ada6742Some changes, tweaks, and made a method public.
b9c3268Added method .binary_matroid() for Matroid, BinaryMatroid, RegularMatroid
fab68ccdeleted a white space that messed up unindent in docs
482ce85Merge branch 'u/Rudi/binary_matroid-18448' of git://git.sagemath.org/sage into t/18539/faster_matroid_3_connectivity

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 18, 2015

Changed commit from 34be00b to 482ce85

@chaoxu
Copy link
Mannequin Author

chaoxu mannequin commented Jun 18, 2015

comment:53

Thanks Rudi, I followed your steps.
Hopefully this solves all the conflicts.

@chaoxu chaoxu mannequin added s: needs review and removed s: needs work labels Jun 18, 2015
@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 18, 2015

comment:54

Yes, that should do it. Positive review again.

@sagetrac-Rudi
Copy link
Mannequin

sagetrac-Rudi mannequin commented Jun 18, 2015

Dependencies: 18448

@vbraun
Copy link
Member

vbraun commented Jun 19, 2015

Changed branch from u/chaoxu/faster_matroid_3_connectivity to 482ce85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant