Integrate Cliquer library #888

Merged
merged 3 commits into from Apr 10, 2016

Projects

None yet

4 participants

@szhorvat
Contributor

This patch is not yet complete, but I'm posting early so I can get comments on how to proceed. In particular, I need comments on the proposed new functions.

Plan of attack:

  • Add the library files and use in igraph_cliques

  • Add Cliquer-specific functionality, namely:

    igraph_clique_size_hist(igraph_t *, igraph_vector_t *res, igraph_integer_t min_size, ingraph_integer_t max_size); // compute histogram of clique sizes
    igraph_cliques_callback(igraph_t *, igraph_integer_t min_size, igraph_integer_t max_size, igraph_clique_handler_t *handler); // call handler for each clique found
    
    // find cliques in given weight range
    igraph_weighted_cliques(const igraph_t *graph,
                const igraph_vector_t *vertex_weights, igraph_vector_ptr_t *res,
                igraph_real_t min_weight, igraph_real_t max_weight, igraph_bool_t maximal); 
    
     // call handler for each clique found, weighted version
    igraph_weighted_cliques_callback(igraph_t *, igraph_vector_int_t *vertex_weights, igraph_integer_t min_weight, igraph_integer_t max_weight, igraph_bool_t maximal, igraph_clique_handler_t *handler);
    
    // find clique(s) having largest weight
    int igraph_largest_weighted_cliques(const igraph_t *graph,
                const igraph_vector_t *vertex_weights, igraph_vector_ptr_t *res);
    
     // maximum weight any clique has
    igraph_weighted_clique_number(igraph_t *, igraph_vector_int_t *vertex_weights, igraph_integer_t *res);
    

    For the weighted versions, the maximal flag controls whether only maximal cliques should be considered. I opted to keep the weighted versions as separate functions, as this functionality is somewhat specialized. We want the flexibility to swap out the clique finder implementation to something else in the future. Also note that weights are integers.

  • Make it interruptible, possibly add progress reporting

  • Get rid of Unix-specific code in Cliquer

  • Update igraph_independent_vertex_sets to use the new implementation

  • Add some tests

Cliquer can find maximal cliques, but it is not faster than igraph in that area. Thus I only exposed maximal clique functionality for the weighted versions.

@ntamas ntamas referenced this pull request Oct 24, 2015
Closed

igraph_cliques is slow #887

@szhorvat
Contributor

I just want to note that the test failure is due to the different ordering of cliques as returned by Cliquer. It does not return them from smallest to largest. I'll take care of the tests at the end (I'm using Mathematica-based tests privately).

@ntamas
Member
ntamas commented Oct 25, 2015

The lucky thing is that (I think) we never declared that igraph_cliques will return the cliques from smallest to largest, so we do not need to preserve this behaviour.

The proposed signatures are okay, there's only one thing that struck my eye. Using an igraph_vector_int_t for the vertex weights implies that the weights are going to be integers (and if we ever find a library that supports fractional weights, we won't be able to swap out the implementation without breaking the API). Can we simply make it an igraph_vector_t and then complain with an error code if the weights are not integers (assuming that Cliquer supports integer weights only)?

@szhorvat
Contributor

@ntamas So then otherwise you're okay with the proposal?

Yes, Cliquer supports integer vertex weights only. Personally I don't need the weighted clique functionality, I just thought, why not expose it if it's there? I'll change the argument type to igraph_vector_t, and document that values will be rounded to integers.

Questions:

  1. Do we really want to check that vertex weights are integers, and warn if they're not? I'd rather leave this to the high-level interfaces.
  2. Cliquer can use several vertex-ordering heuristics to speed up the search. Do we want to expose these? I propose sticking with the default.
@ntamas
Member
ntamas commented Oct 27, 2015

Yes, I'm okay with the proposal.

Re weighted cliques: if Cliquer supports it, then it makes sense to expose this functionality throught the igraph API as well. The weights should be checked in the C layer - otherwise all the maintainers of all the higher level interfaces (R, Python, Mathematica etc) would have to add the checks in the higher level language to prevent invalid values from leaking into the C core, and there is a high chance that someone would forget about it.

Re vertex-ordering heuristics: let's stick to the defaults for the time being. I don't want to expose this functionality via a "generic" interface like igraph_cliques since it's a detail that is dependent on the internal implementation. If we want to expose this functionality in the future, we can always create a Cliquer-specific function (e.g., igraph_cliques_via_cliquer) that adds parameters for the heuristics.

@szhorvat
Contributor

I need a bit of help. How can I get the typedef igraph_clique_handler_t (added in igraph_cliques.h) to show up in the documentation? If I add <!-- doxrox-include igraph_clique_handler_t --> in cliques.xxml, I get an error when doing make html:

Traceback (most recent call last):
  File "../doc/doxrox.py", line 271, in <module>
    main()
  File "../doc/doxrox.py", line 133, in main
    docchunks[chunk.group(1)]
KeyError: 'igraph_clique_handler_t'
@ntamas
Member
ntamas commented Nov 12, 2015

There are two things that you could try in igraph_cliques.h - not sure if either of these makes a difference, but considering that the parser in doxrox is pretty crude, these changes could easily make a difference.

  1. Try removing the parentheses around igraph_clique_handler_t in the typedef - maybe doxrox parses it as (igraph_clique_handler_t) as it is now.
  2. Make sure that the line after the \typedef in the comment starts with \brief. See, e.g., the comment of igraph_progress_handler_t in include/igraph_progress.h as a template.
@szhorvat
Contributor

I'm a bit behind with this due to lack of time and due to last work because of a git mistake I made ...

I need some advice regarding function naming. At first I wanted to have weighted equivalents for both igraph_largest_cliques and for igraph_clique_number, but I am not sure how to name these functions to avoid confusion.

I am also open to suggestions about what functionality to include. Knowing the weight of the highest weight clique may not in itself be useful, without also knowing which vertices are part of that clique. There's also the fact that in the weighted case there will often be only a single largest weight clique (though not always, keep in mind that so far only integer weights will be supported). So we would probably want a subset of these the following: find all maximum weight cliques, find a single maximum weight clique, find the maximum weight without returning its vertices.

Opinions?

@gaborcsardi
Member

How about igraph_largest_weigthed_cliques and igraph_weighted_clique_number?

As for functionality, I don't know Cliquer, so I don't know what makes sense.

In general it is good to start with sg smaller, and then add stuff in subsequent PRs. I understand that you want to design the API of all functions together, but even then, you don't necessarily need to implement, test, etc. all of them for the first PR. This way you also get feedback earlier.

@szhorvat
Contributor

Thanks! I'd also prefer multiple smaller pull requests, so if you're okay with it, I'll do that. I thought that a single complete PR is preferred.

@gaborcsardi
Member

Well, to elaborate, ultimately complete PRs are preferred, i.e. with tests and docs, but they can still be small.

Also, initial PRs, to get feedback, don't need to be complete.

@szhorvat
Contributor

I am trying to get this pull request ready, but I need help with two things:

  1. Something strange happened when I tried to merge the changes from the master branch, and now all intermediate commits show up in the pull request (even though only the last three, labelled as Cliquer, are really part of this pull request and not yet on master). Is this a problem? If yes, do you have any hints on how I can fix it? Should I just use "rebase" and delete them (since they hopefully do not conflict with the Cliquer commits)?
  2. I am unable to put igraph_clique_handler_t in the documentation, even after following Tamás's suggesting from Nov 12 above. I get the same error as before (see previous comment). Could you help with fixing this? The current state of the pull request will trigger the error.

I still need to add C tests (I do have test in Mathematica, so I'm confident that everything works well), and I would like to leave updating igraph_independent_vertex_sets and adding igraph_weighted_cliques_callback to a future pull request. I updated the original pull request comment with the current function names and prototypes.

@ntamas
Member
ntamas commented Feb 23, 2016

Something strange happened when I tried to merge the changes from the master branch, and now all intermediate commits show up in the pull request (even though only the last three, labelled as Cliquer, are really part of this pull request and not yet on master). Is this a problem?

No, we'll mentally ignore them ;)

I am unable to put igraph_clique_handler_t in the documentation, even after following Tamás's suggesting from Nov 12 above.

Okay, I checked your branch and the problem is that when doxrox.py generates cliques.xml from cliques.xxml, it only looks at the files specified in the doxrox.py invocation as arguments (in our case, cliques.c and maximal_cliques.c). If the functions or types that you are referring to are documented in other files, you have to add the paths to these files as well when calling doxrox.py. So, you will need to update the doxrox.xml rule in doc/Makefile.am and add any new source files that you are referring to.

@szhorvat
Contributor
szhorvat commented Mar 1, 2016

@ntamas @gaborcsardi I consider this pull request ready for merging/review now.

A final summary:

  • non-maximal clique finding now uses Cliquer and is much faster
  • maximal clique finding is not updated to use Cliquer because the current igraph implementation is faster
  • examples/benchmarks now has a benchmark for igraph_cliques; bench.h needed a small fix to compile
  • no old code is removed (the old implementation is still there and used for finding independent vertex sets)
  • functions for weighted cliques (both non-maximal and maximal uses Cliquer)
  • only positive integer vertex weights are supported, but the API uses igraph_real_t for future extensibility; non-integer weights are truncated (not rounded) to their integer part
  • igraph_weighted_cliques_callback is not included (planned for later)
  • independent vertex set finding is not updated to use Cliquer (planned for later)

I fixed my git problem and the pull request is back to only 3 commits.

I ran the included tests on OS X only. I ran a Mathematica-based test suite for the same code also with MinGW-w64 (Windows), Linux x86_64 and Raspberry Pi (32-bit Linux).

@szhorvat
Contributor
szhorvat commented Apr 3, 2016

Hi guys, if there's anything I can do to make it easier to review this patch, please let me know! I have a couple of other things in the pipeline and I find it difficult to work with two parallel patches, so merging this would be helpful ...

@Qix-
Qix- commented Apr 4, 2016

@szhorvat you could always develop new patches on top of this PR and submit them with a note that states they depend on this PR. Github should pick up/omit the commits that are merged when this PR is landed.

@szhorvat
Contributor

@Qix- The problem is that after a review of this PR it may turn out that it needs additional work before it can be merged. Then I would have to edit the commits manually, which is both painful and risky if something else is also built on top of it.

Well, I should say it's risky if one is inexperienced with git like me. I started work on this before my previous PR was merged so I had to juggle two. I ended up losing quite a bit of work because I messed up a git rebase and I had no backup ... So I'm not really willing to work on the next one until this one is merged.

NB. Next one is an update of LAD which fixes bugs and significantly improves performance.

@gaborcsardi
Member

Very cool, thanks much!

@gaborcsardi gaborcsardi merged commit 0032faa into igraph:master Apr 10, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment