Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: igraph_is_complete #2510

Merged
merged 3 commits into from Mar 2, 2024
Merged

Conversation

aagon
Copy link
Contributor

@aagon aagon commented Feb 24, 2024

Follow-up of #2490.

Copy link

codecov bot commented Feb 24, 2024

Codecov Report

Attention: Patch coverage is 89.18919% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 84.37%. Comparing base (8def9e3) to head (13c78d1).
Report is 96 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2510      +/-   ##
==========================================
- Coverage   84.37%   84.37%   -0.01%     
==========================================
  Files         374      375       +1     
  Lines       61881    61608     -273     
  Branches        0    12055   +12055     
==========================================
- Hits        52212    51980     -232     
+ Misses       9669     9628      -41     
Files Coverage Δ
src/properties/complete.c 89.18% <89.18%> (ø)

... and 111 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5eabcd9...13c78d1. Read the comment docs.

@szhorvat
Copy link
Member

Let me know when this is ready for review. To fix the CodeQL CI failure, you can rebase on the latest master branch (but this is not critical).

@szhorvat szhorvat added this to the 0.10.11 milestone Feb 24, 2024
interfaces/functions.yaml Outdated Show resolved Hide resolved
@aagon
Copy link
Contributor Author

aagon commented Feb 25, 2024

It can be reviewed now, or I can add the support for directed graphs, or even igraph_is_clique, whatever you prefer.

@szhorvat
Copy link
Member

I had a look at the code (not the test though). Generally, it is very well written.

Regarding the ordering of neighbour lists: it is indeed guaranteed that these are sorted. This is documented for igraph_neighbors() and some igraph code relies on it, including the code that "simplifies" the neighbour lists.

We are missing support for directed graphs. Can you please add this?

Regarding running igraph_is_simple(): On a simple graph, this function needs to check each edge. It basically performs the same traversal as the following code does, iterating through all neighbour lists. Thus if we call this function, and the graph is non-simple, we end up doing twice the amount of work we'd need to. However, on simple graphs it has no disadvantage, and it is a cheap and safe way to make good use of the cache.

Thus I agree with you that this is a good way to implement is_complete().

One more thing that is missing is an overflow check when computing the largest possible number of edges. What if vcount*vcount overflows? A quick way would be to use the macros from math/safe_intop.h, which even has a IGRAPH_SAFE_N_CHOOSE_2. However, these macros all throw an error on overflow, and this is not what we want. If the vertex count is so high that vcount*(vcount-1)/2 is not representable, then the graph obviously can't be complete. It would be better to compare vcount against the largest vertex count with which a complete graph is still representable with igraph. But I'm not sure how to compute this value at compile time ...

Maybe instead of computing it, we can just assume that we're working with either standard 32-bit or 64-bit signed integers, and hard-code the limit for these two cases, similar to how it's done here:

#if IGRAPH_INTEGER_SIZE == 32
/* Nothing else to do. */
#elif IGRAPH_INTEGER_SIZE == 64
k |= k >> 32;
#else
/* If values other than 32 or 64 become allowed,
* this code will need to be updated. */
# error "Unexpected IGRAPH_INTEGER_SIZE value."
#endif

@szhorvat
Copy link
Member

szhorvat commented Feb 26, 2024

So, in the undirected case, the max possible vcount is $2^{32} = 4294967296$ with 64-bit and $2^{16} = 65536$ with 32-bit integers.

In the directed case, this will be 3037000500 and 46341, respectively. I hope this helps.

A graph with a vcount strictly larger than these cannot be complete.

@szhorvat
Copy link
Member

szhorvat commented Feb 26, 2024

Finally, we still need to be smart about computing (vcount * vcount) - vcount) / 2 in case vcount is too close to the above mentioned values. We could do something like:

complete_ecount = vcount % 2 == 0 ? (vcount/2) * (vcount - 1) : vcount * ((vcount-1)/2)

This won't overflow, assuming that the result is representable, which we will be checking earlier.

@ntamas
Copy link
Member

ntamas commented Feb 26, 2024

Maybe instead of computing it, we can just assume that we're working with either standard 32-bit or 64-bit signed integers, and hard-code the limit for these two cases

I agree. It's unlikely that we'll ever want to support architectures that are not 32-bit or 64-bit anyway.

src/properties/complete.c Outdated Show resolved Hide resolved
@aagon
Copy link
Contributor Author

aagon commented Feb 27, 2024

Thank you for the review.

Regarding the ordering of neighbour lists: it is indeed guaranteed that these are sorted.

I did not mean the neighbour list, but the list of all edges in the graph, iterated on according to edge ID. Those are not always sorted according to the "lowest adjacent vertex -- highest adjacent vertex" scheme I was mentioning, unless I'm mistaken. I asked because it would have allowed to count the edges directly on this list, rather than rely on igraph_i_neighbours. This would have avoided to check edges twice on undirected graphs. But this was too complicated anyway.

We are missing support for directed graphs. Can you please add this?

Done.

In the directed case, this will be 3037000500 and 46341, respectively. I hope this helps.

This was most useful, I am glad not to have had to do this myself. I've added a comment on how those numbers are produced.

complete_ecount = vcount % 2 == 0 ? (vcount/2) * (vcount - 1) : vcount * ((vcount-1)/2)

This was very smart indeed, it is guaranteed to be very cheap.

I think that the result of this function should be cached in a cache entry "is_complete". If you agree, I will add a placeholder comment to remind myself to add caching once you're finished refactoring it.

@aagon aagon force-pushed the feat/igraph_is_complete branch 2 times, most recently from 1babfbc to 83b87c5 Compare February 27, 2024 06:13
Copy link
Member

@szhorvat szhorvat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few comments, but haven't had time to examine the test in detail yet.

Can you please include this function in the docs? I suggest adding it in docs/cliques.xxml. Feel free to propose a different location if you find one.

I'll get back to this later, including answering your questions.

src/properties/complete.c Outdated Show resolved Hide resolved
src/properties/complete.c Outdated Show resolved Hide resolved
tests/unit/igraph_is_complete.c Outdated Show resolved Hide resolved
src/properties/complete.c Outdated Show resolved Hide resolved
src/properties/complete.c Outdated Show resolved Hide resolved
tests/unit/igraph_is_complete.c Outdated Show resolved Hide resolved
tests/unit/igraph_is_complete.c Outdated Show resolved Hide resolved
tests/unit/igraph_is_complete.c Outdated Show resolved Hide resolved
tests/unit/igraph_is_complete.c Outdated Show resolved Hide resolved
@szhorvat
Copy link
Member

I think that the result of this function should be cached in a cache entry "is_complete". If you agree, I will add a placeholder comment to remind myself to add caching once you're finished refactoring it.

I thought about this a bit, and it seems to me that this property would not be needed very often. If the graph is simple, caching that already achieved O(1). So caching whether the graph is complete is relevant only for multigraphs.

The original motivation for caching was to avoid running the most common checks multiple times. There are several functions which need to check if the graph is simple, or connected, before running their algorithm. Thus these checks were frequently run redundantly.

Do you have a use case in mind where it is important to not repeat complete graph checks on a multigraph?

@aagon aagon force-pushed the feat/igraph_is_complete branch 2 times, most recently from 1d512b3 to 836f278 Compare February 28, 2024 23:06
@aagon
Copy link
Contributor Author

aagon commented Feb 28, 2024

Can you please include this function in the docs? I suggest adding it in docs/cliques.xxml.

Done. I'm fine with adding it there, but then maybe I should change the header file in which it is declared ? I'll let you decide.

Can you please add a very brief comment about what this function does, especially starting at _ /* If directed, we must add the mutuals manually */_? I'm a bit confused about this part.

I refactored the unit test, added comments, and used the occasion to remove all the unneeded memory management and calls to IGRAPH_CHECK.

If the graph is simple, caching that already achieved O(1).

Agreed. But in that case, we should try and cache the simplicity of a graph wherever we can. For instance, a graph generated with igraph_grg_game is necessarily simple, but this is not cached (I thought of this because this is how I generate my graphs, but there may be a lot of other places).

Do you have a use case in mind where it is important to not repeat complete graph checks on a multigraph?

No. I do not use multigraphs. I agree with your decision to not cache the completeness. But then, see my other point about caching simplicity much more aggressively (I can help track the places where this can be done, if you want).

@szhorvat
Copy link
Member

szhorvat commented Mar 1, 2024

@ntamas What do you think about adding an igraph_bool_t directed parameter here, and making it possible to treat directed graphs as undirected? Clique functions ignore edge directions (although they do issue a warning). Perhaps there is value in making this possible?

@szhorvat
Copy link
Member

szhorvat commented Mar 1, 2024

But then, see my other point about caching simplicity much more aggressively (I can help track the places where this can be done, if you want).

I'd be lying if I said I wasn't very tempted to do this, but experience has taught us that there is a rather high risk of introducing bugs. Caching bugs can have very bad consequences and could be very difficult to debug.

Examples:

  • There was a recent bug in igraph_barabasi_game() where it incorrectly generated multigraphs. I was surprised by this, I assumed it couldn't happen. But with complex functions there's always a chance of bugs.
  • Some functions have edge cases that are hard to miss. Does igraph_lattice() produce simple graphs? Usually yes, but if the lattice is periodic, and of size 1 or 2 in one direction, then it will have self-loops of multi-edges. The same goes for make_ring (cycle graphs).

I do wonder if igraph_add_edges() could verify that the graph stays simple without a performance penalty. Checking for self-loops is trivial, and definitely doable. Checking for multi-edges should be doable since this function does sort the edge list internally. If we can localize these checks to a single function (add_eges()) then all graph generators gain the benefit, and the risk is significantly reduced.

IMO keeping track of whether graphs are simple would be a huge benefit. Lots of checks become easier for simple graphs. EDIT: Opened an issue, #2522

@ntamas
Copy link
Member

ntamas commented Mar 2, 2024

What do you think about adding an igraph_bool_t directed parameter here, and making it possible to treat directed graphs as undirected?

Seems OK to me, we do this in many other places. But I th9nk we can merge the current PR as is and then we can add this later, right?

@szhorvat
Copy link
Member

szhorvat commented Mar 2, 2024

Yes, it is marked as experimental.

Copy link
Member

@ntamas ntamas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, this looks good to me then.

@szhorvat szhorvat marked this pull request as ready for review March 2, 2024 17:32
@szhorvat szhorvat merged commit 43f1468 into igraph:master Mar 2, 2024
25 checks passed
krlmlr added a commit to igraph/rigraph that referenced this pull request Mar 3, 2024
chore: update changelog
feat: igraph_is_complete() checks if a graph is complete (igraph/igraph#2510)
Merge pull request igraph/igraph#2525 from igraph/fix/linegraph-keep-loops
do not generate full scanner tables with flex
chore: improve readability of some code
refactor: igraph_degree() now uses the cache when checking for self-loops
refactor: minor readability cleanup for minimum_size_separators()
chore: add note about empty minimal st separator fix
tests: minor cleanup in regression tests
chore: update changelog
fix: correct `is_(minimal_)separator()` for disconnected graphs, fixes igraph/igraph#1480
tests: add more `is_separator()` tests
tests: fix premature memory deallocation in is_separator example
fix: igraph_all_minimal_st_separators() does not return empty separators any more, fixes igraph/igraph#2517
fix: GraphML reader does not accept duplicate attribute names any more, fixes igraph/igraph#2506
fuzzing: assertion readability in vertex separator fuzzer
fuzzing: improve vertex separator fuzzers
docs: doc improvements for ECC and Voronoi
refactor: minor prettification in trie code
fix: assert that keys in a trie are not NULL to make Coverity happy
refactor: improved error messages from C attribute handler (igraph/igraph#2513)
chore: updated changelog
fix: prevent insertion of empty keys in a trie
fix: prevent empty ID attributes for <key> tags in GraphML
docs: change mentions to pointer vectors to vector lists where appropriate & other improvements
fix: -CF causes buffer overflows with Flex, try -Cf instead
fix: bug_2506 test case should not be run if igraph is compiled without GraphML support
ci: run apt-get update in codeql-analysis pipeline
fix: fix attribute value comparison in igraph_read_graph_graphml()
fix: fix duplicate attribute detection in igraph_read_graph_graphml(), fixes igraph/igraph#2506
refactor: make use of IGRAPH_CHECK_OOM() in a few places
style: fix wording for a few warnings
refactor: more detailed error messages in C attribute handler
refactor: experimental `-Cf` option for Flex for better performance at the cost of larger code size. Hopefully this avoids fuzzer timeouts.
fix: `igraph_disjoint_union()` and `igraph_disjoint_union_many()` now check for overflow
chore: Update CONTRIBUTING.md
fuzzing: expand file format round-tripping fuzzers
fix: disallow writing invalid graph attributes to GML files, fixes igraph/igraph#2505
fuzzing: write_all fuzzer fixes
refactor: remove non-exposed igraph_maximum_matching() function whihc was already noted as removed in the 0.10 changelog
fuzzing: activate write_all fuzzer
fuzzing: add cohesive_blocks()
Merge pull request igraph/igraph#2504 from igraph/fix/writing-invalid-lgl-ncol-files
docs: document that `write_graph_graphml()` assumes that strings are UTF-8 encoded, fixes igraph/igraph#2503
docs: mention that the variation of information is in natural units
krlmlr added a commit to igraph/rigraph that referenced this pull request Mar 3, 2024
chore: update changelog
feat: igraph_is_complete() checks if a graph is complete (igraph/igraph#2510)
Merge pull request igraph/igraph#2525 from igraph/fix/linegraph-keep-loops
do not generate full scanner tables with flex
chore: improve readability of some code
refactor: igraph_degree() now uses the cache when checking for self-loops
refactor: minor readability cleanup for minimum_size_separators()
chore: add note about empty minimal st separator fix
tests: minor cleanup in regression tests
chore: update changelog
fix: correct `is_(minimal_)separator()` for disconnected graphs, fixes igraph/igraph#1480
tests: add more `is_separator()` tests
tests: fix premature memory deallocation in is_separator example
fix: igraph_all_minimal_st_separators() does not return empty separators any more, fixes igraph/igraph#2517
fix: GraphML reader does not accept duplicate attribute names any more, fixes igraph/igraph#2506
fuzzing: assertion readability in vertex separator fuzzer
fuzzing: improve vertex separator fuzzers
docs: doc improvements for ECC and Voronoi
refactor: minor prettification in trie code
fix: assert that keys in a trie are not NULL to make Coverity happy
refactor: improved error messages from C attribute handler (igraph/igraph#2513)
chore: updated changelog
fix: prevent insertion of empty keys in a trie
fix: prevent empty ID attributes for <key> tags in GraphML
docs: change mentions to pointer vectors to vector lists where appropriate & other improvements
fix: -CF causes buffer overflows with Flex, try -Cf instead
fix: bug_2506 test case should not be run if igraph is compiled without GraphML support
ci: run apt-get update in codeql-analysis pipeline
fix: fix attribute value comparison in igraph_read_graph_graphml()
fix: fix duplicate attribute detection in igraph_read_graph_graphml(), fixes igraph/igraph#2506
refactor: make use of IGRAPH_CHECK_OOM() in a few places
style: fix wording for a few warnings
refactor: more detailed error messages in C attribute handler
refactor: experimental `-Cf` option for Flex for better performance at the cost of larger code size. Hopefully this avoids fuzzer timeouts.
fix: `igraph_disjoint_union()` and `igraph_disjoint_union_many()` now check for overflow
chore: Update CONTRIBUTING.md
fuzzing: expand file format round-tripping fuzzers
fix: disallow writing invalid graph attributes to GML files, fixes igraph/igraph#2505
fuzzing: write_all fuzzer fixes
refactor: remove non-exposed igraph_maximum_matching() function whihc was already noted as removed in the 0.10 changelog
fuzzing: activate write_all fuzzer
fuzzing: add cohesive_blocks()
Merge pull request igraph/igraph#2504 from igraph/fix/writing-invalid-lgl-ncol-files
docs: document that `write_graph_graphml()` assumes that strings are UTF-8 encoded, fixes igraph/igraph#2503
docs: mention that the variation of information is in natural units
@szhorvat
Copy link
Member

szhorvat commented Mar 31, 2024

@aagon Would you be interested in implementing igraph_is_clique() as well? The issue is #2380

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants