Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check that matrix sizes are in the supported range before passing them to R #1066

Merged
merged 2 commits into from Jan 2, 2024

Conversation

szhorvat
Copy link
Member

@szhorvat szhorvat commented Jan 2, 2024

Fixes #889

This fix is theoretical at the moment. I can't seem to test it on macOS. The test would use a very large amount of memory.

> g<-make_empty_graph(.Machine$integer.max + 2)
> distances(g, v=1)
Error: vector memory exhausted (limit reached?)

@krlmlr @Antonov548 Can either of you try to test this on a large-memory Linux machine? I don't have access to one at this moment. Or otherwise do we merge as-is without a through test?

Copy link
Contributor

aviator-app bot commented Jan 2, 2024

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged using Aviator.


See the real-time status of this PR on the Aviator webapp.

Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@szhorvat szhorvat marked this pull request as ready for review January 2, 2024 13:22
@krlmlr
Copy link
Contributor

krlmlr commented Jan 2, 2024

Instead of a one-off test, can we instead implement a function that checks matrix dimensions, test the behavior of that function (in an automated test), and use it where needed?

If we don't want to expose this function to R, we can use the catch testing framework in testthat: https://testthat.r-lib.org/reference/use_catch.html .

@szhorvat
Copy link
Member Author

szhorvat commented Jan 2, 2024

Each matrix type needs a different implementation, so this can't be reduced to calling a single function three times

@krlmlr
Copy link
Contributor

krlmlr commented Jan 2, 2024

Just the part where ncol and nrow is checked, perhaps?

@szhorvat
Copy link
Member Author

szhorvat commented Jan 2, 2024

test the behavior of that function (in an automated test),

How do you intend to test a C function from R? R doesn't even support the integer type that the function would be operating with. What would be the point of testing a function that just compares two integers and reports an error? I am not sure how to proceed here.

I managed to test manually that the functionality works after setting R_MAX_VSIZE=100Gb, using the command line instead of RStudio, and using a barebones function like layout_in_circle(). The R process used over 120 GB of memory.

> g<-make_empty_graph(n=.Machine$integer.max+1)
> l<-layout_in_circle(g)
Error in layout_in_circle(g) : 
  At rinterface_extra.c:2588 : igraph returned a matrix of size 2147483648 by 2. R does not support matrices with more than 2147483647 rows or columns. Failed

Comment on lines +2583 to +2590
/* Assuming that this function is called in a context where
* igraph_error() does not return. */
if (nrow > INT_MAX || ncol > INT_MAX) {
igraph_errorf("igraph returned a matrix of size %" IGRAPH_PRId " by %" IGRAPH_PRId ". "
"R does not support matrices with more than %d rows or columns.",
__FILE__, __LINE__, IGRAPH_FAILURE,
nrow, ncol, INT_MAX);
}
Copy link
Contributor

@krlmlr krlmlr Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be extracted into a function, say, R_igraph_check_matrix_dims(igraph_integer_t nrow, igraph_integer_t ncol) . This function becomes testable, e.g., through an R wrapper that we export just for testing. (Overkill, perhaps.)

We could also go one step further and make this function sensitive to a global switch (one that's perhaps only active in debug mode). The global switch would change the definition of INT_MAX for this function so that we can even test that (some) functions that return a matrix actually go through this code path, without allocating tons of memory. (Overkill, perhaps.)

But these two steps combined allow creating a one-off version of igraph that doesn't require allocating hundreds of gigabytes to check the functionality.

No action needed here, but for the future, should we strive to take testability into account when implementing?

Copy link
Member Author

@szhorvat szhorvat Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we strive to take testability into account when implementing

This PR fixes a bug that manifests itself in difficult to test scenarios. Nevertheless it is a bug that should be fixed. I did verify the fix manually.

This function becomes testable, e.g., through an R wrapper that we export just for testing.

The function would take arguments of type igraph_integer_t. This type is not supported by R. Using double makes things complicated and increases the chance of mistakes (not every igraph_integer_t value is representable as a double).

So I'm not sure how to do this testing in a productive way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the merge queue, thanks for working on this.

Happy to elaborate if this situation occurs on an issue from https://github.com/igraph/rigraph/milestone/14.

@krlmlr
Copy link
Contributor

krlmlr commented Jan 2, 2024

Thanks!

@aviator-app aviator-app bot merged commit f1df4c2 into main Jan 2, 2024
25 checks passed
@aviator-app aviator-app bot deleted the fix/matrix-checks branch January 2, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need to guard against too-large matrices
2 participants