Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer overflow for random DSCC matrices with a high dimension. #84

Closed
FlorentinD opened this issue Sep 21, 2020 · 11 comments
Closed

Integer overflow for random DSCC matrices with a high dimension. #84

FlorentinD opened this issue Sep 21, 2020 · 11 comments
Labels
verified Bug/Problem has been verified

Comments

@FlorentinD
Copy link
Collaborator

When trying to create a random sparse matrix of size 10^6 x 10^6, a bug occurs based on an integer overflow.

Test to validate:

    @Test
    public void hugeDimensionalMatrix() {
        int dim = 1_000_000;
        RandomMatrices_DSCC.rectangle(dim, dim, dim * 4,rand);
    }

The problematic lines are (in RandomMatrices_DSCC):

nz_total = Math.min(numCols*numRows,nz_total);
int[] selected = UtilEjml.shuffled(numRows*numCols, nz_total, rand);

Here numCols*numRows produces an Integer overflow.

@FlorentinD
Copy link
Collaborator Author

To catch the overflow, we could use Math.multiplyExact(numRows, numCols), but I didn't see, how to handle this case.

@lessthanoptimal
Copy link
Owner

How often do you need matrices larger than the integer max? I've been debating what to do here since the size of Java arrays being limited to 31-bits is getting more annoying and the Java development team doesn't seem to want to move quickly on this problem.

@FlorentinD
Copy link
Collaborator Author

FlorentinD commented Sep 25, 2020

For graphs it can be quite often, that the adjacency matrix has such dimensions.
A typical benchmark graph is the ldbc social network graph, with 3 Million vertices as scale-factor 1.
For example a matrix sized 46341 x 46341 already produces an overflow here.

However it occurs less often, that the adjacency matrix actually stores as 2^31-1 entries (e.g. number of edges).
Random matrices are mainly used for testing right?
F.i. for benchmarking my implementations, it would be useful if I could create matrices with higher dimensions.

I also think the method should throw, when nz_total is larger than Int.MAX_VALUE (or smaller than 0).

@FlorentinD
Copy link
Collaborator Author

I could also think of having a different kind of interface for generating sparse random matrices, e.g.generate(int numRows, int numRows, int avgDegree, someEnum degreeDistribution).
With degree I mean the average amount of entries per column (maybe we could find a better word).
Possible distributions would be uniform and powerlaw.
The reasoning here is, that a powerlaw distribution is common for many real world graphs.

This idea is based on https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/graph-generation/.

@lessthanoptimal
Copy link
Owner

lessthanoptimal commented Sep 29, 2020

It now checks to see if the shape exceeds the maximum size of an integer using the function below:

   /**
     * Checks to see if a matrix of this size will exceed the maximum possible value an integer can store, which is
     * the max possible array size in Java.
     */
    public static boolean exceedsMaxMatrixSize( int numRows, int numCols ) {
        if (numRows == 0 || numCols == 0)
            return false;
        return numCols > Integer.MAX_VALUE/numRows;
    }

@lessthanoptimal
Copy link
Owner

lessthanoptimal commented Sep 29, 2020

#91

Apologies for the PR I'm linking here. I formatted the code making it extremely difficult to identify the relevant change. I'm trying to clean up the code, but only when I manually inspect a class. I should be doing this in separate commits... I'll probably be regretting this decision when I'm hunting down regressions.

@lessthanoptimal
Copy link
Owner

I like the idea of a more structured random matrix. Always thought the simple approach here was klunky. I think I would need to look at their code to really understand what they are doing. Sounds a bit complex.

@FlorentinD
Copy link
Collaborator Author

FlorentinD commented Sep 29, 2020

Thank you, I reviewed your PR.

Than lets say, that the rectangle behaviour will not changed (just throwing an error as in your PR), and rather create a new method, which is based on avgDegree and distribution?
So we could track the progress on the new random matrix generator in a new issue 🙂

@FlorentinD
Copy link
Collaborator Author

The uniform distribution should be fairly simple and could be a starting point.
The col_idx[] follows a simple pattern: col_idx[i+1] = col_idx[i] + avgDegree.
For the row values java.util.Random.nextInt() could be used, as its uniformely distributed.

I would try to add this simple version on the weekend.

@lessthanoptimal
Copy link
Owner

yes let's do that! Let me know when the PR is ready.

@FlorentinD
Copy link
Collaborator Author

@lessthanoptimal The PR is ready 🙂

FlorentinD pushed a commit to FlorentinD/ejml that referenced this issue Oct 5, 2020
lessthanoptimal pushed a commit that referenced this issue Oct 6, 2020
- Add a random sparse matrix generator based on avg nz entries per column
- Drop/Select columns based on density
- based on issue #84
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
verified Bug/Problem has been verified
Projects
None yet
Development

No branches or pull requests

2 participants