Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid vector size bug during Tree-HDagg partitioning for certain input matrices #9

Open
learning-chip opened this issue Dec 21, 2022 · 1 comment

Comments

@learning-chip
Copy link
Contributor

learning-chip commented Dec 21, 2022

Bug description

The SpTrSv_LL_Tree_HDAGG case in SpTRSV_runtime.cpp example leads to error for certain input matrices:

Running LL HDAGG with BIN Code with #core: 4 - The runtime:7.7951e-05
core: 4; bin: 1
Compute DAG
terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()
Aborted (core dumped)

This happens, for example, with input matrix A = random_square_sparse(200, 0.5, 1.0, 2U);, and with Metis reordering disabled. Depending on the random parameters, some matrices fail while others run fine.

Locating bug

The error occurs inside the HDAGG function, at the following call with invalid vector size nlevels = -1:

std::vector<std::vector<int>> chunk_sizes(nlevels);

The invalid value -1 is returned by the build_levelSet_CSC function at line 44:

cur_level++; //all nodes_ with zero indegree are processed.
//assert(cur_level < n);
if (cur_level >= n)
return -1; // The input graph has a cycle
levelPtr[cur_level] = cur_levelCol; // The levelPtr starts from level 1. Behrooz
while (inDegree[begin] == 1) // Why? Behrooz
{
begin++;
if (begin >= n)
break;
}
while (inDegree[end] == 1 && begin <= end) // The same why as above. Behrooz
end--;
//Updating degrees after removing the nodes_
for (int l = levelPtr[cur_level - 1]; l < levelPtr[cur_level]; ++l) // I don't get this part. Behrooz
{
int cc = levelSet[l];
for (int j = Lp[cc]; j < Lp[cc + 1]; ++j)
{
if (Li[j] != cc) //skip diagonals
inDegree[Li[j]]--; //removing corresponding edges
}
}
//print_vec("dd\n",0,n,inDegree);

When error occurs, the variable state is cur_level == n == 2, then -1 is returned. Moreover, the code in line 45~64 is placed after return so will never get executed, so I guess some logic must be wrong here. The assert at line 42 should probably be enabled?

Steps to reproduce

Modify SpTRSV_runtime.cpp with the following input matrix:

n = 200;
double density = 0.5;
matrix_name = "Random_" + std::to_string(n);
A = random_square_sparse(n, density, 1.0, 2U);

...
// disable METIS
#undef METIS
#ifdef METIS
...
#else
  // avoid name conflict with another `tmp` variable later
  CSC *tmp_csc =
      make_half(Lower_A_CSC->n, Lower_A_CSC->p, Lower_A_CSC->i, Lower_A_CSC->x);
  delete Lower_A_CSC;
  Lower_A_CSC = tmp_csc;
  Lower_A_CSR = csc_to_csr(Lower_A_CSC);
#endif

I can also send a draft PR to show such bug, and a parameterized unit test to sweep over various input matrices (update: see #10).

Then, build and run:

cmake -B build_debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build_debug --target Hdagg_SpTRSV
./build_debug/example/Hdagg_SpTRSV   # get bug

Deeper look with GDB:

gdb ./build_debug/example/Hdagg_SpTRSV
(check state of `n` before entering loop)
break src/hdagg/hdagg.cpp:28
(check state of `cur_level` before returning -1)
break src/hdagg/hdagg.cpp:44

(the first break point is inside `SpTrSv_LL_HDAGG` case, not yet to `SpTrSv_LL_Tree_HDAGG`, so no bug)
run
print n
(n equals 200 for this input)

(the second break point is inside the buggy `SpTrSv_LL_Tree_HDAGG`)
c
print n
(n equals 2 because ngroups is used as first argument of HDAGG() function)

(the third break point is just before returning -1)
print cur_level
(cur_level equals 2, so -1 is returned as result of build_levelSet_CSC() function)

c
(crashes when creating a vector of size -1)
@cheshmi
Copy link
Collaborator

cheshmi commented Dec 21, 2022

Thanks for raising the issue.
@BehroozZare is the creator of HDagg. He might be able to help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants