
ERROR: bucket overflow #55

Closed · victoryc opened this issue Apr 4, 2021 · 8 comments

victoryc commented Apr 4, 2021

I occasionally get the following error from code that normally works fine. Any idea what causes it?

526/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
512/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
514/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
516/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
528/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
530/512 ERROR: bucket overflow! (n=823, bucket_num=2/4, key=0)
...

An observation that may or may not be relevant: the error seems to occur when the number of coordinates in the sparse tensor is small. The same code ran fine, without producing the error above, when a larger input with more coordinates was fed in.

zhijian-liu self-assigned this Apr 5, 2021
zhijian-liu commented

Could you provide us with a sample code and data that can reproduce this error? Thanks!

victoryc commented Apr 5, 2021

My program is quite big (spanning multiple files) and the error occurs only occasionally, so I am not in a position to provide simple sample code that reproduces it right now.

Instead, I have tried to narrow it down; this is what I have found so far.
The error occurs when the torchsparse_backend.query_forward function is called at line 21 of functional/query.py. The top of the call stack:

forward (/home/username/projects/detect3d/myproj/torchsparse/torchsparse/nn/functional/query.py:20)
sphashquery (/home/username/projects/detect3d/myproj/torchsparse/torchsparse/nn/functional/query.py:40)
conv3d (/home/username/projects/detect3d/myproj/torchsparse/torchsparse/nn/functional/conv.py:144)
forward (/home/username/projects/detect3d/myproj/torchsparse/torchsparse/nn/modules/conv.py:72)
_call_impl (/home/username/anaconda3/envs/myproj/lib/python3.8/site-packages/torch/nn/modules/module.py:889)

The variable values in that function when the error occurs: C=823, hash_query.shape=([27, 823])

victoryc commented Apr 8, 2021

I found out the cause of this in the code.
In query.cpp, the following line is used to compute the size of the table:
int table_size = 2 * pow(2,ceil(log2((double)n)));
In hashmap.cuh, the number of buckets is computed to be table_size/BUCKET_SIZE, where BUCKET_SIZE is 512.

When the hash values of the coordinates are evenly distributed, there is no problem. Taking table_size to be twice the next power of 2 greater than or equal to n, as done in the line above, provides some tolerance for the case where the hash values are not evenly distributed, but that is only a heuristic. When the hash values are distributed more unevenly than that tolerance can absorb, the code fails with the "ERROR: bucket overflow!" messages.
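
To make the arithmetic concrete, here is a small standalone sketch (an illustration, not the torchsparse source itself) of how n = 823 ends up with only 4 buckets of 512 slots each, so a sufficiently skewed hash distribution overflows one of them:

  #include <cmath>
  #include <cstdio>

  const int BUCKET_SIZE = 512;  // mirrors the constant in hashmap.cuh

  int main() {
      int n = 823;  // number of coordinates from the error message above
      // table_size = 2 * (next power of 2 >= n), as in query.cpp
      int table_size = 2 * (int)pow(2, ceil(log2((double)n)));  // 2048
      int num_buckets = table_size / BUCKET_SIZE;               // 4
      printf("table_size=%d, num_buckets=%d\n", table_size, num_buckets);
      // If the hash function sends more than BUCKET_SIZE of the 823 keys
      // into one of these 4 buckets, insertion reports "ERROR: bucket
      // overflow!", which matches the "bucket_num=2/4" in the log above.
      return 0;
  }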

victoryc commented Apr 8, 2021

For the benefit of others running into this, let me mention how I addressed it for now. There used to be the following line in query.cpp:
int table_size = 2 * pow(2,ceil(log2((double)n)));
I replaced it with:

  const int nextPow2 = pow(2,ceil(log2((double)n)));
  // When n is large, the hash values tend to be more evenly distributed, and choosing table_size to be
  // 2 * nextPow2 typically suffices. For smaller n, the effect of an uneven distribution of hash values is more
  // pronounced, so we choose table_size to be 4 * nextPow2 to reduce the chance of bucket overflow.
  int table_size = (n < 2048) ? 4 * nextPow2 : 2 * nextPow2;

This way, for larger n, where memory usage matters more, table_size is computed the same as before. For smaller n, where the effect of unevenly distributed hash values is more pronounced and memory usage matters less, table_size is computed to be larger than before, reducing the chance of a bucket overflow error.

Original authors: I will submit a pull request with this change, but I won't be offended :-) if you reject it, since the change only reduces the chances of the bucket overflow error and doesn't address the issue in a robust, thorough way.

zhijian-liu commented

@victoryc, thank you so much for investigating this issue! I was discussing the same solution with @kentangSJTU this morning. I will be happy to take a look after you submit the PR. We are also thinking about making this a tunable knob so that users can change it based on their needs. Taking it one step further, it might also be possible to adjust the table size dynamically if there is an overflow.
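
For illustration, such a knob could look something like the sketch below; the environment variable name TORCHSPARSE_HASH_TABLE_MULTIPLIER is purely hypothetical and not an existing torchsparse option:

  #include <algorithm>
  #include <cmath>
  #include <cstdlib>

  // Hypothetical sketch: read the table-size multiplier from an environment
  // variable instead of hard-coding 2 (or 4) in query.cpp.
  int choose_table_size(int n) {
      int multiplier = 2;  // current default
      if (const char* env = std::getenv("TORCHSPARSE_HASH_TABLE_MULTIPLIER")) {
          multiplier = std::max(2, std::atoi(env));
      }
      int next_pow2 = (int)pow(2, ceil(log2((double)n)));
      return multiplier * next_pow2;
  }

The dynamic variant would be one step further: if insertion reports a bucket overflow, rebuild the table with a doubled multiplier and retry.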

victoryc closed this as completed Apr 8, 2021
harrydobbs commented

Hi there,
I am getting the same error. Are you still planning on adding a knob to tune?
Cheers,
Harry

zhijian-liu commented

Hi @harry1576, it's not in our plans for the near future. As a workaround, you may follow #55 (comment) to enlarge the table size further.
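
For example (my own illustration of "enlarge further", not an official recommendation), you could raise the multiplier in the modified line of query.cpp:

  // e.g. use 8 * nextPow2 instead of 4 * nextPow2 for small n; adjust to your data
  int table_size = (n < 2048) ? 8 * nextPow2 : 2 * nextPow2;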

Akola-Mbey-Denis commented

I followed the suggestions in this issue, but that did not fix it. Is there an alternative solution to this problem?
