
Checking if bev_pool is compiled properly #63

Closed
Divadi opened this issue Jul 8, 2022 · 12 comments


Divadi commented Jul 8, 2022

Hello, thank you for releasing the code.

I was trying to use bev_pool in other projects, but I found that my compilation of bev_pool doesn't seem to be yielding expected results. For a toy example:

device = "cuda:4"
bev_pool(
    torch.tensor([[5.0]], device=device),
    torch.tensor([[0, 0, 0, 0]], device=device),
    1, torch.tensor(1, device=device), torch.tensor(1, device=device), torch.tensor(1, device=device))

the output is

tensor([[[[[0.]]]]], device='cuda:4')

when I would expect it to be 5.0.

Please let me know if I have incorrectly used the function.
My environment is PyTorch 1.10.1, cudatoolkit 11.3.1, A6000 GPU.
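For reference, here is a pure-PyTorch sketch of what I assume bev_pool computes (the (x, y, z, batch) coordinate column order and the output layout are my guesses from the LSS convention, not the repo's documented API):

import torch

def bev_pool_reference(feats, coords, B, D, H, W):
    # Pure-PyTorch sketch: sum all features that land in the same
    # (batch, z, x, y) voxel. Assumes coords columns are (x, y, z, batch)
    # per the LSS convention, and that B/D/H/W are plain ints.
    C = feats.shape[1]
    out = feats.new_zeros(B, D, H, W, C)
    x, y, z, b = coords.long().unbind(1)
    out.index_put_((b, z, x, y), feats, accumulate=True)
    return out.permute(0, 4, 1, 2, 3).contiguous()  # (B, C, D, H, W)

With the toy inputs above (and D = H = W = 1 as plain ints), this returns tensor([[[[[5.]]]]]), which is what I expect from the compiled op as well.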

Thank you!


Divadi commented Jul 8, 2022

Actually, it seems my compilation is okay; evaluating the camera-only baseline yields:

mAP: 0.3151                                                                                                                                                                                               
mATE: 0.7155
mASE: 0.2742
mAOE: 0.5419
mAVE: 0.8821
mAAE: 0.2595
NDS: 0.3902
Eval time: 92.3s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.498   0.570   0.161   0.127   0.989   0.241
truck   0.265   0.737   0.210   0.142   0.838   0.233
bus     0.341   0.728   0.197   0.083   1.578   0.299
trailer 0.147   0.970   0.232   0.529   0.659   0.062
construction_vehicle    0.076   0.955   0.487   1.043   0.106   0.391
pedestrian      0.348   0.748   0.304   1.388   0.863   0.755
motorcycle      0.272   0.720   0.260   0.557   1.620   0.084
bicycle 0.215   0.597   0.271   0.868   0.403   0.010
traffic_cone    0.495   0.593   0.332   nan     nan     nan
barrier 0.495   0.537   0.287   0.139   nan     nan

which is lower than expected (mAP 33.25, NDS 40.15) but still non-trivial.

Is my usage incorrect by any chance?

kentang-mit self-assigned this Jul 8, 2022

@kentang-mit (Contributor)

That's quite interesting. I actually did not test bev_pool on small toy examples; I just integrated it into our pipeline and trained the entire network, so there might be some boundary cases where I made mistakes in the implementation.

Regarding the evaluation results, may I ask how many GPUs you are using? I also think the compilation should be correct, but such an accuracy drop looks unexpected to me.


Divadi commented Jul 8, 2022

Evaluation uses 4 GPUs.

Actually, bev_pool is behaving very strangely for me. When used as part of the pipeline, it yields reasonable results. So I tried adding

import pickle
pickle.dump([feats, coords, B, D, H, W, x], open(PICKLE_PATH, 'wb+'))
assert False

right after

x = x.permute(0, 4, 1, 2, 3).contiguous()

Then, I made another file loading the pickle results

import torch
from mmdet3d.ops import bev_pool
import pickle

def load_pickle(f):
    return pickle.load(open(f, 'rb'))

feats, coords, B, D, H, W, x = load_pickle(PICKLE_PATH)
k = bev_pool(feats, coords, B, D, H, W)

print((k != 0).sum(), (x != 0).sum())

And for some reason, the results are different!

tensor(0, device='cuda:2') tensor(4805600, device='cuda:2')

I've never had this issue with CUDA operations before, and I'm not quite sure how to go about debugging it, since the op clearly works as part of the entire pipeline but not on its own.
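To pin this down, a hypothetical sanity check (extending the loading script above and reusing the bev_pool_reference sketch from my first comment, so the same caveats about assumed semantics apply):

# Hypothetical check: compare the compiled op against the pure-PyTorch
# reference on the pickled inputs; a silently failing kernel shows up
# as an all-zero output.
ref = bev_pool_reference(feats, coords, B, int(D), int(H), int(W))
print(torch.allclose(k, ref), (ref != 0).sum())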


Divadi commented Jul 8, 2022

Another detail: when I paste the toy example

device = x.device
a = bev_pool(
    torch.tensor([[5.0]], device=device),
    torch.tensor([[0, 0, 0, 0]], device=device),
    1, torch.tensor(1, device=device), torch.tensor(1, device=device), torch.tensor(1, device=device))
print(a)
assert False

and run it as part of the pipeline by pasting it after this line

x = bev_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])

the correct result is printed.

Is it possible that there's something wrong with my installation?

@kentang-mit (Contributor)

I'm still working on that. I will get back to you once I've finished investigating this issue.


kentang-mit commented Jul 11, 2022

Hi @Divadi,

I looked into this issue recently. Would you mind trying out

CUDA_VISIBLE_DEVICES=4 python [your script].py

and modifying the device to cuda:0? Besides, I've pushed a new commit to the repo; would you mind also trying out the latest version?
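For context on why the first command might help: my guess (an assumption, not verified yet) is that the extension launches its kernel on the current CUDA device, which defaults to cuda:0, while your tensors live on cuda:4, so the kernel silently operates on the wrong device. If so, a Python-side workaround sketch would be:

import torch
from mmdet3d.ops import bev_pool

# Workaround sketch, assuming a missing device guard in the extension:
# make the input's device current so the kernel launches where the data is.
with torch.cuda.device(feats.device):
    out = bev_pool(feats, coords, B, D, H, W)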

Best,
Haotian

@kentang-mit (Contributor)

By the way, for multi-gpu evaluation, would you mind also exploring these two directions?

  • First, let's see whether things work out if you use all the available GPUs on your machine. I would assume that your machine has >4 GPUs because you have cuda:4.

  • Second, let's see whether the results are correct if you evaluate with only one GPU.


Divadi commented Jul 11, 2022

> Hi @Divadi,
>
> I looked into this issue recently. Would you mind trying out
>
> CUDA_VISIBLE_DEVICES=4 python [your script].py
>
> and modifying the device to cuda:0? Besides, I've pushed a new commit to the repo; would you mind also trying out the latest version?
>
> Best, Haotian

Before the change, with the toy example above:

$ CUDA_VISIBLE_DEVICES=4 python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:0')
$ python tools/tmp.py 
tensor([[[[[0.]]]]], device='cuda:4')

After the change:

$ CUDA_VISIBLE_DEVICES=4 python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:0')
$ python tools/tmp.py 
tensor([[[[[5.]]]]], device='cuda:4')

Seems like that was the issue; really odd, but good catch!

> By the way, for multi-gpu evaluation, would you mind also exploring these two directions?
>
>   • First, let's see whether things work out if you use all the available GPUs on your machine. I would assume that your machine has >4 GPUs because you have cuda:4.
>   • Second, let's see whether the results are correct if you evaluate with only one GPU.

I'll look into this soon; I need a bit of time.


Divadi commented Jul 13, 2022

@kentang-mit

> By the way, for multi-gpu evaluation, would you mind also exploring these two directions?
>
>   • First, let's see whether things work out if you use all the available GPUs on your machine. I would assume that your machine has >4 GPUs because you have cuda:4.
>   • Second, let's see whether the results are correct if you evaluate with only one GPU.

When evaluating with just one GPU or with all GPUs, the results are the same as before.

@kentang-mit (Contributor)

Thanks for the update. I'll investigate that.


Divadi commented Jul 16, 2022

@kentang-mit
Hi, I have addressed the issue. The problem was that my installation had Pillow 9.2.0, while the repository requires 8.4.0 to function properly. More details can be found in HuangJunJie2017/BEVDet#41.

I think Pillow 8.4.0 should be listed as an important requirement (sorry if I missed it).
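If it helps others, a minimal startup check (a sketch; the version sensitivity presumably comes from changes in Pillow's image-resizing behavior across versions) could be:

# Sketch: fail fast if the environment has the wrong Pillow version.
import PIL
assert PIL.__version__ == "8.4.0", f"found Pillow {PIL.__version__}, expected 8.4.0"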

New results:

mAP: 0.3325
mATE: 0.6828
mASE: 0.2717
mAOE: 0.5379
mAVE: 0.9040
mAAE: 0.2505
NDS: 0.4015
Eval time: 89.4s

Per-class results:
Object Class    AP      ATE     ASE     AOE     AVE     AAE
car     0.523   0.541   0.159   0.124   0.969   0.225
truck   0.280   0.704   0.208   0.131   0.911   0.233
bus     0.353   0.681   0.191   0.084   1.559   0.296
trailer 0.167   0.985   0.233   0.504   0.660   0.052
construction_vehicle    0.082   0.859   0.481   1.056   0.121   0.364
pedestrian      0.367   0.724   0.303   1.393   0.863   0.753
motorcycle      0.296   0.721   0.256   0.547   1.768   0.073
bicycle 0.237   0.577   0.270   0.862   0.382   0.007
traffic_cone    0.517   0.524   0.332   nan     nan     nan
barrier 0.503   0.513   0.284   0.140   nan     nan

Divadi closed this as completed Jul 16, 2022
@kentang-mit (Contributor)

Thank you for the very important hint. I'll add that to the README immediately!
