
Why do you evaluate on cpu(not on gpu)? #29

Open
Jumabek opened this issue May 31, 2018 · 1 comment

Comments

Jumabek commented May 31, 2018

Hi @kuangliu,
In the evaluation code you are decoding boxes on the CPU: https://github.com/kuangliu/torchcv/blob/master/examples/fpnssd/eval.py#L57
My question is: why not use the GPU?

UPDATE 1: the measurement below is actually incorrect, because at the beginning of evaluation most of the time is spent waiting for the images to load.

I loaded the anchor boxes onto the GPU and did box decoding on the GPU with batch_size=1.
Here is the result:

CPU: 17 sec
GPU: 7 sec

This is for a batch size of 1; I am sure larger batch sizes would give an even bigger time gain on the GPU.
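Measurements like the ones above are sensitive to warm-up (the first iterations pay for data loading and, on GPU, for CUDA context setup and kernel launches). A minimal timing sketch, with a hypothetical helper not taken from eval.py, that averages over several runs after a warm-up pass:

```python
import time

def timeit(fn, *args, warmup=1, repeats=5):
    """Average wall-clock seconds of fn(*args) over `repeats` runs,
    after `warmup` untimed runs. Note: for CUDA tensors you would also
    need torch.cuda.synchronize() before reading the clock, because
    GPU kernels launch asynchronously."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

avg = timeit(lambda: sum(range(100_000)))
print(f"avg: {avg:.6f} s")
```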

Jumabek commented Jun 5, 2018

When decoding boxes after network prediction with CUDA:

7/4462
net time: 0.00847935676574707
torch.Size([3, 512, 512]) torch.Size([1, 4]) torch.Size([1])
Box decode time: 2.547772169113159 
8/4462
net time: 0.008769035339355469
torch.Size([3, 512, 512]) torch.Size([1, 4]) torch.Size([1])
Box decode time: 2.870081663131714 
9/4462
net time: 0.008472204208374023
torch.Size([3, 512, 512]) torch.Size([1, 4]) torch.Size([1])
Box decode time: 2.3235297203063965 

Box decoding on CPU:

7/4462
net time: 0.008519649505615234
torch.Size([3, 512, 512]) torch.Size([1, 4]) torch.Size([1])
Box decode time: 0.8019649982452393 
8/4462
net time: 0.008616447448730469
torch.Size([3, 512, 512]) torch.Size([1, 4]) torch.Size([1])
Box decode time: 1.4049720764160156 
9/4462
net time: 0.008512020111083984
torch.Size([3, 512, 512]) torch.Size([2, 4]) torch.Size([2])
Box decode time: 1.6696996688842773 

TL;DR:

  1. When evaluating, the main bottleneck is box decoding.
    Since it runs faster on the CPU, it is decoded on the CPU. (This should actually have been obvious: GPUs are good at parallel computation, and this box-decoding code is not parallelized.)
  2. If you want to speed up evaluation, you have to parallelize the box-decoding code and then use the GPU.
  3. Even so, I think box_decode is taking longer than it should, even accounting for it not being parallel.
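On point 2, the usual way to parallelize box decoding is to express it as whole-tensor arithmetic instead of a per-anchor loop; that same vectorized form then maps directly onto GPU tensors. A minimal NumPy sketch (not the torchcv implementation; the `(0.1, 0.2)` variances and the center-size box parameterization are the common SSD convention and are assumptions here):

```python
import numpy as np

def decode_loop(loc, anchors, variances=(0.1, 0.2)):
    """Per-anchor Python loop: sequential, so a GPU cannot help."""
    boxes = np.empty_like(loc)
    for i in range(loc.shape[0]):
        cx = loc[i, 0] * variances[0] * anchors[i, 2] + anchors[i, 0]
        cy = loc[i, 1] * variances[0] * anchors[i, 3] + anchors[i, 1]
        w = np.exp(loc[i, 2] * variances[1]) * anchors[i, 2]
        h = np.exp(loc[i, 3] * variances[1]) * anchors[i, 3]
        boxes[i] = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    return boxes

def decode_vectorized(loc, anchors, variances=(0.1, 0.2)):
    """Same math over all anchors at once; this form ports 1:1 to GPU tensors."""
    centers = loc[:, :2] * variances[0] * anchors[:, 2:] + anchors[:, :2]
    sizes = np.exp(loc[:, 2:] * variances[1]) * anchors[:, 2:]
    return np.concatenate([centers - sizes / 2, centers + sizes / 2], axis=1)

rng = np.random.default_rng(0)
anchors = rng.uniform(0.1, 0.9, size=(5000, 4))  # (cx, cy, w, h), normalized
loc = rng.normal(0.0, 0.1, size=(5000, 4))       # predicted offsets
assert np.allclose(decode_loop(loc, anchors), decode_vectorized(loc, anchors))
```

Both functions compute identical boxes; the vectorized one replaces the Python loop with array ops, which is exactly the precondition for the GPU version to be worth it.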

@Jumabek Jumabek mentioned this issue Jun 5, 2018