Lower speed after using large batch size #14

Hi, authors! Thanks for your great work! I have a question about the FPS at large batch sizes. We tested the latency at batch size 1 on high-end GPUs, and the results match the speedup reported in Table 1. However, when we increase the batch size to 32 as in Table 1 (or to smaller values such as 4 or 16), the latency of dense or sparse inference is higher than that of cuDNN, which contradicts the results reported in Table 1. The memory overhead is also much larger than with cuDNN. The experiments were run with YOLOv5s on MOT16 on a Tesla V100 GPU. The input size was (1088, 608); we also tested (640, 640) and the results were similar. I would greatly appreciate any explanation!
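To make the comparison easier to reproduce, here is a minimal sketch of how per-batch latency and FPS could be measured across batch sizes. It is an illustrative harness only, not the script behind the numbers above. The names `measure_latency`, `run_batch`, and `fake_model` are hypothetical, and `fake_model` is a stand-in workload rather than a real detector. The `sync` hook is there because GPU kernels run asynchronously; assuming PyTorch, one would pass `torch.cuda.synchronize` so the timer covers the finished work rather than just the kernel launches.

```python
import time
from statistics import median
from typing import Callable, Optional


def measure_latency(run_batch: Callable[[int], None],
                    batch_size: int,
                    warmup: int = 3,
                    iters: int = 10,
                    sync: Optional[Callable[[], None]] = None) -> dict:
    """Time run_batch(batch_size) and return median latency and throughput."""

    def _sync() -> None:
        # On a GPU, this hook should block until all queued work is done.
        if sync is not None:
            sync()

    # Warm-up runs are excluded from timing (lazy init, autotuning, caches).
    for _ in range(warmup):
        run_batch(batch_size)
    _sync()

    samples = []
    for _ in range(iters):
        _sync()
        start = time.perf_counter()
        run_batch(batch_size)
        _sync()
        samples.append(time.perf_counter() - start)

    latency = median(samples)  # seconds per batch
    return {
        "batch_size": batch_size,
        "latency_ms": latency * 1e3,
        "fps": batch_size / latency,  # images per second
    }


def fake_model(batch_size: int) -> None:
    # Hypothetical stand-in workload: cost grows linearly with batch size.
    time.sleep(0.001 * batch_size)


if __name__ == "__main__":
    for bs in (1, 4, 16, 32):
        print(measure_latency(fake_model, bs))
```

Running the same loop for each backend (cuDNN, dense, sparse) at each batch size, with identical warm-up and synchronization, would show whether the gap comes from the kernels themselves or from the measurement setup.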
