About yolov4 speed measurement #34
Comments
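@qinxianglinya Could you describe the timing more clearly? cudaMemcpyAsync copies data to GPU memory asynchronously, so the call returns immediately without blocking. The cudaStreamSynchronize that follows is the synchronization point: it waits until the queued work has finished. So the 26 ms you measured is essentially the execution time. If this is still unclear, look up what those two CUDA functions do.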
Below is the timing information I printed. I am not sure whether I am printing it incorrectly.
OK, thanks.
@qinxianglinya If you found this useful, please give the repo a star.
Sure.
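Hello. I have successfully compiled yolov4 on Windows.
Environment: Win10, TensorRT 6.0.1.5, CUDA 10.0, cuDNN 7.6.5, 1080Ti.
I have measured the inference speed of a model I trained myself.
In the config file the image size is 800x800x3, the TensorRT precision is FP16, and the batch size is 1.
enqueue() + cudaMemcpyAsync() takes 1 ms, but cudaStreamSynchronize() takes 29 ms. Is there any way to improve this? Thank you very much.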