Converted detection model for Caffe2 runs too slow on CPU #427
Comments
Your code looks good.
However, this figure is not shocking to me if you ran it on 4 cores or fewer. I converted a few other models and the order of magnitude was the same. That could make sense, since NN inference relies heavily on parallelism, so with few cores it is expected to be slow. |
@gadcam

```
$ nproc
56
$ tail -n 27 /proc/cpuinfo
processor : 55
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
stepping : 1
microcode : 0xb00002c
cpu MHz : 1202.578
cache size : 35840 KB
physical id : 1
siblings : 28
core id : 14
cpu cores : 14
apicid : 61
initial apicid : 61
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts
bugs :
bogomips : 4001.73
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
```

I don't know how to specify the core count (or thread count?) for inference; is it automatic? But the problem is: |
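(For anyone looking at thread control: below is a minimal, hedged sketch of how CPU thread count is commonly capped for Caffe2 inference. The `OMP_NUM_THREADS`/`MKL_NUM_THREADS` variables and the `--caffe2_omp_num_threads` flag assume an OpenMP/MKL build; they are my assumptions, not something confirmed in this thread.)

```python
import os

# Assumption: cap OpenMP/MKL threads before Caffe2 is imported/initialized.
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"

from caffe2.python import workspace

# Assumption: a Caffe2 build with OpenMP support; if this flag is not
# recognized by your build, the environment variables above still apply.
workspace.GlobalInit(["caffe2", "--caffe2_omp_num_threads=8"])
```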
@drcege how did you convert the pkl to Caffe2 pb files? I failed when converting; the first time I ran my Python command I got this error: Traceback (most recent call last): @gadcam @daquexian |
@JaosonMa I converted R50-C4. I don't know whether FPN is supported now. |
Hello, about the performance: I also exported a model. I compiled Caffe2 without NNPACK (not supported on Lambda) on my own CPU (4 cores); inference time for an image of size 800x800 is between 10 and 13 seconds. When I tested locally with NNPACK enabled, inference time was around 5~7 seconds. Another thing that may be slowing it down is that I tried to optimize the model to minimize memory consumption (Lambda is limited to 3 GB) using: I'm also looking for an easy way to |
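(As a hedged sketch of the kind of memory optimization being described: `memonger.optimize_inference_fast` and the file names below are assumptions on my part, not necessarily what was actually used here.)

```python
from caffe2.proto import caffe2_pb2
from caffe2.python import memonger

# Load the exported predict net (file name is hypothetical).
net_def = caffe2_pb2.NetDef()
with open("model.pb", "rb") as f:
    net_def.ParseFromString(f.read())

# Assumption: keep external inputs/outputs intact and let the memonger
# share the remaining intermediate activation blobs.
static_blobs = list(net_def.external_input) + list(net_def.external_output)
optimized_net = memonger.optimize_inference_fast(net_def, static_blobs)

with open("model_optimized.pb", "wb") as f:
    f.write(optimized_net.SerializeToString())
```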
@Bendidi So FPN export is now officially supported? Could you please also test the speed performance without FPN? |
Did you measure any improvement? If so, could you share your tests please? :) |
Improvement in terms of memory? In terms of memory, I saw a reduction of about 0.9 GB (4.2 GB -> 3.3 GB) using |
I am stuck converting a detection model for Caffe2. I am using the cpu flag for conversion, as I understood that no Caffe2 inference can be done on GPU in my setup. The conversion starts and then gets stuck at the first layer after loading the weights. Thanks |
I trained a detector for electricity meters based on e2e_faster_rcnn_R-50-C4_1x.yaml. The trained model works very well with Detectron on GPU, but we have to deploy it on CPU, so I converted it to Caffe2 format with convert_pkl_to_pb.py. However, workspace.RunNet takes approximately 100 seconds per image, which is too slow. The attachment is my test code: ammeter_det.pdf
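(Not the attached test code, but a minimal sketch of how a model exported by convert_pkl_to_pb.py is typically loaded and timed; the model_init.pb/model.pb file names and the data/im_info input blobs are assumptions based on the standard Detectron export, so adjust them to the actual output.)

```python
import time
import numpy as np
from caffe2.proto import caffe2_pb2
from caffe2.python import workspace

# Assumed output file names of convert_pkl_to_pb.py; adjust to the real ones.
init_def = caffe2_pb2.NetDef()
with open("model_init.pb", "rb") as f:
    init_def.ParseFromString(f.read())
predict_def = caffe2_pb2.NetDef()
with open("model.pb", "rb") as f:
    predict_def.ParseFromString(f.read())

workspace.RunNetOnce(init_def)  # materialize the weights in the workspace

# Assumed input blobs of a Detectron-exported Faster R-CNN: NCHW image + im_info.
im = np.random.rand(1, 3, 800, 800).astype(np.float32)
im_info = np.array([[800.0, 800.0, 1.0]], dtype=np.float32)
workspace.FeedBlob("data", im)
workspace.FeedBlob("im_info", im_info)

workspace.CreateNet(predict_def)  # build the net once, then time RunNet alone

start = time.time()
workspace.RunNet(predict_def.name)
print("inference time: %.2f s" % (time.time() - start))
```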
System information
python --version
output: 2.7.12