Hi,
I used DALI with the aim of improving the performance of my deep learning application, but it seems that I can't take full advantage of my CPU cores when I use DALI, which is possible when I use TensorFlow's slice_input_producer (multi-threaded) to load files.
So my questions:
1. Is it possible to maximize throughput and FPS by making the best of both the CPU and the GPUs (e.g. 12 cores and 2× GTX 1080)? If yes, how?
2. What is the meaning of num_threads when defining my own pipeline? Does it relate to GPU threads or CPU threads?
3. When I use DALI, I have to set per_process_gpu_memory_fraction to limit TensorFlow's memory (see #21), and the batch size can't be set very large (I've tried 32, but 64 does not work). It seems that DALI needs a lot of GPU memory. Will this memory issue affect the performance of deep learning applications?
4. Could you please provide a more general performance report based on a more common development environment (such as a GTX 1080 rather than a DGX-2)?
5. One question about nvJPEG: nvJPEG only does JPEG decoding, but DALI can do lots of image augmentation (such as resizing). Why not add these features to nvJPEG for more general use, rather than only for data loading in deep learning applications?
Thanks in advance!
The current idea of DALI is to allow easy offloading of data loading and augmentation to the GPU. It was designed for scenarios where the CPU is the bottleneck. In your case it shouldn't be, and what you can do is construct the pipeline so that some operations are assigned to the CPU and the rest to the GPU, so that the CPU is also utilized.
num_threads relates to the number of CPU threads used to run the CPU operators. When you create a pipeline you may assign it to a given GPU by providing device_id; by providing num_threads you tell it how big the CPU thread pool should be. One thing we need to document better: nvJPEG runs partially on the CPU and partially on the GPU. For the CPU part it also uses a thread pool, whose size is set by the same num_threads argument, so setting num_threads too low can hurt performance. Please check how different values work for you.
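As a minimal sketch of such a hybrid pipeline, using the DALI Python API of that time (the file_root path and resize dimensions are placeholders, and operator names such as FileReader, nvJPEGDecoder, and Resize may differ between DALI versions):

```python
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class HybridPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id):
        # num_threads sizes the CPU thread pool; device_id picks the GPU
        super(HybridPipeline, self).__init__(batch_size, num_threads, device_id)
        # runs on the CPU: reads files from disk
        self.reader = ops.FileReader(file_root="images/")
        # "mixed" device: decoding runs partly on CPU, partly on GPU (nvJPEG)
        self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)
        # runs on the GPU: resize the decoded images
        self.resize = ops.Resize(device="gpu", resize_x=224., resize_y=224.)

    def define_graph(self):
        jpegs, labels = self.reader()
        images = self.decode(jpegs)
        images = self.resize(images)
        return images, labels

# pipe = HybridPipeline(batch_size=32, num_threads=4, device_id=0)
# pipe.build()
```

Moving the decode to device="cpu" and keeping only the augmentations on the GPU (or vice versa) is how you shift work between the two; which split is fastest depends on your hardware, so it is worth benchmarking a few variants.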
It is true that additional memory is required so data processing can be performed by DALI on the GPU. We are working on reducing memory pressure, as @ptrendx stated in Can't process big size image #21. In your case it forces you to use small batch sizes, and this could affect overall performance.
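For reference, the TensorFlow-side setting mentioned in the question is a session config fragment like the following (the 0.5 fraction is just an example value; tune it so TF and DALI both fit in GPU memory):

```python
import tensorflow as tf

# Cap TF's share of GPU memory so DALI has room for its own buffers
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # example value
# sess = tf.Session(config=config)
```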
If you are asking for speed results for configurations where CPU processing power is not a bottleneck (like a single GTX 1080), throughput should be almost the same as without DALI (perhaps even a bit lower due to DALI overhead). In such cases, the main benefit of DALI is flexibility and ease of pipeline construction, which is why we don't provide general performance reports. Nevertheless, it is a good point and we may prepare a more thorough performance report.
nvJPEG is designed to provide JPEG loading and (mostly) decoding; it is not planned to be an image-processing library. For that, DALI can be used, and it is not necessarily limited to deep learning applications. If you really need to build your own custom processing pipeline, how about mixing nvJPEG and NPP for the processing?