I have tried to recreate the benchmark results with the examples from the repository. The inference speed on my Jetson TX2 is much slower compared to the results in the table on the front page.
This is the log for classification.ipynb:
```
2018-07-01 22:18:34.878861: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-07-01 22:18:34.879005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.46GiB
2018-07-01 22:18:34.879066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 22:18:35.940353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 22:18:35.940441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-01 22:18:35.940466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-01 22:18:35.940661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4002 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 230 variables to const ops.
2018-07-01 22:18:49.301345: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-07-01 22:18:50.402393: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2660] Max batch size= 1 max workspace size= 33554432
2018-07-01 22:18:50.402478: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2664] Using FP16 precision mode
2018-07-01 22:18:50.402500: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2666] starting build engine
2018-07-01 22:19:11.072290: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2671] Built network
2018-07-01 22:19:11.308241: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] Serialized engine
2018-07-01 22:19:11.318361: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2684] finished engine InceptionV1/my_trt_op0 containing 493 nodes
2018-07-01 22:19:11.318499: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2704] Finished op preparation
2018-07-01 22:19:11.339604: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2712] OK finished op building
2018-07-01 22:19:11.392810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-07-01 22:19:11.392929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-07-01 22:19:11.392958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-07-01 22:19:11.392980: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-07-01 22:19:11.393077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4002 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
(0.374037) golden retriever
(0.048114) miniature poodle
(0.042460) toy poodle
(0.036036) cocker spaniel, English cocker spaniel, cocker
(0.017122) standard poodle
Inference finished in 2712 ms
```
My only modification to the example code is the time measurement around the call to `tf_sess.run`.
The first call of `tf_sess.run` takes significantly longer than subsequent calls due to one-time initialization. In the reported benchmark timings, we averaged over several calls to `tf_sess.run`, excluding the first call. Are you excluding the first call in your timing?
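The methodology described above can be sketched with a small timing helper (this is my own illustration, not code from the repository; `run_inference` is a hypothetical stand-in for a zero-argument wrapper around the example's `tf_sess.run` call):

```python
import time

def benchmark(run_inference, num_runs=50):
    """Average the latency of repeated inference calls, in milliseconds.

    The first call is executed but NOT timed, because it triggers
    one-time setup (CUDA context creation, TensorRT engine
    deserialization) and would dominate the average.
    """
    run_inference()  # warm-up call, excluded from the measurement
    timings = []
    for _ in range(num_runs):
        start = time.monotonic()
        run_inference()
        timings.append(time.monotonic() - start)
    return sum(timings) / len(timings) * 1000.0  # mean latency in ms
```

With a wrapper like `lambda: tf_sess.run(output_tensor, feed_dict=feed)`, the returned average should be far below the 2712 ms seen for a single cold run.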
I ran my tests after a reboot with
Without those commands the inference time is ~200 ms higher.
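For reference, on a stock JetPack install the TX2 is typically locked to maximum clocks with the following commands (shown as an assumed example only; the specific commands run above were not included in the report, and the script path varies by JetPack version):

```shell
# Select the max-performance power model (all CPU cores online, max frequencies)
sudo nvpmodel -m 0
# Pin CPU/GPU/EMC clocks to their maximum
sudo ~/jetson_clocks.sh
```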
What am I missing here?