TensorRT ResNet50 Segfaults with Tesla T4 #4

Closed
petertorelli opened this issue Aug 28, 2019 · 4 comments

Comments

@petertorelli
Collaborator

commented Aug 28, 2019

User reports that MLMark abruptly segfaults when running the TensorRT target on an x86 system with a Tesla T4; no other warning messages are given. See below.

-INFO- --------------------------------------------------------------------------------
-INFO- Welcome to the EEMBC MLMark(tm) Benchmark!
-INFO- --------------------------------------------------------------------------------
-INFO- MLMark Version       : 1.0.0
-INFO- Python Version       : 3.7
-INFO- CPU Name             : GenuineIntel Intel(R) Xeon(R) Platinum 8176 CPU @ 2.10GHz
-INFO- Total Memory (MiB)   : 127571
-INFO- # of Logical CPUs    : 112
-INFO- Instruction Set      : x86_64
-INFO- OS Platform          : Linux-4.4.0-131-generic-x86_64-with-debian-stretch-sid
-INFO- --------------------------------------------------------------------------------
-INFO- Models in this release:
-INFO-     resnet50       : ResNet-50 v1.0 [ILSVRC2012]
-INFO-     mobilenet      : MobileNet v1.0 [ILSVRC2012]
-INFO-     ssdmobilenet   : SSD-MobileNet v1.0 [COCO2017]
-INFO- --------------------------------------------------------------------------------
-INFO- Parsing config file config/trt-gpu-resnet50-fp32-throughput.json
-INFO- Task: Target 'tensorrt', Workload 'resnet50'
-INFO-     batch                : 1
-INFO-     concurrency          : 1
-INFO-     hardware             : gpu
-INFO-     iterations           : 1024
-INFO-     mode                 : throughput
-INFO-     precision            : fp32
failed to parse uff model
Entered in engine building part
Segmentation fault (core dumped)

@petertorelli petertorelli self-assigned this Aug 28, 2019

@petertorelli petertorelli added the bug label Aug 28, 2019

@petertorelli

Collaborator Author

commented Sep 4, 2019

Recommend using TF 1.13.1, TRT 5.1.2, CUDA 10.0, and version 410 of the driver, although issues are still reported.

Deferred until TRT6 target is released in 1.0.x.
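For reference, a minimal sketch to confirm the installed framework versions before running MLMark, assuming the tensorflow and tensorrt Python packages are importable (the driver and CUDA versions can be checked separately with nvidia-smi):

    # Sketch: print the framework versions this issue recommends.
    import tensorflow as tf
    import tensorrt as trt

    print("TensorFlow:", tf.__version__)   # expecting 1.13.1
    print("TensorRT  :", trt.__version__)  # expecting 5.1.2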

@petertorelli

Collaborator Author

commented Sep 10, 2019

Appears related to these lines of code in each model's Net.py file, which load the library:

		resnetnet_lib=os.path.join(TRT_DIR,"cpp_environment","libs","libclass_resnet50.so")
		self.lib = cdll.LoadLibrary(resnetnet_lib)
		self.obj = self.lib.return_object()

Adding this line (prior to the self.lib.return_object() call):

		self.lib.return_object.restype = ctypes.c_ulonglong

Fixes the problem on the target system. The function returns a pointer, but ctypes defaults restype to a C int, which truncates the 64-bit address. However, casting to c_ulonglong might introduce compatibility errors; need to investigate a pointer type that matches the OS/arch instead.
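As a sketch of that pointer-type approach (not yet in the tree): ctypes.c_void_p is always sized to the platform pointer, so it avoids both the truncation and the c_ulonglong portability concern. The library path and symbol names below mirror the snippet above; TRT_DIR is a placeholder here, since in MLMark it is provided by the harness.

    # Sketch only: load the library with a pointer-sized return type.
    import os
    import ctypes
    from ctypes import cdll

    # Assumption: point this at the tensorrt target directory used by MLMark.
    TRT_DIR = os.environ.get("MLMARK_TRT_DIR", ".")

    resnetnet_lib = os.path.join(TRT_DIR, "cpp_environment", "libs", "libclass_resnet50.so")
    lib = cdll.LoadLibrary(resnetnet_lib)

    # c_void_p matches the pointer width of the current OS/architecture,
    # so the object handle returned by the C++ side is never truncated.
    lib.return_object.restype = ctypes.c_void_p
    obj = lib.return_object()

If the handle is later passed back into the library, the corresponding argtypes should also be declared as c_void_p so the conversion is symmetric.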

@petertorelli

Collaborator Author

commented Sep 11, 2019

New branch trt-restype in progress.

@petertorelli

Collaborator Author

commented Sep 13, 2019

The latest two merges (#7 and #8) solve the T4-related problems on non-JetPack OSes.
