Nvidia Jetson Nano SBC - No CUDA device detected #387

ManuSB90 · 2020-11-22T03:19:19Z

Hi,

First of all, thanks for the amazing work putting together this app; it is through this app that I am learning about ML models and the Python world for the first time. The wiki is very well documented and really helpful, and gives great insight into the workings of the models for a noob like me 😃.

I wanted to build a self-hosted NC server on a Raspberry Pi after having successfully installed this in a VM and run a scan.
But after finding the recommendation to use a machine with a dedicated GPU, I looked for a small SBC and found the Nvidia Jetson Nano, along with some working examples of face recognition projects built on it using dlib/Python.

I have managed to install dlib + face_recognition via pip3 on the device, but this did not build the needed shared library: when I installed pdlib, it did not find dlib. So I also installed dlib manually as per the instructions. The build automatically detected CUDA and compiled in CUDA mode.

Some details of the system:

nvjet@nvjet-desktop:~$ python3
Python 3.6.9 (default, Oct  8 2020, 12:12:24) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import dlib
>>> dlib.__version__
'19.21.0'
>>> dlib.DLIB_USE_CUDA
True
>>> dlib.cuda.get_num_devices()
1
>>>
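
The same check can be scripted so that it can be repeated under another account, for example the one the web server uses. This is a minimal sketch: `cuda_report` is a hypothetical helper, and it relies only on the dlib attributes shown in the session above.

```python
"""Report whether dlib was built with CUDA and how many devices it sees."""


def cuda_report():
    """Return a dict describing the dlib build visible to the current user."""
    try:
        import dlib  # imported lazily so a missing module is reported, not fatal
    except ImportError as exc:
        return {"dlib": None, "error": str(exc)}

    report = {"dlib": dlib.__version__, "use_cuda": bool(dlib.DLIB_USE_CUDA)}
    if report["use_cuda"]:
        try:
            report["devices"] = dlib.cuda.get_num_devices()
        except Exception as exc:  # CUDA runtime errors surface as exceptions
            report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Saved to a file, it can be run once as the login user and once with `sudo -u www-data python3 <file>` to compare what each account sees.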

After installing everything, I set up the pdlib-min-test-suite, with the results below:

Note: The PHP test is less successful, and I only see very small spikes of GPU use in jtop.

nvjet@nvjet-desktop:~/pdlib-min-test-suite$ make php-test
php scripts/face_detect.php
Welcome to pdlib min test suite for Facerecognition app...

First we try to open the models... Done

Processing file: input/Big Bang Theory.jpg
Number of faces detected: 0
Processing file: input/Big Bang Theory.png
Number of faces detected: 4
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done

The Python test is more successful, and registers prolonged 100% GPU use while scanning both pictures.

nvjet@nvjet-desktop:~/pdlib-min-test-suite$ make python-test
python3 scripts/face_detect.py
Welcome to pdlib min test suite for Facerecognition app

First we try to open the models... Done

Processing file: input/Big Bang Theory.jpg
Number of faces detected: 6
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Processing file: input/Big Bang Theory.png
Number of faces detected: 7
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done

Also ran pdlib test suite after build:

nvjet@nvjet-desktop:~/pdlib$ make test

Build complete.
Don't forget to run 'make test'.


=====================================================================
PHP         : /usr/bin/php7.3 
PHP_SAPI    : cli
PHP_VERSION : 7.3.24-3+ubuntu18.04.1+deb.sury.org+1
ZEND_VERSION: 3.3.24
PHP_OS      : Linux - Linux nvjet-desktop 4.9.140-tegra #1 SMP PREEMPT Tue Oct 27 21:02:37 PDT 2020 aarch64
INI actual  : /home/nvjet/pdlib/tmp-php.ini
More .INIs  :   
CWD         : /home/nvjet/pdlib
Extra dirs  : 
VALGRIND    : Not used
=====================================================================
TIME START 2020-11-22 01:01:18
=====================================================================
PASS Check for pdlib presence [tests/001.phpt] 
PASS Basic tests for chinese_whispers [tests/chinese_whispers_basic.phpt] 
PASS Edge given in edges array for chinese_whispers functions is associative array [tests/chinese_whispers_edge_associative_array_error.phpt] 
PASS Edge elements given in edges array for chinese_whispers functions are not of long type [tests/chinese_whispers_edge_elements_not_long.phpt] 
PASS Edge given in edges array for chinese_whispers functions is not having all values to be arrays with 2 elements [tests/chinese_whispers_edge_not_2_element_error.phpt] 
PASS Edge given in edges array is not array for chinese_whispers functions [tests/chinese_whispers_edge_not_array_error.phpt] 
PASS Args given to chinese_whispers functions is not correct [tests/chinese_whispers_wrong_arg_type_error.phpt] 
PASS Testing CnnFaceDetection constructor without arguments [tests/cnn_face_detection_ctor_error.phpt] 
PASS Testing CnnFaceDetection constructor with model that do not exist [tests/cnn_face_detection_ctor_model_not_found_error.phpt] 
PASS Frontal face detection. [tests/dlib_face_detection.phpt] 
PASS Testing FaceLandmarkDetection constructor without arguments [tests/face_landmark_detection_ctor_error.phpt] 
PASS Testing FaceRecognition constructor without arguments [tests/face_recognition_ctor_error.phpt] 
SKIP Full test for face recognition - download models, detect faces, landmark detection and face recognition. [tests/integration_face_recognition.phpt] reason: bz2 extension missing
PASS Basic tests for dlib_vector_length [tests/vector_length.phpt] 
PASS Just test php extension version [tests/version.phpt] 
=====================================================================
TIME END 2020-11-22 01:01:23

=====================================================================
TEST RESULT SUMMARY
---------------------------------------------------------------------
Exts skipped    :    0
Exts tested     :   15
---------------------------------------------------------------------

Number of tests :   15                14
Tests skipped   :    1 (  6.7%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :   14 ( 93.3%) (100.0%)
---------------------------------------------------------------------
Time taken      :    5 seconds
=====================================================================

This report can be automatically sent to the PHP QA team at
http://qa.php.net/reports and http://news.php.net/php.qa.reports
This gives us a better understanding of PHP's behavior.
If you don't want to send the report immediately you can choose
option "s" to save it.	You can then email it to qa-reports@lists.php.net later.
Do you want to send this report now? [Yns]: n
nvjet@nvjet-desktop:~/pdlib$ php -v
PHP 7.3.24-3+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Oct 31 2020 16:59:59) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.24-3+ubuntu18.04.1+deb.sury.org+1, Copyright (c) 1999-2018, by Zend Technologies
nvjet@nvjet-desktop:~/pdlib$ php -m
[PHP Modules]
bz2

I believe the face recognition NC app should work on this system, as the Python test suite seems to run fine on the device and dlib is used out there in real-time projects like Doorbell Camera Python. (I have not yet managed to build and run a small Python test script myself; I need to learn a bit more about Python scripting first.)

But I am not sure whether the device detection is failing because this is not a conventional PCIe GPU but a Tegra X1 SoC... but again, I am out of my depth here with my limited knowledge.

I hope to get some clarification on whether pdlib is working as expected, and on why the scan fails to detect the device when dlib projects run fine in the Python environment.

Expected behaviour

Face scan to complete.

Actual behaviour

Getting the following error:

In DlibCnnModel.php line 191:
                                                                                                                                                                
  Error while calling cudaGetDevice(&the_device_id) in file /home/nvjet/dlib/dlib/cuda/gpu_data.cpp:204. code: 100, reason: no CUDA-capable device is detected  
                                                                                                                                                                

face:background_job [-u|--user_id USER_ID] [-M|--max_image_area MAX_IMAGE_AREA] [-t|--timeout TIMEOUT]

Steps to reproduce

sudo -u www-data php occ face:background_job -t 900
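
The command above runs as `www-data`, while the test suites were run as `nvjet`. One thing worth ruling out (an assumption, not something confirmed in this report) is that the web server account cannot open the Tegra GPU device nodes. A minimal sketch that lists which nodes the current user can read and write; the paths in `TEGRA_NODES` are assumptions about a typical JetPack install and may differ on other releases:

```python
import os

# Assumed Tegra device nodes; adjust to whatever `ls -l /dev/nv*` shows.
TEGRA_NODES = [
    "/dev/nvhost-ctrl",
    "/dev/nvhost-ctrl-gpu",
    "/dev/nvhost-gpu",
    "/dev/nvmap",
]


def check_device_access(paths):
    """Map each path to 'ok', 'missing' or 'no access' for the current user."""
    result = {}
    for path in paths:
        if not os.path.exists(path):
            result[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            result[path] = "ok"
        else:
            result[path] = "no access"
    return result


if __name__ == "__main__":
    for path, status in check_device_access(TEGRA_NODES).items():
        print(f"{status:10} {path}")
```

Running it once as the login user and once with `sudo -u www-data python3 <file>` shows whether the two accounts differ.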

Server configuration

Operating system:
Ubuntu 18.04.5 LTS
Pdlib version:
v1.0.2
How is DLib installed: Make sure it is working correctly with this tool
git clone https://github.com/davisking/dlib.git
cd dlib/dlib
mkdir build
cd build
cmake -DBUILD_SHARED_LIBS=ON ..
make
sudo make install

Also tried forcing the CUDA flag: cmake -DBUILD_SHARED_LIBS=ON -D DLIB_USE_CUDA=1 ..
Both methods detect the CUDA libraries and compile successfully.

How is PDlib installed: Make sure it is working correctly with this tool
git clone https://github.com/goodspb/pdlib.git
cd pdlib
phpize
./configure
make
sudo make install
PHP version:
nvjet@nvjet-desktop:~$ php -v
PHP 7.3.24-3+ubuntu18.04.1+deb.sury.org+1 (cli) (built: Oct 31 2020 16:59:59) ( NTS )
Web server:
nvjet@nvjet-desktop:~$ /usr/sbin/apache2 -v
Server version: Apache/2.4.29 (Ubuntu)
Database:
nvjet@nvjet-desktop:~$ mysql -V
mysql Ver 15.1 Distrib 10.1.47-MariaDB, for debian-linux-gnu (aarch64) using readline 5.2
Nextcloud version:
nextcloud-20.0.2.tar.bz2

Client configuration

Browser:
Operating system:

Logs

Background task log with debug.

sudo -u apache php occ -vvv face:background_job

nvjet@nvjet-desktop:/var/www/html/nextcloud$ sudo -u www-data php occ -vvv face:background_job
[sudo] password for nvjet: 
1/10 - Executing task CheckRequirementsTask (Check all requirements)
	System: Linux
	System memory: 4148293632
	PHP Memory Limit: Unknown
2/10 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/10 - Executing task LockTask (Acquire lock so that only one background task can run)
4/10 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
yielding
5/10 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
	Skipping stale images removal for user admin as there is no need for it
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
	* have 1000 faces already processed
	* or you need to have 95% of you images processed
	Use stats command to track progress
yielding
7/10 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
	Skipping full image scan for user admin
8/10 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
yielding
9/10 - Executing task ImageProcessingTask (Process all images to extract faces)
	NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
Error during background task execution
If error is not transient, this means that core component of face recognition is not working properly
and that quantity and quality of detected faces and person will be low or suboptimal.
You probably want to file an issue (please include exception below) to: https://github.com/matiasdelellis/facerecognition/issues

In DlibCnnModel.php line 191:
                                                                                                                                                                
  [Exception]                                                                                                                                                   
  Error while calling cudaGetDevice(&the_device_id) in file /home/nvjet/dlib/dlib/cuda/gpu_data.cpp:204. code: 100, reason: no CUDA-capable device is detected  
                                                                                                                                                                

Exception trace:
  at /var/www/html/nextcloud/apps/facerecognition/lib/Model/DlibCnnModel/DlibCnnModel.php:191
 CnnFaceDetection->__construct() at /var/www/html/nextcloud/apps/facerecognition/lib/Model/DlibCnnModel/DlibCnnModel.php:191
 OCA\FaceRecognition\Model\DlibCnnModel\DlibCnnModel->open() at /var/www/html/nextcloud/apps/facerecognition/lib/BackgroundJob/Tasks/ImageProcessingTask.php:113
 OCA\FaceRecognition\BackgroundJob\Tasks\ImageProcessingTask->execute() at /var/www/html/nextcloud/apps/facerecognition/lib/BackgroundJob/BackgroundService.php:120
 OCA\FaceRecognition\BackgroundJob\BackgroundService->execute() at /var/www/html/nextcloud/apps/facerecognition/lib/Command/BackgroundCommand.php:138
 OCA\FaceRecognition\Command\BackgroundCommand->execute() at /var/www/html/nextcloud/3rdparty/symfony/console/Command/Command.php:255
 Symfony\Component\Console\Command\Command->run() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:1000
 Symfony\Component\Console\Application->doRunCommand() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:271
 Symfony\Component\Console\Application->doRun() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:147
 Symfony\Component\Console\Application->run() at /var/www/html/nextcloud/lib/private/Console/Application.php:215
 OC\Console\Application->run() at /var/www/html/nextcloud/console.php:100
 require_once() at /var/www/html/nextcloud/occ:11

face:background_job [-u|--user_id USER_ID] [-M|--max_image_area MAX_IMAGE_AREA] [-t|--timeout TIMEOUT]


matiasdelellis · 2020-11-22T12:42:14Z

Hi @ManuSB90
I can't help you much because I never use CUDA. 😅
What I can tell you is that if the installation with pip3 works, you should look at its build logs to see how it is compiled, and reproduce those options when compiling dlib.

Sorry.. 😞

matiasdelellis · 2020-11-22T15:10:31Z

Hi @ManuSB90 ,
I just added a native C++ test, which works exactly the same as the other two but uses pure dlib C++.

So, before testing with PHP, you must make sure this test passes, to confirm that you compiled dlib correctly. 😉

[matias@nube pdlib-min-test-suite]$ make cpp-test 
g++ -o face_detect -std=c++11 -O3 `pkg-config --libs dlib-1` scripts/face_detect.cpp
./face_detect input/Big\ Bang\ Theory.jpg input/Big\ Bang\ Theory.png
Welcome to pdlib min test suite for Facerecognition app

First we try to open the models... Done

Processing file: input/Big Bang Theory.jpg
Number of faces detected: 3
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Processing file: input/Big Bang Theory.png
Number of faces detected: 7
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done

On the other hand, I changed the size of the images, since the jpg was very small and no faces were detected. 😅

ManuSB90 · 2020-11-22T15:51:01Z

Hi Matias,

Thanks for this, hope my dream is not this short lived... 😞

This is exactly what I was going to ask next: help to determine whether the issue is with the pdlib/dlib compilation or with the CUDA drivers, so I can then raise it in the right forum. Before raising an issue there, I needed to dig a bit deeper into the pdlib calls to see which dlib modules are being called, where the GPU validation happens and fails, and whether this is coming from my compiled libraries, as opposed to the pip ones that work fine...

I ran the tests again, with the not so promising result below
(even the Python test is failing now, and the cpp error trace goes on and on... :S ):

php scripts/face_detect.php
Welcome to pdlib min test suite for Facerecognition app...

First we try to open the models... Done

Processing file: input/Big Bang Theory.jpg
Number of faces detected: 3
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Processing file: input/Big Bang Theory.png
PHP Fatal error:  Uncaught Exception: Error while calling cudnnGetConvolutionForwardWorkspaceSize( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), (cudnnConvolutionFwdAlgo_t)forward_algo, &forward_workspace_size_in_bytes) in file /home/nvjet/dlib/dlib/cuda/cudnn_dlibapi.cpp:1026. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED in /home/nvjet/pdlib-min-test-suite/scripts/face_detect.php:35
Stack trace:
#0 /home/nvjet/pdlib-min-test-suite/scripts/face_detect.php(35): CnnFaceDetection->detect('input/Big Bang ...')
#1 /home/nvjet/pdlib-min-test-suite/scripts/face_detect.php(53): findFaces('input/Big Bang ...')
#2 {main}
  thrown in /home/nvjet/pdlib-min-test-suite/scripts/face_detect.php on line 35
Makefile:27: recipe for target 'php-test' failed
make: *** [php-test] Error 255
nvjet@nvjet-desktop:~/pdlib-min-test-suite$ make python-test
python3 scripts/face_detect.py
Welcome to pdlib min test suite for Facerecognition app

First we try to open the models... Done

Processing file: input/Big Bang Theory.jpg
Number of faces detected: 3
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Face landmarks... Done
Face descriptor... Done
Processing file: input/Big Bang Theory.png
Traceback (most recent call last):
  File "scripts/face_detect.py", line 46, in <module>
    dets = detector(img)
RuntimeError: Error while calling cudnnGetConvolutionForwardWorkspaceSize( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), (cudnnConvolutionFwdAlgo_t)forward_algo, &forward_workspace_size_in_bytes) in file /tmp/pip-build-c1h2zeax/dlib/dlib/cuda/cudnn_dlibapi.cpp:1026. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED
Makefile:30: recipe for target 'python-test' failed
make: *** [python-test] Error 1
nvjet@nvjet-desktop:~/pdlib-min-test-suite$ make cpp-test
g++ -o face_detect -std=c++11 -O3 `pkg-config --libs dlib-1` scripts/face_detect.cpp
/tmp/ccPbzMwP.o: In function `dlib::resizable_tensor::set_size(long long, long long, long long, long long) [clone .constprop.2034]':
face_detect.cpp:(.text+0x3e0): undefined reference to `dlib::cuda::tensor_descriptor::set_size(int, int, int, int)'
face_detect.cpp:(.text+0x3e8): undefined reference to `dlib::gpu_data::set_size(unsigned long)'
face_detect.cpp:(.text+0x40c): undefined reference to `dlib::cuda::tensor_descriptor::set_size(int, int, int, int)'
/tmp/ccPbzMwP.o: In function `dlib::tensor::operator=(float) [clone .constprop.2036]':
face_detect.cpp:(.text+0xb20): undefined reference to `dlib::gpu_data::copy_to_host() const'
face_detect.cpp:(.text+0xc54): undefined reference to `dlib::cuda::set_tensor(dlib::tensor&, float)'
/tmp/ccPbzMwP.o: In function `dlib::memcpy(dlib::tensor&, dlib::tensor const&) [clone .constprop.2035]':
face_detect.cpp:(.text+0xcf4): undefined reference to `dlib::memcpy(dlib::gpu_data&, unsigned long, dlib::gpu_data const&, unsigned long, unsigned long)'
/tmp/ccPbzMwP.o: In function `dlib::resizable_tensor::resizable_tensor(dlib::resizable_tensor const&) [clone .constprop.2033]':
face_detect.cpp:(.text+0x10f8): undefined reference to `dlib::cuda::tensor_descriptor::tensor_descriptor()'
face_detect.cpp:(.text+0x11c0): undefined reference to `dlib::cuda::tensor_descriptor::~tensor_descriptor()'
/tmp/ccPbzMwP.o: In function `dlib::alias_tensor::operator()(dlib::tensor&, unsigned long) const [clone .constprop.2032]':
face_detect.cpp:(.text+0x1400): undefined reference to `dlib::cuda::tensor_descriptor::tensor_descriptor()'
face_detect.cpp:(.text+0x1444): undefined reference to `dlib::cuda::tensor_descriptor::set_size(int, int, int, int)'
/tmp/ccPbzMwP.o: In function `void dlib::loss_mmod_::to_label<dlib::dimpl::subnet_wrapper<dlib::add_layer<dlib::con_<1l, 9l, 9l, 1, 1, 4, 4>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<32l, 5l, 5l, 2, 2, 0, 0>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<32l, 5l, 5l, 2, 2, 0, 0>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<16l, 5l, 5l, 2, 2, 0, 0>, dlib::input_rgb_image_pyramid<dlib::pyramid_down<6u> >, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, true, void>, std::vector<dlib::mmod_rect, std::allocator<dlib::mmod_rect> >*>(dlib::tensor const&, dlib::dimpl::subnet_wrapper<dlib::add_layer<dlib::con_<1l, 9l, 9l, 1, 1, 4, 4>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<45l, 5l, 5l, 1, 1, 2, 2>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<32l, 5l, 5l, 2, 2, 0, 0>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<32l, 5l, 5l, 2, 2, 0, 0>, dlib::add_layer<dlib::relu_, dlib::add_layer<dlib::affine_, dlib::add_layer<dlib::con_<16l, 5l, 5l, 2, 2, 0, 0>, dlib::input_rgb_image_pyramid<dlib::pyramid_down<6u> >, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, void>, 
void>, void>, void>, void>, void>, void>, void>, void>, true, void> const&, std::vector<dlib::mmod_rect, std::allocator<dlib::mmod_rect> >*, double) const [clone .constprop.2011]':
face_detect.cpp:(.text+0x2250): undefined reference to `dlib::gpu_data::copy_to_host() const'
/tmp/ccPbzMwP.o: In function `dlib::alias_tensor_instance::device_write_only()':
face_detect.cpp:(.text._ZN4dlib21alias_tensor_instance17device_write_onlyEv[_ZN4dlib21alias_tensor_instance17device_write_onlyEv]+0x18): undefined reference to `dlib::gpu_data::copy_to_device() const'
/tmp/ccPbzMwP.o: In function `dlib::alias_tensor_instance::device()':

matiasdelellis · 2020-11-22T16:06:00Z

Maybe you should update cuDNN... 🤔

See Known Issues:

https://docs.nvidia.com/deeplearning/cudnn/release-notes/rel_7xx.html#rel_742

But I really can't help you beyond that because i don't have any experience with CUDNN. Observe how pip3 compiles, and try to reproduce it in your installation, then try the cpp test before trying php.. 😉

matiasdelellis · 2020-11-23T22:59:28Z

Hi @ManuSB90
If you can get it to work again with pip3, maybe you could try the new external model.

You still need to install pdlib for face clustering, but the image analysis task (where you get all the CUDA advantage) is done with a service in Python... 🤔

Just replace your facerecognition folder with that, and follow the instructions of https://github.com/matiasdelellis/facerecognition-external-model

ManuSB90 · 2020-11-24T20:13:14Z

Hi @matiasdelellis ,

Thanks for the effort in building an external model 🙇

I have not managed to find out exactly what the dlib Python compile properties are, or how they differ from the manual compile, or whether the repository it pulls the dlib sources from is in any way prepared for ARM/Jetson Nano. I just wanted to confirm whether the install is different...

I also tried to pass --install-option="--set BUILD_SHARED_LIBS=ON", but that did not quite work :)
But I did manage to get this, which looks to be the failed build command; it looks like it might just be the default install via setup.py.

Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-i5j9_bt1/dlib/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-bpxveaob-record/install-record.txt --single-version-externally-managed --compile "--set BUILD_SHARED_LIBS=ON"" failed with error code 1 in /tmp/pip-build-i5j9_bt1/dlib/

On another note, I finally managed to run the cpp-test, with the same result:
(I just had to change the order of the linking in the g++ compile command for it to compile, to:)

g++ -o face_detect -std=c++11 -O3 scripts/face_detect.cpp `pkg-config --libs dlib-1`
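
GNU ld resolves symbols from left to right, so libraries generally need to come after the source or object files that reference them; that is consistent with the undefined references seen with the original order. A minimal Python sketch that assembles the command in that order. `build_compile_command` is a hypothetical helper, and the `-ldlib` flag in the example stands in for whatever `pkg-config --libs dlib-1` prints on a given machine:

```python
import shlex


def build_compile_command(source, output, lib_flags):
    """Return the g++ argument list with library flags placed after the source."""
    return ["g++", "-o", output, "-std=c++11", "-O3", source] + list(lib_flags)


# Placeholder flags; in practice take them from `pkg-config --libs dlib-1`.
example = build_compile_command("scripts/face_detect.cpp", "face_detect", ["-ldlib"])
print(" ".join(shlex.quote(arg) for arg in example))
```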

Then all 3 tests failed consistently on the second (.png) image with the error below, great 😅

RuntimeError: Error while calling cudnnGetConvolutionForwardWorkspaceSize( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), (cudnnConvolutionFwdAlgo_t)forward_algo, &forward_workspace_size_in_bytes) in file /tmp/pip-build-c1h2zeax/dlib/dlib/cuda/cudnn_dlibapi.cpp:1026. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

An outdated cuDNN shouldn't be the issue, as the machine is currently on cuDNN 8.0.0.

I raised the issue in the Nvidia forums, where the answer was very vague and not that helpful 🤦‍♂️

CUDNN_STATUS_NOT_SUPPORTED indicates the requested usage is out of our support scope.
Please check the detail information of cudnnGetConvolutionForwardWorkspaceSize here:

But did point to the documentation of the failing function:

https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnGetConvolutionForwardWorkspaceSize

And it looks like the function returns the amount of GPU memory workspace the user needs to allocate to be able to call cudnnConvolutionForward(). So is this a memory issue?

Also noticed that in the first pull of the test suite the python test was completing the scan for both images fine.
so reverted the change on 'Dont upsample python test' commit and the python test is completing now, interesting...

Does that give any meaningful info? Is the image too small?

matiasdelellis · 2020-11-24T20:36:52Z

Hi @ManuSB90

Also noticed that in the first pull of the test suite the python test was completing the scan for both images fine.
so reverted the change on 'Dont upsample python test' commit and the python test is completing now, interesting...

This is strange... If you use upsample, it can detect more faces, but it should consume 4 times more memory in that test. 😕

Can you do the same test with PHP? Just add a 1 here..
https://github.com/matiasdelellis/pdlib-min-test-suite/blob/master/scripts/face_detect.php#L35

🤔

ManuSB90 · 2020-11-24T22:35:51Z

Well this is counter-intuitive...

So yes, ran the php test as indicated with the 1 and it completed this time the scan for both photos.

But have monitored the GPU memory allocation and notice some differences:

Without the upscaling on the first picture it can just manage to detect 3 faces on the first jpg file and crash on the png. The GPU using a total of 1.6GB RAM.
With the upscaling, the process takes much longer to complete, taking GPU up to 2.4GB of system ram but detecting all 7 faces on both images.
In both cases the GPU is being 100% utilised which is good.
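
On the Jetson Nano the GPU has no dedicated memory and shares the board's RAM with the CPU, so the figures above come out of the roughly 4 GB of system memory reported in the background job log. A minimal sketch for logging how much memory is still available while a test runs; `parse_meminfo` and `available_mib` are hypothetical helpers that read the standard Linux `/proc/meminfo` format:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(path="/proc/meminfo"):
    """Return MemAvailable in MiB, or None if the field is absent."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    kib = info.get("MemAvailable")
    return None if kib is None else kib / 1024
```

Calling `available_mib()` before and during a scan gives a rough picture of how close the process gets to exhausting memory.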

I feel like we could ask in the NVIDIA forums why that would be? (Although I am not able to pin down exactly what/how to ask...)

But at the same time I think we have proven some instances where the php dlib use is working correctly?

Again not sure if the upscaling is needed because the images are quite small in size in this sample test? and that if in the real NC implementation this would be needed as most of my pictures would be bigger, circa 4-6MB files.

Also, coming back to the original issue: the error from the NC background job doesn't seem to be the same one we are facing here with the sample test.
Is there some GPU validation call that happens earlier, or is it actually failing in the same call but for a different reason?

Thanks :)

matiasdelellis · 2020-11-24T23:07:36Z

Well this is counter-intuitive...

Maybe we should report it to Dlib
Test this example; if it fails in the same way when putting 0, it could be reported..

https://github.com/davisking/dlib/blob/master/python_examples/cnn_face_detector.py#L56

So yes, ran the php test as indicated with the 1 and it completed this time the scan for both photos.

Great.. 😄
Maybe you must apply the same patch here to use our application:

https://github.com/matiasdelellis/facerecognition/blob/master/lib/Model/DlibCnnModel/DlibCnnModel.php#L197

But have monitored the GPU memory allocation and notice some differences:

Without the upscaling on the first picture it can just manage to detect 3 faces on the first jpg file and crash on the png. The GPU using a total of 1.6GB RAM.
With the upscaling, the process takes much longer to complete, taking GPU up to 2.4GB of system ram but detecting all 7 faces on both images.
In both cases the GPU is being 100% utilised which is good.

The test images are small, so few faces are found.
The upsample option literally just increases the size of the image (Double the sides, and quadruple the area.) to find more faces.. To process a larger image, you need more memory, processing, etc. Is just that. Everything is practically linear.
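
The arithmetic behind that statement, as a minimal sketch. `upsampled_size` is a hypothetical helper, and the 640x480 input is only an example, not the size of the test images:

```python
def upsampled_size(width, height, upsample=1):
    """Return (width, height, area_factor) after doubling each side `upsample` times."""
    scale = 2 ** upsample
    return width * scale, height * scale, scale * scale


print(upsampled_size(640, 480, 0))  # (640, 480, 1)
print(upsampled_size(640, 480, 1))  # (1280, 960, 4)
```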

But at the same time I think we have proven some instances where the php dlib use is working correctly?

Try to reproduce it with the official examples, and report it... 😉

Again not sure if the upscaling is needed because the images are quite small in size in this sample test? and that if in the real NC implementation this would be needed as most of my pictures would be bigger, circa 4-6MB files.

As for the test, no matter how many faces it reports, the idea is just to check that pdlib is working correctly. The images are small just to make sure that it doesn't fail due to lack of memory.

As for our application, we have a more precise control of the size of the images. All images are standardized to one size so that they all take the same time, and memory consumption is controlled.
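
To illustrate the idea of capping memory use by image area, here is a minimal sketch. It is not the app's actual resizing code: `fit_to_area` is a hypothetical helper that only shrinks images, keeping the aspect ratio, so that the pixel count stays within a given maximum area.

```python
import math


def fit_to_area(width, height, max_area):
    """Return (new_width, new_height) with the same aspect ratio and area <= max_area."""
    area = width * height
    if area <= max_area:
        return width, height  # already small enough, leave untouched
    ratio = math.sqrt(max_area / area)
    # Round down so the resulting area does not exceed the limit.
    return max(1, math.floor(width * ratio)), max(1, math.floor(height * ratio))


print(fit_to_area(4000, 3000, 3000000))  # (2000, 1500)
```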

https://github.com/matiasdelellis/facerecognition/wiki/Settings#temporary-files

Is that there is some GPU validation call that is happening before or it is actually failing in the same call but different reason?

No idea!. 😅

ManuSB90 · 2020-11-29T22:39:36Z

Hi,

Apologies for the delay, got bit busy these last days... 😅

Maybe we should report it to Dlib
Test this example, if if it fails in the same way putting 0, it could be reported..

Tried the dlib cnn_face_detector.py script and it works with both values.
With the upsampling value '0' it detects far fewer faces than with '1', but '1' does not complete and fails on the picture below (where '0' completes and detects 24 faces):

Processing file: ../examples/faces/bald_guys.jpg
Killed

So I am not able to reproduce the error we were getting with the pdlib test suite png picture:

RuntimeError: Error while calling cudnnGetConvolutionForwardWorkspaceSize( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), (cudnnConvolutionFwdAlgo_t)forward_algo, &forward_workspace_size_in_bytes) in file /tmp/pip-build-c1h2zeax/dlib/dlib/cuda/cudnn_dlibapi.cpp:1026. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED

Maybe it has something to do with the .png file format?

Maybe you must apply the same path here for use our application:

Tried changing the upscale property to '1' on line 197 as indicated but still failed once submitted:

The point where the background job code seems to fail seams to be even before than the pdlib test script, on the open call :

facerecognition/lib/BackgroundJob/Tasks/ImageProcessingTask.php

Line 113 in d267c34

$this->model->open();

It never even gets to the detect call, so our upscale change really has no effect:

facerecognition/lib/BackgroundJob/Tasks/ImageProcessingTask.php

Line 140 in d267c34

$rawFaces = $this->model->detectFaces($tempImagePath);

9/10 - Executing task ImageProcessingTask (Process all images to extract faces)
	NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
Error during background task execution
If error is not transient, this means that core component of face recognition is not working properly
and that quantity and quality of detected faces and person will be low or suboptimal.
You probably want to file an issue (please include exception below) to: https://github.com/matiasdelellis/facerecognition/issues

In DlibCnnModel.php line 191:
                                                                                                                                                                
  [Exception]                                                                                                                                                   
  Error while calling cudaGetDevice(&the_device_id) in file /home/nvjet/dlib/dlib/cuda/gpu_data.cpp:204. code: 100, reason: no CUDA-capable device is detected  
                                                                                                                                                                

Exception trace:
  at /var/www/html/nextcloud/apps/facerecognition/lib/Model/DlibCnnModel/DlibCnnModel.php:191
 CnnFaceDetection->__construct() at /var/www/html/nextcloud/apps/facerecognition/lib/Model/DlibCnnModel/DlibCnnModel.php:191
 OCA\FaceRecognition\Model\DlibCnnModel\DlibCnnModel->open() at /var/www/html/nextcloud/apps/facerecognition/lib/BackgroundJob/Tasks/ImageProcessingTask.php:113
 OCA\FaceRecognition\BackgroundJob\Tasks\ImageProcessingTask->execute() at /var/www/html/nextcloud/apps/facerecognition/lib/BackgroundJob/BackgroundService.php:120
 OCA\FaceRecognition\BackgroundJob\BackgroundService->execute() at /var/www/html/nextcloud/apps/facerecognition/lib/Command/BackgroundCommand.php:138
 OCA\FaceRecognition\Command\BackgroundCommand->execute() at /var/www/html/nextcloud/3rdparty/symfony/console/Command/Command.php:255
 Symfony\Component\Console\Command\Command->run() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:1000
 Symfony\Component\Console\Application->doRunCommand() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:271
 Symfony\Component\Console\Application->doRun() at /var/www/html/nextcloud/3rdparty/symfony/console/Application.php:147
 Symfony\Component\Console\Application->run() at /var/www/html/nextcloud/lib/private/Console/Application.php:215
 OC\Console\Application->run() at /var/www/html/nextcloud/console.php:100
 require_once() at /var/www/html/nextcloud/occ:11
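
Nothing in this thread establishes why `cudaGetDevice` fails here. One thing worth ruling out, stated as an assumption rather than a confirmed diagnosis: the Python check ran as `nvjet`, while `occ` runs as `www-data`, and on Jetson boards the GPU device nodes are commonly restricted to members of the `video` group. A minimal sketch that compares the group membership of both users (the group name and the helper names are assumptions):

```python
import grp
import pwd


def groups_of(user):
    """Return the names of all groups the user belongs to (primary and supplementary)."""
    primary_gid = pwd.getpwnam(user).pw_gid  # raises KeyError for unknown users
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    names.add(grp.getgrgid(primary_gid).gr_name)
    return names


def missing_groups(user_groups, required=("video",)):
    """Return the required groups that are absent from user_groups, sorted.

    The default of "video" is an assumption about how the GPU device
    nodes are protected on this board; adjust it to match `ls -l /dev/nv*`.
    """
    return sorted(set(required) - set(user_groups))


if __name__ == "__main__":
    for user in ("nvjet", "www-data"):
        try:
            print(user, "is missing:", missing_groups(groups_of(user)) or "nothing")
        except KeyError:
            print(user, "does not exist on this machine")
```

If `www-data` turns out to be missing a group that `nvjet` has, that difference would be a plausible explanation for a device being visible from the Python shell but not from the background job.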

I also tried installing the external model as indicated, but after changing it and re-running the background job I get the same failure... I think the model change didn't work and it is still running model 1?

nvjet@nvjet-desktop:/var/www/html/nextcloud$ sudo -u www-data php occ face:setup -m 5
The model 5 (ExternalModel) will be installed
Install model 5 (ExternalModel) successfully done
The model 5 (ExternalModel) was configured as default
nvjet@nvjet-desktop:/var/www/html/nextcloud$ sudo -u www-data php occ face:background_job -t 900
1/10 - Executing task CheckRequirementsTask (Check all requirements)
2/10 - Executing task CheckCronTask (Check that service is started from either cron or from command)
3/10 - Executing task LockTask (Acquire lock so that only one background task can run)
4/10 - Executing task DisabledUserRemovalTask (Purge all the information of a user when disable the analysis.)
5/10 - Executing task StaleImagesRemovalTask (Crawl for stale images (either missing in filesystem or under .nomedia) and remove them from DB)
6/10 - Executing task CreateClustersTask (Create new persons or update existing persons)
	Skipping cluster creation, not enough data (yet) collected. For cluster creation, you need either one of the following:
	* have 1000 faces already processed
	* or you need to have 95% of you images processed
	Use stats command to track progress
7/10 - Executing task AddMissingImagesTask (Crawl for missing images for each user and insert them in DB)
	Finding missing images for user admin
8/10 - Executing task EnumerateImagesMissingFacesTask (Find all images which don't have faces generated for them)
9/10 - Executing task ImageProcessingTask (Process all images to extract faces)
	NOTE: Starting face recognition. If you experience random crashes after this point, please look FAQ at https://github.com/matiasdelellis/facerecognition/wiki/FAQ
Error during background task execution
If error is not transient, this means that core component of face recognition is not working properly
and that quantity and quality of detected faces and person will be low or suboptimal.
You probably want to file an issue (please include exception below) to: https://github.com/matiasdelellis/facerecognition/issues

In DlibCnnModel.php line 191:
                                                                                                                                                                
  Error while calling cudaGetDevice(&the_device_id) in file /home/nvjet/dlib/dlib/cuda/gpu_data.cpp:204. code: 100, reason: no CUDA-capable device is detected  
                                                                                                                                                                

face:background_job [-u|--user_id USER_ID] [-M|--max_image_area MAX_IMAGE_AREA] [-t|--timeout TIMEOUT]

I tried restarting Apache, but still no change.
I have the Flask service up and running, and I get the success message when testing it with curl.

Not sure what to do next... 😅 Any suggestions?
I might just give up 😞 and use the external model on my desktop machine with an i7 8700K and a GTX 1070.

Thanks again for the support 💪

matiasdelellis · 2020-11-30T00:00:31Z

I also tried installing the external model as indicated, but after changing it and re-running the background job I get the same failure... I think the model change didn't work and it is still running model 1?

You did not set the model URL or API key.

https://github.com/matiasdelellis/facerecognition-external-model#use

matiasdelellis · 2020-11-30T00:08:53Z

Processing file: ../examples/faces/bald_guys.jpg
Killed

This means that the kernel killed the process because it was consuming too much memory. You must make the image smaller.
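
A bare `Killed` like this usually comes from the Linux out-of-memory killer, which can normally be confirmed in the kernel log (`dmesg`). As a rough sketch, the memory left before starting a detection run can be read from `/proc/meminfo`. The helper names below are hypothetical, and counting free swap as usable is an assumption that may be too optimistic for GPU workloads:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; lines without a numeric value are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return MemAvailable plus SwapFree in MiB (both are reported in kB)."""
    info = parse_meminfo(text)
    return (info["MemAvailable"] + info.get("SwapFree", 0)) // 1024


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            print("Available memory (RAM + free swap):", available_mib(fh.read()), "MiB")
    except (OSError, KeyError) as exc:
        print("Could not read memory information:", exc)
```

Running this right before the detector gives a quick indication of whether a large image with upscaling enabled is likely to fit.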

matiasdelellis closed this as completed Apr 2, 2021
