
mobilenet-gpu not working with double-take #691

Closed
bigbangus opened this issue Dec 22, 2021 · 15 comments

Comments

@bigbangus

bigbangus commented Dec 22, 2021

Describe the bug
Initially, the mobilenet-gpu version appears to work in the GUI, and I can successfully test my recognition application there using a stock photo. However, once I try to connect double-take to it using the URL + key, it stalls and I get errors in the CompreFace logs.

Notes:
- same behavior if I use the Unraid single-container version
- the regular CompreFace version works fine with double-take in both the single-container and docker-compose versions
- same behavior with internal or external DB
- same behavior with the arcface-gpu version

Hardware/OS
Unraid 6.10-rc2 (docker-compose plugin)
Nvidia Driver: 495.46 (patched)
GTX 1050Ti
Ryzen 9 3900X w/64GB RAM

initial GUI test works and subsequent tests work
[screenshots]

Then the double-take API connection fails:
[screenshots]

nvidia-smi:
[screenshot]

docker logs:
compreface-db.txt
compreface-ui.txt
compreface-admin.txt
compreface-api.txt
compreface-core.txt

.env:
registry=exadel/
postgres_username=postgres
postgres_password=postgres
postgres_db=frs
postgres_domain=compreface-postgres-db
postgres_port=5432
email_host=smtp.gmail.com
email_username=
email_from=
email_password=
enable_email_server=false
save_images_to_db=true
compreface_api_java_options=-Xmx8g
compreface_admin_java_options=-Xmx8g
ADMIN_VERSION=0.6.1
API_VERSION=0.6.1
FE_VERSION=0.6.1
CORE_VERSION=0.6.1-mobilenet-gpu

docker-compose:
docker-compose.zip

@pospielov
Collaborator

According to the logs, compreface-api makes a request to compreface-core, which takes too long to answer; that is why it returns a timeout error.
compreface-core is the module that runs the neural network. I don't see errors in compreface-core, but I do see that the uWSGI listen queue is full.
I also see in nvidia-smi that the GPU is overloaded.
Is it possible that you are making too many requests?

@bigbangus
Author

I also see in nvidia-smi that the GPU is overloaded. Is it possible that you are making too many requests?

I think so. This is what nvidia-smi looks like when I'm uploading images to the web GUI. Only 17%.
[screenshot]

I just don't understand why double-take would flood compreface with requests when trying to connect. And why this only happens with the gpu versions. I think it uses a sample lenna.jpg to test the API and to show a green status on compreface. But for whatever reason it goes nuts with the gpu version.

@pospielov
Collaborator

How much time does it take to get a response from the GPU version when you test through the UI? Ideally, it should take less time than the CPU version.
Also, be aware that the first one or two requests will take more time, as the servers need to load models and initialize caches.

@jakowenko

I can shed a little insight on the Double Take side. The detectors' status is updated on load of the /config page and every 30 seconds after that. There shouldn't be any spamming of the CompreFace API unless you keep refreshing the /config page in the DT UI.

@bigbangus
Author

How much time does it take to get a response from the GPU version when you test through the UI? Ideally, it should take less time than the CPU version. Also, be aware that the first one or two requests will take more time, as the servers need to load models and initialize caches.

Yes, this matches my experience. Regular CompreFace takes several seconds to process each image; compreface-mobilenet-gpu takes under 1 second. Both have an initial delay, like you said.

I can shed a little insight on the Double Take side. The detectors' status is updated on load of the /config page and every 30 seconds after that. There shouldn't be any spamming of the CompreFace API unless you keep refreshing the /config page in the DT UI.

Understood. Again, it works fine with regular CompreFace; with the GPU versions it just gets flooded. Are the GPU versions too fast?

@bigbangus
Author

Today I tried again using an Ubuntu 20.04.3 LTS 64-bit virtual machine with a GTX 1660 Ti passed through on Unraid 6.10-rc2. I installed Docker, Docker Compose, and the NVIDIA Docker runtime. Same result for mobilenet-gpu: initially I passed it lenna.jpg and it works fine. All subsequent web GUI requests work fine and are super fast. No issues to this point.

As soon as I connect double-take using the url and key it blows up again.

Not sure what else I can do here to help solve the issue. I feel like double-take is flooding CompreFace with requests, but I'm not sure why. I would love for this to work because it's so much faster on the GPU.

Docker logs before double-take connects:
compreface-admin.log
compreface-api.log
compreface-core.log
compreface-postgres-db.log
compreface-ui.log

Docker logs after double-take tries to connect
compreface-admin.log
compreface-api.log
compreface-core.log
compreface-fe.log
compreface-postgres-db.log

nvidia-smi:
[screenshot]

docker ps
[screenshot]

@pospielov
Collaborator

pospielov commented Dec 30, 2021

Looks like I found the issue, here is an example of your request:
{"log":"10.129.0.250 - - [28/Dec/2021:15:45:02 +0000] \"POST /api/v1/recognition/recognize?face_plugins=undefined\u0026det_prob_threshold=0 HTTP/1.1\" 499 0 \"-\" \"axios/0.24.0\"\n","stream":"stdout","time":"2021-12-28T15:45:02.168531834Z"}
There is a param det_prob_threshold=0.
Here is how the algorithm works: first, it tries to find all faces in the image. Like every other ML algorithm, it can't say a flat "yes" or "no"; it reports that something is a face with a probability from 0 to 1, and this threshold tells the algorithm what it should treat as a face.
If the threshold is zero, it simply keeps all found "faces". I tried locally: with the default value it found one face; with 0 it found 7129 "faces". Then the algorithm runs facial recognition on all 7129 "faces"...
So double-take does not flood CompreFace with requests; it sends a single request that is far too heavy, because it tries to recognize thousands of "faces" in the image.
So why the difference when you change the CompreFace version?
It looks like double-take always sends det_prob_threshold=0, but FaceNet (the default build) behaves differently from InsightFace (all custom builds). With that same image and det_prob_threshold=0, FaceNet returned just one face.
Then I tried this image and by default, FaceNet found 14 faces:
[screenshot]
When I send the same request with det_prob_threshold=0, FaceNet returned 17 faces.
Then I tried the same image with InsightFace, and by default, it returned only 7 faces:
[screenshot]
If I set det_prob_threshold=0.1, it returned 14 faces.
If I set det_prob_threshold=0, it returned 4714 faces.
The default value for FaceNet is 0.85 and for InsightFace is 0.8; we didn't change them from the original libraries.

So, you need to change this value in double-take if it's possible. If not, we need to ask @jakowenko to implement this functionality :)
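The effect of det_prob_threshold described above can be illustrated with a small sketch. This is hypothetical code, not CompreFace's actual detector: the candidate boxes and probabilities are made up for illustration, but the gating logic is the same idea — the threshold decides which candidate detections survive before the (expensive) recognition step runs on each of them.

```javascript
// Sketch of how det_prob_threshold gates candidate detections before
// recognition runs. The candidate list is invented for illustration;
// it is not CompreFace's real detector output.
function filterByThreshold(candidates, detProbThreshold) {
  return candidates.filter((c) => c.probability >= detProbThreshold);
}

const candidates = [
  { box: [214, 193, 350, 395], probability: 0.99 }, // the real face
  { box: [10, 10, 20, 20], probability: 0.42 },     // spurious detection
  { box: [5, 80, 12, 95], probability: 0.03 },      // noise
];

// With a sane threshold, only the real face goes on to recognition.
console.log(filterByThreshold(candidates, 0.8).length); // 1

// With det_prob_threshold=0, every candidate survives, so recognition
// runs on all of them -- the "7129 faces" effect described above.
console.log(filterByThreshold(candidates, 0).length); // 3
```

With a threshold of 0 the recognition cost scales with the number of noise candidates, which is why a single request can saturate the GPU and fill the uWSGI listen queue.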

@bigbangus
Author

bigbangus commented Dec 31, 2021

@pospielov thank you for putting the time to find this and explain how det_prob_threshold works.

So, you need to change this value in double-take if it's possible. If not, we need to ask @jakowenko to implement this functionality :)

I can confirm that my double-take config has det_prob_threshold: 0.8 (see below).

@jakowenko can you confirm that det_prob_threshold is being correctly passed through to the compreface API?

double-take detector config:

# detector settings (default: shown below)
detectors:
  compreface:

    url: http://x.x.x.x:8000 #masked for privacy
    key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx #masked for privacy
   
    # number of seconds before the request times out and is aborted
    timeout: 15
    # minimum required confidence that a recognized face is actually a face
    # value is between 0.0 and 1.0
    det_prob_threshold: 0.8
    # comma-separated slugs of face plugins
    # https://github.com/exadel-inc/CompreFace/blob/master/docs/Face-services-and-plugins.md)
    # face_plugins: mask,gender,age

@bigbangus
Author

Yep, pretty confident it's just the status check in double-take that is hardcoded to det_prob_threshold = 0.

@jakowenko if you can update the code to use 0.1, or to use the det_prob_threshold defined in the double-take config, for the lenna.jpg status check, that would probably solve the issue.

Thanks!

In the CompreFace Unraid log (regular version), you can see the lenna.jpg status check is sent with det_prob_threshold at 0.0 despite the config being set to 0.8.

[screenshot]

@pospielov
Collaborator

@bigbangus Probably you can create a pull request to double-take; it looks like this is the line where det_prob_threshold is defined:
https://github.com/jakowenko/double-take/blob/252cbce65f4a94cc20d2cc9e333b43b8887655bf/api/src/util/detectors/compreface.js#L26
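A minimal sketch of the kind of change needed: read the threshold from the user's detector config instead of hardcoding 0 for the status check. The `config` object shape below mirrors the YAML posted earlier in this thread, but the accessor itself is an assumption for illustration, not double-take's actual module.

```javascript
// Sketch of the fix: use the configured det_prob_threshold for the
// lenna.jpg status check instead of a hardcoded 0. The config shape
// is assumed from the YAML in this thread, not double-take's real code.
function statusCheckThreshold(config) {
  const configured = config?.detectors?.compreface?.det_prob_threshold;
  // Fall back to a sane non-zero default when nothing is configured.
  return configured !== undefined ? configured : 0.8;
}

console.log(statusCheckThreshold({
  detectors: { compreface: { det_prob_threshold: 0.8 } },
})); // 0.8

console.log(statusCheckThreshold({})); // falls back to 0.8
```

Any non-zero fallback avoids the pathological "thousands of faces" case on the InsightFace builds while keeping the status check meaningful.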

@bigbangus
Author

@bigbangus Probably you can create a pull request to double-take, looks like this is the line where det_prob_threshold is defined: https://github.com/jakowenko/double-take/blob/252cbce65f4a94cc20d2cc9e333b43b8887655bf/api/src/util/detectors/compreface.js#L26

Yes, I found that in the code as well, but I'm still new to programming, GitHub, Docker, and IT in general. I will read up on how to make a pull request and pursue that if the author doesn't have time to update it. Thanks!

@bigbangus
Author

OK, never mind. I watched this YouTube video and made a pull request using my Linux VM. So cool! Thanks for the tip. Love GitHub!

https://github.com/jakowenko/double-take/pull/185/files

@juan11perez

@bigbangus
I have very similar hardware (GTX 1050 Ti, Ryzen 9 3900X w/64GB RAM) with Unraid 6.9.2 and experienced exactly the same issue.
I modified the double-take container per your pull request and it now works with mobilenet-gpu.

Thank you

@bigbangus
Author

@juan11perez awesome. I modified compreface.js in the running container and restarted double-take through its own GUI, and it works now. So yes, it seems like this change would be great once @jakowenko has time to address it!

[screenshots]

{"severity": "DEBUG", "message": "Found: BoundingBoxDTO(x_min=214, y_min=193, x_max=350, y_max=395, probability=0.9944502115249634, _np_landmarks=array([[270.41437, 271.69827],\n [327.2149 , 273.5742 ],\n [308.94156, 315.77658],\n [270.0415 , 344.64978],\n [316.9687 , 346.79092]], dtype=float32))", "request": {"method": "POST", "path": "/find_faces", "filename": "lenna.jpg", "api_key": "", "remote_addr": "127.0.0.1"}, "logger": "src.services.facescan.plugins.insightface.insightface", "module": "insightface", "traceback": null, "build_version": "dev"}
{"severity": "INFO", "message": "200 OK", "request": {"method": "POST", "path": "/find_faces", "filename": "lenna.jpg", "api_key": "", "remote_addr": "127.0.0.1"}, "logger": "src.services.flask_.log_response", "module": "log_response", "traceback": null, "build_version": "dev"}
172.17.0.1 - - [17/Jan/2022:14:16:04 -0500] "POST /api/v1/recognition/recognize?face_plugins=undefined&det_prob_threshold=0.8 HTTP/1.1" 200 256 "-" "axios/0.24.0"

@bigbangus
Author

The PR is now merged, and the issue is confirmed resolved with v1.9.0 of double-take. Thank you @jakowenko @pospielov
https://github.com/jakowenko/double-take/releases/tag/v1.9.0
