GPU is not being used #67

Closed
derritter88 opened this issue Aug 26, 2021 · 25 comments · Fixed by #529
Labels
enhancement New feature or request

@derritter88

Hello @marcelklehr ,

I have enabled GPU support in the admin GUI, but when I start a manual run via occ recognize:classify I can see that a process is started and uses 100 % of one CPU core.
The GPU is not being used.

I have installed all the specified Nvidia applications/libraries.

@marcelklehr
Member

Hi!

Are there any messages in the nextcloud log?

@derritter88
Author

Unfortunately not - the only "warning" I can see in my log is:
[recognize] Warning: Classifying photos of user 3A60C52D-9415-4F28-A2B7-71A8CBD7A9E3 at 2021-08-26T08:37:57+02:00

The only thing I can see in my shell is that www-data is running node-v14.17.4-linux-x64.
This process cannot be stopped or killed - even a reboot does not solve it.
I need to reset the whole VM to get the process killed.

@derritter88
Author

What I additionally see in the log (but it's not linked to my manual start of the classifying process) is:
`[recognize] Warning: Classifier process output: 2021-08-26 07:59:08.434295: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[]

at 2021-08-26T07:59:08+02:00`

@derritter88
Author

and:

[index] Error: Call to a member function getOwner() on null

GET /index.php/apps/recognize/admin/countMissed
from 192.168.10.2 by 3A60C52D-9415-4F28-A2B7-71A8CBD7A9E3 at 2021-08-26T08:19:11+02:00

But I am not sure if this is linked to this issue or not.

@derritter88
Author

Okay, so overnight the new version became available for download, and I installed it this morning.
Nextcloud 22.1.1
Recognize 1.6.3

When manually starting the process I get the following error message:
Classifying photos of user ED17CAA4-EC2F-4457-95AB-A5980927C9C8
Failed to classify images
Classifier process error

My log says:
[recognize] Warning: Classifier process output: 2021-08-27 06:42:20.937775: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Error: Cannot find module '@tensorflow/tfjs-node-gpu'
Require stack:

  • /var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js
  • /var/www/cloud/apps/recognize/src/classifier_imagenet.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:889:15)
    at Function.Module._load (internal/modules/cjs/loader.js:745:27)
    at Module.require (internal/modules/cjs/loader.js:961:19)
    at require (internal/modules/cjs/helpers.js:92:18)
    at Object.<anonymous> (/var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js:11:9)
    at Module._compile (internal/modules/cjs/loader.js:1072:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1101:10)
    at Module.load (internal/modules/cjs/loader.js:937:32)
    at Function.Module._load (internal/modules/cjs/loader.js:778:12)
    at Module.require (internal/modules/cjs/loader.js:961:19) {
    code: 'MODULE_NOT_FOUND',
    requireStack: [
    '/var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js',
    '/var/www/cloud/apps/recognize/src/classifier_imagenet.js'
    ]
    }
    Trying js-only mode
    internal/modules/cjs/loader.js:892
    throw err;
    ^

Error: Cannot find module '@tensorflow/tfjs-backend-wasm'
Require stack:

  • /var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js
  • /var/www/cloud/apps/recognize/src/classifier_imagenet.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:889:15)
    at Function.Module._load (internal/modules/cjs/loader.js:745:27)
    at Module.require (internal/modules/cjs/loader.js:961:19)
    at require (internal/modules/cjs/helpers.js:92:18)
    at Object.<anonymous> (/var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js:19:3)
    at Module._compile (internal/modules/cjs/loader.js:1072:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1101:10)
    at Module.load (internal/modules/cjs/loader.js:937:32)
    at Function.Module._load (internal/modules/cjs/loader.js:778:12)
    at Module.require (internal/modules/cjs/loader.js:961:19) {
    code: 'MODULE_NOT_FOUND',
    requireStack: [
    '/var/www/cloud/apps/recognize/src/efficientnet/EfficientnetModel.js',
    '/var/www/cloud/apps/recognize/src/classifier_imagenet.js'
    ]
    }

at 2021-08-27T06:42:20+02:00

@derritter88
Author

derritter88 commented Aug 27, 2021

So it looks like @tensorflow/tfjs-node-gpu and @tensorflow/tfjs-backend-wasm are not included in the NC app.

@marcelklehr
Member

I've had to disable GPU for now, because the bundle would exceed the bundle size limit :/

@derritter88
Author

> I've had to disable GPU for now, because the bundle would exceed the bundle size limit :/

The limitation from the Nextcloud appstore?

@marcelklehr
Member

Yeah

@derritter88
Author

Okay, would it be possible for you to create a "GitHub-only" version of it (e.g. xxx-RC1) so I can download and test it?

@marcelklehr
Member

I'll definitely try to make something available. Currently, my problem is that I have to develop that blindly, as I don't have a GPU machine available.

@derritter88
Author

If you want, you can package it up for me and I will act as your alpha/beta tester.

@arch-user-france1

arch-user-france1 commented Sep 21, 2021

I'm testing it with my NVIDIA GeForce GTX 1660 SUPER (CUDA is supported even though I couldn't find it on the list).

First I have to set up another instance...
I'm using an older version where it is still integrated.

@arch-user-france1

lol nextcloud apps is down :(

Now I can wait even longer

marcelklehr added this to To do in Recognize Sep 29, 2021

@marcelklehr
Member

GPU support has to wait until other issues are sorted out, sorry.

@derritter88
Author

Okay so for the moment I can remove all necessary Nvidia libraries (except driver)?

@marcelklehr
Member

> Okay so for the moment I can remove all necessary Nvidia libraries (except driver)?

For the moment no NVIDIA drivers and libraries are needed, but they won't hurt either, so it's up to you.

@derritter88
Author

It's just a bit complex to install different CUDA libraries/versions - that's why I am asking :-)
At the moment I am sticking with CUDA 11.2, as you mentioned it in a previous version.

@derritter88
Author

@marcelklehr just in case: Windows now supports NVIDIA GPUs within WSL, which I am using.
So if you have any tests which I could run, just let me know.

marcelklehr added the enhancement New feature or request label Oct 21, 2022

@bugsyb

bugsyb commented Dec 1, 2022

@derritter88, did you get it working?
I run NC in Docker and have been able to give containers access to the GPU, e.g. with the TensorFlow example:
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

The NVIDIA examples give similar results:

#docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Maxwell" with compute capability 5.0

> Compute 5.0 CUDA device: [NVIDIA GeForce GTX 960M]
5120 bodies, total time for 10 iterations: 6.155 ms
= 42.591 billion interactions per second
= 851.816 single-precision GFLOP/s at 20 flops per interaction

I have a big archive of photos to process, and running that on the CPU is overkill.

Thanks for any hints on how to get it working. I am not shy about customizing the NC container or whatever else is needed.

@derritter88
Author

Hello @bugsyb ,

Thanks for sharing this with me/us. It might be useful information for some people, but unfortunately I do not run Nextcloud as a Docker container.
I "just" have a regular dedicated Nextcloud VM.
I also had around 100k photos/images to classify, but my CPU handled that over the last couple of weeks.

I had some discussions with @marcelklehr about it, and the major problem would be to have an AI library like TensorFlow that can handle both Nvidia and AMD GPUs.

@bugsyb

bugsyb commented Dec 2, 2022

Hi @derritter88 ,

Thanks for the swift response.

I took a quick look at what gets installed as part of Recognize, and it looks like tensorflow-webgl is included.
There is also a flag in the code which suggests it should be possible even today:
process.env.RECOGNIZE_GPU

I was hoping that, given your earlier engagement, you'd know how to get Recognize to use the GPU.

I also have a large number of photos to process and... well, I hoped I could leverage the GPU, which is otherwise wasted.

I run most apps as containers these days, for simplicity, dependency handling and ease of portability between systems. Happy to share knowledge on the side if you're interested.

Regarding Nvidia and AMD GPUs: TensorFlow can run on both, natively as well as in a container, as demonstrated for Nvidia.

Here are some explanations covering AMD:
https://community.amd.com/t5/hsa/tensorflow-with-amd-gpu/td-p/199925
https://medium.com/analytics-vidhya/install-tensorflow-2-for-amd-gpus-87e8d7aeb812
https://www.amd.com/en/technologies/infinity-hub/tensorflow
https://tealfeed.com/install-tensorflow-gpu-amd-gpus-vbs7s

There is also another implementation, DirectML, though according to what I found online it targets Windows and WSL, so standard Linux would not be able to use it (I am not sure about the latter, though).

If we could get started with Nvidia, which is more popular among people who would use it on Linux (not so much for gaming ;) ), it would be great, especially as TensorFlow is already available.

I can't help much with AMD as I don't have one.

@derritter88
Author

To be honest, I gave up on this topic and passed my GPU through to a Plex VM for video transcoding, but maybe @marcelklehr could improve the general logic of Recognize?

@Doomsdayrs

I have an AMD gpu in a laptop that I use for nextcloud

@arch-user-france1

> I have an AMD gpu in a laptop that I use for nextcloud

AMD GPUs probably won't work anyways.

marcelklehr added a commit that referenced this issue Dec 4, 2022
fixes #67

Signed-off-by: Marcel Klehr <mklehr@gmx.net>
Recognize automation moved this from To do to Done Dec 4, 2022