
Process Killed without any Error #36

Open
BATspock opened this issue Apr 24, 2023 · 11 comments

@BATspock

I have the following specs:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 T...    On | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P5                8W /  60W|     54MiB /  4096MiB |     41%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

I see the following warning before the program is killed:
W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
I do not see any other errors:

python whisperJAX.py 
2023-04-23 22:28:46.200680: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Killed

How can I resolve this issue? Please let me know if I need to share more details.

@sanchit-gandhi
Owner

Hey @BATspock - could you check that JAX is correctly installed? See the comment on #30.

@BATspock
Author

Hi @sanchit-gandhi, I do not believe this is a problem with the installations. I executed the script referred to in the comment above and got the following output:
Found 1 JAX devices of type NVIDIA GeForce RTX 3050 Ti Laptop GPU.

I am trying to run the basic starter code but am still getting the same error.

Starter code:

from whisper_jax import FlaxWhisperPipline

# instantiate pipeline
pipeline = FlaxWhisperPipline("openai/whisper-large-v2")


# JIT compile the forward call - slow, but we only do once
text = pipeline("audio.mp3")

# used cached function thereafter - super fast!!
text = pipeline("audio.mp3")

# print the text
print(text)

@sanchit-gandhi
Owner

The error message looks like it's a tensorflow issue (not a JAX / Whisper JAX one)? Could you maybe first try uninstalling tensorflow:

pip uninstall tensorflow
conda remove tensorflow

And then re-running the code?

@sanchit-gandhi
Owner

Any luck @BATspock? Happy to help make this work here!

@BATspock
Author

I was facing the same issue even after uninstalling and reinstalling TensorFlow. I tried playing around with my NVIDIA drivers, but now whisper-jax is not detecting my GPU, and I still see the process being killed without any error.

 python whisperJAX.py 
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Killed
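When JAX stops seeing the card, a first sanity check is whether the driver itself still lists the GPU. A minimal stdlib-only sketch (illustrative, not part of whisper-jax) that shells out to nvidia-smi:

```python
import shutil
import subprocess

def nvidia_gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout

print("GPU visible to driver:", nvidia_gpu_visible())
```

If this prints False, the problem is at the driver level and no amount of JAX reinstalling will help; if True, the mismatch is between the driver/CUDA version and the installed jaxlib build.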

@kevin01881

Same problem here. 😢

@kevin01881

kevin01881 commented May 1, 2023

@BATspock

Update: I decided to ask GPT-4 what's going on, and it told me it's because there's insufficient memory. I checked, and it turns out GPT-4 is right! I see the memory fill all the way up to 16 GB, and then the app gets killed.

In your case, what you'll have to solve is why it does not detect your GPU, because you wouldn't need as much system RAM if it used your 3050. Since it doesn't cooperate with your GPU, it falls back on the CPU and tries to load the model into your system's RAM instead of your GPU's VRAM, and that RAM is apparently also insufficient, as in my case.

I don't have a GPU, so I'm now out of luck. You'll be in luck if you figure out how to get the GPU working with it! :) But at least now you know why the app keeps getting killed! :)
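A bare `Killed` with no traceback is the classic signature of the kernel OOM killer, which matches the memory theory above. One way to check the headroom before loading the model is to read MemAvailable from /proc/meminfo; a minimal Linux-only sketch (the helper name is illustrative):

```python
def available_ram_gb() -> float:
    """Read MemAvailable from /proc/meminfo (Linux only) and return GiB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                kib = int(line.split()[1])  # value is reported in kB
                return kib / (1024 ** 2)
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

print(f"Available RAM: {available_ram_gb():.1f} GiB")
```

If this number is well below what the large-v2 checkpoint needs, the process will be killed during loading exactly as described above.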

@gamingflexer

gamingflexer commented Jun 5, 2023

> I was facing the same issue even after uninstalling and reinstalling TensorFlow. I tried playing around with my NVIDIA drivers, but now whisper-jax is not detecting my GPU, and I still see the process being killed without any error.
>
> python whisperJAX.py
> No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
> Killed

Still, I would suggest CUDA greater than 12!

I have wasted a lot of time on this; it is a CUDA & JAX incompatibility issue. If you have a fresh Ubuntu-only instance, follow these steps:

Install CUDA & Drivers

sudo apt-get install nvidia-driver-510-server
sudo reboot
nvidia-smi

Install virtualenv

sudo apt-get update
sudo apt-get upgrade
sudo apt install python3-virtualenv

Restart the instance after this!

Make a Venv & install dependencies

virtualenv -p python3 venv
source venv/bin/activate

pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

And then install Whisper JAX. It works!

@BATspock
Author

BATspock commented Jun 5, 2023

What exactly do you mean by fresh instance of only ubuntu? Are you working with a cloud spawned image?

@gamingflexer

Yeah, on cloud, but when running larger files it gets killed. I am also trying out different things; if I find anything I will surely share.

One thing I noticed from htop is that the CPU is at 100% even for one file, while memory is at 4.7 GB / 8 GB.

@gamingflexer

Okay, so the issue here is that the "pool" created via multiprocessing is never closed. So every time we run the code in a Flask (or Flask + Celery) server, the program gets killed:

  • New processes are spawned (for example, with NUM_PROC = 32, new workers are started from 33)
  • Old ones are never closed, so they keep making new memory allocations

I have tried pool.close() and pool.terminate() to close the workers, but it was not working; maybe I am missing something.

But yeah, this was the issue; I checked the RAM & CPU usage on the tiny model.

Temp Solution

  • Comment out the multiprocessing lines!
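Setting whisper-jax's internals aside, the general pattern for making sure workers are reaped between calls is to scope the pool with a context manager, which tears the workers down on exit instead of letting them accumulate per request. A minimal sketch (NUM_PROC and the worker function are illustrative, not whisper-jax code):

```python
from multiprocessing import Pool

NUM_PROC = 4  # illustrative worker count

def square(x):
    return x * x

def run_batch(values):
    # Pool.__exit__ calls terminate(), so the workers are stopped
    # when the block exits instead of piling up across requests.
    with Pool(processes=NUM_PROC) as pool:
        return pool.map(square, values)

if __name__ == "__main__":
    print(run_batch([1, 2, 3]))  # [1, 4, 9]
```

If workers must survive across requests for speed, the alternative is to create one long-lived pool at server startup and reuse it, rather than building a new pool per request.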
