Question about Zero Volatile GPU-Util #5

Closed · SannyZhou opened this issue Jan 22, 2019 · 5 comments

Comments
@SannyZhou

Hello,
I am trying to train and evaluate your LISA model on the CoNLL dataset.
When running the model on a GPU, I use the command CUDA_VISIBLE_DEVICES=0 bin/evaluate.sh config/conll05-lisa.conf --save_dir model. However, nothing seems to run on the GPU: nvidia-smi shows Volatile GPU-Util stuck at zero.
How can I make the best use of the GPU with TensorFlow Estimators?
Do you have any idea what might be causing this?

@patverga

The first thing to check is that you've installed TensorFlow with GPU support; the default tensorflow package is CPU-only.
pip3 install --user tensorflow-gpu
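
A quick way to confirm from Python which devices TensorFlow can actually see (standard TF 1.x calls, nothing LISA-specific):

import tensorflow as tf
from tensorflow.python.client import device_lib

# A working GPU install lists a /device:GPU:0 entry alongside the CPU.
print(device_lib.list_local_devices())
print(tf.test.is_gpu_available())  # True if a CUDA-capable GPU is usable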

@SannyZhou
Author

The package is tensorflow-gpu 1.9.0. @patverga

@strubell
Owner

Does tensorflow output a line like:

2019-01-22 12:22:22.434234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11428 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:82:00.0, compute capability: 5.2)

If so, then it's using the GPU. If not, then you likely have some kind of
configuration issue.
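
Another way to confirm placement is TensorFlow's device placement logging; this is a generic TF 1.x sketch, not LISA's code:

import tensorflow as tf

# With log_device_placement=True, TF prints the device chosen for each op.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    a = tf.constant([1.0, 2.0], name="a")
    b = tf.constant([3.0, 4.0], name="b")
    print(sess.run(a + b))  # placement lines appear on stderr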

I would expect GPU usage to fluctuate a lot during evaluation, and in fact
most of the time to be spent on the CPU, since the code calls the official
CoNLL evaluation scripts (perl). Currently I believe evaluation uses the
same batch size as training, but you could increase it, depending on your
GPU's memory, to make better use of the GPU (see the sketch below).
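
For context, with tf.estimator the evaluation batch size is whatever the eval input_fn produces, so it can be raised independently of training. A minimal generic sketch with toy data (not LISA's input pipeline; the shapes and batch sizes are illustrative):

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
    return tf.estimator.EstimatorSpec(mode, loss=loss)

def input_fn(batch_size):
    ds = tf.data.Dataset.from_tensor_slices(
        ({"x": tf.random_normal([1024, 8])}, tf.zeros([1024], tf.int64)))
    return ds.batch(batch_size)

estimator = tf.estimator.Estimator(model_fn)
estimator.train(lambda: input_fn(32), steps=10)  # small batches for training
estimator.evaluate(lambda: input_fn(512))        # larger batches fill the GPU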

The code currently doesn't have a "predict" mode, which would simply output
predictions for sentences without evaluating. That may be closer to the
functionality you want, and I'm happy to accept pull requests :)
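
For anyone interested in contributing that, here is a hedged sketch of what a predict-only path could look like using the standard tf.estimator PREDICT mode (illustrative only, not LISA's actual interface):

import tensorflow as tf

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], 2)
    if mode == tf.estimator.ModeKeys.PREDICT:
        # Short-circuit before any loss/metrics: just emit predictions.
        return tf.estimator.EstimatorSpec(
            mode, predictions={"tags": tf.argmax(logits, axis=-1)})
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    return tf.estimator.EstimatorSpec(mode, loss=loss)

def predict_input_fn():
    ds = tf.data.Dataset.from_tensor_slices({"x": tf.random_normal([16, 8])})
    return ds.batch(4)

estimator = tf.estimator.Estimator(model_fn)
for pred in estimator.predict(predict_input_fn):
    print(pred["tags"])  # one output per example; no eval scripts involved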

@SannyZhou
Author


Thanks for your patient answer. I just found that I had set the debug parameter to 1, which made evaluation on the validation set run very frequently and kept GPU usage low.

@strubell
Owner

strubell commented Feb 2, 2019 via email
