This repository has been archived by the owner on May 19, 2018. It is now read-only.

Examples are running very slow on AWS #53

Closed
pavelanni opened this issue Jun 18, 2017 · 1 comment


@pavelanni

As I don't have a GPU available, I decided to try the examples on AWS. They run very slowly on an AWS 'p2.xlarge' instance: much slower than in the video, and even slower than on my desktop, which has no GPU (2-3 times slower, by my visual estimate).

My config: p2.xlarge instance, Ubuntu 16.04.2, TensorFlow 1.2, Python 3, CUDA 8.0, cuDNN 6 (installed as .deb from NVIDIA).

What I have tested so far:

  1. The CPU vs. GPU TensorFlow test from http://learningtensorflow.com/lesson10/ shows an 11x performance improvement when running matrix multiplication on the GPU.
  2. When I run the mnist_1.0 example and run the nvidia-smi command in another window, it shows that the GPU is busy and is running the same Python process ID as the example.
  3. When the example starts, it shows that it has found the GPU and is using it:
2017-06-18 18:35:07.317169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:00:1e.0
Total memory: 11.17GiB
Free memory: 11.11GiB
2017-06-18 18:35:07.317192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-06-18 18:35:07.317200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-06-18 18:35:07.317209: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0)

All signs confirm that TensorFlow is using the GPU, so why is it so slow? It seems I'm missing something, but I can't find it.
Thanks,
Pavel
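
To put a number on the visual estimate above, one option is to time a fixed number of training steps on each machine and compare the averages. This is only a minimal sketch: `time_steps` and `dummy_step` are hypothetical names, and `dummy_step` is a stand-in workload, not code from the mnist_1.0 example. In practice you would pass a function that runs one real training step.

```python
import time


def time_steps(step_fn, n_steps=100, warmup=10):
    """Return the average wall-clock seconds per call of step_fn.

    A few warm-up calls run first so that one-off startup costs
    do not distort the average.
    """
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_steps


def dummy_step():
    # Hypothetical stand-in for one training step (assumption:
    # replace this with a call that runs your real training op).
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    per_step = time_steps(dummy_step)
    print("avg seconds per step: %.6f" % per_step)
```

Running the same script on the desktop and on the AWS instance gives two per-step averages that can be compared directly.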

@martin-gorner
Owner

The videos in the presentation are accelerated to be 20 seconds long, whatever the length of the original training run, so they do not reflect real training speed.
If you do not have a GPU, I recommend Google ML Engine. It is a training cluster as a service. There is a sample in the mlengine folder.
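
As a rough illustration of what the 20-second videos imply, the playback speed-up is simply the real run time divided by 20. The sketch below assumes a made-up run length of 600 seconds; the actual training durations behind the videos are not stated in this thread, and `playback_speedup` is a hypothetical helper name.

```python
VIDEO_SECONDS = 20.0  # video length stated in the reply above


def playback_speedup(real_run_seconds, video_seconds=VIDEO_SECONDS):
    """Return how many times faster than real time the video plays."""
    if real_run_seconds <= 0 or video_seconds <= 0:
        raise ValueError("durations must be positive")
    return real_run_seconds / video_seconds


if __name__ == "__main__":
    # Assumption: a 10-minute (600 s) training run, chosen only as an example.
    print(playback_speedup(600))  # prints 30.0
```

So a run that looks instant in the video may have taken many minutes in reality, which is why comparing a live run against the video is misleading.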
