
Performance of Simple Use Case - PyTorch vs Plain Python with Numpy #1630

Closed
makeyourownneuralnetwork opened this issue May 23, 2017 · 12 comments

Comments

@makeyourownneuralnetwork

commented May 23, 2017

Summary - PyTorch seems to be half as fast as plain Python for very simple networks.

Context

I am learning PyTorch because it promises convenience and easy access to GPU acceleration.

In learning, I re-implemented my previous very simple tutorial 3-layer neural network created from scratch in Python with Numpy.

Code

The two are here:

Results

They are built to be as similar as possible - same size, same loss function, same training data, etc.

The results of two timing tests show that the plain Python version is about twice as fast as the PyTorch version:

  • home-made simple pure python - 440 seconds, 458 seconds
  • simple PyTorch version - 841 seconds, 834 seconds

More Detail

Here's a blog which details the experiment conditions in more detail (near the bottom of the post).

What Am I Doing Wrong?

Or does PyTorch have too much overhead for simple scenarios, and only really shines for larger or GPU accelerated cases?

@pavanky

Contributor

commented May 23, 2017

It would also be helpful if you mentioned how you installed numpy and PyTorch (for example, which BLAS library each is using).

@apaszke

Member

commented May 23, 2017

Right now our GPU libs are very optimized, but we can't say the same about the CPU backend. It's getting better, but we still aren't where numpy is, which is probably why you are seeing this difference. Can you try running your script with OMP_NUM_THREADS=X, where X is 8 (or less if you have fewer cores)?
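Concretely, the variable is set for the shell session before launching the script ("train.py" is a placeholder for your own script name):

```shell
# Cap the OpenMP thread pool at the machine's physical core count (2 here),
# then launch the training script in the same shell.
export OMP_NUM_THREADS=2
# python train.py   <- placeholder for your own script
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

It can also be set inline for a single run, e.g. `OMP_NUM_THREADS=2 python train.py`.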

@makeyourownneuralnetwork

Author

commented May 23, 2017

@pavanky I'm using the latest Anaconda Python 3.6, updated daily via conda.

PyTorch was installed using the conda method from the official PyTorch GitHub readme.

@makeyourownneuralnetwork

Author

commented May 23, 2017

@apaszke thanks for your helpful reply

My MacBook Pro 13 (early 2015) has only 1 CPU with 2 cores. I therefore set OMP_NUM_THREADS=2 and ran the timings again.

The results were:

  • PyTorch - 928 and 995 seconds, compared to the 841 and 834 seconds earlier.
  • Plain Python with numpy - 591 and 597 seconds.

I ran the comparison again because I am now also running a browser showing live TV news, so I needed to keep the environment equivalent to aid comparison. The ratio is 1.6, not in favour of PyTorch.

Interestingly, setting OMP_NUM_THREADS=1 makes no difference; the result with PyTorch was 953 seconds, within the range above.

The environment variable was set in the notebook using:

%env OMP_NUM_THREADS=1

and was tested with %env

@rosinality


commented May 24, 2017

PyTorch's autodifferentiation engine is very fast (so performance is similar to, or even better than, other frameworks), but it can be slower than a raw numpy forward/backward implementation, especially when the network is small. I think performance will be better if you implement the same network with torch.Tensor, used like numpy, instead of Variable.

Sorry, this was my mistake. I've tested your code with smaller datasets and PyTorch implementation was 2 times faster

@makeyourownneuralnetwork

Author

commented May 24, 2017

The readme.md on pytorch's GitHub page says:

Hence, PyTorch is quite fast -- whether you run small or large neural networks.

But the above comments suggest this isn't true.

I am now setting up a Google Cloud Compute GPU instance to test performance with CUDA. Will update this issue if the results are interesting.

@fmassa

Member

commented May 24, 2017

@makeyourownneuralnetwork Note that for such a small network, you will probably see that CUDA is slower than CPU, and that is expected.

@makeyourownneuralnetwork

Author

commented May 24, 2017

I've updated my blog .. the GPU results are a bit faster .. about 25%.

CPU    GPU

494    366
483    372
451    355

476.0  364.3  (averages)

See here for more: http://makeyourownneuralnetwork.blogspot.co.uk/2017/05/learning-mnist-with-gpu-acceleration.html

I will do 2 more tests .. one using the environment variable ... and one scaling the hidden layer to about 1000, to see if the relative performance improves.

@makeyourownneuralnetwork

Author

commented May 25, 2017

Update - I did the tests comparing PyTorch in CPU vs GPU mode. The results are interesting and positive.

nodes   CPU    GPU

 200     463   362
1000     803   356
2000    1174   366
5000    3390   518

[chart: pytorch_cpu_v_gpu]

In GPU mode, the time to complete MNIST training rises slowly as the size of the network (hidden nodes) grows, but in CPU mode it grows quickly.
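The pattern behind the GPU runs is roughly the following (a sketch, assuming PyTorch is installed; it falls back to the CPU when no CUDA device is present, and sizes are illustrative):

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU.
use_cuda = torch.cuda.is_available()

W = torch.rand(1000, 784)        # illustrative weights (1000 hidden nodes)
x = torch.rand(784, 1)           # one illustrative input column
if use_cuda:
    W, x = W.cuda(), x.cuda()    # move the data onto the GPU

# The matmul runs on whichever device holds W and x.
y = torch.sigmoid(W.mm(x))
```

This matches the scaling behaviour in the table: the GPU absorbs larger matrix multiplies at near-constant cost, while CPU time grows with the matrix size.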

More detail at the bottom of this post: http://makeyourownneuralnetwork.blogspot.co.uk/2017/05/learning-mnist-with-gpu-acceleration.html

So this is good. I think PyTorch is nice to work with. I understand that the tutorials and beginner-friendly guides still need people to work on them; it will be great when they're out. I will also give an intro talk on my experience of beginning PyTorch at the PyData monthly meetup in London.

Also, the environment variable didn't seem to have an effect during my comparative test. @apaszke

Now if only it was faster than numpy .....

@soumith

Member

commented May 26, 2017

Great, I'd consider the issue closed then.
We will slowly but eventually converge with numpy for all of our workloads. Right now we are faster for some workloads on CPU but, as you showed, slower for others.

@soumith soumith closed this May 26, 2017

@impredicative


commented Sep 16, 2017

@makeyourownneuralnetwork This is beside the point, but why use numpy.dot when the @ operator works?

@soumith In newer Python, does PyTorch support the @ operator too?

@fmassa

Member

commented Sep 17, 2017

@impredicative Yes, it supports the @ operator, but it is equivalent to matmul, not dot (both are also present in numpy).
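In numpy terms, the distinction is this: for 2-D arrays `@`, `matmul`, and `dot` all agree, but they diverge for stacked (3-D) inputs.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)

# For 2-D arrays, @, matmul and dot give the same result.
assert np.array_equal(a @ b, np.matmul(a, b))
assert np.array_equal(a @ b, np.dot(a, b))

# For 3-D "stacks" of matrices they differ:
# matmul broadcasts over the leading batch dimension, while dot takes a
# sum-product over the last axis of the first array and the second-to-last
# axis of the second, producing a higher-dimensional result.
s = np.ones((2, 2, 3))
t = np.ones((2, 3, 4))
assert np.matmul(s, t).shape == (2, 2, 4)
assert np.dot(s, t).shape == (2, 2, 2, 4)
```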
