MXNet backend for Keras #8697

Closed
sandeep-krishnamurthy opened this issue Dec 5, 2017 · 5 comments
@sandeep-krishnamurthy

Hello Keras team,
 
@jiajiechen and I have started working on adding an MXNet backend for Keras 2. With the option to use MXNet as the Keras backend, Keras users can benefit from MXNet's performance and scalability while still using their existing Keras skills and models.

We are extending the work done by @piiswrong, @kevinthesun, @yajiedesign, and @howard0su at https://github.com/dmlc/keras for Keras 1.2 with the MXNet backend.

Here is the repo we are currently working in: https://github.com/deep-learning-tools/keras/tree/keras2_mxnet_backend (note that this is just a development fork for our initial work; we will create PRs against keras from this repo). So far we have added a subset of operators in the MXNet backend.
 
To get focused early feedback and continuously push out work to users, we plan to create incremental PRs. First, we will create a PR with the MXNet backend implementing basic operators (variable manipulations, linear algebra operations, element-wise operations, shape operations, and more). Next, we will create a PR supporting CNN architectures for training, saving, loading, and inference, followed by distributed multi-machine training for Keras users with the MXNet backend, RNNs, sparse support, and more.
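For a sense of what the first PR covers, here is a minimal sketch using the standard backend-agnostic Keras 2 API; it assumes KERAS_BACKEND=mxnet is set and that the listed operator categories are implemented in the development fork (exact coverage there may differ):

```python
# Sketch of the backend-agnostic operators the first PR targets.
# The same code runs unchanged on any Keras 2 backend.
import numpy as np
from keras import backend as K

x = K.variable(np.random.rand(2, 3))   # variable manipulation
y = K.variable(np.random.rand(3, 4))
z = K.relu(K.dot(x, y))                # linear algebra + element-wise op
print(K.int_shape(z))                  # shape operation -> (2, 4)
print(K.eval(z))                       # evaluate the symbolic result
```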
 
Contributions are very welcome. Please participate in code reviews, early testing, and development. You can create issues and PRs at https://github.com/deep-learning-tools/keras/tree/keras2_mxnet_backend until we get stable code merged into fchollet/keras.

Thanks,

@sandeep-krishnamurthy
Author

Hello Keras Community,

We have completed the majority of the operators required for training Keras MLPs and CNNs with the MXNet backend (CPU, single GPU, and multi-GPU).

I am currently working on fixing broken test cases and preparing a PR here with keras-team. I am hoping to create the first PR in the next two weeks.

In the meantime, here is a quick guide for trying out Keras with the MXNet backend from the work-in-progress repository: https://github.com/deep-learning-tools/keras/wiki/Installation-Guide---Keras-with-MXNet-backend
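Once installed, the backend is selected the usual Keras way; a minimal sketch (the installation guide above has the full steps):

```python
# Select the MXNet backend via environment variable before importing Keras,
# or set "backend": "mxnet" in ~/.keras/keras.json.
import os
os.environ["KERAS_BACKEND"] = "mxnet"

import keras                      # should print "Using MXNet backend"
print(keras.backend.backend())    # -> "mxnet"
```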

Thanks,
Sandeep

@sandeep-krishnamurthy
Author

Submitted the first PR for adding MXNet backend support.

#9291

@dkasper26

Great job supporting MXNet as a Keras backend!
I followed your installation guide, and after a forced reinstall of numpy I was able to run a Keras script that I had previously used with TF: KERAS_BACKEND=mxnet time python /home1/bernd/git/keras/examples/mnist_cnn.py
...
It runs, but takes 177 s per epoch compared with 37 s in the TF case.
I got the following warnings:
```
Using MXNet backend
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
/home1/bernd/.local/lib/python2.7/site-packages/keras/backend/mxnet_backend.py:89: UserWarning: MXNet Backend performs best with channels_first format. Using channels_last will significantly reduce performance due to the Transpose operations.
  train_symbol = func(*args, **kwargs)
/home1/bernd/.local/lib/python2.7/site-packages/keras/backend/mxnet_backend.py:92: UserWarning: MXNet Backend performs best with channels_first format. Using channels_last will significantly reduce performance due to the Transpose operations.
  test_symbol = func(*args, **kwargs)
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
...
```
Any hints on how to improve the performance, e.g. by following the 'channels_first' hint?

Thanks!
-Dieter

@sandeep-krishnamurthy
Author

Hello @dkasper26,

  1. Required forced reinstall of numpy: this should not be necessary; I will take a look and fix the issue. My guess is that the MXNet pip package requires numpy <= 1.13 while another package installs numpy 1.14 by default. The upcoming MXNet release fixes this constraint.

  2. Performance: MXNet on CPU is slower than the TF backend, while MXNet on GPU is close to 1.67x faster than the TF backend. The CPU slowness is mainly due to MXNet's broadcast_add() operator being single-threaded on CPU. Related issue: Broadcasting ops are slow apache/mxnet#8219.

  3. channels_first / channels_last: MXNet gives the best results with channels_first. You need to do two things: 1) set image_data_format to 'channels_first' in keras.json, and 2) transpose the input images, e.g. from (32, 32, 3) (channels_last) to (3, 32, 32) (channels_first); see the sketch after this list. We are adding a utility API for this. Open PR: Added keras util API for conversion of data tensor from channels_last to channels_first using MXNet backend awslabs/keras-apache-mxnet#65
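Until that utility API lands, a plain NumPy transpose does the job. A minimal sketch for MNIST-shaped data, assuming image_data_format has been set to 'channels_first' in keras.json (the data here is a placeholder, not the actual mnist_cnn.py loading code):

```python
# Convert image tensors from channels_last (N, H, W, C)
# to channels_first (N, C, H, W) before feeding them to the model.
import numpy as np

x_train = np.random.rand(60000, 28, 28, 1).astype("float32")  # placeholder data
x_train = np.transpose(x_train, (0, 3, 1, 2))                  # -> (60000, 1, 28, 28)
print(x_train.shape)
```

With channels_first, the model's input_shape should also be given channels-first, e.g. (1, 28, 28) instead of (28, 28, 1).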

@julioasotodv

Very interesting. I am really looking forward to having the MXNet backend in order to get better RNN training speed (TF is still slow with LSTMs).
