Does Keras support using multiple GPUs? #2436

Closed
henry0312 opened this Issue Apr 21, 2016 · 160 comments

henry0312 commented Apr 21, 2016

Theano has supported multiple GPUs since v0.8.0.
(cf. Using multiple GPUs — Theano 0.8.0 documentation)
Does Keras also support using multiple GPUs?

For example, can I run the task below?

  1. Learn a sequential model A on gpu0
  2. Learn a sequential model B on gpu1
  3. Merge A and B on gpu0

fchollet commented Apr 21, 2016

Yes, you can run Keras models on multiple GPUs. This is only possible with the TensorFlow backend for the time being, because the Theano feature is still rather new. We are looking at adding multi-GPU support to Theano in the near future (it should be fairly straightforward).

With the TensorFlow backend, you can achieve this the same way as you would in pure TensorFlow: by using a `with tf.device(d):` scope when defining Keras layers.
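A minimal sketch of what such a device scope looks like (not from the thread; `/cpu:0` is used here only so the snippet runs on any machine, and on a multi-GPU box you would wrap your Keras layer definitions in `/gpu:0`, `/gpu:1`, and so on):

```python
# Minimal sketch of TensorFlow device scopes, the mechanism referred to
# above. Ops (and Keras layers) created inside the scope are pinned to
# the named device. '/cpu:0' is used so this runs anywhere; with
# multiple GPUs you would use '/gpu:0', '/gpu:1', etc.
import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)  # this matmul is placed on /cpu:0
```

Keras layers defined inside such a scope have their underlying ops placed on that device in the same way.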


henry0312 commented Apr 21, 2016

> We are looking at adding support for multi-gpu in Theano in the near future (it should be fairly straightforward).

I'm looking forward to it 😃
Thank you.


lemuriandezapada commented Apr 21, 2016

What is this `tf.device()` scope?
Can you expand on this?
I haven't seen it in the API.


jeffzhengye commented Apr 23, 2016

Is there any example of using multiple GPUs with TF?


phalexo commented Apr 23, 2016

Hm. Theano has libgpuarray, which allows one to push shared variables to different devices. It will not do all the work of recombining weight matrices for you, but with a little effort you could use multiple GPUs.


nouiz commented Apr 25, 2016

There is Platoon, a project on top of Theano for data parallelism. It should be easy to use. We currently focus more on data parallelism than model parallelism in Theano, but both are possible.

Fred


fchollet commented Apr 25, 2016

I have looked into Platoon and it seemed pretty much compatible with Keras out of the box, except for a couple of lines of code. Easy to adapt, in any case...



phalexo commented Apr 25, 2016

The way libgpuarray works is by mapping variables to different GPUs; `function` then automatically generates code to transfer data between GPUs as needed.


themummy commented Jun 22, 2016

> I have looked into Platoon and it seemed pretty much compatible with Keras out of the box, except for a couple of lines of code. Easy to adapt, in any case...

What's the priority of adding multi-GPU support for the Theano backend?


phalexo commented Jun 23, 2016

I think it would expand the user base for Keras. I have several Titan X cards in the same box. Please take a look at libgpuarray as well.
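For reference, the Theano side of this looks roughly as follows (a hypothetical sketch based on the gpuarray-backend docs; `train.py` is a placeholder for your own script, and the context names `dev0`/`dev1` are arbitrary labels you then target from code, e.g. `theano.shared(value, target='dev0')`):

```shell
# Hypothetical sketch: map two named Theano contexts to two GPUs via
# the gpuarray backend. "train.py" is a placeholder for your script.
THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1" python train.py
```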


tetmin commented Jul 26, 2016

How does this actually work in TensorFlow? There is a brief tutorial here: http://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html. I understand the concept of running the model replicas on separate GPU devices and then merging the weights, but how do we actually run this? Instead of model.fit, do we call merged.fit on the result of the merged models?


pengpaiSH commented Aug 21, 2016

@tetmin I have the same confusion as you. Although the blog shows how to run model prediction on different GPUs, it is still unclear how to train the same model across different GPUs on a single machine, i.e. I need data parallelism and don't know how to implement it in Keras with TensorFlow as the backend.


rudaoshi commented Aug 23, 2016

Agreed with @pengpaiSH and @tetmin. Hope there will be more details.


pengpaiSH commented Aug 23, 2016

@rudaoshi Well, I know this may not be proper to suggest since we are in the Keras community, and personally I am a Big Big Big fan of Keras! We know TensorFlow can utilize multiple GPUs by averaging the gradients computed across different devices; however, I am expecting Keras to provide a simple and unified API (in Keras's style) that lets me focus on the big picture and hides those I/O and parallel-computing details. For the time being, in order to make good use of multiple GPUs, I am writing my deep learning programs with MXNet, where I only specify the GPU IDs and the library does everything needed under the hood.
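The gradient-averaging idea mentioned above can be sketched in plain numpy (a toy linear model; all names here are made up for illustration, not part of any framework API):

```python
# Toy numpy sketch of data-parallel gradient averaging: each "device"
# computes the gradient on its slice of the batch, and the average of
# the per-slice gradients equals the full-batch gradient (for equal
# slice sizes), so the update matches single-device training.
import numpy as np

def grad_mse_linear(w, X, y):
    # Gradient of mean squared error for the linear model X @ w.
    return 2.0 * X.T @ (X @ w - y) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = np.zeros(3)

g0 = grad_mse_linear(w, X[:4], y[:4])   # "gpu0" gets the first half
g1 = grad_mse_linear(w, X[4:], y[4:])   # "gpu1" gets the second half
avg_grad = (g0 + g1) / 2.0              # averaged across "devices"

full_grad = grad_mse_linear(w, X, y)    # single-device reference
assert np.allclose(avg_grad, full_grad)
```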


WenchenLi commented Aug 25, 2016

@fchollet I saw your blog post on multi-GPU training, thanks for pointing out a way to do it. But I would really appreciate it if, say, model.fit() had a gpus=n option. I'm willing to implement my own version of that; may I ask for suggestions? Or I'm willing to contribute multi-GPU training within Keras, with more abstraction for end users. Thanks in advance!


pengpaiSH commented Aug 26, 2016

@WenchenLi +1, gpus=0,1,2... is exactly what I need!


acrosson commented Sep 10, 2016

@WenchenLi did you create a PR for multigpu abstraction?


anewlearner commented Oct 21, 2016

Hope someone can contribute multi-GPU training to Keras. Thanks in advance.

I have two GPUs. I did not do anything to set which GPU would be used for training, but when I used nvidia-smi to check memory, I found almost all of the memory on both GPUs was in use. I thought only one GPU would be used.


acrosson commented Oct 21, 2016

@anewlearner apparently this is the intended behaviour of TF.
Use `export CUDA_VISIBLE_DEVICES="0"`.

See tensorflow/tensorflow#5066 for details

Looking forward to a simplified version of multi-GPU :)
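Spelled out a little more (a sketch; `train.py` is a placeholder for your own training script):

```shell
# Make only GPU 0 visible to the process; TF will then allocate memory
# on just that card. "train.py" is a placeholder for your own script.
export CUDA_VISIBLE_DEVICES="0"
python train.py

# Or scope the setting to a single command, exposing GPUs 0 and 1:
CUDA_VISIBLE_DEVICES="0,1" python train.py
```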


jonilaserson commented Oct 27, 2016

For data parallelization in keras, you can use this approach:

```python
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Input, merge
from keras.layers.core import Lambda


def slice_batch(x, n_gpus, part):
    sh = K.shape(x)
    L = sh[0] / n_gpus
    if part == n_gpus - 1:
        return x[part*L:]
    return x[part*L:(part+1)*L]


def to_multi_gpu(model, n_gpus=2):
    with tf.device('/cpu:0'):
        x = Input(model.input_shape[1:], name=model.input_names[0])

    towers = []
    for g in range(n_gpus):
        with tf.device('/gpu:' + str(g)):
            slice_g = Lambda(slice_batch, lambda shape: shape,
                             arguments={'n_gpus': n_gpus, 'part': g})(x)
            towers.append(model(slice_g))

        with tf.device('/cpu:0'):
            merged = merge(towers, mode='concat', concat_axis=0)

    return Model(input=[x], output=merged)
```

To use, just take any model and set `model = to_multi_gpu(model)`.
`model.fit()` and `model.predict()` should work without any change.



themummy commented Oct 27, 2016

@jonilaserson , looks great! Does this work with the Theano backend or only TF?


pengpaiSH commented Oct 28, 2016

@jonilaserson Could you please provide more detailed comments on the code?
For example, what is the purpose of `slice_g`, and what does a tower actually do? Thank you!


anewlearner commented Oct 28, 2016

I tested the code provided by @jonilaserson and got an error:

    merged = merge(towers, mode='concat', concat_axis=0)
    Exception: A Merge should only be applied to a list of layers with at least 2 elements. Found: [<keras.engine.training.Model object at 0x7f9c1c3123d0>]


pengpaiSH commented Oct 31, 2016

@anewlearner Have you solved the problem that you met with before?


jonilaserson commented Oct 31, 2016

@Carol

There was an indentation error in the code I posted.
The `with tf.device('/cpu:0')` paragraph should be outside the loop.

Here is a piece of code that should work:

```python
import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import Input, merge
from keras.layers.core import Lambda


def slice_batch(x, n_gpus, part):
    """Divide the input batch into [n_gpus] slices, and obtain slice no. [part].

    i.e. if len(x)=10, then slice_batch(x, 2, 1) will return x[5:].
    """
    sh = K.shape(x)
    L = sh[0] / n_gpus
    if part == n_gpus - 1:
        return x[part*L:]
    return x[part*L:(part+1)*L]


def to_multi_gpu(model, n_gpus=2):
    """Given a keras [model], return an equivalent model which parallelizes
    the computation over [n_gpus] GPUs.

    Each GPU gets a slice of the input batch, applies the model on that
    slice and later the outputs of the models are concatenated to a single
    tensor, hence the user sees a model that behaves the same as the original.
    """
    with tf.device('/cpu:0'):
        x = Input(model.input_shape[1:], name=model.input_names[0])

    towers = []
    for g in range(n_gpus):
        with tf.device('/gpu:' + str(g)):
            slice_g = Lambda(slice_batch, lambda shape: shape,
                             arguments={'n_gpus': n_gpus, 'part': g})(x)
            towers.append(model(slice_g))

    with tf.device('/cpu:0'):
        merged = merge(towers, mode='concat', concat_axis=0)

    return Model(input=[x], output=merged)
```

To use, just take any model and set `model = to_multi_gpu(model)`.
`model.fit()` and `model.predict()` should work without any change.

Example:

```python
from keras.layers.convolutional import Convolution2D
from keras.layers.core import Activation
import numpy as np


def get_model():
    x = Input((96, 96, 1), name="input1")
    output = Convolution2D(64, 5, 5, border_mode='same', name="conv1")(x)
    output = Activation('relu', name="relu1")(output)
    # [More layers...]
    model = Model(input=x, output=output)
    model.compile(optimizer='rmsprop', loss='mse')
    return model


model = get_model()
model = to_multi_gpu(model)

x = np.random.rand(1000, 96, 96, 1)
y = model.predict(x, verbose=True)
```



pengpaiSH commented Nov 1, 2016

@jonilaserson Thank you for the update! Would you please comment on this code snippet:

```python
for g in range(n_gpus):
    with tf.device('/gpu:' + str(g)):
        slice_g = Lambda(slice_batch, lambda shape: shape,
                         arguments={'n_gpus': n_gpus, 'part': g})(x)
        towers.append(model(slice_g))
```
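To unpack the snippet: `slice_batch` hands each GPU a contiguous slice of the batch, the `Lambda` wraps that slicing as a layer, and each `model(slice_g)` call is one "tower" (a replica of the model applied to its slice). The slicing semantics can be sketched with plain numpy (a toy stand-in for the symbolic version; the names here are mine, not from the thread):

```python
# Plain-numpy stand-in for slice_batch: "GPU" number `part` gets rows
# [part*L, (part+1)*L) of the batch; the last one also absorbs any
# remainder when the batch size is not divisible by n_gpus.
import numpy as np

def slice_batch_np(x, n_gpus, part):
    L = len(x) // n_gpus
    if part == n_gpus - 1:
        return x[part*L:]          # last slice absorbs the remainder
    return x[part*L:(part+1)*L]

x = np.arange(10)
parts = [slice_batch_np(x, 2, g) for g in range(2)]
# Concatenating the per-GPU slices reconstructs the original batch,
# which is why the tower outputs are concatenated along axis 0.
assert np.array_equal(np.concatenate(parts), x)
```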

anewlearner commented Nov 1, 2016

@jonilaserson
Thanks for sharing your code. It works. :)
I tested the code to compare the time cost between one GPU and two GPUs.
When I used two GPUs (the same type of GPU), the speed-up was smaller than expected. Does the switching between CPU and GPU affect the speed?
My test results are as follows.

Two GPUs:

    97650/682307 [===>..........................] - ETA: 1933s - loss: 0.3320 - acc: 0.8320
    188593/682307 [=======>......................] - ETA: 1654s - loss: 0.2354 - acc: 0.8904
    279093/682307 [===========>..................] - ETA: 1348s - loss: 0.1936 - acc: 0.9140

One GPU:

    97650/682307 [===>..........................] - ETA: 2669s - loss: 0.3488 - acc: 0.8266
    188593/682307 [=======>......................] - ETA: 2239s - loss: 0.2431 - acc: 0.8880
    279093/682307 [===========>..................] - ETA: 1844s - loss: 0.2004 - acc: 0.9116

pengpaiSH commented Nov 2, 2016

I think you should compile the model, in case you get the error: `you must compile the model before training/testing`.


CeadeS commented Aug 22, 2017

When using this Code https://github.com/kuza55/keras-extras/blob/master/utils/multi_gpu.py it seems, that there is an error in the regularizers like described in this issue kuza55/keras-extras#22. I cant tell if this is caused by the method, the model is copied and merged or if there is a flaw in the regularizers of keras. This is in tensorflow 1.12.

def` conv2d_bn(x, nb_filter, nb_row, nb_col, padding='same', strides=(1, 1), bias=False):

 """
    Utility function to apply conv + BN.
    (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
    """
    if K.image_data_format() == "channels_first":
        channel_axis = 1
    else:
        channel_axis = -1
    x = Convolution2D(nb_filter, (nb_row, nb_col),
                      strides=strides,
                      padding=padding,
                      use_bias=bias,
                      kernel_regularizer=regularizers.l2(0.00004), ##<---- causes error because no _loss 
                      kernel_initializer=initializers.VarianceScaling(scale=2.0, mode='fan_in', distribution='normal',
                                                                      seed=None))(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.9997, scale=False)(x)
    x = Activation('relu')(x)
    return x

I get the error:
„AttributeError: 'Model' object has no attribute '_losses'„
caused by outputs = model (inputs) that merges the outputs of the different splits in one model.

CeadeS commented Aug 22, 2017

When using this Code https://github.com/kuza55/keras-extras/blob/master/utils/multi_gpu.py it seems, that there is an error in the regularizers like described in this issue kuza55/keras-extras#22. I cant tell if this is caused by the method, the model is copied and merged or if there is a flaw in the regularizers of keras. This is in tensorflow 1.12.

def` conv2d_bn(x, nb_filter, nb_row, nb_col, padding='same', strides=(1, 1), bias=False):

 """
    Utility function to apply conv + BN.
    (Slightly modified from https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py)
    """
    if K.image_data_format() == "channels_first":
        channel_axis = 1
    else:
        channel_axis = -1
    x = Convolution2D(nb_filter, (nb_row, nb_col),
                      strides=strides,
                      padding=padding,
                      use_bias=bias,
                      kernel_regularizer=regularizers.l2(0.00004), ##<---- causes error because no _loss 
                      kernel_initializer=initializers.VarianceScaling(scale=2.0, mode='fan_in', distribution='normal',
                                                                      seed=None))(x)
    x = BatchNormalization(axis=channel_axis, momentum=0.9997, scale=False)(x)
    x = Activation('relu')(x)
    return x

I get the error:
„AttributeError: 'Model' object has no attribute '_losses'„
caused by outputs = model (inputs) that merges the outputs of the different splits in one model.

@pGit1

pGit1 commented Sep 22, 2017

@tRosenflanz

tRosenflanz commented Sep 22, 2017

@pGit1 It has already been pointed out that Keras can support multi-GPU training, so this kind of defeatist attitude is not helpful and is quite misleading for a lot of people trying to find a solution... I personally prefer https://github.com/avolkov1/keras_experiments/ for now for its ease of use.

@bzamecnik

Contributor

bzamecnik commented Sep 25, 2017

@tRosenflanz Yes, I also came across https://github.com/avolkov1/keras_experiments/ recently and have been testing it. So far it does seem to be working well. A problem with my previous measurements was also a wrongly set up machine. On a standard cloud instance with multiple GPUs I can observe a speedup, especially with the avolkov1 code. Anyway, I'm writing a summary article with a lot of measurements, so stay tuned.

@alsrgv

Contributor

alsrgv commented Oct 5, 2017

FYI - we just added an example of data-parallel distributed training with Keras using Horovod - https://github.com/uber/horovod/blob/master/examples/keras_mnist.py. It works both for multiple GPUs within the server, and across servers. Hope it helps.

@michelleowen

michelleowen commented Oct 5, 2017

I used the code from @jonilaserson, and it works. However, it seems that multi-GPU training converges more slowly than single-GPU training. Has anyone else observed the same?

@alsrgv

Contributor

alsrgv commented Oct 5, 2017

@michelleowen You typically want to adjust the learning rate to the total number of GPUs across all the servers - here's an example of very simple scaling. Facebook published a paper with a more sophisticated strategy that works for a large number of GPUs.
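As an illustration of the scaling alsrgv describes, here is a hypothetical helper (the function names are mine, not from Horovod): the linear scaling rule multiplies the base learning rate by the number of replicas, and the Facebook paper additionally ramps up to that rate gradually, shown here as a simple linear warmup.

```python
def scaled_lr(base_lr, num_gpus):
    """Linear scaling rule: the learning rate grows with the number of replicas."""
    return base_lr * num_gpus


def warmup_lr(base_lr, num_gpus, epoch, warmup_epochs=5):
    """Ramp linearly from base_lr up to the scaled rate over warmup_epochs."""
    target = scaled_lr(base_lr, num_gpus)
    if epoch >= warmup_epochs:
        return target
    return base_lr + (target - base_lr) * epoch / warmup_epochs


print(scaled_lr(0.1, 4))     # 0.4 with 4 GPUs
print(warmup_lr(0.1, 4, 0))  # 0.1 at the start of warmup
```

The intuition: with k GPUs doing data-parallel training, the effective batch size grows by k, so each update averages over k times more samples and can afford a k-times-larger step.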

@michelleowen

michelleowen commented Oct 5, 2017

@alsrgv, thank you. This is very helpful. I will do some experiments to see how it works in my case.

@fernandoandreotti

fernandoandreotti commented Oct 18, 2017

I guess the function previously mentioned by @avolkov1 is finally coming into Keras:
https://github.com/fchollet/keras/blob/master/keras/utils/training_utils.py

@bzamecnik

Contributor

bzamecnik commented Oct 18, 2017

@fernandoandreotti Yes and no. It's a cleaned-up variant of the kuza55 function. It has nice documentation and grabs the list of devices via device_lib instead of CUDA_VISIBLE_DEVICES. On the other hand, it's missing some things from avolkov1: slicing on the CPU, and save/load of the original serial model's parameters. Since there's no wrapper class, the latter is not strictly necessary, but it should at least be documented.

@fernandoandreotti

fernandoandreotti commented Nov 3, 2017

Keras v2.0.9 now includes it (release notes). Despite the improvements that could still be made, I guess this issue should be closed.

@pGit1

pGit1 commented Nov 9, 2017

@fchollet

Collaborator

fchollet commented Nov 9, 2017

Yes: https://keras.io/utils/#multi_gpu_model

You can also check out Horovod, which seems nice.

@fchollet fchollet closed this Nov 9, 2017

@ViaFerrata

ViaFerrata commented Nov 25, 2017

Is there any intention of making it work with CNTK too?

@nbansal90

nbansal90 commented Dec 29, 2017

@avolkov1 @jonilaserson Is there an issue with saving models via ModelCheckpoint when using a multi_gpu model? I used a few other callbacks and they worked fine, but ModelCheckpoint is the one that fails to save the model and throws an error after an epoch.

CODE

class MyCallBack(keras.callbacks.Callback):
    def __init__(self, callbacks, model):
        super().__init__()
        self.callback = callbacks
        self.model = model

    def on_epoch_begin(self, epoch, logs=None):
        self.callback.on_epoch_begin(epoch, logs=logs)

    def on_epoch_end(self, epoch, logs=None):
        self.callback.on_epoch_end(epoch, logs=logs)

    def on_batch_end(self, batch, logs=None):
        self.callback.on_batch_end(batch, logs=logs)

    def on_batch_begin(self, batch, logs=None):
        self.callback.on_batch_begin(batch, logs=logs)

    def on_train_begin(self, logs=None):
        self.callback.set_model(self.model)
        self.callback.on_train_begin(logs=logs)

    def on_train_end(self, logs=None):
        self.callback.on_train_end(logs=logs)

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=lr_schedule(0)), metrics=['accuracy'])

# Setting up callbacks during fitting of the model
filename = 'model_train_new.csv'
filepath = os.path.join(save_dir, model_name)
checkpoint = ModelCheckpoint(filepath=filepath, monitor='val_acc', verbose=1,
                             save_best_only=True)
cbk3 = MyCallBack(checkpoint, model)
callbacks = [cbk3]

# Adding data augmentation provided by the Keras module
datagen = ImageDataGenerator(featurewise_center=False, samplewise_center=False,
                             featurewise_std_normalization=False, samplewise_std_normalization=False,
                             zca_whitening=False, rotation_range=0, width_shift_range=0.1,
                             height_shift_range=0.1, horizontal_flip=True, vertical_flip=False)

datagen.fit(x_train)
steps_per_epoch = int(np.ceil(x_train.shape[0] / float(batch_size)))
model_info = parallel_model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                                          steps_per_epoch=steps_per_epoch,
                                          validation_data=(x_test, y_test),
                                          epochs=epochs, verbose=1, workers=4,
                                          callbacks=callbacks)

@pGit1

pGit1 commented Jan 2, 2018

@nbansal90

I had this same problem. ModelCheckpoint will not work with a multi-GPU model. You can change the parameter save_weights_only to True and this will work fine; HOWEVER, if you then want to do inference on a SINGLE GPU, the model will not load the weights properly even if you load the checkpointed weights by name.

@fchollet

Kind of an urgent question: is there a way to train on multiple GPUs but save the weights in such a way that I can do inference on only a single GPU? I am not sure how to get this to work properly, as model.load_weights('/weights_path', by_name=True) does not work. I have to re-instantiate the network as a multi_gpu_model to properly load the weights. I may be missing something simple, though.

@fercook

fercook commented Jan 2, 2018

Hmm, since it's urgent, maybe a dirty patch will do: couldn't you save the weights as matrices and then load them directly into the weights of the layers of a new (single-GPU) model?

edit: saving/loading the weights of the example from the docs doesn't work? https://keras.io/utils/#multi_gpu_model

@pGit1

pGit1 commented Jan 2, 2018

@fercook

Thanks for the quick response. I believe I have tried that. My weights were saved via the ModelCheckpoint callback on a multi-GPU model.

When I re-instantiate the model, I cannot load the weights into my single-GPU model, because I get an error stating that I am trying to load weights into a model with one layer when it expects four layers (4 is the number of GPUs I was using).

edit:

saving/loading the weights of the example from the docs doesn't work? https://keras.io/utils/#multi_gpu_model

That is correct. It does not work. Although I haven't tried the CPU device scope; will try and let you know. I've only used the ModelCheckpoint callback with save_weights_only=True and model.load_weights.

@fercook

fercook commented Jan 2, 2018

Did you double-check that you are saving with the template model, not the multi_gpu one?

From the docs:

On model saving

To save the multi-gpu model, use `.save(fname)` or `.save_weights(fname)`
with the template model (the argument you passed to `multi_gpu_model`),
rather than the model returned by `multi_gpu_model`.

edit: sorry, I just re-read that you are saving through the callback... how are you doing that? Is each GPU saving a different file (or overwriting it)?

@avolkov1

avolkov1 commented Jan 2, 2018

@pGit1 Take a look at my example:
https://github.com/avolkov1/keras_experiments/blob/master/examples/cifar/cifar10_cnn_mgpu.py

Run like this to save weights:

python ./examples/cifar/cifar10_cnn_mgpu.py --epochs=3 --mgpu --checkpt --aug

You can then run it again and it will load the checkpoint file and continue training. This also works with a single GPU:

CUDA_VISIBLE_DEVICES=0 python ./examples/cifar/cifar10_cnn_mgpu.py --epochs=3 --mgpu --checkpt --aug

I have a slightly different implementation for multi-GPU, but you can use the multi-GPU implementation from Keras. Just wrap it in a class that uses the non-multi-GPU model for saving and loading weights.
https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/_multigpu.py#L129

The essence of the wrapper class for saving/loading weights is:

    def __getattribute__(self, attrname):
        '''Override load and save methods to be used from the serial-model. The
        serial-model holds references to the weights in the multi-gpu model.
        '''
        # return Model.__getattribute__(self, attrname)
        if 'load' in attrname or 'save' in attrname:
            return getattr(self._smodel, attrname)

        return super(ModelMGPU, self).__getattribute__(attrname)

This works with fit_generator.
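The delegation trick in `__getattribute__` can be seen in isolation, without Keras. A minimal, framework-free sketch (the class names here are hypothetical, purely for illustration) of why a checkpoint callback that calls `save_weights()` on the wrapper ends up saving through the wrapped serial model:

```python
# Any attribute whose name contains 'load' or 'save' is looked up on the
# wrapped "serial" object instead of the wrapper, so callbacks holding the
# wrapper still checkpoint the serial model's weights.

class SerialModel:
    def save_weights(self, path):
        return ('serial-save', path)

    def load_weights(self, path):
        return ('serial-load', path)


class ParallelWrapper:
    def __init__(self, serial):
        self._smodel = serial

    def fit(self, data):
        return 'trained-on-replicas'

    def __getattribute__(self, attrname):
        # Forward load/save to the serial model; everything else resolves normally.
        if 'load' in attrname or 'save' in attrname:
            return getattr(object.__getattribute__(self, '_smodel'), attrname)
        return object.__getattribute__(self, attrname)


wrapper = ParallelWrapper(SerialModel())
print(wrapper.fit(None))             # prints trained-on-replicas
print(wrapper.save_weights('w.h5'))  # prints ('serial-save', 'w.h5')
```

Note that the lookup of `_smodel` inside `__getattribute__` must itself go through `object.__getattribute__` (or fall through the non-load/save branch, as in the avolkov1 code) to avoid infinite recursion.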

@pGit1

pGit1 commented Jan 2, 2018

@fercook

Since ModelCheckpoint is only saving the weights, it may be overwriting them.

@avolkov1

Thank you! I'll take a look!!

@pGit1

pGit1 commented Jan 2, 2018

@fercook

I've confirmed that the example from the docs will not work with the ModelCheckpoint callback either.
FYI, my callback code:

best_wts_callback = callbacks.ModelCheckpoint(mod_wt_path, save_weights_only=True, save_best_only=True)

@avolkov1

Your example seems like it may work, but I'm having trouble thinking of a simple example of how to use it. Your guidance would be much appreciated.

Is something like this feasible?

.
.
.
.
.
# model topology instantiation above
ser_model = keras.models.Model(inputs=x, outputs=out)
parallel_model = avolkov1.make_parallel(serial_model=ser_model,
                                        gdev_list=['/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3'],
                                        ps_device='/cpu:0', model_class=avolkov1.ModelMGPU)

# callback to save best weights
mod_wt_path = './best_weights.hdf5'
best_wts_callback = callbacks.ModelCheckpoint(mod_wt_path, save_weights_only=True, save_best_only=True)

parallel_model.fit(X, y, callbacks=[best_wts_callback])

# Now I want to infer on a single GPU, so I load the saved weights??
ser_model.load_weights(mod_wt_path)

ser_model.predict(X_holdout)

Would something like this work? Actually, I need a more exact version of what would actually work.

THANK YOU!

EDIT:

Looking at your CIFAR-10 example, it looks like something like this would work. I'm in a crunch, so I don't want to embark on the above journey if I am missing something glaring.

@pGit1

pGit1 commented Jan 2, 2018

@avolkov1

In general, I think this line from the docs in your code explains it all:

'''Override load and save methods of the multi-gpu model. The load and
save should correspond to the serial model's load and save.

In general, one should easily be able to train in parallel on multiple GPUs, use callbacks to save weights during the parallel run, and load those saved weights back into the serial model that was parallelized in the first place (without having to re-instantiate the serial model as a parallel model). I think your code lets one train on 8 GPUs but then load the weights and infer on one. Perhaps it should be an option in the >=2.0.9 implementation? Training with keras.utils.multi_gpu_model() works great and definitely provides a speedup. It just doesn't play nice with ModelCheckpoint or weight saving/loading.

@avolkov1

avolkov1 commented Jan 2, 2018

@pGit1 Yea, what you have there should work. Or you can use keras.utils.multi_gpu_model to create a wrapper class:

from keras import Model
from keras.utils import multi_gpu_model


class ModelMGPU(Model):
    def __init__(self, ser_model, gpus):
        pmodel = multi_gpu_model(ser_model, gpus)
        self.__dict__.update(pmodel.__dict__)
        self._smodel = ser_model

    def __getattribute__(self, attrname):
        '''Override load and save methods to be used from the serial-model. The
        serial-model holds references to the weights in the multi-gpu model.
        '''
        # return Model.__getattribute__(self, attrname)
        if 'load' in attrname or 'save' in attrname:
            return getattr(self._smodel, attrname)

        return super(ModelMGPU, self).__getattribute__(attrname)

Then you can use your example above with this new class.

# model topology instantiation above
ser_model = keras.models.Model(inputs=x, outputs=out)
parallel_model = ModelMGPU(ser_model, 4)

# callback to save best weights
mod_wt_path = './best_weights.hdf5'
best_wts_callback = callbacks.ModelCheckpoint(mod_wt_path, save_weights_only=True, save_best_only=True)

# compile the parallel model prior to fit
parallel_model.fit(X, y, callbacks=[best_wts_callback])

# Now I want to infer on a single GPU, so I load the saved weights??
ser_model.load_weights(mod_wt_path)

# I think you might have to compile the serial model prior to predict
ser_model.predict(X_holdout)
@pGit1

pGit1 commented Jan 2, 2018

@avolkov1

THANK YOU!! Your code works. To test, I bypassed multi_gpu_model altogether.
I used the raw code from https://github.com/avolkov1/keras_experiments/blob/master/keras_exp/multigpu/_multigpu.py#L129.

After training on a simple dummy data set, I call a function that returns two models (serial and parallel) and choose only the serial model. Keep in mind that during training I call the fit function with the parallel model, not the serial model. I also feed my best-weight callback to the parallel model during training.

Once this is done, I load the learned weights into the serial model and get the expected results without any errors. I am not entirely sure why this works, but it does. I confirmed multi-GPU training and single-GPU inference. Now I am going to clean up my code to do something like you outlined above.

Thanks again for your help!!

EDIT: The cleaned-up version where you wrap the multi_gpu_model class works flawlessly. This is definitely my preferred method. Thanks again for all of your help. Your code is an extremely valuable contribution.
