How to train on multi-GPUs when using fit_generator? #9502
Correct me if I'm wrong, but if your batch_size=8, multiple GPUs won't give you much of a speedup, right? It says as much here: https://keras.io/utils/#multi_gpu_model
You're right, but I'm actually using batch_size = 64*4.
@JeniaNovellusDx Did you find any solution for this? I just tried training a model on …
What's the right number of workers? I believe the more the better, but how do I know precisely how many workers I should use?
From the docs: workers: Integer. Maximum number of processes to spin up when using process-based threading.
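As a starting point, a common heuristic ties the worker count to the machine's CPU count. A minimal runnable sketch, assuming Keras 2.x; the tiny model and dummy generator are placeholders, not anyone's real pipeline:

```python
import multiprocessing
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Rule of thumb: one worker per physical core. cpu_count() reports
# logical cores, so halving it is a reasonable starting point.
n_workers = max(1, multiprocessing.cpu_count() // 2)

def dummy_gen(batch_size=32):
    # Stand-in for a real data generator. Note: with a plain Python
    # generator, use_multiprocessing=True can duplicate data across
    # worker processes; a keras.utils.Sequence (discussed later in
    # this thread) is the safe choice.
    x = np.random.random((batch_size, 16)).astype('float32')
    y = np.random.randint(2, size=(batch_size, 1))
    while True:
        yield x, y

model = Sequential([Dense(1, activation='sigmoid', input_shape=(16,))])
model.compile(loss='binary_crossentropy', optimizer='sgd')

model.fit_generator(
    dummy_gen(),
    steps_per_epoch=100,
    epochs=2,
    workers=n_workers,
    use_multiprocessing=True,  # process-based workers sidestep the GIL
    max_queue_size=10,         # batches pre-fetched ahead of the model
)
```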
New finding: I just spun up a p2.xlarge with 1 GPU, and one epoch takes approx. 8 hours to finish; meanwhile, 8 GPUs take approx. 7 hours! This doesn't make any sense!
UPDATE: There was a disk I/O bottleneck in my code. If possible, only read from a file once! I solved it by keeping as much data as possible in memory.
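To illustrate the fix being described, here is a hypothetical sketch of the two patterns, assuming the data lives in `.npy` files (the paths and file format are illustrative):

```python
import numpy as np

# Slow pattern: the disk is hit again on every epoch (I/O bound).
def gen_from_disk(paths, batch_size):
    while True:
        for i in range(0, len(paths), batch_size):
            batch = [np.load(p) for p in paths[i:i + batch_size]]  # re-read each time
            yield np.stack(batch)

# Faster pattern: pay the I/O cost once up front, then serve from memory.
def gen_from_memory(paths, batch_size):
    data = np.stack([np.load(p) for p in paths])  # single pass over the disk
    while True:
        for i in range(0, len(data), batch_size):
            yield data[i:i + batch_size]
```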
@spate141 Any updates on this? I am facing the same issue on a 1080 GTX 8 GB with TF 1.4.0 and Keras 2.1.3. I am using a single GPU and still seeing the problem.
@mohapatras I don't think there is any issue with a single GPU; you basically get the speedup without making any changes to your code. If you are somehow not getting a boost, check how you pre-process the data before feeding it to your model. As I understand it, most pre-processing is done on the CPU, and if you are using a generator, disk I/O can be the main bottleneck.
I am reading the data from disk using a DataGenerator in Keras. I am working with 256 x 256 images, 56k in training and 2k in validation. It takes 6 hours/epoch, which is insane.
Any workarounds for this?
Hi, can you share the benchmark difference between your configurations? From my experiments, I see no performance improvement:
use_multiprocessing=False, workers=1, data=disk, gpu=4, perf=88s/epoch
use_multiprocessing=True, workers=8, data=disk, gpu=4, perf=89s/epoch
@ghostplant Currently I'm not running the instance. My issue was solved by adjusting the way I fetch data from disk with the generator and pre-processing it before feeding it to the GPU.
I will post the exact log next time I start the instance. Cheers!
Good news! So did you solve the bottleneck by loading some of the files into memory?
I launched a new EC2 p3.8xlarge with the following packages: … I also set the number of workers to 16 (CPU count // 2) with use_multiprocessing=True, and it performed well, at about a 3.4x speedup compared to 1 GPU.
Benchmarking things separately always helps with understanding. You can benchmark without worrying about the ImageDataGenerator part first, for example:

```python
import keras
keras.backend.set_image_data_format('channels_first')
from keras.applications.resnet50 import ResNet50
from keras.utils import np_utils
import numpy as np

NUM_GPU = 1
batch_size = 32 * NUM_GPU
img_rows, img_cols = 224, 224

# Synthetic data: removes disk I/O and pre-processing from the measurement.
X_train = np.random.random((batch_size, 3, img_rows, img_cols)).astype('float32')
Y_train = np.random.randint(1000, size=(batch_size,))  # random class indices
Y_train = np_utils.to_categorical(Y_train, 1000)

def gen():
    while True:
        yield (X_train, Y_train)

model = ResNet50(weights=None, input_shape=X_train.shape[1:])
if NUM_GPU != 1:
    model = keras.utils.multi_gpu_model(model, gpus=NUM_GPU)
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit_generator(gen(), epochs=100, steps_per_epoch=50)
```
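(Setting NUM_GPU=4 exercises multi_gpu_model on the same synthetic batch; comparing the seconds/epoch Keras reports against the NUM_GPU=1 run isolates GPU scaling from any input-pipeline effects.)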
@ghostplant Yes, I converted the data into a format that can be loaded into memory, and with a little post-processing I managed to feed data to the model on 8 GPUs. Usage on all 8 GPUs was ~70-75%, so I can say it was pretty much disk I/O in my case.
@ppwwyyxx Hi, thanks for your example. Could I discuss something about how the model is defined?
Recently, I found that after adding the CPU scope with … And GPU usage: …
@ghostplant Also, did you benchmark with and without the CPU scope?
You certainly should not put the model on the CPU if you want to use a GPU.
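For context, the "CPU scope" under discussion appears to be the pattern from the Keras multi_gpu_model docstring, where the template model is instantiated under a CPU device scope so its weights live in host memory and replicas are created on each GPU. A minimal sketch, assuming TF 1.x, Keras 2.x, and a 4-GPU machine:

```python
import tensorflow as tf
import keras
from keras.applications.resnet50 import ResNet50

# Documented pattern: build the template model on the CPU, then
# replicate it across the GPUs for data-parallel training.
with tf.device('/cpu:0'):
    template = ResNet50(weights=None, input_shape=(224, 224, 3))

parallel = keras.utils.multi_gpu_model(template, gpus=4)
parallel.compile(loss='categorical_crossentropy', optimizer='sgd')
```

Whether this CPU scope helps or hurts depends on how much weight traffic crosses the host-device boundary, which is presumably why the benchmarks above differ with and without it.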
@Hayao41 I did both kinds of benchmarks, and the GPUs are always working. However, the performance of using …
@ppwwyyxx @ghostplant Thanks guys, but it doesn't work for me :( as shown above. If I drop the CPU scope, everything goes back to normal and the GPUs work hard.
@Hayao41 I am not using …
@ghostplant That may be the problem spot; I will go test it on the mainline version. Thanks very much!
I am using …
@xuzheyuan624 Why Keras multi-GPU is slow: …
If the above 3 problems were solved, Keras multi_gpu could match the performance of the official TensorFlow CNN benchmark.
@ghostplant So should I just use … ?
@xuzheyuan624 Yes, it is just a workaround that happens to look simple. For the fastest solution, you would have to modify a lot of code related to input processing and gradient updates.
@ghostplant OK, it sounds too difficult for me to speed up the code with Keras. I will train my model in PyTorch and then transfer its weights to Keras.
You can also use tensorpack, which includes a built-in fast multi-GPU solution. There are some comparison scripts here.
Hey guys! Just found the solution, and it has been running properly, haha! We cannot use a traditional self-made generator: the GPUs may finish their batches at different times, and multiple workers then advance the same shared generator concurrently, so it is NOT thread-safe. The generator needs to be a Sequence (tf.keras.utils.Sequence), which is thread-safe. I am running my model on g3.16xlarge due to the size of the model and the size of the training set! The generator is now a class, not a method!
The following code should work when inserted: …
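The snippet itself did not survive in the thread; below is a minimal sketch of what a thread-safe Sequence-based generator looks like. The class name, array shapes, and batch size are all illustrative:

```python
import math
import numpy as np
from tensorflow.keras.utils import Sequence

class BatchSequence(Sequence):
    """Thread-safe data feeder: batches are addressed by index, so
    multiple workers can fetch different batches concurrently without
    sharing mutable generator state."""

    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        # Number of batches per epoch.
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        return (self.x[lo:lo + self.batch_size],
                self.y[lo:lo + self.batch_size])

# Illustrative usage with random data:
X = np.random.random((1024, 32)).astype('float32')
y = np.random.randint(2, size=(1024, 1))
seq = BatchSequence(X, y, batch_size=64)
# model.fit_generator(seq, epochs=10, workers=8, use_multiprocessing=True)
```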
Hi, I went through all the discussions, and the takeaway is that you need to use a data generator based on keras.utils.Sequence with fit_generator, set the batch size to be divisible by the number of available GPUs, and use use_multiprocessing=True. This didn't work in my case. I'm training a sequence-to-sequence NLP model. The data is sentences (text) with source and target. Multiprocessing on 4 GPUs still runs at 50% of the speed of a single GPU. Something is wrong, and from my experience it's related to Keras. I'm using TensorFlow GPU v1.15.3 (latest before v2.0) and Keras v2.1.5 (I tried newer ones, no change).
Hello guys,
I opened a p3.8xlarge instance to benefit from its 4 GPUs, but the training time improves only by a factor of 2 (x2) compared to my p3.2xlarge machine (not x4, and not even x3... I'm a bit disappointed).
I believe it's the data generator that slows down the process. Is there a way to overcome this issue? I have directories with thousands of images (~30GB), so I'm compelled to use the flow_from_directory method.
Here's a sample from my code: …
Thanks!
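The code sample did not survive extraction; below is a minimal sketch of the kind of flow_from_directory setup being described. The directory path, image size, batch size, and class mode are placeholders:

```python
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)

train_flow = datagen.flow_from_directory(
    'data/train',            # hypothetical path: one subfolder per class
    target_size=(224, 224),
    batch_size=128,          # keep this divisible by the number of GPUs
    class_mode='categorical',
)

# flow_from_directory returns a Sequence-backed iterator, so it can be
# consumed with multiple workers:
# model.fit_generator(train_flow, epochs=10, workers=8,
#                     use_multiprocessing=True)
```

Even so, decoding thousands of JPEGs per epoch is CPU- and disk-bound, which is consistent with the sublinear scaling reported above.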