Issue with running on multiple GPUs #14
Thanks for reposting this as a new issue @muminoff
I finally managed to reinstall.

Output:
@muminoff this code is definitely coming from the 'old' version. The newest code looks like this:
Make sure you're using the latest version.
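A quick way to double-check which version is actually installed is to query package metadata (this assumes Python 3.8+ and that the distribution is registered as "keras-unet"; adjust the name if your install differs):

```python
# Check which version of a package is installed, without importing it.
# "keras-unet" is assumed to be the distribution name here.
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version("keras-unet"))
```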
Also, please ignore
My example code still imports the old keras (
Before:
After:
Both (before and after) give the same error message.
Yes, change everything to
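For reference, the import change being discussed looks roughly like this (module paths as in TF 2.x; the point is to use the Keras bundled with TensorFlow everywhere, never the standalone `keras` package):

```python
# Before: standalone Keras imports, which don't mix with tf.distribute:
# from keras.optimizers import Adam
# from keras.callbacks import ModelCheckpoint

# After: use the Keras API bundled with TensorFlow everywhere:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

print(Adam.__module__)
```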
@muminoff can you paste again the most recent error msg + stack trace?
Jupyter Notebook exported to Markdown format:

```python
import tensorflow as tf

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
```

```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import glob
import os
import sys
from PIL import Image
```

```python
masks = glob.glob("/home/user/beefscan/experiments/new_models_201912/images/newtrain/*.png")
orgs = list(map(lambda x: x.replace(".png", ".jpg"), masks))
```

```python
imgs_list = []
masks_list = []
for image, mask in zip(orgs, masks):
    imgs_list.append(np.array(Image.open(image).resize((64, 64))))
    masks_list.append(np.array(Image.open(mask).resize((64, 64))))
imgs_np = np.asarray(imgs_list)
masks_np = np.asarray(masks_list)

print(imgs_np.shape, masks_np.shape)
```

```python
from keras_unet.utils import plot_imgs

plot_imgs(org_imgs=imgs_np, mask_imgs=masks_np, nm_img_to_plot=10, figsize=6)
```

```python
print(imgs_np.max(), masks_np.max())

x = np.asarray(imgs_np, dtype=np.float32) / 255
y = np.asarray(masks_np, dtype=np.float32)

print(x.max(), y.max())
print(x.shape, y.shape)
y = y.reshape(y.shape[0], y.shape[1], y.shape[2], 1)
print(x.shape, y.shape)
```

```python
from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.9, shuffle=True)
print("x_train: ", x_train.shape)
print("y_train: ", y_train.shape)
print("x_val: ", x_val.shape)
print("y_val: ", y_val.shape)
```

```python
from keras_unet.utils import get_augmented

train_gen = get_augmented(
    x_train, y_train, batch_size=8,
    data_gen_args=dict(
        rotation_range=5.,
        width_shift_range=0.05,
        height_shift_range=0.05,
        shear_range=40,
        zoom_range=0.2,
        horizontal_flip=True,
        vertical_flip=False,
        fill_mode='constant'
    ))

sample_batch = next(train_gen)
xx, yy = sample_batch
print(xx.shape, yy.shape)
```

```python
from keras_unet.utils import plot_imgs

plot_imgs(org_imgs=xx, mask_imgs=yy, nm_img_to_plot=2, figsize=6)
```

```python
from keras_unet.models import custom_unet
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.utils import multi_gpu_model
from keras_unet.metrics import iou, iou_thresholded
from keras_unet.losses import jaccard_distance

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    input_shape = x_train[0].shape
    model = custom_unet(
        input_shape,
        filters=32,
        use_batch_norm=True,
        dropout=0.3,
        dropout_change_per_layer=0.0,
        num_layers=6
    )
    model.summary()

    model_filename = 'model-v2.h5'
    callback_checkpoint = ModelCheckpoint(
        model_filename,
        verbose=1,
        monitor='val_loss',
        save_best_only=True,
    )
    model.compile(
        optimizer=Adam(),
        # optimizer=SGD(lr=0.01, momentum=0.99),
        loss='binary_crossentropy',
        # loss=jaccard_distance,
        metrics=[iou, iou_thresholded]
    )

# history = model.fit_generator(
#     train_gen,
#     steps_per_epoch=200,
#     epochs=50,
#     validation_data=(x_val, y_val),
#     callbacks=[callback_checkpoint]
# )
```

```python
!pip uninstall keras_unet
```

```python
from keras_unet.utils import plot_segm_history

plot_segm_history(history)
```

```python
model.load_weights(model_filename)
y_pred = model.predict(x_val)
```

```python
from keras_unet.utils import plot_imgs

plot_imgs(org_imgs=x_val, mask_imgs=y_val, pred_imgs=y_pred, nm_img_to_plot=10)
```

```python
tf.__version__
```

```python
import keras
keras.__version__
```
@muminoff
Sorry for the trouble, and let me know if that finally solved the issue.
@karolzak I have followed your instructions, and it works now. Thanks a lot for your support!
Copy-pasting comments from #12
@muminoff:
I cannot run `custom_unet` with multiple GPUs. I followed the distributed-training part of the TensorFlow documentation, but no luck. It seems I need to refactor the code and use a custom distributed training loop (namely `strategy.experimental_distribute_dataset`).
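For context, a custom training loop with `strategy.experimental_distribute_dataset` looks roughly like this sketch (TF 2.2+ APIs; the tiny `Dense` model and synthetic data are placeholders, not the U-Net pipeline from this issue):

```python
import tensorflow as tf

# Sketch of a custom distributed training loop (TF 2.2+).
# Toy model and synthetic data stand in for the real pipeline.
GLOBAL_BATCH_SIZE = 8
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, activation="sigmoid")])
    optimizer = tf.keras.optimizers.Adam()
    # Reduction.NONE + compute_average_loss is the documented pattern
    # for custom loops under a distribution strategy.
    loss_fn = tf.keras.losses.BinaryCrossentropy(
        reduction=tf.keras.losses.Reduction.NONE)

dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((32, 4)), tf.zeros((32, 1)))
).batch(GLOBAL_BATCH_SIZE)
# Split each global batch across the available replicas/GPUs:
dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        per_example_loss = loss_fn(y, model(x, training=True))
        loss = tf.nn.compute_average_loss(
            per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for x, y in dist_dataset:
    per_replica_loss = strategy.run(train_step, args=(x, y))
    # Per-replica losses are already scaled by the global batch size,
    # so summing them yields the overall mean loss for the step.
    loss = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_loss, axis=None)
```

On a single machine this runs unchanged with one replica, so it is easy to smoke-test before moving to a multi-GPU host.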
@karolzak:
Can you share the code you used, your TF/Keras versions, and the error message? That way I might be able to help you out or at least investigate it.
@muminoff:
I haven't tried `tf.keras.utils.multi_gpu_model` since it is deprecated, but I tried with `tf.distribute.MirroredStrategy()`. And here is my code:

Error:

FYI, using `multi_gpu_model` raises the following exception:

@karolzak:
Can you specify the TF/Keras versions that you're using? This seems to be related to that problem.
@muminoff: