Adapted by Carlos Toxtli https://www.carlostoxtli.com/#colab-gpu-1

Source: https://github.com/gmihaila/machine_learning_platform_gpu/blob/master/gpu_usage.ipynb

## Overview

* #### How to use up to 4 GPUs (max allowed on MLP) using Keras.
* #### How to supervise your GPU memory allocations.
* #### How to deal with common OOM (Out Of Memory) issues.
* #### How to partition your GPUs.
* #### How to save your model for later.
* #### How to make sure you are not hogging the GPUs.
* #### How to deal with crazy models.

In [None]:
# BUILD MODEL FUNCTION

def BuildToyModel():
    # generate some dummy data
    n_examples = 1000
    n_features = 100

    x_train = np.random.random((n_examples, n_features))
    x_train = np.round(x_train, 2)
    x_train *= 100
    x_train = np.array(x_train, dtype=int)

    n_words = np.max(x_train)

    y_train = np.random.random(n_examples)

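    # number of units in the LSTM
    n_units = 256
    # number of words
    n_words = 100500   # vocabulary size (overrides the value read from the data above)
    size_emb = 300     # embedding size; has to match the embedding matrix
    size_seq = 50      # sequence length (not used by this toy model)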

    embedding_matrix = np.random.random((n_words+2, size_emb))

    # build model
    model = Sequential()
    model.add(Embedding(input_dim=(n_words+2), output_dim=size_emb, weights=[embedding_matrix],
                    mask_zero=False, trainable=False))
    # LSTM LAYER/S
    model.add(LSTM(n_units, dropout=0.2, recurrent_dropout=0.2))

    # TREAT AS REGRESSION PROBLEM
    model.add(Dense(1, activation='sigmoid'))
    
    return model, x_train, y_train

## How to use up to 4 GPUs (max allowed on MLP) using Keras.

In [None]:
import sys
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM


from keras.utils import multi_gpu_model         #<------------------Essential for GPU usage


# Build model
model, x_train, y_train = BuildToyModel()

# Add how many GPUs you want/have
model = multi_gpu_model(model, gpus=4)          #<------------------Essential part to run on GPUs

# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['acc'])

# model summary (summary() prints by itself, no print() needed)
model.summary()                                 #<------------------See lambda layers distributed on each GPU

# batches
n_batch = 32       #<------------------make sure your batch is not too small
# epochs
n_epoch = 10

# train model
model.fit(x_train, y_train, epochs=n_epoch, batch_size=n_batch, verbose=1)

Using TensorFlow backend.


## How to supervise your GPU memory allocations.

Open a terminal and run `watch -n 1 nvidia-smi`. This refreshes the GPU usage table every second.
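
If you would rather check memory from inside the notebook, the same information can be read with `nvidia-smi`'s CSV query mode. This is only a minimal sketch: the helper names `parse_gpu_memory` and `gpu_memory` are made up for illustration, and the sample numbers are invented so the parsing part runs even on a machine without a GPU.

```python
import subprocess

# nvidia-smi flags for a machine-readable, one-line-per-GPU report
QUERY = ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]

def parse_gpu_memory(csv_text):
    """Turn lines like '0, 1024, 16280' into a list of dicts (memory in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = [field.strip() for field in line.split(",")]
        rows.append({"gpu": int(index), "used_mib": int(used), "total_mib": int(total)})
    return rows

def gpu_memory():
    """Run nvidia-smi once and return per-GPU memory usage."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_gpu_memory(output)

# Invented sample output, parsed without touching any GPU
sample = "0, 1024, 16280\n1, 0, 16280\n"
for row in parse_gpu_memory(sample):
    print(row)
```

On a GPU machine, calling `gpu_memory()` before and after `model.fit(...)` should show how much memory the run has claimed.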

## How to deal with common OOM (Out Of Memory) issues.

Keep the batch size moderate. A batch size that is too small will crash your notebook; one that is too big will give you an OOM error.

If your container was allocated 4 GPUs and you run your current notebook on all 4, then open a new notebook and try to run on all 4 GPUs again, you will get an OOM error, even if the model in the first notebook is no longer running. The first notebook's kernel keeps its GPU memory until it is shut down.
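
One plausible reason a tiny batch misbehaves on several GPUs is that `multi_gpu_model` divides every batch into sub-batches, one per GPU. The sketch below only does that arithmetic. The helper name `sub_batch_sizes` is made up, and the rule "even split, last GPU takes the remainder" is an assumption about how the slicing works, not something this notebook verifies.

```python
def sub_batch_sizes(batch_size, n_gpus):
    """Estimate how many samples each GPU gets from one batch.

    Assumption: an even integer split, with the last GPU taking the remainder.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    step = batch_size // n_gpus
    sizes = [step] * (n_gpus - 1)
    sizes.append(batch_size - step * (n_gpus - 1))
    return sizes

print(sub_batch_sizes(32, 4))  # [8, 8, 8, 8]
print(sub_batch_sizes(2, 4))   # [0, 0, 0, 2]
```

Under that assumption, a batch of 2 on 4 GPUs leaves three GPUs with nothing to process, while a very large batch multiplies the activations each GPU has to hold.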

## How to partition your GPUs

Set `os.environ["CUDA_VISIBLE_DEVICES"]` to a comma-separated list of the GPU ids you want to use.
Your MLP container email tells you which GPUs you received.

Don't forget to adjust the number of GPUs you pass to Keras!
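
One way to keep the visible GPUs and the number given to Keras in sync is to derive both from a single list. This is a small sketch; the helper name `use_gpus` is made up, and the ids `[0, 1]` are placeholders for whatever your MLP email says.

```python
import os

def use_gpus(gpu_ids):
    """Expose only the given GPU ids to this process and return how many there are."""
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return len(gpu_ids)

# Placeholder ids; call this before Keras/TensorFlow first touches the GPUs
n_gpus = use_gpus([0, 1])
print(os.environ["CUDA_VISIBLE_DEVICES"], n_gpus)

# Later, pass the same count to Keras:
# model = multi_gpu_model(model, gpus=n_gpus)
```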

In [None]:
import os

os.environ["CUDA_VISIBLE_DEVICES"]="0,1"  #<------ CHECK YOUR MLP EMAIL

In [None]:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM


from keras.utils import multi_gpu_model         #<------------------Essential for GPU usage


# Build model
model, x_train, y_train = BuildToyModel()

# Add how many GPUs you want/have
model = multi_gpu_model(model, gpus=2)          #<------------------Essential part to run on GPUs

# Compile model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['acc'])

# model summary (summary() prints by itself, no print() needed)
model.summary()                                 #<------------------See lambda layers distributed on each GPU

# batches
n_batch = 32       #<------------------make sure your batch is not too small
# epochs
n_epoch = 10

# train model
model.fit(x_train, y_train, epochs=n_epoch, batch_size=n_batch, verbose=1)

## How to save your model for later

Let's say you train your model on all 4 GPUs and want to save it so you can do transfer learning later, or just have the trained model ready.
You have to save the weights of the model. You can't save the whole model. You can only save a whole model if you run on 1 GPU or no GPU.
In this case, with the model running on 4 GPUs, you need to re-build the model exactly the way you did before training, with the exact number of GPUs you used for training. After you compile the model you can simply load the weights and it will run like a charm (get it? PyCharm!)
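
The save and reload steps described above could look roughly like this. It is a sketch under assumptions: `save_trained_weights`, `rebuild_and_load` and the `build_parallel_model` argument are hypothetical names, and the only Keras calls relied on are the model methods `save_weights`, `compile` and `load_weights`.

```python
def save_trained_weights(parallel_model, path):
    """Save only the weights of the trained multi-GPU model."""
    parallel_model.save_weights(path)

def rebuild_and_load(build_parallel_model, n_gpus, path):
    """Re-create the model as it was built for training, compile it, load the weights.

    build_parallel_model is a hypothetical callable: given the number of GPUs,
    it must return the same multi-GPU architecture that was trained.
    """
    model = build_parallel_model(n_gpus)
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['acc'])
    model.load_weights(path)
    return model

# Possible usage with the toy model from this notebook (file name is a placeholder):
# def build(n):
#     template, _, _ = BuildToyModel()
#     return multi_gpu_model(template, gpus=n)
#
# save_trained_weights(model, "toy_weights.h5")
# model = rebuild_and_load(build, 4, "toy_weights.h5")
```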

## How to make sure you are not hogging the GPUs.

When you don't need the model anymore, shut down your notebook so that you are not blocking the GPUs for others.
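
To check which processes are still holding GPU memory, `nvidia-smi` can list compute processes in CSV form. A minimal sketch follows; the helper names are made up and the sample numbers are invented. Inside some containers `nvidia-smi` may not be able to see process ids, in which case the list simply comes back empty.

```python
import os
import subprocess

# nvidia-smi flags for one line per (process, GPU) pair
APPS_QUERY = ["nvidia-smi",
              "--query-compute-apps=pid,used_memory",
              "--format=csv,noheader,nounits"]

def parse_gpu_processes(csv_text):
    """Map pid -> total GPU memory in MiB from lines like '4242, 10240'."""
    usage = {}
    for line in csv_text.strip().splitlines():
        pid, used = [field.strip() for field in line.split(",")]
        # Memory may be reported as non-numeric; count it as 0 in that case
        mib = int(used) if used.isdigit() else 0
        usage[int(pid)] = usage.get(int(pid), 0) + mib
    return usage

def my_kernel_holds_gpu():
    """True if this notebook's own process appears in nvidia-smi's process list."""
    output = subprocess.check_output(APPS_QUERY, universal_newlines=True)
    return os.getpid() in parse_gpu_processes(output)

# Invented sample: process 4242 uses two GPUs, 4243 uses one
print(parse_gpu_processes("4242, 10240\n4243, 512\n4242, 100\n"))
```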

## How to use GPUs on MLP 🦌

* How to use up to 4 GPUs (max allowed) using Keras.
* How to supervise your GPU memory allocations.
* How to deal with common OOM issues.
* How to make sure you are not hogging the GPUs.
* If a model goes crazy, or before running a model for a few days, restart the notebook kernel, shut it down and re-run it. It happened to me, especially when running on GPUs!


It works! Deactivate 2 GPUs when you have 4 and see if a container can be created, or use the other GPUs in another notebook.

- If you use the TensorFlow image, it automatically uses 1 GPU. No setup is needed.



In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"  #<------ DEACTIVATE ALL GPUS


In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0,1"  #<------ USE ONLY GPUS 0 AND 1


In [None]:
import sys
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM


from keras.utils import multi_gpu_model         #<------------------Essential for GPU usage


## Toy Example to see the GPUs in action!

In [None]:
# generate some dummy data
n_examples = 1000
n_features = 100

x_train = np.random.random((n_examples, n_features))
x_train = np.round(x_train, 2)
x_train *= 100
x_train = np.array(x_train, dtype=int)

n_words = np.max(x_train)

y_train = np.random.random(n_examples)

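# number of units in the LSTM
n_units = 256
# number of words
n_words = 100500   # vocabulary size (overrides the value read from the data above)
size_emb = 300     # embedding size; has to match the embedding matrix
size_seq = 50      # sequence length (not used by this toy model)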


embedding_matrix = np.random.random((n_words+2, size_emb))

# with tf.device("/cpu:0"):
# build model
model = Sequential()
model.add(Embedding(input_dim=(n_words+2), output_dim=size_emb, weights=[embedding_matrix],
                    mask_zero=False, trainable=False))

# LSTM LAYER/S
model.add(LSTM(n_units, dropout=0.2, recurrent_dropout=0.2))


# TREAT AS REGRESSION PROBLEM

model.add(Dense(1, activation='sigmoid'))

# Add how many GPUs you want/have
model = multi_gpu_model(model, gpus=2)          #<------------------Essential part to run on GPUs

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['acc'])


# model summary (summary() prints by itself, no print() needed)
model.summary()                                 #<------------------See lambda layers distributed on each GPU

## Train model and see the GPUs working

In [None]:
# batches
n_batch = 32       #<------------------make sure your batch is not too small
# epochs
n_epoch = 10
# train model
model.fit(x_train, y_train, epochs=n_epoch, batch_size=n_batch, verbose=1)

In [None]:
# Run this in a second notebook: the variable must be set before the GPUs are first used
os.environ["CUDA_VISIBLE_DEVICES"]="2,3"  #<------ USE ONLY GPUS 2 AND 3

# generate some dummy data
n_examples = 1000
n_features = 100

x_train = np.random.random((n_examples, n_features))
x_train = np.round(x_train, 2)
x_train *= 100
x_train = np.array(x_train, dtype=int)

n_words = np.max(x_train)

y_train = np.random.random(n_examples)

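# number of units in the LSTM
n_units = 256
# number of words
n_words = 100500   # vocabulary size (overrides the value read from the data above)
size_emb = 300     # embedding size; has to match the embedding matrix
size_seq = 50      # sequence length (not used by this toy model)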


embedding_matrix = np.random.random((n_words+2, size_emb))

# with tf.device("/cpu:0"):
# build model
model2 = Sequential()
model2.add(Embedding(input_dim=(n_words+2), output_dim=size_emb, weights=[embedding_matrix],
                    mask_zero=False, trainable=False))

# LSTM LAYER/S
model2.add(LSTM(n_units, dropout=0.2, recurrent_dropout=0.2))


# TREAT AS REGRESSION PROBLEM

model2.add(Dense(1, activation='sigmoid'))

# Add how many GPUs you want/have (wrap model2, not the earlier model)
model2 = multi_gpu_model(model2, gpus=2)         #<------------------Essential part to run on GPUs

model2.compile(loss='mean_squared_error', optimizer='adam', metrics=['acc'])


# model summary (summary() prints by itself, no print() needed)
model2.summary()                                 #<------------------See lambda layers distributed on each GPU

In [None]:
# batches
n_batch = 32       #<------------------make sure your batch is not too small
# epochs
n_epoch = 10
# train model
model2.fit(x_train, y_train, epochs=n_epoch, batch_size=n_batch, verbose=1)