Batch size does not seem to change and ResourceExhaustedError keeps happening #12755

Closed

kbulutozler opened this issue Apr 26, 2019 · 0 comments
kbulutozler commented Apr 26, 2019


System information

  • Have I written custom code (as opposed to using example directory): yes

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu VM of Paperspace Gradient with jupyter notebook

  • TensorFlow backend (yes / no): yes

  • TensorFlow version: 1.13.1

  • Keras version: 2.2.4

  • Python version: Python 3.6.8 :: Anaconda, Inc.

  • CUDA/cuDNN version: cudatoolkit 10.0.130, cudnn 7.3.1 (build cuda10.0_0)

  • GPU model and memory: Tesla V100, 16 GB

The number of sequences is 900. I am using Keras fit_generator and setting steps_per_epoch to 100, 200, 300, etc. to keep the batch size between 1 and 10. Whatever I set steps_per_epoch to, the error line is the same:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[900,50,11710] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/MLM-Sim/truediv_grad/Neg}}]]

When I change steps_per_epoch, I am basically changing the batch size, so I don't understand why the problematic tensor has the same shape every time. The 11710 is not surprising, since that is the number of tokens in the dictionary built from sentence_pairs. 45000 must be 900 × 50, where 900 is the number of sequences and 50 is the number of tokens in a single sequence. However, since I am (in theory) changing the batch size, the first dimension should be the batch size, not the whole 900. The only explanation I can think of is that the batch size is not actually changed by different values of steps_per_epoch (see the sketch below for the arithmetic I have in mind).
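To make that expectation concrete, here is a small sketch of the arithmetic (my own assumption about how steps_per_epoch should relate to batch size for a fixed dataset, not something taken from the Keras docs):

import math

num_sequences = 900  # sequences built from my sentence pairs

# My expectation: with a fixed dataset, the effective batch size should be
# roughly num_sequences / steps_per_epoch.
for steps_per_epoch in (100, 200, 300):
    expected_batch_size = math.ceil(num_sequences / steps_per_epoch)
    print(steps_per_epoch, expected_batch_size)  # 100 -> 9, 200 -> 5, 300 -> 3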

This is the code I use as a base:

from keras_bert import get_base_dict, get_model, gen_batch_inputs
import keras

# A toy input example
sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]

# Build token dictionary                    
token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word


# Build & train the model
model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=100,
    feed_forward_dim=400,
    seq_len=50,
    pos_num=50,
    dropout_rate=0.05,
)
model.summary()

def _generator():
    while True:
        yield gen_batch_inputs(
            sentence_pairs,
            token_dict,
            token_list,
            seq_len=50,
        )

model.fit_generator(
    generator=_generator(),
    steps_per_epoch=100,
    epochs=10,
)

This code and the imported functions are taken from https://github.com/CyberZHG/keras-bert.
I only replaced the toy sentence_pairs with my own sentence_pairs, read from a file and with the same shape. In the problematic case described above, I am using 900 pairs (1800 sentences) from the whole dataset, which are converted to 900 sequences in the implementation. The 900 I mentioned above is this 900.
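For illustration, this is roughly what I would expect explicit batching to look like — only a sketch, assuming gen_batch_inputs also accepts a subset of the pairs (the batch_size value here is hypothetical, and token_dict / token_list come from the snippet above):

from keras_bert import gen_batch_inputs

def _batched_generator(pairs, batch_size=10):
    # Each step yields inputs built from only `batch_size` pairs,
    # so no single tensor should cover all 900 sequences at once.
    while True:
        for start in range(0, len(pairs), batch_size):
            yield gen_batch_inputs(
                pairs[start:start + batch_size],
                token_dict,
                token_list,
                seq_len=50,
            )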
