Batch size does not seem to change and ResourceExhaustedError keeps happening #12755

Closed

kbulutozler opened this issue Apr 26, 2019 · 0 comments
kbulutozler commented Apr 26, 2019


System information

  • Have I written custom code (as opposed to using example directory): yes

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu VM of Paperspace Gradient with jupyter notebook

  • TensorFlow backend (yes / no): yes

  • TensorFlow version: 1.13.1

  • Keras version: 2.2.4

  • Python version: Python 3.6.8 :: Anaconda, Inc.

  • CUDA/cuDNN version: cudatoolkit 10.0.130, cudnn 7.3.1 (build cuda10.0_0)

  • GPU model and memory: Tesla V100, 16 GB

The number of sequences is 900. I am using Keras fit_generator and setting steps_per_epoch to 100, 200, 300, etc. to keep the batch size between 1 and 10. Whatever I set steps_per_epoch to, the error line is the same:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[900,50,11710] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/MLM-Sim/truediv_grad/Neg}}]]

When I change steps_per_epoch, I am basically changing the batch size, so I don't understand why the problematic tensor has the same shape every time. The 11710 is not surprising, since that is the number of tokens in the dictionary built from sentence_pairs. 45000 must be 900 × 50, where 900 is the number of sequences and 50 is the number of tokens in a single sequence. However, since I am (in theory) changing the batch size, the first dimension should be the batch size, not the whole 900. The only explanation I can think of is that the batch size is not actually changed by different values of steps_per_epoch (see the sketch below for the arithmetic I have in mind).
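To make that expectation concrete, here is a small sketch of the arithmetic (my own assumption about how steps_per_epoch should relate to batch size for a fixed dataset, not something taken from the Keras docs):

import math

num_sequences = 900  # sequences built from my sentence pairs

# My expectation: with a fixed dataset, the effective batch size should be
# roughly num_sequences / steps_per_epoch.
for steps_per_epoch in (100, 200, 300):
    expected_batch_size = math.ceil(num_sequences / steps_per_epoch)
    print(steps_per_epoch, expected_batch_size)  # 100 -> 9, 200 -> 5, 300 -> 3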

This is the code I use as a base:

from keras_bert import get_base_dict, get_model, gen_batch_inputs
import keras

# A toy input example
sentence_pairs = [
    [['all', 'work', 'and', 'no', 'play'], ['makes', 'jack', 'a', 'dull', 'boy']],
    [['from', 'the', 'day', 'forth'], ['my', 'arm', 'changed']],
    [['and', 'a', 'voice', 'echoed'], ['power', 'give', 'me', 'more', 'power']],
]

# Build token dictionary                    
token_dict = get_base_dict()  # A dict that contains some special tokens
for pairs in sentence_pairs:
    for token in pairs[0] + pairs[1]:
        if token not in token_dict:
            token_dict[token] = len(token_dict)
token_list = list(token_dict.keys())  # Used for selecting a random word


# Build & train the model
model = get_model(
    token_num=len(token_dict),
    head_num=5,
    transformer_num=12,
    embed_dim=100,
    feed_forward_dim=400,
    seq_len=50,
    pos_num=50,
    dropout_rate=0.05,
)
model.summary()

def _generator():
    while True:
        yield gen_batch_inputs(
            sentence_pairs,
            token_dict,
            token_list,
            seq_len=50,
        )

model.fit_generator(
    generator=_generator(),
    steps_per_epoch=100,
    epochs=10,
)

This code and the imported functions are taken from https://github.com/CyberZHG/keras-bert.
I only replaced the toy sentence_pairs with my own sentence_pairs, read from a file and with the same shape. In the problematic case described above, I am using 900 pairs (1800 sentences) from the whole dataset, which are converted to 900 sequences in the implementation. The 900 I mentioned above is this 900.
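For illustration, this is roughly what I would expect explicit batching to look like — only a sketch, assuming gen_batch_inputs also accepts a subset of the pairs (the batch_size value here is hypothetical, and token_dict / token_list come from the snippet above):

from keras_bert import gen_batch_inputs

def _batched_generator(pairs, batch_size=10):
    # Each step yields inputs built from only `batch_size` pairs,
    # so no single tensor should cover all 900 sequences at once.
    while True:
        for start in range(0, len(pairs), batch_size):
            yield gen_batch_inputs(
                pairs[start:start + batch_size],
                token_dict,
                token_list,
                seq_len=50,
            )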
