Sequence length limited #17

Closed
Henrykwokkk opened this issue Nov 20, 2020 · 14 comments

@Henrykwokkk

I tried this model, but the sequence length the Routing Transformer can process seems limited. I set the batch size to 16 and the sequence length to 1024, but it ran out of GPU memory.

@lucidrains
Owner

Hmm, that doesn't seem right. Have you tried running the Colab? Want to share your script?

@lucidrains
Owner

How deep is your network? Try turning on reversibility.

@Henrykwokkk
Author

> How deep is your network? Try turning on reversibility.

Both the encoder and decoder have a depth of 3. I'll get back to you with more details later.

@lucidrains
Owner

How much memory are you working with? Can you show me your full settings?

@Henrykwokkk
Author

Henrykwokkk commented Nov 21, 2020

The settings are as follows:

```python
NUM_BATCHES = int(1e5)
BATCH_SIZE = 32
LEARNING_RATE = 1e-4
GENERATE_EVERY = 100
NUM_TOKENS = 256 + 2
ENC_SEQ_LEN = 1024
DEC_SEQ_LEN = 2048

model = RoutingTransformerEncDec(
    dim = 512,
    enc_num_tokens = NUM_TOKENS,
    enc_depth = 3,
    enc_heads = 8,
    enc_max_seq_len = ENC_SEQ_LEN,
    enc_window_size = 32,
    dec_num_tokens = NUM_TOKENS,
    dec_depth = 3,
    dec_heads = 8,
    dec_max_seq_len = DEC_SEQ_LEN,
    dec_window_size = 32,
).cuda()
```

A RuntimeError occurred:

```
Tried to allocate 64.00 MiB (GPU 0; 10.76 GiB total capacity; 9.58 GiB already allocated; 20.94 MiB free; 9.89 GiB reserved in total by PyTorch)
```

Similarly, I had this problem with the Reformer model: after about 500 batches it ran out of memory.

@lucidrains
Owner

So first, turn on reversibility; second, you can decrease your batch size and do gradient accumulation instead.
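
(For reference, a minimal sketch of both suggestions. It assumes the EncDec wrapper accepts `enc_reversible` / `dec_reversible` keyword arguments and that the forward call with `return_loss = True` returns a scalar training loss, as in the repo's examples; `train_loader` is a hypothetical loader yielding padded `(src, tgt)` token batches.)

```python
import torch
from routing_transformer import RoutingTransformerEncDec

# Reversible blocks recompute activations during the backward pass,
# trading extra compute for a much smaller activation-memory footprint.
model = RoutingTransformerEncDec(
    dim = 512,
    enc_num_tokens = 258, enc_depth = 3, enc_heads = 8,
    enc_max_seq_len = 1024, enc_window_size = 32, enc_reversible = True,
    dec_num_tokens = 258, dec_depth = 3, dec_heads = 8,
    dec_max_seq_len = 2048, dec_window_size = 32, dec_reversible = True
).cuda()

optimizer = torch.optim.Adam(model.parameters(), lr = 1e-4)

# Gradient accumulation: a micro-batch of 4 with 8 accumulation steps sees the
# same number of sequences per update as batch size 32, at far lower peak memory.
ACCUM_STEPS = 8
optimizer.zero_grad()
for i, (src, tgt) in enumerate(train_loader):                  # train_loader is assumed
    loss = model(src.cuda(), tgt.cuda(), return_loss = True)   # exact call/return signature may differ; see the repo README
    (loss / ACCUM_STEPS).backward()                            # scale so the accumulated update matches a full batch
    if (i + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```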

@Henrykwokkk
Author

> So first, turn on reversibility; second, you can decrease your batch size and do gradient accumulation instead.

I turned on reversibility and set the batch size to 8, but training stopped at batch 172 with a CUDA out-of-memory RuntimeError.

@lucidrains
Owner

@guohanyang1994 Make your batch size even smaller and increase your gradient accumulation steps.

@Henrykwokkk
Author

Could I ask why the CUDA out-of-memory error occurs during training (at batch 172) rather than at the beginning of training?

@tomweingarten
Contributor

Hard to say without seeing the code, but are your batches different sizes? It's possible it takes that long to hit the longest combination of sequence lengths.
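
(One way to test that hypothesis: log each batch's sequence lengths together with peak GPU memory, so an OOM at a given step can be matched against an unusually long `(src, tgt)` combination. Sketch only; `train_loader`, `model`, and `optimizer` are assumed to be set up as in the sketch above.)

```python
import torch

for i, (src, tgt) in enumerate(train_loader):
    torch.cuda.reset_peak_memory_stats()
    loss = model(src.cuda(), tgt.cuda(), return_loss = True)   # call signature assumed, as above
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"batch {i}: src len {src.shape[1]}, tgt len {tgt.shape[1]}, peak mem {peak_gb:.2f} GB")
```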

@Henrykwokkk
Author

The batch size is fixed at 4 and the sequence length is capped at 2048, but training still stops at about the 1200th batch. I am still confused :(
But thanks for your reply.

@lucidrains
Owner

lucidrains commented Nov 23, 2020

@guohanyang1994 are you sure you don't have a memory leak? Routing Transformer has been trained on GPT-3-sized datasets successfully by others, so I doubt there's any problem with the framework.
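
(A common source of such leaks in PyTorch training scripts is accumulating the loss tensor itself, which keeps its autograd history alive across iterations. A minimal illustration with a stand-in model; only the last two lines are the point.)

```python
import torch

model = torch.nn.Linear(512, 512).cuda()                    # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr = 1e-4)

total_loss = 0
for step in range(1000):
    x = torch.randn(16, 512, device = 'cuda')
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # total_loss += loss        # leaky: retains autograd history for every step
    total_loss += loss.item()   # fixed: accumulate a plain Python float instead
```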

@Henrykwokkk
Author

Oh yes, it was exactly a memory leak problem. I have fixed it; thank you so much. Sorry to bother you, I'm an NLP beginner.

@lucidrains
Owner

ok np :D
