Colab crashes due to tcmalloc large allocation #4668

Closed
karndeb opened this issue May 29, 2020 · 10 comments

@karndeb

karndeb commented May 29, 2020

I am pretraining a RoBERTa model on the Newsroom dataset on Colab. I have trained a custom tokenizer on the text data. I am using LineByLineTextDataset, since I have a single file in which each line is the text of a news article. Colab crashes when I run this code:

%%time
from transformers import LineByLineTextDataset

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt",
    block_size=128,
)

I tried with the full dataset, then reduced it to half, and also tried reducing the block size.
The config is:

from transformers import RobertaConfig

config = RobertaConfig(
    vocab_size=52000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)

and the error log is:
tcmalloc: large alloc 7267041280 bytes == 0x9d916000 @ 0x7f3ea16311e7 0x5aca9b 0x4bb106 0x5bcf53 0x50a2bf 0x50bfb4 0x507d64 0x509042 0x594931 0x549e5f 0x5513d1 0x5a9cbc 0x50a5c3 0x50cd96 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x588d41 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x509758.

Additional notes: the "increase RAM" prompt doesn't appear when Colab crashes, so I am essentially working with 12.72 GB of RAM.
Please help me.

@LysandreJik
Member

Hi! This is indeed a memory error. At which point does it crash?

@BramVanroy
Collaborator

A similar problem has been reported when using the lazy version of LineByLineTextDataset. Colab deals badly with situations where you are using 90+% of memory: it will kick you out or throw OOM errors that you would not get on a local machine. This is unfortunate and hard to work around.

In this case, I think you are simply running out of memory. The Newsroom dataset is huge (1M+ news articles), so that is likely the issue.
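
As a quick sanity check, a minimal sketch that measures the file size and line count before building the dataset (the path is the one from the issue):

import os

file_path = "/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt"
size_gb = os.path.getsize(file_path) / 1e9

# Count lines without loading the whole file into memory.
with open(file_path, encoding="utf-8") as f:
    num_lines = sum(1 for _ in f)

print(f"{size_gb:.2f} GB on disk, {num_lines} lines")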

@karndeb
Author

karndeb commented May 31, 2020

@LysandreJik It crashes when the RAM usage suddenly increases to around 7-8 GB, and the increase is very sudden: it stays at 2-3 GB for a minute or so, then shoots up to 8 GB and crashes.
@BramVanroy I tried reducing the dataset by half and running it, but I still get the same error. So would you suggest running it on a local machine? I would have to run this part locally, since my local machine has a little more RAM (16 GB), but I would then have to train on Colab anyway because my laptop has no GPU. Is there a better workaround?
Also, thanks for the quick answers.

@BramVanroy
Collaborator

BramVanroy commented May 31, 2020

The sudden increase in RAM may be due to a single very long sentence/text, which forces the whole batch to be padded to that length and sharply increases memory usage.

How large is the dataset in terms of GB/number of lines?

Unfortunately, sometimes you cannot do what you want due to practical restrictions (mostly money). If you want to train or fine-tune a model on a huge dataset, you will likely need more hardware than is available on free plans.

Perhaps you can try out https://github.com/huggingface/nlp and see whether it has the dataset you need. If not, you can open an issue there and ask whether the dataset can be included. That should solve some of these problems, since it is designed to keep datasets on disk rather than in RAM.
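
A minimal sketch of that approach, assuming the datasets library (the successor to huggingface/nlp) and the custom tokenizer from the issue; the "text" loader reads the file line by line and stores the result in memory-mapped Arrow files on disk rather than in RAM:

from datasets import load_dataset

# Each line of the text file becomes one example under the "text" key.
raw = load_dataset(
    "text",
    data_files={"train": "/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt"},
)["train"]

# Tokenize in batches; truncation keeps every example at block_size (128) tokens.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])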

@ShoubhikBanerjee

@BramVanroy I have tried with 26 GB of RAM, but it still crashes. Is there any minimum hardware requirement mentioned?

@BramVanroy
Collaborator

No. I fear that this might simply not work on Colab. The line cache loads as much of the file as it can into memory and goes from there, but Colab is being annoying and locks you out because it thinks you are going to hit an OOM error (which you wouldn't on a regular system).
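
For reference, a minimal sketch of a lazier alternative, assuming a PyTorch Dataset and a transformers tokenizer (the class name LazyLineByLineDataset is hypothetical, not part of the library): it records the byte offset of each line once, then reads and tokenizes a single line per item, so neither the raw text nor the tokenized corpus has to sit in RAM all at once.

import torch
from torch.utils.data import Dataset


class LazyLineByLineDataset(Dataset):
    def __init__(self, tokenizer, file_path, block_size=128):
        self.tokenizer = tokenizer
        self.file_path = file_path
        self.block_size = block_size
        # Record the byte offset at which each line starts (one int per line).
        self.offsets = []
        with open(file_path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek to the stored offset and read just this one line.
        with open(self.file_path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode("utf-8").strip()
        encoding = self.tokenizer(line, truncation=True, max_length=self.block_size)
        return torch.tensor(encoding["input_ids"], dtype=torch.long)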

@ShoubhikBanerjee

ShoubhikBanerjee commented Jun 19, 2020

@BramVanroy, I am using a notebook on Google AI Platform with 26 GB of RAM (without GPU), but after running 2% of the very first epoch, it says:

can't allocate memory: you tried to allocate 268435456 bytes. Error code 12 (Cannot allocate memory).

Am I doing something wrong?

@BramVanroy
Collaborator

Just read my previous post. This is a problem with how Google deals with increasing memory usage: it thinks an OOM will occur even though it won't. The problem is not with the implementation. It seems that you cannot use this functionality in these kinds of VMs.

@stale

stale bot commented Aug 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@moghadas76

Has anybody found a solution for this issue?
