Colab crashes due to tcmalloc large allocation #4668

Closed
karndeb opened this issue May 29, 2020 · 10 comments

@karndeb

karndeb commented May 29, 2020

I am pretraining a RoBERTa model on the Newsroom dataset on Colab. I have trained a custom tokenizer on the text data. I am using LineByLineTextDataset, since I have a single file in which each line is the text of a news article. Colab crashes when I run this code:

%%time
from transformers import LineByLineTextDataset

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt",
    block_size=128,
)

I tried with the full dataset, then reduced it to half, and also tried reducing the block size.
The config is:

from transformers import RobertaConfig

config = RobertaConfig(
    vocab_size=52000,
    max_position_embeddings=514,
    num_attention_heads=12,
    num_hidden_layers=6,
    type_vocab_size=1,
)

and the error log is:
tcmalloc: large alloc 7267041280 bytes == 0x9d916000 @ 0x7f3ea16311e7 0x5aca9b 0x4bb106 0x5bcf53 0x50a2bf 0x50bfb4 0x507d64 0x509042 0x594931 0x549e5f 0x5513d1 0x5a9cbc 0x50a5c3 0x50cd96 0x507d64 0x516345 0x50a2bf 0x50bfb4 0x507d64 0x588d41 0x59fc4e 0x50d356 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x507d64 0x509a90 0x50a48d 0x50bfb4 0x509758.

Additional notes: the "increase RAM" prompt doesn't appear when Colab crashes, so I am essentially working with 12.72 GB of RAM.
Please help me.

@LysandreJik
Member

Hi! This is indeed a memory error. At which point does it crash?

@BramVanroy
Collaborator

A similar problem has been reported when using the lazy version of LineByLineTextDataset. Colab deals badly with situations where you are using 90+% of memory: it will kick you out or throw OOM errors that you would not get on a local machine. This is unfortunate and hard to work around.

In this case, I think you are simply running out of memory. The Newsroom dataset is huge (1M+ news articles), so that is likely the issue.
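
As a quick sanity check, a minimal sketch that measures the file size and line count before building the dataset (the path is the one from the issue):

import os

file_path = "/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt"
size_gb = os.path.getsize(file_path) / 1e9

# Count lines without loading the whole file into memory.
with open(file_path, encoding="utf-8") as f:
    num_lines = sum(1 for _ in f)

print(f"{size_gb:.2f} GB on disk, {num_lines} lines")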

@karndeb
Author

karndeb commented May 31, 2020

@LysandreJik It crashes when the RAM usage suddenly increases to around 7-8 GB, and the increase is very sudden: it stays at 2-3 GB for a minute or so, then shoots up to 8 GB and crashes.
@BramVanroy I tried reducing the dataset by half and running it, but I still get the same error. So would you suggest running it on a local machine? I would have to run this part locally, since my local machine has a little more RAM (16 GB), but I would then have to train on Colab anyway because my laptop has no GPU. Is there a better workaround?
Also, thanks for the quick answers.

@BramVanroy
Collaborator

BramVanroy commented May 31, 2020

The sudden increase in RAM may be due to a single very long sentence/text, which forces the whole batch to be padded to that length and sharply increases memory usage.

How large is the dataset in terms of GB/number of lines?

Unfortunately, sometimes you cannot do what you want due to practical restrictions (mostly money). If you want to train or fine-tune a model on a huge dataset, you will likely need more hardware than is available on free plans.

Perhaps you can try out https://github.com/huggingface/nlp and see whether it has the dataset you need. If not, you can open an issue there and ask whether the dataset can be included. That should solve some of these problems, since it is designed to keep datasets on disk rather than in RAM.
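
A minimal sketch of that approach, assuming the datasets library (the successor to huggingface/nlp) and the custom tokenizer from the issue; the "text" loader reads the file line by line and stores the result in memory-mapped Arrow files on disk rather than in RAM:

from datasets import load_dataset

# Each line of the text file becomes one example under the "text" key.
raw = load_dataset(
    "text",
    data_files={"train": "/content/drive/My Drive/Newsroom Dataset/newsroom-firsthalf.txt"},
)["train"]

# Tokenize in batches; truncation keeps every example at block_size (128) tokens.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])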

@ShoubhikBanerjee

@BramVanroy I have tried with 26 GB of RAM, but it still crashes. Is there any minimum hardware requirement mentioned?

@BramVanroy
Collaborator

No. I fear that this might simply not work on Colab. The line cache loads as much of the file as it can into memory and goes from there, but Colab is being annoying and locks you out because it thinks you are going to hit an OOM error (which you wouldn't on a regular system).
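
For reference, a minimal sketch of a lazier alternative, assuming a PyTorch Dataset and a transformers tokenizer (the class name LazyLineByLineDataset is hypothetical, not part of the library): it records the byte offset of each line once, then reads and tokenizes a single line per item, so neither the raw text nor the tokenized corpus has to sit in RAM all at once.

import torch
from torch.utils.data import Dataset


class LazyLineByLineDataset(Dataset):
    def __init__(self, tokenizer, file_path, block_size=128):
        self.tokenizer = tokenizer
        self.file_path = file_path
        self.block_size = block_size
        # Record the byte offset at which each line starts (one int per line).
        self.offsets = []
        with open(file_path, "rb") as f:
            offset = 0
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        # Seek to the stored offset and read just this one line.
        with open(self.file_path, "rb") as f:
            f.seek(self.offsets[idx])
            line = f.readline().decode("utf-8").strip()
        encoding = self.tokenizer(line, truncation=True, max_length=self.block_size)
        return torch.tensor(encoding["input_ids"], dtype=torch.long)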

@ShoubhikBanerjee

ShoubhikBanerjee commented Jun 19, 2020

@BramVanroy, I am using a notebook on Google AI Platform with 26 GB of RAM (without GPU), but after running 2% of the very first epoch, it says:

can't allocate memory: you tried to allocate 268435456 bytes. Error code 12 (Cannot allocate memory).

Am I doing something wrong?

@BramVanroy
Collaborator

Just read my previous post. This is a problem with how Google deals with increasing memory usage: it thinks an OOM will occur even though it won't. The problem is not with the implementation. It seems that you cannot use this functionality in these kinds of VMs.

@stale

stale bot commented Aug 22, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@moghadas76

Has anybody found a solution for this issue?
