Could not allocate pinned host memory of size: 2147483648 #2
I set up a notebook independently in Colaboratory and hit the same stack trace. I suspect the issue is that the Colaboratory VM has too little regular RAM (13 GB) and the VM is going OOM.
Like Max said, I think you're running out of RAM. For me, generation.py takes 36 GB of RAM even while idle.
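For what it's worth, the failed allocation in the title is 2147483648 bytes, i.e. a single 2 GiB pinned host buffer, which is exactly the kind of request that fails once system RAM is nearly exhausted. A quick stdlib-only sketch (Linux/macOS) to check the host's total RAM before launching, given the ~36 GB figure above:

```python
import os

def total_ram_gb():
    """Return total physical RAM in GB, via POSIX sysconf."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * pages / 1e9

# Colab's standard VMs report roughly 13 GB here, well short of ~36 GB.
print(f"Total RAM: {total_ram_gb():.1f} GB")
```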
I see you're using a T4 GPU. After some testing, it appears that while you can start generation.py on a T4, you cannot actually generate text without running out of memory, even with TF_FORCE_GPU_ALLOW_GROWTH=true. A V100 GPU with 16 GB works fine. Edit: A P100 (16 GB) also works, which makes it the cheapest GPU with enough memory. 🙂
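For anyone trying the allow-growth route anyway: the variable has to be in the environment before TensorFlow initializes CUDA, e.g. at the very top of the script, or it has no effect. A minimal sketch:

```python
import os

# Must be set before TensorFlow touches the GPU. By default TF reserves
# nearly all VRAM up front; with allow-growth it allocates incrementally.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# ...only then import tensorflow / run the generation.py logic.
```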
That's weird; a T4 and a V100 both have 16 GB of VRAM. AI is funny. Volta may have some memory efficiencies, but I thought Turing inherited those too.
@minimaxir It is strange. I'm not sure what makes the difference, but the amount of memory shown as available from a T4 (in TensorFlow logs or nvidia-smi) is less than with a V100. Right now nvidia-smi shows me:
Bryan McCann tweeted that the model needs 15458 MiB, so this seems to explain why the T4 is the only "16 GB" GPU that can't fit it. I also noticed this in one of my own projects: a batch size that worked on a V100 would be too much for a T4.
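If you want to check how much VRAM a given GPU actually exposes (and whether it clears the ~15458 MiB bar), something like this works; a sketch that assumes nvidia-smi is on the PATH and queries GPU 0:

```python
import subprocess

def gpu_memory_mib():
    """Return (total, free) MiB for GPU 0, or None if nvidia-smi is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total,memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    total, free = out.stdout.splitlines()[0].split(", ")
    return int(total), int(free)

print(gpu_memory_mib())
```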
There might be a way to hack together a version of the code with a (slightly) smaller memory requirement.
I'm doing the same thing on Colab too. I'd be grateful if you could fix it ^_^
@Disciple7 One option with Colab is to create a more capable machine from one of Google's Deep Learning VM images, then configure Colab to use it, similar to this blog post but with a P100 and at least about 45 GB of disk space (the model is big). For this you will need to request a quota increase from Google, from 0 to 1 for both global GPUs and P100 GPUs, in a region that has P100s such as us-central1-f (find other regions here). Don't forget to delete the machine afterwards, since it's fairly expensive.
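Roughly what that looks like with the gcloud CLI; a sketch only, where the instance name, zone, disk size, and image family are illustrative examples and should be checked against the current Deep Learning VM image list:

```shell
# Create a Deep Learning VM with one P100 and a 60 GB boot disk
# (the model alone needs ~45 GB of disk).
gcloud compute instances create ctrl-vm \
    --zone=us-central1-f \
    --image-family=tf-latest-gpu \
    --image-project=deeplearning-platform-release \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --boot-disk-size=60GB \
    --maintenance-policy=TERMINATE \
    --metadata=install-nvidia-driver=True
```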
@AdamDanielKing Thank you, I'll take a look. |
Given that this app only has a CLI at the moment, using a local runtime for Colab seems redundant; might as well run it directly on the VM by SSHing into the instance if we're going to have one up. The VMs can be built as preemptible: for the config described it'll be about $0.50/hr, which is reasonable. I also believe that new GCP projects come with some GPU quota by default now; I'll double-check. Additionally, the VMs must be launched with full GCP API access. I can write up a guide once I get things working.
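A sketch of the preemptible-plus-SSH route described here (instance name, zone, and image family are examples, not verified values):

```shell
# A preemptible P100 instance comes out around $0.50/hr for this config;
# then SSH straight into it instead of wiring up a Colab local runtime.
gcloud compute instances create ctrl-vm \
    --zone=us-central1-f \
    --image-family=tf-latest-gpu \
    --image-project=deeplearning-platform-release \
    --accelerator=type=nvidia-tesla-p100,count=1 \
    --boot-disk-size=60GB \
    --maintenance-policy=TERMINATE \
    --preemptible

gcloud compute ssh ctrl-vm --zone=us-central1-f
```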
So I have a couple questions:
Also, there should be a RAM requirement mentioned on the README or somewhere, unless I'm missing that. |
|
Yeah, I meant SSHing into the raw GCE instance. |
Even with a P100/V100 and generous system RAM, loading the model hits the VRAM ceiling and errors out.
|
I added a new branch which allows for inference on GPUs with lower available memory. I tested it on K80s on Colaboratory here: https://colab.research.google.com/drive/1hVveBQShDru1Mjnhe4C21uQv4A2eH1tV. Details on how to use it are at the top of the README (Update @ Sep 19, 2019 subsection). This is still in the testing phase, so expect a few bumps. Closing this for now; please reopen if there are issues.
On a similar topic, I've managed to start generating words on a V100, but the only output is the last word of the input prompt repeated over and over. Any advice on what's wrong? I'm using the 512 version.
Running
!python2 generation.py --model_dir "/content/ctrl/seqlen256_v1.ckpt"
in Colab outputs this: