Invalid free in binaries on error message #5400

Closed
jatentaki opened this Issue Feb 24, 2018 · 9 comments

8 participants
jatentaki commented Feb 24, 2018

  • OS: Xubuntu 16.04 4.13.0-32-generic
  • PyTorch version: 0.3.0.post4
  • How you installed PyTorch (conda, pip, source): pip
  • Python version: Python 3.5.2
  • CUDA/cuDNN version: running on CPU

To reproduce, clone this repo and execute python3 run_cnn.py. On my machine it results in the following stack trace.

I believe the script is self-explanatory: I was trying to maxpool a 3D convolved tensor :)

@soumith soumith added the bug label Feb 26, 2018

Contributor

zou3519 commented Feb 26, 2018

@li-roy, can you take a look at this?

Contributor

li-roy commented Feb 28, 2018

@jatentaki I couldn't reproduce this. I get an invalid argument error. Were you using different numbers initially?

RuntimeError: invalid argument 2: pad should be smaller than half of kernel size, but got kT: 1 kW: 3, kH: 3, padT: 1, padW: 0, padH: 0 at /data/users/royboy/pytorch/torch/lib/THNN/generic/VolumetricDilatedMaxPooling.c:52
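
A minimal sketch of the kind of check this message describes, assuming the rule is "each pad must be at most half of its kernel size, using integer division". The function name `padding_is_valid` is hypothetical and not part of PyTorch; the values are taken from the error message above.

```python
def padding_is_valid(kernel, pad):
    """Return True when every pad is at most half of its kernel size.

    Assumption: the comparison uses integer division, so a kernel of 1
    allows no padding at all and a kernel of 3 allows a pad of 1.
    """
    return all(p <= k // 2 for k, p in zip(kernel, pad))


# Values from the error message, in (T, W, H) order.
kernel = (1, 3, 3)
pad = (1, 0, 0)

print(padding_is_valid(kernel, pad))       # False: padT=1 exceeds kT // 2 == 0
print(padding_is_valid((3, 3, 3), pad))    # True: a kernel of 3 allows pad 1
```

Under that assumption, the call fails only because of the time dimension: a kernel of size 1 cannot take any padding.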

Author

jatentaki commented Feb 28, 2018

No, re-cloning the repo reproduces the bug for me. Also, I've just upgraded from 0.3.0.post4 to 0.3.1 and it still reproduces. Aren't you using a source build of master?

Also, if this runtime check is the go-to solution, it leads me to ask: if I have the RGB axis of an image, and I want to maxpool like

input = [R, G, B]
output = [max(0, R, G), max(R, G, B), max(G, B, 0)]

how should I go about it with this argument check? That was actually my intent with this code (along with some maxing along the W/H dimensions as well).

EDIT: Never mind the above, I confused it with a different layer. This snippet is definitely a bug on my side and should indeed be caught with an invalid argument exception.

EDIT2: I am running the no-cuda wheel. Perhaps that makes the difference?
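
For reference, the zero-padded window max written out earlier in this comment (the `input = [R, G, B]` example) can be sketched in plain Python. This is only an illustration of that arithmetic, not of how any PyTorch layer pads; the helper name `zero_padded_max` is hypothetical and an odd window size is assumed.

```python
def zero_padded_max(values, kernel=3):
    """Sliding-window max over `values`, padding both ends with zeros.

    Assumes an odd `kernel`, so the output has the same length as the input.
    """
    pad = kernel // 2
    padded = [0] * pad + list(values) + [0] * pad
    return [max(padded[i:i + kernel]) for i in range(len(values))]


# Negative channel values make the effect of the zero padding visible.
r, g, b = -1, -3, -2
print(zero_padded_max([r, g, b]))  # [0, -1, 0]
```

The first and last outputs are 0 because the padding value takes part in the max, exactly as in `max(0, R, G)` and `max(G, B, 0)`.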

Member

soumith commented Mar 1, 2018

So, the invalid free is a bit of a side-effect of the wheel binaries that we have.
Your original issue is what @li-roy reported, i.e.

RuntimeError: invalid argument 2: pad should be smaller than half of kernel size, but got kT: 1 kW: 3, kH: 3, padT: 1, padW: 0, padH: 0 at /data/users/royboy/pytorch/torch/lib/THNN/generic/VolumetricDilatedMaxPooling.c:52

He probably couldn't reproduce it because he was using conda binaries or a source install.

The rabbit hole is deep, but this happens roughly because of the following:

  • we statically link libstdc++ into the binaries (because old OSes usually don't ship a sufficiently recent libstdc++)
  • additionally, because of a bug in the libstdc++ shipped with RHEL6, we also statically link weak symbols (that includes destructors)
  • the consequence of statically linking the libstdc++ destructors is a bit iffy. For certain internal data structures like std::string, each instance of the libstdc++ symbols (my understanding / speculation) does its own bookkeeping for construction/destruction, i.e. each string is sort of interned, internal to the library.

So what is actually happening is:

  • a std::string was passed from the THNN extension side over to _C land
  • _C, after it has finished using the string, calls the destructor on it
  • but because _C didn't actually allocate the string, its internal table doesn't have a reference to the pointer, so it sees it as an invalid free
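
The three steps above can be sketched as a toy model in Python. This is only an illustration of the speculated mechanism, not how libstdc++ or glibc actually track allocations; the `ToyRuntime` class and its handles are hypothetical stand-ins for two statically linked copies of the runtime.

```python
class ToyRuntime:
    """Hypothetical stand-in for one statically linked copy of the runtime.

    Each copy keeps its own private table of live allocations.
    """

    def __init__(self, name):
        self.name = name
        self._live = set()
        self._next_id = 0

    def allocate(self):
        # Record the new handle in this copy's own table only.
        self._next_id += 1
        handle = (self.name, self._next_id)
        self._live.add(handle)
        return handle

    def free(self, handle):
        # A handle this copy never recorded looks like an invalid pointer.
        if handle not in self._live:
            raise RuntimeError(f"{self.name}: free(): invalid pointer {handle}")
        self._live.remove(handle)


thnn = ToyRuntime("THNN")
core = ToyRuntime("_C")

message = thnn.allocate()      # the string is created on the THNN side
try:
    core.free(message)         # _C tries to destroy a string it never allocated
except RuntimeError as err:
    print(err)

thnn.free(message)             # freeing through the owning copy succeeds
```

In this model the error comes from ownership alone: the same handle is freed without complaint by the copy that created it.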

In the next binary release, I'll figure out a solution for this.

Thanks a lot for the issue and the repro.

Coolnesss commented Apr 18, 2018

Hey, I got the same error. Not sure if it's useful, but here's a simpler reproduction:

import torch
from torch import nn
from torch.autograd import Variable

m = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=10, stride=1),
    nn.MaxPool1d(2496)
)

a = torch.randn(256, 1, 2500)
m(Variable(a))

I also get

*** Error in `python3': free(): invalid pointer: 0x00007fa93dae18e0 ***

etc.

Can confirm this works on the latest master '0.4.0a0+533beab', but not on stable 0.3.1 (wheel)
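
A quick check of the shapes in the snippet above, using the standard output-length formula for 1D convolution and pooling, floor((L + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. The helper name `output_length` is hypothetical; the pooling stride is taken to be the kernel size, which is the default for `nn.MaxPool1d`.

```python
def output_length(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution or pooling window over `length` samples."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


conv_out = output_length(2500, kernel=10, stride=1)
pool_out = output_length(conv_out, kernel=2496, stride=2496)

print(conv_out)  # 2491
print(pool_out)  # 0
```

The convolution leaves 2491 samples, which is shorter than the 2496-wide pooling window, so the pooled output would be empty. The snippet is therefore itself invalid input, which is consistent with the crash appearing on an error-reporting path, although the thread does not show which message was being raised.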

Member

soumith commented Apr 26, 2018

fixed in 0.4

@soumith soumith closed this Apr 26, 2018

Ploppz commented Oct 14, 2018

In version 0.4.1.post2 installed via pip3, I still have this problem. I get free(): invalid pointer when I run the third code block here https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb

I am on Arch Linux by the way.

SPark9625 commented Dec 13, 2018

I built the v1.0.0 branch from source and hit this issue as well. I found a solution here.

User iammohitchhabra wrote that

sudo apt-get install libtcmalloc-minimal4
export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4"

fixes the error, and it indeed works on Ubuntu 16.04.
But does anyone know why?

jwakely commented Feb 5, 2019

Are these crashes in std::string destructors for empty (i.e. zero-length) strings? It's possible it's related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86138 which is fixed in recent GCC versions.
