int overflow when allocating large vector (new in vector-0.11.0.0) #98

Closed
rwbarton opened this issue Sep 18, 2015 · 9 comments

@rwbarton

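vector segfaults when allocating a mutable unboxed vector larger than 4 GB. Reported on StackOverflow: http://stackoverflow.com/questions/32645089/how-come-haskell-got-a-segmentation-fault-when-the-vector-is-very-large-but-unde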

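import Data.Int
import qualified Data.Vector.Unboxed.Mutable as UM

-- 10^9 Int32 elements at 4 bytes each, i.e. about 4 GB
n = 1000000000

main = do
    a <- UM.new n            -- with vector-0.11.0.0, the segfault happens here
    UM.read a 42 :: IO Int32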

The segfault is in UM.new.

Tested with GHC 7.8.4 and 7.10.1. vector-0.10.12.2, tested with the same GHC versions, did not segfault.
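The "larger than 4 GB" figure follows directly from the reproduction: n = 1000000000 elements of Int32 at 4 bytes each. A minimal sketch of that arithmetic, assuming a 64-bit GHC (so Int is 64 bits wide); int32Bytes is a hypothetical helper, not part of vector:

```haskell
import Data.Int (Int32)
import Foreign.Storable (sizeOf)

-- Hypothetical helper: bytes needed for n unboxed Int32 elements.
-- sizeOf never evaluates its argument, so 'undefined' is safe here.
int32Bytes :: Int -> Int
int32Bytes n = n * sizeOf (undefined :: Int32)

main :: IO ()
main = do
  let bytes = int32Bytes 1000000000
  print bytes                                      -- 4000000000
  print (bytes > fromIntegral (maxBound :: Int32)) -- True
```

The second line prints True: the byte count is well past 2,147,483,647, the largest value a signed 32-bit integer can hold.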

@cartazio
Contributor

Argh. I thought we did some work to fix that for 0.11.


@dolio
Contributor

dolio commented Sep 18, 2015

This is actually a bug in primitive: its memset operations overflow.

It's fixed in primitive's HEAD, but somehow I neglected to release it, even though I fixed this issue specifically in preparation for releasing vector 0.11.

I'll get a primitive release out, and we should raise the lower bound on primitive in vector 0.11.x to require it. Do people prefer a separate release for that, or editing the bounds on Hackage?

@rwbarton
Author

Oh, an existing bug in a function in primitive that is only used by the new version of vector? I was confused at first, since I have only one version of primitive installed under 7.8.4.

@dolio
Contributor

dolio commented Sep 18, 2015

Yes, exactly.

new in 0.11 zeroes out memory to prevent leaking uninitialized memory. But the C memset implementations in primitive take int arguments, so on a 64-bit system they overflow long before the allocation bounds checks (which were also added in 0.11) catch anything. The unreleased version is fixed to use appropriately sized types.
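To make the overflow concrete, here is a sketch that models the truncation in Haskell rather than reproducing primitive's actual C code. It assumes a 32-bit C int and assumes the zeroing routine receives the full byte count of the reproduction's vector (10^9 Int32 elements, so 4,000,000,000 bytes). The names asCInt and asSizeT are hypothetical:

```haskell
import Data.Word (Word64)
import Foreign.C.Types (CInt)

-- Hypothetical: the value a C parameter declared as 'int' ends up with
-- when given a 64-bit byte count. fromIntegral keeps only the low
-- 32 bits, reinterpreted as a signed number.
asCInt :: Int -> CInt
asCInt = fromIntegral

-- Hypothetical: that negative int after conversion to a 64-bit
-- unsigned size_t (the value is taken modulo 2^64).
asSizeT :: CInt -> Word64
asSizeT = fromIntegral

main :: IO ()
main = do
  let requested = 4000000000 :: Int
      truncated = asCInt requested
  print truncated            -- -294967296
  print (asSizeT truncated)  -- 18446744073414584320
```

Either way the length is wrong: a negative count, or one that reinterpreted as unsigned is vastly larger than the allocation, which is consistent with the segfault in UM.new.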

0.10 just lets you see uninitialized memory with new.

@dolio
Contributor

dolio commented Sep 20, 2015

I've released a new primitive that fixes this, version 0.6.1.0. There were no API changes, so a minor version increase was fine, and no vector changes are necessary.

dolio closed this as completed Sep 20, 2015
@treeowl
Contributor

treeowl commented Sep 22, 2015

Did you/could you set impossible bounds for the broken version?

@dolio
Contributor

dolio commented Sep 23, 2015

Which ones? The bug goes back at least to primitive 0.5, from 3 years ago. But there is already bad C code in 0.3, from 5 1/2 years ago: I'm unsure whether it actually causes problems, but its function signatures for the memcpy/memmove/etc. wrappers use int.

@cartazio
Copy link
Contributor

So essentially it's impossible to shield users from the bug unless we force everyone to upgrade to 0.6, which isn't an option.


@dolio
Contributor

dolio commented Sep 23, 2015

In some form, yes.

If they use primitive 0.5 - 0.6, memset is buggy, and with vector 0.11, new is buggy because it uses memset.

If they use 0.3 - 0.5 or so, there's probably some undefined behavior about what happens when you memcpy and stuff with large enough arrays, because it's passing signed 64-bit sizes into signed 32-bit C ints (provided you're on a 64-bit system). vector can't build with any of those versions, but someone could.
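The version ranges above can be summarized as a small lookup. This is only a sketch: the boundaries are approximate (the thread itself says "0.3 - 0.5 or so" and "at least to primitive 0.5"), the only firm data point is that 0.6.1.0 contains the fix, and the Exposure type and exposure function are hypothetical. It uses Data.Version from base (makeVersion needs base 4.8 or later):

```haskell
import Data.Version (Version, makeVersion)

-- Hypothetical classification of primitive versions, following the
-- approximate ranges described in this thread.
data Exposure
  = FixedRelease    -- 0.6.1.0 and later: appropriately sized C types
  | MemsetOverflow  -- roughly 0.5 up to 0.6.1.0: memset takes int sizes
  | IntSizedCopies  -- roughly 0.3 up to 0.5: memcpy/memmove wrappers take int
  | Unclassified    -- older than 0.3: not discussed here
  deriving (Eq, Show)

-- Version's Ord instance compares the numeric branches lexicographically,
-- so [0,6] < [0,6,1,0] and [0,5,4,0] >= [0,5].
exposure :: Version -> Exposure
exposure v
  | v >= makeVersion [0, 6, 1, 0] = FixedRelease
  | v >= makeVersion [0, 5]       = MemsetOverflow
  | v >= makeVersion [0, 3]       = IntSizedCopies
  | otherwise                     = Unclassified

main :: IO ()
main = mapM_ (print . exposure . makeVersion)
  [[0, 6, 1, 0], [0, 6], [0, 5, 4, 0], [0, 4], [0, 2]]
```

This prints FixedRelease, MemsetOverflow, MemsetOverflow, IntSizedCopies and Unclassified, one per line, which is why a bound like primitive >= 0.6.1.0 is the only one that fully excludes the memset overflow.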
