
Quit using ForeignPtr in favor of ByteArray# #193

Closed
andrewthad opened this issue Nov 26, 2019 · 25 comments

Comments

@andrewthad
Contributor

Three years ago, @dcoutts wrote:

Long term we'd like to eliminate the use of ForeignPtr and use only pinned or unpinned ByteArray#s

Is this still planned? Is there still interest in doing this?

@hvr
Member

hvr commented Dec 19, 2019

@andrewthad Well, if we can figure out a good way to do that, it's still on the table; but in the meantime there's also the direction explored in #175 to slim down the type.

Eliminating the use of ForeignPtr would, however, come at a cost in terms of regressions, no? ByteStrings are used a lot for IO & FFI purposes, and you'd lose the ability to seamlessly refer to foreign memory (including mmap(2)ed memory and other shared buffers) with a uniform ByteString type. Would we need to introduce a separate type to cover those lost use-cases?

@andrewthad
Contributor Author

Here is what we would lose, along with my gut feeling about how much of a problem these are:

  • Functionality: ByteString could not be used with mmap. I'm skeptical that it is ever a good idea to use ByteString with mmap. You'll either end up needing to close() a file descriptor in a finalizer, which is a great way to leak file descriptors, or you just need to open the mmap at the beginning with no plans to ever call close(). I think of this as an uncommon use case although someone may be doing it.
  • Performance: If you are dealing with a foreign function that allocates memory, you would have to copyAddrToByteArray# this into a MutableByteArray#, freeze the byte array in place, and then free() the original memory. Most C libraries, rather than mallocing memory and returning it, have interfaces that allow the user to pass in a buffer. In these cases, performance would not suffer.
  • Functionality: If the C library that handed you the memory expects you to pass it back sometime later at the same address (meaning that you could not free() it), the ByteArray#-backed variant would not work. I don't know if this ever happens. I've not integrated with a C library that requires this, but maybe there's one out there.

Here are the benefits:

  • Performance: I suspect that the overwhelming majority of ByteStrings out there use PlainPtr and do not need to be pinned. These should perform better. I have no idea what kind of improvement should be expected.
  • Space: Even with Remove the offset parameter completely #175, a PlainPtr-backed ByteString has an overhead of 8 machine words when it is on the heap (ByteString data constructor header + length field + addr field + ptr to ForeignPtrContents + PlainPtr data constructor header + ptr to ByteArray# + ByteArray# header + length). The ByteArray# variant would only have an overhead of 6 machine words, and it would result in only 2 objects on the heap instead of 3 (a rough sketch of both layouts follows this list).
  • Simplicity: bytestring uses IO in a number of confusing ways. Pervasive use of ByteArray# would make it possible to use ST and pure functions (like indexWord8Array#) to model effectfulness with types. Also, all of the touch-related code would go away.
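
For illustration only (the type and field names below are hypothetical, not the real definitions verbatim), here is roughly what the two layouts compared above look like:

{-# LANGUAGE MagicHash #-}
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr)
import GHC.Exts (ByteArray#)

-- Today's layout (bytestring-0.10 era): the string points at a ForeignPtr,
-- whose PlainPtr contents in turn points at the pinned byte array.
data PtrBacked = PS {-# UNPACK #-} !(ForeignPtr Word8)  -- Addr# + pointer to ForeignPtrContents
                    {-# UNPACK #-} !Int                 -- offset (the field #175 proposes to drop)
                    {-# UNPACK #-} !Int                 -- length

-- The hypothetical ByteArray#-backed layout: the string points straight at
-- the (possibly unpinned) byte array, so one whole indirection disappears.
data ArrayBacked = BA ByteArray#             -- header + length + payload
                      {-# UNPACK #-} !Int    -- offset
                      {-# UNPACK #-} !Int    -- length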

In my mind, the obstacle is that it's hard to measure the performance impact. In an ideal world, I'd like to try out this change and then build something huge like GHC (or anything else with good macrobenchmarks) and see if it gets noticeably faster. However, this isn't possible because bytestring forces users to use the internal API to do a ton of common stuff. For example, it's not currently possible to implement fromByteArray# :: ByteArray# -> ByteString without importing an internal module.
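
For instance, a conversion along these lines can only be written today by reaching into Data.ByteString.Internal and GHC.ForeignPtr. A rough sketch, only sound for a pinned ByteArray# that is never mutated afterwards; the function name is hypothetical:

{-# LANGUAGE MagicHash #-}
import Data.ByteString.Internal (ByteString, fromForeignPtr)
import GHC.Exts (ByteArray#, Int (I#), byteArrayContents#, sizeofByteArray#, unsafeCoerce#)
import GHC.ForeignPtr (ForeignPtr (ForeignPtr), ForeignPtrContents (PlainPtr))

-- Wrap a pinned ByteArray# as a ByteString without copying. The PlainPtr
-- constructor keeps the array alive; byteArrayContents# gives its address.
fromPinnedByteArray# :: ByteArray# -> ByteString
fromPinnedByteArray# ba# =
  fromForeignPtr
    (ForeignPtr (byteArrayContents# ba#) (PlainPtr (unsafeCoerce# ba#)))
    0                           -- offset
    (I# (sizeofByteArray# ba#)) -- length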

What I would like to do is add functions to bytestring that capture the common patterns that show up when people import the internal modules. After this lands in a release, I could chip away at the uses of bytestring's internal API in several of the standard libraries. (The nice thing about this is that it's probably a good idea to do this anyway.) Then it would actually be possible to swap out the implementation in a branch and measure the impact.

@hvr
Member

hvr commented Jan 12, 2020

If the C library that handed you the memory expects you to pass it back sometime later at the same address (meaning that you could not free() it), the ByteArray#-backed variant would not work. I don't know if this ever happens. I've not integrated with a C library that requires this, but maybe there's one out there.

In general, you have to assume that if a C library allocates and gives you a memory chunk while retaining ownership, you ought not to shuffle it around in memory unless the C API tells you explicitly that you're allowed to. I can't name a concrete C library that does this off the top of my head, but I recall having encountered a couple of such libraries in my past C++ life (both OSS and proprietary) where the library would silently screw up if you handed it back a cloned memory buffer rather than the original, because it actually used the pointer address as a key into a hashmap, or did its own memory allocation management and would get utterly confused if it got passed a memory buffer it didn't recognize.

Interfacing with C APIs is definitely one of bytestring's primary use-cases/goals. But maybe one size doesn't fit all, and maybe the solution is really to split ByteString into two different types and un-conflate its two current goals: efficient representation of strict and lazy Word8 strings (plus the associated operations) versus FFI-optimized, memory-pinned ByteStrings with support for ForeignPtr-backed payloads.

Data.ByteString.Short already moves in this direction (and I basically wrap ShortByteString in text-short), so maybe we should just expand on it: e.g. add more operations (some of which I can donate from text-short) as well as a "lazy" ShortByteString?

What I would like to do is add functions to bytestring that capture the common patterns that show up when people import the internal modules.

Could you identify/enumerate those patterns? Reducing the need to reach for .Internals (assuming those patterns don't require breaking invariants) sounds like a sensible goal in its own right.

@vdukhovni
Contributor

  • Functionality: ByteString could not be used with mmap. I'm skeptical that it is ever a good idea to use ByteString with mmap. You'll either end up needing to close() a file descriptor in a finalizer, which is a great way to leak file descriptors, or you just need to open the mmap at the beginning with no plans to ever call close(). I think of this as an uncommon use case although someone may be doing it.

Just a comment that mmap()ed files are not automatically unmapped on close, so there's no need to close in a finalizer; the file can be closed right away. To actually remove the mapping one needs to call munmap(). This is not a comment in support of or against the proposal to use ByteArray#, just a clarification re mmap().

@lyokha

lyokha commented Aug 24, 2020

If the C library that handed you the memory expects you to pass it back sometime later at the same address (meaning that you could not free() it), the ByteArray#-backed variant would not work. I don't know if this ever happens. I've not integrated with a C library that requires this, but maybe there's one out there.

In general, you have to assume that if a C library allocates and gives you a memory chunk while retaining ownership, you ought not to shuffle it around in memory unless the C API tells you explicitly that you're allowed to. I can't name a concrete C library that does this off the top of my head, but I recall having encountered a couple of such libraries in my past C++ life (both OSS and proprietary) where the library would silently screw up if you handed it back a cloned memory buffer rather than the original, because it actually used the pointer address as a key into a hashmap, or did its own memory allocation management and would get utterly confused if it got passed a memory buffer it didn't recognize.

Pinning memory in ByteString is crucial in nginx-haskell-module for interoperability between the core Nginx C code and Haskell handlers which produce ByteStrings to be consumed directly in the core. When a Haskell handler finishes its task, it notifies the core via an event channel (which can be an eventfd or a pipe). The core takes ownership of the ByteString's internals via a StablePtr to the ByteString that was returned to it. The StablePtr guarantees that the ByteString can be reconstructed when accessed in the C core. But, if I understand this correctly, with only a StablePtr as a container there is no guarantee that the ByteString buffers will stay at the same addresses while the StablePtr is alive, so pinning the buffers matters: the C core reads the value as C string(s) straight from the address of the ByteString's buffer(s). So I am afraid that removing ByteString memory pinning would make this simple sharing mechanism unworkable.
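
A minimal sketch (not the actual nginx-haskell-module code) of the pattern described above; the exportToC helper is hypothetical, and the real module wires the result through an eventfd/pipe:

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.String (CString)
import Foreign.C.Types (CSize)
import Foreign.StablePtr (StablePtr, newStablePtr)

-- Hand the C core a StablePtr that keeps the ByteString alive, plus the
-- address and length of its buffer. The returned pointer is only meaningful
-- because the buffer is pinned; with movable (unpinned) storage the address
-- could be stale by the time the C core reads it.
exportToC :: B.ByteString -> IO (StablePtr B.ByteString, CString, CSize)
exportToC bs = do
  sp <- newStablePtr bs    -- the C core frees this when it is done with the value
  unsafeUseAsCStringLen bs $ \(ptr, len) ->
    pure (sp, ptr, fromIntegral len)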

@chessai
Member

chessai commented Aug 24, 2020 via email

@Bodigrim
Contributor

Bodigrim commented Oct 8, 2020

What I would like to do is add functions to bytestring that capture the common patterns that show up when people import the internal modules.

@andrewthad This would be highly appreciated. Cf. #253.

@vdukhovni
Contributor

I'd like to ask what the objective is here. Is it making the storage visible to the GC, so that the total memory in use is not underestimated and the GC is better able to infer when collection is needed? Or is it to try to reduce fragmentation by using unpinned storage?

ByteStrings could certainly be backed by pinned MutableByteArray#, addressing the first concern, but if the goal is indeed to "unpin" the storage, then that rather changes the compatibility story, since they are used in a broad range of FFI use-cases, where it is crucial to know that GC will not move the bytestring content.

@vdukhovni
Contributor

Here is what we would lose, along with my gut feeling about how much of a problem these are:

  • Functionality: ByteString could not be used with mmap. I'm skeptical that it is ever a good idea to use ByteString with mmap. You'll either end up needing to close() a file descriptor in a finalizer, which is a great way to leak file descriptors, or you just need to open the mmap at the beginning with no plans to ever call close(). I think of this as an uncommon use case although someone may be doing it.

This is a misunderstanding. The mmap(2) system call DOES NOT require the file to stay open. After you mmap a region of memory, you can close the file descriptor immediately; the mapping remains valid. To release the mapping you use munmap(2). Of course one still needs a finaliser to do the munmap.

  • Performance: If you are dealing with a foreign function that allocates memory, you would have to copyAddrToByteArray# this into a MutableByteArray#, freeze the byte array in place, and then free() the original memory. Most C libraries, rather than mallocing memory and returning it, have interfaces that allow the user to pass in a buffer. In these cases, performance would not suffer.

Of course only if the underlying MutableByteArray# is not pinned in the first place. If the idea is to just expose the memory usage to the GC, but not unpin the memory, it should be easy to just change the ForeignPtr associated with ByteStrings to capture and keep alive a pinned MutableByteArray# RealWorld, and set the associated Ptr to the address returned by byteArrayContents#, and pretty much have the same behaviour as before.
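
For what it's worth, this is essentially how heap-allocated ByteStrings already work: Data.ByteString.Internal allocates via GHC.ForeignPtr.mallocPlainForeignPtrBytes, which wraps a pinned MutableByteArray# in a PlainPtr and exposes its byteArrayContents# address. A minimal usage sketch (the zeroed helper is hypothetical):

import Data.ByteString (ByteString)
import Data.ByteString.Internal (create)
import Foreign.Marshal.Utils (fillBytes)

-- Allocate an n-byte ByteString, filled with zeros, entirely on the GHC heap;
-- the backing storage is a pinned MutableByteArray# kept alive by a ForeignPtr.
zeroed :: Int -> IO ByteString
zeroed n = create n (\p -> fillBytes p 0 n)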

  • Functionality: If the C library that handed you the memory expects you to pass it back sometime later at the same address (meaning that you could not free() it), the ByteArray#-backed variant would not work. I don't know if this ever happens. I've not integrated with a C library that requires this, but maybe there's one out there.

I don't think that's an obstacle; ByteString creation primitives could be provided that wrap around an Addr# with either no finalisation, or whatever finalisation is appropriate for the returned buffer.
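
For illustration, primitives of roughly this shape already exist in Data.ByteString.Unsafe; the wrapper names below are hypothetical, but unsafePackCStringLen and unsafePackCStringFinalizer are real:

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen, unsafePackCStringFinalizer)
import Data.Word (Word8)
import Foreign.C.String (CString)
import Foreign.Marshal.Alloc (free)
import Foreign.Ptr (Ptr)

-- Wrap a buffer the C library still owns: no finalisation at all.
wrapBorrowed :: CString -> Int -> IO B.ByteString
wrapBorrowed p len = unsafePackCStringLen (p, len)

-- Wrap a malloc'd buffer we now own: free() it once the ByteString is collected.
wrapOwned :: Ptr Word8 -> Int -> IO B.ByteString
wrapOwned p len = unsafePackCStringFinalizer p len (free p)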

Here are the benefits:

  • Performance: I suspect that the overwhelming majority of ByteStrings out there use PlainPtr and do not need to be pinned. These should perform better. I have no idea what kind of improvement should be expected.

If the bytestring only lives a short time, and does not contribute to "fragmentation" I would not expect a performance advantage from unpinned memory.

  • Simplicity: bytestring uses IO in a number of confusing ways. Pervasive use of ByteArray# would make it possible to use ST and pure functions (like indexWord8Array#) to model effectfulness with types. Also, all of the touch-related code would go away.

The touch code needs to remain whenever the content is passed to FFI functions, ensuring that the storage is kept alive for the duration of the FFI call.

What I would like to do is add functions to bytestring that capture the common patterns that show up when people import the internal modules. After this lands in a release, I could chip away at the uses of bytestring's internal API in several of the standard libraries. (The nice thing about this is that it's probably a good idea to do this anyway.) Then it would actually be possible to swap out the implementation in a branch and measure the impact.

I've been tinkering with streaming-bytestring lately; it uses the internal representation in multiple places, in that PS fp off len is expected to yield access to a foreign pointer and length that can be used to directly access the storage via peek, peekElemOff, ... Basically, bytestring needs to continue to provide a Foreign.Storable + ForeignPtr interface.

@chessai
Member

chessai commented Dec 14, 2020

What I would like to do is add functions to bytestring that capture the common patterns that show up when people import the internal modules. After this lands in a release, I could chip away at the uses of bytestring's internal API in several of the standard libraries. (The nice thing about this is that it's probably a good idea to do this anyway.) Then it would actually be possible to swap out the implementation in a branch and measure the impact.

I've been tinkering with streaming-bytestring lately; it uses the internal representation in multiple places, in that PS fp off len is expected to yield access to a foreign pointer and length that can be used to directly access the storage via peek, peekElemOff, ... Basically, bytestring needs to continue to provide a Foreign.Storable + ForeignPtr interface.

Note that this can be achieved with:

data ByteString = ByteString ByteArray Int Int

@mpickering

In my opinion, the primary advantage of performing this change is avoiding fragmentation. It is very easy to end up with a severely fragmented heap by accident if you use ByteStrings, or use libraries which do (e.g. http-client).

The cost falls on people who actually rely on the pinned nature of ByteStrings; this can be mitigated by providing a Data.ByteString.Pinned module which uses pinned memory.

@vdukhovni
Contributor

The Builder code fundamentally relies on FFI calls to generate the various serialisations it supports, and much of the networking code, Data.Binary, ... rely on ByteStrings not getting moved around during FFI calls. I'd say most uses of ByteString are not just because users want 8-bit strings, but rather because they do I/O with ByteStrings. Is memory fragmentation really that important? For ByteStrings one intends to keep around for a long time (as Map keys, ...), ShortByteString already provides a suitable API. The ByteStrings used as I/O buffers don't tend to be long-lived (at least when used correctly), so I am not sure that memory fragmentation is a substantial issue.

What happens if pinning is removed? What can we still do in terms of FFI?

@mpickering

Fragmentation is a serious issue. One small ByteString can easily retain a whole megablock (1 MB); if you are allocating a lot of small ByteStrings you get into this situation quite quickly.

Perhaps a better approach is to use a mesh allocation strategy for pinned objects in GHC. That would solve the fragmentation issue - https://raw.githubusercontent.com/plasma-umass/Mesh/master/mesh-pldi19-powers.pdf

@andrewthad
Contributor Author

@vdukhovni Only safe FFI calls need pinned ByteStrings. GHC explicitly guarantees that no objects of any type can be moved during an unsafe FFI call. So we could still do a lot in terms of the FFI. On UNIX and UNIX-like systems, network does not need the objects to be pinned since it only makes nonblocking calls. I have no idea what it does on Windows, though.
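
For illustration, a minimal sketch of the unsafe-call case; this style of import already appears in bytestring's ShortByteString code, though the exact declaration below is just an example:

{-# LANGUAGE MagicHash, UnliftedFFITypes #-}
-- With UnliftedFFITypes, GHC lets a ByteArray# be passed straight to an
-- unsafe foreign import: the garbage collector cannot run during the call,
-- so even an unpinned array cannot move while C reads it.
import Foreign.C.Types (CInt (..))
import GHC.Exts (ByteArray#)

foreign import ccall unsafe "string.h memcmp"
  c_memcmp :: ByteArray# -> ByteArray# -> Int -> IO CInt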

@mpickering MESH is extremely cool. The paper blew my mind. Realistically, I suspect it might be the only option for GHC at this point. One wrinkle is that the implementation that the paper describes wants objects of the same size class to be located together in the same block. So a block would only have objects of size 32 bytes, 64 bytes, etc. GHC doesn't do this. So I'm not sure if MESH could be incorporated into the runtime.

@Bodigrim
Contributor

Independently of its merits or their absence, this ticket is blocked by the existing ecosystem, which routinely, without a second thought, reaches for Data.ByteString.Internal and the ForeignPtr guts. If someone is serious about moving this forward, I'd be happy to discuss in #253 what could be done to alleviate this situation.

@emilypi
Member

emilypi commented May 14, 2021

I have an extremely good appetite for this breakage.

@mpickering

An issue was reported on the GHC issue tracker which demonstrated very bad memory fragmentation due to the large number of small ByteStrings that were allocated.

https://gitlab.haskell.org/ghc/ghc/-/issues/20065#note_366130

@Bodigrim
Contributor

I'm tempted to close this issue. There is no migration path to unpin ByteString, so I don't see it happening. Users concerned about fragmentation have the option to use ShortByteString or ByteArray.
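
A minimal sketch of that suggestion; the retain/release names are hypothetical, while toShort and fromShort are the real conversion functions:

import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS

-- Copy into unpinned memory before storing the value long-term.
retain :: B.ByteString -> SBS.ShortByteString
retain = SBS.toShort

-- Copy back into a pinned ByteString when handing it to IO/FFI code.
release :: SBS.ShortByteString -> B.ByteString
release = SBS.fromShort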

@Bodigrim
Contributor

@sjakobi how do you feel about closing this issue?

@sjakobi
Member

sjakobi commented Mar 20, 2022

There is no migration path to unpin ByteString, so I don't see it happening. Users concerned about fragmentation have the option to use ShortByteString or ByteArray.

I agree with this, although I suspect that migrating from ByteString to ByteArray#-based types in user code may not be viable in many cases.

I hope that GHC can quickly start using MESH (https://gitlab.haskell.org/ghc/ghc/-/issues/19175) to reduce memory fragmentation.

@ulidtko
Contributor

ulidtko commented May 17, 2023

Such an embarrassment for the Haskell ecosystem that this is "closed". 😕

That ByteString is backed by pinned memory and that naive usage may cause heap fragmentation is quite the gotcha: it would surprise any developer not intimately familiar with GHC RTS innards who is just looking for a type to hold some bytes. All too often, the bytes are just pre-encoded text, with no particular FFI or IO requirements, except perhaps for easier interaction with other library APIs.

It's good that the caveat is documented:

-- ** Heap fragmentation
-- | With GHC, the 'B.ByteString' representation uses /pinned/ memory,
-- meaning it cannot be moved by the GC. This is usually the right thing to
-- do for larger strings, but for small strings using pinned memory can
-- lead to heap fragmentation which wastes space. The 'ShortByteString'
-- type (and the @Text@ type from the @text@ package) use /unpinned/ memory
-- so they do not contribute to heap fragmentation. In addition, with GHC,
-- small unpinned strings are allocated in the same way as normal heap
-- allocations, rather than in a separate pinned area.

— but IMO, it's not enough to have this on ShortByteString only; ByteString should mention it too.

I'll submit a PR if such a doc-patch is welcome. It's the least we can do.

@Bodigrim
Contributor

Documentation improvements are welcome.

Such an embarrassment for the Haskell ecosystem that this is "closed". 😕

If you can offer a specific migration plan, I'm all ears. If not, please choose your words more carefully. It's unfair to transfer the responsibility for using the wrong data type from a developer to the bytestring authors/maintainers. Sure, by all means, do not abuse ByteString and go for ShortByteString or ByteArray instead.

@ulidtko
Contributor

ulidtko commented May 17, 2023

@Bodigrim I didn't try to do that... Apologies for the tone — it only reflects the deep sadness of the situation for those who find this ticket. It carries no intent to blame, as indeed, I see no viable migration plan either.

Observe that it's difficult to know upfront what constitutes abuse of ByteString — unless one has already found the fragmentation note on ShortByteString. That's why I'm offering a doc patch. There surely can't be any compatibility concerns about that, right?

@Bodigrim
Contributor

As I said, documentation improvements are welcome.

@hasufell
Member

ShortByteString has been on par with ByteString in terms of API since 0.11.3.0. So there's no reason not to use it extensively.
