Quit using ForeignPtr in favor of ByteArray# #193
Comments
@andrewthad well, if we can figure out a good way to do that it's still on the table; but in the meantime there's also the direction explored in #175 to slim down the type. Eliminating the use of ForeignPtr would, however, come at a cost in terms of regressions, no? ByteStrings are used a lot for IO & FFI purposes and you'd lose the ability to seamlessly refer to foreign memory (including |
Here is what we would lose, along with my gut feeling about how much of a problem these are:
Here are the benefits:
In my mind, the obstacle is that it's hard to measure the performance impact. In an ideal world, I'd like to try out this change and then build something huge like GHC (or anything else with good macrobenchmarks) and see if it gets noticeably faster. However, this isn't possible because What I would like to do is add functions to |
In general, you have to assume that if the C library allocates and gives you a memory chunk and retains ownership, you ought not to shuffle it around in memory unless the C API tells you explicitly that you're allowed to. I can't name you a concrete C library that does this ottomh, but I recall having encountered a couple of such libraries in my past C++ life (both OSS and proprietary) where the C library would silently screw up if you handed it back a cloned memory buffer rather than the original because it actually used the pointer-addr as key into a hashmap, or did its own memory allocation management and would get utterly confused if it got passed a memory buffer back that it didn't recognize. Interfacing with C APIs is definitely one of
Could you identify/enumerate those patterns? Reducing the need to reach for |
Just a comment that |
Pinning memory in ByteString is crucial in nginx-haskell-module for interoperability between the core Nginx C code and Haskell handlers which produce ByteStrings to be directly consumed in the core. When a Haskell handler finishes its task, it notifies the core via an event channel (which can be an eventfd or a pipe). The core part takes ownership of the ByteString's internals via a StablePtr to the ByteString which has been returned to the core. StablePtr guarantees that the ByteString can be reconstructed when accessed in the C core. But, if I understand this correctly, with only StablePtr as a container, there is no guarantee that the ByteString buffers will be located in the same addresses while the StablePtr is alive, and so, pinning the buffers is important in the C core as the C value is taken as C string(s) from the address of the ByteString's buffer(s). So, I am afraid that removing ByteString memory pinning would possibly make this simple sharing mechanism unreachable. |
It would be fairly easy to provide ways to create pinned ByteStrings using
this proposed representation, so that wouldn't really become out of reach.
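As a rough illustration of that point (a sketch only, not bytestring's API): GHC's primitives already let a `ByteArray#`-backed type choose between pinned and unpinned allocation. The `Bytes` type and the `newBytes` and `isPinned` helpers below are hypothetical names invented for this example; only the `GHC.Exts` primops are real.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Main (main) where

import GHC.Exts
import GHC.IO (IO (..))

-- Hypothetical ByteArray#-backed byte container (not bytestring's real type).
data Bytes = Bytes ByteArray#

-- Allocate n bytes, pinned or unpinned on request. Contents are left
-- uninitialised; this sketch only inspects the pinnedness of the buffer.
newBytes :: Bool -> Int -> IO Bytes
newBytes pinned (I# n) = IO $ \s0 ->
  let alloc = if pinned then newPinnedByteArray# n else newByteArray# n
  in case alloc s0 of
       (# s1, mba #) -> case unsafeFreezeByteArray# mba s1 of
         (# s2, ba #) -> (# s2, Bytes ba #)

-- Ask the RTS whether the buffer is pinned (available since GHC 8.2).
isPinned :: Bytes -> Bool
isPinned (Bytes ba) = isTrue# (isByteArrayPinned# ba)

main :: IO ()
main = do
  p <- newBytes True 16
  u <- newBytes False 16
  print (isPinned p, isPinned u)
```

Note that GHC pins large byte arrays implicitly, so the distinction only matters for small buffers such as the 16-byte ones used here.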
On Mon, Aug 24, 2020, 07:01 Alexey Radkov wrote:
If the C library that handed you the memory expects you to pass it back
sometime later at the same address (meaning that you could not free() it),
the ByteArray#-backed variant would not work. I don't know if this ever
happens. I've not integrated with a C library that requires this, but maybe
there's one out there.
|
@andrewthad This would be highly appreciated. Cf. #253. |
I'd like to ask what the objective is here. Is it making the storage visible to GC, so that the total memory in use is not under-estimated, and GC is better able to infer when collection is needed, ... Or is it to try to reduce fragmentation by using unpinned storage? ByteStrings could certainly be backed by pinned |
This is a misunderstanding. The
Of course only if the underlying
I don't think that's an obstacle, ByteString creation primitives could be provided that wrap around an
If the bytestring only lives a short time, and does not contribute to "fragmentation" I would not expect a performance advantage from unpinned memory.
The touch code needs to remain whenever the content is passed to FFI functions, ensuring that the storage is kept alive for the duration of the FFI call.
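For readers unfamiliar with the pattern, here is a minimal sketch of an FFI call over a ByteString's buffer with today's `ForeignPtr`-backed representation. `unsafeUseAsCStringLen` hands the callback a raw pointer and keeps the buffer alive until the callback returns. The helper name `indexOfByte` is made up for this example, and the binding to C's `memchr` is just a convenient stand-in for any foreign function.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.Types (CChar, CInt (..), CSize (..))
import Foreign.Ptr (Ptr, minusPtr, nullPtr)

-- C's memchr: find the first occurrence of a byte in a buffer.
foreign import ccall unsafe "string.h memchr"
  c_memchr :: Ptr CChar -> CInt -> CSize -> IO (Ptr CChar)

-- Offset of the first occurrence of a character, found via the C call.
-- The buffer must neither move nor be freed while memchr runs; the
-- bracket-style callback is what guarantees the buffer stays alive.
indexOfByte :: Char -> ByteString -> IO (Maybe Int)
indexOfByte c bs =
  unsafeUseAsCStringLen bs $ \(ptr, len) ->
    if len == 0
      then pure Nothing -- avoid calling into C with an empty buffer
      else do
        hit <- c_memchr ptr (fromIntegral (fromEnum c)) (fromIntegral len)
        pure (if hit == nullPtr then Nothing else Just (hit `minusPtr` ptr))

main :: IO ()
main = do
  r <- indexOfByte ':' (BC.pack "key:value")
  print r -- prints "Just 3"
```

With an unpinned representation, a call like this would additionally need the buffer to be pinned, or copied into pinned memory, before the pointer is taken.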
I've been tinkering with |
note that this can be achieved with
|
In my opinion, the primary advantage to performing this change is to avoid fragmentation. It is very easy to end up with a severely fragmented heap by accident if you use ByteStrings, or use libraries which do (ie The cost is to people who actually use the pinned nature of ByteStrings, this can be mitigated by providing a |
The Builder code fundamentally relies on FFI calls to generate the various serialisations it supports, and much of the networking code, Data.Binary, ... rely on ByteStrings not getting moved around during FFI calls. I'd say most of the uses of ByteString are not just because users want 8-bit strings, but rather because they do I/O with ByteStrings. Is memory fragmentation really that important? For ByteStrings one intends to keep around for a long time (as Map keys, ...) ShortByteString already provides a suitable API. The ByteStrings used as I/O buffers don't tend to be long-lived (at least used correctly), so I am not sure that memory fragmentation is a substantial issue. What happens if pinning is removed? What can we still do in terms of FFI? |
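A small sketch of the ShortByteString suggestion, assuming the `bytestring` and `containers` packages: data arrives as an ordinary ByteString (whose buffer is typically pinned) and is copied into an unpinned `ShortByteString` before being stored as a long-lived Map key. The helper names `insertKey` and `lookupKey` are invented for this example.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Short as SBS
import qualified Data.Map.Strict as Map

type Table = Map.Map SBS.ShortByteString Int

-- Copy the key into unpinned memory before retaining it, so the
-- long-lived Map does not hold on to the original I/O buffer.
insertKey :: B.ByteString -> Int -> Table -> Table
insertKey k = Map.insert (SBS.toShort k)

-- Lookups convert the probe the same way; toShort copies the bytes.
lookupKey :: B.ByteString -> Table -> Maybe Int
lookupKey k = Map.lookup (SBS.toShort k)

main :: IO ()
main = do
  let table = insertKey (BC.pack "beta") 2 (insertKey (BC.pack "alpha") 1 Map.empty)
  print (lookupKey (BC.pack "alpha") table)
  print (lookupKey (BC.pack "gamma") table)
```

The copy in `toShort` is the price paid at the boundary; whether it is worth it depends on how long the keys live relative to the I/O buffers they came from.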
Fragmentation is a serious issue. One small bytestring can easily retain a whole megablock (1 MB); if you are allocating a lot of small bytestrings you get into this situation quite quickly. Perhaps a better approach is to use a mesh allocation strategy for pinned objects in GHC. This would solve the fragmentation issue: https://raw.githubusercontent.com/plasma-umass/Mesh/master/mesh-pldi19-powers.pdf |
@vdukhovni Only @mpickering |
Independently of its merits or their absence, this ticket is blocked by the existing ecosystem, which routinely, without a second thought, reaches out for |
I have an extremely good appetite for this breakage. |
An issue was reported to the GHC issue tracker which demonstrated very bad memory fragmentation due to the large amount of small ByteStrings which were allocated. https://gitlab.haskell.org/ghc/ghc/-/issues/20065#note_366130 |
I'm tempted to close this issue. There is no migration path to unpin |
@sjakobi how do you feel about closing this issue? |
I agree with this, although I suspect that migrating from I hope that GHC can quickly start using MESH (https://gitlab.haskell.org/ghc/ghc/-/issues/19175) to reduce memory fragmentation. |
Such an embarrassment for Haskell ecosystem that this is "closed". 😕 That ByteString is backed by pinned memory and naive usage may cause heap fragmentation — is a shenanigan, it would surprise any developer not intimately familiar with GHC RTS innards, who's just looking for a type to hold some bytes. All too often, the bytes would just be pre-encoded text, with no particular FFI or IO requirements, except perhaps for easier interactions with other library APIs. It's good that the caveat is documented: bytestring/Data/ByteString/Short.hs Lines 60 to 68 in 54cd761
— but IMO, it's not enough to have this on I'll submit a PR if such a doc-patch is welcome. It's the least we can do. |
Documentation improvements are welcome.
If you can offer a specific migration plan, I'm all ears. If not, please choose your words more carefully. It's unfair to transfer the responsibility for using the wrong data type from a developer to |
@Bodigrim I didn't try to do that... Apologies for the tone; it only reflects the deep sadness of the situation for the finders of this ticket. It carries no intent to blame, as indeed, I see no viable migration plan either. Observe that it's difficult to know upfront what abuse of |
As I said, documentation improvements are welcome. |
ShortByteString is now on par with ByteString in terms of API, since 0.11.3.0. So there's no reason to not use it excessively. |
Three years ago, @dcoutts wrote:
Is this still planned? Is there still interest in doing this?