
Use of RecyclableMemoryStreamManager with Confluent.Kafka #342

Closed
NicolaAtorino opened this issue Apr 29, 2024 · 2 comments

NicolaAtorino commented Apr 29, 2024

Hello, I would like to use RecyclableMemoryStreamManager to handle serialization when producing messages with the C# Kafka client, Confluent.Kafka.

This is the interface that Confluent exposes:

public interface ISerializer<T>
{
    byte[] Serialize(T data, SerializationContext context);
}

Since this requires returning a plain byte array, I am not sure there is any benefit to using RecyclableMemoryStream here.

An implementation I tried was this one:

public byte[] Serialize(T data, SerializationContext context)
{
    if (data == null) return null;
    // RecyclableMemoryStream implements IBufferWriter<byte>,
    // so MemoryPack can write into the pooled blocks directly.
    using var stream = manager.GetStream();
    MemoryPackSerializer.Serialize(stream as IBufferWriter<byte>, data, MemoryPackSerializerOptions.Utf16);
    var ros = stream.GetReadOnlySequence();
    return ros.ToArray();
}

The documentation says that calling stream.ToArray() defeats the purpose of the library. Is the same true for GetReadOnlySequence().ToArray()?

And another question: in 90% of cases, the resulting stream after serialization is around 2 megabytes. What would be the optimal configuration of BlockSize and LargeBufferMultiple in that case? I'm trying to understand how to configure these settings properly, but I haven't been able to pin them down.

Any help is appreciated. Thank you very much.

benmwatson (Member) commented:

That's really unfortunate for that interface. Anything that takes a pure byte array is just going to be inefficient and almost certainly require copying the bytes to a new array.

That doesn't necessarily mean RMS won't help you. There may be significant internal work it can save, so you could still get some benefit even with the built-in interface.

To get the full benefit, you'd have to find a way to use a different interface that takes a byte range instead. A new version of the library? Something that takes Span<byte>? A completely different serialization interface/method?
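For illustration, a copy-free serializer shape could look something like the sketch below. This interface is purely hypothetical — Confluent.Kafka does not expose it — but it shows the idea: the producer would own an IBufferWriter<byte>, so the pooled RecyclableMemoryStream blocks could be written into directly and no final ToArray() copy would be needed.

```csharp
using System.Buffers;
using Confluent.Kafka; // for SerializationContext

// Hypothetical interface -- NOT part of Confluent.Kafka.
// Instead of returning a freshly allocated byte[], the serializer
// writes into a buffer supplied by the caller.
public interface IBufferSerializer<T>
{
    void Serialize(T data, IBufferWriter<byte> writer, SerializationContext context);
}
```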

I hesitate to prescribe optimal settings because so much depends on more than just buffer size: how it's used, how many simultaneous usages there are, whether you want to avoid LOH allocations, and more.

Certainly, if most of the cases end up being 2 MB, you could "collapse" the two types of buffers into one flat pool and just use a 2 MB (or larger) block size. That would avoid a memory copy whenever you need that full buffer. A little different from the original use case, but I think it could work fine.
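A minimal sketch of that "collapsed pool" idea, assuming the classic int-based RecyclableMemoryStreamManager constructor (in v3.x the same settings move to a RecyclableMemoryStreamManager.Options object); the specific sizes here are illustrative, not recommendations:

```csharp
using Microsoft.IO;

// One flat pool: with a 2 MB block size, a typical ~2 MB payload fits in a
// single block, so reads never need to stitch many small blocks together.
var manager = new RecyclableMemoryStreamManager(
    blockSize: 2 * 1024 * 1024,           // small-pool block size = 2 MB
    largeBufferMultiple: 2 * 1024 * 1024, // large buffers grow in 2 MB steps
    maximumBufferSize: 16 * 1024 * 1024); // anything larger is not pooled
```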

Only real answer is to measure and see!

NicolaAtorino (Author) commented:

Thanks for your answer. Unfortunately we cannot change the interface, but I could try opening a request on the Confluent repo to see whether this would even be doable. Even if the interface were changed, if the system internally requires a byte array (and it may, since it's a wrapper around a C++ library), we still wouldn't get very far.

Thanks for the tips - will continue exploring.
