Improve marshalling of strings in CreateStringTensor #15520
Conversation
It looks like there are up to 70 instances of this pattern. Each one should provide similar improvements to perf and a reduction in GC pressure, particularly where GCHandles are being used behind the scenes.
I have some more changes in progress; I can merge those as well to avoid parallel efforts.
We only copy strings; everything else is read directly from either pinned managed memory or native memory on the way back. GC handles are being eliminated as unnecessary.
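To make that contrast concrete, here is an illustrative C# comparison, not taken from the PR (the data and method names are made up for the example): GCHandle-based pinning versus reading directly from managed memory pinned with a `fixed` block.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class PinningSketch
{
    // Old style: allocate a GCHandle to pin the array, then free it.
    // Allocating and freeing the handle has a real cost on hot paths.
    internal static unsafe long SumViaGCHandle(int[] data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            int* p = (int*)handle.AddrOfPinnedObject();
            long sum = 0;
            for (int i = 0; i < data.Length; i++) sum += p[i];
            return sum;
        }
        finally
        {
            handle.Free();
        }
    }

    // New style: a `fixed` block pins the array in place only for the
    // duration of the block, with no handle to allocate or free.
    internal static unsafe long SumViaFixed(int[] data)
    {
        fixed (int* p = data)
        {
            long sum = 0;
            for (int i = 0; i < data.Length; i++) sum += p[i];
            return sum;
        }
    }
}
```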
```csharp
try
{
    marshaledStrings = new MarshaledStringArray(tensor);
    ortValue = new OrtValue(valueHandle);
```
It avoids an extra allocation in the case of failure, or if the constructor itself fails. It's not strictly necessary; it's just generally considered "good practice" to handle it that way.
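A minimal, self-contained sketch of that pattern (the names here are illustrative, not the actual onnxruntime types): the resource variable is declared before the `try`, so a failure partway through can release exactly what was created so far without allocating anything extra on the failure path.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class CleanupSketch
{
    internal static IntPtr CreateNativeBuffer(int size)
    {
        IntPtr buffer = IntPtr.Zero; // declared (not created) ahead of the try
        try
        {
            buffer = Marshal.AllocHGlobal(size);
            // ... further initialization that may throw ...
            return buffer;
        }
        catch
        {
            // Release only what was actually created before rethrowing.
            if (buffer != IntPtr.Zero)
                Marshal.FreeHGlobal(buffer);
            throw;
        }
    }
}
```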
```
@@ -1824,4 +1826,105 @@ internal class NativeLib

    #endregion
} //class NativeMethods

internal static class EmptyArray<T>
```
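The excerpt cuts off before the body of `EmptyArray<T>`, but it is presumably the classic cached-empty-array pattern (the equivalent of `Array.Empty<T>()` on newer frameworks); a hedged sketch:

```csharp
// Presumed shape only; the actual body is not shown in the diff excerpt.
internal static class EmptyArray<T>
{
    // One shared zero-length array per element type avoids allocating
    // a fresh empty array at every call site.
    public static readonly T[] Value = new T[0];
}
```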
Certainly. I'm not overly familiar with the codebase, so I wasn't sure where the best place to put various things would be.
Closing in favor of #15545, which includes these changes plus others.
### Description
Reduce the number of auxiliary objects created, to reduce GC pressure. Eliminate the GCHandle type of memory pinning in most places. Improve string marshalling by allocating unmanaged memory that does not require pinning. Change native methods from `IntPtr` to `byte[]` (marshalling pinning is more efficient). Allocate input/output UTF-8 names in the unmanaged heap for the lifetime of the InferenceSession, so we do not keep converting and pinning them on every Run. Introduce a new native API that allows strings to be allocated and converted/copied directly into a native tensor.

The PR delivers around 50% latency improvements and fewer GC pauses.

Inspired by: #15520

### Motivation and Context
Clients experience GC pressure and performance degradation when dealing with string tensors.

Co-Authored-By: @tannergooding
This should resolve dotnet/machinelearning#6620
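For the `IntPtr` to `byte[]` signature change mentioned in the description above, a hedged sketch (the library name, entry point, and parameters are illustrative, not the real onnxruntime native API): declaring the parameter as `byte[]` lets the interop marshaller pin the array just for the duration of the call, with no GCHandle needed in managed code.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeSketch
{
    // Before: the caller pins the buffer itself (e.g. via GCHandle)
    // and passes the raw address.
    [DllImport("example", EntryPoint = "Consume")]
    internal static extern int ConsumePtr(IntPtr buffer, int length);

    // After: the marshaller pins `buffer` only while the call is in flight.
    [DllImport("example", EntryPoint = "Consume")]
    internal static extern int ConsumeBytes(byte[] buffer, int length);
}
```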
The previous logic was correct, but it was creating `tensor.Length` managed arrays and `tensor.Length` GCHandles on top of that to ensure pinning. Disposal of GCHandles can be quite expensive, as can having all of these short-lived array allocations, since they increase GC pressure.

This new approach changes to using a `MarshaledString` and `MarshaledStringArray` helper. This ends up doing native allocations instead, which means the memory is implicitly pinned, GC pressure doesn't increase, and no costly GCHandles are required.

From local testing, @michaelgsharp saw that this was between 2-5x faster.
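A minimal sketch of what a `MarshaledStringArray`-style helper could look like (hedged: the real onnxruntime type may differ; this assumes a .NET Core target where `Marshal.StringToCoTaskMemUTF8` is available). Each string is copied once into unmanaged memory, so nothing managed needs to stay pinned and no GCHandles are involved:

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative only; not the actual MarshaledStringArray implementation.
internal struct NativeUtf8StringArray : IDisposable
{
    private IntPtr[] _pointers;

    public NativeUtf8StringArray(string[] strings)
    {
        _pointers = new IntPtr[strings.Length];
        for (int i = 0; i < strings.Length; i++)
        {
            // Allocates unmanaged memory and writes a NUL-terminated
            // UTF-8 copy of the string into it.
            _pointers[i] = Marshal.StringToCoTaskMemUTF8(strings[i]);
        }
    }

    public IntPtr[] Pointers => _pointers;

    public void Dispose()
    {
        if (_pointers == null) return;
        foreach (IntPtr p in _pointers)
        {
            if (p != IntPtr.Zero)
                Marshal.FreeCoTaskMem(p);
        }
        _pointers = null;
    }
}
```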
This isn't necessarily the "best" it could be. Depending on exactly what's being done, it could be more optimal to do one big native allocation and convert all the strings into it sequentially; a sketch of that variant follows below. We could likewise transcode the data directly to the native memory rather than doing it in managed memory and then copying.
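A hedged sketch of that "one big allocation" idea (the names are illustrative; this is not code from the PR): measure every string first, allocate a single unmanaged block, then transcode each string into its slice of the block.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

internal static class PackSketch
{
    // Returns one unmanaged block holding all strings back to back;
    // offsets[i]..offsets[i+1] is the UTF-8 byte range of strings[i].
    // The caller frees the block with Marshal.FreeHGlobal.
    internal static unsafe IntPtr PackUtf8Strings(string[] strings, out int[] offsets)
    {
        offsets = new int[strings.Length + 1];
        for (int i = 0; i < strings.Length; i++)
            offsets[i + 1] = offsets[i] + Encoding.UTF8.GetByteCount(strings[i]);

        IntPtr block = Marshal.AllocHGlobal(offsets[strings.Length]);
        byte* dest = (byte*)block;
        for (int i = 0; i < strings.Length; i++)
        {
            fixed (char* src = strings[i])
            {
                // Transcode directly into this string's slice of the block.
                Encoding.UTF8.GetBytes(src, strings[i].Length,
                                       dest + offsets[i], offsets[i + 1] - offsets[i]);
            }
        }
        return block;
    }
}
```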
It looks like `OrtFillStringTensor` is basically just copying all the individual strings into a linearized buffer, so it is effectively doing the "1x big native allocation" already; some other approaches may be "even better", but as is, this should significantly help the perf of the scenario in question.

There may be other places doing similar things with marshalling that could be switched over.