Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed System.Text.Json deserialization on streamed data #866

Merged

Conversation

Sergio0694
Copy link
Contributor

What kind of change does this PR introduce?

Fixes #865.

What is the current behavior?

The SystemTextJsonContentSerializer.DeserializeAsync<T>(HttpContent content) API throws a NotSupportedException on contents that don't allow the Stream.Length property to be accessed.

What is the new behavior?

That method doesn't crash anymore in those cases.

What might this PR break?
Nothing.

Please check if the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

Other information:
cc. @martincostello and @clairernovotny.
I've added a fix but I'm not 100% convinced about the workaround. As mentioned in the comment, I noticed that in some cases some Stream-s will have CanSeek return false even though they still let you access the Length property. In fact, all the unit tests were passing with the original implementation despite some of the Stream-s there not supporting seeking.

Thoughts on the current solution for this and/or suggestions are welcome 😊


try
{
var length = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
streamLength = (int)stream.Length;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can do this without throwing an exception?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like HttpBaseStream is internal, so we couldn't do a stream is HttpBaseStream check either.

The only other thing I can think of is to check the type name of the stream in some way, though that's a bit bleurgh and brittle.

Copy link
Contributor Author

@Sergio0694 Sergio0694 Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that'd be better, agreed. As I've mentioned though, for some reason the CanSeek property doesn't always match that behavior, even though technically (according to the docs), it should. My rationale here in writing that workaround was was:

  • A try/catch block (as far as I know) doesn't add (much) overhead if an exception is not thrown. Since I'd expect that to be the most common case, it should be fine.
  • If that fails, the additional overhead should still not matter much, as in that branch the code would fallback to the slower API using an input Stream, which would hurt performance on its own compared to the faster branch with the pooled array.

I'm open to suggestions here, not sure either solution is "perfect" 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the code more, it looks like, in this case at least, CanSeek is false, so the access on Length could be guarded by if (CanSeek)?

https://github.com/dotnet/runtime/blob/bd192ae891655adcd0e751e9ff4209f4e1bd3221/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpBaseStream.cs#L14

The exception could also maybe by filtered like catch (NotSupportedException) when (stream.CanSeek) so it isn't unconditionally swallowing the exception?

Are there scenarios where that wouldn't work? It might be that to ensure the method reasonably always succeeds, we might have to eat the performance benefit and go through the slow path if Length isn't guaranteed to work in all cases.

For what it's worth, the slow path for System.Text.Json seems to still be more efficient that Newtonsoft.Json anyway even if it's not as fast as with the pooling.

Copy link
Contributor Author

@Sergio0694 Sergio0694 Mar 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@martincostello Accessing Length inside an if (CanSeek) check was the original implementation I tried, but then I noticed that as you said, CanSeek was false even when the Length property was actually accessed just fine. The main point is that a try block itself doesn't add overhead, in fact it's added implicitly by the C# compiler in each using block, which is used pretty often even in high performance code (and it's used even in that very same DeserializeAsync<T> method anyway, when getting the input stream. The slowdown would only be when catching the exception, but then again, we don't have a reliable way to avoid that since the CanSeek property can sometimes report an incorrect value.

Same goes for a filtered catch, since we're only retrieving the length in that try/finally block, regardless of the underlying reason for that exception, if we fail to get that it means that at this point we have to fallback to the Stream overload.

With all these things considered, I'm not really seeing why we should just prevent the fast path from ever happening, considering that at least from what I can see here, it's being taken and working properly pretty often (eg. it's used in all the current unit tests).

Note that the point of System.Text.Json is not so much the speed improvement (which was about 30% in my tests), but the reduced memory usage (see here).

Personally I'd prefer to keep the current solution/workaround, at least for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m happy with the PR as it is, but that’s not my call. Was just spit-balling on a possible alternative.

However without a fix of some kind I’ll have to go back to using a custom version rather than this new built-in one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah absolutely, I wasn't arguing with you here, just trying to provide as much feedback and details as possible both for you and for future reference on this PR in particular 😊

I'll see if I can put together some tests so we can move forward with this and hopefully unblock you in your project!

Copy link
Contributor

@martincostello martincostello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like what I came up with locally while trying to write a workaround. I'll leave @clairernovotny to comment regarding the tests. Maybe it needs some sort of integration test with a (test) server that streams the data back to the client to validate this scenario?

@clairernovotny
Copy link
Member

clairernovotny commented Mar 24, 2020

We already use an Http mocking library for many of the tests, can we add a test where the content-length header isn't set? You can see some examples here: https://github.com/reactiveui/refit/blob/master/Refit.Tests/ResponseTests.cs#L107

@clairernovotny
Copy link
Member

@Sergio0694 any luck on tests for this using the MockHttp client libs we're already using? I'd like to ship a fix with this and #869 asap.

Thanks

@Sergio0694
Copy link
Contributor Author

@clairernovotny Sorry for the delay! I actually hadn't had a chance to work on that new test yet, but I'll take a look right now! Hopefully I'll manage to get it working soon enough, otherwise if we're in a hurry for a fix I might get back to you and ask for more info. Will update you soon 😄

@Sergio0694
Copy link
Contributor Author

@clairernovotny Alright so, I couldn't really get that to work as I wanted, since doing that caused the returned Stream to not be seekable, but with the Length property still working fine, so the serializer was still using the fast path. I might have setup the test wrong, not excluding that possibility 😅

For now, I've added at least a separate test that directly checks the async deserialization method used by the IContentSerializer instance. If we want to test the whole thing together though, I think I'd need some guidance on how to actually get a Stream that throws in that case (that is, other than actually implementing one just to run that one test).

@martincostello
Copy link
Contributor

Maybe you could create a custom Stream derived from a MemoryStream that overrides Length and throws the same exception as the HttpBaseStream does instead?

@Sergio0694
Copy link
Contributor Author

@martincostello Ah, you're right, if I inherit from MemoryStream I should be able to just override that one accessor without having to spend time implementing anything else. I'll try that, thanks! 😄

@clairernovotny
Copy link
Member

You should be able to simulate the end-to-end with the MockHttpMessageHandler.

Here's an example test:

public async Task GetWithNoParametersTestTrailingSlashInBase()
{
var mockHttp = new MockHttpMessageHandler();
mockHttp.Expect(HttpMethod.Get, "http://foo/someendpoint")
.WithExactQueryString("")
.Respond("application/json", "Ok");
var settings = new RefitSettings
{
HttpMessageHandlerFactory = () => mockHttp
};
var fixture = RestService.For<ITrimTrailingForwardSlashApi>("http://foo/", settings);
await fixture.Get();
mockHttp.VerifyNoOutstandingExpectation();
}

So you can set up your request, and respond with the JSON you want and set whatever headers you need to trigger the problematic path. Would that work?

@@ -139,6 +139,43 @@ public async Task WhenProblemDetailsResponseContainsExtensions_ShouldHydrateExte
kvp => Assert.Equal(new KeyValuePair<string, object>(nameof(expectedContent.Baz), expectedContent.Baz), kvp));
}

[Fact]
public async Task WithNonSeekableStream_UsingSystemTextJsonContentSerializer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the bit of the test that triggers the behaviour to go through the path that caused the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None yet, see my comment below 😄

I think I'll follow your advice and try inheriting from MemoryStream for the content.

@Sergio0694
Copy link
Contributor Author

@clairernovotny Thanks! I've added a test based on that in f676aa8, though unfortunately it's still using the fast path in that case. I'm not sure how exactly to cause the response stream to throw when getting the length, I guess I'll try implementing a new Stream specifically for that as suggested.

@martincostello Since I see you have some experience with the System.Text.Json APIs, would you happen to know why is this test failing by any chance? Not sure I understand what's up with that 🤔

@martincostello
Copy link
Contributor

@martincostello Since I see you have some experience with the System.Text.Json APIs, would you happen to know why is this test failing by any chance? Not sure I understand what's up with that 🤔

Without debugging it myself I'm not sure to be honest - it seems that maybe what's in the stream is wrong for some reasons. I'm not familiar with the existing test setup here, from what I can see it looks like it's just trying to deserialize an object with two strings.

@Sergio0694
Copy link
Contributor Author

@clairernovotny The test is working as expected now and the slow path is triggered correctly after switching to a custom Stream in 6f5a33b. The only remaining issue we have right now is that apparently the slow path is failing on .NET Core 3.1 for some reason, this is what I'm seeing on my machine (and the tests are failing in the CI as well on that target framework):

image

The fast path is always working correctly at least, so there's that.

Upon further inspection, it seems the StreamContent.ReadAsStreamAsync() method on .NET Core 3.1 is returning a non-trimmed array (specifically of length 1024 in this case, and not 127 as the actual valid JSON), I suspect because it's internally pooling arrays. Looks like this is throwing the deserializer off as it keeps reading garbage data after the end of the object. I'll see if there's an option to tell it to just stop deserializing as soon as it reaches the end of the root object 🤔

@martincostello
Copy link
Contributor

Is there something with Spans and Slice() you can do to constrain the serializer to only use the bit of the pooled array we care about?

@Sergio0694
Copy link
Contributor Author

Not in this case unfortunately, as the whole point of this "slow" path was to support streams that don't have a fixed content length 😟

@martincostello
Copy link
Contributor

martincostello commented Mar 31, 2020

So I've had a look at the code in the debugger, and I think the issue with StreamDeserialization_UsingSystemTextJsonContentSerializer under netcoreapp3.1 is due to the implementation of Read() here (or something else in PooledMemoryStream for netstandard2.1):

public override int Read(Span<byte> buffer)
{
if (pooledBuffer is null) ThrowObjectDisposedException();
var source = pooledBuffer.AsSpan(position);
var bytesCopied = Math.Min(source.Length, buffer.Length);
var destination = buffer.Slice(0, bytesCopied);
source.CopyTo(destination);
position += bytesCopied;
return bytesCopied;
}

Running this in the debugger shows that the number of bytes being copied from the buffer is larger than the length of the Stream that was written to, which is causing it to read past the end of JSON and cause the reader to assume the content is corrupt JSON.

image

I'm not sure exactly what the fix is, but I guess the length of the stream needs to be factored into the bounds of the read?

For comparison, I passed the Stream into a StreamReader in the test and called ReadToEnd() on it and it returned the expected JSON text without any trailing garbage.

@martincostello
Copy link
Contributor

martincostello commented Mar 31, 2020

For WithNonSeekableStream_UsingSystemTextJsonContentSerializer(), I can fix it by changing this line:

var source = pooledBuffer.AsSpan(position);

to:

var spanLength = Math.Min(length, bufferSize);
var source = pooledBuffer.AsSpan(position, spanLength);

@martincostello
Copy link
Contributor

For StreamDeserialization_UsingSystemTextJsonContentSerializer, I've fixed it by changing this line of code:

var source = pooledBuffer.AsSpan(position);

to:

if (position >= length) return 0;

var source = pooledBuffer.AsSpan(position, length);

I feel that both might be lacking something for large buffers, but I'm not sure.

However with these two changes, all the existing tests now pass.

@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Mar 31, 2020

Hey @martincostello - thank you for taking the time to look into this! 😄
I've made the changes in 86c18d6 and now all the tests are passing for me. I didn't exactly made the changes you suggested, as eg. here:

var source = pooledBuffer.AsSpan(position, length);

Using length there could throw if position is not 0, as that's the length of the buffer and not just the leftover data. I made some checks to count the number of available bytes and select only the maximum available range each time. You pointed me in the right direction though, so thank you! 😊

@clairernovotny This PR should be good to go now! 🎉

@martincostello
Copy link
Contributor

@Sergio0694 No problem! I figured there would be cases that the "fix" wouldn't cater for that I hadn't considered 😃

@Sergio0694
Copy link
Contributor Author

Ahahah that's what peer programming is for! 🍻
It should be alright in all cases now!

@martincostello
Copy link
Contributor

FYI I've re run the tests for the app whose tests found #865, and with the latest changes from this PR incorporated, they're now all passing again.

@clairernovotny
Copy link
Member

Thanks! does this also addresss #870?

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] NotSupportedException thrown by SystemTextJsonContentSerializer.DeserializeAsync()
3 participants