-
-
Notifications
You must be signed in to change notification settings - Fork 746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed System.Text.Json deserialization on streamed data #866
Fixed System.Text.Json deserialization on streamed data #866
Conversation
|
||
try | ||
{ | ||
var length = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false); | ||
streamLength = (int)stream.Length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can do this without throwing an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like HttpBaseStream
is internal, so we couldn't do a stream is HttpBaseStream
check either.
The only other thing I can think of is to check the type name of the stream in some way, though that's a bit bleurgh and brittle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that'd be better, agreed. As I've mentioned though, for some reason the CanSeek
property doesn't always match that behavior, even though technically (according to the docs), it should. My rationale here in writing that workaround was was:
- A
try/catch
block (as far as I know) doesn't add (much) overhead if an exception is not thrown. Since I'd expect that to be the most common case, it should be fine. - If that fails, the additional overhead should still not matter much, as in that branch the code would fallback to the slower API using an input
Stream
, which would hurt performance on its own compared to the faster branch with the pooled array.
I'm open to suggestions here, not sure either solution is "perfect" 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the code more, it looks like, in this case at least, CanSeek
is false
, so the access on Length
could be guarded by if (CanSeek)
?
The exception could also maybe by filtered like catch (NotSupportedException) when (stream.CanSeek)
so it isn't unconditionally swallowing the exception?
Are there scenarios where that wouldn't work? It might be that to ensure the method reasonably always succeeds, we might have to eat the performance benefit and go through the slow path if Length
isn't guaranteed to work in all cases.
For what it's worth, the slow path for System.Text.Json seems to still be more efficient that Newtonsoft.Json anyway even if it's not as fast as with the pooling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martincostello Accessing Length
inside an if (CanSeek)
check was the original implementation I tried, but then I noticed that as you said, CanSeek
was false
even when the Length
property was actually accessed just fine. The main point is that a try
block itself doesn't add overhead, in fact it's added implicitly by the C# compiler in each using
block, which is used pretty often even in high performance code (and it's used even in that very same DeserializeAsync<T>
method anyway, when getting the input stream. The slowdown would only be when catching the exception, but then again, we don't have a reliable way to avoid that since the CanSeek
property can sometimes report an incorrect value.
Same goes for a filtered catch, since we're only retrieving the length in that try/finally
block, regardless of the underlying reason for that exception, if we fail to get that it means that at this point we have to fallback to the Stream
overload.
With all these things considered, I'm not really seeing why we should just prevent the fast path from ever happening, considering that at least from what I can see here, it's being taken and working properly pretty often (eg. it's used in all the current unit tests).
Note that the point of System.Text.Json
is not so much the speed improvement (which was about 30% in my tests), but the reduced memory usage (see here).
Personally I'd prefer to keep the current solution/workaround, at least for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m happy with the PR as it is, but that’s not my call. Was just spit-balling on a possible alternative.
However without a fix of some kind I’ll have to go back to using a custom version rather than this new built-in one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah absolutely, I wasn't arguing with you here, just trying to provide as much feedback and details as possible both for you and for future reference on this PR in particular 😊
I'll see if I can put together some tests so we can move forward with this and hopefully unblock you in your project!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like what I came up with locally while trying to write a workaround. I'll leave @clairernovotny to comment regarding the tests. Maybe it needs some sort of integration test with a (test) server that streams the data back to the client to validate this scenario?
We already use an Http mocking library for many of the tests, can we add a test where the content-length header isn't set? You can see some examples here: https://github.com/reactiveui/refit/blob/master/Refit.Tests/ResponseTests.cs#L107 |
@Sergio0694 any luck on tests for this using the MockHttp client libs we're already using? I'd like to ship a fix with this and #869 asap. Thanks |
@clairernovotny Sorry for the delay! I actually hadn't had a chance to work on that new test yet, but I'll take a look right now! Hopefully I'll manage to get it working soon enough, otherwise if we're in a hurry for a fix I might get back to you and ask for more info. Will update you soon 😄 |
@clairernovotny Alright so, I couldn't really get that to work as I wanted, since doing that caused the returned For now, I've added at least a separate test that directly checks the async deserialization method used by the |
Maybe you could create a custom Stream derived from a MemoryStream that overrides Length and throws the same exception as the HttpBaseStream does instead? |
@martincostello Ah, you're right, if I inherit from |
You should be able to simulate the end-to-end with the Here's an example test: refit/Refit.Tests/RestService.cs Lines 322 to 337 in ef4eb59
So you can set up your request, and respond with the JSON you want and set whatever headers you need to trigger the problematic path. Would that work? |
@@ -139,6 +139,43 @@ public async Task WhenProblemDetailsResponseContainsExtensions_ShouldHydrateExte | |||
kvp => Assert.Equal(new KeyValuePair<string, object>(nameof(expectedContent.Baz), expectedContent.Baz), kvp)); | |||
} | |||
|
|||
[Fact] | |||
public async Task WithNonSeekableStream_UsingSystemTextJsonContentSerializer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the bit of the test that triggers the behaviour to go through the path that caused the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None yet, see my comment below 😄
I think I'll follow your advice and try inheriting from MemoryStream
for the content.
@clairernovotny Thanks! I've added a test based on that in f676aa8, though unfortunately it's still using the fast path in that case. I'm not sure how exactly to cause the response stream to throw when getting the length, I guess I'll try implementing a new @martincostello Since I see you have some experience with the |
Without debugging it myself I'm not sure to be honest - it seems that maybe what's in the stream is wrong for some reasons. I'm not familiar with the existing test setup here, from what I can see it looks like it's just trying to deserialize an object with two strings. |
@clairernovotny The test is working as expected now and the slow path is triggered correctly after switching to a custom The fast path is always working correctly at least, so there's that. Upon further inspection, it seems the |
Is there something with Spans and |
Not in this case unfortunately, as the whole point of this "slow" path was to support streams that don't have a fixed content length 😟 |
So I've had a look at the code in the debugger, and I think the issue with refit/Refit/Buffers/PooledBufferWriter.Stream.NETStandard21.cs Lines 53 to 68 in c2e2c92
Running this in the debugger shows that the number of bytes being copied from the buffer is larger than the length of the Stream that was written to, which is causing it to read past the end of JSON and cause the reader to assume the content is corrupt JSON. I'm not sure exactly what the fix is, but I guess the length of the stream needs to be factored into the bounds of the read? For comparison, I passed the |
For
to: var spanLength = Math.Min(length, bufferSize);
var source = pooledBuffer.AsSpan(position, spanLength); |
For
to: if (position >= length) return 0;
var source = pooledBuffer.AsSpan(position, length); I feel that both might be lacking something for large buffers, but I'm not sure. However with these two changes, all the existing tests now pass. |
Hey @martincostello - thank you for taking the time to look into this! 😄 var source = pooledBuffer.AsSpan(position, length); Using @clairernovotny This PR should be good to go now! 🎉 |
@Sergio0694 No problem! I figured there would be cases that the "fix" wouldn't cater for that I hadn't considered 😃 |
Ahahah that's what peer programming is for! 🍻 |
FYI I've re run the tests for the app whose tests found #865, and with the latest changes from this PR incorporated, they're now all passing again. |
Thanks! does this also addresss #870? |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
What kind of change does this PR introduce?
Fixes #865.
What is the current behavior?
The
SystemTextJsonContentSerializer.DeserializeAsync<T>(HttpContent content)
API throws aNotSupportedException
on contents that don't allow theStream.Length
property to be accessed.What is the new behavior?
That method doesn't crash anymore in those cases.
What might this PR break?
Nothing.
Please check if the PR fulfills these requirements
Docs have been added / updated (for bug fixes / features)Other information:
cc. @martincostello and @clairernovotny.
I've added a fix but I'm not 100% convinced about the workaround. As mentioned in the comment, I noticed that in some cases some
Stream
-s will haveCanSeek
returnfalse
even though they still let you access theLength
property. In fact, all the unit tests were passing with the original implementation despite some of theStream
-s there not supporting seeking.Thoughts on the current solution for this and/or suggestions are welcome 😊