Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate using System.IO.Pipelines #2035

Closed
roji opened this issue Jul 3, 2018 · 18 comments
Closed

Investigate using System.IO.Pipelines #2035

roji opened this issue Jul 3, 2018 · 18 comments

Comments

@roji
Copy link
Member

roji commented Jul 3, 2018

A great intro is Marc Gravell's post series: part 1, part 2.

Very related to #1900 (switching to Span).

Notes/caveats:

  • We need to allow random-access into each row (seeking back to previous columns, re-reading the same column). By its very design, pipelines is about consuming read data and releasing buffers back, so this may not be a good fit... It's probably possible to keep the row buffered by not advancing - get a span each time, then return it without advancing (so this is probably not an issue).
  • No stateful API for reading/writing to Span, making code really ugly (constant slicing). https://github.com/dotnet/corefx/issues/24180.
  • No means for tracking the number of bytes written, in order to ensure we don't buffer too much (https://github.com/dotnet/corefx/issues/31074).
  • There is no sync I/O option. What do we do with the sync ADO.NET APIs? Sync-over-async? Throw? See https://github.com/dotnet/corefx/issues/30822. Current best option seems to be sync-over-async with a special thread pool to avoid deadlocking. This would still incur a performance penalty though, as sync simply is faster.
@roji roji added this to the 4.1 milestone Jul 3, 2018
@roji roji self-assigned this Jul 3, 2018
@YohDeadfall
Copy link
Contributor

Just await async operations. It's a common practice.

@roji
Copy link
Member Author

roji commented Jul 3, 2018

Just await async operations. It's a common practice.

Can you explain? Do you mean implement sync APIs by calling Wait() on async underlying operations? This would be sync-over-async, and is very deadlock-prone.

See https://github.com/dotnet/corefx/issues/30822 for a question on what to do with this...

@john2014
Copy link

john2014 commented Jul 4, 2018

Can you explain?

Here is a simple example:

public async Task GetGWB()
{
HttpClient hc = new HttpClient();
await hc.GetAsync("http://www.google.com/");
return true;
}

More examples: https://www.google.com/search?q=asp.net+async+await+example

@roji
Copy link
Member Author

roji commented Jul 4, 2018

@john2014 that example is simple an async function awaiting on another async function. My question is what to do with the synchronous ADO.NET API, in other words how to implement stuff like DbConnection.Open() and DbCommand.ExecuteReader(). If under the hood that function invokes async APIs (because piplines don't have sync APIs), you may get deadlocks as a sync TP thread synchronously blocks on the result of an async operation which requires a TP thread to complete.

@john2014
Copy link

john2014 commented Jul 5, 2018

My suggestion would be see how System.Data.Sqlclient handles this scenario.

Github page: https://github.com/dotnet/corefx/tree/master/src/System.Data.SqlClient

Pipeline: https://github.com/dotnet/corefx/issues?q=is%3Aissue+is%3Aopen+pipeline+label%3Aarea-System.IO.Pipelines

@roji
Copy link
Member Author

roji commented Jul 5, 2018

@john2014 any particular reason to look at SqlClient for inspiration? That driver is already significantly slower than Npgsql, and as far as I know there are no plans whatsoever to integrate pipelines into it.

@roji
Copy link
Member Author

roji commented Jul 5, 2018

More to the point for your comment, SqlClient passes a flag down the stack which controls whether actual I/O will happen synchronously or asynchronously, and this is exactly the way Npgsql currently works. The problem is with the idea of switching to use the new pipelines API, which does not have a sync API surface at all. So I don't see how SqlClient could be relevant here.

@john2014
Copy link

john2014 commented Jul 5, 2018

Oops, after posting this I noticed that you have already posted this in corefx github page and are currently having a discussion with david fowler and others.

any particular reason to look at SqlClient for inspiration?

I assumed that Microsoft would be implementing Pipelines in their system.data.sqlclient library (since its a pretty good fit). Since they have decided not to implement pipeline, then it is definitely not relevant here 😄 . Sorry for the confusion.

@austindrenski
Copy link
Contributor

Another overview from the msdn blogs.

@roji
Copy link
Member Author

roji commented Jul 9, 2018

Thanks @austindrenski! I've already started experimenting with pipelines, as expected it's a huge change in the entire source code and I'm still trying to evaluate the exact value vs. cost etc.

@austindrenski
Copy link
Contributor

@roji Do you have a branch tracking that work? I'm excited to see how things look with the new paradigm.

@roji
Copy link
Member Author

roji commented Jul 15, 2018

I'll clean up and push my branch to GitHub tomorrow but don't expect too much - it's really WIP and very broken.

I'm still on the fence on the actual value pipelines can bring Npgsql vs. the complications they introduce... You can see some comments I added today at the top of this issue.

But it's a very interesting investigation...

@roji
Copy link
Member Author

roji commented Jul 16, 2018

@austindrenski take a look at https://github.com/roji/Npgsql/tree/span, it doesn't even build correctly at the moment but I don't have time at the moment to clean it up. But it should give a good idea what things would look like.

The first commit only converts the simple type handlers (small, fixed-length) to use Span, as part of #1900 (no pipelines). I feel there's real definitely value in #1900 for us. However, once you start thinking about how to integrate Span for non-simple type handlers (which read and write arbitrarily long values) you basically find yourself needing an API such as pipelines (although again, I'm not yet sure that pipelines is the right one here).

So beyond the first commit is pipelines work. I've only attacked the write side of things, and even that's incomplete.

@austindrenski
Copy link
Contributor

Mostly applies to #1848, but includes some discussion of pipelines as well:

Understanding the Whys, Whats, and Whens of ValueTask

@roji roji modified the milestones: 4.1, 5.0 Aug 26, 2019
@roji roji modified the milestones: 5.0, 6.0.0 Sep 17, 2020
@roji roji assigned YohDeadfall and unassigned roji Sep 17, 2020
@roji
Copy link
Member Author

roji commented Sep 17, 2020

Assigning to @YohDeadfall who is working on similar ideas as part of the type handler redo for 6.0.

@roji roji removed this from the vNext milestone Apr 8, 2021
@roji roji added this to the Backlog milestone Apr 8, 2021
@NinoFloris
Copy link
Member

Experiments were done in https://github.com/NinoFloris/Woodstar and https://github.com/NinoFloris/Slon

Our conclusions have been that while the abstraction would fit Npgsql for writing, using its reading apis would be a pain due to the chattyness of the ADO api surface, requiring either continuous unpacking of the ROSeq or creating elaborate handwritten statemachines. We decided that for it to be remotely workable an ArraySequenceReader would be necessary, allowing our current position and data to be stored in a field on a heap allocated object.

Lastly we'd have to consider the impact on synchronous reading, as pipelines has no sync apis. I worked around this in Slon by issuing a zero byte read against the underlying stream before calling the async apis on the pipereader. This won't work on TFMs that don't have zero byte read support like .net framework.

@NinoFloris NinoFloris closed this as not planned Won't fix, can't repro, duplicate, stale Aug 15, 2023
@bgrainger
Copy link
Contributor

There is no sync I/O option. What do we do with the sync ADO.NET APIs?

Lastly we'd have to consider the impact on synchronous reading, as pipelines has no sync apis.

I've wanted to experiment with Pipelines in MySqlConnector, but the lack of a sync option has always been a major blocker. (I suppose that if a viable approach hasn't materialised in the last five years, it's never going to happen?)

@YohDeadfall
Copy link
Contributor

Almost yes, Microsoft is very conservative at adding new APIs. There must be a very strong reason to get that, and taking into account that sync paths for IO is evil, I won't expect that happening ever.

@YohDeadfall YohDeadfall removed their assignment Aug 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants