New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate ClassifiedInputStream in favour of InputStream. #1056
Eliminate ClassifiedInputStream in favour of InputStream. #1056
Conversation
Will try to carve out some time tomorrow to look at it. I agree: we should measure a few runs of streaming across the external boundary before and after this PR and get a sense for the hit we're taking before we land. |
@thegedge - I was experimenting a bit with timing between two externals, and noticed that this fix seems to cause that scenario to break. If you're on a Linux machine and run |
I'm not sure if related, but I noticed we also have the same behavior today in some internal->external pipelines: #1057 |
Ah yes, I completely forgot to feed the I'll take care of that over the next few days, and also some measurements.
In making this work for external → external, I'll actually kill two birds with one stone because I'll also make it work for internal → external 🎉 I think that's a big enough win that a small perf hit (for now) is acceptable. |
This is still a bit of a WIP, but I'm looking for some early feedback. Focal points:
|
I noted a few things. In general, I'm +1 on getting it to the point we can benchmark and then make the call. I'm also eager to work out exactly how $it will work in the future, but that's outside of the scope of this PR. I'd prefer to land this first, and then iterate on the design for external $it in a follow-up PR. "but ... | ^foo $it is significantly different than ... | ^foo" <-- very much agreed. I think what you have for this is good for now. I agree we should go ahead and separate them. |
I couldn't really get a proper benchmark set up with the code, so I used my own profiling tool to capture a bunch of samples and emit average/stddev. Here are the results for running release builds, 50 samples: ^echo 1 | catthis branch
master
^cat README.md | catthis branch
master
^cat test.md | cattest.md is 59 (copies of README.md) this branch
master
Yeah, I double checked those, ran the script several times. Apparently this branch is faster than connecting the pipes directly when there's a small amount of stuff flowing through them. Larger files are where we take a hit. |
Ignoring the CI failures, @jonathandturner, what do you think about moving this branch forward? Given the huge perf hit we'd get for larger files, should I continue working on this, or focus my efforts elsewhere? Maybe @wycats has some thoughts, since this does unify internal/external, which was one goal he mentioned he's been working on. If we're 👍, I can take care of some of the |
@thegedge - I say let's go for it. Simplifying things seems good, and we can optimize multiple external commands in a row into a single unified external pipeline for performance later. |
This is ready for review. I'm just trying to understand the CI failure:
It's confusing because I can't tell why it's different from what we do in |
Looks like I may have messed up a rebase and brought back some |
Looks good! Thanks |
* DE translation for custom_completions.md * Typos and better wording in DE translation
This PR brings external commands closer to internal commands by returning an
InputStream
instead of the specializedClassifiedInputStream
. Although we now gain the ability to pipe internal commands into external commands,external | external
will have significantly worse performance when lots of data is flowing over the pipe (we plan to address this in the future).Closes #1057
Closes #928
Closes #935 (#778 has more discussion around the same topic)
Closes #887
Closes #816 (I think?)
Closes #489
Closes #412
I think I got them all 😅
What to focus on
Besides the usual, I've had to rebase and deal with a lot of merge conflicts along the way. Making sure I haven't accidentally dropped some of the other changes that have been made would be 👍