Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: improve error performance for readSync #50033

Merged
merged 5 commits into from
Oct 21, 2023

Conversation

pluris
Copy link
Contributor

@pluris pluris commented Oct 4, 2023

  • fs.readSync(fd, buffer, offset, length[, position])
  • fs.readSync(fd, buffer[, options])
                                                 confidence improvement accuracy (*)    (**)   (***)
fs/bench-readSync.js n=10000 type='existing'                     3.73 %       ±6.45%  ±8.58% ±11.17%
fs/bench-readSync.js n=10000 type='non-existing'        ***     84.98 %      ±10.30% ±13.80% ±18.14%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 2 comparisons, you can thus expect the following amount of false-positive results:
  0.10 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.02 false positives, when considering a   1% risk acceptance (**, ***),
  0.00 false positives, when considering a 0.1% risk acceptance (***)

Refs: nodejs/performance#106

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. typings labels Oct 4, 2023
@pluris pluris force-pushed the perf/readsync branch 2 times, most recently from d0eaf29 to 5f80eae Compare October 4, 2023 02:31
@mscdex mscdex added the needs-citgm PRs that need a CITGM CI run. label Oct 4, 2023
@pluris pluris changed the title fs: improve performance for readSync fs: improve error performance for readSync Oct 4, 2023
@pluris pluris force-pushed the perf/readsync branch 2 times, most recently from 593d4ba to e757bbe Compare October 4, 2023 02:35
@pluris pluris changed the title fs: improve error performance for readSync fs: improve error performance for readSync Oct 4, 2023
@pluris pluris requested a review from anonrig October 4, 2023 04:40
@mscdex mscdex removed the needs-citgm PRs that need a CITGM CI run. label Oct 4, 2023
@anonrig anonrig added request-ci Add this label to start a Jenkins CI on a PR. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Oct 4, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 4, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@@ -157,6 +157,7 @@ declare namespace InternalFSBinding {
function read(fd: number, buffer: ArrayBufferView, offset: number, length: number, position: number, req: FSReqCallback<number>): void;
function read(fd: number, buffer: ArrayBufferView, offset: number, length: number, position: number, req: undefined, ctx: FSSyncContext): number;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function read(fd: number, buffer: ArrayBufferView, offset: number, length: number, position: number, req: undefined, ctx: FSSyncContext): number;

case 'offset-and-length':
for (let i = 0; i < n; i++) {
try {
fs.readSync(fd, Buffer.alloc(1), 0, 1, 0);
Copy link
Member

@joyeecheung joyeecheung Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the buffer should be reused, as what's normally done in the wild. Also "reading a file from offset 1 many many times, one character at a time" do not seem to be a very typical use case. It would be better to tweak the benchmark so that it reads a big file in a reused buffer (with a length of fs.statSync(file).blksize) and then read without offset until the file ends, which is the most common way to use this API. We probably do not need an additional case for the offset as reading a file repeatedly from the same offset is probably not common enough in the wild to optimize for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your good comments.
I tried modifying it, but I'm not sure if it worked as you said.
Would it be better to remove paramType and test it by passing only buffer?

try {
fs.readSync(fd, Buffer.alloc(1), 0, 1, 0);
} catch {
// Continue regardless of error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this fail if type is existing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't happen.
I tested only existing separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't happen.
I tested only existing separately.

Do you mean you only run this benchmark with existing? But this would still be part of the benchmark when e.g. the entire fs benchmark is run. So we should still avoid invalid/redundant combinations to prevent the benchmark from spending too much time on those and increasing the total runtime of the benchmark.

For now it seems the offset-and-length case is redundant, the no-offset-and-length case is already enough. The ultimate difference between 'offset-and-length' and 'no-offset-and-length' actually comes down to the position parameter - both actually would have the same offset and length after argument handling. But 'offset-and-length' would have the position set to 0 while the 'no-offset-and-length' would use the default position null (which means it will continue to read from the current position). I don't think this difference is really worth a separate benchmark because by testing them both we are only going to see how the underlying file system and the actual disk perform in difference cases after we remove what's in Node.js's control, but that isn't really something we need to care about. I think we can just keep the offset-and-length one and remove no-offset-and-length.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark is performed on non-existing and existing.
What I meant was that I separately confirmed that existing did not fail.
no-offset-and-length was removed.

@pluris
Copy link
Contributor Author

pluris commented Oct 4, 2023

Benchmark has been updated.

@aduh95
Copy link
Contributor

aduh95 commented Oct 6, 2023

Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1440

                                                 confidence improvement accuracy (*)    (**)   (***)
fs/bench-readSync.js n=10000 type='existing'                     1.60 %       ±7.67% ±10.21% ±13.31%
fs/bench-readSync.js n=10000 type='non-existing'        ***     77.28 %      ±10.83% ±14.47% ±18.96%

@anonrig anonrig added the request-ci Add this label to start a Jenkins CI on a PR. label Oct 10, 2023
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Oct 10, 2023
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig
Copy link
Member

anonrig commented Oct 14, 2023

@pluris Can you rebase and force push? This issue is fixed in main branch.

@aduh95
Copy link
Contributor

aduh95 commented Oct 14, 2023

@anonrig I don't think a rebase is necessary, you just need to restart a full CI, and the CI will rebase the PR commits on top of the base branch.

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@anonrig anonrig added commit-queue Add this label to land a pull request using GitHub Actions. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. labels Oct 21, 2023
@nodejs-github-bot nodejs-github-bot removed the commit-queue Add this label to land a pull request using GitHub Actions. label Oct 21, 2023
@nodejs-github-bot nodejs-github-bot merged commit 506858b into nodejs:main Oct 21, 2023
56 checks passed
@nodejs-github-bot
Copy link
Collaborator

Landed in 506858b

targos pushed a commit that referenced this pull request Oct 23, 2023
PR-URL: #50033
Refs: nodejs/performance#106
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
alexfernandez pushed a commit to alexfernandez/node that referenced this pull request Nov 1, 2023
PR-URL: nodejs#50033
Refs: nodejs/performance#106
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
@pluris pluris deleted the perf/readsync branch November 8, 2023 02:24
targos pushed a commit that referenced this pull request Nov 11, 2023
PR-URL: #50033
Refs: nodejs/performance#106
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. commit-queue-squash Add this label to instruct the Commit Queue to squash all the PR commits into the first one. fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. typings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants