Skip to content

sh_diropenat(): General improvements#859

Merged
McDutchie merged 1 commit intoksh93:devfrom
JohnoKing:improve-diropenat
May 31, 2025
Merged

sh_diropenat(): General improvements#859
McDutchie merged 1 commit intoksh93:devfrom
JohnoKing:improve-diropenat

Conversation

@JohnoKing
Copy link

  • Avoid duplicating the file descriptor from openat when it's already >9 to increase performance and reliability.
  • Add a feature test which tests the errno resulting from using openat on a non-directory with O_DIRECTORY|O_SEARCH. On illumos (and presumably Solaris) openat can misleadingly fail with EACCES rather than ENOTDIR, so account for that by testing the given directory in question with stat (but since stat is hardly performance-friendly, the feature test is used to ensure this is only compiled in when absolutely necessary).
  • Added a feature test to see if file descriptors obtained via O_SEARCH or O_PATH can actually be used with fchdir. This is also relevant for open because of O_SEARCH's usage in the fallback code in sh_subshell().
  • Ensure the close-on-exec bit is set for the file descriptor returned by sh_diropenat().

These changes allow for the elimination of the double openat and should help mitigate the performance loss for cd's performance as observed after the merge of 3d7b7e1 (assuming the platform supports F_DUPFD_CLOEXEC and O_CLOEXEC):
vide #816 (comment)

Results obtained on Linux amd64 using CCFLAGS=-Os.
Results for typeset -i i; time for((i=0;i<500000;i++));do cd /dev; cd /;done

ksh-no-openat-9b0b2603:
real    0m01.72s
ksh-dev-de530132:
real    0m03.55s
ksh-this-commit:
real    0m02.91s
ksh-this-commit-no-subshell-inode-test: (same due to no subshells)
real    0m02.90s
ksh93u+:
real    0m02.02s

I'll admit cd still isn't as fast as before, yet the trade off is still worthwhile because ksh93 scripts are bound to use subshells (particulary command substitutions) far more often than cd. In the contrived benchmark cd is called 1,000,000 times, which hardly reflects real world scripts. This also doesn't account for the practice of calling cd from within a subshell, which is common.

Results for typeset -i i; time for((i=0;i<500000;i++));do (cd /dev); (cd /);done

ksh-no-openat-9b0b2603:
real    0m06.73s
ksh-dev-de530132:
real    0m07.18s
ksh-this-commit:
real    0m06.34s
ksh-this-commit-no-subshell-inode-test:
real    0m04.77s
ksh93u+:
real    0m05.19s

Here a small uplift can be seen relative to both of the previous 93u+m iterations, but because of the inode sanity check present since 5ee290c cd is still a bit slower in subshells relative to ksh93u+. (Perhaps in a future commit a faster alternative could be looked into.)

Meanwhile the relevant shbench results are about the same (-l 10):

-----------------------------------------------------------------------------------------------------------------------------------
name           ksh-no-openat-9b0b2603  ksh-dev-de530132     ksh-this-commit      ksh-this-commit-no-inode-test  ksh93u+
-----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.247 [0.240-0.258]     0.227 [0.225-0.233]  0.235 [0.227-0.241]  0.235 [0.232-0.240]            0.258 [0.251-0.264]
parens.ksh     0.191 [0.185-0.197]     0.097 [0.095-0.098]  0.097 [0.095-0.100]  0.099 [0.095-0.101]            0.227 [0.221-0.234]
-----------------------------------------------------------------------------------------------------------------------------------

- Avoid duplicating the file descriptor from openat when
  it's already >9 to increase performance and reliability.
- Add a feature test which tests the errno resulting from
  using openat on a non-directory with O_DIRECTORY|O_SEARCH.
  On illumos (and presumably Solaris) openat can misleadingly
  fail with EACCES rather than ENOTDIR, so account for that
  by testing the given directory in question with stat(2).
- Add a feature test to see if file descriptors obtained via
  O_SEARCH or O_PATH can actually be used with fchdir(2).
  This is also relevant for open(2) because of O_SEARCH's usage
  in the fallback code in sh_subshell().
- Ensure the close-on-exec bit is set for the file descriptor
  returned by sh_diropenat().

These changes allow for the elimination of the double openat and
helps mitigate the performance loss for cd's performance as
observed after the merge of 3d7b7e1:
vide ksh93#816 (comment)

Results obtained on Linux amd64 using CCFLAGS=-Os.
Results for: 'typeset -i i; time for((i=0;i<500000;i++));do cd /dev; cd /;done'
ksh-no-openat-9b0b2603:
real    0m01.72s
ksh-dev-de530132:
real    0m03.55s
ksh-this-commit:
real    0m02.91s
ksh-this-commit-no-subshell-inode-test: (same due to no subshells)
real    0m02.90s
ksh93u+:
real    0m02.02s

I'll admit cd still isn't as fast as before, yet the trade off is
still worthwhile because ksh93 scripts are bound to use subshells
(particulary command substitutions) far more often than cd. In the
contrived benchmark cd is called 1,000,000 times, which hardly
reflects real world scripts. This also doesn't account for the
practice of calling cd from within a subshell, which is common.

Results for: 'typeset -i i; time for((i=0;i<500000;i++));do (cd /dev); (cd /);done'
ksh-no-openat-9b0b2603:
real    0m06.73s
ksh-dev-de530132:
real    0m07.18s
ksh-this-commit:
real    0m06.34s
ksh-this-commit-no-subshell-inode-test:
real    0m04.77s
ksh93u+:
real    0m05.19s

Here a small uplift can be seen relative to both of the previous
93u+m iterations, but because of the inode sanity check present
since 5ee290c cd is still a bit slower in subshells relative
to ksh93u+. (Perhaps in a future commit a faster alternative
could be looked into.)

Meanwhile the relevant shbench results are about the same (-l 10):
-----------------------------------------------------------------------------------------------------------------------------------
name           ksh-no-openat-9b0b2603  ksh-dev-de530132     ksh-this-commit      ksh-this-commit-no-inode-test  ksh93u+
-----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.247 [0.240-0.258]     0.227 [0.225-0.233]  0.235 [0.227-0.241]  0.235 [0.232-0.240]            0.258 [0.251-0.264]
parens.ksh     0.191 [0.185-0.197]     0.097 [0.095-0.098]  0.097 [0.095-0.100]  0.099 [0.095-0.101]            0.227 [0.221-0.234]
-----------------------------------------------------------------------------------------------------------------------------------
@McDutchie
Copy link

Looks good to me. The performance improvement for cd is nice. (Even without it, I agree the performance improvement in subshells is totally worth it.)

@McDutchie McDutchie merged commit d140eac into ksh93:dev May 31, 2025
McDutchie pushed a commit that referenced this pull request May 31, 2025
- Avoid duplicating the file descriptor from openat when
  it's already >9 to increase performance and reliability.
- Add a feature test which tests the errno resulting from
  using openat on a non-directory with O_DIRECTORY|O_SEARCH.
  On illumos (and presumably Solaris) openat can misleadingly
  fail with EACCES rather than ENOTDIR, so account for that
  by testing the given directory in question with stat(2).
- Add a feature test to see if file descriptors obtained via
  O_SEARCH or O_PATH can actually be used with fchdir(2).
  This is also relevant for open(2) because of O_SEARCH's usage
  in the fallback code in sh_subshell().
- Ensure the close-on-exec bit is set for the file descriptor
  returned by sh_diropenat().

These changes allow for the elimination of the double openat and
helps mitigate the performance loss for cd's performance as
observed after the merge of 3d7b7e1:
vide #816 (comment)

Results obtained on Linux amd64 using CCFLAGS=-Os.
Results for: 'typeset -i i; time for((i=0;i<500000;i++));do cd /dev; cd /;done'
ksh-no-openat-9b0b2603:
real    0m01.72s
ksh-dev-de530132:
real    0m03.55s
ksh-this-commit:
real    0m02.91s
ksh-this-commit-no-subshell-inode-test: (same due to no subshells)
real    0m02.90s
ksh93u+:
real    0m02.02s

I'll admit cd still isn't as fast as before, yet the trade off is
still worthwhile because ksh93 scripts are bound to use subshells
(particulary command substitutions) far more often than cd. In the
contrived benchmark cd is called 1,000,000 times, which hardly
reflects real world scripts. This also doesn't account for the
practice of calling cd from within a subshell, which is common.

Results for: 'typeset -i i; time for((i=0;i<500000;i++));do (cd /dev); (cd /);done'
ksh-no-openat-9b0b2603:
real    0m06.73s
ksh-dev-de530132:
real    0m07.18s
ksh-this-commit:
real    0m06.34s
ksh-this-commit-no-subshell-inode-test:
real    0m04.77s
ksh93u+:
real    0m05.19s

Here a small uplift can be seen relative to both of the previous
93u+m iterations, but because of the inode sanity check present
since 5ee290c cd is still a bit slower in subshells relative
to ksh93u+. (Perhaps in a future commit a faster alternative
could be looked into.)

Meanwhile the relevant shbench results are about the same (-l 10):
-----------------------------------------------------------------------------------------------------------------------------------
name           ksh-no-openat-9b0b2603  ksh-dev-de530132     ksh-this-commit      ksh-this-commit-no-inode-test  ksh93u+
-----------------------------------------------------------------------------------------------------------------------------------
fibonacci.ksh  0.247 [0.240-0.258]     0.227 [0.225-0.233]  0.235 [0.227-0.241]  0.235 [0.232-0.240]            0.258 [0.251-0.264]
parens.ksh     0.191 [0.185-0.197]     0.097 [0.095-0.098]  0.097 [0.095-0.100]  0.099 [0.095-0.101]            0.227 [0.221-0.234]
-----------------------------------------------------------------------------------------------------------------------------------
@JohnoKing JohnoKing deleted the improve-diropenat branch May 31, 2025 03:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants