Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the local builtin with support for dynamic local scopes #703

Open
wants to merge 55 commits into
base: dev
Choose a base branch
from

Conversation

JohnoKing
Copy link

@JohnoKing JohnoKing commented Jan 17, 2024

This is the eventual result of on/off work on the local builtin planned for inclusion in 1.1. (A declare builtin has also been added because it's a simple addtion when local is present.) Initially a port of ksh93v-'s code, this builtin was rewritten to implement local scoping in POSIX functions (i.e. name() { syntax). It's still experimental and may have bugs, but it's now stable enough to run the ksh and modernish regression test suites without failing.

Changes:

  • Implemented local and declare builtins, which both default to using dynamic local scoping regardless of the type of function. local cannot be used outside of functions, unlike declare which has no such requirement.

  • Added typeset -D, which allows typeset to use dynamic local scoping.

  • Added typeset -P/local -P, which allows for the use of static local scoping in both POSIX and KornShell functions (the flag used will likely be changed).

  • Completely overhauled POSIX function handling to run them from sh_funscope() instead of b_dot_cmd(). This made the implementation of local scoping much easier and also inadvertently fixed an out-of-bounds bug without adding a costly malloc call (cf. Array 'argv[0]' accessed at index -1, which is out of bounds att/ast#1423). (Note that KornShell functions (function name { syntax) that are executed via the . command will ignore scoping.) At present POSIX functions still use sh.dot_depth for measuring function depth (subject to change).

  • b_dot_cmd(): Removed DOTMAX, as it's just a duplicate of MAXDEPTH.

  • b_dot_cmd(): Removed superfluous double setting of sh.st.lineno to 1.

  • Removed the (surprisingly useless) sh.posix_fun pointer and implemented a more versatile sh.infunction variable for identifying the current running function type, alongside helper FUN_POSIX, FUN_KSH and FUN_KSHDOT macros.

  • For scoping purposes, the NV_DYNAMIC and NV_STATSCOPE bitflags have been added (there's also NV_SCOPES, which is a macro for (NV_GLOBAL|NV_DYNAMIC|NV_STATSCOPE).

  • To work around Changing size of nvflag in struct Namval breaks things att/ast#1038, np->dynscope has been added. When set, it is used by the scanfilter() (see also nv_cover()) and nv_open() functions to enable dynamic scoping functionality. (See the comment in sh_funscope() for more information.)

  • Added some (incomplete) scoping documentation to typeset --man.

  • Various updates to the regression tests, including the addition of a large set of scoping regression tests at ./src/cmd/ksh93/tests/local.sh. (The copyright date does accurately represent the age of that script. This pull request has been stuck in development hell for a LONG time.)

TODO List:

This implementation of local doesn't have proper scoping.
The POSIX function scoping mechanism should be implemented
in b_dot_cmd, although it should be done without breaking
backward compatibility (i.e., typeset's behavior in POSIX
functions shouldn't change as a result of implementing
scoping).

Discussion: ksh93#123

src/cmd/ksh93/sh/name.c:
- Set a dynamic scope in ksh functions for the local and
  declare builtins. This currently doesn't work in POSIX
  functions since b_dot_cmd doesn't apply any scoping
  right now.

src/cmd/ksh93/bltins/typeset.c,
src/cmd/ksh93/sh/xec.c:
- Only allow the local builtin to run inside of functions. This
  is accomplished by using a new 'SH_INFUNCTION' state set when
  ksh is inside of a function.
- Set shp->st.var_local to shp->var_tree for declare and local
  to get a dynamic scope in ksh functions. For typeset use
  shp->var_base to get a static scope in ksh functions.
  This will need bug-testing for when local is used in
  namespaces.
- Add local and declare to top comment alongside other builtins.
- Add a dictionary generator entry for declare(1).

src/cmd/ksh93/data/builtins.c,
src/cmd/ksh93/include/builtins.h:
- Add local and declare to the builtin table as declaration
  builtins. Depending on how local is specified in the next
  POSIX standard, it may need to be changed to a special builtin.
  That does break backward compatibility with existing ksh
  scripts though (such as one regression test script), so for now
  it's just a regular builtin.
- Update the man pages with references to local(1). Add
  information to the man page about the intended scoping
  of declare and local, although it's not completely
  implemented in POSIX functions.

src/cmd/ksh93/tests/builtins.sh:
- Run one of the regression tests for builtins inside of a
  function to fix a test failure with the local builtin.

src/cmd/ksh93/tests/local.sh:
- Add a number of regression tests mostly targeted at the
  local builtin. Most of the bugs tested for here are present
  in the ksh93v- and ksh2020 implementations of local and
  declare.
POSIX functions should default to placing variables on the global scope,
so this should be perfectly fine.
I suspect that the only reason dynamic scoping even somewhat works is
not because an actual scope was opened, but rather because the child
functions use the parent function's scope outright. This is no good and
will need a rewrite.
See the lengthy comment in xec.c for a detailed explanation.
Bear in mind that there are still three test failures in the modernish
test suite, which are probably cause by switching to sh_funscope() for
POSIX function handling:
  001: push;set;check;send sig;unset;pop;check  - FAIL: check traps, test var=$(trap)
  007: trap stack in a subshell                 - FAIL: wrong output from 'trap' (3)
  008: 'trap' can ignore sig if no stack traps  - FAIL: not ignored while no stack traps
These will need to be fixed at some point.
List of changes:
- A -c flag has been added to typeset that allows you to use statically
  scoped variables in POSIX functions.
- A noteable caveat regarding typeset -p's interactions with the new
  scoping functionality has been documented.
- The regression tests have been overhauled to account for the special
  handling of 'command typeset' in sh_exec(). More were also added for
  the new 'typeset -c' feature.
- The two nv_scan calls have been rolled into one faster call, which
  should have equivalent functionality (cf. the code for the
  scanfilter() function).
- Increment copyright dates using the update_copyright script from the
  wiki.
@dannyweldon
Copy link

local cannot be used inside of functions

Did you actually mean "local cannot be used outside of functions"?

@JohnoKing
Copy link
Author

Whoops, that was a typo. It has been fixed.

@McDutchie
Copy link

Thank you very much for all your work on this! I will have to take some time to fully understand the code.

Unfortunately, I already found a bug in dynamic scoping. Test script:

fun1() { local foo=set_by_fun1; fun2; echo $foo; }
fun2() { foo=changed_by_fun2; }
fun1

Expected output: changed_by_fun2; actual output: set_by_fun1.

@JohnoKing
Copy link
Author

That bug is caused by a limitation in the current scoping code, which interacts rather poorly with NV_NOSCOPE (the current code just copies the variable from the parent function and otherwise lacks a viewport to that scope). I'm considering reimplementing the ksh93v- sh.st.var_local pointer as a possible fix for that, but so far all of my attempts in that direction break static scoping in one way or another.

- local.sh: Added tests for the presence of the -D flag in output.
- sh.1: Removed this comment as this behavior is subject to change.
- xec.c: Update comment (the ksh93v- code is something I'm trying to
  salvage, but it certainly does seem unsalvageable.)
McDutchie added a commit to modernish/modernish that referenced this pull request Jan 19, 2024
McDutchie added a commit to modernish/modernish that referenced this pull request Jan 19, 2024
@ormaaj
Copy link

ormaaj commented Jan 21, 2024

Dynamic local?! I must be dreaming. :D

edit: aah and even works in posix functions! nice.

On a side note, I haven't had the chance develop local(1) further
because of a lack of time. In any case the current scoping bug is
evading a fix. The current approach needs significant changes,
as it has no backward compatibility when POSIX functions
are nested in ksh functions (the normal ksh93 behavior is to have
POSIX functions inherit the ksh function's full scope and use it
directly, which necessarily includes the ksh function's statically
scoped local variables. The current approach doesn't allow for this
as POSIX functions have their own scope for the local builtin. Backward
compatibility is currently achieved by having typeset modify the global
scope directly, but that's not really the correct thing to do. I have
tried altering typeset to modify the parent ksh function's scope
directly, but such changes cannot be seen in the POSIX function itself
after the variable in the parent function was changed, as the change
only happens in the parent scope and not the current local scope (which
defeats the whole purpose). To attempt to fix that, I then tried linking
the parent scope to the local scope, but so far that just breaks the
local scope (the cdt API lends itself very poorly to dynamic scoping).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants