
Parse the setup.shl project, and fix error message.

Location information wasn't propagated when a word inside an array
literal couldn't be parsed.

Also:

- test/wild.sh: Change the location of language projects
- OSH manual: Fill out the section on Parsing OSH vs. sh/bash.
Andy Chu committed Oct 26, 2017
1 parent 6f83e3a commit bcd18e4fb0ba606cee6d13ca6ed30a9b8436e655
Showing with 113 additions and 24 deletions.
  1. +85 −17 doc/osh-manual.md
  2. +12 −0 osh/cmd_parse_test.py
  3. +3 −1 osh/word_parse.py
  4. +13 −6 test/wild.sh
doc/osh-manual.md
@@ -1,6 +1,91 @@
OSH Reference Manual
--------------------
NOTE: This Document is in Progress.
## Parsing OSH vs. sh/bash
(NOTE: This section should encompass all the failures from the [wild tests](http://oilshell.org/cross-ref.html?tag=wild-test#wild-test).)
OSH is meant to run all POSIX shell programs and almost all bash
programs. But it's also designed to be more strict -- i.e. it's [statically
parsed](http://www.oilshell.org/blog/2016/10/22.html) rather than dynamically
parsed.
Here is a list of differences from bash:
(1) **Array indexes that are strings should be quoted** (with either single or
double quotes).
NO:
"${SETUP_STATE[$err.cmd]}"
YES:
"${SETUP_STATE["$err.cmd"]}"
The period causes an ambiguity with respect to regular arrays vs. associative
arrays. See [Parsing Bash is Undecidable](http://www.oilshell.org/blog/2016/10/20.html).
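As a rough illustration of the quoted form, here is a minimal bash sketch. The contents of `SETUP_STATE` and the value of `err` are hypothetical stand-ins, not taken from setup.shl.

```bash
#!/usr/bin/env bash
# Hypothetical data: the key and value below are made up for illustration.
declare -A SETUP_STATE
err=build
SETUP_STATE["$err.cmd"]='make all'

# Quoted subscript: unambiguously a string key of an associative array.
errcmd=( "${SETUP_STATE["$err.cmd"]}" )
printf '%s\n' "${errcmd[0]}"
```

With the quotes in place, a parser can tell that the subscript is a string without knowing at parse time whether `SETUP_STATE` was declared with `declare -A`.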
(2) **Assignments can't have redirects.**
NO:
x=abc >out.txt
x=${y} >out.txt
x=$((1 + 2)) >out.txt
# This is the only one that makes sense (can result in a non-empty file),
# but is still disallowed.
x=$(echo hi) >out.txt
YES:
x=$(echo hi >out.txt)
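A small bash sketch of the allowed form follows. The scratch file from `mktemp` is a stand-in for `out.txt`, and the `tee` variant is only a suggestion for cases where both the variable and the file are wanted.

```bash
#!/usr/bin/env bash
out=$(mktemp)  # scratch file standing in for out.txt

# Redirect inside the command sub: the file receives the output,
# so nothing is captured and x ends up empty.
x=$(echo hi >"$out")
printf 'x=[%s] file=[%s]\n' "$x" "$(cat "$out")"

# To fill both the variable and the file, tee is one option.
y=$(echo hi | tee "$out")
printf 'y=[%s] file=[%s]\n' "$y" "$(cat "$out")"
```

Both forms keep the redirect inside the command sub, so the assignment itself has none.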
(3) **Variable names must be static** -- they can't be variables themselves.
NO:
declare "$1"=abc
YES:
declare x=abc
NOTE: It would be possible to allow this. However in the Oil language, the
two constructs will have different syntax. For example, `x = 'abc'` vs.
`setvar($1, 'abc')`.
(4) **Disambiguating Arith Sub vs. Command Sub+Subshell**
NO:
$((cd / && ls))
YES:
$( (cd / && ls) ) # This is valid but usually doesn't make sense.
# Because () means subshell, not grouping.
$({ cd / && ls; }) # {} means grouping. Note trailing ;
$(cd / && ls)
Unlike bash, `$((` always starts an arith sub. `$( (echo hi) )` is a
subshell inside a command sub. (This construct should be written `$({ echo
hi; })` anyway.)
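The three accepted spellings can be compared in a short bash sketch. It uses `pwd` instead of `ls` so the result is predictable, which is a substitution made only for this example.

```bash
#!/usr/bin/env bash
# Subshell inside a command sub: note the space between '$(' and '('.
a=$( (cd / && pwd) )

# Brace group inside a command sub: note the trailing ';'.
b=$({ cd / && pwd; })

# Plain command sub, usually all that is needed.
c=$(cd / && pwd)

# By contrast, '$((' with no space is arithmetic.
n=$((1 + 2))

printf '%s %s %s %s\n' "$a" "$b" "$c" "$n"
```

All three command subs run in a child process, so the `cd` never changes the directory of the calling shell.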
(5) **Disambiguating Extended Glob vs. Negation of Expression**
- `[[ !(a == a) ]]` is always an extended glob.
- `[[ ! (a == a) ]]` is the negation of an equality test.
- In bash the rules are much more complicated, and depend on `shopt -s
extglob`. That flag is a no-op in OSH. OSH avoids dynamic parsing, while
bash does it in many places.
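The two readings can be told apart in a short bash sketch. The variable names are hypothetical, and the extended glob is shown on the right-hand side of `==`, where its meaning is unambiguous in bash.

```bash
#!/usr/bin/env bash
shopt -s extglob  # matters to bash; described above as a no-op in OSH

# Space after '!': negation of a parenthesized equality test.
if [[ ! (a == a) ]]; then neg_same=yes; else neg_same=no; fi
if [[ ! (a == b) ]]; then neg_diff=yes; else neg_diff=no; fi

# No space: '!(bar)' is an extended glob matching anything except 'bar'.
if [[ foo == !(bar) ]]; then glob_foo=yes; else glob_foo=no; fi
if [[ bar == !(bar) ]]; then glob_bar=yes; else glob_bar=no; fi

printf '%s %s %s %s\n' "$neg_same" "$neg_diff" "$glob_foo" "$glob_bar"
```

Writing the space after `!` whenever negation is intended keeps the script readable under either parser.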
## set builtin
### errexit
@@ -33,22 +118,6 @@ Very good articles on bash errexit:
- http://mywiki.wooledge.org/BashFAQ/105
- http://fvue.nl/wiki/Bash:_Error_handling
-## Notable Gotchas in Parsing
-Arith Sub vs. Command Sub:
-- Unlike bash, `$((` is always starts an arith sub. `$( (echo hi) )` is a
-subshell inside a command sub. (This construct should be written
-`({ echo hi;})` anyway.
-Extended Glob vs. Negation of Expression:
-- `[[ !(a == a) ]]` is always an extended glob.
-- `[[ ! (a == a) ]]` is the negation of an equality test.
-- In bash the rules are much more complicated, and depend on `shopt -s
-extglob`. That flag is a no-op in OSH. OSH avoids dynamic parsing, while
-bash does it in many places.
## Unicode
Encoding of programs should be utf-8.
@@ -60,7 +129,6 @@ But those programs can manipulate data in ANY encoding?
vs literal unicode vs. `echo -e`. `$''` is preferred because it's statically
parsed.
List of operations that are Unicode-aware:
- ${#s} -- number of characters in a string
osh/cmd_parse_test.py
@@ -1173,6 +1173,18 @@ def testBacktickCommentHack(self):
-subj "/CN=*.${SS_HOSTNAME}/"
""")
def testArrayLiteralFromSetup(self):
# Found in setup.shl/bin/setup -- this is the "Parsing Bash is
# Undecidable" problem.
err = _assertParseCommandListError(self, """\
errcmd=( "${SETUP_STATE[$err.cmd]}" )
""")
# Double quotes fix it.
node = assertParseCommandList(self, r"""\
errcmd=( "${SETUP_STATE["$err.cmd"]}" )
""")
class ErrorLocationsTest(unittest.TestCase):
osh/word_parse.py
@@ -921,6 +921,7 @@ def _ReadArrayLiteralPart(self):
while True:
w = w_parser.ReadWord(LexMode.OUTER)
if not w:
+self.error_stack.extend(w_parser.Error())
return None
if w.tag == word_e.TokenWord:
@@ -931,7 +932,8 @@ def _ReadArrayLiteralPart(self):
elif word_id == Id.Op_Newline:
continue
else:
-self.AddErrorContext('Unexpected word in array literal: %s', w, word=w)
+self.AddErrorContext(
+'Unexpected word in array literal: %s', w, word=w)
return None
words.append(w)
test/wild.sh
@@ -151,8 +151,8 @@ all-manifests() {
_sh-manifest ~/git/other/posixcube shell
# Shells themselves
-_sh-manifest ~/git/other/ast shell # korn shell stuff
-_sh-manifest ~/git/other/mwc-sh shell
+_sh-manifest ~/git/languages/ast shell # korn shell stuff
+_sh-manifest ~/git/languages/mwc-sh shell
_sh-manifest ~/src/mksh shell
#
@@ -231,6 +231,13 @@ all-manifests() {
_sh-manifest ~/git/wild/esoteric/wwwoosh esoteric
_sh-manifest ~/git/wild/esoteric/lishp esoteric
+src=~/git/wild/esoteric/setup.shl
+_manifest esoteric/setup.shl $src \
+$(find $src \
+-type f -a \
+'(' -name '*.shl' -o -name setup -o -name Setup ')' -a \
+-printf '%P\n')
src=~/git/wild/esoteric/mal/bash
_manifest esoteric/make-a-lisp-bash $src \
$(find $src '(' -name '*.sh' ')' -a -printf '%P\n')
@@ -267,11 +274,11 @@ all-manifests() {
# Other Languages
#
-_sh-manifest ~/git/other/julia
-_sh-manifest ~/git/other/reason
-_sh-manifest ~/git/other/sdk # Dart SDK?
+_sh-manifest ~/git/languages/julia
+_sh-manifest ~/git/languages/reason
+_sh-manifest ~/git/languages/sdk # Dart SDK?
-_sh-manifest ~/git/other/micropython
+_sh-manifest ~/git/languages/micropython
_sh-manifest ~/git/other/staticpython # statically linked build
_sh-manifest ~/git/other/exp # Go experimental repo
