Introduce the concept of asdl.const.NO_INTEGER.
It is the default value for unspecified ASDL integers. See the comment in const.py for the rationale.

Change span_id and line_id to be 0-based, with const.NO_INTEGER as the uninitialized/unknown value.

All unit tests and spec tests pass.

Also: benchmarks/oheap.sh: Try to encode everything in benchmarks/testdata. This revealed that we should properly encode Array<Str>, which is now done. However, there are still some lingering negative numbers.
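Since 0 is a valid ID in a 0-based scheme, "missing" has to be tested by comparing against the sentinel, not by truthiness. The following is a minimal sketch of that point; the two constants are copied from asdl/const.py, while `has_span` and `has_span_buggy` are hypothetical helpers that do not appear in this commit.

```python
# Values copied from asdl/const.py in this commit.
DEFAULT_INT_WIDTH = 3  # bytes, i.e. 24 bits
NO_INTEGER = (1 << (DEFAULT_INT_WIDTH * 8)) - 1


def has_span(span_id):
    """Hypothetical helper: True if span_id names a real, 0-based span."""
    return span_id != NO_INTEGER


def has_span_buggy(span_id):
    """Truthiness check: wrongly treats the valid ID 0 as missing."""
    return bool(span_id)


first_span = 0             # first span in a 0-based scheme
unknown_span = NO_INTEGER  # no span available

print(has_span(first_span), has_span(unknown_span))              # True False
print(has_span_buggy(first_span), has_span_buggy(unknown_span))  # False True
```

Whether any call site in the repository relied on a truthiness check before this change is not something the commit message states; the sketch only illustrates the general pitfall.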
Showing 18 changed files with 173 additions and 51 deletions.
- +5 −3 asdl/arith_ast_test.py
- +36 −0 asdl/const.py
- +31 −12 asdl/encode.py
- +5 −0 asdl/encode_test.py
- +9 −0 asdl/osh_demo.cc
- +3 −2 asdl/py_meta.py
- +1 −0 asdl/run.sh
- +41 −0 benchmarks/oheap.sh
- +0 −3 benchmarks/osh-parser.sh
- +7 −4 core/alloc.py
- +1 −1 core/alloc_test.py
- +3 −1 core/cmd_exec.py
- +6 −6 core/lexer.py
- +4 −4 core/ui.py
- +3 −2 core/util.py
- +12 −11 core/word.py
- +3 −1 osh/cmd_parse.py
- +3 −1 osh/word_parse_test.py
asdl/const.py
@@ -0,0 +1,36 @@
#!/usr/bin/python
"""
const.py
"""

DEFAULT_INT_WIDTH = 3  # 24 bits

# 2^24 - 1 is used as an invalid/uninitialized value for ASDL integers.

# Why? We have a few use cases for invalid/sentinel values:
# - span_id, line_id. Sometimes we don't have a span ID.
# - file descriptor: 'read x < f.txt' vs 'read x 0< f.txt'
#
# Other options for representation:
#
# 1. ASDL could use signed integers, then -1 is valid.
# 2. Use a type like fd = None | Some(int fd)
#
# I don't like #1 because ASDL is lazily-decoded, and then we have to do sign
# extension on demand. (24 bits to 32 or 64). As far as I can tell, sign
# extension requires a branch, at least in portable C (on the sign bit).
#
# The second option is semantically cleaner. But it needlessly
# inflates the size of both the source code and the data. Instead of having a
# single "inline" integer, we would need a reference to another value.
#
# We could also try to do some fancy thing like fd = None |
# Range<1..max_fd>(fd), with smart encoding. But that is overkill for these
# use cases.
#
# Using NO_INTEGER instead of -1 seems like a good compromise.

NO_INTEGER = (1 << (DEFAULT_INT_WIDTH * 8)) - 1

# NOTE: In Python, 1 << (n * 8) - 1 is wrong, because subtraction binds
# tighter than the shift. I thought that bit shift would have higher
# precedence.
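Two points from the const.py comment can be checked directly: the precedence pitfall in the NOTE, and the sign-bit branch that a signed 24-bit representation would need on decode. This is a small sketch, not the encoder from asdl/encode.py; the helper names are hypothetical and the little-endian byte order is an assumption made only for illustration.

```python
DEFAULT_INT_WIDTH = 3  # bytes, i.e. 24 bits (from asdl/const.py)
n = DEFAULT_INT_WIDTH

wrong = 1 << (n * 8) - 1    # parsed as 1 << ((n * 8) - 1) == 1 << 23
right = (1 << (n * 8)) - 1  # 2**24 - 1, the intended NO_INTEGER


def decode_unsigned(buf):
    """Decode 3 little-endian bytes (byte order is an assumption here)."""
    b = bytearray(buf)
    return b[0] | (b[1] << 8) | (b[2] << 16)


def decode_signed(buf):
    """Same bytes read as a signed 24-bit value: needs a sign-bit branch."""
    value = decode_unsigned(buf)
    if value & 0x800000:  # sign bit set, so extend the sign
        value -= 1 << 24
    return value


all_ones = b'\xff\xff\xff'
print(wrong, right)                        # 8388608 16777215
print(decode_unsigned(all_ones) == right)  # True
print(decode_signed(all_ones))             # -1
```

With the unsigned reading, the all-ones pattern is simply NO_INTEGER and decoding needs no conditional; the signed reading recovers -1 only by testing the sign bit, which is the cost the comment objects to.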
benchmarks/oheap.sh
@@ -0,0 +1,41 @@
#!/bin/bash
#
# Test the file size, and the encoding and decoding speed.
#
# Usage:
#   ./oheap.sh <function name>

set -o nounset
set -o pipefail
set -o errexit

encode-one() {
  local script=$1
  local oheap_out=$2
  bin/osh -n --ast-format oheap "$script" > "$oheap_out"
}

task-spec() {
  while read -r path; do
    echo "$path _tmp/oheap/$(basename "$path").oheap"
  done < benchmarks/osh-parser-files.txt
}

run() {
  mkdir -p _tmp/oheap
  local results=_tmp/oheap/results.csv
  echo 'status,elapsed_secs' > "$results"
  task-spec | xargs -n 2 --verbose -- \
    benchmarks/time.py --output "$results" -- \
    "$0" encode-one
}

stats() {
  ls -l -h _tmp/oheap
  echo
  cat _tmp/oheap/results.csv
}

"$@"
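The pairing that `task-spec` produces for `xargs -n 2` can be sketched in Python as follows. This is only an illustration of the path mapping, not part of the commit; the function name `task_spec` and the `out_dir` parameter are hypothetical, and skipping blank lines is a convenience the shell version does not have.

```python
import os


def task_spec(paths, out_dir='_tmp/oheap'):
    """Yield (script, oheap_out) pairs, mirroring task-spec in oheap.sh."""
    for path in paths:
        path = path.strip()
        if not path:
            continue  # skip blank lines (not done by the shell version)
        out = '%s/%s.oheap' % (out_dir, os.path.basename(path))
        yield path, out


for script, oheap_out in task_spec(['benchmarks/testdata/example.sh\n']):
    print(script, oheap_out)
```

In the shell script, each such pair becomes the two arguments that `xargs -n 2` hands to `encode-one` through `benchmarks/time.py`. The file name `example.sh` above is made up for the demonstration.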
0 comments on commit 10c0897