
Isolated and fixed a buggy interaction between compiler2 and byterun.

The bug had two causes:

- byterun doesn't implement the "optimized namespace access" -- e.g.
  LOAD_FAST instead of LOAD_NAME.  So it had a workaround to look for
  the co_name "<genexpr>" in the code object.  There is a Python stdlib
  bug where inspect.getcallargs doesn't detect the pattern.
  http://bugs.python.org/issue19611
- compile.c uses the "<genexpr>" name for a generator expression, while
  compiler2 uses "lambda.1".

The fix was to look for ".0" in co_varnames.  That is the anonymous
argument for a generator expression, which is implemented as its own
code object.
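
As a rough illustration of the fix described above, the sketch below uses the stock CPython compiler (not compiler2) to show that a generator expression becomes a nested code object whose first local is named ".0". The helper names `find_code_objects` and `has_anonymous_iter_arg` are hypothetical and are not taken from byterun.

```python
import types


def find_code_objects(code):
    """Yield a code object and every code object nested in its co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from find_code_objects(const)


def has_anonymous_iter_arg(code):
    # Hypothetical helper: checks co_varnames instead of relying on
    # co_name == "<genexpr>", which compiler2 does not produce.
    return ".0" in code.co_varnames


source = "def f(p):\n    return list(i for i in p)\n"
module_code = compile(source, "<example>", "exec")

genexprs = [c for c in find_code_objects(module_code)
            if has_anonymous_iter_arg(c)]

for c in genexprs:
    print(c.co_name, c.co_varnames)
```

Checking co_varnames works no matter what name the compiler gives the code object, which is the property the workaround needs.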

Result: Got OSH unit tests compiled by compiler2 running under byterun!

Also:
- Useful function to print co_flags in inspect_pyc.py
- Start documenting opcodes
- Notes about iterator implementation
Andy Chu committed Apr 6, 2017
1 parent 139a82c commit aa082c2b0ed51196fa3293150b0ef4dd896f8d56
Showing with 205 additions and 29 deletions.
  1. +1 −1 opy/compiler2/pyassem.py
  2. +5 −0 opy/compiler2/pycodegen.py
  3. +26 −2 opy/misc/inspect_pyc.py
  4. +91 −0 opy/opcodes.md
  5. +41 −25 opy/run.sh
  6. +28 −0 opy/tests/genexpr.py
  7. +12 −0 opy/tests/genexpr_simple.py
  8. +1 −1 opy/tools/dumppyc.py
@@ -256,7 +256,7 @@ class PyFlowGraph(FlowGraph):
def __init__(self, name, filename, args=(), optimized=0, klass=None):
self.super_init()
self.name = name
self.name = name # name that is put in the code object
self.filename = filename
self.docstring = None
self.args = args # XXX
@@ -1431,6 +1431,11 @@ class GenExprCodeGenerator(NestedScopeMixin, AbstractFunctionCode,
def __init__(self, gexp, scopes, class_name, mod):
self.scopes = scopes
self.scope = scopes[gexp]
# NOTE: isLambda=1, which causes the code object to be named
# "lambda.<n>". To match Python, we can thread an argument through and
# name it "<genexpr>". byterun has a hack due to
# http://bugs.python.org/issue19611 that relies on this. But we worked
# around it there.
self.__super_init(gexp, scopes, 1, class_name, mod)
self.graph.setFreeVars(self.scope.get_free_vars())
self.graph.setCellVars(self.scope.get_cell_vars())
@@ -66,10 +66,34 @@ def show_bytecode(code, level=0):
buffer = StringIO()
sys.stdout = buffer
# NOTE: This format has addresses in it, disable for now
#dis.disassemble(code)
dis.disassemble(code)
sys.stdout = sys.__stdout__
print(indent + buffer.getvalue().replace("\n", "\n"+indent))
# TODO: Do this in a cleaner way. Right now I'm avoiding modifying the
# consts module.
def build_flags_def(consts, co_flags_def):
for name in dir(consts):
if name.startswith('CO_'):
co_flags_def[name] = getattr(consts, name)
from compiler2 import consts
_CO_FLAGS_DEF = {}
build_flags_def(consts, _CO_FLAGS_DEF)
def show_flags(value):
names = []
for name, bit in _CO_FLAGS_DEF.items():
if value & bit:
names.append(name)
h = "0x%05x" % value
if names:
return '%s %s' % (h, ' '.join(names))
else:
return h
def show_code(code, level=0):
indent = INDENT*level
@@ -82,7 +106,7 @@ def show_code(code, level=0):
if isinstance(value, str):
value = repr(value)
elif name == "co_flags":
value = "0x%05x" % value
value = show_flags(value)
elif name == "co_lnotab":
value = "0x(%s)" % to_hexstr(value)
print("%s%s%s" % (indent, (name+":").ljust(NAME_OFFSET), value))
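
The flag decoding added above can be tried without compiler2. This is a minimal sketch that assumes the `CO_*` constants exported by the stdlib `inspect` module stand in for `compiler2.consts`; `flag_names` is a hypothetical name. It sorts by bit value so the output order is stable.

```python
import inspect

# Build a name -> bit table from the stdlib, in the same spirit as
# build_flags_def() does for compiler2.consts.
CO_FLAGS = {
    name: getattr(inspect, name)
    for name in dir(inspect)
    if name.startswith("CO_") and isinstance(getattr(inspect, name), int)
}


def flag_names(co_flags):
    """Return the names of the CO_* bits set in co_flags, lowest bit first."""
    ordered = sorted(CO_FLAGS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if co_flags & bit]


def gen():
    yield 1


def plain():
    return 1


print("0x%05x %s" % (gen.__code__.co_flags,
                     " ".join(flag_names(gen.__code__.co_flags))))
```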
@@ -0,0 +1,91 @@
Notes on VM Opcodes
===================
This is an elaboration on:
https://docs.python.org/2/library/dis.html
I copy the descriptions and add my notes, based on what I'm working on.
`SETUP_LOOP(delta)`
Pushes a block for a loop onto the block stack. The block spans from the
current instruction with a size of delta bytes.
NOTES: compiler2 generates an extra SETUP_LOOP, for generator expressions,
along with POP_BLOCK.
`POP_BLOCK()`
Removes one block from the block stack. Per frame, there is a stack of blocks,
denoting nested loops, try statements, and such.
`LOAD_CLOSURE(i)`
Pushes a reference to the cell contained in slot `i` of the cell and free
variable storage. The name of the variable is `co_cellvars[i]` if i is less
than the length of `co_cellvars`. Otherwise it is
`co_freevars[i - len(co_cellvars)]`.
NOTES: compiler2 generates an extra one of these
`MAKE_CLOSURE(argc)`
Creates a new function object, sets its `func_closure` slot, and pushes it on
the stack. `TOS` is the code associated with the function, `TOS1` the tuple
containing cells for the closure’s free variables. The function also has `argc`
default parameters, which are found below the cells.
`LOAD_DEREF(i)`
Loads the cell contained in slot `i` of the cell and free variable storage.
Pushes a reference to the object the cell contains on the stack.
`GET_ITER()`
Implements TOS = iter(TOS).
NOTES: Hm how do I implement this? It turns it from a collection into an
iterator. Gah.
PyObject *iter = PyObject_GetIter(iterable);
objects/abstract.c -
objects/iterobject.c - PySeqIter_New
PySeqIter_Type has a it_seq field. The PyObject being iterated over. It
maintains an index too.
How does items() work as an iterable then?
Then iter_iternext() calls:
PySequence_GetItem(seq, it->it_index)
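
A small sketch of the sequence-iterator fallback described in these notes, seen from the Python level. `Countdown` is a made-up class for illustration. It defines only `__getitem__`, so `iter()` wraps it in a sequence iterator that asks for index 0, 1, 2, ... until `IndexError` is raised.

```python
class Countdown:
    """Sequence-like object with no __iter__, only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)  # tells the sequence iterator to stop
        return self.n - index


it = iter(Countdown(3))  # what GET_ITER does to TOS
values = list(it)
print(values)
```

Objects that define `__iter__` themselves do not go through this index-based path. Their own iterator is returned instead.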
`LOAD_FAST(var_num)`
Pushes a reference to the local `co_varnames[var_num]` onto the stack.
NOTES:
This still does a named lookup? Generator expressions do `LOAD_FAST 0 (.0)`
since there is no formal parameter name.
Oh I see, there is a `PyObject** fastlocals` in EvalFrame
It's initialized to `f->f_localsplus` -- frame holds them. Oh I see, that's
where the frame setup is different! Don't need inspect.getcallargs.
FastCall populates fastlocals from `PyObject** args` and `nargs`.
@@ -11,6 +11,11 @@ source compare.sh
readonly PY=$PY36
die() {
echo "FATAL: $@" 1>&2
exit 1
}
_parse-one() {
PYTHONPATH=. ./opy_main.py 2to3.grammar parse "$@"
}
@@ -257,6 +262,10 @@ compile-opy-tree() {
_compile-tree $src $dest stdlib "${files[@]}"
}
inspect-pyc() {
PYTHONPATH=. misc/inspect_pyc.py "$@"
}
# For comparing different bytecode.
compare-files() {
local left=$1
@@ -265,8 +274,8 @@ compare-files() {
md5sum "$@"
ls -l "$@"
misc/inspect_pyc.py $left > _tmp/pyc-left.txt
misc/inspect_pyc.py $right > _tmp/pyc-right.txt
inspect-pyc $left > _tmp/pyc-left.txt
inspect-pyc $right > _tmp/pyc-right.txt
$DIFF _tmp/pyc-{left,right}.txt || true
return
@@ -309,6 +318,7 @@ compare-opy-tree() {
compare-files _tmp/opy-{stdlib,compile2}/opy_main.pyc
}
compile-osh-tree() {
local src=$(cd .. && echo $PWD)
local files=( $(find $src \
@@ -322,6 +332,12 @@ compile-osh-tree() {
_compile-tree $src _tmp/osh-compile2/ compiler2 "${files[@]}"
}
compare-osh-tree() {
#diff -u _tmp/opy-{stdlib,stdlib2}/SIZES.txt || true
#compare-files _tmp/osh-{ccompile,compile2}/core/id_kind_test.pyc
compare-files _tmp/osh-{ccompile,compile2}/core/testdbg.pyc
}
fill-osh-tree() {
local dir=${1:-_tmp/osh-stdlib}
cp -v ../osh/osh.asdl $dir/osh
@@ -369,30 +385,30 @@ unit-osh() {
popd
}
# Compile and byterun
# Weird interaction:
#
# ccompile / run std VM -- OK
# ccompile / byterun VM -- OK
# stdlib-compiler or compile2 / run with std VM -- OK
#
# stdlib-compiler or compiler2 / byterun VM -- weird exception!
#
# So each component works with the python VM, but not with each other.
#
# Oh you don't have a method of compiling with the python VM. Then run with
# byterun. That would be a good comparison.
# Combinations of {ccompile, compiler2} x {cpython, byterun}
compile-run-one() {
local compiler=${1:-ccompile} # or compile2
local vm=${2:-byterun} # or cpython
local py=$3
shift 3
pyc-byterun() {
local t=${1:-core/id_kind_test.py}
pushd ..
if ! { test $compiler = ccompile || test $compiler = compile2; } then
die "Invalid compiler $compiler"
fi
python -c 'from core import id_kind_test' || true
ls -l ${t}c
local dir="_tmp/osh-$compiler"
local pyc="$dir/$(basename $py)c"
_$compiler-one $py $pyc
PYTHONPATH=. byterun -c ${t}c
popd
export PYTHONPATH=$dir
if test $vm = cpython; then
python $pyc "$@"
elif test $vm = byterun; then
#byterun -v -c $pyc "$@"
byterun -c $pyc "$@"
else
die $vm
fi
}
byterun() {
@@ -404,7 +420,7 @@ byterun() {
opy-parse-on-byterun() {
local g=$PWD/2to3.grammar
local arg=$PWD/opy_main.py
pushd _tmp/opy-stdlib
pushd _tmp/opy-compile2
byterun -c opy_main.pyc $g parse $arg
popd
}
@@ -416,7 +432,7 @@ osh-parse-on-byterun() {
echo ---
byterun -c _tmp/osh-stdlib/bin/oil.pyc "${cmd[@]}"
byterun -c _tmp/osh-compile2/bin/oil.pyc "${cmd[@]}"
}
compare-sizes() {
@@ -0,0 +1,28 @@
#!/usr/bin/python
"""
Test for generator expressions.
"""
def MakeLookup(p):
#return dict([(pat, tok) for _, pat, tok in p])
# Something is broken about how we compile this...
# Difference in compilation is SETUP_LOOP. So CPython handles this fine,
# but byterun doesn't.
return dict((pat, tok) for _, pat, tok in p)
# This should be an error but isn't. Looks like it's not compiled
# correctly.
#return list(i for (i, j) in p)
fake_pairs = [
(False, '-a', 0),
(False, '-b', 1),
(False, '-c', 2),
#(False, '-d', 3),
#(False, '-e', 4),
#(False, '-f', 5),
]
#lookup = MakeLookup(id_kind.ID_SPEC.LexerPairs(Kind.BoolUnary))
lookup = MakeLookup(fake_pairs)
print('LOOKUP ***************', len(lookup))
print(lookup)
@@ -0,0 +1,12 @@
#!/usr/bin/python
"""
Simpler test for generator expressions.
"""
def MakeLookup(p):
return list(i for i in p)
#return list([i for i in p])
print(MakeLookup([66]))
# This runs but prints []
#print(MakeLookup([1,2]))
@@ -1,4 +1,4 @@
#!/usr/bin/env python3
#!/usr/bin/python
import marshal
import dis
