This repository has been archived by the owner on Mar 2, 2022. It is now read-only.

A Python bytecode compiler written in Python. Based on a fork of https://github.com/pfalcon/python-compiler

facebookarchive/python-compiler

Python Bytecode Compiler Written in Python

This is a work-in-progress (WIP) port of the Python2 stdlib "compiler" package to Python3.

Motivation: to have an easily hackable Python compiler for experimenting with, e.g., various optimizations, instrumentation, and semantic variants.

The porting project concentrates on the conversion of AST (as provided by the builtin "ast" module) to bytecode and code objects. The original Python2 package included another important part: conversion of the concrete parse tree into an Abstract Syntax Tree (AST). While it would be interesting to ultimately have a complete pure-Python closed loop for Python compilation, lexing, parsing, and AST building are outside the scope of this specific project, to keep it maintainable. Other projects are welcome to provide integrated maintainable solutions for those areas (indeed, generic/non-integrated solutions for them definitely exist).

Short-term goals:

  • Port the original "compiler" package to work with the AST as produced by Python3's "ast" module.
  • Initially, implement support for Python3.5 syntax and bytecode.
  • Clean up the original code.

History of Python2 "compiler" package:

  1. The code is based on earlier work done by Greg Stein and Bill Tutt for the Python2C (aka Py2C aka p2c) project circa 1997-1999. That code, however, didn't include a bytecode compiler, just the transformer.py module, which converted the low-level Python parse tree, as produced by the builtin "parser" module, into a higher-level Abstract Syntax Tree (AST). The Python2C project itself generated C code from this AST.
  2. The actual bytecode compiler was started and largely written by Jeremy Hylton. The initial commits importing Python2C files and starting pycodegen.py were made on 2000-02-04.
  3. 66 commits were made in 2000, 73 in 2001, 10 in 2002, 6 in 2003, 15 in 2004, 9 in 2005, 51 in 2006, 16 in 2007.
  4. In May 2007, complaints were raised that the package was hard to maintain and regularly broken: https://mail.python.org/pipermail/python-3000/2007-May/007575.html
  5. Those complaints turned into an entry in PEP 3108 calling for its removal.
  6. The package was removed from the 3.x branch in revision a8add0ec5ef05c26e1641b8310b65ddd75c0fec3 on 2007-05-14.
  7. The functionality wasn't totally gone; instead, the functionality of the internal C-based compiler was exposed in a similar fashion (albeit with a changed/cleaned-up API). E.g., compiler.ast and compiler.transformer were replaced with the builtin "_ast" module (in other words, AST node type definitions and the transformation of the parse tree to AST are now done at the C level). compiler.visitor was replaced with the Python-level "ast" module. Compilation of AST into bytecode is handled using the builtin compile() function with suitable parameters.
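
The replacement pipeline described in item 7 can be tried with nothing but the standard library. The following is a minimal sketch, not code from this package: the sample source and the names NameCollector, collector and namespace are made up for illustration. It parses source into an AST with the "ast" module, walks it with ast.NodeVisitor (the rough counterpart of compiler.visitor), and compiles the AST with the builtin compile() function.

```python
import ast

source = "x = 6 * 7\n"

# Source text -> AST. The node classes are defined in the builtin "_ast"
# module and re-exported by the Python-level "ast" module.
tree = ast.parse(source, filename="<example>", mode="exec")


# ast.NodeVisitor fills the role that compiler.visitor used to play.
# NameCollector is a hypothetical visitor written for this example.
class NameCollector(ast.NodeVisitor):
    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        self.names.append(node.id)
        self.generic_visit(node)


collector = NameCollector()
collector.visit(tree)

# AST -> code object, using the builtin compile() function.
code = compile(tree, filename="<example>", mode="exec")

# Execute the code object in a fresh namespace.
namespace = {}
exec(code, namespace)
```

After running this, collector.names holds the single assigned name and namespace["x"] holds the computed value.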

Usage

Currently, the package is intended to work with CPython3.5 only. If you don't have this version handy, you can easily build a binary with a script supplied in this repository (see the "Running Tests" section).

python3.5 -m compiler --help
python3.5 -m compiler <input.py>

By default, the command above compiles the source to an in-memory code object and executes it. If the -c switch is passed, the code is saved to a .pyc file instead of being executed. If --dis is passed, the code is disassembled before being executed or saved.
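
For comparison, the same three behaviours (execute, disassemble, save to .pyc) can be reproduced with CPython's own builtin compiler and standard library. This is only a sketch of what the switches correspond to, under the assumption that they behave like their stdlib counterparts; it does not use this package's API, and the file names and sample source are made up.

```python
import dis
import io
import os
import py_compile
import tempfile

source = "print('hello from compiled code')\n"

# Default mode: compile to an in-memory code object and execute it.
code = compile(source, "<example>", "exec")
exec(code, {})

# Roughly what --dis corresponds to: disassemble the code object.
listing = io.StringIO()
dis.dis(code, file=listing)
print(listing.getvalue())

# Roughly what -c corresponds to: save the compiled code to a .pyc file.
# Here the stdlib py_compile module (CPython's compiler) does the work.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.py")
    pyc_path = os.path.join(tmp, "example.pyc")
    with open(src_path, "w") as f:
        f.write(source)
    written = py_compile.compile(src_path, cfile=pyc_path, doraise=True)
    pyc_exists = os.path.exists(written)
```

The exact instruction names in the listing depend on the CPython version that runs the sketch.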

Running Tests

The project includes a builtin test corpus of various syntactic constructs to verify code generation against reference output produced by CPython3.5. Currently, the project does not include the peephole optimizer, which CPython runs as a postprocessing pass. This means that testing should happen against a modified CPython3.5 build with the peephole optimizer disabled.

The patch is available at https://github.com/pfalcon/cpython/tree/3.5-noopt. This repository includes the build-cpython-compiler.sh helper script to download and build it. It will produce a python3.5-nopeephole symlink in the top-level directory, where the scripts below expect to find it.

To produce reference code generation output from python3.5-nopeephole, run:

./test_testcorpus_prepare.py

This needs to be done once. Afterwards, you can run

./test_testcorpus_run.py

to compare the output produced by this compiler package against the reference.
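
To give an idea of what such a comparison involves, here is a minimal sketch that diffs the disassembly listings of two code objects. It is not the logic of the test scripts above, whose internals are not described here. The helper names disassemble and diff_bytecode are hypothetical, and both code objects come from the builtin compile() as stand-ins for "reference" and "this package's output".

```python
import difflib
import dis
import io


def disassemble(code):
    """Return the dis listing of a code object as a list of lines."""
    out = io.StringIO()
    dis.dis(code, file=out)
    return out.getvalue().splitlines()


def diff_bytecode(reference, candidate):
    """Return a unified diff between the listings of two code objects."""
    return list(difflib.unified_diff(
        disassemble(reference), disassemble(candidate),
        fromfile="reference", tofile="candidate", lineterm=""))


# Stand-ins: in a real test, one side would come from the reference
# interpreter and the other from the compiler under test.
reference = compile("a = b + 1\n", "<example>", "exec")
same = compile("a = b + 1\n", "<example>", "exec")
different = compile("a = b - 1\n", "<example>", "exec")

identical_diff = diff_bytecode(reference, same)
changed_diff = diff_bytecode(reference, different)
```

An empty diff means the two listings match line for line. Comparing text listings is a design choice made here for readability; any mismatch shows up as a diff that a person can read directly.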

Beyond the simple test corpus, the tests can be applied to the compiler code itself:

./test_dogfeed.sh

Finally, the tests can be run against the entire CPython3.5 standard library:

./test_stdlib_3.5_prepare.py
./test_stdlib_3.5_run.py

(All files under Lib/ are tested, with some exceptions. Files which aren't valid Python3 source are explicitly excluded (such files exist to test the CPython interpreter on them). Beyond that, files may be excluded because this compiler package doesn't yet produce bytecode for them that matches the reference interpreter output. See the test_stdlib_3.5_run.py file for the up-to-date list.)
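
The exclusion list in test_stdlib_3.5_run.py is maintained explicitly. As an illustration only, files that aren't valid Python3 source could be detected by asking the builtin compiler whether it accepts them. The helper is_valid_python3 and the sample file names below are hypothetical and are not taken from the test scripts.

```python
def is_valid_python3(source, filename="<candidate>"):
    """Return True if the builtin compiler accepts the source text."""
    try:
        compile(source, filename, "exec")
    except (SyntaxError, ValueError):
        # SyntaxError: not valid Python3 syntax.
        # ValueError: e.g. source containing null bytes on some versions.
        return False
    return True


# Made-up samples: one valid file and one using the Python2 print statement.
samples = {
    "ok.py": "x = 1\n",
    "py2_print.py": "print 'hello'\n",
}

accepted = sorted(
    name for name, text in samples.items() if is_valid_python3(text)
)
```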

Authorship and Licensing Info

The source code is based on the "compiler" package from the Python2 standard library. It is licensed under the Python Software Foundation License v2. See the complete licensing terms and details in the LICENSE file.

The "compiler" package is the result of the dedicated work of a number of individuals, listed below (based on the git history of the official CPython repository).

Porting of the code to Python3 and further maintenance are handled by Paul Sokolovsky.

Contributors to Python2 version of the package:

$ git clone https://github.com/python/cpython
$ cd cpython
$ git checkout 2.7
$ git log --follow Lib/compiler | grep ^Author | sed -e 's/<.*>//' | sort | uniq -c | sort -n -r
# Email addresses not included to minimize spam
    143 Author: Jeremy Hylton
     18 Author: Neal Norwitz
     18 Author: Guido van Rossum
     17 Author: Georg Brandl
     11 Author: Tim Peters
      9 Author: Thomas Wouters
      7 Author: Neil Schemenauer
      6 Author: Michael W. Hudson
      6 Author: Martin v. Löwis
      4 Author: Brett Cannon
      3 Author: Nick Coghlan
      3 Author: Antoine Pitrou
      2 Author: Raymond Hettinger
      2 Author: Ezio Melotti
      2 Author: Christian Heimes
      2 Author: Benjamin Peterson
      2 Author: Anthony Baxter
      2 Author: Andrew M. Kuchling
      2 Author: Alexandre Vassalotti
      1 Author: Serhiy Storchaka
      1 Author: Phillip J. Eby
      1 Author: Jeffrey Yasskin
      1 Author: Gustavo Niemeyer
      1 Author: Greg Stein
      1 Author: Facundo Batista
      1 Author: Eric Smith
      1 Author: Collin Winter
      1 Author: Barry Warsaw
      1 Author: Amaury Forgeot d'Arc

Related Projects

Aka "why I took Python2's compiler package and spent all this effort on it instead of using something else".

  • PyPy is the obvious direction of thought. PyPy is a very advanced and big project. Somewhere in there a small hadron collider may be lurking. But what if you want a simple Python compiler without a hadron collider? You may be out of luck extracting it, or at least I was.
  • YaPyPy - looks absolutely great, but its docs don't explain how to actually run it. At least, not the small part of the docs (README) that is written in English. There are more docs in Chinese, but unfortunately, I can't read them.
  • tailbiter is a great project showing how to develop a Python bytecode compiler from scratch, and is highly recommended for that purpose. But I wanted to get a "production ready" (read: behaving the same way as CPython's) compiler in reasonable time with reasonable effort, not write one from scratch.
