Compiling python with emscripten

jallwine edited this page Jun 6, 2013 · 32 revisions

This is the process I used to compile python with emscripten.

Install vagrant box

Make sure vagrant is installed. I am using a Mac, which introduces its own unique issues when building python. To avoid those issues, I'm using vagrant. I used https://github.com/semmypurewal/node-dev-bootstrap to get my vagrant box started as it has node.js installed.

git clone https://github.com/semmypurewal/node-dev-bootstrap
cd node-dev-bootstrap
vagrant up
mkdir python-sandbox
cd python-sandbox

Install emscripten

Clone emscripten:

git clone https://github.com/kripken/emscripten.git
cd emscripten
git checkout -b old_shared_libs origin/old_shared_libs 
cd ..

Clang and LLVM 3.2 are required. Let's snag the binaries for ubuntu using curl (or wget if that's what you have):

curl -O http://llvm.org/releases/3.2/clang+llvm-3.2-x86-linux-ubuntu-12.04.tar.gz
tar xzf clang+llvm-3.2-x86-linux-ubuntu-12.04.tar.gz

Install java on your vagrant box (note that after issuing 'vagrant ssh' all subsequent commands are run on your vagrant box):

cd ..
vagrant ssh
sudo apt-get install openjdk-7-jre

This is optional, but I like to use vim for text editing. Since we're using vagrant you can edit the files in your vagrant box natively on your computer, but sometimes it's handy to be able to run vim within vagrant.

sudo apt-get install vim

Go to your emscripten directory and run emcc for the first time. This will generate a .emscripten file in your home directory:

cd /vagrant/python-sandbox/emscripten
./emcc

Now let's set the LLVM path:

echo "export LLVM=/vagrant/python-sandbox/clang+llvm-3.2-x86-linux-ubuntu-12.04/bin" >> ~/.bashrc
echo "export PATH=/vagrant/python-sandbox/clang+llvm-3.2-x86-linux-ubuntu-12.04/bin:/vagrant/python-sandbox/emscripten:\$PATH" >> ~/.bashrc
. ~/.bashrc

Verify that clang, node and emscripten work:

clang tests/hello_world.cpp
./a.out
# should print out "hello, world!"

node tests/hello_world.js
# should print out "hello, world!"

emcc tests/hello_world.cpp
node a.out.js
# should print out "hello, world!"

If that all works, then you should be good to go!

Compiling Python

First, let's install the necessary headers for building python:

sudo apt-get install python-dev

Now let's snag the source for python.

cd /vagrant/python-sandbox
wget http://www.python.org/ftp/python/2.7.4/Python-2.7.4.tgz
tar xzf Python-2.7.4.tgz

Since the build process for python requires compiling and running the parser natively, we copy the source to a second directory. One copy we'll build natively; the other we'll compile to javascript.

cp -r Python-2.7.4 Python-2.7.4-js

Let's begin the build process for the javascript version:

cd Python-2.7.4-js
emconfigure ./configure --without-threads --without-pymalloc --enable-shared --disable-ipv6

emconfigure will run the configure script swapping out certain gcc commands with emscripten's emcc. When it finishes you'll have to edit pyconfig.h.

vim pyconfig.h

Remove the line with HAVE_GCC_ASM_FOR_X87 and all lines with HAVE_SIG except for HAVE_SIGNAL_H. Then add a line with:

#define PY_NO_SHORT_FLOAT_REPR
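If you'd rather script that edit than do it by hand in vim, here is a minimal Python sketch of the same change. The function name patch_pyconfig is made up for illustration, and placing the new define just before the final #endif (the include guard) is my assumption; the steps above only say to add the line.

```python
def patch_pyconfig(text):
    """Return pyconfig.h text with the edits described above applied."""
    kept = []
    for line in text.splitlines():
        # Drop the x87 inline-asm setting entirely.
        if "HAVE_GCC_ASM_FOR_X87" in line:
            continue
        # Drop every HAVE_SIG* line except HAVE_SIGNAL_H.
        if "HAVE_SIG" in line and "HAVE_SIGNAL_H" not in line:
            continue
        kept.append(line)

    define = "#define PY_NO_SHORT_FLOAT_REPR"
    if define not in kept:
        # Assumption: insert before the last #endif if there is one,
        # otherwise append at the end of the file.
        index = len(kept)
        for i in range(len(kept) - 1, -1, -1):
            if kept[i].lstrip().startswith("#endif"):
                index = i
                break
        kept.insert(index, define)
    return "\n".join(kept) + "\n"
```

To use it, read pyconfig.h into a string, pass it through patch_pyconfig, and write the result back. It is worth diffing against the original before running make.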

Then start the build:

make

This will eventually error saying it doesn't have permission to run Parser/pgen. This is because we're trying to run the parser that we just built using emscripten. We need to build pgen natively, so that we can run it.

cd ../Python-2.7.4
./configure --without-threads --without-pymalloc --disable-ipv6
make
cp Parser/pgen ../Python-2.7.4-js/Parser/pgen
chmod +x ../Python-2.7.4-js/Parser/pgen

Return to the js python dir and resume the build:

cd ../Python-2.7.4-js
make

Because we're using Vagrant, the build will now error when trying to create a hard link to libpython. To get around that, we'll modify that line in the Makefile to make a full copy instead:

vim Makefile

On line 488 change:

$(LN) -f $(INSTSONAME) $@; \

To:

cp $(INSTSONAME) $@; \
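The line number may differ in your generated Makefile, so matching on the text of the rule is a bit more robust than counting lines. A minimal Python sketch of that idea follows; replace_hard_link is a hypothetical helper name, and it assumes the rule text appears exactly as shown above.

```python
OLD_RULE = "$(LN) -f $(INSTSONAME) $@;"
NEW_RULE = "cp $(INSTSONAME) $@;"


def replace_hard_link(makefile_text):
    """Swap the hard-link command for a plain copy, leaving the rest untouched."""
    if OLD_RULE not in makefile_text:
        # Either the Makefile is already patched or the rule is spelled differently.
        raise ValueError("hard-link rule not found in Makefile")
    return makefile_text.replace(OLD_RULE, NEW_RULE)
```

Read the Makefile into a string, pass it through replace_hard_link, and write it back. The ValueError is there so a silent no-op doesn't go unnoticed.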

Exit Makefile and again run:

make

It will error one last time saying it doesn't have permission to run python. This error is ok, because we don't actually need to run python. Now we can just link the dynamic library and the python executable and then compile them to javascript:

llvm-link libpython2.7.so python.exe -o python.bc
emcc -O2 python.bc -o python.js

Note that because I'm running this on a Mac, the build generates python.exe. Even though I'm using Vagrant, my filesystem is still case-insensitive, which requires python to have the extension .exe so it doesn't conflict with the Python directory (that's my guess anyway). If you're not on a Mac (or Windows), then you probably won't need the .exe extension.

What's next

Even though we have python built, it still doesn't do anything. We need to set up the filesystem and compile the modules so python can be useful. Much of what's next is taken from replit's empythoned git repo and updated to use the latest version of emscripten. Also, instead of directly using llvm-gcc which is now deprecated in favor of clang, I'll just be using emscripten's emcc command.

Compiling the python library modules

I took replit's build script and trimmed it down to just build the modules. The modified script is build_modules, which is fetched with wget below. The script contains the filenames for all the necessary source files for each module and compiles them all into a .so.js file. At runtime python uses dlopen to dynamically read in the .so.js file.

A couple notes:

  1. At the time of writing, dynamically linked libraries were deprecated in emscripten. The main developer of emscripten (Alon Zakai) informed me that he'd be reimplementing them over the summer (of 2013). So building the modules requires using the old_shared_libs branch.

  2. A change to the way reading files was implemented inadvertently broke dlopen in the WebWorker context (the one we're using for python in a browser). This pull request fixes the issue and is now merged into the old_shared_libs branch.

Now let's build the modules:

wget https://raw.github.com/jallwine/emscripten_test/master/python_build_files/build_modules
chmod +x build_modules
./build_modules

Setting up the filesystem

In a browser, we don't have access to the filesystem. Emscripten sets up a fake in-memory filesystem and all file I/O calls modify it. That means writes are not persistent unless you explicitly write your own persistence methods (which I haven't yet investigated). To set up the directory and file structure, you use emscripten's FS module, which is described in emscripten's docs. I'll go over the methods we're concerned with for implementing python.

FS.createFolder

This is copied directly from the docs:

  • FS.createFolder(parent, name, canRead, canWrite): Creates a single empty folder and returns a reference to it.

    • (string|object) parent: The parent folder, either as a path (e.g. '/usr/lib') or an object previously returned from a FS.createFolder() or FS.createPath() call.
    • string name: The name of the new folder.
    • bool canRead: Whether this folder should have read permissions set from the program's point of view.
    • bool canWrite: Whether this folder should have write permissions set from the program's point of view.

We'll be using this method to set up our directory structure which is as follows:

/
    lib/
        python2.7/
    include/
        python2.7/
    sandbox/

The include directory is readonly and will have the pyconfig.h file that we used to build python. The lib directory is readonly and will have all the python modules (.py files as well as compiled .so libraries). The sandbox directory is writable and will be our current directory when we start python.
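To make the layout concrete, here is a small Python sketch that prints the FS.createFolder calls for exactly the tree above, using the (parent, name, canRead, canWrite) signature quoted from the docs. The LAYOUT table and the folder_calls helper are made up for illustration; the real calls are generated later by map_filesystem.py and may look different.

```python
import json

# (name, can_write, children), mirroring the tree above.
# Only sandbox is writable; everything is readable.
LAYOUT = [
    ("lib", False, [("python2.7", False, [])]),
    ("include", False, [("python2.7", False, [])]),
    ("sandbox", True, []),
]


def folder_calls(entries, parent="/"):
    """Return FS.createFolder(...) lines, parents before children."""
    calls = []
    for name, can_write, children in entries:
        calls.append("FS.createFolder(%s, %s, true, %s);" % (
            json.dumps(parent),
            json.dumps(name),
            "true" if can_write else "false",
        ))
        child_parent = parent.rstrip("/") + "/" + name
        calls.extend(folder_calls(children, child_parent))
    return calls


for call in folder_calls(LAYOUT):
    print(call)
```

Each parent folder is emitted before its children, since a folder has to exist before anything can be created inside it.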

FS.createLazyFile

Again, copied from the docs:

  • FS.createLazyFile(parent, name, url, canRead, canWrite): Creates a file that will be loaded lazily on first access from a given URL or local filesystem path, and returns a reference to it. WARNING: Firefox and Chrome have recently disabled synchronous binary XHRs, which means this cannot work for Javascript in regular HTML pages (but it works within WebWorkers). Instead, use createDataFile.

    • (string|object) parent: The parent folder, either as a path (e.g. '/usr/lib') or an object previously returned from a FS.createFolder() or FS.createPath() call.
    • string name: The name of the new file.
    • string url: In the browser, this is the URL whose contents will be returned when this file is accessed. In a command line engine, this will be the local (real) filesystem path from where the contents will be loaded. Note that writes to this file are virtual.
    • bool canRead: Whether the file should have read permissions set from the program's point of view.
    • bool canWrite: Whether the file should have write permissions set from the program's point of view.

We'll be using FS.createLazyFile to create all the files in our directory structure. Since the standard python library consists of 1800+ files, we don't want to download them all immediately. Instead, each file is fetched from the server the first time it is accessed.

A couple notes:

  1. Python will run in a WebWorker context, so the warning about synchronous XHRs can be ignored.

  2. This method changed since replit's empythoned was compiled. To handle reading files starting at arbitrary byte offsets, chunked file read was implemented. Instead of reading the entire file, a specific chunk of the file can be read if the server supports it. To make this possible, it was issuing HEAD requests for every file on startup to get their file size, which caused significant delays. I was able to change it, so those HEAD requests don't happen until the file is accessed. This pull request fixed that issue: https://github.com/kripken/emscripten/pull/1160.
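To illustrate why the file size matters for chunked reads, here is a rough Python sketch of the arithmetic involved: given a byte offset, work out which chunk holds it and which byte range to request. This is not emscripten's actual code. The function names are hypothetical and the chunk size is whatever you pass in.

```python
def chunk_bounds(offset, chunk_size, file_size):
    """Return (first_byte, last_byte), inclusive, of the chunk holding offset."""
    if not 0 <= offset < file_size:
        raise ValueError("offset is outside the file")
    first = (offset // chunk_size) * chunk_size
    # The final chunk is usually shorter, which is why the size must be known.
    last = min(first + chunk_size, file_size) - 1
    return first, last


def range_header(offset, chunk_size, file_size):
    """Value for an HTTP Range request header covering that chunk."""
    first, last = chunk_bounds(offset, chunk_size, file_size)
    return "bytes=%d-%d" % (first, last)
```

For a 25-byte file read in 10-byte chunks, a read at offset 23 only needs bytes 20 through 24, so the rest of the file never has to be downloaded.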

Generating our filesystem

We'll be using a script from replit's git repo called map_filesystem.py. It takes a path as a command line argument. It will traverse the directory structure starting at the path you specify, outputting FS.createFolder and FS.createLazyFile for every folder and file it finds. So let's go ahead and run it. First we'll copy all the necessary files to a dist directory and then use map_filesystem.py to generate the necessary FS calls. Then we'll stick those calls between some extra javascript code that will handle setting up the python environment in a browser. This extra code was taken from empythoned's entry_point.js.
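If you're curious what that traversal amounts to, here is a minimal Python sketch of the idea. It is not replit's map_filesystem.py. The map_tree name, the read-only permissions on every entry, and the choice of a URL relative to the served directory are all my assumptions for illustration.

```python
import json
import os


def map_tree(root):
    """Walk root and return FS.createFolder / FS.createLazyFile lines."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        rel = os.path.relpath(dirpath, root)
        parent = "/" if rel == "." else "/" + rel.replace(os.sep, "/")
        # Folders are emitted while visiting their parent, so they
        # always exist before anything is created inside them.
        for name in dirnames:
            lines.append("FS.createFolder(%s, %s, true, false);" % (
                json.dumps(parent), json.dumps(name)))
        for name in sorted(filenames):
            # Assumption: files are served at the same relative path.
            url = (parent.rstrip("/") + "/" + name).lstrip("/")
            lines.append("FS.createLazyFile(%s, %s, %s, true, false);" % (
                json.dumps(parent), json.dumps(name), json.dumps(url)))
    return lines
```

Printing each line of map_tree("dist") would give output of the same general shape as what gets appended to fs.js in the commands that follow.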

From the Python-2.7.4-js folder run (most of this is pulled from empythoned's build script):

mkdir -p dist/lib/python2.7/config
mkdir -p dist/include/python2.7
cp -r Lib/* dist/lib/python2.7
rm -rf dist/lib/python2.7/{idlelib,lib-tk,multiprocessing,curses,bsddb}
rm -rf dist/lib/python2.7/plat-{aix3,aix4,atheos,beos5,darwin,freebsd4,freebsd5,freebsd6,freebsd7,freebsd8,generic,irix5,irix6,mac,netbsd1,next3,os2emx,riscos,sunos5,unixware7}
rm -rf dist/lib/python2.7/test
rm -rf dist/lib/python2.7/*/test{,s}

cp Makefile Modules/{Setup*,config.c} dist/lib/python2.7/config
cp pyconfig.h dist/include/python2.7/

wget https://raw.github.com/replit/empythoned/master/map_filesystem.py
wget https://raw.github.com/jallwine/emscripten_test/master/python_build_files/pre_fs.js
wget https://raw.github.com/jallwine/emscripten_test/master/python_build_files/post_fs.js

cat pre_fs.js > fs.js

python map_filesystem.py dist >> fs.js

cat post_fs.js >> fs.js

Ok, now that we have fs.js, we need to build python again using --pre-js fs.js. We also need to use several other flags:

  • -s NAMED_GLOBALS=1 - this is necessary to have access to certain python variables in fs.js
  • -s INVOKE_RUN=0 - this tells emscripten not to execute the main function, we'll handle that ourselves in fs.js
  • -s EXPORTED_FUNCTIONS="[...]" - we need these functions exposed so python runs correctly
  • -s ASM_JS=0 - make sure we're not using asm.js optimizations

So, all together that's:

emcc -O2 python.bc -s NAMED_GLOBALS=1 -s INVOKE_RUN=0 --pre-js fs.js -s EXPORTED_FUNCTIONS="['_Py_Initialize', '_PySys_SetArgv', '_PyErr_Clear', '_PyEval_EvalCode', '_PyString_AsString', '_Py_DecRef', '_PyErr_Print', '_PyErr_Fetch']" -s ASM_JS=0 -o dist/python.js
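The quoting inside EXPORTED_FUNCTIONS is easy to get wrong when retyping that command. One option is to assemble the arguments as a list in Python so that no shell quoting is involved. A minimal sketch follows, using only the flags from the command above; emcc_args is a hypothetical helper name, and the default file names are the ones used in this walkthrough.

```python
EXPORTED = [
    "_Py_Initialize", "_PySys_SetArgv", "_PyErr_Clear", "_PyEval_EvalCode",
    "_PyString_AsString", "_Py_DecRef", "_PyErr_Print", "_PyErr_Fetch",
]


def emcc_args(bitcode="python.bc", pre_js="fs.js", output="dist/python.js"):
    """Build the emcc argument list equivalent to the command above."""
    exported = "[" + ", ".join("'%s'" % name for name in EXPORTED) + "]"
    return [
        "emcc", "-O2", bitcode,
        "-s", "NAMED_GLOBALS=1",
        "-s", "INVOKE_RUN=0",
        "--pre-js", pre_js,
        "-s", "EXPORTED_FUNCTIONS=" + exported,
        "-s", "ASM_JS=0",
        "-o", output,
    ]
```

The list can then be handed to subprocess.check_call(emcc_args()) from the Python-2.7.4-js folder, which passes each argument to emcc exactly as written.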

Browser hooks

Now we have python compiled and ready to be used on the web. You can pull down these files to set up a demo:

cd dist
wget https://raw.github.com/jallwine/emscripten_test/master/public/python2.7.4/index.html
wget https://raw.github.com/jallwine/emscripten_test/master/public/python2.7.4/worker.js

The dist directory is now ready to be hosted on a web server. Once it's hosted, open index.html in your browser and start executing python code!