Grammar Variational Autoencoder
This repository contains training and sampling code for the paper: Grammar Variational Autoencoder.
Install (CPU version) using
pip install -r requirements.txt
For GPU compatibility, replace the fourth line in requirements.txt with: https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.12.1-cp27-none-linux_x86_64.whl
To create the molecule datasets, call:
To train the molecule models, call:
python train_zinc.py # the grammar model
python train_zinc.py --latent_dim=2 --epochs=50 # train a model with a 2D latent space and 50 epochs
python train_eq.py # the grammar model
python train_eq.py --latent_dim=2 --epochs=50 # train a model with a 2D latent space and 50 epochs
The file molecule_vae.py can be used to encode and decode SMILES strings. For a demo, run:
The analogous file equation_vae.py can encode and decode equation strings. Run:
The Bayesian optimization experiments use sparse Gaussian processes implemented in Theano.
We use a modified version of Theano with a few add-ons, e.g. to compute the log determinant of a positive definite matrix in a numerically stable manner. The modified version of Theano can be installed by going to the folder Theano-master and typing
python setup.py install
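To illustrate the numerically stable log-determinant computation mentioned above, here is a minimal NumPy sketch of the standard technique (Cholesky factorization); the function name `stable_logdet` is our own for illustration, not part of the modified Theano:

```python
import numpy as np

def stable_logdet(A):
    """Log-determinant of a positive definite matrix via Cholesky.

    Computing log(det(A)) directly can overflow or underflow for large
    matrices; summing the logs of the Cholesky diagonal avoids this,
    since det(A) = prod(diag(L))**2 for A = L L^T.
    """
    L = np.linalg.cholesky(A)
    return 2.0 * np.sum(np.log(np.diag(L)))

# Example: a small random positive definite matrix
rng = np.random.RandomState(0)
X = rng.randn(5, 5)
A = np.dot(X, X.T) + 5 * np.eye(5)
print(stable_logdet(A))
```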
The experiments with molecules require the rdkit library, which can be installed as described in http://www.rdkit.org/docs/Install.html.
The Bayesian optimization experiments can be replicated as follows:
1 - Generate the latent representations of molecules and equations. For this, go to the folders
2 - Go to the folders
nohup python run_bo.py &
Repeat this step for all the simulation folders (simulation2, ..., simulation10). For speed, it is recommended to run these in parallel on a compute cluster.
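Since the ten simulation runs are independent, the launches can be scripted rather than typed by hand. A minimal sketch (the folder names simulation1, ..., simulation10 are assumed from the step above) that prints one launch command per folder:

```python
# Print the launch command for each Bayesian optimization run.
# Folder names simulation1..simulation10 are assumed from the text above.
commands = ["cd simulation{0} && nohup python run_bo.py &".format(i)
            for i in range(1, 11)]
for cmd in commands:
    print(cmd)
```

Piping this output to a shell (or a cluster submission script) launches all runs in parallel.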
3 - Extract the results by going to the folders