This is the code for the paper on

This repo introduces two things:

  • A new Variational Auto-Encoder (VAE) architecture that maps a molecular graph to a sequence representation (in particular, SELFIES).
  • An optimization pipeline that iteratively shifts a prior distribution to maximize a black-box scoring function. In our implementation, this black-box function is the score returned by docking software.
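The optimization idea in the second bullet can be illustrated with a toy loop in the style of the cross-entropy method:

1. Sample from the current distribution.
2. Score the samples with the black-box function.
3. Refit the distribution to the best-scoring ones.

This is a hedged sketch, not the repository's pipeline. The 1-D Gaussian stands in for the prior over molecules, and `black_box_score` is a hypothetical placeholder for a docking score.

```python
import random
import statistics

def black_box_score(x):
    """Hypothetical stand-in for a docking score (higher is better), peaked at x = 3."""
    return -(x - 3.0) ** 2

def shift_prior(n_rounds=20, n_samples=200, top_frac=0.1, min_sigma=0.1, seed=0):
    """Iteratively refit a 1-D Gaussian 'prior' to its best-scoring samples."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # initial prior N(0, 1)
    n_top = max(2, int(n_samples * top_frac))
    for _ in range(n_rounds):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Keep the highest-scoring samples (the "elite" set).
        elite = sorted(samples, key=black_box_score, reverse=True)[:n_top]
        # Refit the prior to the elite set; the floor on sigma keeps some exploration.
        mu = statistics.fmean(elite)
        sigma = max(statistics.stdev(elite), min_sigma)
    return mu, sigma

mu, sigma = shift_prior()
print(f"final prior: mean={mu:.2f}, std={sigma:.2f}")
```

Without the `min_sigma` floor, this kind of loop tends to shrink the distribution faster than it moves toward the optimum and stalls short of it. Real pipelines use their own safeguards against that collapse.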

The required packages are listed in yml files, available for CPU or CUDA 10 usage. For example, to create the CPU environment:

conda env create -f ymls/cpu.yml 

Prior model training: learning a distribution in molecule space

Data loading

We use Molecular Sets (MOSES) to train our model. After installing the moses Python library, the data can be obtained by running:

python data_processing/ 

To train a graph2selfies model, SELFIES must first be precomputed for the train set. To compute SELFIES for another dataset stored in a csv file, with the molecules in a column named 'smiles', run:

python data_processing/ -i [path_to_my_csv_dataset]
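If your molecules are in a plain Python list, the sketch below writes them in the expected layout: a single column named 'smiles'. It uses only the standard library. The file name and molecules are hypothetical.

```python
import csv

def write_smiles_csv(smiles_list, path):
    """Write molecules to a csv file with a single 'smiles' column."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["smiles"])
        writer.writeheader()
        for smi in smiles_list:
            writer.writerow({"smiles": smi})

# Hypothetical example molecules and output path.
write_smiles_csv(["CCO", "c1ccccc1", "CC(=O)O"], "my_dataset.csv")
```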

Model training

To train the model, run:

python --train [my_dataset.csv] --n [your_model_name]

The csv file must contain columns named 'smiles' and 'selfies'.
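Before launching training, you can check that a csv has both required columns. This is a minimal standard-library sketch. `check_training_csv` is a hypothetical helper, not part of the repo. The demo SELFIES string `[C][C][O]` encodes ethanol (`CCO`).

```python
import csv

REQUIRED_COLUMNS = {"smiles", "selfies"}

def check_training_csv(path):
    """Raise ValueError if a required column is missing; otherwise return the number of rows."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path} is missing column(s): {sorted(missing)}")
        return sum(1 for _ in reader)

# Tiny demo file (hypothetical name and contents).
with open("demo_train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["smiles", "selfies"])
    writer.writerow(["CCO", "[C][C][O]"])

print(check_training_csv("demo_train.csv"))  # 1
```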

Embedding molecules

To compute embeddings for the molecules in a csv file, run:

python -i [path_to_csv] --name [your_model_name] -v [smiles]/[selfies]

The column containing the SMILES or SELFIES strings should be labeled 'smiles'.
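For intuition about what an embedding means in a VAE: the encoder maps each molecule to the mean and log-variance of a Gaussian in latent space. A common convention is to use the mean as a deterministic embedding; it is assumed here and not confirmed for this repo's script. Sampling with the reparameterization trick is what happens during training. The `embed` function and the encoder outputs below are hypothetical.

```python
import math
import random

def embed(mu, logvar, sample=False, rng=None):
    """Return a latent vector from (hypothetical) VAE encoder outputs.

    sample=False: return the mean, a deterministic embedding.
    sample=True: draw z = mu + exp(0.5 * logvar) * eps with eps ~ N(0, 1)
    (the reparameterization trick).
    """
    if not sample:
        return list(mu)
    if rng is None:
        rng = random.Random()
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

# Hypothetical encoder outputs for one molecule (latent size 3).
mu, logvar = [0.2, -1.0, 0.5], [-2.0, -2.0, -2.0]
print(embed(mu, logvar))  # [0.2, -1.0, 0.5]
print(embed(mu, logvar, sample=True, rng=random.Random(0)))
```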

Generating samples

To generate samples from a trained model, run:

python generate/ -N [number_of_samples] --name [name_of_the_model] -b [use beam search decoding]
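The `-b` option enables beam search decoding. Greedy decoding commits to the single most likely token at each step. Beam search instead keeps the few highest-probability partial sequences. This is a hedged, generic illustration, not the repository's decoder. `next_token_probs` and the toy model stand in for the decoder's per-step distribution over tokens.

```python
import heapq
import math

def beam_search(next_token_probs, start, end, beam_width=3, max_len=10):
    """Return the highest log-probability sequence found by beam search.

    next_token_probs(seq) must return a dict {token: probability} for the next token.
    """
    beams = [(0.0, [start])]  # (total log-probability, token sequence)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            for tok, p in next_token_probs(seq).items():
                if p > 0:
                    candidates.append((logp + math.log(p), seq + [tok]))
        beams = []
        # Keep only the beam_width best expansions; completed ones stop growing.
        for logp, seq in heapq.nlargest(beam_width, candidates, key=lambda c: c[0]):
            if seq[-1] == end:
                finished.append((logp, seq))
            else:
                beams.append((logp, seq))
        if not beams:
            break
    finished.extend(beams)  # include sequences cut off at max_len
    return max(finished, key=lambda c: c[0])[1]

# Hypothetical toy decoder: greedy picks "A" first, but "B" leads to a likelier sequence.
TOY = {
    ("<s>",): {"A": 0.6, "B": 0.4},
    ("<s>", "A"): {"x": 0.5, "y": 0.5},
    ("<s>", "B"): {"x": 1.0},
}

def toy_probs(seq):
    return TOY.get(tuple(seq), {"</s>": 1.0})

print(beam_search(toy_probs, "<s>", "</s>", beam_width=1))  # ['<s>', 'A', 'x', '</s>'] (greedy)
print(beam_search(toy_probs, "<s>", "</s>", beam_width=2))  # ['<s>', 'B', 'x', '</s>']
```

With `beam_width=1` the search reduces to greedy decoding. It finds a sequence with probability 0.3, while a width of 2 recovers the 0.4 sequence.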

Moses metrics

To compute the MOSES benchmark metrics for the samples (30k samples recommended), run:

python eval/ -i [path_to_txt_with_samples]
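For a quick sanity check before running the full benchmark, here are simplified, string-level versions of two MOSES-style metrics: uniqueness and novelty. The sketch assumes the samples file holds one molecule string per line. The MOSES library itself first filters for valid molecules and compares canonical forms; this sketch skips both steps. All names and data below are hypothetical.

```python
def read_samples(path):
    """Read generated molecules, assuming one string per line; blank lines are skipped."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def uniqueness(samples):
    """Fraction of samples that are distinct strings."""
    return len(set(samples)) / len(samples)

def novelty(samples, train_set):
    """Fraction of distinct samples that are absent from the training set."""
    unique = set(samples)
    return len(unique - set(train_set)) / len(unique)

# Hypothetical samples file and training molecules.
with open("samples.txt", "w") as f:
    f.write("CCO\nCCO\nCCN\nc1ccccc1\n")
samples = read_samples("samples.txt")
train = ["CCO", "CCC"]
print(uniqueness(samples))      # 0.75
print(novelty(samples, train))  # 2 of the 3 distinct samples are not in train
```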

OptiMol: Generating samples that maximize a black-box objective function


Optimization of binding affinities in chemical space for drug discovery





