Molecule generation using deep learning, with evaluation tools from RDKit.

Download and preprocess data steps

The following datasets are provided: chembl22, zinc12 and zinc15. The first step is to download them using the scripts provided in data/raw/chembl22, data/raw/zinc12 and data/raw/zinc15.

For zinc12 (for zinc15 or chembl22, replace zinc12 with the corresponding name):

cd data/raw/zinc12
bash get.sh

This will create the file data/raw/zinc12.csv.

The second step is to preprocess the data. Preprocessing converts the data to NumPy format (the entries remain strings), shuffles it, and keeps only the strings that do not exceed a maximum length. The default maximum length is 120.

python tools/preprocess.py data/raw/zinc12.csv data/zinc12.npz

This will create the file data/zinc12.npz.
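
To make the preprocessing steps concrete, here is a minimal sketch of the filter-and-shuffle logic described above. It is not the code in tools/preprocess.py: the function name `preprocess_smiles`, the `seed` parameter and the example strings are assumptions made for illustration only.

```python
import numpy as np


def preprocess_smiles(strings, max_length=120, seed=0):
    """Hypothetical helper: keep strings of at most max_length characters,
    shuffle them, and return them as a NumPy array of strings."""
    # Keep only the strings that do not exceed the maximum length.
    kept = [s for s in strings if len(s) <= max_length]
    # Convert to a NumPy array; the entries are still strings.
    arr = np.array(kept, dtype=str)
    # Shuffle in place with a seeded generator so the order is reproducible.
    rng = np.random.default_rng(seed)
    rng.shuffle(arr)
    return arr


# Illustrative input: three short strings and one that exceeds the limit.
examples = ["CCO", "c1ccccc1", "CC(=O)O", "C" * 200]
data = preprocess_smiles(examples, max_length=120)
print(len(data))
```

The name of the array stored inside data/zinc12.npz is not documented here; `np.load("data/zinc12.npz").files` lists the names that the real script wrote.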

Run example

Once data/zinc12.npz has been created, the example can be run:

cd examples/
python example.py
