
Cond-CDVAE

This software implements the Conditional Crystal Diffusion Variational AutoEncoder (Cond-CDVAE), which generates the periodic structures of materials under user-defined chemical compositions and external pressures.

[Paper] [Datasets]

Installation

Install with pip

(The commands below use torch 2.0.1 + cu118 as an example.)

It is recommended to first create a Python >= 3.8 environment (3.11 is suggested) with conda (Anaconda or Miniconda), then run the following pip commands in that environment.

pip install torch -i https://download.pytorch.org/whl/cu118
pip install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.0.0+cu118.html
pip install -r requirements.txt
pip install -e .
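
As a quick sanity check of the environment (a minimal sketch, assuming the pip commands above completed successfully), you can verify that PyTorch sees the GPU and that the compiled PyG extensions import:

import torch
import torch_scatter  # one of the PyG extension wheels installed above

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torch_scatter", torch_scatter.__version__)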

Setting up environment variables

Set the following environment variables in the .env file (e.g., by vi .env).

  • PROJECT_ROOT: path to the folder that contains this repo; you can get it with pwd
  • HYDRA_JOBS: path to a folder for storing hydra outputs; if it is inside this repo, hydra can record the git hash
export PROJECT_ROOT=/path/to/this/project
export HYDRA_JOBS=/path/to/this/project/log

Datasets

All datasets are directly available in data/ with train/validation/test splits. You don't need to download them again. If you use these datasets, please consider citing the original papers from which we curated them.

Learn more about these datasets on our Datasets page.

Training Cond-CDVAE

Example:

python cdvae/run.py \
    model=vae/vae_nocond \  # vae is the default
    project=... group=... expname=... \
    data=... \  # file name without the .yml suffix in ./conf/data/
    optim.optimizer.lr=1e-4 optim.lr_scheduler.min_lr=1e-5 \
    data.teacher_forcing_max_epoch=250 data.train_max_epochs=4000 \
    model.beta=0.01 \
    model.fc_num_layers=1 model.latent_dim=... \
    model.hidden_dim=... model.lattice_dropout=... \  # MLP part
    [model.conditions.cond_dim=...]

For more configuration options, see ./conf.

To train with multiple GPUs:

CUDA_VISIBLE_DEVICES=0,1 python cdvae/run.py \
    ... \  # can take the same options as before
    train.pl_trainer.devices=2 \
    +train.pl_trainer.strategy=ddp_find_unused_parameters_true

Cond-CDVAE uses hydra to configure hyperparameters; users can modify them from the command line or through the configuration files in the conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/project/group/expname.
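
The snippet below is a minimal sketch for locating the most recent checkpoint of a run; it assumes PyTorch Lightning's default *.ckpt file naming, and project, group, and expname are hypothetical values matching what you passed to cdvae/run.py.

import os
from pathlib import Path

# hypothetical run identifiers; use the project/group/expname you passed to cdvae/run.py
project, group, expname = "cond_cdvae", "mp60-calypso", "model-4m"

run_dir = Path(os.environ["HYDRA_JOBS"]) / "singlerun" / project / group / expname
# pick the checkpoint written last, assuming Lightning's default *.ckpt files
ckpts = sorted(run_dir.rglob("*.ckpt"), key=lambda p: p.stat().st_mtime)
print(ckpts[-1] if ckpts else f"no checkpoint found under {run_dir}")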

Training Cond-CDVAE-4M on MP60-CALYPSO

First, modify the root_path key in the file conf/data/caly-mp/230617/mp60-B-SiO2+calyhalf2.yaml.

# Train
HYDRA_FULL_ERROR=1 nohup python -u cdvae/run.py \
  model=vae data=mp60-CALYPSO/mp60-B-SiO2+calyhalf2 \
  project=cond_cdvae group=mp60-calypso expname=model-4m \
  optim.optimizer.lr=1e-4 optim.lr_scheduler.min_lr=1e-5 \
  model.zgivenc.no_mlp=False model.predict_property=False \
  model.encoder.hidden_channels=128 model.encoder.int_emb_size=128 model.encoder.out_emb_channels=128 \
  model.latent_dim=128 model.encoder.num_blocks=4 model.decoder.num_blocks=4 \
  model.conditions.types.pressure.n_basis=80 model.conditions.types.pressure.stop=5 \
  train.pl_trainer.devices=3 +train.pl_trainer.strategy=ddp_find_unused_parameters_true model.prec=32 \
  data.teacher_forcing_max_epoch=60 > model-4m.log 2>&1 &

Training Cond-CDVAE-86M on MP60-CALYPSO

Remember to modify the root_path key in the file conf/data/caly-mp/230617/mp60-B-SiO2+calyhalf2.yaml, as above.

# Train
HYDRA_FULL_ERROR=1 nohup python -u cdvae/run.py \
  model=vae data=mp60-CALYPSO/mp60-B-SiO2+calyhalf2 \
  project=cond_cdvae group=mp60-calypso expname=model-86m \
  optim.optimizer.lr=1e-4 optim.lr_scheduler.min_lr=1e-5 \
  model.zgivenc.no_mlp=False model.predict_property=False \
  model.encoder.hidden_channels=512 model.encoder.int_emb_size=256 model.encoder.out_emb_channels=512 \
  model.latent_dim=512 model.encoder.num_blocks=6 model.decoder.hidden_dim=512 model.decoder.num_blocks=6 \
  model.conditions.types.pressure.n_basis=80 model.conditions.types.pressure.stop=5 \
  train.pl_trainer.devices=3 +train.pl_trainer.strategy=ddp_find_unused_parameters_true model.prec=32 \
  data.teacher_forcing_max_epoch=60 > model-86m.log 2>&1 &

Generating materials

To evaluate reconstruction performance:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon

To generate materials:

python scripts/evaluate.py --model_path MODEL_PATH --tasks gen \
    [--formula=H2O/--train_data=*.pkl] \
    [--pressure=100] \  # if pressure conditioned
    [--label=xxx] \
    --batch_size=50

MODEL_PATH is the path to the trained model. Users can choose one or more of the three tasks:

  • recon: reconstruction; reconstructs all materials in the test data. Outputs can be found in eval_recon.pt.
  • gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
  • opt: generate new material structures by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, and eval_opt.pt are PyTorch pickle files containing multiple tensors that describe the structures of M materials batched together. Each material can have a different number of atoms, and we assume there are N atoms in total. num_evals denotes the number of Langevin dynamics runs performed for each material.

  • frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
  • atom_types: atomic number of each atom, shape (num_evals, N)
  • lengths: the lengths of the lattice, shape (num_evals, M, 3)
  • angles: the angles of the lattice, shape (num_evals, M, 3)
  • num_atoms: the number of atoms in each material, shape (num_evals, M)
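
The sketch below shows one way to read these outputs, assuming the file stores the tensors in a dictionary keyed by the names listed above (check the keys of the loaded object if your file differs):

import torch

out = torch.load("eval_gen.pt", map_location="cpu")  # or eval_recon.pt / eval_opt.pt

frac_coords = out["frac_coords"]  # (num_evals, N, 3)
atom_types = out["atom_types"]    # (num_evals, N)
num_atoms = out["num_atoms"]      # (num_evals, M)

# split the flat atom axis of the first evaluation back into per-material pieces
sizes = num_atoms[0].tolist()
coords_per_material = torch.split(frac_coords[0], sizes)
types_per_material = torch.split(atom_types[0], sizes)
print(len(coords_per_material), "materials; first has", sizes[0], "atoms")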

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH is the path to the trained model. All evaluation metrics will be saved in eval_metrics.json.
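
For a quick look at the results (a minimal sketch, assuming the metrics file is plain JSON):

import json
from pathlib import Path

# load and print every metric recorded by compute_metrics.py
metrics = json.loads(Path("eval_metrics.json").read_text())
for name, value in metrics.items():
    print(f"{name}: {value}")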

Authors and acknowledgements

The software is primarily written by Xiaoshan Luo, based on CDVAE.

The GNN codebase and many utility functions are adapted from ocp-models by the Open Catalyst Project. In particular, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

Citation

Please consider citing our paper if you find our code & data useful.
