ATMOL

Requirements

python             3.7.2
pytorch            1.8.2+cu111
torch-cluster      1.5.9 
torch-geometric    1.7.2      
torch-scatter      2.0.8                    
torch-sparse       0.6.9
torchvision        0.9.2+cu111         
tornado            5.1         
tqdm               4.26.0
pandas             0.23.4 
matplotlib         2.2.2
numpy              1.18.5
scikit-learn       0.22
rdkit              2018.09.1   conda-forge
deepchem           2.6.0.dev20211018190358

Pretrain Dataset

Dataset

in-vitro.csv contains 306,347 SMILES strings from the ZINC substance channel.

now.csv contains 3,000,000 molecules selected from the now set (the now set includes 9,814,569 SMILES strings from the ZINC substance channel).

Process

utils_gat_pretrain.py builds the molecular graphs from SMILES with PyTorch Geometric and RDKit, then writes them to XXX.pt with torch.save().
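As a rough illustration of the graph layout that ends up in the .pt file, the sketch below builds the node list and the edge pairs for ethanol (SMILES `CCO`) by hand. In the actual script RDKit would supply the atoms and bonds; the helper name `bonds_to_edge_index` and the single atomic-number feature per node are assumptions made for this example and are not taken from utils_gat_pretrain.py.

```python
from typing import List, Tuple


def bonds_to_edge_index(bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Turn undirected bonds into a 2 x (2 * n_bonds) edge list.

    PyTorch Geometric stores an undirected edge as two directed edges,
    so every bond (i, j) contributes both i -> j and j -> i.
    """
    sources, targets = [], []
    for i, j in bonds:
        sources += [i, j]
        targets += [j, i]
    return [sources, targets]


# Ethanol, SMILES "CCO": atoms C0 - C1 - O2 (hydrogens implicit).
atomic_numbers = [6, 6, 8]   # hypothetical one-number node feature
bonds = [(0, 1), (1, 2)]
edge_index = bonds_to_edge_index(bonds)
print(edge_index)  # [[0, 1, 1, 2], [1, 0, 2, 1]]
```

Converted to tensors, `atomic_numbers` and `edge_index` would correspond to the `x` and `edge_index` fields of a `torch_geometric.data.Data` object, which can then be passed to torch.save().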

Downstream Dataset

Dataset

For downstream performance evaluation, we chose 7 datasets from MoleculeNet, a benchmark that collects more than forty molecular property prediction tasks.

The 7 datasets are BBBP, BACE, HIV, ClinTox, Tox21, SIDER and MUV.

Process

utils_clr_downstream.py builds the molecular graphs from SMILES with PyTorch Geometric and DeepChem, then writes them to XXX.pt with torch.save().

ATMOL Usage

Train Model

python main_pretrain.py --batch_size 512 --epochs 500 --datafile in-vitro
optional arguments:
--feature_dim                 Feature dimension of the latent vector [default value is 128]
--temperature                 Temperature used in softmax [default value is 0.5]
--k                           Top k most similar molecules used to predict the label [default value is 200]
--batch_size                  Number of molecules in each mini-batch [default value is 512]
--epochs                      Number of sweeps over the dataset to train [default value is 500]
--datafile                    Original SMILES data file to use as input [in-vitro, now]
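The --temperature flag scales the similarities before the softmax in the contrastive loss. The repository includes a file named nt_xent.py, so the loss is presumably NT-Xent; the following is a dependency-free sketch of that loss for a single anchor, not the repository's implementation. The function name `nt_xent_pair` and the toy similarity values are assumptions.

```python
import math
from typing import Sequence


def nt_xent_pair(pos_sim: float, neg_sims: Sequence[float],
                 temperature: float = 0.5) -> float:
    """NT-Xent loss for one anchor: -log softmax of the positive similarity."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy cosine similarities: one positive view and three negatives.
loss_default = nt_xent_pair(0.9, [0.1, 0.0, -0.2], temperature=0.5)
loss_sharp = nt_xent_pair(0.9, [0.1, 0.0, -0.2], temperature=0.1)
print(loss_default > loss_sharp)  # True
```

A lower temperature sharpens the softmax, so a positive pair that is already well separated from the negatives contributes a smaller loss.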

Attention-wise mask for graph augmentation

To change the masking method, edit def attention_del in model_gat_pre.py and use one of the following lines.

Max-weight or min-weight masking:
del_indices = get_allIndex(weight, len(weight)) + count
    Choose max-weight or min-weight inside get_allIndex:
    def get_allIndex():
        # max-weight: descending=True; min-weight: descending=False
        sorted, indices = torch.sort(weight, dim=0, descending=True)

Random masking:
del_indices = get_randomIndex(weight, len(weight)) + count

Roulette masking:
del_indices = get_rouletteIndex(weight, len(weight)) + count
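The selection strategies can be illustrated without PyTorch. The sketch below only approximates what the helpers in model_gat_pre.py might do: the function names are stand-ins, and the `ratio` parameter (the fraction of entries to mask) is an assumption that does not appear in the text above.

```python
import random
from typing import List, Sequence


def top_indices(weights: Sequence[float], n: int,
                descending: bool = True) -> List[int]:
    """Indices of the n largest (descending=True) or smallest weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i],
                   reverse=descending)
    return order[:n]


def random_indices(weights: Sequence[float], n: int,
                   rng: random.Random) -> List[int]:
    """n distinct indices chosen uniformly, ignoring the weights."""
    return rng.sample(range(len(weights)), n)


def roulette_indices(weights: Sequence[float], n: int,
                     rng: random.Random) -> List[int]:
    """n distinct indices drawn with probability proportional to weight."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(n):
        picked = rng.choices(remaining,
                             weights=[weights[i] for i in remaining], k=1)[0]
        chosen.append(picked)
        remaining.remove(picked)  # draw without replacement
    return chosen


attention = [0.05, 0.40, 0.10, 0.30, 0.15]  # toy attention weights
ratio = 0.4                                  # hypothetical mask ratio
n_mask = int(len(attention) * ratio)         # 2 entries
rng = random.Random(0)

print(top_indices(attention, n_mask, descending=True))   # [1, 3]
print(top_indices(attention, n_mask, descending=False))  # [0, 2]
print(roulette_indices(attention, n_mask, rng))          # two distinct indices
```

Max-weight masking removes the entries the attention layer relies on most, min-weight masking removes the least attended ones, and roulette selection sits in between by sampling in proportion to the weights.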

Downstream Evaluation

python model_clr_downstream.py --batch_size 128 --epochs 500 
optional arguments:
--model_path                  The pretrained model path [default value is 'results/128_0.5_200_512_500_model.pth']
--batch_size                  Number of molecules in each mini-batch [default value is 128]
--epochs                      Number of sweeps over the dataset to train [default value is 500]
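The default --model_path looks like it encodes the pretraining defaults (feature_dim 128, temperature 0.5, k 200, batch_size 512, epochs 500). Assuming that naming scheme, which is inferred from the default value rather than confirmed by the code, the path for a checkpoint from another pretraining run could be derived as follows. `checkpoint_path` is a hypothetical helper.

```python
def checkpoint_path(feature_dim: int, temperature: float, k: int,
                    batch_size: int, epochs: int,
                    root: str = "results") -> str:
    """Build '<root>/<feature_dim>_<temperature>_<k>_<batch_size>_<epochs>_model.pth'."""
    return f"{root}/{feature_dim}_{temperature}_{k}_{batch_size}_{epochs}_model.pth"


# The pretraining defaults reproduce the documented default model path.
print(checkpoint_path(128, 0.5, 200, 512, 500))
# results/128_0.5_200_512_500_model.pth
```

Under the same assumption, a model pretrained with --batch_size 256 would be passed to the downstream script as --model_path results/128_0.5_200_256_500_model.pth.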

Results

The model was trained on three NVIDIA 3090 GPUs.
