Dev/feat/load data from archive #8

Merged
merged 38 commits into from
Jun 26, 2024
Changes from 33 commits
Commits
fb185f5
use correct type annotation (Path not str)
nmaeder Jun 14, 2024
11b7fad
add git keep for data to be downloaded
nmaeder Jun 14, 2024
bf6a0fa
initial commit
nmaeder Jun 14, 2024
6d82fa5
add function skeletons
nmaeder Jun 14, 2024
8b351dc
more functionality added
nmaeder Jun 14, 2024
631ab23
more exceptions for data setup
nmaeder Jun 14, 2024
c3c2835
formatting
nmaeder Jun 14, 2024
3a2cf04
add functionality and tests
nmaeder Jun 14, 2024
a1eee7d
initial commit
nmaeder Jun 14, 2024
338e70d
added sepearate tests directory with retrieve data tests and testfiles
nmaeder Jun 17, 2024
6489db3
add tree types and tree factory
nmaeder Jun 17, 2024
4b4b9df
formatting
nmaeder Jun 17, 2024
377e12b
open for rist test by @MTLehner
nmaeder Jun 18, 2024
89b4f77
cleanup paths and fix tests
nmaeder Jun 18, 2024
e0a8f8c
remove data fetch from setup
nmaeder Jun 18, 2024
01b7ada
remove unused import
nmaeder Jun 18, 2024
7ae4cab
fixed tree import and tree-file download
MTLehner Jun 18, 2024
9cb902f
fix python 3.9 incompatible syntax
MTLehner Jun 18, 2024
6f9c391
update README
MTLehner Jun 18, 2024
f5b0cfe
fix pre-commit
MTLehner Jun 18, 2024
0b25c77
fix typo
MTLehner Jun 18, 2024
1018336
fix table nameing
MTLehner Jun 18, 2024
13dc907
remove last parts of versioneer
MTLehner Jun 18, 2024
5d78517
remove last-last parts of versioneer
MTLehner Jun 18, 2024
ebceaea
thoroughly test retrieve_data
nmaeder Jun 19, 2024
14e508e
remove pytorch channels
nmaeder Jun 19, 2024
f1e803a
run new tests too in ci
nmaeder Jun 19, 2024
6115a55
change paths
nmaeder Jun 19, 2024
a4d21d9
add utils tests
nmaeder Jun 19, 2024
d7fddc1
add utils tests
nmaeder Jun 19, 2024
6bbde08
run it on ci
nmaeder Jun 19, 2024
47c25c8
formatting
nmaeder Jun 20, 2024
60ab6e6
fist test
nmaeder Jun 20, 2024
735a1e4
fix CI skipif condition
nmaeder Jun 21, 2024
be7b102
add tree_factory tests
nmaeder Jun 21, 2024
9472c36
dont use setup.py for installing
nmaeder Jun 21, 2024
25f8b0c
also add testfiles
nmaeder Jun 21, 2024
4045261
pre-commit run
nmaeder Jun 21, 2024
1 change: 0 additions & 1 deletion .codecov.yml
@@ -1,4 +1,3 @@
ignore:
- "serenityff/charge/data"
- "serenityff/charge/examples"
- "serenityff/charge/_version.py"
2 changes: 1 addition & 1 deletion .github/workflows/CI.yaml
@@ -52,7 +52,7 @@ jobs:
shell: bash -l {0}
run: |
python setup.py install
pytest -v --color=yes serenityff/charge/tests/ --cov=serenityff --cov-report=xml
pytest -v --color=yes --cov=serenityff --cov-report=xml serenityff/charge/tests/ tests/

analyze:
name: Analyze
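
The commit list mentions a CI-specific `skipif` condition, but the condition itself is not shown in this diff. As a minimal sketch of how such a check could look, assuming a hypothetical helper `running_on_github_actions` (not taken from the PR) and relying only on the `GITHUB_ACTIONS` environment variable that GitHub Actions sets to `true`:

```python
import os
from typing import Mapping, Optional


def running_on_github_actions(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the process appears to run inside GitHub Actions.

    Hypothetical helper for illustration only; the PR's real condition
    is not visible in this diff.
    """
    # Fall back to the real environment when no mapping is passed in.
    env = os.environ if env is None else env
    return env.get("GITHUB_ACTIONS") == "true"
```

A test could then be guarded with something like `pytest.mark.skipif(running_on_github_actions(), reason="...")`, so that tests needing downloaded data are skipped on CI only.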
2 changes: 2 additions & 0 deletions .gitignore
@@ -133,3 +133,5 @@ dmypy.json

# SerenityFF charge dev files
dev
serenityff/charge/data/additional_data/**
!serenityff/charge/data/additional_data/.gitkeep
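
The two rules above ignore everything under `serenityff/charge/data/additional_data/` except the `.gitkeep` placeholder, so the directory stays tracked while downloaded files do not. A minimal sketch of how code might tell whether any data has been downloaded yet, assuming a hypothetical helper `has_downloaded_data` that is not part of this PR:

```python
from pathlib import Path


def has_downloaded_data(data_dir: Path) -> bool:
    """Return True if data_dir holds anything besides the .gitkeep placeholder.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    # A missing directory means nothing has been downloaded.
    if not data_dir.is_dir():
        return False
    # Any entry other than the placeholder counts as downloaded data.
    return any(entry.name != ".gitkeep" for entry in data_dir.iterdir())
```

Ignoring the placeholder by name keeps the check consistent with the `.gitignore` negation rule.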
11 changes: 0 additions & 11 deletions .lgtm.yml

This file was deleted.

1 change: 0 additions & 1 deletion .pre-commit-config.yaml
@@ -44,7 +44,6 @@ repos:
# Ignore imports in init files
"--per-file-ignores=
*/__init__.py:F401,setup.py:E121
versioneer.py:W605
",
# ignore long comments (E501), as long lines are formatted by black
# ignore Whitespace before ':' (E203)
2 changes: 0 additions & 2 deletions MANIFEST.in
@@ -1,4 +1,2 @@
include LICENSE
include MANIFEST.in
include versioneer.py
include serenityff/_version.py
58 changes: 47 additions & 11 deletions README.md
@@ -3,8 +3,9 @@ Welcome to DASH

[//]: # (Badges)
[![CI](https://github.com/rinikerlab/DASH-tree/actions/workflows/CI.yaml/badge.svg)](https://github.com/rinikerlab/DASH-tree/actions/workflows/CI.yaml)
[![pre-commit](https://github.com/rinikerlab/DASH-tree/actions/workflows/pre-commit.yml/badge.svg?branch=main)](https://github.com/rinikerlab/DASH-tree/actions/workflows/pre-commit.yml)
[![arXiv](https://img.shields.io/badge/arXiv-2305.15981-b31b1b.svg)](https://doi.org/10.48550/arXiv.2305.15981)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![arXiv DASH](https://img.shields.io/badge/arXiv:DASH-2305.15981-b31b1b.svg)](https://doi.org/10.48550/arXiv.2305.15981)
[![arXiv Props](https://img.shields.io/badge/chemrxiv:Props-2024_0ks0p-b31b1b.svg)](https://chemrxiv.org/engage/chemrxiv/article-details/6666e12112188379d8c44aa5)



@@ -13,18 +14,18 @@ Description

Welcome to DASH. This repository is a collection of scripts, tools and other resources for partial charges in MD simulations.

It contains tools to generate partial charges for given molecules quickly, using the DASH-charge workflow and using openff to generate all other parameters. A pre computed DASH (Dynamic Attention-based Substructure Hierarchy) tree can be used to generate charges for a given molecule.
It contains tools to quickly generate partial charges and other atomic and molecular properties for given molecules. The DASH-charge workflow, together with the OpenFF plugin and force field, parametrizes molecules quickly with QM-like charge quality. A pre-computed DASH (Dynamic Attention-based Substructure Hierarchy) tree is included and can be used to generate charges for a given molecule. Trees with different properties are available for download from the ETHZ Research Collection.

Additionally, this repository contains all tools and functions needed to generate a new decision tree for partial charge assignmend, based on the attention data of a graph neural network, cabable of predicting the partial charges of a molecule.
Additionally, this repository contains all tools and functions needed to generate a new DASH tree for any property, like partial charge assignments, based on the attention data of a graph neural network, capable of predicting the partial charges of a molecule.

This repository contains code for the publication by M. Lehner et al. DOI: [arXiv:2305.15981](https://doi.org/10.48550/arXiv.2305.15981) and [10.1021/acs.jcim.3c00800](https://pubs.acs.org/doi/full/10.1021/acs.jcim.3c00800)
This repository contains code for the publication by M. Lehner et al. DOI: [arXiv:2305.15981](https://doi.org/10.48550/arXiv.2305.15981), [10.1021/acs.jcim.3c00800](https://pubs.acs.org/doi/full/10.1021/acs.jcim.3c00800), and [chemrxiv](https://chemrxiv.org/engage/chemrxiv/article-details/6666e12112188379d8c44aa5).


Content
-------------

* **Data Preparation**
* Select data from Database (ChEMBL)
* Select data from the Database (ChEMBL)
* Generate diverse data set
* Generate feature vectors

@@ -35,7 +36,7 @@ Content
* Attention Extraction (GNNExplainer)

* **DASH Tree**
* Generate DASH tree (attention based)
* Generate DASH tree (attention-based)
* Tools for DASH tree
* file I/O
* pruning
@@ -50,7 +51,7 @@ Content
* Tools for validation of the charges with OpenFF-Evaluator

* **Examples**
* Examples for all important functions and tools
* Examples of all important functions and tools
* A good starting point for new users


@@ -60,9 +61,44 @@ Installation
This repository comes with a conda environment file. To install the environment, run the following command in the root directory of this repository:

```bash
conda env create -f environment.yml
conda env create -f min_environment.yml
conda activate dash
python setup.py install
conda develop .
```

This will create a conda enviroment with the correct packages and install the openff plugin for partial charge assignment in openff. If you plan on only using the DASH tree and not developing new trees you can also use the file `min_environment.yml` instead, which does not contain any pytorch and pytorch geometric libraries.
This will create a conda environment with the correct packages. The environment contains the minimal dependencies required to use the tree. If the OpenFF plugin for automatic charge assignment is required, use the file `environment.yml` instead. This creates the environment, installs the DASH package, and installs the OpenFF plugin for partial charge assignment in OpenFF. The file `environment.yml` also contains all torch dependencies used for DASH tree development.


Usage
-------------

A default tree for MBIS partial charges is included in the repository. To use the tree, the following code can be used:

```python
# Import the DASH tree and RDKit
from rdkit import Chem
from serenityff.charge.tree.dash_tree import DASHTree
# Load the default tree
tree = DASHTree()
# Create an RDKit molecule
example_mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))
# Assign charges
charges = tree.get_molecules_partial_charges(example_mol)["charges"]
```

More atomic and molecular properties can be calculated from a DASH tree populated with these properties.

```python
# Import the DASH tree and RDKit
from rdkit import Chem
from serenityff.charge.tree.dash_tree import DASHTree, TreeType
from serenityff.charge.data import dash_props_tree_path
# Load the property tree.
# Note that the files are downloaded automatically from the ETHZ Research Collection the first time the tree is loaded.
tree = DASHTree(tree_folder_path=dash_props_tree_path, tree_type=TreeType.FULL)
# Create an RDKit molecule
example_mol = Chem.AddHs(Chem.MolFromSmiles('CCO'))
# Get a new property
tree.get_property_noNAN(mol=example_mol, atom=0, property_name="DFTD4:C6")
# Or get partial charges with a different model
charges = tree.get_molecules_partial_charges(example_mol, chg_key="AM1BCC", chg_std_key="AM1BCC_std")["charges"]
```
4 changes: 2 additions & 2 deletions serenityff/charge/__init__.py
@@ -1,8 +1,8 @@
# from serenityff.charge.gnn import ChargeCorrectedNodeWiseAttentiveFP, NodeWiseAttentiveFP, Extractor, Trainer

from . import _version
# from . import _version

__version__ = _version.get_versions()["version"]
__version__ = "2.0.1"

# __all__ = [
# Trainer,
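
The change above replaces the versioneer lookup with a hard-coded version string. One common alternative, shown here only as a sketch and not as what this PR does, is to read the version from the installed package metadata and fall back to a literal when the package is not installed. The helper name `resolve_version` is hypothetical, and the distribution name passed to it would have to match the project's actual packaging metadata, which this diff does not show.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str) -> str:
    """Return the installed version of dist_name, or fallback if not installed.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Happens e.g. when running from a source checkout without installing.
        return fallback
```

The hard-coded string in the PR is simpler and has no runtime lookup; the metadata approach avoids keeping the version in two places.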