Added README.md and minor documentation changes #9

Merged
merged 4 commits into from
Sep 11, 2022
2 changes: 1 addition & 1 deletion Code/GraphMol/DetermineBonds/DetermineBonds.cpp
@@ -407,7 +407,7 @@ void determineBonds(RWMol &mol, bool useHueckel, int charge, double covFactor,

determineConnectivity(mol, useHueckel, charge, covFactor);
determineBondOrder(mol, charge, allowChargedFragments, embedChiral, useAtomMap);
-}
+} // determineBonds()

} // namespace RDKit

4 changes: 2 additions & 2 deletions Code/GraphMol/DetermineBonds/DetermineBonds.h
@@ -17,7 +17,7 @@ namespace RDKit {
// ! assigns atomic connectivity to a molecule using atomic coordinates, disregarding pre-existing bonds
/*!
\param mol is the molecule of interest; it must have a 3D conformer
-  \param useHueckel (optional) if this is \c true, the extended Hueckel theorem will be used to determine connectivity
+  \param useHueckel (optional) if this is \c true, extended Hueckel theory will be used to determine connectivity
rather than the van der Waals method
\param charge (optional) the charge of the molecule; it must be provided if the Hueckel method is used and charge is non-zero
\param covFactor (optional) the factor with which to multiply each covalent radius if the van der Waals method is used
@@ -42,7 +42,7 @@ void determineBondOrder(RWMol &mol, int charge=0,
// it is recommended to sanitize the molecule after calling this function if embedChiral is not set to true
/*!
\param mol is the molecule of interest; it must have a 3D conformer
-  \param useHueckel (optional) if this is \c true, the extended Hueckel theorem will be used to determine connectivity
+  \param useHueckel (optional) if this is \c true, extended Hueckel theory will be used to determine connectivity
rather than the van der Waals method
\param charge (optional) the charge of the molecule; it must be provided if charge is non-zero
\param covFactor (optional) the factor with which to multiply each covalent radius if the van der Waals method is used
51 changes: 51 additions & 0 deletions Code/GraphMol/DetermineBonds/README.md
@@ -0,0 +1,51 @@
# GSOC 2022: Integrating xyz2mol Into The RDKit

### Organization: OpenChemistry
### Student: Sreya Gogineni
### Mentors: Greg Landrum, Joey Storer, Jan H. Jensen

This summer, I worked on integrating xyz2mol into the RDKit, an open-source cheminformatics library. [xyz2mol](https://github.com/jensengroup/xyz2mol) was originally developed by Professor Jan H. Jensen's research group at the University of Copenhagen, based on the work published in [this paper](https://onlinelibrary.wiley.com/doi/10.1002/bkcs.10334) (DOI: 10.1002/bkcs.10334).

Given a molecule's charge and the spatial location of each atom, the program predicts the molecule's most favorable set of bonds. A user passes in the molecule's XYZ file, a file format common in computational chemistry that lists each atom's element and Cartesian coordinates, and gets back an RDKit molecule object with the predicted bonds in place.

As the original program was written in Python, the heart of this project was translating it into C++, the language of the RDKit core.

Integrating xyz2mol into the RDKit required
- adding an XYZ file parser,
- implementing atomic connectivity determination (knowing which atoms are bonded to each other),
- implementing bond order determination (knowing whether each bond is single, double, or triple), and
- adding Python and Java bindings.

As of the end of the GSOC coding period, the first 3 steps have been completed. The final step, adding bindings to make the features available to RDKit Python and Java users, remains to be finished.

## The XYZ File Parser

As with other RDKit file parsers (such as the Mol file parser), the XYZ parser constructs an RDKit molecule from the file data. Since an XYZ file contains only the element and position of each atom, the resulting molecule has atoms but no bonds, along with a conformer holding the atomic coordinates. The parser is called through the function ```XYZFileToMol()```.
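To make the format concrete: an XYZ file starts with an atom-count line and a free-text comment line, followed by one ```symbol x y z``` line per atom. The standalone C++ sketch below reads that layout into plain structs. It is not the RDKit implementation; the names ```XYZAtom``` and ```parseXYZ()``` are hypothetical, and a real parser such as ```XYZFileToMol()``` would instead create RDKit atoms and store the coordinates in a conformer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical plain-data record for one atom line of an XYZ file.
struct XYZAtom {
  std::string element;  // element symbol, e.g. "C"
  double x, y, z;       // Cartesian coordinates (conventionally in Angstrom)
};

// Sketch of an XYZ reader: atom count, comment line, then one
// "symbol x y z" line per atom. No bonds are read; the format has none.
std::vector<XYZAtom> parseXYZ(std::istream &in) {
  std::string line;
  if (!std::getline(in, line)) throw std::runtime_error("missing atom count");
  const std::size_t numAtoms = std::stoul(line);
  std::getline(in, line);  // comment line, ignored in this sketch
  std::vector<XYZAtom> atoms;
  atoms.reserve(numAtoms);
  for (std::size_t i = 0; i < numAtoms; ++i) {
    if (!std::getline(in, line)) throw std::runtime_error("too few atom lines");
    std::istringstream fields(line);
    XYZAtom atom;
    if (!(fields >> atom.element >> atom.x >> atom.y >> atom.z)) {
      throw std::runtime_error("malformed atom line: " + line);
    }
    atoms.push_back(atom);
  }
  assert(atoms.size() == numAtoms);
  return atoms;
}
```

Because the format carries no bond information, anything built from this data starts with atoms and coordinates only, which is why the connectivity step below is needed.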

## Atomic Connectivity Determination

The original xyz2mol offers two methods of predicting connectivity: the van der Waals method and the Hueckel method. The former predicts bonds from interatomic distances and the atoms' covalent radii, while the Hueckel method uses extended Hueckel theory.

Both methods are available through the function ```determineConnectivity()```, which modifies the passed-in molecule object in place, adding a single bond wherever a bond is predicted.
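As a rough illustration of the van der Waals method, the sketch below marks two atoms as bonded when their distance is at most ```covFactor``` times the sum of their covalent radii, matching how the ```covFactor``` parameter is described in the header. The radii table (approximate single-bond covalent radii for H, C, N, and O), the default factor of 1.3, and the names ```Point``` and ```connectivityFromRadii()``` are assumptions for illustration, not RDKit code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record: element symbol plus coordinates in Angstrom.
struct Point {
  std::string element;
  double x, y, z;
};

// Approximate covalent radii in Angstrom, for illustration only.
// Unknown elements make std::map::at throw std::out_of_range.
double covalentRadius(const std::string &element) {
  static const std::map<std::string, double> radii = {
      {"H", 0.31}, {"C", 0.76}, {"N", 0.71}, {"O", 0.66}};
  return radii.at(element);
}

// Symmetric adjacency matrix: atoms i and j are bonded when their distance
// does not exceed covFactor * (r_i + r_j).
std::vector<std::vector<int>> connectivityFromRadii(
    const std::vector<Point> &atoms, double covFactor = 1.3) {
  assert(covFactor > 0.0);
  const std::size_t n = atoms.size();
  std::vector<std::vector<int>> adj(n, std::vector<int>(n, 0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i + 1; j < n; ++j) {
      const double dx = atoms[i].x - atoms[j].x;
      const double dy = atoms[i].y - atoms[j].y;
      const double dz = atoms[i].z - atoms[j].z;
      const double dist = std::sqrt(dx * dx + dy * dy + dz * dz);
      const double cutoff = covFactor * (covalentRadius(atoms[i].element) +
                                         covalentRadius(atoms[j].element));
      if (dist <= cutoff) adj[i][j] = adj[j][i] = 1;
    }
  }
  return adj;
}
```

For a water geometry, the two O-H distances (about 0.96 Å) fall under the O-H cutoff, while the H-H distance (about 1.51 Å) exceeds the H-H cutoff, so only the two O-H bonds are predicted. The result says nothing about bond order; every predicted bond starts out single.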

## Bond Order Determination

Determining bond order (whether a bond is single, double, or triple) was the largest part of this project. Given a molecule object whose bonds reflect its atomic connectivity, the function ```determineBondOrder()``` further modifies the molecule to give it a favorable bond ordering. In addition, the function ```determineBonds()``` calls both ```determineConnectivity()``` and ```determineBondOrder()```, giving users of the original xyz2mol a similar workflow.
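The central idea of bond order assignment can be sketched as follows: start from single bonds given by the connectivity, then repeatedly raise the order of bonds between neighboring atoms that both still have unsaturation left (target valence minus current bond-order sum). This is a simplified, hypothetical sketch (```assignBondOrdersGreedy()``` is not an RDKit function): it assumes one fixed target valence per atom and pairs atoms greedily in index order, whereas the real implementation considers several valence combinations and uses graph algorithms for the pairing, so this greedy version can miss valid assignments on harder molecules.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Sum of bond orders around atom i.
int bondOrderSum(const Matrix &bo, std::size_t i) {
  int sum = 0;
  for (int order : bo[i]) sum += order;
  return sum;
}

// Start from the adjacency matrix (all single bonds). In each round, every
// atom takes part in at most one increment, chosen greedily in index order,
// and only between bonded neighbours that both still have unsaturation.
// Stops when a full round makes no change. Not a maximum matching.
Matrix assignBondOrdersGreedy(const Matrix &adjacency,
                              const std::vector<int> &valences) {
  const std::size_t n = adjacency.size();
  assert(valences.size() == n);
  Matrix bo = adjacency;
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<bool> used(n, false);
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = i + 1; j < n; ++j) {
        if (!adjacency[i][j] || used[i] || used[j]) continue;
        if (valences[i] - bondOrderSum(bo, i) > 0 &&
            valences[j] - bondOrderSum(bo, j) > 0) {
          ++bo[i][j];
          ++bo[j][i];
          used[i] = used[j] = true;
          changed = true;
        }
      }
    }
  }
  return bo;
}
```

For acetylene, with atoms H, C, C, H in a chain and target valences 1, 4, 4, 1, the first round raises the C-C bond to a double bond and the second round to a triple bond, after which no atom has unsaturation left.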

Some of the more interesting tasks in implementing this function were writing an algorithm to compute the Cartesian product of an arbitrary number of input vectors of arbitrary size, and using the Boost Graph Library.
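One way to compute such a Cartesian product without recursion is an odometer over indices: advance the rightmost index and carry leftwards on overflow. The sketch below is a generic illustration with the hypothetical name ```cartesianProduct()```, not the code written for this project. In the xyz2mol setting, each input vector might hold the allowed valences of one atom, so that each combination is one candidate valence assignment for the whole molecule.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns every combination that picks one element from each input vector,
// with the rightmost vector varying fastest.
std::vector<std::vector<int>> cartesianProduct(
    const std::vector<std::vector<int>> &inputs) {
  std::vector<std::vector<int>> result;
  for (const auto &v : inputs) {
    if (v.empty()) return result;  // an empty factor makes the product empty
  }
  std::vector<std::size_t> idx(inputs.size(), 0);  // current odometer reading
  while (true) {
    std::vector<int> combo;
    combo.reserve(inputs.size());
    for (std::size_t k = 0; k < inputs.size(); ++k) {
      combo.push_back(inputs[k][idx[k]]);
    }
    result.push_back(std::move(combo));
    // Advance the rightmost index; on overflow reset it and carry leftwards.
    std::size_t k = inputs.size();
    while (k > 0 && ++idx[k - 1] == inputs[k - 1].size()) {
      idx[k - 1] = 0;
      --k;
    }
    if (k == 0) return result;  // carried past the leftmost position: done
  }
}
```

For example, the inputs ```{4}```, ```{3}```, and ```{2, 4, 6}``` (hypothetical valence options for three atoms) yield three combinations, ending with ```{4, 3, 6}```. The ordering matches Python's ```itertools.product```, and the iterative form avoids recursion depth growing with the number of atoms.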

## Looking Ahead

Integrating xyz2mol into the RDKit also made its capabilities more modular. While the original program performed file parsing, connectivity determination, and bond order determination all at once, users can now run the three tasks independently of one another. This lets them swap in other connectivity or bond order determination methods, or simply read an XYZ file without using the rest of xyz2mol.

A lot of progress was made this summer in integrating xyz2mol into the RDKit, but there is still more work to be done. The first order of business is a more comprehensive review of bond order determination for accuracy, which will involve thorough code review and possibly testing on a larger, more diverse set of molecules. And, as mentioned earlier, Python and Java bindings still need to be added.