Machine learning tools for predicting rate constants from enzyme dynamics.
The scripts in dl_kcat train deep learning models to predict an enzyme mutant's specific activity from its pre-reaction structural dynamics. These dynamics are captured by a set of manually engineered features (interatomic distances, angles, and torsions) that together describe the structure of the active site over a time interval. The data are therefore a multivariate time series composed of 70 features and typically 30 or more time points. The time series are drawn from molecular dynamics simulations of attempted reactions, sampled using transition interface sampling (TIS).
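Each feature is a simple geometric quantity computed frame by frame from atomic coordinates. The sketch below shows how a distance, an angle, and a torsion can be computed in plain Python and stacked into a T x F matrix. It assumes coordinates are available as (x, y, z) tuples per atom per frame. The atom indices 0-3 and the three example features are hypothetical placeholders, not the 70 features used in this repo:

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def distance(a: Vec, b: Vec) -> float:
    """Interatomic distance between atoms a and b."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))


def angle(a: Vec, b: Vec, c: Vec) -> float:
    """Bond angle a-b-c in degrees (b is the vertex)."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def torsion(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Dihedral angle a-b-c-d in degrees, in the range [-180, 180]."""
    b0 = _sub(a, b)
    b1 = _sub(c, b)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    b2 = _sub(d, c)
    # Project both outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def featurize(frames: Sequence[Sequence[Vec]]) -> List[List[float]]:
    """Map a trajectory (frames x atoms x xyz) to a T x F feature matrix.

    Hypothetical: atoms 0-3 stand in for whichever active-site atoms a real
    feature would use; the repo's actual feature set is not reproduced here.
    """
    return [
        [
            distance(xyz[0], xyz[1]),
            angle(xyz[0], xyz[1], xyz[2]),
            torsion(xyz[0], xyz[1], xyz[2], xyz[3]),
        ]
        for xyz in frames
    ]
```

Torsions wrap around at ±180°, so one option (again an assumption, not necessarily what this repo does) is to give a sequence model the sine and cosine of each torsion rather than the raw angle.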
The simulated reaction is a methyl transfer:
We use TIS to collect examples of both successful and failed reaction attempts by the enzyme.
The substrate is shown in light orange, and its orientation corresponds to the Lewis structure above. NADPH is shown in light purple. Mg ions are shown as white spheres, with their coordinating waters in stick representation. The enzyme is shown in light gray. The migrating methyl,
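In TIS, each sampled path begins in the reactant basin, crosses an interface, and then either falls back to the reactant basin (a failed attempt) or reaches the product basin (a successful one). Turning paths into labeled training examples can be sketched as follows. This assumes a scalar order parameter lambda is recorded for every frame and increases from reactants to products. For the methyl migration, one hypothetical choice is the length of the breaking bond minus the length of the forming bond. The basin boundaries lambda_a and lambda_b and the function name are likewise illustrative and do not come from this repo:

```python
from typing import Optional, Sequence


def label_path(order_param: Sequence[float], lambda_a: float,
               lambda_b: float) -> Optional[int]:
    """Label one TIS path as successful (1) or failed (0).

    Hypothetical convention: reactant basin A is lambda <= lambda_a and
    product basin B is lambda >= lambda_b. A TIS path ends when it enters
    A or B, so only the final frame is checked. Returns None for a path
    that ends in neither basin (e.g. a truncated trajectory).
    """
    if not order_param:
        return None
    end = order_param[-1]
    if end >= lambda_b:
        return 1  # reached products: successful attempt
    if end <= lambda_a:
        return 0  # returned to reactants: failed attempt
    return None
```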
This repo provides scripts for processing data from these simulations and using them to train transformer- or LSTM-based models for various learning tasks. These models and analyses serve two purposes: (i) to understand and identify structural drivers of catalysis, and (ii) to predict a mutant enzyme's catalytic activity from limited data.
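Because trajectories differ in length (typically 30 or more frames), one common way to batch them for an LSTM or transformer is to bring every series to a shared length and keep a mask that marks the real frames. The sketch below does this in plain Python. The function name, zero padding, and the choice to keep the most recent frames and left-pad (so all series end together) are assumptions, not a description of what pred_kcat.py does:

```python
from typing import List, Sequence, Tuple


def pad_batch(series: Sequence[Sequence[Sequence[float]]], max_len: int,
              pad_value: float = 0.0) -> Tuple[List[List[List[float]]], List[List[bool]]]:
    """Pad or truncate each T_i x F series to max_len x F, aligned on its last frame.

    Returns (batch, mask), where mask[i][t] is True for a real frame and
    False for padding. Series longer than max_len keep their last max_len frames.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    batch, mask = [], []
    for x in series:
        if not x:
            raise ValueError("empty time series")
        n_feat = len(x[0])
        kept = [list(row) for row in x[-max_len:]]   # most recent frames, copied
        n_pad = max_len - len(kept)
        pad_rows = [[pad_value] * n_feat for _ in range(n_pad)]
        batch.append(pad_rows + kept)                 # left-pad so series end together
        mask.append([False] * n_pad + [True] * len(kept))
    return batch, mask
```

If the models are built in PyTorch, note that nn.TransformerEncoder's src_key_padding_mask uses the opposite convention (True marks padded positions to ignore), so this mask would need to be inverted before being passed in.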
The main working scripts are ./scripts/transformer_1.py and ./scripts/lstm_1.py, both of which draw most of their functions from ./scripts/pred_kcat.py.
For a more in-depth demonstration, see the ./demo/ folder.