https://doi.org/10.1021/acs.chemrestox.0c00304
Aim:
- Development of a battery of in silico models for a set of targets involved in molecular initiating events (MIEs) of thyroid hormone homeostasis: deiodinases 1, 2, and 3 (DIO1, DIO2, DIO3), thyroid peroxidase (TPO), thyroid hormone receptor (TR), sodium/iodide symporter (NIS), thyrotropin-releasing hormone receptor (TRHR), and thyroid-stimulating hormone receptor (TSHR)
Data type:
- ToxCast data for the MIEs (inactive/active compounds)
- Literature data for the negative (inactive) compounds of some assays
  - For DIO1, TPO, and NIS, the ToxCast database only includes compounds that were tested in a multiconcentration assay (after having tested active in a single-concentration assay). Therefore, information on the inactive compounds (i.e., those that tested negative in the single-concentration assay) was collected from the scientific literature (note that these publications originate from the same lab that produced large parts of the ToxCast data)
- Compounds with an inhibition rate of 20% or higher in the multiconcentration assay were labeled “active”; all other compounds, including those showing <50% inhibition in the single-concentration assay, were labeled “inactive”
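
The labeling rule above can be sketched as a small Python function. This is a minimal illustration, not the authors' code: it assumes each compound has at most one inhibition value (in %) from the multiconcentration assay and `None` when it was only tested at a single concentration; `label_compound`, `ACTIVE_THRESHOLD`, and the compound names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical constant: the 20% inhibition cut-off in the multiconcentration assay.
ACTIVE_THRESHOLD = 20.0


def label_compound(multiconc_inhibition: Optional[float]) -> str:
    """Return "active" or "inactive" for one compound in one assay.

    `multiconc_inhibition` is the inhibition (%) measured in the
    multiconcentration assay, or None if the compound was only tested in the
    single-concentration assay (e.g. a negative taken from the literature).
    """
    if multiconc_inhibition is not None and multiconc_inhibition >= ACTIVE_THRESHOLD:
        return "active"
    # All other compounds, including those never promoted to the
    # multiconcentration assay, are labeled inactive.
    return "inactive"


measurements: Dict[str, Optional[float]] = {"cmpd_A": 35.0, "cmpd_B": 12.5, "cmpd_C": None}
labels = {name: label_compound(value) for name, value in measurements.items()}
print(labels)
```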
Data curation:
- For the seven data sets collected from the ToxCast database, data points carrying any flag that indicates a potential quality issue were filtered out
- Data sets were also filtered for cytotoxicity and nonspecific activity
- KNIME was used for structure preparation (ChemAxon Standardizer and RDKit Canon SMILES nodes) and descriptor calculation (RDKit Count-Based Fingerprint and RDKit Descriptor Calculation nodes)
- The ChemAxon Standardizer node was used to remove solvents, strip salts, detect and annotate aromaticity, remove stereochemical information, neutralize charges, mesomerize structures, and remove small fragments
- Canonical SMILES were derived from the standardized molecules with RDKit (with default parameters) and used for deduplication. Duplicate compounds with conflicting activity labels for an assay were removed
- The global thyroid toxicity data set, generated by merging the nine end-point-specific data sets based on the previously generated canonical SMILES, consists of 8001 substances
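
A minimal sketch of the deduplication and merging steps, assuming canonical SMILES have already been computed (in the paper, with RDKit in KNIME) and labels are encoded as 1 (active) and 0 (inactive). The SMILES strings below are placeholders treated as already canonical, the function names are hypothetical, and representing untested end points as `None` in the merged table is an assumption about how the global data set is laid out.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Record = Tuple[str, int]  # (canonical SMILES, label: 1 = active, 0 = inactive)


def deduplicate(records: List[Record]) -> Dict[str, int]:
    """Collapse duplicate SMILES; drop compounds whose duplicates disagree."""
    labels_by_smiles: Dict[str, Set[int]] = defaultdict(set)
    for smiles, label in records:
        labels_by_smiles[smiles].add(label)
    return {
        smiles: next(iter(labels))
        for smiles, labels in labels_by_smiles.items()
        if len(labels) == 1  # keep only consistently labeled compounds
    }


def merge_endpoints(
    datasets: Dict[str, Dict[str, int]]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Outer-join end-point data sets on canonical SMILES (None = not tested)."""
    all_smiles = sorted(set().union(*(d.keys() for d in datasets.values())))
    return {
        smiles: {endpoint: d.get(smiles) for endpoint, d in datasets.items()}
        for smiles in all_smiles
    }


# Placeholder SMILES, assumed to be canonical already.
tpo = deduplicate([("CCO", 0), ("CCO", 0), ("Oc1ccccc1", 1), ("Oc1ccccc1", 0)])
dio1 = deduplicate([("CCO", 1), ("CCN", 0)])
global_set = merge_endpoints({"TPO": tpo, "DIO1": dio1})
print(global_set)
```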
Data issue:
- The modelled data originate from high-throughput screening (HTS) assays and are therefore often error-prone
- False positive outcomes may occur if, for instance, a nonspecific interaction between a compound and a protein is measured, or if a compound is falsely perceived as active due to its cytotoxicity
- False negative outcomes may be caused by the volatility or low solubility of compounds, which reduces their concentration in the assay sample. In some cases, they may also be caused by the cytotoxicity of compounds, as it impedes the identification of a possible interaction
Descriptor calculation:
- Count-based Morgan fingerprints with a radius of 2 bonds and a length of 2048 bits were calculated with the “RDKit Count-Based Fingerprint” node in KNIME
- All 119 one-dimensional (1D) and two-dimensional (2D) physicochemical property descriptors implemented in the “RDKit Descriptor Calculation” node were computed. Among other properties, they describe the numbers of particular atom types, bonds, and rings in a molecule, as well as its polarity and solubility
- Prior to model building, the 1D and 2D descriptors were subjected to Z-score normalization using the “Normalizer” node in KNIME
- Descriptors with zero variance across the global thyroid data set were removed
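
The two preprocessing steps for the 1D and 2D descriptors can be sketched with the standard library alone. In this hypothetical sketch, constant descriptors are dropped before scaling, because a Z-score divides by the standard deviation, which is zero for them. Using the population standard deviation is an assumption (the exact behaviour of KNIME's Normalizer node is not reproduced here), and the descriptor names are illustrative.

```python
import statistics
from typing import Dict, List

Columns = Dict[str, List[float]]  # descriptor name -> values over all compounds


def drop_constant(columns: Columns) -> Columns:
    """Remove descriptors that take a single value across the whole data set."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def z_score(columns: Columns) -> Columns:
    """Scale each descriptor to mean 0 and (population) standard deviation 1."""
    scaled: Columns = {}
    for name, vals in columns.items():
        mean = statistics.fmean(vals)
        sd = statistics.pstdev(vals)
        scaled[name] = [(v - mean) / sd for v in vals]
    return scaled


# Illustrative descriptor columns for three compounds.
descriptors: Columns = {
    "ring_count": [1.0, 2.0, 3.0],
    "constant_desc": [0.0, 0.0, 0.0],  # no variance, so it is removed
}
normalized = z_score(drop_constant(descriptors))
print(normalized)
```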
Dimensionality reduction:
- Dimensionality reduction was performed on the global thyroid data set with the PCA implementation of scikit-learn, based on a subset of 23 physically meaningful and interpretable molecular descriptors generated with RDKit
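
The paper used scikit-learn's PCA. As a dependency-free illustration of what PCA computes, the sketch below centres the data, builds the sample covariance matrix, and extracts the leading principal components by power iteration with deflation. It is a hypothetical teaching sketch, not numerically equivalent to scikit-learn's SVD-based implementation (for example, component signs may differ), and the toy points stand in for the 23 selected descriptors.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    """Subtract the column means from every row."""
    n = len(data)
    means = [sum(col) / n for col in zip(*data)]
    return [[x - m for x, m in zip(row, means)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centred data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def leading_eigenvector(cov: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Power iteration: repeatedly apply cov to a vector and renormalize."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left to explain
            return [0.0] * len(v)
        v = [x / norm for x in w]
    return v


def pca_scores(data: Matrix, n_components: int = 2) -> Matrix:
    """Project the centred data onto its first n_components principal axes."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(n_components):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components] for row in centered]


# Toy data lying on the line y = 2x; all variance falls on the first component.
points = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
scores = pca_scores(points, n_components=1)
print(scores)
```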
Data imbalance:
- Weight balancing, undersampling, and oversampling techniques were explored
- For the weight-balancing approach, balanced weights for the active and inactive classes were calculated with scikit-learn and used with the random forest (RF), logistic regression (LR), support vector machine (SVM), and neural network (NN) models. For XGBoost (XGB), balanced weights were not used, as the method itself is designed to deal with class imbalance by successively constructing training sets with misclassified examples
- For the undersampling approach, a workflow was developed (see the paper for details) that generates an ensemble of models, each built on a different training set
- For the oversampling approach, SMOTENC was employed, with the molecular fingerprints defined as categorical features. The `sampling_strategy` parameter, which sets the desired ratio of minority- to majority-class samples after resampling, was set to 0.7
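
The arithmetic behind two of these strategies is short. scikit-learn's "balanced" class weights are `n_samples / (n_classes * n_c)` for a class with `n_c` members, so each class contributes the same total weight; a minority-to-majority target ratio of 0.7 fixes how many synthetic minority samples oversampling must add. The sketch below computes both for a hypothetical 9:1 data set. The function names are hypothetical, the rounding in `n_synthetic_for_ratio` is an assumption that may differ from a given SMOTENC implementation, and the SMOTENC interpolation step itself is not shown.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[int]) -> Dict[int, float]:
    """Weight of class c = n_samples / (n_classes * n_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n_c) for c, n_c in sorted(counts.items())}


def n_synthetic_for_ratio(labels: List[int], ratio: float = 0.7) -> int:
    """Synthetic minority samples needed so that n_minority / n_majority == ratio."""
    counts = Counter(labels)
    n_majority, n_minority = max(counts.values()), min(counts.values())
    # Rounding to the nearest integer is an assumption of this sketch.
    return max(0, round(ratio * n_majority) - n_minority)


y = [0] * 900 + [1] * 100  # hypothetical data set with 10% actives
weights = balanced_class_weights(y)
extra = n_synthetic_for_ratio(y, ratio=0.7)
print(weights, extra)
```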
Chemical groups:
- The relationship between specific chemical groups and activity in the different assays was analysed by searching the active and inactive compounds of each data set for the 309 SMARTS patterns in the list “SMARTS Patterns for Functional Group Classification” distributed by Open Babel
- The number of hits per class was analysed, and a ratio, defined as the number of hits in active compounds divided by the number of hits in inactive compounds, was calculated. Only functional groups with ratios >1.7 were considered
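
The enrichment criterion can be sketched as follows, assuming SMARTS matching has already been done (for instance with Open Babel or RDKit) and each compound is represented by the set of functional-group names it matches, with one hit counted per matching compound. Group names and data are hypothetical; raw hit counts are compared as described above, and reporting groups absent from all inactives with an infinite ratio is an assumption.

```python
from collections import Counter
from typing import Dict, List, Set


def count_hits(matches: List[Set[str]]) -> Counter:
    """Count, per functional group, how many compounds contain it."""
    return Counter(group for groups in matches for group in groups)


def enriched_groups(
    active: List[Set[str]], inactive: List[Set[str]], threshold: float = 1.7
) -> Dict[str, float]:
    """Groups whose hit ratio (actives / inactives) exceeds the threshold."""
    active_hits, inactive_hits = count_hits(active), count_hits(inactive)
    result: Dict[str, float] = {}
    for group, n_active in active_hits.items():
        n_inactive = inactive_hits[group]  # Counter returns 0 for missing keys
        ratio = n_active / n_inactive if n_inactive else float("inf")
        if ratio > threshold:
            result[group] = ratio
    return result


# Hypothetical pre-computed SMARTS matches, one set per compound.
active_matches = [{"phenol", "halide"}, {"phenol"}, {"phenol", "ester"}]
inactive_matches = [{"ester"}, {"phenol"}, {"halide"}, {"ester"}]
print(enriched_groups(active_matches, inactive_matches))
```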
Approaches:
- Hyperparameters were optimized by grid search within a 10-fold cross-validation framework
  - The F1 score was used as the optimization criterion
- PCA as well as model training and evaluation were performed in Python with the packages scikit-learn and Keras
- In addition to the single-task models, a multi-task model was trained on the global thyroid toxicity data set
- The chemical space represented by the training data defines the applicability domain of a model
- The multi-task classification model was built as an NN
- Single-task models solve only one problem (assay outcome) at a time, whereas multi-task models learn several problems simultaneously and may hence benefit from regularization and transfer learning
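
The hyperparameter search can be sketched as a grid search in which each candidate is scored by its mean F1 over 10 cross-validation folds. To stay dependency-free, a toy k-nearest-neighbour classifier on one-dimensional data stands in for the paper's RF, LR, SVM, NN, and XGB models, and the folds are stratified by class so that every fold contains actives; both choices are assumptions, not details from the paper.

```python
import random
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 if there are no positives at all."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stratified_folds(y: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def knn_predict(X_tr, y_tr, X_te, k):
    """Toy k-nearest-neighbour classifier (majority vote, squared Euclidean distance)."""
    preds = []
    for x in X_te:
        order = sorted(range(len(X_tr)), key=lambda i: sum((a - b) ** 2 for a, b in zip(X_tr[i], x)))
        votes = sum(y_tr[i] for i in order[:k])
        preds.append(1 if 2 * votes > k else 0)
    return preds


def grid_search_cv(X, y, grid, k_folds: int = 10) -> Tuple[int, float]:
    """Return the grid value with the highest mean F1 across the folds."""
    folds = stratified_folds(y, k_folds)
    best_param, best_score = grid[0], -1.0
    for param in grid:
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            preds = knn_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx], param)
            fold_scores.append(f1_score([y[i] for i in test_idx], preds))
        mean_f1 = sum(fold_scores) / len(fold_scores)
        if mean_f1 > best_score:
            best_param, best_score = param, mean_f1
    return best_param, best_score


# Toy data: 30 inactives followed by 10 actives along one feature.
X = [[float(i)] for i in range(40)]
y = [1 if i >= 30 else 0 for i in range(40)]
best_k, best_f1 = grid_search_cv(X, y, grid=[1, 3, 5])
print(best_k, round(best_f1, 3))
```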
Model performance:
- Precision, recall, F1 score, Matthews correlation coefficient (MCC), balanced accuracy, and area under the receiver operating characteristic curve (AUC)
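
For reference, the listed metrics can be computed from a binary confusion matrix and, for AUC, from the predicted scores. This is a minimal stdlib sketch with made-up predictions; returning 0.0 when a denominator is zero is an assumed convention, and AUC is computed via its rank interpretation (the probability that a random active scores higher than a random inactive, ties counting one half).

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Threshold-based metrics from the binary confusion matrix (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "balanced_accuracy": (recall + specificity) / 2,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random active > score of a random inactive); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```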
Key learnings:
- ML models tended to perform best when trained on oversampled data
- Combining data measured in single- and multiconcentration assays may increase model uncertainty and lower performance
- An important limitation of the TSHR data is the assay technology, which uses fluorescent antibodies coupled to a second messenger to derive compound activity. Because this second messenger is nonspecific and can be activated via several pathways, and because fluorescent compounds and dyes can produce positive fluorescence readings, the false positive rate in the data may be substantial
- Determining the similarity of new compounds to those in the training sets can therefore help to estimate the reliability of the predictions
- The performance of the multi-task NN models was similar to that of the single-task NN model implementing the oversampling approach
- The reliability of the predictions is correlated with the similarity between the test compounds and the training instances as well as with the distance
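
One way to quantify how similar a new compound is to the training set is its maximum similarity to any training compound. The sketch below assumes sparse count-based fingerprints (feature id mapped to count) and uses the common count-vector generalization of the Tanimoto coefficient, the sum of minima over the sum of maxima. Which similarity measure the authors used is not stated here, the feature ids are made up, and `SIMILARITY_CUTOFF` is a hypothetical threshold.

```python
from typing import Dict, List

CountFP = Dict[int, int]  # sparse count fingerprint: feature id -> count


def tanimoto_counts(a: CountFP, b: CountFP) -> float:
    """Generalized Tanimoto for count vectors: sum of minima / sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def max_train_similarity(query: CountFP, training: List[CountFP]) -> float:
    """Similarity of a query compound to its nearest training compound."""
    return max((tanimoto_counts(query, fp) for fp in training), default=0.0)


SIMILARITY_CUTOFF = 0.5  # hypothetical threshold, not from the paper

training_fps = [{1: 2, 5: 1}, {3: 1, 7: 4}]
query_fp = {1: 2, 5: 2}
similarity = max_train_similarity(query_fp, training_fps)
in_domain = similarity >= SIMILARITY_CUTOFF
print(similarity, in_domain)
```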

Source: Garcia de Lomana et al. (2021)
