Releases · sacdallago/bio_embeddings
- Added the `esm1v` embedder from Meier et al. 2021, which is part of Facebook's esm. Note that this is an ensemble model, so you need to pass `ensemble_id` with a value from 1 to 5 to select which weights to use.
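A minimal sketch of the `ensemble_id` constraint described above: the hypothetical helper below builds an embed-stage configuration as a plain dictionary and rejects ids outside 1 to 5. The function name and the `type` and `protocol` keys are assumptions made for illustration; only `ensemble_id` and its valid range come from the release note.

```python
def esm1v_stage_config(ensemble_id: int) -> dict:
    """Build an illustrative embed-stage config for the esm1v embedder.

    Hypothetical helper: the "type" and "protocol" keys are assumptions.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(ensemble_id, bool) or not isinstance(ensemble_id, int):
        raise TypeError("ensemble_id must be an integer")
    # The release note states that valid values are 1 to 5.
    if not 1 <= ensemble_id <= 5:
        raise ValueError(f"ensemble_id must be between 1 and 5, got {ensemble_id}")
    return {"type": "embed", "protocol": "esm1v", "ensemble_id": ensemble_id}


if __name__ == "__main__":
    print(esm1v_stage_config(3))
```

Validating the id before any weights are loaded keeps a typo in the configuration from surfacing only after a long download.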
- Added the `bindEmbed21DL` extract protocol, which is an ensemble of 5 convolutional neural networks that predicts 3 different types of binding residues (metal, nucleic acids, small molecules).
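To illustrate what an ensemble prediction over three binding classes can look like, the sketch below averages per-residue probabilities from several models and applies a threshold. The release note does not say how `bindEmbed21DL` combines its five networks, so the averaging, the 0.5 threshold, the class labels, and the function name are all assumptions.

```python
from statistics import mean

# Illustrative class labels, assumed for this sketch.
CLASSES = ("metal", "nucleic_acids", "small_molecules")


def ensemble_predict(per_model_probs, threshold=0.5):
    """Average class probabilities over models, then threshold per residue.

    per_model_probs[m][r][c] is the probability that model m assigns to
    residue r for class c. Averaging is an assumed combination rule.
    """
    n_residues = len(per_model_probs[0])
    predictions = []
    for residue in range(n_residues):
        labels = {}
        for class_index, class_name in enumerate(CLASSES):
            average = mean(model[residue][class_index] for model in per_model_probs)
            labels[class_name] = average >= threshold
        predictions.append(labels)
    return predictions
```

A residue can be flagged for more than one class, because each class is thresholded independently.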
- Fix model download
- Update jaxlib to fix pip installation
- BETA: in-silico mutagenesis using ProtTransBertBFD. This computes the likelihood that, according to Bert, a residue in a protein can be a certain amino acid, which can be used as an estimate for the effect of a mutation. This adds a new `mutagenesis` stage and a new protocol for `visualize` stages: the first computes the probabilities and writes them to a csv file, while the latter visualizes the results as an interactive plotly figure.
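As a rough sketch of how per-residue probabilities might be turned into a mutation-effect estimate and written as csv, the example below scores each substitution by the log ratio of its probability to that of the wild-type residue. The log-ratio score, the column names, and both function names are assumptions for illustration, and the probabilities are toy values rather than model output.

```python
import csv
import io
import math


def mutation_scores(sequence, probabilities):
    """Score every substitution relative to the wild-type residue.

    probabilities[i] maps amino acid -> probability at position i.
    The log-ratio score is an assumed choice, not the library's method.
    """
    rows = []
    for position, (wild_type, probs) in enumerate(zip(sequence, probabilities), start=1):
        for amino_acid, probability in sorted(probs.items()):
            rows.append({
                "position": position,
                "wild_type": wild_type,
                "mutant": amino_acid,
                "probability": probability,
                # Negative values mean the mutant is less likely than the wild type.
                "log_ratio": math.log(probability / probs[wild_type]),
            })
    return rows


def to_csv(rows):
    """Serialize the score rows to csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    toy = [{"M": 0.8, "K": 0.1, "A": 0.1}, {"M": 0.25, "K": 0.5, "A": 0.25}]
    print(to_csv(mutation_scores("MK", toy)))
```

A long-format table like this (one row per position and substitution) is straightforward to pivot into a heatmap for plotting.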
- Fix `n_components: 2` in the plotly protocol
- Added the `ProtTransT5XLU50Embedder` embedder from the latest ProtTrans revision. You should use this over
- The `projected_embeddings_file.csv` of project stages has been renamed to `projected_reduced_embeddings_file.h5`. For backwards compatibility, `projected_embeddings_file.csv` is still written.
- The `projected_embeddings_file` parameter of visualize stages has been renamed to `projected_reduced_embeddings_file` and takes an h5 file. For backwards compatibility, `projected_embeddings_file` and csv files are still accepted.
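The backwards-compatible lookup described above can be sketched as follows: prefer the new parameter name, fall back to the old one, and infer the file format from the extension. The function name and the dictionary-shaped stage configuration are hypothetical; only the two parameter names and the h5/csv formats come from the release notes.

```python
from pathlib import Path


def resolve_projection_file(stage_config: dict) -> tuple:
    """Return (path, format), preferring the new parameter name.

    Hypothetical helper illustrating the fallback; not the library's code.
    """
    for key in ("projected_reduced_embeddings_file", "projected_embeddings_file"):
        if key in stage_config:
            path = stage_config[key]
            break
    else:
        raise KeyError("no projected embeddings file configured")

    # Both h5 and csv inputs are accepted, so detect the format by suffix.
    suffix = Path(path).suffix.lower()
    if suffix == ".h5":
        return path, "h5"
    if suffix == ".csv":
        return path, "csv"
    raise ValueError(f"unsupported file extension: {suffix!r}")
```

Checking the new key first means a configuration that sets both names resolves to the h5 file.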
- Added the `pb_tucker` model as a project stage. Tucker is a contrastive learning model trained to distinguish CATH superfamilies. It consumes `prottrans_bert_bfd` embeddings and reduces the embedding dimensionality from 1024 to 128. See https://www.biorxiv.org/content/10.1101/2021.01.21.427551v1
- Added the `ProtTransT5UniRef50Embedder`. This version improves over T5 BFD by being fine-tuned on UniRef50.
- Added a `half_model` option to both T5 models (`prottrans_t5_bfd`). On the tested GPU (Quadro RTX 3000), `half_model: True` reduces memory consumption from 12GB to 7GB, while the effect in benchmarks is negligible (±0.1 percentage points in different sets, generally below standard error). We therefore recommend switching to `half_model: True` for T5.
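To give a feel for why half precision saves memory, the sketch below compares the storage needed for model weights in 32-bit and 16-bit floats. The parameter count is a hypothetical round number, not the size of any T5 model, and the helper name is an assumption. The measured saving quoted above (12GB to 7GB) is smaller than a factor of two, presumably because not all GPU memory is taken up by weights.

```python
import struct

# Byte sizes of single- and half-precision floats, taken from the stdlib.
BYTES_FLOAT32 = struct.calcsize("f")
BYTES_FLOAT16 = struct.calcsize("e")


def weight_memory_gib(n_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return n_parameters * bytes_per_parameter / 1024 ** 3


if __name__ == "__main__":
    n_parameters = 1_000_000_000  # hypothetical count, for illustration only
    full = weight_memory_gib(n_parameters, BYTES_FLOAT32)
    half = weight_memory_gib(n_parameters, BYTES_FLOAT16)
    print(f"float32: {full:.2f} GiB, float16: {half:.2f} GiB")
```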
- Added DeepBLAST from Protein Structural Alignments From Sequence (see example/deepblast for an example)
- Dropped Python 3.6 support and added Python 3.9 support
- Updated the Docker example to cache weights