v1.1.0
25.09.2025 - Version 1.1.0
Feature
- Adding blosum62 predefined embedder via the blosum python package, using the BLOSUM62 substitution matrix as embeddings
- Adding AAOntology predefined embedder from https://doi.org/10.1016/j.jmb.2024.168717, using amino acid feature scales
- Adding biotrainer-ready quickstart datasets (subcellular location and secondary structure) in the README.md
- Adding masked language modeling (MLM) task via the residue_to_class protocol, a CNN decoder and a random_masking option in the finetuning config
- Adding LoRA examples for MLM and downstream tasks
- [BETA] Adding residue_to_value protocol
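The random_masking idea behind the MLM task can be pictured with the minimal sketch below. It assumes BERT-style masking with a 15% default rate and a hypothetical "<mask>" token. The function name and its parameters are illustrative and are not biotrainer's actual API. Labels keep the original residue only at masked positions, so a residue_to_class-style loss scores just those residues.

```python
import random
from typing import List, Optional, Tuple

MASK_TOKEN = "<mask>"  # hypothetical mask token; the real vocabulary may differ


def random_masking(
    sequence: str, mask_prob: float = 0.15, seed: Optional[int] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Randomly mask residues for MLM training (illustrative, not biotrainer's API).

    Returns the masked token list and per-residue labels: the original residue
    at masked positions and None elsewhere, so only masked positions are scored.
    """
    if not 0.0 <= mask_prob <= 1.0:
        raise ValueError("mask_prob must lie in [0, 1]")
    rng = random.Random(seed)
    tokens: List[str] = []
    labels: List[Optional[str]] = []
    for residue in sequence:
        if rng.random() < mask_prob:
            tokens.append(MASK_TOKEN)
            labels.append(residue)  # the model must predict this residue class
        else:
            tokens.append(residue)
            labels.append(None)  # ignored by the loss
    return tokens, labels


if __name__ == "__main__":
    tokens, labels = random_masking("MKTAYIAKQRQISFVKSHFSRQ", seed=0)
    print(tokens)
    print([(i, aa) for i, aa in enumerate(labels) if aa is not None])
```

Many MLM setups also replace some selected positions with random residues or leave them unchanged (BERT's 80/10/10 scheme). The sketch omits that to keep the masking step itself visible.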
Breaking
- Refactoring confidence range calculation to use the empirical distribution. Bootstrapping and Monte Carlo dropout (MCD) previously assumed a normal distribution, which is acceptable for large sample sizes due to the central limit theorem (CLT). The empirical distribution is more appropriate and gives better upper and lower bounds, especially for small sample sizes
- autoeval: Adding framework name to task name in autoeval. This makes it easier to add multiple frameworks in the future
- autoeval: Changing autoeval FLIP scl protocol to sequence_to_class. This requires fewer resources but is still valid for evaluating pLMs
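The minimal sketch below contrasts the old and new confidence ranges using a percentile bootstrap. It is not biotrainer's implementation. The function names, the 1000 resamples and the linear-interpolation percentile are assumptions. Monte Carlo dropout samples could be passed to the same two interval functions in place of bootstrap samples.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Metric = Callable[[Sequence[float]], float]


def bootstrap_distribution(
    values: Sequence[float], metric: Metric, n_resamples: int = 1000, seed: int = 0
) -> List[float]:
    """Recompute `metric` on resamples drawn with replacement from `values`."""
    rng = random.Random(seed)
    n = len(values)
    return [metric([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_resamples)]


def percentile(sorted_samples: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of already-sorted samples, q in [0, 1]."""
    pos = q * (len(sorted_samples) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    return sorted_samples[lower] + (sorted_samples[upper] - sorted_samples[lower]) * (pos - lower)


def normal_ci(point: float, samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation interval: point estimate +/- z * std of the samples."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    se = statistics.stdev(samples)
    return point - z * se, point + z * se


def empirical_ci(samples: Sequence[float], confidence: float = 0.95) -> Tuple[float, float]:
    """Percentile interval read directly off the empirical distribution."""
    ordered = sorted(samples)
    alpha = (1 - confidence) / 2
    return percentile(ordered, alpha), percentile(ordered, 1 - alpha)


if __name__ == "__main__":
    # Hypothetical per-sample correctness on a small test set: 19 of 20 correct.
    correct = [1.0] * 19 + [0.0]
    accuracy = statistics.fmean
    dist = bootstrap_distribution(correct, accuracy, n_resamples=1000, seed=0)
    print("normal   :", normal_ci(accuracy(correct), dist))
    print("empirical:", empirical_ci(dist))
```

With 19 of 20 hypothetical predictions correct, the normal-approximation upper bound lands above an accuracy of 1.0. The empirical interval cannot leave the range of the bootstrapped values, which is why it behaves better for small sample sizes.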
Maintenance
- Updating dependencies
Fixes
- Fixing the broken use_half_precision embeddings mode and adding a comment about float32 precision being used downstream
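The half-precision note can be illustrated with Python's struct module, which packs IEEE 754 binary16 values through the "e" format. This sketch assumes embeddings are stored as float16 and then upcast to float32 before the downstream model sees them. The helper names are hypothetical. Every float16 value is exactly representable in float32, so the upcast itself is lossless. Any rounding happens when the embedding is stored in half precision.

```python
import struct
from typing import List, Sequence


def store_half_precision(values: Sequence[float]) -> bytes:
    """Pack values as little-endian IEEE 754 binary16 (float16), rounding each one."""
    return struct.pack(f"<{len(values)}e", *values)


def load_as_float32(half_bytes: bytes) -> List[float]:
    """Decode float16 bytes and upcast them to float32 for downstream use.

    Every float16 value is exactly representable in float32, so this step
    loses nothing; rounding already happened in store_half_precision.
    """
    n = len(half_bytes) // 2  # 2 bytes per float16 value
    halves = struct.unpack(f"<{n}e", half_bytes)
    as_float32 = struct.pack(f"<{n}f", *halves)
    return list(struct.unpack(f"<{n}f", as_float32))


if __name__ == "__main__":
    embedding = [0.1, 1.5, -2.25]  # hypothetical per-residue values
    print(load_as_float32(store_half_precision(embedding)))
```

For example, 0.1 comes back as 0.0999755859375 after the round trip. A value such as 1e6, far beyond the float16 range, makes struct.pack raise OverflowError.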