v1.0.0
1.0.0 (2025-08-18)
Bug Fixes
- 3to1 (ab6b4b2)
- adapt naming of regression tests to match new names (c44b387)
- add 'overwrite' option to view_pymol to avoid updating existing structures (#64) (ac0f12d)
- add `make` to apptainer (7cba23e)
- add back readme (#1) (831bc23)
- add back stacking msas by recycle (#2) (fbe0c32)
- add conda init (2e0a0c2)
- add current data to fail log for ease of analysis (da4bdf7)
- add links to the ccd & pdb mirrors (430ae71)
- add missing default (4f020cf)
- add missing test files for local test (68f2e0a)
- add missing transforms in AF3 pipeline (39a465d)
- add new logo and changes of urls to public url (b753e57)
- add test (80b6113)
- add test cases (11dbb61)
- add test coverage bit (57166f5)
- add testpypi setup: (7ded1bf)
- add tests for `fix_formal_charge`, ruff (0a4072d)
- adding badges (4588955)
- address minor pipeline issues in af3 (3943cd7)
- adjust error type on transform history tracking (a468233)
- af3 parsing (#130) (37c6791)
- allow `remove_unsupported_chain_types` to work without specified query_pn_unit_iids. Implement functional API while we're at it. (126b846)
- allow AddRFTemplates to proceed when no `pdb_id` given (c63f10e)
- Allow compatibility with newer rdkit version. (#122) (e6ecbac)
- allow more general covalent bonds (1ef9858)
- allow parsing entries with multiple methods (e.g. `5e5j`) (28ad455)
- allow passing on boolean annotations, allowing distogram bins to be a list (9253102)
- allow processing to continue in the case of covalent bonds between... (88036e4)
- allow saving of failed examples to error, default to a user-based failures path on scratch (c3160de)
- allow unknown users for CI (fa14dda)
- apptainer creation to expose /net (24b8be4)
- apptainer spec (a0c3294)
- arg_fixing: swap coordinates of nh1/nh2 instead of renaming when resolving ARG naming ambiguity, since otherwise charges & bond order are inconsistent (NH2 carries positive charge & double bond by convention) (#41) (8d4b0a6)
- argument error (9c5daba)
- atom level embeddings (#159) (ebaaf51)
- automorphisms (#36) (7cd6ad2)
- avoid building covalent bonds with water or crystallization aids (951a12c)
- bad ligands, new test dataset (0234fab)
- bonds (#125) (2b1a714)
- bug fixes for inference (#46) (e5254d9)
- bug in initializing chain info (7c89186)
- bugfix when using get_residue_starts and general annot_start_stop_idxs, which incorrectly used len() instead of .array_length() to determine the size of an AtomArrayStack (#65) (9b2cc83)
- Bugfixes in get_within_group_res_idx and get_within_poly_res_idx (#121) (4955d19)
- bugs in tests (6b72a3f)
- bugs in using MSAs for inference, supporting MSAs with # headers (f7c2c44)
- build apptainer (ce3c4d6)
- build assembly arguments (905e6b9)
- by default cast aromatic bonds to same order when comparing atom arrays for graph hashes (4587d10)
- cached conformers with chirals (#149) (cec9f83)
- calculate rf2aa chirals off af3 centers (so they are correct) (#114) (64bfca9)
- categories: keep residues not in the CCD instead of converting to UNL (#47) (6a9b0a1)
- chain type miss (0099133)
- chain_id to _iid in Frank's hotfix (9fe6186)
- chains with all resolved tokens (886ffc3)
- changing chain_iid to pn_unit_iid in AF3 features (181467e)
- changing inference ligand residue names to use non-conflicting characters (641f1e6)
- charges (d730b8a)
- chirals (#105) (732af76)
- ci (119b5fa)
- ci (7720ee7)
- cif files for inference (#79) (f552453)
- cif: remove automatic writing of 2d categories to cif (20bbb1f)
- clash enum (f2f2613)
- correct bug with assume_residues_all_resolved when parsing pdb file (#22) (33838cb)
- correct for bonds from nucleophilic additions (#100) (eb65472)
- correct handling of dative bonds, improve timeout error handling in conformer generation (e66c03b)
- correct ligand filtering expressions to deal with None/NaN values, remove superfluous transform history test assertion (35cf5fa)
- correct usage of `fix_formal_charges` to only work on inter-residue bonds (0cf5e10)
- corrected bond adding in `get_structure` (#84) (6d69f4f)
- correctly resolve atoms to closest resolved residue in sequence (#88) (0de26f1)
- create `feats` dict only if it doesn't exist yet (0f01c8d)
- datasets (8f128f0)
- dealing with sequence heterogeneity (e.g. `3nez`) (79f60cb)
- decouple linting of biotite and legacy cifutils (3c8c535)
- default to original atom id if renaming cannot be performed (4d1efa3)
- dna tests (09dcb86)
- documentation (483abb9)
- downgrade `cluster size < len(df)` assertion to warning, fix rare issue with templates having `inf` confidence (0d6d4d7)
- dynamic template paths (#104) (bb12509)
- dynamically generate cifutils version & track any committed or uncommitted changes (151fa37)
- embedding dim, FilePath dataset generalization (#156) (ef867be)
- empty struct_conn (3b02789)
- enable datahub version extraction even when symlinked (718b023)
- enable looking for alt_atom_id's in parsing struct_conn connections as well (96b06cd)
- enable version extraction when cifutils is sym-linked (fa24093)
- enforce correct numpy shapes (3e4b02a)
- ensure /squash gets mounted in ci (e7e519c)
- ensure dataloader wrappers & encoding definitions are pickle-able for use in `spawn` multiprocessing (a9e414e)
- ensure debug path is read/writeable by others (c74fbeb)
- ensure element is given as `str` (4529045)
- ensure element is str even if atomic number is given (83ec2d3)
- env token (4031d8f)
- env token (350c2de)
- explicitly specify NA values (4fe2121)
- fallback CCD coordinates (#101) (d6c797b)
- first try installing via BIOTITE_INSTALL_TOKEN (e0903af)
- fix `pad_dna` tests (1ca0367)
- fix accidental nesting (c0430e6)
- fix arginine ambiguity resolving function (1b96ae1)
- fix CI paths & add stage for slow tests (0cf3679)
- fix conftest import (075545e)
- fix file type test (4ad62b9)
- fix flaky pad_dna tests (0019c2a)
- fix matching of bond atom ids and names (6437d26)
- fix old tests which broke due to function signature change (eb64dee)
- fix operation expression parsing for rare exceptions (a398911)
- fix ref_space_uid to res_id, not token_id (#97) (ae0ac04)
- fix regression tests to have `nan` coords for unoccupied atoms (d964165)
- fix remaining issues with conformer generation transform (e59a65e)
- fixes for data preprocessing (#124) (ed10245)
- force pip upgrade of biotite (c2b6de7)
- formats, subset to keys (670736d)
- further CI improvements (e9b3476)
- further test fixes from refactor & logging improvements (625cfc9)
- handle annotation carry over from atom_array to full_atom_array by id matching instead of via ordering. This resolves the remaining matching problems (36a1a0b)
- handle CIF files with no resolved atoms (19ae934)
- hydrogen addition placement in parser affecting resolving residu… (#74) (86a2973)
- hydrogen policy (97efd06)
- import error (801165e)
- improve error message (4266666)
- improve error message (b8046d7)
- improvements for PadDNA (#110) (c35443f)
- in case of unknown atom names, do not rename (fix to allow us testing against assigning unknown atom names 0 occupancy in `test_parser`) (0a900cf)
- include apptainer build in ci (#8) (05546e2)
- include chembl smirks in package (4f48f45)
- include nucleic acids when masking residues with unresolved back… (#96) (9bbc326)
- increase stringency on inferred polymer bond creation. Only create bonds between AA-AA or NA-NA like residues automatically. Everything else must be defined in struct_conn (1f362f9)
- inference: fixes for inference (#59) (e6a62da)
- inferred sequence (c74946c)
- io_utils: allow writing scalars (707ef8a)
- io_utils: also allow passing pathlib.Path to read_any (7c887fb)
- io_utils: do not write empty CIF categories, which otherwise causes an error (7c49f15)
- io: backwards compatibility bug (2965ee3)
- issues with RDKit pickling that lead to information loss for inferring atom names (4940fbe)
- leaving_groups: fix leaving group computation for edge case of only hydrogens, add further tests (fd749ee)
- loading entity from spoofed cif (#131) (c1b0abb)
- make `PadDNA` optional (2dcc610)
- make arginine renaming work (b3c55dd)
- make cif parsing more robust to non-existent fields (0220241)
- make compatible with cifutils hydrogens (#77) (68f321b)
- make conformer generation timeout more lenient (79e7b08)
- make IPython part optional (CI containers don't have ipython) (23c1712)
- make leaving group identification insensitive to hydrogens for robustness (2e6a889)
- make openbabel dependency optional by try-excepting import (c66639c)
- make rdkit-dependent regression tests pass regardless of operating system (e5f0f0b)
- make test import function properly with pytest without requiring module to be installed via pip (a4e8235)
- make type hints in tests backward compatible with python <3.10 (5d169a4)
- make typehints backwards compatible with python versions <3.10 (8fbe3ac)
- metadata check in test parser (495ba44)
- migrate viz_utils to cifutils (f9a9eb9)
- minor bugfixes to tests (c7c214c)
- minor fixes to tests (e841dfc)
- minor improvements & generalisations (removing unnecessary reliance on `extra_info`, allowing templates to work with chain types as enums/ints/strings, ...) (1887e5d)
- minor integration bugs from renaming, template masking (4f4f38c)
- minor test fixes (d967d8f)
- minor test issues, make ARG renaming optional to compare to legacy parser (159fb59)
- minor updates for production (#158) (64ca81b)
- misc bug fixes (e98792d)
- missing residues get `nan` coords (fa877a8)
- missing test PDB IDs (b0fc6f4)
- molecule_iids (064a69a)
- more test bugs (7d9846f)
- more workers for CI (21ecddc)
- MR comments, tests (383f24a)
- msa bug (69ad2c8)
- MSA caching (eefb05f)
- MSAs with NCAA (#101) (813862f)
- mse tests (a54c7dc)
- multiple ligands during inference (581b23c)
- multiple ligands during inference (54b840c)
- name parameter in tests (360a373)
- names (fdaf9b0)
- naming (b6753a3)
- new env token for pypi (#3) (1948c74)
- non-update of AtomArrayStack and missing fields in added H atoms (#62) (65afa5d), closes #63 #64 #65
- np.full (b2823ed)
- only log if heavy atoms failed to match (b1659f0)
- PadDNA and rdkit bugs (#124) (19f80d9)
- PadDNA asserts to warnings (#102) (e9848da)
- parsers and warnings (#129) (71a54a0)
- pass args to parse_from_cif/pdb (2d001d2)
- patch biotite array (#107) (5375829)
- patch biotite's get_residue_starts function to differentiate between residues of different transformation ids (dbef1c4)
- patch error where leaving groups were overwritten by latest found group instead of accumulated (c13d795)
- paths (b3c5258)
- paths to biotite (91878b9)
- PDBs with polymers and NPs on same chain (7f5fdf1)
- peptides as polymers during inference (#82) (bc915aa)
- peptides in AF3 validation splits (70ece97)
- per default, set output of `atom_array_from_rdkit` to hetero atoms (28a432c)
- permissions for caching (e3843e9)
- place unresolved atoms (#86) (1af2f20)
- rdkit from smiles (#127) (d5d1e3c)
- rdkit: utf-8 encoding (#20) (f69f683)
- re-use transform for hydrogen removal, set default hydrogen policy to 'keep' when transforming to atom arrays (#61) (28672ee)
- readability improvements (fbe9d28)
- refactoring to not change ground truth (e237a4a)
- reintroduce _get_matching_atom for error handling (57926ef)
- reintroduce masking of residues that had a heavy atom mismatch (c13c36e)
- relax tolerance (dc0718c)
- remaining path changes (2296fe1)
- remove `needs` from .gitlab-ci.yml file (d112a3a)
- remove automorphisms from rdkit (6272b3a)
- remove close pn units column (0dea67c)
- remove cuda (6b087ed)
- remove custom error type wrapping due to pickling issues (a4587f8)
- remove deprecated `only` statement (64e4ae1)
- remove deprecated `remove_hydrogens` argument (6b90a72)
- remove duplicate test (a36afad)
- Remove erroneous assert statement in GenericDFParser (#120) (8f950cf)
- remove legacy parser samples that do not match up anymore due to parser improvements (42588cb)
- remove query seq from extra msa first row (1f43f74)
- remove redundant #TODO (110bc9d)
- remove spurious argument (c380e88)
- remove unnecessary dataclass which causes errors downstream (58710fd)
- remove unused imports, add missing imports, clean up whitespaces (e8a5254)
- rename msa to msa_stack in AF-3 pipeline (2c76c09)
- replace README (#1) (cbe76c0)
- resolve atom_order mismatch bug (bc82059)
- resolve occasional duplicate indices (e.g. `4xkw`) by also specifying res_name (5ccd8f0)
- revert CI (763a295)
- run apptainer job on worker (c982a56)
- safeguard coordinate extraction from ideal rdkit conformers in case no ideal rdkit conformer coordinates are provided in the cif (33fa08b)
- samplers (#111) (ae9cf6f)
- selection strings, utils (#85) (bf9157f)
- semantic release as single source of truth for versions (cd467f2)
- set `crop_center_atom_id`, `..._atom_idx` and `...token_idx` even when not cropping for forward compatibility with further atom-level cropping (#155) (cc39788)
- set CI to fail if CI job gets killed (12f1256)
- set conformer default to not use forcefield optimization (b40bc5e)
- set log levels from warning to info (5c7479a)
- setup testpypi (#2) (63b8e95)
- simplify and fix CI (1b0d453)
- skip bind/nobind tests (c1f73d8)
- skip pad DNA tests (#163) (a9cd7db)
- sort imports, clean up `parse` API (a5271c8)
- specify biotite internal dtypes for matching to avoid rare cases of long chain ID's etc (2455d37)
- speed up `parse` by 3x by vectorizing various subroutines, reducing the amount of subsetting and adding a cache hierarchy (#44) (d3b7f54)
- speed up tests (8070409)
- standardize heavy atom naming for matching rare cases where alt atom id's are used (fe69d4b)
- subset atom id renaming to heavy atoms (8cea0eb)
- support AF-3-style CIFs (#67) (947d1cd)
- switch to patched `get_residue_starts` function to avoid rare bugs where, after cropping, two residues that only differ by `transformation_id` are sequential in the atom array and get misinterpreted as a single residue (593eebb)
- sym center trans id type (cea2377)
- template: correct usage of `fix_formal_charges` (89b7ab0)
- template: keep hydrogens around until fixing formal charges (f774b4b)
- test bug (3fbffe8)
- test for updated inference with multiple ligands (a8fc1ca)
- test restricting to CI to merge requests onto `main` and non-drafts only (1a26c34)
- testing speed (435ce48)
- tests (193a9cd)
- tests (b4116df)
- tests (4e19c9a)
- tests (4220698)
- tests (b994834)
- tests for AF3 pipeline (c93dc4c)
- tests for build assembly arguments (1bb9ff2)
- tests refactored transforms (de4c7b5)
- tests: fix import error (99ce9b4)
- transforms/base: treat edge case of 0 probabilities in RandomRoute (7cd74fd)
- try/except ccd loading (9c43eec)
- try/except ccd loading for inference (6f3fc99)
- type-cast for older torch versions (c83eccd)
- type-cast issue (dae0479)
- typo (e269513)
- typo (25d9fc7)
- typo (ee5f811)
- typo (00e1d5e)
- typo fix in ruff config (c65895f)
- typo in list o_O (a52193f)
- unconditional support, try/except metrics (#116) (19bc1dc)
- undo accidental too deep nesting (e5795c7)
- unnest bucketize (ea69aeb)
- unwanted ions in validation (#41) (0926ec9)
- update caching to save as .pkl.gz and fix path creation (dc34984)
- update ci (8fd01c6)
- update ci for github (#10) (1fe9024)
- update dependencies to align with rf3 (cd9e8b7)
- update description (f376f09)
- update deserialization checks (3101a77)
- update encodings to work with AF3SequenceEncoding (a62feba)
- update outdated escape pattern (6a85641)
- update paths to new datahub repo (e98c490)
- update pipeline regression tests (7d6f57b)
- update RDKit version (8386c69)
- update regression tests (0a12fca)
- update regression tests to ignore hydrogens, fix typos, add debug code to regression tests (9a599e5)
- update standardize heavy atom id renaming to deal with elements as integers and strings (12c807b)
- update test coverage (2939f46)
- update tests to reload general CIFParser object for test speed (d2be606)
- update to biotite main, remove hack for NaN coordinates (#54) (48a49b3)
- update to latest cifutils (fde3f3f)
- update to latest cifutils apptainer (73bb693)
- upgrade > force-reinstall to enforce updating even on non-version bump commits (c1629e6)
- use extra info dictionary with all_pn_unit_iids (d1a8380)
- validation: update to biotite main, fix msa bug, add LoadBalancedDistributedSampler (#74) (4711256)
- various bug fixes / refactors (e989c3e)
- various minor fixes to get pipeline profiling running again & extend to af3 (4b7547f)
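
One of the fixes above replaces `len()` with `.array_length()` when sizing an `AtomArrayStack`. In biotite, `len()` on a stack counts models while `.array_length()` counts atoms per model, so an index computation that needs the atom count goes wrong when `len()` is used. The sketch below illustrates that failure mode only. `MiniStack` and `residue_stops` are hypothetical stand-ins written for this note, not the package's or biotite's code.

```python
class MiniStack:
    """Hypothetical stand-in for an atom array stack: coord[model][atom]."""

    def __init__(self, coord):
        self.coord = coord

    def __len__(self):
        # Number of models (stack depth), not the number of atoms.
        return len(self.coord)

    def array_length(self):
        # Number of atoms in each model.
        return len(self.coord[0]) if self.coord else 0


def residue_stops(starts, n_atoms):
    """Exclusive stop index of each residue, given sorted start indices."""
    return starts[1:] + [n_atoms]


# Two models with five atoms each; residues start at atoms 0 and 3.
stack = MiniStack([[(0.0, 0.0, 0.0)] * 5, [(1.0, 1.0, 1.0)] * 5])
starts = [0, 3]

wrong = residue_stops(starts, len(stack))            # uses the model count
right = residue_stops(starts, stack.array_length())  # uses the atom count
```

With the model count, the last residue ends before it starts. With the atom count, it spans atoms 3 to 4 as intended.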
chore
- update cifutils (7474b0d)
Features
- add .bcif parsing test (951a0c2)
- add `category_to_dict` util (9aa4c9f)
- add `query`, `mask` and `idxs` functionality (#111) (73247d8)
- add `show_cartoon` argument to `view` (#63) (f66071e)
- add `sum_string_arrays` util to sum string arrays with dynamic dtype resizing (e185fa6)
- add `view_pymol` functionality and improve `to_cif` exports (#38) (05df1c6)
- add AF2 FB Distillation dataset & corresponding tests (b27cbee)
- add af3 inference pipeline (d842d05)
- add AF3 token level features (96af662)
- add arginine renaming tests (61ae5a5)
- add atomworks CLI (dc84b1b)
- add autoformatting commands and make script (682119f)
- add automorphisms to AF3 pipeline (45b01f9)
- add capabilities to slice atom array by segments (e.g. ResIdx / ChainIdx segments) (#72) (576a16d)
- add centering and principal components to atomselectionstack (#114) (9f3eb39)
- add chain types for 'water', 'branched' & 'macrolide' (44ac820)
- add CI apptainer building stage, improve test speed, fix minor CI bugs, add CI secrets, bump environments, add testmon & xdist pytest plugins for speeding up tests (9344fad)
- add code for plotting pipeline performance (46a1344)
- add confidence head processing to af3 pipeline (#56) (2d590aa), closes #41 #45 #44
- add conformer generation for smiles to keep stereochemical annotation (#78) (6952251)
- add constants, remove old assets (55f2266)
- add convenience API for ChainType enums (0bc5dc6)
- add convenience readability utils (0ea9676)
- add crystallization aid & ligands to remove data (4b3248b)
- add custom context for handling errors (#90) (09e9642)
- add dynamical string size resizing to get minimum length (a615926)
- add encoding to pipeline (a03f3d4)
- add environment specification (b3edcb5)
- add fixing of formal charges for atom arrays for inference (dd7fc18)
- add flag to fix formal charges (a4eafae)
- add full fledged AF3 Encoding (3cf571c)
- add functional API for remove hydrogens (7d43400)
- add functional API for spatial cropping (c07475a)
- add functionality to remove crystallization aids, including a test (2efc191)
- add further functional API (beeb6fb)
- add further rf2aa assumption check that ensures that no individual chain can ever be entirely unresolved (as can happen e.g. with chain AB in 3rj1) (fa04418)
- add geometry utils (0adf3c7)
- add ground truth ref_pos through new track (#115) (2bcf1bb)
- add group scatter utils (#119) (baa1739)
- add hydrogens via biotite supported hydride library (#56) (9df9801)
- add immutable_lru_cache, add mapping of chem_comp_types to their corresponding UNKNOWN ccd (c00bbd8)
- add inference utils, add rdkit utils and clean up base cifutils (enums, constants) (#18) (5071084)
- add is_same_in_segment convenience function (#91) (8e7aa7b)
- add ligand of interest information (840a3b0)
- add mapping of noncanonicals to canonical residues (60fddc4)
- add metal elements as constants (c036af5)
- add missing ChEMBL rules for fixing (b006ddf)
- add MSA paths into chain info (#52) (615bba3)
- add offset-slope timeout for rdkit conformer generation (a824994)
- add pdb example from FB distillation set (93f3633)
- add prior bugs as test cases (e15ca01)
- add reference molecule feature transforms (3c66009)
- add RemovePolymersWithTooFewResolvedResidues Transform to pipelines (c6b756d)
- add scaffold for environment, apptainer spec, readme, add test coverage (2cc9912)
- add scaffold test for fixing operations (14552a3)
- add script for fast ColabFold-style MSA generation with MMseqs-GPU (#71) (06ba64b), closes #76 #95 #99
- add scripts for convenient IPD specific setup, add documentation (2e86cdd)
- add scripts to get the ccd & pdb mirrors and replace digs-specific paths through the corresponding mirror paths (753485c)
- add standard NAs & AAs, remove old assets (9d648a7)
- add support for bcif & pdb filetypes, add universal loading, improve to_cif functionalities to allow outputting arbitrary metadata (f380319)
- add support for rf2aa inference pipeline (8b48008)
- add test for selection utils (3dd5e4f)
- add tests for geometry utils (9a428db)
- add tests for timeout utils (aeb4c24)
- add tests for visualize (ce6674e)
- add tests for writing out cifs (292c8ad)
- add the ability to randomly pad DNA (#84) (5cd4088), closes baker-laboratory/cifutils#72 #85 #87
- add timeout context manager (25beacc)
- add timeout decorator (72ed6a2)
- add tipatom constants (9f7fe70)
- add to_pdb_string tests (c638ee9)
- add tools for nested dictionaries (2ccd3ad)
- add tools to fix partially corrupted molecules, streamline atomarray <> rdkit interconversions (999dc39)
- add transform to compute spatial k-nn masks useful for spatially local attention (#60) (23665ff)
- add transform to further shrink a crop if the crop at token level would result in a crop that exceeds a specified max number of atoms (#9) (0769a8a)
- add unresolved residue handling to pipeline (09cc692)
- add util to compute rng hash for convenient debugging & easy random state comparisons (#98) (d2d9d63)
- add utility to get RDKit conformers from res names with timeout & fallback to idealized coords (b6cc8e3)
- add utils for writing cif files regardless of where they come from, enable view_pymol to visualize CIFBlock & BinaryCIFBlocks (#66) (d03df1a)
- add utils to get automorphisms from rdkit (2561644)
- add utils to get idxs and masks for representative tokens in AF3 (87c390a)
- add utils to go directly from res_name to rdkit molecules, fix capitalization of element lookup (0c44352)
- add utils to patch metals at symmetry centers (bfca0f8)
- add utils to standardize atom id's to the standard atom id instead of alternative atom id (44da2a3)
- add visualization utils for atom arrays (043c263)
- address bad conformer id issue for molecules with many rotatable bonds (0caefe3)
- af-3 validation dataset loaders initial commit (920bbfa)
- allow `token_starts` re-use in token utils, add safeguards for ensuring each token has a representative atom (causes dataloader failures instead of model failures downstream) (3359279)
- arbitrary nested datasets (11cafee)
- assign stereo-chemistry when converting an atom_array to rdkit based on the coordinates (if possible). Make ccd_code_to_rdkit caching immutable. Add nan-coord utils for AtomArrays & Stacks (#77) (4992c7e)
- atom-level embeddings (#151) (58f10c3)
- AtomArrayPlus, AtomArrayPlusStack (#109) (7c6622e)
- bump biotite version, fix CI, speed up test collection, clean up pyproject.toml (96e8dc3)
- bump ruff version (41a9b6f)
- caching stores parameters (#45) (ec714ee)
- chiral center processing bugfix (#103) (631ab8e)
- ci improvements for post-merge pipeline, adding auto-coverage and releases (5fe345d)
- convert_af3_model_output_to_atom_array (5d517b5)
- database utils for bind/no-bind project (#154) (2f298eb)
- disentangle AF3 token representative and token center definitions, bump cifutils version (b61f466)
- doc updates (c5772d5)
- docs and release setup (#126) (357a681)
- enable parallel tests to worksteal (f7f96ca)
- enable reading 'all' extra_fields in parser (#76) (a5e70bc)
- expose altloc specification during loading, fix saving of altloc id when none specified, to avoid biotite parsing issues (7214edd)
- expose datetime and user when logging failed examples, only log per default if user is given (c39ef16)
- expose option to choose RDKit conformer generation method (e0760bd)
- expose pdb id (640010a)
- extract RF2AA assumptions check into its own Transform, generalize PDBDataset (3285be0)
- featurization of unresolved residues to avoid distribution shifts (e78da27)
- final splits (7e9b2b6)
- first attempt at github ci (#2) (a64bfc0)
- from_pymol_str for AtomSelection (0a66dad)
- generalized-preprocessing (#44) (f20c85c)
- ground truth reference conformer (#83) (c5f9a89)
- gzip cifs by default (#53) (4fdd6e8)
- implement `to_pdb` methods (8728781)
- implement `WorkStealDataLoader` to de-bottleneck dataloading when parsing from files with highly variable runtimes (bcf5868)
- implement AF3 template featurization and harmonize RF2AA & AF3 names (07c5867)
- implement automatic semantic versioning release (ae2d775)
- implement automorphism features for AF3 (f1bf992)
- implement building multiple assemblies (e591844)
- implement fixing formal charges after bond formation (e06e39d)
- implement MSE to MET conversion (cf8c0d6)
- implement proper passing on of random seeds to RDKit (so random… (#93) (92f5cc8)
- implement resolving of ARG naming ambiguities (391fbff)
- implement standard to alternate atom id translation (37cd4e7)
- implement test for bioassembly building (5846d2e)
- implement user error when trying to save files with ambiguous bond information (498ad64)
- improve timeout error messages (cf945c3)
- improve base transforms by avoiding error masking by TransformPipelineError, implement + operator for transforms, implement basic `ApplyFunction` transform (f8eb1e0)
- improve ci with auto-coverage tests (9bd6fe8)
- improve ruff rules & add option to configure number of cores to run pytest on (7a9589f)
- include an apptainer build stage in CI (2e5b087)
- include atom order tests with PDBs that violate atom ordering (af587fb)
- inference bugs (#63) (ee49a31)
- inference like AF3 (#104) (526ae95)
- initial split notebook (3e65ed5)
- integrate pdb parsing into CIF Parser (a26451d)
- integrate templates into AF3 pipeline (3dcd50e)
- integrate timeouts in rdkit conformer generation (9310df1)
- interface splits (ccddbcf)
- io_utils,visualize: add bcif output capabilities, generalize `view_pymol`, (#60) (4216f16)
- load cached reference conformers (#145) (257845c)
- mask residues with unresolved backbone atoms (84ba100)
- md5 hash for hash_atom_array (ec4bd85)
- migrate setup.py instruction into pyproject.toml (b530685)
- more robust incrementing of chain ids (4199bfd)
- move init arguments to `parse`, integrate ARG ambiguity resolving (6069770)
- MSAs from multiple directories (015a75f)
- mse_met conversion test improvements, tightening of typing (9efca8a)
- NCAA for inference (#86) (c0c6977)
- nested datasets bug fixes (48f36ef)
- networkx automorphisms, no tests (cedf164)
- parse AtomArrays directly, introduce PDBOrCIFFileComponents (#94) (0ff8a0a)
- patch symmetry centers (875a44c)
- peptide sampling (127bbc9)
- pH info metadata (#122) (9e8fea4)
- pipe: add flag to return atom array from af3 pipeline (#76) (a79abe3)
- random remove ligands (#144) (775853b)
- remove polymer chains with too few resolved residues; refactor filters (5a93d8f)
- script to count AF3 tokens initial draft (dbdd489)
- selection: add `n_body='all'` option to `get_annotation_categories()` (#116) (da5f09f)
- separate peptides (be894b4)
- SequenceSelection utils (#80) (406cb87)
- set up CI (774f271)
- specify covalent bonds during inference (#27) (e815a67)
- standardize atom ordering within each residue to CCD order (as some PDBs have incorrect ordering) (44c8d48)
- start adding AF3 pipeline (47e8355)
- subsample templates and rotate conformers (#35) (3c65428)
- support buffers in parse (#48) (68406fa)
- support models as af3 outputs (#39) (7d269df)
- support MSAs during inference (6643f68)
- support UNL for inference (#54) (ef8e75d)
- switch from black,isort,autoflake > ruff (d2038b6)
- switch from internal biotite to public biotite 1.1 (98eae71)
- take first chiral subordering (#125) (c383977)
- template: implement template creation and matching (9e04e5e)
- update AF3 pipeline to include reference features (660ee92)
- update residue library creation from CCD to enable idealized coordinates... (9238b6f)
- update scripts to count tokens (37e5ab8)
- update timeout utils to support both signal & subprocessing based timeouts (subprocess strategy needed for RDKit timeouts) (f651ba4)
- update to local biotite installation (5d3ba27)
- update to public biotite (v1.1.0) and make apptainer digs independent (#55) (ddfe62f)
- updates from 2d conditioning (#139) (3f1b34a), closes #142
- upgrade CI to use git clone instead of apptainer rebuilding, fix broken chirals (#147) (41b6c80)
- upweight LOI (92df8e2)
- use element for atom name char of atomized tokens (#67) (e35e2b0)
- user-friendly MSA generation entrypoint (#138) (d55b9c4)
- versioning: Include automatic versioning of datahub version, in… (#59) (a8636ad)
- visualize: add `slot` capabilities for `view_pymol` (#115) (06f6c16)
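
Two of the features above concern caches whose stored values cannot be changed by callers (`immutable_lru_cache`, and making `ccd_code_to_rdkit` caching immutable). The release notes do not say how this is implemented. One common approach, sketched here purely as an assumption, is to wrap `functools.lru_cache` and hand out deep copies. `copying_lru_cache` and `load_template` are hypothetical names used only for illustration.

```python
import copy
import functools


def copying_lru_cache(maxsize=128):
    """Illustrative only: an LRU cache that returns deep copies of cached values."""

    def decorator(func):
        cached = functools.lru_cache(maxsize=maxsize)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Copy on the way out so callers cannot mutate the cached object.
            return copy.deepcopy(cached(*args, **kwargs))

        # Expose the usual cache introspection helpers.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper

    return decorator


@copying_lru_cache(maxsize=8)
def load_template(code):
    # Hypothetical expensive lookup that returns a mutable object.
    return {"code": code, "atoms": ["N", "CA", "C", "O"]}


first = load_template("ALA")
first["atoms"].append("CB")      # mutate the caller's copy
second = load_template("ALA")    # cache hit, still pristine
```

The trade-off is one deep copy per call, which is usually much cheaper than recomputing the value but not free for large objects.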
Performance Improvements
- significantly speed up representative coordinate fetching (9326e0c)
- vectorize custom inter-residue bond removal (3d8a6ab)
BREAKING CHANGES
- Renamed utils files in cifutils
- feat: add ipd setup utils, fix missing numpy import
- fix(ci): add missing files for CI
- chore(ci): group lint & test together again to avoid buggy github ci
- feat: add af2 distillation dataset path for ipd setup
- fix: combine fast & slow tests in CI, expose number of CPU cores to run on
- fix(ci): update time limits
- chore: clean up ci & add documentation
- chore(ci): robustness improvements in setup.sh
- refactor: implement src-architecture for datahub repo
- The old `export` paths to datahub won't work anymore and will need to add /src
- chore: cleanup for release
- chore: format
- fix: preprocessing pipelines
- chore: update validation scripts, new test dataframes
- chore: update validation notebook
- fix: tests broken by using new test datasets
- chore: bump cifutils version
- update cifutils to version 1.1.0, which includes a significant refactor and breaking changes
- Switch to updated cifutils