v1.0.0

github-actions released this 18 Aug 18:08
· 299 commits to production since this release
3c2fd43

1.0.0 (2025-08-18)

  • BREAKING CHANGE: update to cifutils 2.0 (#50) (77dd6fd)

Bug Fixes

  • 3to1 (ab6b4b2)
  • adapt naming of regression tests to match new names (c44b387)
  • add 'overwrite' option to view_pymol to avoid updating existing structures (#64) (ac0f12d)
  • add make to apptainer (7cba23e)
  • add back readme (#1) (831bc23)
  • add back stacking msas by recycle (#2) (fbe0c32)
  • add conda init (2e0a0c2)
  • add current data to fail log for ease of analysis (da4bdf7)
  • add links to the ccd & pdb mirrors (430ae71)
  • add missing default (4f020cf)
  • add missing test files for local test (68f2e0a)
  • add missing transforms in AF3 pipeline (39a465d)
  • add new logo and changes of urls to public url (b753e57)
  • add test (80b6113)
  • add test cases (11dbb61)
  • add test coverage bit (57166f5)
  • add testpypi setup (7ded1bf)
  • add tests for fix_formal_charge, ruff (0a4072d)
  • adding badges (4588955)
  • address minor pipeline issues in af3 (3943cd7)
  • adjust error type on transform history tracking (a468233)
  • af3 parsing (#130) (37c6791)
  • allow remove_unsupported_chain_types to work without specified query_pn_unit_iids. Implement functional API while we're at it. (126b846)
  • allow AddRFTemplates to proceed when no pdb_id given (c63f10e)
  • Allow compatibility with newer rdkit version. (#122) (e6ecbac)
  • allow more general covalent bonds (1ef9858)
  • allow parsing entries with multiple methods (e.g. 5e5j) (28ad455)
  • allow passing on boolean annotations, allowing distogram bins to be a list (9253102)
  • allow processing to continue in the case of covalent bonds between... (88036e4)
  • allow saving of failed examples to error, default to a user-based failures path on scratch (c3160de)
  • allow unknown users for CI (fa14dda)
  • apptainer creation to expose /net (24b8be4)
  • apptainer spec (a0c3294)
  • arg_fixing: swap coordinates of nh1/nh2 instead of renaming when resolving ARG naming ambiguity, since otherwise charges & bond order are inconsistent (NH2 carries positive charge & double bond by convention) (#41) (8d4b0a6)
  • argument error (9c5daba)
  • atom level embeddings (#159) (ebaaf51)
  • automorphisms (#36) (7cd6ad2)
  • avoid building covalent bonds with water or crystallization aids (951a12c)
  • bad ligands, new test dataset (0234fab)
  • bonds (#125) (2b1a714)
  • bug fixes for inference (#46) (e5254d9)
  • bug in initializing chain info (7c89186)
  • bugfix when using get_residue_starts and general annot_start_stop_idxs, which incorrectly used len() instead of .array_length() to determine the size of an AtomArrayStack (#65) (9b2cc83)
  • Bugfixes in get_within_group_res_idx and get_within_poly_res_idx (#121) (4955d19)
  • bugs in tests (6b72a3f)
  • bugs in using MSAs for inference, supporting MSAs with # headers (f7c2c44)
  • build apptainer (ce3c4d6)
  • build assembly arguments (905e6b9)
  • by default cast aromatic bonds to same order when comparing atom arrays for graph hashes (4587d10)
  • cached conformers with chirals (#149) (cec9f83)
  • calculate rf2aa chirals off af3 centers (so they are correct) (#114) (64bfca9)
  • categories: keep residues not in the CCD instead of converting to UNL (#47) (6a9b0a1)
  • chain type miss (0099133)
  • chain_id to _iid in Frank's hotfix (9fe6186)
  • chains with all resolved tokens (886ffc3)
  • changing chain_iid to pn_unit_iid in AF3 features (181467e)
  • changing inference ligand residue names to use non-conflicting characters (641f1e6)
  • charges (d730b8a)
  • chirals (#105) (732af76)
  • ci (119b5fa)
  • ci (7720ee7)
  • cif files for inference (#79) (f552453)
  • cif: remove automatic writing of 2d categories to cif (20bbb1f)
  • clash enum (f2f2613)
  • correct bug with assume_residues_all_resolved when parsing pdb file (#22) (33838cb)
  • correct for bonds from nucleophilic additions (#100) (eb65472)
  • correct handling of dative bonds, improve timeout error handling in conformer generation (e66c03b)
  • correct ligand filtering expressions to deal with None/NaN values, remove superfluous transform history test assertion (35cf5fa)
  • correct usage of fix_formal_charges to only work on inter-residue bonds (0cf5e10)
  • corrected bond adding in get_structure (#84) (6d69f4f)
  • correctly resolve atoms to closest resolved residue in sequence (#88) (0de26f1)
  • create feats dict only if it doesn't exist yet (0f01c8d)
  • datasets (8f128f0)
  • dealing with sequence heterogeneity (e.g. 3nez) (79f60cb)
  • decouple linting of biotite and legacy cifutils (3c8c535)
  • default to original atom id if renaming cannot be performed (4d1efa3)
  • dna tests (09dcb86)
  • documentation (483abb9)
  • downgrade cluster size < len(df) assertion to warning, fix rare issue with templates having inf confidence (0d6d4d7)
  • dynamic template paths (#104) (bb12509)
  • dynamically generate cifutils version & track any committed or uncommitted changes (151fa37)
  • embedding dim, FilePath dataset generalization (#156) (ef867be)
  • empty struct_conn (3b02789)
  • enable datahub version extraction even when symlinked (718b023)
  • enable looking for alt_atom_id's in parsing struct_conn connections as well (96b06cd)
  • enable version extraction when cifutils is sym-linked (fa24093)
  • enforce correct numpy shapes (3e4b02a)
  • ensure /squash gets mounted in ci (e7e519c)
  • ensure dataloader wrappers & encoding definitions are pickle-able for use in spawn multiprocessing (a9e414e)
  • ensure debug path is read/writeable by others (c74fbeb)
  • ensure element is given as str (4529045)
  • ensure element is str even if atomic number is given (83ec2d3)
  • env token (4031d8f)
  • env token (350c2de)
  • explicitly specify NA values (4fe2121)
  • fallback CCD coordinates (#101) (d6c797b)
  • first try installing via BIOTITE_INSTALL_TOKEN (e0903af)
  • fix pad_dna tests (1ca0367)
  • fix accidental nesting (c0430e6)
  • fix arginine ambiguity resolving function (1b96ae1)
  • fix CI paths & add stage for slow tests (0cf3679)
  • fix conftest import (075545e)
  • fix file type test (4ad62b9)
  • fix flaky pad_dna tests (0019c2a)
  • fix matching of bond atom ids and names (6437d26)
  • fix old tests which broke due to function signature change (eb64dee)
  • fix operation expression parsing for rare exceptions (a398911)
  • fix ref_space_uid to res_id, not token_id (#97) (ae0ac04)
  • fix regression tests to have nan coords for unoccupied atoms (d964165)
  • fix remaining issues with conformer generation transform (e59a65e)
  • fixes for data preprocessing (#124) (ed10245)
  • force pip upgrade of biotite (c2b6de7)
  • formats, subset to keys (670736d)
  • further CI improvements (e9b3476)
  • further test fixes from refactor & logging improvements (625cfc9)
  • handle annotation carry over from atom_array to full_atom_array by id matching instead of via ordering. This resolves the remaining matching problems (36a1a0b)
  • handle CIF files with no resolved atoms (19ae934)
  • hydrogen addition placement in parser affecting resolving residu… (#74) (86a2973)
  • hydrogen policy (97efd06)
  • import error (801165e)
  • improve error message (4266666)
  • improve error message (b8046d7)
  • improvements for PadDNA (#110) (c35443f)
  • in case of unknown atom names, do not rename (fix to allow us to test against assigning unknown atom names 0 occupancy in test_parser) (0a900cf)
  • include apptainer build in ci (#8) (05546e2)
  • include chembl smirks in package (4f48f45)
  • include nucleic acids when masking residues with unresolved back… (#96) (9bbc326)
  • increase stringency on inferred polymer bond creation. Only create bonds between AA-AA or NA-NA like residues automatically. Everything else must be defined in struct_conn (1f362f9)
  • inference: fixes for inference (#59) (e6a62da)
  • inferred sequence (c74946c)
  • io_utils: allow writing scalars (707ef8a)
  • io_utils: also allow passing pathlib.Path to read_any (7c887fb)
  • io_utils: do not write empty CIF categories, which otherwise causes an error (7c49f15)
  • io: backwards compatibility bug (2965ee3)
  • issues with RDKit pickling that lead to information loss for inferring atom names (4940fbe)
  • leaving_groups: fix leaving group computation for edge case of only hydrogens, add further tests (fd749ee)
  • loading entity from spoofed cif (#131) (c1b0abb)
  • make PadDNA optional (2dcc610)
  • make arginine renaming work (b3c55dd)
  • make cif parsing more robust to non-existent fields (0220241)
  • make compatible with cifutils hydrogens (#77) (68f321b)
  • make conformer generation timeout more lenient (79e7b08)
  • make IPython part optional (CI containers don't have ipython) (23c1712)
  • make leaving group identification insensitive to hydrogens for robustness (2e6a889)
  • make openbabel dependency optional by try-excepting import (c66639c)
  • make rdkit-dependent regression tests pass regardless of operating system (e5f0f0b)
  • make test import function properly with pytest without requiring module to be installed via pip (a4e8235)
  • make type hints in tests backward compatible with python <3.10 (5d169a4)
  • make typehints backwards compatible with python versions <3.10 (8fbe3ac)
  • metadata check in test parser (495ba44)
  • migrate viz_utils to cifutils (f9a9eb9)
  • minor bugfixes to tests (c7c214c)
  • minor fixes to tests (e841dfc)
  • minor improvements & generalisations (removing unnecessary reliance on extra_info, allowing templates to work with chain types as enums/ints/strings, ...) (1887e5d)
  • minor integration bugs from renaming, template masking (4f4f38c)
  • minor test fixes (d967d8f)
  • minor test issues, make ARG renaming optional to compare to legacy parser (159fb59)
  • minor updates for production (#158) (64ca81b)
  • misc bug fixes (e98792d)
  • missing residues get nan coords (fa877a8)
  • missing test PDB IDs (b0fc6f4)
  • molecule_iids (064a69a)
  • more test bugs (7d9846f)
  • more workers for CI (21ecddc)
  • MR comments, tests (383f24a)
  • msa bug (69ad2c8)
  • MSA caching (eefb05f)
  • MSAs with NCAA (#101) (813862f)
  • mse tests (a54c7dc)
  • multiple ligands during inference (581b23c)
  • multiple ligands during inference (54b840c)
  • name parameter in tests (360a373)
  • names (fdaf9b0)
  • naming (b6753a3)
  • new env token for pypi (#3) (1948c74)
  • non-update of AtomArrayStack and missing fields in added H atoms (#62) (65afa5d), closes #63 #64 #65
  • np.full (b2823ed)
  • only log if heavy atoms failed to match (b1659f0)
  • PadDNA and rdkit bugs (#124) (19f80d9)
  • PadDNA asserts to warnings (#102) (e9848da)
  • parsers and warnings (#129) (71a54a0)
  • pass args to parse_from_cif/pdb (2d001d2)
  • patch biotite array (#107) (5375829)
  • patch biotite's get_residue_starts function to differentiate between residues of different transformation ids (dbef1c4)
  • patch error where leaving groups were overwritten by latest found group instead of accumulated (c13d795)
  • paths (b3c5258)
  • paths to biotite (91878b9)
  • PDBs with polymers and NPs on same chain (7f5fdf1)
  • peptides as polymers during inference (#82) (bc915aa)
  • peptides in AF3 validation splits (70ece97)
  • per default, set output of atom_array_from_rdkit to hetero atoms (28a432c)
  • permissions for caching (e3843e9)
  • place unresolved atoms (#86) (1af2f20)
  • rdkit from smiles (#127) (d5d1e3c)
  • rdkit: utf-8 encoding (#20) (f69f683)
  • re-use transform for hydrogen removal, set default hydrogen policy to 'keep' when transforming to atom arrays (#61) (28672ee)
  • readability improvements (fbe9d28)
  • refactoring to not change ground truth (e237a4a)
  • reintroduce _get_matching_atom for error handling (57926ef)
  • reintroduce masking of residues that had a heavy atom mismatch (c13c36e)
  • relax tolerance (dc0718c)
  • remaining path changes (2296fe1)
  • remove needs from .gitlab-ci.yml file (d112a3a)
  • remove automorphisms from rdkit (6272b3a)
  • remove close pn units column (0dea67c)
  • remove cuda (6b087ed)
  • remove custom error type wrapping due to pickling issues (a4587f8)
  • remove deprecated only statement (64e4ae1)
  • remove deprecated remove_hydrogens argument (6b90a72)
  • remove duplicate test (a36afad)
  • Remove erroneous assert statement in GenericDFParser (#120) (8f950cf)
  • remove legacy parser samples that do not match up anymore due to parser improvements (42588cb)
  • remove query seq from extra msa first row (1f43f74)
  • remove redundant #TODO (110bc9d)
  • remove spurious argument (c380e88)
  • remove unnecessary dataclass which causes errors downstream (58710fd)
  • remove unused imports, add missing imports, clean up whitespaces (e8a5254)
  • rename msa to msa_stack in AF-3 pipeline (2c76c09)
  • replace README (#1) (cbe76c0)
  • resolve atom_order mismatch bug (bc82059)
  • resolve occasional duplicate indices (e.g. 4xkw) by also specifying res_name (5ccd8f0)
  • revert CI (763a295)
  • run apptainer job on worker (c982a56)
  • safeguard coordinate extraction from ideal rdkit conformers in case no ideal rdkit conformer coordinates are provided in the cif (33fa08b)
  • samplers (#111) (ae9cf6f)
  • selection strings, utils (#85) (bf9157f)
  • semantic release as single source of truth for versions (cd467f2)
  • set crop_center_atom_id, ..._atom_idx and ...token_idx even when not cropping for forward compatibility with further atom-level cropping (#155) (cc39788)
  • set CI to fail if CI job gets killed (12f1256)
  • set conformer default to not use forcefield optimization (b40bc5e)
  • set log levels from warning to info (5c7479a)
  • setup testpypi (#2) (63b8e95)
  • simplify and fix CI (1b0d453)
  • skip bind/nobind tests (c1f73d8)
  • skip pad DNA tests (#163) (a9cd7db)
  • sort imports, clean up parse API (a5271c8)
  • specify biotite internal dtypes for matching to avoid rare cases of long chain ID's etc (2455d37)
  • speed up parse by 3x by vectorizing various subroutines, reducing the amount of subsetting and adding a cache hierarchy (#44) (d3b7f54)
  • speed up tests (8070409)
  • standardize heavy atom naming for matching rare cases where alt atom id's are used (fe69d4b)
  • subset atom id renaming to heavy atoms (8cea0eb)
  • support AF-3-style CIFs (#67) (947d1cd)
  • switch to patched get_residue_starts function to avoid rare bugs where, after cropping, two residues that only differ by transformation_id are sequential in the atom array and get misinterpreted as a single residue (593eebb)
  • sym center trans id type (cea2377)
  • template: correct usage of fix_formal_charges (89b7ab0)
  • template: keep hydrogens around until fixing formal charges (f774b4b)
  • test bug (3fbffe8)
  • test for updated inference with multiple ligands (a8fc1ca)
  • test restricting to CI to merge requests onto main and non-drafts only (1a26c34)
  • testing speed (435ce48)
  • tests (193a9cd)
  • tests (b4116df)
  • tests (4e19c9a)
  • tests (4220698)
  • tests (b994834)
  • tests for AF3 pipeline (c93dc4c)
  • tests for build assembly arguments (1bb9ff2)
  • tests refactored transforms (de4c7b5)
  • tests: fix import error (99ce9b4)
  • transforms/base: treat edge case of 0 probabilities in RandomRoute (7cd74fd)
  • try/except ccd loading (9c43eec)
  • try/except ccd loading for inference (6f3fc99)
  • type-cast for older torch versions (c83eccd)
  • type-cast issue (dae0479)
  • typo (e269513)
  • typo (25d9fc7)
  • typo (ee5f811)
  • typo (00e1d5e)
  • typo fix in ruff config (c65895f)
  • typo in list o_O (a52193f)
  • unconditional support, try/except metrics (#116) (19bc1dc)
  • undo accidental too deep nesting (e5795c7)
  • unnest bucketize (ea69aeb)
  • unwanted ions in validation (#41) (0926ec9)
  • update caching to save as .pkl.gz and fix path creation (dc34984)
  • update ci (8fd01c6)
  • update ci for github (#10) (1fe9024)
  • update dependencies to align with rf3 (cd9e8b7)
  • update description (f376f09)
  • update deserialization checks (3101a77)
  • update encodings to work with AF3SequenceEncoding (a62feba)
  • update outdated escape pattern (6a85641)
  • update paths to new datahub repo (e98c490)
  • update pipeline regression tests (7d6f57b)
  • update RDKit version (8386c69)
  • update regression tests (0a12fca)
  • update regression tests to ignore hydrogens, fix typos, add debug code to regression tests (9a599e5)
  • update standardize heavy atom id renaming to deal with elements as integers and strings (12c807b)
  • update test coverage (2939f46)
  • update tests to reload general CIFParser object for test speed (d2be606)
  • update to biotite main, remove hack for NaN coordinates (#54) (48a49b3)
  • update to latest cifutils (fde3f3f)
  • update to latest cifutils apptainer (73bb693)
  • upgrade > force-reinstall to enforce updating even on non-version bump commits (c1629e6)
  • use extra info dictionary with all_pn_unit_iids (d1a8380)
  • validation: update to biotite main, fix msa bug, add LoadBalancedDistributedSampler (#74) (4711256)
  • various bug fixes / refactors (e989c3e)
  • various minor fixes to get pipeline profiling running again & extend to af3 (4b7547f)
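
Several of the fixes above make a heavy dependency optional by wrapping its import in try/except (the openbabel and IPython entries, for example). The following is a minimal sketch of that general pattern, not the project's actual code; the helper names `optional_import` and `require` are hypothetical and only serve the illustration.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered too.
        return None


# Hypothetical usage: resolve the optional dependency once, at import time.
openbabel = optional_import("openbabel")


def require(module: Optional[ModuleType], name: str) -> ModuleType:
    """Raise a clear error only when the optional feature is actually used."""
    if module is None:
        raise ImportError(
            f"{name} is required for this feature but is not installed"
        )
    return module
```

With this arrangement the package imports cleanly in environments that lack the dependency (such as a slim CI container), and the error surfaces only at the call site that needs it.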

Features

  • add .bcif parsing test (951a0c2)
  • add category_to_dict util (9aa4c9f)
  • add query, mask and idxs functionality (#111) (73247d8)
  • add show_cartoon argument to view (#63) (f66071e)
  • add sum_string_arrays util to sum string arrays with dynamic dtype resizing (e185fa6)
  • add view_pymol functionality and improve to_cif exports (#38) (05df1c6)
  • add AF2 FB Distillation dataset & corresponding tests (b27cbee)
  • add af3 inference pipeline (d842d05)
  • add AF3 token level features (96af662)
  • add arginine renaming tests (61ae5a5)
  • add atomworks CLI (dc84b1b)
  • add autoformatting commands and make script (682119f)
  • add automorphisms to AF3 pipeline (45b01f9)
  • add capabilities to slice atom array by segments (e.g. ResIdx / ChainIdx segments) (#72) (576a16d)
  • add centering and principal components to atomselectionstack (#114) (9f3eb39)
  • add chain types for 'water', 'branched' & 'macrolide' (44ac820)
  • add CI apptainer building stage, improve test speed, fix minor CI bugs, add CI secrets, bump environments, add testmon & xdist pytest plugins for speeding up tests (9344fad)
  • add code for plotting pipeline performance (46a1344)
  • add confidence head processing to af3 pipeline (#56) (2d590aa), closes #41 #45 #44
  • add conformer generation for smiles to keep stereochemical annotation (#78) (6952251)
  • add constants, remove old assets (55f2266)
  • add convenience API for ChainType enums (0bc5dc6)
  • add convenience readability utils (0ea9676)
  • add crystallization aid & ligands to remove data (4b3248b)
  • add custom context for handling errors (#90) (09e9642)
  • add dynamical string size resizing to get minimum length (a615926)
  • add encoding to pipeline (a03f3d4)
  • add environment specification (b3edcb5)
  • add fixing of formal charges for atom arrays for inference (dd7fc18)
  • add flag to fix formal charges (a4eafae)
  • add full fledged AF3 Encoding (3cf571c)
  • add functional API for remove hydrogens (7d43400)
  • add functional API for spatial cropping (c07475a)
  • add functionality to remove crystallization aids, including a test (2efc191)
  • add further functional API (beeb6fb)
  • add further rf2aa assumption check that ensures that no individual chain can ever be entirely unresolved (as can happen e.g. with chain AB in 3rj1) (fa04418)
  • add geometry utils (0adf3c7)
  • add ground truth ref_pos through new track (#115) (2bcf1bb)
  • add group scatter utils (#119) (baa1739)
  • add hydrogens via biotite supported hydride library (#56) (9df9801)
  • add immutable_lru_cache, add mapping of chem_comp_types to their corresponding UNKNOWN ccd (c00bbd8)
  • add inference utils, add rdkit utils and clean up base cifutils (enums, constants) (#18) (5071084)
  • add is_same_in_segment convenience function (#91) (8e7aa7b)
  • add ligand of interest information (840a3b0)
  • add mapping of noncanonicals to canonical residues (60fddc4)
  • add metal elements as constants (c036af5)
  • add missing ChEMBL rules for fixing (b006ddf)
  • add MSA paths into chain info (#52) (615bba3)
  • add offset-slope timeout for rdkit conformer generation (a824994)
  • add pdb example from FB distillation set (93f3633)
  • add prior bugs as test cases (e15ca01)
  • add reference molecule feature transforms (3c66009)
  • add RemovePolymersWithTooFewResolvedResidues Transform to pipelines (c6b756d)
  • add scaffold for environment, apptainer spec, readme, add test coverage (2cc9912)
  • add scaffold test for fixing operations (14552a3)
  • add script for fast ColabFold-style MSA generation with MMseqs-GPU (#71) (06ba64b), closes #76 #95 #99
  • add scripts for convenient IPD specific setup, add documentation (2e86cdd)
  • add scripts to get the ccd & pdb mirrors and replace digs-specific paths through the corresponding mirror paths (753485c)
  • add standard NAs & AAs, remove old assets (9d648a7)
  • add support for bcif & pdb filetypes, add universal loading, improve to_cif functionalities to allow outputting arbitrary metadata (f380319)
  • add support for rf2aa inference pipeline (8b48008)
  • add test for selection utils (3dd5e4f)
  • add tests for geometry utils (9a428db)
  • add tests for timeout utils (aeb4c24)
  • add tests for visualize (ce6674e)
  • add tests for writing out cifs (292c8ad)
  • add the ability to randomly pad DNA (#84) (5cd4088), closes baker-laboratory/cifutils#72 #85 #87
  • add timeout context manager (25beacc)
  • add timeout decorator (72ed6a2)
  • add tipatom constants (9f7fe70)
  • add to_pdb_string tests (c638ee9)
  • add tools for nested dictionaries (2ccd3ad)
  • add tools to fix partially corrupted molecules, streamline atomarray <> rdkit interconversions (999dc39)
  • add transform to compute spatial k-nn masks useful for spatially local attention (#60) (23665ff)
  • add transform to further shrink a crop if the crop at token level would result in a crop that exceeds a specified max number of atoms (#9) (0769a8a)
  • add unresolved residue handling to pipeline (09cc692)
  • add util to compute rng hash for convenient debugging & easy random state comparisons (#98) (d2d9d63)
  • add utility to get RDKit conformers from res names with timeout & fallback to idealized coords (b6cc8e3)
  • add utils for writing cif files regardless of where they come from, enable view_pymol to visualize CIFBlock & BinaryCIFBlocks (#66) (d03df1a)
  • add utils to get automorphisms from rdkit (2561644)
  • add utils to get idxs and masks for representative tokens in AF3 (87c390a)
  • add utils to go directly from res_name to rdkit molecules, fix capitalization of element lookup (0c44352)
  • add utils to patch metals at symmetry centers (bfca0f8)
  • add utils to standardize atom id's to the standard atom id instead of alternative atom id (44da2a3)
  • add visualization utils for atom arrays (043c263)
  • address bad conformer id issue for molecules with many rotatable bonds (0caefe3)
  • af-3 validation dataset loaders initial commit (920bbfa)
  • allow token_starts re-use in token utils, add safeguards for ensuring each token has a representative atom (causes dataloader failures instead of model failures downstream) (3359279)
  • arbitrary nested datasets (11cafee)
  • assign stereo-chemistry when converting an atom_array to rdkit based on the coordinates (if possible). Make ccd_code_to_rdkit caching immutable. Add nan-coord utils for AtomArrays & Stacks (#77) (4992c7e)
  • atom-level embeddings (#151) (58f10c3)
  • AtomArrayPlus, AtomArrayPlusStack (#109) (7c6622e)
  • bump biotite version, fix CI, speed up test collection, clean up pyproject.toml (96e8dc3)
  • bump ruff version (41a9b6f)
  • caching stores parameters (#45) (ec714ee)
  • chiral center processing bugfix (#103) (631ab8e)
  • ci improvements for post-merge pipeline, adding auto-coverage and releases (5fe345d)
  • convert_af3_model_output_to_atom_array (5d517b5)
  • database utils for bind/no-bind project (#154) (2f298eb)
  • disentangle AF3 token representative and token center definitions, bump cifutils version (b61f466)
  • doc updates (c5772d5)
  • docs and release setup (#126) (357a681)
  • enable parallel tests to worksteal (f7f96ca)
  • enable reading 'all' extra_fields in parser (#76) (a5e70bc)
  • expose altloc specification during loading, fix saving of altloc id when none specified, to avoid biotite parsing issues (7214edd)
  • expose datetime and user when logging failed examples, only log per default if user is given (c39ef16)
  • expose option to choose RDKit conformer generation method (e0760bd)
  • expose pdb id (640010a)
  • extract RF2AA assumptions check into its own Transform, generalize PDBDataset (3285be0)
  • featurization of unresolved residues to avoid distribution shifts (e78da27)
  • final splits (7e9b2b6)
  • first attempt at github ci (#2) (a64bfc0)
  • from_pymol_str for AtomSelection (0a66dad)
  • generalized-preprocessing (#44) (f20c85c)
  • ground truth reference conformer (#83) (c5f9a89)
  • gzip cifs by default (#53) (4fdd6e8)
  • implement to_pdb methods (8728781)
  • implement WorkStealDataLoader to de-bottleneck dataloading when parsing from files with highly variable runtimes (bcf5868)
  • implement AF3 template featurization and harmonize RF2AA & AF3 names (07c5867)
  • implement automatic semantic versioning release (ae2d775)
  • implement automorphism features for AF3 (f1bf992)
  • implement building multiple assemblies (e591844)
  • implement fixing formal charges after bond formation (e06e39d)
  • implement MSE to MET conversion (cf8c0d6)
  • implement proper passing on of random seeds to RDKit (so random… (#93) (92f5cc8)
  • implement resolving of ARG naming ambiguities (391fbff)
  • implement standard to alternate atom id translation (37cd4e7)
  • implement test for bioassembly building (5846d2e)
  • implement user error when trying to save files with ambiguous bond information (498ad64)
  • improve timeout error messages (cf945c3)
  • improve base transforms by avoiding error masking by TransformPipelineError, implement + operator for transforms, implement basic ApplyFunction transform (f8eb1e0)
  • improve ci with auto-coverage tests (9bd6fe8)
  • improve ruff rules & add option to configure number of cores to run pytest on (7a9589f)
  • include an apptainer build stage in CI (2e5b087)
  • include atom order tests with PDBs that violate atom ordering (af587fb)
  • inference bugs (#63) (ee49a31)
  • inference like AF3 (#104) (526ae95)
  • initial split notebook (3e65ed5)
  • integrate pdb parsing into CIF Parser (a26451d)
  • integrate templates into AF3 pipeline (3dcd50e)
  • integrate timeouts in rdkit conformer generation (9310df1)
  • interface splits (ccddbcf)
  • io_utils,visualize: add bcif output capabilities, generalize view_pymol (#60) (4216f16)
  • load cached reference conformers (#145) (257845c)
  • mask residues with unresolved backbone atoms (84ba100)
  • md5 hash for hash_atom_array (ec4bd85)
  • migrate setup.py instructions into pyproject.toml (b530685)
  • more robust incrementing of chain ids (4199bfd)
  • move init arguments to parse, integrate ARG ambiguity resolving (6069770)
  • MSAs from multiple directories (015a75f)
  • mse_met conversion test improvements, tightening of typing (9efca8a)
  • NCAA for inference (#86) (c0c6977)
  • nested datasets bug fixes (48f36ef)
  • networkx automorphisms, no tests (cedf164)
  • parse AtomArrays directly, introduce PDBOrCIFFileComponents (#94) (0ff8a0a)
  • patch symmetry centers (875a44c)
  • peptide sampling (127bbc9)
  • pH info metadata (#122) (9e8fea4)
  • pipe: add flag to return atom array from af3 pipeline (#76) (a79abe3)
  • random remove ligands (#144) (775853b)
  • remove polymer chains with too few resolved residues; refactor filters (5a93d8f)
  • script to count AF3 tokens initial draft (dbdd489)
  • selection: add n_body='all' option to get_annotation_categories() (#116) (da5f09f)
  • separate peptides (be894b4)
  • SequenceSelection utils (#80) (406cb87)
  • set up CI (774f271)
  • specify covalent bonds during inference (#27) (e815a67)
  • standardize atom ordering within each residue to CCD order (as some PDBs have incorrect ordering) (44c8d48)
  • start adding AF3 pipeline (47e8355)
  • subsample templates and rotate conformers (#35) (3c65428)
  • support buffers in parse (#48) (68406fa)
  • support models as af3 outputs (#39) (7d269df)
  • support MSAs during inference (6643f68)
  • support UNL for inference (#54) (ef8e75d)
  • switch from black,isort,autoflake > ruff (d2038b6)
  • switch from internal biotite to public biotite 1.1 (98eae71)
  • take first chiral subordering (#125) (c383977)
  • template: implement template creation and matching (9e04e5e)
  • update AF3 pipeline to include reference features (660ee92)
  • update residue library creation from CCD to enable idealized coordinates... (9238b6f)
  • update scripts to count tokens (37e5ab8)
  • update timeout utils to support both signal & subprocessing based timeouts (subprocess strategy needed for RDKit timeouts) (f651ba4)
  • update to local biotite installation (5d3ba27)
  • update to public biotite (v1.1.0) and make apptainer digs independent (#55) (ddfe62f)
  • updates from 2d conditioning (#139) (3f1b34a), closes #142
  • upgrade CI to use git clone instead of apptainer rebuilding, fix broken chirals (#147) (41b6c80)
  • upweight LOI (92df8e2)
  • use element for atom name char of atomized tokens (#67) (e35e2b0)
  • user-friendly MSA generation entrypoint (#138) (d55b9c4)
  • versioning: Include automatic versioning of datahub version, in… (#59) (a8636ad)
  • visualize: add slot capabilities for view_pymol (#115) (06f6c16)
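
The list above mentions an `immutable_lru_cache` helper and making `ccd_code_to_rdkit` caching immutable, without describing the behaviour. One plausible reading is a cache that hands out copies so callers cannot corrupt the stored value. The sketch below illustrates that idea under this assumption; it is not the project's implementation, and `load_residue_atoms` is a hypothetical stand-in for an expensive lookup.

```python
import copy
import functools


def immutable_lru_cache(maxsize=128):
    """LRU cache whose callers receive deep copies of the cached value."""

    def decorator(func):
        cached = functools.lru_cache(maxsize=maxsize)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Copy on the way out so mutating the result cannot alter the cache.
            return copy.deepcopy(cached(*args, **kwargs))

        # Expose the usual lru_cache introspection helpers.
        wrapper.cache_info = cached.cache_info
        wrapper.cache_clear = cached.cache_clear
        return wrapper

    return decorator


@immutable_lru_cache(maxsize=32)
def load_residue_atoms(res_name):
    # Hypothetical stand-in for an expensive lookup; the data is illustrative.
    return {"res_name": res_name, "atoms": ["N", "CA", "C", "O"]}


first = load_residue_atoms("GLY")
first["atoms"].append("CB")           # mutate the returned copy
second = load_residue_atoms("GLY")    # served from the cache, still pristine
```

The trade-off is one deep copy per call, which is usually cheap relative to rebuilding the object but is worth measuring for large cached values.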

Performance Improvements

  • significantly speed up representative coordinate fetching (9326e0c)
  • vectorize custom inter-residue bond removal (3d8a6ab)

BREAKING CHANGES

  • Renamed utils files in cifutils

  • feat: add ipd setup utils, fix missing numpy import

  • fix(ci): add missing files for CI

  • chore(ci): group lint & test together again to avoid buggy github ci

  • feat: add af2 distillation dataset path for ipd setup

  • fix: combine fast & slow tests in CI, expose number of CPU cores to run on

  • fix(ci): update time limits

  • chore: clean up ci & add documentation

  • chore(ci): robustness improvements in setup.sh

  • refactor: implement src-architecture for datahub repo

  • Old export paths to datahub no longer work; /src must be added to them

  • chore: cleanup for release

  • chore: format

  • fix: preprocessing pipelines

  • chore: update validation scripts, new test dataframes

  • chore: update validation notebook

  • fix: tests broken by using new test datasets

  • chore: bump cifutils version

  • update cifutils to version 1.1.0, which includes a significant refactor and breaking changes

  • Switch to updated cifutils