
todo #7

Open
31 of 47 tasks
lucidrains opened this issue May 20, 2024 · 0 comments

Comments

@lucidrains (Owner) commented May 20, 2024

  • modules

  • miscellaneous

    • f_tokenbond embedding to pairwise init (default to a single chain for starters if not passed in)
    • take care of normalization and unnormalization of atomic coordinates
    • distance labels should be derived from atom positions if not given
    • weighted rigid align module needs to account for atom_mask (variable number of atoms per batch sample)
    • sample without replacement in MSAModule
    • make sure the diffusion loss accounts for the nucleic acid / ligand masks + the bond loss during fine-tuning
    • return the entire loss breakdown for logging in eventual trainer
    • hook up the centre random augmentation
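The weighted rigid align item above amounts to a weighted Kabsch alignment; a minimal numpy sketch (the repo itself is PyTorch, and the function name here is illustrative) of how zeroing the weights of masked padding atoms handles a variable number of atoms per batch sample:

```python
import numpy as np

def weighted_rigid_align(coords, ref, weights, mask):
    """Weighted Kabsch alignment of `coords` onto `ref`.

    coords, ref: (n_atoms, 3); weights: (n_atoms,); mask: (n_atoms,) bool
    Padding atoms (mask == False) get zero weight, so samples with
    different numbers of atoms can share one fixed-size tensor.
    """
    w = (weights * mask).astype(float)
    w = w / w.sum()
    # weighted centroids of both point clouds
    mu_c = (w[:, None] * coords).sum(0)
    mu_r = (w[:, None] * ref).sum(0)
    c = coords - mu_c
    r = ref - mu_r
    # weighted covariance, then SVD for the optimal rotation
    cov = (c * w[:, None]).T @ r
    u, _, vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (rot @ c.T).T + mu_r
```

Because the padding atoms carry zero weight, corrupt or garbage coordinates in the padded slots cannot perturb the recovered rotation.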
  • @lucidrains take care of

    • packed atom representation
      • given atom lengths and a sequence, do an average pool based on those lengths - atom -> token
      • given atom lengths and a sequence, expand sequence to consecutives, for token -> atom
    • fix packed atom representation when going from token level -> atom level pairwise repr
    • packed repr - make sure repeating the pairwise repr is done in one specialized function; also take care of curtailing or padding the mask through some kwarg
    • able to pass in residue indices for protein-only training, with everything else derived; test with sidechainnet
    • atom transformer attention bias needs to be calculated efficiently in the Alphafold3 module, use asserts to make sure shape is correct within local_attn fn
    • take care of residue identities / indices -> atom feats + atom bonds + attention biasing for atom transformers
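The two packed-representation ops described above (length-based average pool for atom -> token, and expansion for token -> atom) can be sketched in numpy; the repo uses PyTorch, where `repeat_interleave` and segment reductions play the same roles, and these function names are illustrative:

```python
import numpy as np

def mean_pool_atoms_to_tokens(atom_feats, atom_lens):
    """Average-pool a packed atom sequence into token features.

    atom_feats: (total_atoms, dim) - atoms for all tokens, concatenated
    atom_lens:  (num_tokens,)      - number of atoms belonging to each token
    """
    offsets = np.concatenate(([0], np.cumsum(atom_lens)))
    # sum the atoms within each token's segment, then divide by its length
    sums = np.add.reduceat(atom_feats, offsets[:-1], axis=0)
    return sums / atom_lens[:, None]

def expand_tokens_to_atoms(token_feats, atom_lens):
    """Repeat each token's features once per atom (token -> atom)."""
    return np.repeat(token_feats, atom_lens, axis=0)
```

Pooling followed by expansion is a right inverse on shapes: the expanded output has one row per atom again, with each token's pooled features broadcast to its atoms.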
  • training

    • validation and test dataset
    • add config driven training with pydantic validation for constructing trainer and base model
    • saving and loading for both base alphafold3 model as well as trainer + optimizer states
    • add trainer orchestrator config that contains many training configs and one model
    • able to reconstitute the entire training history
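As a rough illustration of the config-driven construction item, a stdlib-dataclass sketch; pydantic would replace the manual `__post_init__` checks with declarative validators, and every field name here is a hypothetical stand-in:

```python
from dataclasses import dataclass

@dataclass
class TrainerConfig:
    # hypothetical fields - stand-ins for whatever the real trainer needs
    num_steps: int = 10000
    batch_size: int = 4
    lr: float = 1.8e-3
    checkpoint_every: int = 1000

    def __post_init__(self):
        # the validation pydantic would perform on construction
        if self.num_steps <= 0:
            raise ValueError('num_steps must be positive')
        if self.lr <= 0:
            raise ValueError('lr must be positive')

def trainer_config_from_dict(raw: dict) -> TrainerConfig:
    """Construct + validate a config from e.g. a parsed YAML/JSON dict."""
    return TrainerConfig(**raw)
```

An orchestrator config holding many such training configs plus one model config would then be another dataclass (or pydantic model) composing these.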
  • datasets

    • single protein input
    • multimer input
    • multimer + nucleic acid(s) input
    • multimer + ligand input
    • other?
  • improvisations

    • add register tokens
    • improve atom transformer with some linear attention + other efficient attention tricks
    • frame averaging in place of their random aug
    • rectified flow instead of diffusion
    • add layer sharing
    • instead of all the attention biasing complexity in atom transformer, alternate between GNN (with the sparse bonds) + flash attention
    • additional conditioning on diffusion module
    • use conditionally routed attention for atom encoder and decoder
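The rectified-flow item above swaps the diffusion objective for regression onto a straight-line velocity field; a minimal numpy sketch of the training pair and Euler sampling, not the repo's implementation:

```python
import numpy as np

def rectified_flow_pair(x_data, rng):
    """Build one rectified-flow training example.

    Instead of a diffusion noise schedule, rectified flow draws noise x0,
    interpolates linearly toward the data x1 = x_data, and regresses the
    constant velocity (x1 - x0) at a uniformly random time t.
    """
    x0 = rng.normal(size=x_data.shape)      # noise sample
    t = rng.uniform()                       # random time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x_data       # straight-line interpolant
    target_velocity = x_data - x0           # regression target for the model
    return x_t, t, target_velocity

def euler_sample(velocity_fn, shape, rng, steps=8):
    """Integrate dx/dt = v(x, t) from noise at t=0 to a sample at t=1."""
    x = rng.normal(size=shape)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x
```

A useful sanity check of the interpolant: stepping from `x_t` along the target velocity for the remaining time `(1 - t)` lands exactly on the data point.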
  • cleanup

    • remove unpacked representation