Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why do glycines have an extra bond in internal coordinates? #2

Closed
SmearingMap opened this issue Jan 7, 2023 · 4 comments
Closed

Why do glycines have an extra bond in internal coordinates? #2

SmearingMap opened this issue Jan 7, 2023 · 4 comments

Comments

@SmearingMap
Copy link

SmearingMap commented Jan 7, 2023

While trying to understand how the internal coordinates are represented on in SCNet format, I noticed nerfax seems to assign additional data beyond the bond and torsion angles and the bond lengths.

I noticed this when I added the following lines of code to your PDB reconstruction example.

seq = list(traj.topology.to_fasta()[0])

glycine_angles = [internal_coords["angles_mask"][0][i,:] 
                  for i in range(len(seq)) if seq[i] == "G"]
glycine_torsions = [internal_coords["angles_mask"][1][i,:] 
                  for i in range(len(seq)) if seq[i] == "G"]
glycine_bonds = [internal_coords["bond_mask"][i,:] 
                  for i in range(len(seq)) if seq[i] == "G"]

Examining these values, I find that each has five non-zero entries, even though point cloud mask has only four, and the data format described in mp_nerf.kb_proteins suggests that glycine should have only four bond angles, four torsions, and four bond lengths.

Notably, I only see this for glycines.
Everything else has a number of non-zero entries matching the mp_nerf description and number of nonzero entries in the cloud mask. Furthermore, the first four bond angles tend to be very near their reference values, but the last one varies greatly. Also, the final bond length is typically very large.

I have not tried out mp_nerf to see if this also happens in their code.

Is this intentional? If so, what are the extra bond lengths, torsions, and angles?

@oliverdutton
Copy link
Contributor

It is an implementation detail that could have been handled better. Placing the CB atom is slightly awkward as it depends on the previous residue: CB(i) is placed C(i-1)-N(i)-CA(i)->CB(i). To make sure the C position was extracted from the previous residue I had to extract it even from the glycines (as the next residue might not be a glycine and hence require it) hence the dummy values that appear.

I've added a commit to clean this back up.

@SmearingMap
Copy link
Author

Thanks! Just a heads up, your update will break anything that tries to JIT-compile make_scaffolds; I don't know if that's a problem for you or not.

Also, am I correct in understanding that the phi angles are shifted back by one from the conventional notation? I.e. the psi angle of residue i, psi(i), is usually understood to be the dihedral angle of N(i-1)-C(i)-CA(i)-N(i) and phi(i) to be the dihedral of C(i)-CA(i)-N(i)-C(i+1). By this convention, the first amino acid backbone in protein sequence will have only a phi angle and the last will have only a psi angle. It looks like you've moved the phi angles back in the internal coordinate arrays, so that the first amino acid in the sequence has no phi or psi angles in its array. Am I getting this right?

More broadly, I'm interested in using nerfax for an academic application. I'm using MDTraj at the moment to compute all torsion angles and I'd like to be able to replace all of this with nerfax (since I'm using jax anyway), but in order to make the switch I'd want to be able to replicate a few of my past results. So, for the purposes of matching values of angles/torsions, is the order of the torsion and angle indices the same as it in mp_nerf?

@oliverdutton
Copy link
Contributor

oliverdutton commented Jan 10, 2023

Thanks! Just a heads up, your update will break anything that tries to JIT-compile make_scaffolds; I don't know if that's a problem for you or not.

In general, this code is only jittable on a specific protein case. This is because each requires different scattering and shapes. To deal with this, there's functions like 'get_jax_protein_fold' which do all the compile-time folding to make the inputs jittable. Below I've added a little snippet for how you would make make_scaffold jittable (and faster when you do jit it).

nerfax/nerfax/plugin.py

Lines 72 to 82 in 8d04e1a

def get_jax_protein_fold(scaffolds, only_backbone=False, reconstruct_fn=reconstruct_from_internal_coordinates):
'''
Takes scaffolds, a dict of torch tensors and returns a
jit-compiled op converting internal->cartesian coords
for that specific protein
'''
cloud_mask, point_ref_mask= [jnp.array(x.cpu() if isinstance(x, torch.Tensor) else x) for x in map(scaffolds.__getitem__, ['cloud_mask', 'point_ref_mask'])]
def _fold(angles_mask, bond_mask, only_backbone=only_backbone):
with jax.ensure_compile_time_eval():
return protein_fold(cloud_mask, point_ref_mask, angles_mask, bond_mask, only_backbone=only_backbone, reconstruct_fn=reconstruct_fn)
return _fold

from nerfax.parser import make_scaffolds,load_to_sc_coord_format
import jax
coords, seq = load_to_sc_coord_format(path, first_frame_only=True)
scaffolds_natural = make_scaffolds(coords, seq)

@jit
def f(coords):
  with jax.ensure_compile_time_eval():
    return partial(make_scaffolds, seq=seq)(coords)

f(coords)

Also, am I correct in understanding that the phi angles are shifted back by one from the conventional notation? I.e. the psi angle of residue i, psi(i), is usually understood to be the dihedral angle of N(i-1)-C(i)-CA(i)-N(i) and phi(i) to be the dihedral of C(i)-CA(i)-N(i)-C(i+1). By this convention, the first amino acid backbone in protein sequence will have only a phi angle and the last will have only a psi angle. It looks like you've moved the phi angles back in the internal coordinate arrays, so that the first amino acid in the sequence has no phi or psi angles in its array. Am I getting this right?

Yes, the angles are rolled when compared to mp_nerf. This was because I found the mp_nerf definition very odd, the angles are rolled and the the angle is redefined. A conversion function is there convert_natural_to_scnet or convert_scnet_to_natural which takes either the angles (2,L,14) array or a dict with 'angles_mask' key present, which it acts on.

nerfax/nerfax/plugin.py

Lines 7 to 30 in 8d04e1a

roll_first_col_in_last_axis = lambda x, roll=1: jnp.concatenate([jnp.roll(x[...,:1], roll, axis=-2), x[...,1:]], axis=-1)
# Convert between scnet odd conventions and 'natural' ones. #FIXME make bijector.
def converter(input, roll):
'''
if a dict, it's assumed to have `angles_mask` key which is altered.
if an array, assumed to be the `angles_mask` array in mp_nerf to alter
'''
def _converter(angles_mask):
# Roll the first angle and dihedral round by 1 to match 'natural' syntax
angles_mask = roll_first_col_in_last_axis(angles_mask, roll=roll)
# Fix difference in how angle is specified
angles, torsions = angles_mask
angles = jnp.pi-angles # due to historical scnet reasons, the scnet angle is defined as pi-angle
angles_mask = jnp.stack([angles, torsions])
return angles_mask
if isinstance(input, dict):
# is scaffolds dict, where we fix angles mask
return {**input, 'angles_mask': _converter(input['angles_mask'])}
else:
# Is angles_mask tensor of shape (2,L,14)
return _converter(input)
convert_scnet_to_natural = partial(converter, roll=1)
convert_natural_to_scnet = partial(converter, roll=-1)

dihedrals values are associated with the atom you will place - so the highest atom number atom

@SmearingMap
Copy link
Author

That was incredibly helpful, and I agree that this is a much cleaner way to organize the computation. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants