In [14]:
from hpacker import HPacker

# model_dirs is simply a two-element tuple/list of directories containing the "initial_guess" and "refinement" models
model_dirs = [
    'pretrained_models/initial_guess',
    'pretrained_models/refinement',
]

# Quick Start - right to packing

## 1) Bare Backbone + Sequence information

In [15]:
# Initialize the HPacker object by passing it a tuple of paths to the pre-trained models,
# and the backbone-only structure that we want to add side-chains to.
hpacker = HPacker(model_dirs, 'T0950_bb_only.pdb')

Running on cuda.
There are 2909192 parameters
Running on cuda.
There are 2751944 parameters


There are **three** equivalent ways of telling HPacker which amino-acid should correspond to which site.\
The **first** - and simplest - way is to have the input PDB file already contain sequence information (as is the case in our working example).
All we need to do then is a simple call to `.reconstruct_sidechains()`.

In [16]:
chi_trace = hpacker.reconstruct_sidechains(num_refinement_iterations=5, return_trace_of_predicted_angles=True)

# the reconstructed structure is stored internally in `hpacker.structure`, and can easily be saved to a PDB file
hpacker.write_pdb('reconstructed_from_bb_only_T0950.pdb')

In [17]:
# `chi_trace` contains a record of the predicted chi angles across refinement iterations. It is a list of dictionaries.
# Within each dictionary, chi angles for individual residues are indexed by residue IDs (res_id), which are tuples of (chain, resnum, icode).
initial_guess_chis = chi_trace[0]
final_chis = chi_trace[-1]

res_ids = hpacker.get_res_ids()
for i in range(10):
    res_id = res_ids[i]
    print(f'{res_id}: {final_chis[res_id]}')

(' ', 12, ' '): [178.869140625]
(' ', 13, ' '): [-46.90363693237305]
(' ', 14, ' '): [-53.05009460449219, -32.65126419067383]
(' ', 15, ' '): [19.594497680664062, -25.412628173828125]
(' ', 16, ' '): [-157.41526794433594, 176.36703491210938, 175.0377197265625]
(' ', 17, ' '): [-57.673526763916016, 170.25694274902344]
(' ', 18, ' '): [-174.6244659423828, -20.911258697509766]
(' ', 19, ' '): [-60.83549499511719]
(' ', 20, ' '): [-70.23514556884766, -179.6683349609375, 176.46156311035156, 179.8834686279297]
(' ', 21, ' '): [-63.557472229003906]
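
To see how far refinement moved each residue away from the initial guess, the first and last entries of the trace can be compared directly. The sketch below is a minimal illustration that assumes only the structure described above (a list of dictionaries mapping each res_id to a list of chi angles in degrees). The helpers `angular_difference` and `max_chi_change` are hypothetical and not part of HPacker, and the trace is toy data. Differences are wrapped into [-180, 180) so that 179 and -179 degrees count as 2 degrees apart.

```python
def angular_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def max_chi_change(chis_before, chis_after):
    """Largest absolute chi-angle change per residue between two trace entries."""
    changes = {}
    for res_id, before in chis_before.items():
        after = chis_after[res_id]
        # default=0.0 covers residues that have no chi angles at all
        changes[res_id] = max(
            (abs(angular_difference(x, y)) for x, y in zip(after, before)),
            default=0.0,
        )
    return changes


# toy trace standing in for the `chi_trace` returned by HPacker (values are made up)
toy_trace = [
    {(' ', 12, ' '): [170.0], (' ', 14, ' '): [-50.0, -30.0]},
    {(' ', 12, ' '): [-175.0], (' ', 14, ' '): [-53.0, -32.5]},
]

changes = max_chi_change(toy_trace[0], toy_trace[-1])
for res_id, change in changes.items():
    print(f'{res_id}: {change:.1f} degrees')
```

With a real trace, the same function could be applied to consecutive entries to check whether the predictions settle down across refinement iterations.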


The **second** way of introducing sequence information is to pass it to HPacker via the `.update_resnames()` method before reconstructing.

In [23]:
res_id_to_resname = {} # dictionary: res_id --> three-letter amino-acid code
hpacker.update_resnames(res_id_to_resname)
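
The dictionary above is left empty as a placeholder. One way to fill it, sketched here under stated assumptions, is to walk a one-letter sequence alongside the residue IDs. The helper `sequence_to_resnames` is hypothetical (not part of HPacker), and the toy res_ids and sequence are made up. In practice the IDs would presumably come from `hpacker.get_res_ids()`, and the sketch assumes that the sequence lists the residues in the same order.

```python
# standard one-letter to three-letter amino-acid codes
ONE_TO_THREE = {
    'A': 'ALA', 'R': 'ARG', 'N': 'ASN', 'D': 'ASP', 'C': 'CYS',
    'Q': 'GLN', 'E': 'GLU', 'G': 'GLY', 'H': 'HIS', 'I': 'ILE',
    'L': 'LEU', 'K': 'LYS', 'M': 'MET', 'F': 'PHE', 'P': 'PRO',
    'S': 'SER', 'T': 'THR', 'W': 'TRP', 'Y': 'TYR', 'V': 'VAL',
}


def sequence_to_resnames(res_ids, sequence):
    """Map each res_id to the three-letter code of the matching sequence position."""
    if len(res_ids) != len(sequence):
        raise ValueError(
            f'{len(res_ids)} residue IDs but {len(sequence)} sequence letters'
        )
    return {
        res_id: ONE_TO_THREE[letter]
        for res_id, letter in zip(res_ids, sequence.upper())
    }


# toy residue IDs in the (chain, resnum, icode) format (made up for illustration)
toy_res_ids = [('A', 1, ' '), ('A', 2, ' '), ('A', 3, ' ')]
res_id_to_resname = sequence_to_resnames(toy_res_ids, 'MKV')
print(res_id_to_resname)
```

The length check is deliberate: a sequence that is off by one residue would otherwise assign the wrong amino acid to every site without raising an error.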

**Third**, the same information can be passed directly into the reconstruction function.

In [25]:
hpacker.reconstruct_sidechains(num_refinement_iterations=5, res_id_to_resname=res_id_to_resname)

## 2) Repacking an All-Atom structure

In [18]:
# when passing as input a structure that already has side-chains and that you want to re-pack from zero,
# make sure to toggle the `remove_sidechains` flag
hpacker = HPacker(model_dirs, 'T0950.pdb', remove_sidechains=True)

Running on cuda.
There are 2909192 parameters
Running on cuda.
There are 2751944 parameters


In [19]:
# as sequence information is already available, repacking the side-chains is a one-liner
hpacker.reconstruct_sidechains(num_refinement_iterations=5)
hpacker.write_pdb('reconstructed_T0950.pdb')

In [27]:
# if you wish to mutate certain sites, pass a dictionary containing only the mutations you'd like to apply, analogously to what we did above
mutations = {} # dictionary: res_id --> three-letter amino-acid code
hpacker.reconstruct_sidechains(res_id_to_resname=mutations, num_refinement_iterations=5)
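
As a concrete illustration of what a filled-in mutation dictionary might look like, the sketch below builds one and sanity-checks it before use. The helper `check_mutations` is hypothetical and not part of HPacker, and restricting values to the 20 standard residue names is an assumption. The only thing taken from the text above is the target format: a dictionary from res_id tuples `(chain, resnum, icode)` to three-letter codes.

```python
STANDARD_RESNAMES = {
    'ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY', 'HIS', 'ILE',
    'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER', 'THR', 'TRP', 'TYR', 'VAL',
}


def check_mutations(mutations):
    """Raise ValueError on malformed res_ids or non-standard residue names."""
    for res_id, resname in mutations.items():
        if not (isinstance(res_id, tuple) and len(res_id) == 3):
            raise ValueError(f'bad res_id: {res_id!r}')
        if resname not in STANDARD_RESNAMES:
            raise ValueError(f'unknown residue name: {resname!r}')
    return mutations


# mutate residue 16 to alanine and residue 20 to leucine; the blank chain ID
# and insertion code mirror the res_ids printed earlier in this notebook
mutations = check_mutations({
    (' ', 16, ' '): 'ALA',
    (' ', 20, ' '): 'LEU',
})
print(mutations)
```

Sites that are not listed in the dictionary keep the amino-acid type they already have.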

#### If you want to evaluate against ground truth...

We provide a custom function that treats the input structure as the ground truth and returns several metrics comparing the reconstructed side-chains with the original ones.

In [20]:
hpacker = HPacker(model_dirs, 'T0950.pdb', remove_sidechains=True)

metrics = hpacker.reconstruct_sidechains_and_evaluate(num_refinement_iterations = 5)

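mae_per_angle_4, accuracy_per_angle_4, real_chis, predicted_chis, aas, res_ids_dict, rmsds = metrics

# mean absolute error for each of the four chi angles
print(mae_per_angle_4)
# RMSDs are grouped by residue subset: 'all', 'core', and 'surface'
print(rmsds.keys())
# pair each residue ID with its RMSD, then print the first ten
res_id_to_rmsd = dict(zip(res_ids_dict['all'], rmsds['all']))
for res_id in res_ids_dict['all'][:10]:
    print('%s: %.3f Angstroms' % (res_id, res_id_to_rmsd[res_id]))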

Running on cuda.
There are 2909192 parameters
Running on cuda.
There are 2751944 parameters
All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	76	66	46	62	
MAE:	19	29	41	36	

All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	77	70	47	62	
MAE:	19	29	39	36	

All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	76	69	49	66	
MAE:	18	27	38	34	

All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	77	71	48	66	
MAE:	18	27	38	36	

All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	76	69	49	64	
MAE:	19	28	37	38	

All residues: 309
Core residues: 79
Surface residues: 147
Accuracy:	77	70	47	66	
MAE:	18	27	38	36	

tensor([18.4910, 27.3548, 38.4252, 36.0798], dtype=torch.float64)
dict_keys(['all', 'core', 'surface'])
(' ', 12, ' '): 0.107 Angstroms
(' ', 13, ' '): 1.929 Angstroms
(' ', 14, ' '): 0.298 Angstroms
(' ', 15, ' '): 0.497 Angstroms
(' ', 16, ' '): 2.035 Angstroms
(' ', 17, ' '): 0.182 Angstroms
(' ', 18, ' 
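
Once residue IDs and RMSDs are paired up in a dictionary such as `res_id_to_rmsd`, it is straightforward to flag the residues that were reconstructed worst. The sketch below is a minimal illustration that uses plain Python floats copied from the output above. The helper `worst_residues` and the 1.0 Angstrom threshold are hypothetical choices, not HPacker defaults. The `float()` call is there on the assumption that the real values may be tensor or array scalars.

```python
def worst_residues(res_id_to_rmsd, threshold=1.0):
    """Return (res_id, rmsd) pairs with RMSD above `threshold` Angstroms, worst first."""
    flagged = [
        (res_id, float(rmsd))
        for res_id, rmsd in res_id_to_rmsd.items()
        if float(rmsd) > threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# per-residue RMSDs taken from the printed output above
res_id_to_rmsd = {
    (' ', 12, ' '): 0.107,
    (' ', 13, ' '): 1.929,
    (' ', 14, ' '): 0.298,
    (' ', 15, ' '): 0.497,
    (' ', 16, ' '): 2.035,
    (' ', 17, ' '): 0.182,
}

for res_id, rmsd in worst_residues(res_id_to_rmsd):
    print('%s: %.3f Angstroms' % (res_id, rmsd))
```

For these six residues, only residues 16 and 13 exceed the threshold.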

#### If you only want to refine existing side-chains...

If you believe your model already has reasonably good side-chains and you just want to refine them, you can use HPacker's refinement-only option. This runs only the refinement model on the existing side-chains, so you should expect the resulting side-chains to deviate less from the input structure than when making HPacker start from zero, as done above.

In [21]:
# just don't toggle `remove_sidechains`, so that the existing side-chains are kept
hpacker = HPacker(model_dirs, 'T0950.pdb')

# refinement only
hpacker.refine_sidechains(num_refinement_iterations = 5, return_trace_of_predicted_angles = True)

hpacker.write_pdb('refined_only_T0950.pdb')

Running on cuda.
There are 2909192 parameters
Running on cuda.
There are 2751944 parameters


## 3) Partial packing

Unfortunately, the option of packing only some side-chains is not currently available, but it will be soon.

# Slow Start - the HPacker object

#### Constructor
Each HPacker object is used to reconstruct the side-chains of a single protein structure.
Accordingly, the constructor takes two required inputs: a tuple of two paths pointing to the "Initial Guess" and "Refinement" model directories, and a single PDB file. Several keyword arguments govern pre-processing of the input structure, such as filtering out hydrogens and waters and removing side-chains.

#### Internal Representation
HPacker uses BioPython's PDB module to store and manipulate structures. HPacker stores three versions of the structure, as ```Bio.PDB.Structure``` objects:
1. ```self.structure``` stores the *current* version of the structure, as it's being manipulated or reconstructed. Importantly, the backbone is never changed from the input.
2. ```self.original_structure``` always stores the original structure passed to the constructor, after filtering but *before* removing side-chains, if the kwarg ```remove_sidechains``` is set to ```True```.
3. ```self.copy_structure``` is a dummy structure used as a helper by the internal processes.

Residue sites are uniquely identified by residue IDs (```res_id```), each consisting of the Tuple (chain, resnum, icode).
Individual Residue objects and amino-acid types (```resname```) can be accessed using the ```res_id``` within HPacker methods.

#### Useful methods
Below is a list of *some* of the HPacker methods that can be used to conveniently manipulate the internal representation of the protein structure:
- `get_res_ids()`
- `get_residue(res_id)`
- `get_resname(res_id)`
- `update_resnames(res_id_to_resnames)`
- `write_pdb(outputpath)`