## Compute solvent accessible surface area (SASA)

We use FreeSASA (https://f1000research.com/articles/5-189/v1) to compute SASA.

It can be installed with pip:

`pip install freesasa`


### Compute SASA per residue for a PDB structure

The following code computes SASA per residue:

In [3]:
import freesasa

structure = freesasa.Structure('../data/pdbs/9ds2_chothia.pdb')
result = freesasa.calc(structure)

residue_areas = result.residueAreas()  

Now `residue_areas` contains a `ResidueArea` instance for each residue, grouped by chain.
An individual residue can be accessed using

`area = residue_areas[chain][residue]`

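Each `ResidueArea` has the following attributes (areas are in Å²):

- `total`: total area of the residue
- `relativeTotal`: total area relative to a reference value for the residue type; usually in the range 0 to 1, but sometimes larger than 1 if the residue is very exposed or at the end of a chain
- `polar`: area of polar atoms
- `apolar`: area of apolar atoms
- `relativePolar`: relative area of polar atoms
- `relativeApolar`: relative area of apolar atoms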

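In structural biology, solvent-accessible surface area (SASA) is often split into polar and apolar components because they have distinct biophysical meanings: exposed polar surface can form hydrogen bonds and electrostatic contacts, while exposed apolar surface is associated with hydrophobic interactions.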
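The nested layout described above (chain ID, then residue number, then `ResidueArea`) can be illustrated without a PDB file. The sketch below uses `types.SimpleNamespace` objects as stand-ins for `ResidueArea`, with made-up numbers. It assumes the inner keys are residue-number strings, which is how insertion codes such as `100A` in Chothia numbering would stay distinct. `flatten_residue_areas` is a hypothetical helper, not part of FreeSASA.

```python
from types import SimpleNamespace

# Stand-in for the dict returned by result.residueAreas():
# chain ID -> residue number (assumed to be a string) -> ResidueArea-like object.
# All numbers are invented for illustration.
residue_areas = {
    "H": {
        "100": SimpleNamespace(residueType="TYR", total=50.0, polar=20.0, apolar=30.0),
        "100A": SimpleNamespace(residueType="GLY", total=10.0, polar=4.0, apolar=6.0),
    },
    "L": {
        "32": SimpleNamespace(residueType="SER", total=12.5, polar=12.5, apolar=0.0),
    },
}


def flatten_residue_areas(residue_areas):
    """Return one dict per residue, covering every chain in the structure."""
    rows = []
    for chain_id, residues in residue_areas.items():
        for residue_number, area in residues.items():
            rows.append({
                "chain": chain_id,
                "resnum": residue_number,
                "resname": area.residueType,
                "total": area.total,
                "polar": area.polar,
                "apolar": area.apolar,
            })
    return rows


# Direct lookup of a single residue, then a flat list over all chains
single = residue_areas["H"]["100A"]
rows = flatten_residue_areas(residue_areas)
```

A list of flat dictionaries like `rows` can be passed straight to `pd.DataFrame`.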


In [3]:
import pandas as pd
import freesasa
import os

PDB_DIR = "../data/pdbs"

pdb_id = '9ds2'


pdb_path = os.path.join(PDB_DIR, f"{pdb_id}_chothia.pdb")
structure = freesasa.Structure(pdb_path)
result = freesasa.calc(structure)

residue_areas = result.residueAreas()  

data = []

for residue_number, area in residue_areas['H'].items(): 
    data.append(dict(ab_chain = 'H',
                     ab_resnum = residue_number,
                     ab_resname = area.residueType, 
                     total = area.total, 
                     relativeTotal = area.relativeTotal,
                     polar = area.polar,
                     relativePolar = area.relativePolar,
                     apolar = area.apolar,
                     relativeApolar = area.relativeApolar)
                     ) 

df = pd.DataFrame(data)
df.iloc[40:70]

|    | ab_chain | ab_resnum | ab_resname | total | relativeTotal | polar | relativePolar | apolar | relativeApolar |
|----|----------|-----------|------------|-------|---------------|-------|---------------|--------|----------------|
| 40 | H | 41 | PRO | 103.525306 | 0.754503 | 30.896571 | 1.920234 | 72.628736 | 0.599643 |
| 41 | H | 42 | GLY | 74.173217 | 0.914702 | 36.931269 | 0.827128 | 37.241948 | 1.022007 |
| 42 | H | 43 | LYS | 174.570762 | 0.851648 | 81.385697 | 0.866912 | 93.185065 | 0.838749 |
| 43 | H | 44 | GLY | 15.091555 | 0.186109 | 6.577257 | 0.147307 | 8.514298 | 0.233653 |
| 44 | H | 45 | LEU | 16.361257 | 0.091119 | 16.361257 | 0.440292 | 0.0 | 0.0 |
| 45 | H | 46 | GLU | 65.60881 | 0.376672 | 32.071201 | 0.261848 | 33.537609 | 0.648697 |
| 46 | H | 47 | TRP | 5.058133 | 0.020298 | 4.343335 | 0.070463 | 0.714797 | 0.003811 |
| 47 | H | 48 | VAL | 1.950501 | 0.012835 | 0.0 | 0.0 | 1.950501 | 0.016948 |
| 48 | H | 49 | ALA | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 49 | H | 50 | PHE | 2.517701 | 0.012596 | 0.0 | 0.0 | 2.517701 | 0.015264 |


### Split PDB structure into antibody and antigen

We are interested in the difference in SASA between the unbound antibody and the antibody when bound in the antibody-antigen complex.

Above, we wrote code to compute the SASA per residue for a PDB file. To compute the difference in SASA, we also need a PDB file that contains only the antibody chains.

Biopython can save a subset of a structure. To do so, subclass `Bio.PDB.Select` and override the method `accept_chain(chain)` so that it returns `True` if the chain is to be exported and `False` if not. Note that the method receives a chain object rather than a chain ID, so the ID is read with `chain.get_id()`.

In [1]:
from Bio.PDB import PDBParser, PDBIO, Select

class ChainSelect(Select):
    def __init__(self, chain_ids):
        self.chain_ids = chain_ids
    def accept_chain(self, chain):
        return chain.get_id() in self.chain_ids


We can now use this class to save only selected chains from the structure.

We save the antibody-only files into `../data/pdbs_ab`. This directory must exist before the next cell is run.
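One way to create the directory from Python, rather than by hand, is `os.makedirs` with `exist_ok=True`, which does nothing if the directory is already there. This is a minimal sketch that assumes the notebook is run from a location where `../data` is writable.

```python
import os

PDB_AB_DIR = "../data/pdbs_ab"

# exist_ok=True makes the call safe to re-run when the directory already exists;
# missing parent directories are created as well
os.makedirs(PDB_AB_DIR, exist_ok=True)
```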

In [None]:
PDB_AB_DIR = "../data/pdbs_ab"

pdb_id = '9ds2'
light_chain = 'L'
heavy_chain = 'H'

pdb_path = os.path.join(PDB_DIR, f"{pdb_id}_chothia.pdb")
pdb_ab_path = os.path.join(PDB_AB_DIR, f"{pdb_id}_{light_chain}_{heavy_chain}_chothia.pdb")

# create PDB file for unbound antibody
parser = PDBParser(PERMISSIVE=1)
structure = parser.get_structure(pdb_id, pdb_path)

io = PDBIO()
io.set_structure(structure)
io.save(pdb_ab_path, select = ChainSelect([light_chain, heavy_chain]))

### Compute dSASA for antibody residues in free and bound state
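
Here dSASA denotes, for each antibody residue, the SASA in the unbound antibody minus the SASA of the same residue in the complex. This matches the subtraction order used in the code of this section. Writing $i$ for a residue (the notation is ours, added for illustration):

$$
\mathrm{dSASA}_i = \mathrm{SASA}_i^{\text{unbound}} - \mathrm{SASA}_i^{\text{complex}}
$$

Residues that lose solvent exposure when the antigen binds therefore have positive dSASA, while residues away from the interface have values at or near zero.
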

In [None]:
# compute SASA for complex
structure = freesasa.Structure(pdb_path)
result = freesasa.calc(structure)
cplx = result.residueAreas()

# compute SASA for unbound antibody
structure = freesasa.Structure(pdb_ab_path)
result = freesasa.calc(structure)
antibody = result.residueAreas()  

# report dSASA per residue
data = []

chain = light_chain

for residue_number, ab_area in antibody[chain].items(): 
    # get SASA for complex for same chain/residue
    cplx_area = cplx[chain][residue_number]
    
    data.append(dict(ab_chain = chain,
                     ab_resnum = residue_number,
                     ab_resname = ab_area.residueType, 
                     dSASA = ab_area.total - cplx_area.total, 
                     dSASA_rel = ab_area.relativeTotal - cplx_area.relativeTotal,
                     dSASA_polar = ab_area.polar - cplx_area.polar,
                     dSASA_polar_rel = ab_area.relativePolar - cplx_area.relativePolar,
                     dSASA_apolar = ab_area.apolar - cplx_area.apolar,
                     dSASA_apolar_rel = ab_area.relativeApolar - cplx_area.relativeApolar)
                     ) 

df = pd.DataFrame(data)
df.iloc[30:50]

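|    | ab_chain | ab_resnum | ab_resname | dSASA | dSASA_rel | dSASA_polar | dSASA_polar_rel | dSASA_apolar | dSASA_apolar_rel |
|----|----------|-----------|------------|-------|-----------|-------------|-----------------|--------------|------------------|
| 30 | L | 31 | ASN | 33.159644 | 0.228671 | 31.236796 | 0.301921 | 1.922848 | 0.046278 |
| 31 | L | 32 | SER | 13.799381 | 0.116608 | 13.799381 | 0.193323 | 0.0 | 0.0 |
| 32 | L | 33 | LEU | 1.064942 | 0.005931 | 0.318446 | 0.00857 | 0.746496 | 0.005243 |
| 33 | L | 34 | ALA | 9.357309 | 0.086036 | 0.0 | 0.0 | 9.357309 | 0.131775 |
| 34 | L | 35 | TRP | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 35 | L | 36 | TYR | 49.776196 | 0.232393 | 26.911085 | 0.331744 | 22.865111 | 0.171828 |
| 36 | L | 37 | GLN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 37 | L | 38 | GLN | 41.693909 | 0.233148 | 41.362366 | 0.313708 | 0.331543 | 0.007057 |
| 38 | L | 39 | LYS | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 39 | L | 40 | PRO | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |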


### Wrap all this into a function

Write a function `compute_dSASA(pdb_id, light_chain, heavy_chain)` that

- builds the path to the complex PDB file (from SAbDab)
- builds the path for the antibody-only PDB file (that we want to create)
- reads the PDB structure
- writes the PDB file for the antibody (select the heavy and light chains) to some scratch directory
- computes SASA per residue for the complex
- computes SASA per residue for the antibody
- initializes an empty data list
- goes over the light chain and adds a dictionary of dSASA values per residue (as above),
  also adding
  - `pdb_id`
  - `ab_chaintype = 'light'`
  - `ab_chain = light_chain`
- does the same for the heavy chain
- returns a DataFrame

This is basically copy/paste from above.

To make it easier to copy the function to the next notebook, we define imports and constants again.
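
Because the light-chain and heavy-chain loops differ only in the chain ID and the chain-type label, the per-residue step could be factored into a small helper instead of being pasted twice. The sketch below is one possible shape. `dsasa_rows` is a hypothetical name, the inputs are assumed to be nested dicts shaped like the output of `result.residueAreas()`, and the toy data uses `SimpleNamespace` stand-ins with invented numbers so that it runs without a PDB file. The relative columns would follow the same pattern and are left out for brevity.

```python
from types import SimpleNamespace


def dsasa_rows(antibody, cplx, chain, chain_type, pdb_id):
    """Build one dSASA record per residue of `chain`.

    dSASA is the unbound-antibody value minus the value in the complex.
    """
    rows = []
    for residue_number, ab_area in antibody[chain].items():
        # same chain/residue in the complex
        cplx_area = cplx[chain][residue_number]
        rows.append({
            "pdb_id": pdb_id,
            "ab_chaintype": chain_type,
            "ab_chain": chain,
            "ab_resnum": residue_number,
            "ab_resname": ab_area.residueType,
            "dSASA": ab_area.total - cplx_area.total,
            "dSASA_polar": ab_area.polar - cplx_area.polar,
            "dSASA_apolar": ab_area.apolar - cplx_area.apolar,
        })
    return rows


# Toy stand-ins for residueAreas() output (invented numbers)
unbound = {"L": {"32": SimpleNamespace(residueType="SER", total=40.0, polar=30.0, apolar=10.0)}}
bound = {"L": {"32": SimpleNamespace(residueType="SER", total=25.0, polar=20.0, apolar=5.0)}}

rows = dsasa_rows(unbound, bound, chain="L", chain_type="light", pdb_id="9ds2")
```

Inside `compute_dSASA`, such a helper would be called once per chain and the two lists concatenated before building the DataFrame.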

In [None]:
import os
import pandas as pd
import freesasa
from Bio.PDB import PDBParser, PDBIO, Select

PDB_DIR = "../data/pdbs"
PDB_AB_DIR = "../data/pdbs_ab"

class ChainSelect(Select):
    def __init__(self, chain_ids):
        self.chain_ids = chain_ids
    def accept_chain(self, chain):
        return chain.get_id() in self.chain_ids


def compute_dSASA(pdb_id, light_chain, heavy_chain):

    # create PDB file for unbound antibody
    pdb_path = ...
    pdb_ab_path = ...
    
    ...

    # compute SASA for complex and unbound antibody 
    ...

    # report dSASA per residue for light and heavy chain
    data = []

    chain = light_chain

    ... 

    chain = heavy_chain

    ... 
 


    df = pd.DataFrame(data)

    return df



In [None]:
compute_dSASA("9ds1", "L", "H")