# Analyzing lectins with bound glycans

GlyContact can extract glycan structures from protein-glycan co-crystal structures. As an example, we'll use `3ZW1`, the complex of the bacterial lectin BambL with Lewis X. But we're getting ahead of ourselves. Let's imagine we have no idea which glycan is in this file. How do we get started?

In [1]:
from glycontact.process import get_glycan_sequences_from_pdb

pdb_file = "./3ZW1.pdb"

get_glycan_sequences_from_pdb(pdb_file)

['Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-3)Gal', 'Fuc(a1-3)GlcNAc']

Got it! This crystal structure contains two distinct glycan sequences. Note that the electron density of glycans is often not fully resolved, so "fragments" such as `Fuc(a1-3)GlcNAc` are usually just the resolved portion of a larger sequence, here `Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-3)Gal`. Now that we know what we're looking for, we can analyze the torsion angles within this glycan with the `get_glycosidic_torsions` function.

This workflow extracts the glycan structure with the `get_annotation` function under the hood and then calculates all torsion angles for you. Note that `omega` torsion angles are only defined for linkages involving the hydroxyl group on an exocyclic carbon, usually C6.

In [2]:
from glycontact.process import get_glycosidic_torsions
glycan = "Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-3)Gal"

get_glycosidic_torsions(glycan, pdb_file)

| | linkage | phi | psi | omega | anomeric_form | position |
|---|---|---|---|---|---|---|
| 0 | 2_NAG-1_GAL | -86.4 | 100.06 | | b | 3 |
| 1 | 3_FUC-2_NAG | -78.26 | 140.97 | | a | 3 |
| 2 | 4_GAL-2_NAG | -84.06 | -127.42 | | b | 4 |
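
For intuition, each `phi`, `psi`, or `omega` value is a dihedral angle defined by four atoms around the glycosidic bond. The sketch below is a generic dihedral calculation in plain Python. It is not GlyContact's own implementation, and the function name `dihedral_deg` and the toy coordinates are made up for illustration.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180 to 180) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalize the central bond so the projections below are simple dot products.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    # atan2 keeps the sign of the rotation, unlike a plain arccos.
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Toy coordinates (hypothetical, not taken from 3ZW1).
cis = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
perpendicular = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, perpendicular, trans)
```

Which four atoms define each torsion depends on the convention in use, so the sketch only covers the geometry, not the atom selection.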


Besides torsion angles, we might also be interested in how solvent-exposed each part of this glycan is in the bound conformation. `compute_merge_SASA_flexibility` is your friend here. Again, we give it a glycan and a PDB file, and it returns the solvent-accessible surface area (SASA) and flexibility of each monosaccharide.

In our case, you can probably spot the `Fuc` residue immediately. Its SASA of about 1.5, compared with more than 100 for every other residue, shows how buried and well-coordinated the bound motif is in the binding pocket of BambL.

In [3]:
from glycontact.process import compute_merge_SASA_flexibility
compute_merge_SASA_flexibility(glycan, my_path = pdb_file)

| | Monosaccharide_id | Monosaccharide | SASA | Standard Deviation | Coefficient of Variation | flexibility | torsion_flexibility |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Gal(b1-1) | 217.894532 | | | 0.842545 | |
| 1 | 2 | GlcNAc(b1-3) | 111.50903 | | | 0.931872 | |
| 2 | 3 | Fuc(a1-3) | 1.465118 | | | 0.650212 | |
| 3 | 4 | Gal(b1-4) | 151.538354 | | | 0.878379 | |


Now let me show you another trick! If you don't provide any PDB file, `GlyContact` will automatically search through `GlycoShape` molecular dynamics simulations to analyze conformational ensembles for you. Note that, if you haven't used this mode before, `GlyContact` will download and process the corresponding `GlycoShape` structures via their API as a one-time action. After that, you're good.

Of course, `GlycoShape` doesn't contain every structure, so we have to make do with the closest match, which is still a Lewis X structure but without the reducing-end `Gal`. This gives us a sense of what the SASA of an entirely unbound `Fuc` in this sequence context would be: about 185, unsurprisingly much higher than in the bound version.

In [11]:
compute_merge_SASA_flexibility("Fuc(a1-3)[Gal(b1-4)]GlcNAc")

| | Monosaccharide_id | Monosaccharide | SASA | Standard Deviation | Coefficient of Variation | flexibility | torsion_flexibility |
|---|---|---|---|---|---|---|---|
| 0 | 1 | -R | 48.592978 | 0.018008 | 0.03611 | 0.302623 | |
| 1 | 2 | GlcNAc(b1-1) | 245.921718 | 0.015228 | 0.006212 | 0.302623 | 1.318251 |
| 2 | 3 | Gal(b1-4) | 217.170787 | 0.029154 | 0.013579 | 0.18666 | 1.454679 |
| 3 | 4 | Fuc(a1-3) | 184.729621 | 0.03317 | 0.017909 | 0.18666 | 1.181822 |
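
To put a number on "much higher", one option is to express the bound SASA as a fraction of the unbound SASA. The following is a minimal sketch that uses values copied by hand from the two tables above. The helper `buried_fraction` is hypothetical and not part of GlyContact. Keep in mind that the two glycans differ in sequence context (`GlcNAc` is at the reducing end only in the free structure), so the comparison is rough.

```python
# SASA values copied from the tables above.
bound_sasa = {"GlcNAc": 111.50903, "Gal(b1-4)": 151.538354, "Fuc(a1-3)": 1.465118}
free_sasa = {"GlcNAc": 245.921718, "Gal(b1-4)": 217.170787, "Fuc(a1-3)": 184.729621}

def buried_fraction(bound, free):
    """Fraction of the unbound SASA that is no longer accessible in the complex."""
    return 1.0 - bound / free

burial = {
    name: buried_fraction(bound_sasa[name], free_sasa[name])
    for name in bound_sasa
}

# List residues from most to least buried.
for name, fraction in sorted(burial.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {fraction:.1%} of the free SASA is buried")
```

With these numbers, `Fuc` loses almost all of its accessible surface on binding, while the other two residues stay largely exposed.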


Since lectin-glycan interactions are also about the lectin, we have some functionality to learn more about the binding pocket as well, such as the `get_binding_pocket` function. It extracts all amino acid residues within a cut-off distance (default: 4.0 Å) of a specified monosaccharide in the glycan you're interested in. In our case, since we know that `Fuc` is the relevant bit for BambL, we home in on that to get all residues of interest. If you instead wanted the **entire** binding pocket, with all glycan-adjacent residues, you could simply remove the `binding_monosaccharide` argument from the function call.

By default, this function returns all **atoms** that are closer than the cut-off value. If you're only interested in the **residues**, try running it with `all_atoms = False` for a more concise output.

You can also export the binding pocket (plus the bound glycan) to a new PDB file by setting the optional `filepath` argument to the location where you would like to save the `.pdb` file.

In [4]:
from glycontact.process import get_binding_pocket
get_binding_pocket(glycan, pdb_file, binding_monosaccharide = "FUC")

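| | chain | resSeq | resName | atom_name | target_atom | distance_min |
|---|---|---|---|---|---|---|
| 0 | A | 26 | GLU | OE1 | FUC3_O3 | 2.679193 |
| 1 | A | 26 | GLU | OE2 | FUC3_O4 | 2.683798 |
| 2 | A | 15 | ARG | NH2 | FUC3_O5 | 2.825438 |
| 3 | A | 79 | TRP | NE1 | FUC3_O3 | 2.869875 |
| 4 | A | 15 | ARG | NE | FUC3_O4 | 2.905937 |
| 5 | A | 38 | ALA | N | FUC3_O2 | 2.998354 |
| 6 | A | 26 | GLU | CD | FUC3_O4 | 3.41624 |
| 7 | A | 15 | ARG | CD | FUC3_O4 | 3.472454 |
| 8 | A | 37 | GLY | CA | FUC3_O2 | 3.481 |
| 9 | A | 15 | ARG | CZ | FUC3_O5 | 3.508778 |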
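
If you want to post-process the atom-level output yourself, the sketch below collapses it to one closest contact per residue and picks out polar protein atoms as candidate hydrogen-bond partners. It works on rows copied by hand from the table above. The helper names and the 3.5 Å threshold are assumptions for illustration, and the result is not claimed to match what `all_atoms = False` returns.

```python
# (chain, resSeq, resName, atom_name, target_atom, distance_min), copied from the table above.
contacts = [
    ("A", 26, "GLU", "OE1", "FUC3_O3", 2.679193),
    ("A", 26, "GLU", "OE2", "FUC3_O4", 2.683798),
    ("A", 15, "ARG", "NH2", "FUC3_O5", 2.825438),
    ("A", 79, "TRP", "NE1", "FUC3_O3", 2.869875),
    ("A", 15, "ARG", "NE", "FUC3_O4", 2.905937),
    ("A", 38, "ALA", "N", "FUC3_O2", 2.998354),
    ("A", 26, "GLU", "CD", "FUC3_O4", 3.41624),
    ("A", 15, "ARG", "CD", "FUC3_O4", 3.472454),
    ("A", 37, "GLY", "CA", "FUC3_O2", 3.481),
    ("A", 15, "ARG", "CZ", "FUC3_O5", 3.508778),
]

def closest_contact_per_residue(rows):
    """Keep only the shortest contact for each (chain, resSeq, resName)."""
    best = {}
    for chain, res_seq, res_name, atom, target, dist in rows:
        key = (chain, res_seq, res_name)
        if key not in best or dist < best[key][2]:
            best[key] = (atom, target, dist)
    # Sort residues by their closest contact distance.
    return dict(sorted(best.items(), key=lambda item: item[1][2]))

def polar_contacts(rows, max_dist=3.5):
    """Protein N/O atoms within max_dist of the glycan (a simple heuristic)."""
    return [row for row in rows if row[3][0] in "NO" and row[5] <= max_dist]

residues = closest_contact_per_residue(contacts)
polar = polar_contacts(contacts)
for (chain, res_seq, res_name), (atom, target, dist) in residues.items():
    print(f"{res_name}{res_seq} ({chain}): {atom} to {target}, {dist:.2f}")
print(len(polar), "polar contacts")
```

The element check relies on the first letter of the PDB atom name, which holds for the standard amino acid atoms listed here.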


# Analyzing glycosylated proteins

Next, we'll take a look at processing glycoproteins, where the glycan chain is covalently linked to an amino acid. We again work with an example glycoprotein, this time the cryo-EM structure of the hepatitis C virus E1E2 glycoprotein in complex with various antibodies. Let's start by checking which glycans are present in the PDB file:

In [5]:
pdb_file = "./7T6X.pdb"
get_glycan_sequences_from_pdb(pdb_file)

['Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc',
 'Man(a1-6)Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc',
 'Man(b1-4)GlcNAc(b1-4)GlcNAc',
 'GlcNAc',
 'GlcNAc(b1-4)GlcNAc']
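
With several fragments in the list, it can help to rank the sequences by size before picking one. The sketch below counts monosaccharides by counting linkages in the IUPAC-condensed string. The regular expression and the helper `count_monosaccharides` are assumptions for illustration, and they only cover simple, fully specified linkages such as `(a1-3)`.

```python
import re

# Matches linkages such as (a1-3), (b1-4) or (a2-6); ambiguous forms like (a1-3/6) are not covered.
LINKAGE = re.compile(r"\([ab?]\d-[\d?]\)")

def count_monosaccharides(iupac):
    """One residue per linkage, plus the residue at the reducing end."""
    return len(LINKAGE.findall(iupac)) + 1

# Sequences copied from the output above.
sequences = [
    "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc",
    "Man(a1-6)Man(a1-6)Man(b1-4)GlcNAc(b1-4)GlcNAc",
    "Man(b1-4)GlcNAc(b1-4)GlcNAc",
    "GlcNAc",
    "GlcNAc(b1-4)GlcNAc",
]

sizes = {seq: count_monosaccharides(seq) for seq in sequences}
# sorted() is stable, so sequences of equal size keep their original order.
ranked = sorted(sequences, key=count_monosaccharides, reverse=True)
for seq in ranked:
    print(sizes[seq], seq)
```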

Let's work with the trimannosyl-core `Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc` _N_-glycan. Most functions in `GlyContact` can be used interchangeably on lectin-bound glycans and covalently attached glycans, without you having to specify anything (the magic happens under the hood). Now, if we run our neat torsion angle calculation again, you will note a new kind of row here: the **linker between glycan and protein**, which offers another layer of flexibility for protein-attached glycans.

As above, we only get an `omega` torsion angle when a C6 hydroxyl group is involved in the glycosidic linkage, which this time is the case exactly once, for the `Man(a1-6)` branch.

In [6]:
from glycontact.process import get_glycosidic_torsions
glycan = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"
get_glycosidic_torsions(glycan, pdb_file)

| | linkage | phi | psi | omega | anomeric_form | position |
|---|---|---|---|---|---|---|
| 0 | ASN305-1_NAG | 179.36 | -168.96 | | linker | 0 |
| 1 | 2_NAG-1_NAG | -80.85 | -120.6 | | b | 4 |
| 2 | 3_BMA-2_NAG | -82.62 | -121.97 | | b | 4 |
| 3 | 4_MAN-3_BMA | 71.33 | 138.26 | | a | 3 |
| 4 | 5_MAN-3_BMA | 92.63 | -157.5 | -51.99 | a | 6 |
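
To see the two special row types at a glance, the sketch below separates the protein linker row from the glycosidic linkages and picks out the rows with a defined `omega`. The rows are copied by hand from the table above, with `None` standing in for the empty `omega` cells. The variable names are illustrative only.

```python
# Rows copied from the table above; None marks an undefined omega.
torsions = [
    {"linkage": "ASN305-1_NAG", "phi": 179.36, "psi": -168.96, "omega": None, "anomeric_form": "linker", "position": 0},
    {"linkage": "2_NAG-1_NAG", "phi": -80.85, "psi": -120.6, "omega": None, "anomeric_form": "b", "position": 4},
    {"linkage": "3_BMA-2_NAG", "phi": -82.62, "psi": -121.97, "omega": None, "anomeric_form": "b", "position": 4},
    {"linkage": "4_MAN-3_BMA", "phi": 71.33, "psi": 138.26, "omega": None, "anomeric_form": "a", "position": 3},
    {"linkage": "5_MAN-3_BMA", "phi": 92.63, "psi": -157.5, "omega": -51.99, "anomeric_form": "a", "position": 6},
]

# The protein-glycan linker is flagged in the anomeric_form column.
linker_rows = [row for row in torsions if row["anomeric_form"] == "linker"]
glycosidic_rows = [row for row in torsions if row["anomeric_form"] != "linker"]

# omega is only defined for the linkage through the exocyclic C6.
omega_rows = [row for row in glycosidic_rows if row["omega"] is not None]

print("linker:", [row["linkage"] for row in linker_rows])
print("with omega:", [(row["linkage"], row["position"]) for row in omega_rows])
```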


Besides glycosidic torsions, the surface accessibility and flexibility of glycans are of course important. `compute_merge_SASA_flexibility` gives you both, with flexibility being a distance-based measure derived from the temperature factors in the PDB file. Note that we can only calculate `torsion_flexibility` when we have different conformers of the same glycan, such as in molecular dynamics simulations.

For a covalently attached glycan, like the one we have here, this function also reports the flexibility of the linker amino acid!

In [7]:
from glycontact.process import compute_merge_SASA_flexibility
compute_merge_SASA_flexibility(glycan, my_path = pdb_file)

| | Monosaccharide_id | Monosaccharide | SASA | Standard Deviation | Coefficient of Variation | flexibility | torsion_flexibility |
|---|---|---|---|---|---|---|---|
| 0 | 1 | GlcNAc(b1-1) | 112.252247 | | | 1.71432 | |
| 1 | 2 | GlcNAc(b1-4) | 113.757327 | | | 1.978299 | |
| 2 | 3 | Man(b1-4) | 109.338062 | | | 2.570273 | |
| 3 | 4 | Man(a1-3) | 234.063838 | | | 2.633248 | |
| 4 | 5 | Man(a1-6) | 240.003828 | | | 2.796275 | |
| 5 | 305 | ASN | 21.443297 | | | 1.639074 | |
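
One pattern worth checking in this output is how flexibility changes along the glycan. The sketch below uses values copied by hand from the table above, splits off the linker residue, and tests whether flexibility rises with the monosaccharide ID, which here runs from the protein-proximal `GlcNAc` to the terminal mannoses. The set of linker residue names is an assumption for illustration.

```python
# (Monosaccharide_id, Monosaccharide, SASA, flexibility), copied from the table above.
rows = [
    (1, "GlcNAc(b1-1)", 112.252247, 1.71432),
    (2, "GlcNAc(b1-4)", 113.757327, 1.978299),
    (3, "Man(b1-4)", 109.338062, 2.570273),
    (4, "Man(a1-3)", 234.063838, 2.633248),
    (5, "Man(a1-6)", 240.003828, 2.796275),
    (305, "ASN", 21.443297, 1.639074),
]

# Assumption: linker rows carry a three-letter amino acid code such as these.
LINKER_RESIDUES = {"ASN", "SER", "THR"}

linker = [row for row in rows if row[1] in LINKER_RESIDUES]
sugars = sorted((row for row in rows if row[1] not in LINKER_RESIDUES), key=lambda row: row[0])

# Compare each monosaccharide with the next one along the ID order.
flexibilities = [row[3] for row in sugars]
increases_outward = all(a < b for a, b in zip(flexibilities, flexibilities[1:]))
most_flexible = max(sugars, key=lambda row: row[3])

print("linker:", linker[0][1], linker[0][3])
print("flexibility increases along the chain:", increases_outward)
print("most flexible monosaccharide:", most_flexible[1])
```

In this structure the linker `ASN` has the lowest flexibility value in the table, and the terminal `Man(a1-6)` has the highest.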
