# Working with Glycans directly - Monomer-level

To work with glycans beyond simple conversion, you can operate on them directly. This does not require parsing the whole glycan into SMILES: setting `full=False` when instantiating a `Glycan` reads in only the monosaccharide tree.

```python
from glyles import Glycan

glycan = Glycan("Man(a1-2)Gal")  # parse the full glycan, including its SMILES
print(glycan.glycan_smiles)  # glycan.get_smiles() is the better choice; the attribute is read here for demonstration

glycan = Glycan("Man(a1-2)Gal", full=False)  # only read the monosaccharide tree
print(glycan.glycan_smiles)  # no SMILES was computed, so this prints None
```

```
O1C(O)[C@H](O[C@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@@H]2O)[C@@H](O)[C@@H](O)[C@H]1CO
None
```


### Count substructures

Using GlyLES, you can count how often a given substructure occurs in a glycan. By default, only the base monosaccharides have to match; their orientation, the bonds between them, and attached functional groups are ignored. For example, `GalNAc` counts as a match for `Gal`.

Instead of passing a `Glycan` object to `count`, you can also pass the IUPAC-condensed string directly.

```python
glycan = Glycan("GalNAc(a1-4)Tal(b1-4)Neu5Ac(a2-3)[ManN(a1-4)Gal(b1-4)[Glc(b1-2)Fru(a1-3)]Gul(a1-2)Glc3S(b1-4)]Gal(a1-4)Man")
sub = Glycan("Gal")

print(glycan.count(sub))    # query given as a Glycan object
print(glycan.count("Gal"))  # same query given as an IUPAC-condensed string
```

```
3
3
```
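The default matching rule can be illustrated with a small toy model in plain Python. This is a hypothetical sketch, not how GlyLES implements `count`: each residue of the example glycan is stored with its base monosaccharide (assigned by hand, which is an assumption of the sketch) next to its full name, and the default mode compares only the base names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    base: str  # base monosaccharide, e.g. "Gal"
    full: str  # name as written, including modifications, e.g. "GalNAc"


def count_residue(residues, query, exact=False):
    """Count residues matching `query` (toy model, not the GlyLES implementation)."""
    if exact:
        return sum(1 for r in residues if r.full == query.full)
    return sum(1 for r in residues if r.base == query.base)  # ignore modifications


# Residues of the example glycan; the base names are assigned by hand for this sketch
example = [
    Residue("Gal", "GalNAc"), Residue("Tal", "Tal"), Residue("Neu", "Neu5Ac"),
    Residue("Man", "ManN"), Residue("Gal", "Gal"), Residue("Glc", "Glc"),
    Residue("Fru", "Fru"), Residue("Gul", "Gul"), Residue("Glc", "Glc3S"),
    Residue("Gal", "Gal"), Residue("Man", "Man"),
]
gal = Residue("Gal", "Gal")
print(count_residue(example, gal))              # 3: two Gal plus one GalNAc
print(count_residue(example, gal, exact=True))  # 2: GalNAc no longer matches
```

The default mode reproduces the count of 3 above. The `exact=True` branch only compares full names and ignores orientation, so it is a simplification rather than a model of GlyLES's exact matching.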


##### This can be extended to bigger substructures

```python
print(glycan.count("Glc(a1-2)Gal"))  # matches Glc3S(b1-4)Gal, since bonds and modifications are ignored by default
```

```
1
```


##### Exact substructure matches

Matching can also be made exact. With `exact_nodes=True`, the matching monosaccharides must carry exactly the same modifications as in the query, including their orientation. Similarly, `exact_edges=True` requires the bonds between monosaccharides to match exactly. Both filters can be combined.

```python
# Matching monosaccharides exactly
print(glycan.count("Glc(b1-2)Gal", exact_nodes=True))
print(glycan.count("Glc3S(b1-2)Gal", exact_nodes=True))

# Matching bonds between monosaccharides exactly
print(glycan.count("Glc3S(b1-2)Gal", exact_edges=True))
print(glycan.count("Glc(b1-4)Gal", exact_edges=True))

# Matching both monosaccharides and their bonds exactly
print(glycan.count("Glc3S(b1-2)Gal", exact_nodes=True, exact_edges=True))
print(glycan.count("Glc(b1-4)Gal", exact_nodes=True, exact_edges=True))
print(glycan.count("Glc3S(b1-4)Gal", exact_nodes=True, exact_edges=True))
```

```
0
1
0
1
0
0
1
```
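The effect of `exact_nodes` and `exact_edges` can be traced with a second toy model, again plain Python and not the GlyLES implementation. It handles only two-residue queries of the form child(linkage)parent and contains only the two edges of the example glycan that matter here, `Glc3S(b1-4)Gal` and `Glc(b1-2)Fru`. The names `Mono`, `node_match`, and `count_pairs` are hypothetical, and storing the orientation (`a`/`b`) on the residue while treating an unspecified orientation of the query's root as a wildcard are assumptions chosen so that the sketch reproduces the seven outputs above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mono:
    base: str              # base monosaccharide, e.g. "Glc"
    full: str              # name including modifications, e.g. "Glc3S"
    anomer: Optional[str]  # "a", "b", or None if unspecified (assumed to act as a wildcard)


def node_match(found: Mono, query: Mono, exact: bool) -> bool:
    if not exact:
        return found.base == query.base  # default: only the base monosaccharide must agree
    same_anomer = query.anomer is None or found.anomer == query.anomer
    return found.full == query.full and same_anomer


def count_pairs(edges, query, exact_nodes=False, exact_edges=False):
    """Count (child, linkage, parent) edges matching a two-residue query (toy model)."""
    q_child, q_link, q_parent = query
    return sum(
        1
        for child, link, parent in edges
        if node_match(child, q_child, exact_nodes)
        and node_match(parent, q_parent, exact_nodes)
        and (not exact_edges or link == q_link)
    )


glc3s, glc = Mono("Glc", "Glc3S", "b"), Mono("Glc", "Glc", "b")
# Only the two relevant edges of the example glycan: Glc3S(b1-4)Gal and Glc(b1-2)Fru
edges = [
    (glc3s, "1-4", Mono("Gal", "Gal", "a")),
    (glc, "1-2", Mono("Fru", "Fru", "a")),
]
gal_query = Mono("Gal", "Gal", None)  # query root: orientation not given

results = [
    count_pairs(edges, (glc, "1-2", gal_query), exact_nodes=True),
    count_pairs(edges, (glc3s, "1-2", gal_query), exact_nodes=True),
    count_pairs(edges, (glc3s, "1-2", gal_query), exact_edges=True),
    count_pairs(edges, (glc, "1-4", gal_query), exact_edges=True),
    count_pairs(edges, (glc3s, "1-2", gal_query), exact_nodes=True, exact_edges=True),
    count_pairs(edges, (glc, "1-4", gal_query), exact_nodes=True, exact_edges=True),
    count_pairs(edges, (glc3s, "1-4", gal_query), exact_nodes=True, exact_edges=True),
]
print(results)  # [0, 1, 0, 1, 0, 0, 1]
```

In this model, node exactness compares the full residue name and orientation, while edge exactness compares only the linkage positions, which is why the two flags filter independently.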


### Filtering and Ordering Glycans based on Structural Properties

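With `count`, you can filter and sort lists of glycans by their structural properties, for example by how often a given monosaccharide occurs. The following cells read glycans from a tab-separated file whose first column holds the IUPAC-condensed strings.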

```python
with open("files/pubchem_poly.tsv", "r") as f:  # first TSV column holds the IUPAC-condensed string
    glycans = [(line.split("\t")[0], Glycan(line.split("\t")[0])) for line in f]
query = Glycan("Tal")
print(sum(1 for _, g in glycans if g.count(query) > 0))  # number of glycans containing talose
```

```
0
```

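```python
query = Glycan("Gal")
# Sort glycans by their number of galactose units (descending) and show the top 10
ranked = sorted(glycans, key=lambda x: x[1].count(query), reverse=True)
print("\n".join(f"{g.count(query)}: {iupac}" for iupac, g in ranked[:10]))
```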

4: Gal(b1-4)GlcNAc(b1-4)[GalNAc(b1-4)GlcNAc(b1-2)]Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)[GalNAc(b1-4)GlcNAc(b1-6)]Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: GalNAc(b1-4)GlcNAc(b1-2)[GalNAc(b1-4)GlcNAc(b1-6)]Man(a1-6)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: Gal(b1-4)GlcNAc(b1-2)[GalNAc(b1-4)GlcNAc(b1-6)]Man(a1-6)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: Gal(b1-4)GlcNAc(b1-4)[GalNAc(b1-4)GlcNAc(b1-2)]Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: Gal(b1-4)GlcNAc(b1-6)[GalNAc(b1-4)GlcNAc(b1-2)]Man(a1-6)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: Gal(a1-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[GalNAc(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc
3: Fuc(a1-2)[GalNAc(a1-3)]Gal(b1-4)GlcNAc(b1-2)Man(a1-3)[Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
3: Fuc(a1-2)[GalNAc(a1-3)]Gal(b1-4)GlcNAc(b1-2)Man(a1-6)[Gal(b1-4)GlcNAc(b