# Constructing Subgraphs in Graphein
Graphein provides utilities for extracting various kinds of subgraph. These are composable, so quite specific subsets can be selected.

We start by constructing a graph with several different edge types. This is the base graph from which all of the selections below are made.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/a-r-j/graphein/blob/master/notebooks/subgraphing_tutorial.ipynb)

In [2]:
# Install Graphein if necessary
# !pip install graphein
# Install DSSP if necessary
# !sudo apt-get install dssp (better for colab) OR !conda install -c salilab dssp

In [2]:
import plotly.io as pio
pio.renderers.default

'vscode'

In [1]:
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.edges.distance import *
from graphein.protein.graphs import construct_graph

edge_fns = [
    add_aromatic_interactions,
    add_hydrophobic_interactions,
    add_aromatic_sulphur_interactions,
    add_cation_pi_interactions,
    add_disulfide_interactions,
    add_hydrogen_bond_interactions,
    add_ionic_interactions,
    add_peptide_bonds
    ]
config = ProteinGraphConfig(edge_construction_functions=edge_fns)

g = construct_graph(config=config, pdb_code="4hhb")

To use the Graphein submodule graphein.protein.features.sequence.embeddings, you need to install biovec.

biovec cannot be installed via conda
To use the Graphein submodule graphein.protein.visualisation, you need to install pytorch3d.

To do so, use the following command:

    conda install -c pytorch3d pytorch3d
To use the Graphein submodule graphein.protein.meshes, you need to install pytorch3d.

To do so, use the following command:

    conda install -c pytorch3d pytorch3d


DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 574 total nodes
DEBUG:graphein.protein.features.nodes.amino_acid:Reading meiler embeddings from: /Users/arianjamasb/github/graphein/graphein/protein/features/nodes/meiler_embeddings.csv
INFO:graphein.protein.edges.distance:Found: 84 aromatic-aromatic interactions
INFO:graphein.protein.edges.distance:Found 1284 hydrophobic interactions.
INFO:graphein.protein.edges.distance:Found 6 disulfide interactions.
INFO:graphein.protein.edges.distance:Found 208 hbond interactions.
INFO:graphein.protein.edges.distance:Found 12 hbond interactions.
INFO:graphein.protein.edges.distance:Found 420 ionic interactions.


In [2]:
from graphein.protein.visualisation import plotly_protein_structure_graph
plotly_protein_structure_graph(g, node_size_min=4, node_size_multiplier=2)

## Subsetting with a list of nodes
The simplest method of constructing a subgraph is when we already have a defined list of nodes that we wish to extract. The naming convention for nodes is:

`CHAIN:RESIDUE_NAME:POSITION`

e.g. `A:ALA:110`
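Because node IDs are plain colon-separated strings, they can be taken apart with ordinary string handling. The following is a minimal sketch; `parse_node_id` and `ResidueID` are hypothetical names introduced here for illustration and are not part of Graphein. It assumes residue-level IDs with exactly three fields.

```python
from typing import NamedTuple


class ResidueID(NamedTuple):
    chain: str
    residue_name: str
    position: int


def parse_node_id(node_id: str) -> ResidueID:
    """Split a residue-level node ID of the form CHAIN:RESIDUE_NAME:POSITION."""
    # Hypothetical helper: raises ValueError if the ID does not have three fields.
    chain, residue_name, position = node_id.split(":")
    return ResidueID(chain, residue_name, int(position))


parsed = parse_node_id("A:ALA:110")
print(parsed.chain, parsed.residue_name, parsed.position)
```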

We can use the `extract_subgraph_from_node_list()` function to achieve this.

```python
extract_subgraph_from_node_list(
    g,
    node_list: Optional[List[str]],
    filter_dataframe: bool = True,
    inverse: bool = False,
    return_node_list: bool = False
)
```

* Selections can be inverted with the `inverse` parameter.
* The `filter_dataframe` parameter controls whether the `pdb_df` dataframe associated with the graph (accessed via `g.graph["pdb_df"]`) is filtered to the selected nodes.
* If we only want the list of nodes identified by the selection, rather than the subgraph itself, we set the `return_node_list` parameter.

This is the core subsetting function. The other subsetting functions described below are based on different methods for computing a list of nodes to subset the graph to. If you wish to implement a subsetting method not described here, you simply need to compute a list of node_ids and provide them to this function.
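To illustrate that last point, a custom selection only has to produce a list of node IDs. The sketch below works on a plain list of IDs rather than a real graph; `select_node_ids` is a hypothetical helper, and its `inverse` flag only mimics the behaviour described above rather than reproducing Graphein's implementation.

```python
from typing import Callable, Iterable, List


def select_node_ids(
    node_ids: Iterable[str],
    predicate: Callable[[str], bool],
    inverse: bool = False,
) -> List[str]:
    """Return the IDs for which `predicate` holds (or fails, if `inverse` is set)."""
    return [n for n in node_ids if predicate(n) != inverse]


# Toy node IDs following the CHAIN:RESIDUE_NAME:POSITION convention.
all_nodes = ["A:VAL:1", "A:LEU:2", "B:HIS:2", "B:CYS:93", "C:VAL:1"]

# Custom rule (an assumption for this example): keep residues on chain B.
on_chain_b = select_node_ids(all_nodes, lambda n: n.startswith("B:"))
not_on_chain_b = select_node_ids(all_nodes, lambda n: n.startswith("B:"), inverse=True)
print(on_chain_b)
print(not_on_chain_b)
```

A list computed this way could then be passed as `node_list` to `extract_subgraph_from_node_list()`.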

In [3]:
from graphein.protein.subgraphs import extract_subgraph_from_node_list

NODE_LIST = ['B:LYS:82', 'B:GLY:83', 'B:THR:84', 'B:PHE:85', 'B:ALA:86', 'B:THR:87', 'B:LEU:88', 'B:SER:89', 'B:GLU:90', 'B:LEU:91', 'B:HIS:92', 'B:CYS:93', 'B:ASP:94', 'B:LYS:95', 'B:LEU:96', 'B:HIS:97', 'B:VAL:98', 'B:ASP:99', 'B:PRO:100', 'B:GLU:101', 'B:ASN:102', 'B:PHE:103', 'B:ARG:104', 'B:LEU:105', 'B:LEU:106', 'B:GLY:107', 'B:ASN:108', 'B:VAL:109', 'B:LEU:110', 'B:VAL:111', 'B:CYS:112', 'B:VAL:113', 'B:LEU:114', 'B:ALA:115', 'B:HIS:116', 'B:HIS:117', 'B:PHE:118', 'B:GLY:119', 'B:LYS:120', 'B:GLU:121', 'B:PHE:122', 'B:THR:123', 'B:PRO:124', 'B:PRO:125', 'B:VAL:126', 'B:GLN:127', 'B:ALA:128', 'B:ALA:129', 'B:TYR:130', 'B:GLN:131', 'B:LYS:132', 'B:VAL:133', 'B:VAL:134', 'B:ALA:135', 'B:GLY:136', 'B:VAL:137', 'B:ALA:138', 'B:ASN:139', 'B:ALA:140', 'B:LEU:141', 'B:ALA:142', 'B:HIS:143', 'B:LYS:144', 'B:TYR:145', 'B:HIS:146', 'C:VAL:1', 'C:LEU:2', 'C:SER:3', 'C:PRO:4', 'C:ALA:5', 'C:ASP:6', 'C:LYS:7', 'C:THR:8', 'C:ASN:9', 'C:VAL:10', 'C:LYS:11', 'C:ALA:12', 'C:ALA:13', 'C:TRP:14', 'C:GLY:15', 'C:LYS:16', 'C:VAL:17', 'C:GLY:18', 'C:ALA:19', 'C:HIS:20', 'C:ALA:21', 'C:GLY:22', 'C:GLU:23', 'C:TYR:24', 'C:GLY:25', 'C:ALA:26', 'C:GLU:27', 'C:ALA:28', 'C:LEU:29', 'C:GLU:30', 'C:ARG:31', 'C:MET:32', 'C:PHE:33', 'C:LEU:34', 'C:SER:35', 'C:PHE:36', 'C:PRO:37', 'C:THR:38', 'C:THR:39', 'C:LYS:40', 'C:THR:41', 'C:TYR:42', 'C:PHE:43', 'C:PRO:44', 'C:HIS:45', 'C:PHE:46', 'C:ASP:47', 'C:LEU:48', 'C:SER:49', 'C:HIS:50', 'C:GLY:51', 'C:SER:52', 'C:ALA:53', 'C:GLN:54', 'C:VAL:55', 'C:LYS:56', 'C:GLY:57', 'C:HIS:58', 'C:GLY:59', 'C:LYS:60', 'C:LYS:61', 'C:VAL:62', 'C:ALA:63', 'C:ASP:64', 'C:ALA:65', 'C:LEU:66', 'C:THR:67', 'C:ASN:68', 'C:ALA:69', 'C:VAL:70', 'C:ALA:71']

s_g = extract_subgraph_from_node_list(
    g,
    NODE_LIST
    )

# Test our extraction worked
for n in s_g.nodes():
    assert n in NODE_LIST

for n in NODE_LIST:
    assert n in g.nodes()

# Visualise the subgraph
plotly_protein_structure_graph(s_g, node_size_min=4, node_size_multiplier=2)

DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:LYS:82', 'B:GLY:83', 'B:THR:84', 'B:PHE:85', 'B:ALA:86', 'B:THR:87', 'B:LEU:88', 'B:SER:89', 'B:GLU:90', 'B:LEU:91', 'B:HIS:92', 'B:CYS:93', 'B:ASP:94', 'B:LYS:95', 'B:LEU:96', 'B:HIS:97', 'B:VAL:98', 'B:ASP:99', 'B:PRO:100', 'B:GLU:101', 'B:ASN:102', 'B:PHE:103', 'B:ARG:104', 'B:LEU:105', 'B:LEU:106', 'B:GLY:107', 'B:ASN:108', 'B:VAL:109', 'B:LEU:110', 'B:VAL:111', 'B:CYS:112', 'B:VAL:113', 'B:LEU:114', 'B:ALA:115', 'B:HIS:116', 'B:HIS:117', 'B:PHE:118', 'B:GLY:119', 'B:LYS:120', 'B:GLU:121', 'B:PHE:122', 'B:THR:123', 'B:PRO:124', 'B:PRO:125', 'B:VAL:126', 'B:GLN:127', 'B:ALA:128', 'B:ALA:129', 'B:TYR:130', 'B:GLN:131', 'B:LYS:132', 'B:VAL:133', 'B:VAL:134', 'B:ALA:135', 'B:GLY:136', 'B:VAL:137', 'B:ALA:138', 'B:ASN:139', 'B:ALA:140', 'B:LEU:141', 'B:ALA:142', 'B:HIS:143', 'B:LYS:144', 'B:TYR:145', 'B:HIS:146', 'C:VAL:1', 'C:LEU:2', 'C:SER:3', 'C:PRO:4', 'C:ALA:5', 'C:ASP:6', 'C:LYS:7', 'C:THR:8', 'C:ASN:9', 'C:VAL:10'

In [4]:
# The associated dataframe is filtered to only include the remaining nodes by default.
# If this is not desired, set filter_dataframe=False
s_g.graph["pdb_df"]

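| | record_name | atom_number | blank_1 | atom_name | alt_loc | residue_name | blank_2 | chain_id | residue_number | insertion | ... | y_coord | z_coord | occupancy | b_factor | blank_4 | segment_id | element_symbol | charge | line_idx | node_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 222 | ATOM | 1689 | | CA | | LYS | | B | 82 | | ... | -20.862 | 8.452 | 1.0 | 24.25 | | | C | | 2572 | B:LYS:82 |
| 223 | ATOM | 1698 | | CA | | GLY | | B | 83 | | ... | -23.724 | 10.746 | 1.0 | 41.64 | | | C | | 2581 | B:GLY:83 |
| 224 | ATOM | 1702 | | CA | | THR | | B | 84 | | ... | -22.242 | 11.744 | 1.0 | 25.47 | | | C | | 2585 | B:THR:84 |
| 225 | ATOM | 1709 | | CA | | PHE | | B | 85 | | ... | -18.963 | 12.749 | 1.0 | 21.59 | | | C | | 2592 | B:PHE:85 |
| 226 | ATOM | 1720 | | CA | | ALA | | B | 86 | | ... | -20.242 | 13.948 | 1.0 | 23.14 | | | C | | 2603 | B:ALA:86 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 353 | ATOM | 2694 | | CA | | THR | | C | 67 | | ... | 16.119 | 9.983 | 1.0 | 15.27 | | | C | | 3577 | C:THR:67 |
| 354 | ATOM | 2701 | | CA | | ASN | | C | 68 | | ... | 18.613 | 11.088 | 1.0 | 21.49 | | | C | | 3584 | C:ASN:68 |
| 355 | ATOM | 2709 | | CA | | ALA | | C | 69 | | ... | 17.929 | 8.006 | 1.0 | 15.27 | | | C | | 3592 | C:ALA:69 |
| 356 | ATOM | 2714 | | CA | | VAL | | C | 70 | | ... | 18.432 | 5.673 | 1.0 | 21.72 | | | C | | 3597 | C:VAL:70 |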


In [5]:
# Invert the selection.
s_g = extract_subgraph_from_node_list(
    g,
    NODE_LIST,
    inverse=True
    )
plotly_protein_structure_graph(s_g, node_size_min=4, node_size_multiplier=2)

DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:VAL:1', 'A:LEU:2', 'A:SER:3', 'A:PRO:4', 'A:ALA:5', 'A:ASP:6', 'A:LYS:7', 'A:THR:8', 'A:ASN:9', 'A:VAL:10', 'A:LYS:11', 'A:ALA:12', 'A:ALA:13', 'A:TRP:14', 'A:GLY:15', 'A:LYS:16', 'A:VAL:17', 'A:GLY:18', 'A:ALA:19', 'A:HIS:20', 'A:ALA:21', 'A:GLY:22', 'A:GLU:23', 'A:TYR:24', 'A:GLY:25', 'A:ALA:26', 'A:GLU:27', 'A:ALA:28', 'A:LEU:29', 'A:GLU:30', 'A:ARG:31', 'A:MET:32', 'A:PHE:33', 'A:LEU:34', 'A:SER:35', 'A:PHE:36', 'A:PRO:37', 'A:THR:38', 'A:THR:39', 'A:LYS:40', 'A:THR:41', 'A:TYR:42', 'A:PHE:43', 'A:PRO:44', 'A:HIS:45', 'A:PHE:46', 'A:ASP:47', 'A:LEU:48', 'A:SER:49', 'A:HIS:50', 'A:GLY:51', 'A:SER:52', 'A:ALA:53', 'A:GLN:54', 'A:VAL:55', 'A:LYS:56', 'A:GLY:57', 'A:HIS:58', 'A:GLY:59', 'A:LYS:60', 'A:LYS:61', 'A:VAL:62', 'A:ALA:63', 'A:ASP:64', 'A:ALA:65', 'A:LEU:66', 'A:THR:67', 'A:ASN:68', 'A:ALA:69', 'A:VAL:70', 'A:ALA:71', 'A:HIS:72', 'A:VAL:73', 'A:ASP:74', 'A:ASP:75', 'A:MET:76', 'A:PRO:77', 'A:ASN:78', 'A:ALA:79

## Spatial Subgraphing

We can construct spatial subgraphs by specifying a central point and a radius. All nodes within that radius (Euclidean distance) are selected. As before, this selection can be inverted.

Here we select all nodes within 20 Å of the origin:

*N.B. different proteins may use different co-ordinate spaces*
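The point-radius rule can be sketched in plain Python as follows. This is an illustration only: `nodes_within_radius` is a hypothetical helper, the coordinates are made up, and the strict `<` comparison is an assumption, since how Graphein treats nodes exactly on the boundary is not checked here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def nodes_within_radius(
    coords: Dict[str, Point], centre: Point, radius: float, inverse: bool = False
) -> List[str]:
    """Select node IDs whose Euclidean distance to `centre` is below `radius`."""
    return [
        node
        for node, xyz in coords.items()
        if (math.dist(xyz, centre) < radius) != inverse
    ]


# Made-up coordinates in angstroms, purely for illustration.
coords = {
    "A:VAL:1": (3.0, 4.0, 0.0),     # 5 from the origin
    "A:LEU:2": (0.0, 0.0, 19.5),    # 19.5 from the origin
    "B:HIS:2": (20.0, 20.0, 20.0),  # about 34.6 from the origin
}
inside = nodes_within_radius(coords, centre=(0.0, 0.0, 0.0), radius=20)
outside = nodes_within_radius(coords, centre=(0.0, 0.0, 0.0), radius=20, inverse=True)
print(inside, outside)
```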

In [6]:
from graphein.protein.subgraphs import extract_subgraph_from_point

s_g = extract_subgraph_from_point(g, centre_point=(0, 0, 0), radius=20)

plotly_protein_structure_graph(s_g, node_size_min=4, node_size_multiplier=2)

DEBUG:graphein.protein.subgraphs:Found 177 nodes in the spatial point-radius subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:TRP:37', 'A:ALA:130', 'A:TYR:140', 'C:ALA:123', 'B:ALA:138', 'B:HIS:143', 'D:GLU:101', 'D:ALA:142', 'A:ALA:88', 'B:VAL:98', 'A:VAL:107', 'A:THR:41', 'B:ALA:142', 'C:CYS:104', 'C:THR:134', 'C:ALA:130', 'A:ALA:123', 'C:LYS:99', 'D:VAL:109', 'A:ASP:126', 'B:ASN:102', 'A:PHE:36', 'B:LEU:32', 'C:LYS:40', 'A:LEU:105', 'C:SER:131', 'D:ALA:138', 'D:PHE:103', 'C:ALA:28', 'D:PRO:100', 'A:LEU:100', 'D:LEU:32', 'A:LEU:136', 'B:LEU:31', 'D:VAL:111', 'C:HIS:87', 'A:LYS:99', 'A:VAL:96', 'B:PRO:100', 'C:PHE:128', 'C:SER:133', 'B:VAL:133', 'A:SER:133', 'A:ASN:97', 'C:PHE:33', 'C:ARG:141', 'D:VAL:134', 'D:GLY:136', 'C:THR:38', 'D:VAL:137', 'A:MET:32', 'D:ALA:140', 'A:ASP:94', 'D:LEU:141', 'D:ASN:139', 'B:GLN:131', 'D:ASN:102', 'D:GLY:107', 'B:ASN:139', 'C:LEU:91', 'C:THR:137', 'A:TYR:42', 'A:ARG:92', 'A:PHE:98', 'A:THR:134', 'D:GLN:131', 'D:ALA:135', '

In [7]:
# Again, we can invert this selection
s_g = extract_subgraph_from_point(g, centre_point=(0, 0, 0), radius=20, inverse=True)
plotly_protein_structure_graph(s_g, node_size_min=4, node_size_multiplier=2)

DEBUG:graphein.protein.subgraphs:Found 177 nodes in the spatial point-radius subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:LEU:2', 'A:SER:3', 'A:PRO:4', 'A:ALA:5', 'A:ASP:6', 'A:LYS:7', 'A:THR:8', 'A:ASN:9', 'A:VAL:10', 'A:LYS:11', 'A:ALA:12', 'A:ALA:13', 'A:TRP:14', 'A:GLY:15', 'A:LYS:16', 'A:VAL:17', 'A:GLY:18', 'A:ALA:19', 'A:HIS:20', 'A:ALA:21', 'A:GLY:22', 'A:GLU:23', 'A:TYR:24', 'A:GLY:25', 'A:ALA:26', 'A:GLU:27', 'A:ALA:28', 'A:GLU:30', 'A:PHE:43', 'A:PRO:44', 'A:HIS:45', 'A:PHE:46', 'A:ASP:47', 'A:LEU:48', 'A:SER:49', 'A:HIS:50', 'A:GLY:51', 'A:SER:52', 'A:ALA:53', 'A:GLN:54', 'A:VAL:55', 'A:LYS:56', 'A:GLY:57', 'A:HIS:58', 'A:GLY:59', 'A:LYS:60', 'A:LYS:61', 'A:VAL:62', 'A:ALA:63', 'A:ASP:64', 'A:ALA:65', 'A:LEU:66', 'A:THR:67', 'A:ASN:68', 'A:ALA:69', 'A:VAL:70', 'A:ALA:71', 'A:HIS:72', 'A:VAL:73', 'A:ASP:74', 'A:ASP:75', 'A:MET:76', 'A:PRO:77', 'A:ASN:78', 'A:ALA:79', 'A:LEU:80', 'A:SER:81', 'A:ALA:82', 'A:LEU:83', 'A:SER:84', 'A:ASP:85', 'A:LE

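## Subgraphing based on Residue Types
We can select nodes by residue type with `extract_subgraph_from_residue_types()`, passing a list of three-letter residue names:
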
In [8]:
from graphein.protein.subgraphs import extract_subgraph_from_residue_types
residue_types = ["SER", "ALA", "GLY"]

s_g = extract_subgraph_from_residue_types(g, residue_types)
plotly_protein_structure_graph(s_g, colour_nodes_by="residue_name")

DEBUG:graphein.protein.subgraphs:Found 144 nodes in the residue type subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:GLY:25', 'A:ALA:130', 'C:ALA:123', 'A:ALA:28', 'A:ALA:69', 'B:ALA:138', 'D:ALA:142', 'B:GLY:64', 'A:ALA:88', 'B:ALA:142', 'B:ALA:86', 'C:ALA:130', 'C:ALA:111', 'D:SER:89', 'A:ALA:123', 'A:GLY:22', 'D:ALA:27', 'D:GLY:56', 'A:ALA:13', 'D:SER:9', 'B:ALA:13', 'A:ALA:65', 'B:SER:9', 'D:GLY:64', 'C:SER:131', 'D:ALA:138', 'D:ALA:76', 'C:ALA:28', 'A:GLY:25', 'A:ALA:26', 'C:SER:133', 'A:SER:133', 'D:ALA:140', 'D:GLY:136', 'B:SER:72', 'C:SER:3', 'D:GLY:107', 'B:ALA:76', 'C:GLY:15', 'D:GLY:119', 'D:SER:72', 'A:SER:81', 'D:ALA:70', 'D:GLY:16', 'D:ALA:135', 'C:GLY:22', 'D:ALA:115', 'B:SER:44', 'C:ALA:69', 'C:GLY:57', 'B:ALA:129', 'D:ALA:53', 'C:ALA:65', 'D:GLY:74', 'C:ALA:120', 'C:GLY:59', 'B:GLY:107', 'C:SER:49', 'B:ALA:135', 'C:ALA:19', 'B:GLY:29', 'A:SER:124', 'B:ALA:128', 'D:ALA:13', 'B:GLY:74', 'D:GLY:69', 'D:GLY:83', 'C:SER:81', 'A:SER:102', 'C:ALA:

In [9]:
# Invert the selection
s_g = extract_subgraph_from_residue_types(g, residue_types, inverse=True)
plotly_protein_structure_graph(s_g, colour_nodes_by="residue_name", node_size_min=4, node_size_multiplier=2)

DEBUG:graphein.protein.subgraphs:Found 144 nodes in the residue type subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:VAL:1', 'A:LEU:2', 'A:PRO:4', 'A:ASP:6', 'A:LYS:7', 'A:THR:8', 'A:ASN:9', 'A:VAL:10', 'A:LYS:11', 'A:TRP:14', 'A:LYS:16', 'A:VAL:17', 'A:HIS:20', 'A:GLU:23', 'A:TYR:24', 'A:GLU:27', 'A:LEU:29', 'A:GLU:30', 'A:ARG:31', 'A:MET:32', 'A:PHE:33', 'A:LEU:34', 'A:PHE:36', 'A:PRO:37', 'A:THR:38', 'A:THR:39', 'A:LYS:40', 'A:THR:41', 'A:TYR:42', 'A:PHE:43', 'A:PRO:44', 'A:HIS:45', 'A:PHE:46', 'A:ASP:47', 'A:LEU:48', 'A:HIS:50', 'A:GLN:54', 'A:VAL:55', 'A:LYS:56', 'A:HIS:58', 'A:LYS:60', 'A:LYS:61', 'A:VAL:62', 'A:ASP:64', 'A:LEU:66', 'A:THR:67', 'A:ASN:68', 'A:VAL:70', 'A:HIS:72', 'A:VAL:73', 'A:ASP:74', 'A:ASP:75', 'A:MET:76', 'A:PRO:77', 'A:ASN:78', 'A:LEU:80', 'A:LEU:83', 'A:ASP:85', 'A:LEU:86', 'A:HIS:87', 'A:HIS:89', 'A:LYS:90', 'A:LEU:91', 'A:ARG:92', 'A:VAL:93', 'A:ASP:94', 'A:PRO:95', 'A:VAL:96', 'A:ASN:97', 'A:PHE:98', 'A:LYS:99', 'A:LEU:100',

## Subgraphing based on Chains
We can extract graphs of individual chains in a complexed structure graph.

First, let's recap what the original protein looks like when coloured by chain:

In [10]:
plotly_protein_structure_graph(g, colour_nodes_by="chain_id", node_size_min=20, node_size_multiplier=1)

Now we extract the subgraph for chains A and B:

In [11]:
from graphein.protein.subgraphs import extract_subgraph_from_chains

s_g = extract_subgraph_from_chains(g, ["A", "B"])
plotly_protein_structure_graph(s_g, colour_nodes_by="chain_id", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 287 nodes in the chain subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:GLY:25', 'B:TRP:37', 'A:HIS:112', 'A:ALA:130', 'B:THR:123', 'B:LYS:8', 'A:TYR:140', 'A:LYS:90', 'A:HIS:122', 'A:ALA:28', 'A:ALA:69', 'B:HIS:2', 'B:ALA:138', 'B:HIS:143', 'B:GLY:64', 'A:ASP:74', 'B:LEU:114', 'A:ALA:88', 'B:VAL:98', 'A:VAL:107', 'A:THR:41', 'A:VAL:73', 'B:LYS:59', 'B:ALA:86', 'B:ALA:142', 'B:LEU:81', 'A:VAL:62', 'A:ALA:123', 'A:GLY:22', 'A:ASP:64', 'B:THR:50', 'A:LYS:11', 'A:ASP:126', 'B:GLU:7', 'A:ALA:13', 'A:PHE:36', 'B:ALA:13', 'A:ALA:65', 'B:SER:9', 'B:LEU:32', 'B:ASN:102', 'A:LYS:7', 'A:LEU:105', 'B:GLU:22', 'B:PRO:58', 'B:CYS:93', 'B:LEU:68', 'A:LEU:86', 'B:HIS:146', 'A:THR:108', 'A:GLY:25', 'A:LEU:100', 'A:LEU:136', 'B:LEU:31', 'B:ASN:80', 'A:THR:8', 'A:LYS:60', 'A:ALA:26', 'A:LEU:2', 'A:VAL:96', 'A:LYS:99', 'B:LEU:28', 'B:ARG:40', 'A:TRP:14', 'B:PRO:100', 'A:PHE:117', 'A:SER:133', 'A:ASN:97', 'A:PHE:46', 'A:LEU:113', 'B:VAL:13

## Subgraphing to Protein Surface
This can be achieved with `extract_surface_subgraph()`. It requires a graph with the [Relative Solvent Accessibility](https://en.wikipedia.org/wiki/Relative_accessible_surface_area) (RSA) feature, computed by [DSSP](https://anaconda.org/salilab/dssp). We define an RSA threshold above which a residue is considered solvent-accessible, i.e. on the surface.
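The thresholding idea can be sketched without Graphein as below. The RSA values are invented for illustration, `surface_nodes` is a hypothetical helper, and the strict `>` comparison is an assumption based on the wording "above"; the library's handling of values exactly at the threshold may differ.

```python
from typing import Dict, List


def surface_nodes(
    rsa: Dict[str, float], threshold: float, inverse: bool = False
) -> List[str]:
    """Return node IDs whose RSA exceeds `threshold` (or the rest, if `inverse`)."""
    return [node for node, value in rsa.items() if (value > threshold) != inverse]


# Made-up RSA values (fraction of maximum accessible surface area).
rsa_values = {"A:VAL:1": 0.65, "A:LEU:2": 0.05, "A:SER:3": 0.25, "A:PRO:4": 0.41}

surface = surface_nodes(rsa_values, threshold=0.2)
buried = surface_nodes(rsa_values, threshold=0.2, inverse=True)
print(surface, buried)
```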

In [12]:
from graphein.protein.config import DSSPConfig
from graphein.protein.subgraphs import extract_surface_subgraph
from graphein.protein.features.nodes import rsa

config = ProteinGraphConfig(edge_construction_functions=edge_fns, graph_metadata_functions=[rsa], dssp_config=DSSPConfig())
graph_with_rsa = construct_graph(pdb_code="4hhb", config=config)

RSA_THRESHOLD = 0.2

s_g = extract_surface_subgraph(graph_with_rsa, RSA_THRESHOLD)
plotly_protein_structure_graph(s_g, colour_nodes_by="chain_id", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 574 total nodes
INFO:graphein.protein.edges.distance:Found: 84 aromatic-aromatic interactions
INFO:graphein.protein.edges.distance:Found 1284 hydrophobic interactions.
INFO:graphein.protein.edges.distance:Found 6 disulfide interactions.
INFO:graphein.protein.edges.distance:Found 208 hbond interactions.
INFO:graphein.protein.edges.distance:Found 12 hbond interactions.
INFO:graphein.protein.edges.distance:Found 420 ionic interactions.


Downloading PDB structure '4hhb'...


INFO:graphein.protein.utils:Downloaded PDB file for: 4hhb
DEBUG:graphein.protein.subgraphs:Found 294 nodes in the surface subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:HIS:112', 'A:ALA:130', 'B:THR:123', 'B:LYS:8', 'A:LYS:90', 'B:HIS:2', 'D:ALA:142', 'B:HIS:143', 'D:GLU:26', 'D:LYS:95', 'D:GLU:101', 'A:ASP:74', 'D:LYS:120', 'C:THR:118', 'B:ALA:142', 'B:LYS:59', 'D:THR:123', 'C:THR:134', 'D:HIS:117', 'C:ALA:130', 'D:HIS:97', 'A:GLY:22', 'A:ASP:64', 'B:THR:50', 'C:LYS:99', 'D:THR:4', 'A:LYS:11', 'D:GLY:56', 'D:SER:9', 'B:ALA:13', 'A:ALA:65', 'B:SER:9', 'C:LYS:40', 'A:LYS:7', 'C:SER:131', 'B:GLU:22', 'C:ASP:64', 'D:GLU:43', 'B:PRO:58', 'D:LYS:66', 'D:HIS:77', 'D:ALA:76', 'A:LEU:86', 'B:HIS:146', 'C:HIS:112', 'A:LEU:100', 'B:ASN:80', 'C:LYS:11', 'C:HIS:87', 'A:THR:8', 'A:LYS:60', 'C:LYS:61', 'A:LYS:99', 'A:VAL:96', 'B:ARG:40', 'C:HIS:45', 'C:PRO:114', 'C:ARG:141', 'A:PHE:46', 'C:THR:38', 'D:VAL:67', 'B:LYS:66', 'C:HIS:89', 'D:PRO:58', 'B:GLU:43', 'B:SER:72', 

Equally, this selection can be inverted to obtain the non-solvent-accessible (buried) subgraph:

In [13]:
s_g = extract_surface_subgraph(graph_with_rsa, RSA_THRESHOLD, inverse=True)
plotly_protein_structure_graph(s_g, colour_nodes_by="chain_id", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 294 nodes in the surface subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:LEU:2', 'A:ASP:6', 'A:VAL:10', 'A:ALA:13', 'A:TRP:14', 'A:VAL:17', 'A:ALA:21', 'A:TYR:24', 'A:GLY:25', 'A:ALA:26', 'A:GLU:27', 'A:ALA:28', 'A:LEU:29', 'A:ARG:31', 'A:MET:32', 'A:PHE:33', 'A:SER:35', 'A:PHE:36', 'A:THR:39', 'A:THR:41', 'A:TYR:42', 'A:PHE:43', 'A:SER:52', 'A:VAL:55', 'A:GLY:59', 'A:VAL:62', 'A:ALA:63', 'A:LEU:66', 'A:ALA:69', 'A:VAL:70', 'A:VAL:73', 'A:MET:76', 'A:LEU:80', 'A:SER:84', 'A:ALA:88', 'A:ARG:92', 'A:VAL:93', 'A:ASP:94', 'A:ASN:97', 'A:PHE:98', 'A:SER:102', 'A:HIS:103', 'A:CYS:104', 'A:LEU:105', 'A:LEU:106', 'A:VAL:107', 'A:THR:108', 'A:LEU:109', 'A:ALA:110', 'A:ALA:111', 'A:LEU:113', 'A:PHE:117', 'A:PRO:119', 'A:VAL:121', 'A:HIS:122', 'A:ALA:123', 'A:SER:124', 'A:LEU:125', 'A:ASP:126', 'A:LYS:127', 'A:PHE:128', 'A:LEU:129', 'A:VAL:132', 'A:SER:133', 'A:VAL:135', 'A:LEU:136', 'A:TYR:140', 'B:LEU:3', 'B:GLU:7', 'B:VAL:11',

## Subgraphing based on Secondary Structure
We can extract subgraphs based on selections of [secondary structure](https://en.wikipedia.org/wiki/Protein_secondary_structure) elements using `extract_subgraph_from_secondary_structure()`. This requires a graph with node-level secondary structure assignments computed by [DSSP](https://anaconda.org/salilab/dssp).

Assignments produced by DSSP:
* H: Alpha helix (4-12)
* B: Isolated beta-bridge residue
* E: Strand
* G: 3-10 helix
* I: Pi helix
* T: Turn
* S: Bend
* -: None
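When a selection spans several of these classes, it can help to keep the codes in a lookup table and derive the list of codes from it. The sketch below does this for the helical classes; the per-node assignments are hypothetical and the variable names are chosen for this example only.

```python
# DSSP codes as listed above.
DSSP_CODES = {
    "H": "Alpha helix (4-12)",
    "B": "Isolated beta-bridge residue",
    "E": "Strand",
    "G": "3-10 helix",
    "I": "Pi helix",
    "T": "Turn",
    "S": "Bend",
    "-": "None",
}

# Collect every code whose description mentions a helix.
HELIX_CODES = [code for code, name in DSSP_CODES.items() if "helix" in name.lower()]

# Hypothetical per-node assignments, purely for illustration.
assignments = {
    "A:VAL:1": "-",
    "A:SER:3": "H",
    "A:PRO:4": "H",
    "A:LYS:40": "T",
    "A:PRO:44": "G",
}
helical_nodes = [n for n, ss in assignments.items() if ss in HELIX_CODES]
print(HELIX_CODES, helical_nodes)
```

A list such as `HELIX_CODES` has the same shape as the list of codes passed to `extract_subgraph_from_secondary_structure()` in the next cell.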

In [14]:
from graphein.protein.features.nodes import secondary_structure
from graphein.protein.subgraphs import extract_subgraph_from_secondary_structure


config = ProteinGraphConfig(edge_construction_functions=edge_fns, graph_metadata_functions=[secondary_structure], dssp_config=DSSPConfig())
graph_with_ss = construct_graph(pdb_code="4hhb", config=config)

VALID_SS = ["H"]  # Corresponds to alpha helix.

s_g = extract_subgraph_from_secondary_structure(graph_with_ss, VALID_SS)
plotly_protein_structure_graph(s_g, colour_nodes_by="residue_number", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 574 total nodes
INFO:graphein.protein.edges.distance:Found: 84 aromatic-aromatic interactions
INFO:graphein.protein.edges.distance:Found 1284 hydrophobic interactions.
INFO:graphein.protein.edges.distance:Found 6 disulfide interactions.
INFO:graphein.protein.edges.distance:Found 208 hbond interactions.
INFO:graphein.protein.edges.distance:Found 12 hbond interactions.
INFO:graphein.protein.edges.distance:Found 420 ionic interactions.


Downloading PDB structure '4hhb'...


INFO:graphein.protein.utils:Downloaded PDB file for: 4hhb
DEBUG:graphein.protein.subgraphs:Found 379 nodes in the secondary structure subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:ALA:28', 'A:ALA:69', 'B:ALA:142', 'B:LYS:59', 'B:ALA:86', 'C:THR:108', 'A:ALA:123', 'C:LYS:99', 'D:ALA:27', 'B:SER:9', 'D:GLY:64', 'D:ALA:138', 'B:LEU:68', 'D:PHE:103', 'A:THR:108', 'D:LEU:32', 'C:SER:133', 'C:ASP:6', 'C:PHE:33', 'D:LEU:141', 'D:VAL:137', 'D:VAL:67', 'C:VAL:70', 'A:VAL:55', 'D:GLY:107', 'D:GLU:22', 'B:TRP:15', 'A:THR:134', 'C:ASN:9', 'D:ALA:135', 'D:ALA:115', 'A:VAL:17', 'B:THR:87', 'D:HIS:116', 'D:LEU:88', 'A:LEU:101', 'C:GLY:59', 'D:LEU:110', 'B:ALA:128', 'A:LEU:106', 'B:GLY:74', 'D:GLY:83', 'C:HIS:58', 'A:SER:102', 'B:VAL:126', 'C:VAL:62', 'B:PRO:5', 'B:LYS:132', 'D:GLU:90', 'D:VAL:11', 'A:ASP:85', 'A:LEU:29', 'C:VAL:135', 'C:ARG:31', 'D:LEU:106', 'C:VAL:10', 'A:TYR:24', 'A:GLY:59', 'C:ALA:82', 'D:ALA:129', 'B:LEU:106', 'B:GLY:69', 'A:ALA:21', 'C:ALA:12', 'A:

Again, this can be inverted to remove the selection:

In [15]:
s_g = extract_subgraph_from_secondary_structure(graph_with_ss, VALID_SS, inverse=True)
plotly_protein_structure_graph(s_g, colour_nodes_by="residue_number", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 379 nodes in the secondary structure subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:VAL:1', 'A:LEU:2', 'A:SER:3', 'A:GLY:18', 'A:ALA:19', 'A:HIS:20', 'A:PHE:36', 'A:PRO:37', 'A:THR:38', 'A:THR:39', 'A:LYS:40', 'A:THR:41', 'A:TYR:42', 'A:PHE:43', 'A:PRO:44', 'A:HIS:45', 'A:PHE:46', 'A:ASP:47', 'A:LEU:48', 'A:SER:49', 'A:HIS:50', 'A:GLY:51', 'A:SER:52', 'A:HIS:72', 'A:VAL:73', 'A:ASP:74', 'A:ASP:75', 'A:LEU:80', 'A:LEU:86', 'A:HIS:87', 'A:ALA:88', 'A:HIS:89', 'A:LYS:90', 'A:LEU:91', 'A:ARG:92', 'A:VAL:93', 'A:ASP:94', 'A:PRO:95', 'A:LEU:113', 'A:PRO:114', 'A:ALA:115', 'A:GLU:116', 'A:PHE:117', 'A:THR:118', 'A:THR:137', 'A:SER:138', 'A:LYS:139', 'A:TYR:140', 'A:ARG:141', 'B:VAL:1', 'B:HIS:2', 'B:LEU:3', 'B:THR:4', 'B:GLY:16', 'B:LYS:17', 'B:VAL:18', 'B:ASN:19', 'B:TYR:35', 'B:PRO:36', 'B:TRP:37', 'B:THR:38', 'B:GLN:39', 'B:ARG:40', 'B:PHE:41', 'B:PHE:42', 'B:GLU:43', 'B:SER:44', 'B:PHE:45', 'B:GLY:46', 'B:ASP:47', 'B:LEU

## Subgraphing based on sequence positions
We can select nodes based on their position in the sequence with `extract_subgraph_by_sequence_position()`:

*N.B. this does not discriminate based on chain. If you wish to do so, either use the base node_list subsetting function or compose the chain selection and the sequence position selection functions*
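The effect of that note can be seen on a toy list of node IDs: a position filter alone matches residues on every chain, and composing it with a chain filter narrows the result. `by_position` and `by_chain` are hypothetical helpers written for this sketch, not Graphein functions.

```python
from typing import Iterable, List

node_ids = ["A:VAL:1", "A:LEU:2", "A:SER:3", "B:VAL:1", "B:HIS:2", "B:LEU:3"]


def by_position(ids: Iterable[str], positions: Iterable[int]) -> List[str]:
    """Keep IDs whose third field (sequence position) is in `positions`."""
    wanted = set(positions)
    return [n for n in ids if int(n.split(":")[2]) in wanted]


def by_chain(ids: Iterable[str], chains: Iterable[str]) -> List[str]:
    """Keep IDs whose first field (chain) is in `chains`."""
    wanted = set(chains)
    return [n for n in ids if n.split(":")[0] in wanted]


# Position filter alone: odd positions on all chains.
odd_positions = by_position(node_ids, range(1, 100, 2))
# Composed with a chain filter: odd positions on chain A only.
odd_positions_chain_a = by_chain(odd_positions, ["A"])
print(odd_positions)
print(odd_positions_chain_a)
```

With Graphein itself, the same composition should follow from applying the chain selection and the sequence position selection one after the other, as the note suggests.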

In [16]:
from graphein.protein.subgraphs import extract_subgraph_by_sequence_position

SEQUENCE_POSITIONS = range(1, 100, 2)

s_g = extract_subgraph_by_sequence_position(g, SEQUENCE_POSITIONS)
plotly_protein_structure_graph(s_g, colour_nodes_by="residue_number", node_size_min=20, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 200 nodes in the sequence position subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:GLY:25', 'B:TRP:37', 'A:ALA:69', 'D:LYS:95', 'A:THR:41', 'A:VAL:73', 'B:LYS:59', 'B:LEU:81', 'D:TRP:15', 'D:HIS:97', 'D:SER:89', 'C:LYS:99', 'D:ALA:27', 'A:LYS:11', 'B:GLU:7', 'A:ALA:13', 'D:SER:9', 'B:ALA:13', 'A:ALA:65', 'B:SER:9', 'A:LYS:7', 'D:GLU:43', 'D:HIS:77', 'B:CYS:93', 'A:GLY:25', 'B:LEU:31', 'D:LEU:81', 'C:LYS:11', 'C:HIS:87', 'C:LYS:61', 'A:LYS:99', 'C:HIS:45', 'A:ASN:97', 'C:PHE:33', 'D:VAL:67', 'C:HIS:89', 'B:GLU:43', 'A:VAL:55', 'A:GLU:27', 'C:SER:3', 'D:CYS:93', 'C:LEU:91', 'C:GLY:15', 'B:TRP:15', 'B:ASP:47', 'D:ASP:79', 'C:ASN:9', 'C:PRO:77', 'A:SER:81', 'D:LEU:31', 'D:ASN:57', 'A:HIS:87', 'C:GLU:23', 'C:PRO:95', 'D:ASP:99', 'A:THR:39', 'A:VAL:17', 'B:THR:87', 'B:LEU:3', 'B:TYR:35', 'C:ALA:69', 'C:GLY:57', 'D:ASP:47', 'B:LEU:75', 'A:VAL:93', 'B:ASN:19', 'C:LEU:29', 'D:ALA:53', 'C:ALA:65', 'A:LEU:91', 'B:LYS:61', 'A:PRO:

## Subgraphs based on bond types
We can subset graphs to the nodes involved in certain bond types using `extract_subgraph_by_bond_type()`:
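As a rough model of this selection, the sketch below treats each edge as a pair of nodes plus a set of bond-type labels and collects the nodes touched by any edge of a requested type. The edge list is made up, `nodes_with_bond_types` is a hypothetical helper, and representing the labels as a set per edge is an assumption of this sketch rather than a statement about Graphein's internals.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, Set[str]]

# Made-up edges, each carrying a set of bond-type labels.
edges: List[Edge] = [
    ("A:LYS:7", "A:ASP:74", {"ionic"}),
    ("A:SER:3", "A:ASP:6", {"hbond"}),
    ("A:VAL:1", "A:LEU:2", {"peptide_bond"}),
    ("A:LEU:2", "A:SER:3", {"peptide_bond", "hbond"}),
]


def nodes_with_bond_types(edges: Iterable[Edge], bond_types: Iterable[str]) -> List[str]:
    """Return the sorted node IDs that take part in any edge of the given types."""
    wanted = set(bond_types)
    selected: Set[str] = set()
    for u, v, kinds in edges:
        if kinds & wanted:  # the edge carries at least one requested label
            selected.update((u, v))
    return sorted(selected)


ionic_or_hbond = nodes_with_bond_types(edges, ["hbond", "ionic"])
print(ionic_or_hbond)
```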

In [17]:
from graphein.protein.subgraphs import extract_subgraph_by_bond_type

BOND_TYPES = ["hbond", "ionic"]

s_g = extract_subgraph_by_bond_type(g, BOND_TYPES)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 121 nodes in the bond type subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:HIS:112', 'A:HIS:122', 'B:HIS:2', 'D:GLU:101', 'D:GLU:26', 'A:ASP:74', 'D:HIS:117', 'D:HIS:97', 'A:ASP:64', 'A:LYS:11', 'A:ASP:126', 'B:GLU:7', 'A:LYS:7', 'D:GLU:43', 'B:GLU:22', 'C:ASP:64', 'D:HIS:77', 'B:HIS:146', 'C:HIS:112', 'C:LYS:11', 'A:LYS:60', 'C:LYS:61', 'C:ASP:6', 'C:ARG:141', 'A:SER:133', 'C:HIS:89', 'B:GLU:43', 'A:GLU:27', 'D:ARG:40', 'C:SER:3', 'B:HIS:116', 'C:HIS:50', 'D:GLU:22', 'A:TYR:42', 'A:ARG:92', 'B:ASP:47', 'A:ASN:78', 'D:HIS:146', 'B:GLU:26', 'D:ASP:79', 'A:HIS:72', 'A:SER:81', 'D:ASP:99', 'C:GLU:23', 'B:TYR:35', 'D:HIS:116', 'A:ARG:141', 'C:LYS:60', 'D:ARG:104', 'C:GLU:30', 'D:ASP:47', 'D:ARG:30', 'B:GLU:90', 'B:LYS:17', 'B:LYS:144', 'C:SER:49', 'A:SER:124', 'C:ARG:92', 'C:GLU:116', 'D:ASP:21', 'D:LYS:65', 'D:ASP:94', 'C:SER:81', 'C:HIS:72', 'C:ASP:85', 'A:SER:102', 'B:ARG:104', 'A:SER:49', 'C:LYS:7', 'C:LYS:16', 'A:GLU:

## K-hop subgraphs
We can extract subgraphs based on the set of nodes that are within $k$ hops of a central node using `extract_k_hop_subgraph`:
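The notion of "within $k$ hops" corresponds to a breadth-first search that stops expanding once it reaches depth $k$. The sketch below shows this on a small adjacency dictionary; `k_hop_nodes` is a hypothetical helper and the toy graph is invented, so this illustrates the idea rather than Graphein's implementation.

```python
from collections import deque
from typing import Dict, List, Set


def k_hop_nodes(adjacency: Dict[str, Set[str]], central_node: str, k: int) -> List[str]:
    """Breadth-first search that stops expanding after `k` hops."""
    hops = {central_node: 0}  # node -> hop distance from the central node
    queue = deque([central_node])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond k hops
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return sorted(hops)


# Toy graph: a - b - c - d, plus a - e.
adjacency = {
    "a": {"b", "e"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": {"a"},
}
print(k_hop_nodes(adjacency, "a", k=1))
print(k_hop_nodes(adjacency, "a", k=2))
```

The central node is always part of the result, which matches the log output below, where `A:ALA:110` appears in its own 1-hop selection.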

In [18]:
from graphein.protein.subgraphs import extract_k_hop_subgraph

# K = 1
s_g = extract_k_hop_subgraph(g, central_node="A:ALA:110", k=1)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 5 nodes in the k-hop subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:LEU:109', 'A:ALA:110', 'B:ALA:115', 'A:ALA:111', 'A:PHE:117'].


In [19]:
# K = 2
s_g = extract_k_hop_subgraph(g, central_node="A:ALA:110", k=2)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 20 nodes in the k-hop subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:THR:118', 'A:HIS:112', 'B:HIS:116', 'A:VAL:121', 'A:GLU:116', 'A:LEU:125', 'B:LEU:114', 'A:VAL:107', 'A:VAL:17', 'A:TYR:24', 'B:ALA:115', 'A:THR:108', 'A:LEU:109', 'A:ALA:110', 'A:ALA:111', 'A:TRP:14', 'A:PHE:117', 'B:PHE:122', 'A:LEU:106', 'A:LEU:113'].


In [20]:
# K = 3
s_g = extract_k_hop_subgraph(g, central_node="A:ALA:110", k=3)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 56 nodes in the k-hop subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:GLU:27', 'A:THR:118', 'A:VAL:70', 'A:HIS:112', 'B:VAL:126', 'B:TYR:130', 'A:GLU:23', 'B:THR:123', 'B:HIS:116', 'A:VAL:121', 'A:HIS:122', 'B:TRP:15', 'A:ALA:63', 'A:ALA:115', 'B:GLU:26', 'A:GLU:116', 'A:LEU:125', 'B:LEU:114', 'B:LEU:14', 'A:VAL:107', 'B:HIS:117', 'A:VAL:17', 'B:VAL:111', 'A:TYR:24', 'B:ALA:115', 'A:ASP:126', 'A:ALA:13', 'A:LEU:129', 'A:PRO:114', 'A:LEU:105', 'A:VAL:10', 'B:GLU:121', 'A:LYS:16', 'B:VAL:23', 'A:PRO:119', 'A:SER:124', 'A:THR:108', 'B:LEU:110', 'B:VAL:18', 'A:GLY:18', 'A:GLY:15', 'A:LEU:109', 'A:ALA:110', 'B:VAL:113', 'A:ALA:21', 'A:GLY:25', 'A:ALA:120', 'A:ALA:111', 'B:PHE:118', 'A:TRP:14', 'A:PHE:117', 'A:HIS:20', 'B:PHE:122', 'A:LEU:106', 'A:LEU:113', 'A:LEU:66'].


In [21]:
# K = 4
s_g = extract_k_hop_subgraph(g, central_node="A:ALA:110", k=4)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 108 nodes in the k-hop subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:GLY:25', 'A:HIS:112', 'B:THR:123', 'A:ALA:130', 'A:HIS:122', 'A:ALA:28', 'A:ALA:69', 'B:LEU:114', 'A:VAL:107', 'A:VAL:62', 'A:ALA:123', 'A:GLY:22', 'A:ASP:64', 'A:LYS:11', 'A:ASP:126', 'A:ALA:13', 'B:ALA:13', 'A:ALA:65', 'A:LEU:105', 'B:GLU:22', 'B:LEU:68', 'A:THR:108', 'A:GLY:25', 'A:ALA:26', 'A:TRP:14', 'A:PHE:117', 'C:ARG:141', 'A:LEU:113', 'A:GLU:27', 'A:VAL:70', 'B:GLN:131', 'B:HIS:116', 'B:TRP:15', 'B:GLU:26', 'A:VAL:17', 'B:TYR:35', 'B:LEU:75', 'B:ALA:129', 'B:ASN:19', 'B:PRO:124', 'B:LYS:17', 'A:LEU:101', 'B:LYS:120', 'A:SER:124', 'A:THR:67', 'B:VAL:109', 'A:VAL:132', 'B:PHE:122', 'A:LEU:106', 'A:THR:118', 'B:VAL:126', 'B:TYR:130', 'B:GLN:127', 'B:PRO:51', 'A:VAL:121', 'B:LEU:14', 'B:VAL:111', 'A:CYS:104', 'A:TYR:24', 'A:ALA:19', 'B:VAL:11', 'A:ASP:6', 'B:VAL:33', 'A:LYS:127', 'A:LEU:109', 'A:ALA:21', 'A:ALA:120', 'A:ALA:71', 'A:ALA:111', 'B

In [22]:
# Again, these can be inverted:
s_g = extract_k_hop_subgraph(g, central_node="A:ALA:110", k=4, inverse=True)
plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 108 nodes in the k-hop subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:VAL:1', 'A:LEU:2', 'A:SER:3', 'A:PRO:4', 'A:ALA:5', 'A:LYS:7', 'A:THR:8', 'A:LEU:29', 'A:GLU:30', 'A:MET:32', 'A:PHE:33', 'A:LEU:34', 'A:SER:35', 'A:PHE:36', 'A:PRO:37', 'A:THR:38', 'A:THR:39', 'A:LYS:40', 'A:THR:41', 'A:TYR:42', 'A:PHE:43', 'A:PRO:44', 'A:HIS:45', 'A:PHE:46', 'A:ASP:47', 'A:LEU:48', 'A:SER:49', 'A:HIS:50', 'A:GLY:51', 'A:SER:52', 'A:ALA:53', 'A:GLN:54', 'A:VAL:55', 'A:LYS:56', 'A:GLY:57', 'A:HIS:58', 'A:GLY:59', 'A:LYS:60', 'A:LYS:61', 'A:ASN:68', 'A:HIS:72', 'A:VAL:73', 'A:ASP:74', 'A:ASP:75', 'A:MET:76', 'A:PRO:77', 'A:ASN:78', 'A:ALA:79', 'A:LEU:80', 'A:SER:81', 'A:ALA:82', 'A:LEU:83', 'A:SER:84', 'A:ASP:85', 'A:LEU:86', 'A:HIS:87', 'A:ALA:88', 'A:HIS:89', 'A:LYS:90', 'A:LEU:91', 'A:ARG:92', 'A:VAL:93', 'A:ASP:94', 'A:PRO:95', 'A:VAL:96', 'A:ASN:97', 'A:PHE:98', 'A:LYS:99', 'A:LEU:100', 'A:SER:102', 'A:HIS:103', 'A:SER:131', 'A:

## Subgraphing based on Atom Types
This can be achieved with `extract_subgraph_from_atom_types()`. This is not relevant for residue-level graphs, as we (typically) use C$\alpha$ atoms as the nodes. Instead, we create an atom-level graph for this example.
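Judging by the log output in this notebook (e.g. `A:LEU:91:N`), atom-level node IDs carry the atom name as a fourth colon-separated field. Under that assumption, an atom-type filter amounts to the sketch below; `by_atom_type` is a hypothetical helper and the ID list is made up.

```python
from typing import Iterable, List

# Made-up atom-level node IDs of the assumed form CHAIN:RESIDUE_NAME:POSITION:ATOM.
atom_node_ids = [
    "A:LEU:91:N",
    "A:LEU:91:CA",
    "A:LEU:91:C",
    "A:LEU:91:O",
    "A:LEU:91:CB",
]


def by_atom_type(ids: Iterable[str], atom_types: Iterable[str]) -> List[str]:
    """Keep IDs whose fourth field (atom name) is in `atom_types`."""
    wanted = set(atom_types)
    return [n for n in ids if n.split(":")[3] in wanted]


n_and_ca = by_atom_type(atom_node_ids, ["CA", "N"])
print(n_and_ca)
```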

In [23]:
from graphein.protein.edges.atomic import add_atomic_edges
config = ProteinGraphConfig(granularity="atom", edge_construction_functions=[add_atomic_edges])
# Note: this rebinds `g` to the atom-level graph.
g = construct_graph(config=config, pdb_code="4hhb")
plotly_protein_structure_graph(g, node_size_min=5, node_size_multiplier=1, colour_nodes_by="atom_type")

DEBUG:graphein.protein.graphs:Deprotonating protein. This removes H atoms from the pdb_df dataframe
DEBUG:graphein.protein.graphs:Detected 4384 total nodes


In [24]:
from graphein.protein.subgraphs import extract_subgraph_from_atom_types

ATOM_TYPES = ["CA", "N"]

s_g = extract_subgraph_from_atom_types(g, ATOM_TYPES)
plotly_protein_structure_graph(s_g, colour_nodes_by="atom_type", node_size_min=5, node_size_multiplier=1)

DEBUG:graphein.protein.subgraphs:Found 1148 nodes in the atom type subgraph.
DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['A:LEU:91:N', 'C:HIS:112:CA', 'C:VAL:121:CA', 'D:GLY:56:N', 'C:PHE:33:N', 'A:PRO:77:N', 'D:PRO:51:CA', 'C:HIS:87:CA', 'B:LEU:96:CA', 'C:THR:41:N', 'B:SER:44:N', 'D:VAL:126:CA', 'A:PRO:4:N', 'C:THR:134:N', 'D:LEU:81:CA', 'B:GLY:56:N', 'B:LYS:82:CA', 'D:ALA:138:N', 'A:ASP:47:CA', 'A:ASN:68:CA', 'A:PRO:4:CA', 'A:ASN:78:N', 'B:ALA:115:CA', 'C:GLY:25:N', 'C:ASP:47:N', 'A:MET:76:CA', 'C:VAL:1:CA', 'B:ASP:47:CA', 'D:GLY:69:CA', 'A:THR:134:N', 'D:CYS:112:CA', 'A:ALA:28:N', 'D:CYS:93:N', 'A:GLU:116:N', 'B:PRO:125:N', 'B:GLU:6:N', 'A:ALA:123:CA', 'A:VAL:10:CA', 'C:SER:84:CA', 'A:LEU:48:CA', 'A:ALA:28:CA', 'B:PHE:122:CA', 'C:PHE:36:N', 'C:LEU:66:CA', 'B:THR:50:CA', 'C:GLY:22:N', 'C:SER:84:N', 'B:PHE:85:CA', 'D:ALA:115:N', 'C:ALA:63:N', 'C:THR:38:N', 'D:GLU:121:CA', 'B:TRP:37:N', 'B:ASP:47:N', 'A:LEU:105:CA', 'C:THR:137:CA', 'C:LEU:48:N', 'D:LEU:81:N', 'A:THR
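
The node IDs in the log above end in the atom name (for example `A:LEU:91:N`), and the count of 1148 nodes is consistent with two atoms (`CA` and `N`) for each of the 574 residues detected when the residue-level graph was built. As a rough illustration of the selection, the sketch below filters a handful of node IDs by their atom-name suffix. This is an assumption made for the example: Graphein itself may well match on a node attribute rather than on the ID string, and `filter_by_atom_type` is a hypothetical helper, not part of the library.

```python
def filter_by_atom_type(node_ids, atom_types):
    """Keep node IDs whose final ':'-separated field is in atom_types.

    Assumes IDs of the form 'chain:residue:position:atom', as seen in the log.
    """
    wanted = set(atom_types)
    return [n for n in node_ids if n.rsplit(":", 1)[-1] in wanted]


# A few made-up atom-level node IDs in the same format as the log output.
nodes = [
    "A:LEU:91:N",
    "A:LEU:91:CA",
    "A:LEU:91:C",
    "A:LEU:91:O",
    "A:PRO:4:N",
    "A:PRO:4:CA",
    "A:PRO:4:CB",
]

selected = filter_by_atom_type(nodes, ["CA", "N"])

# Sanity check on the logged count: two selected atoms per residue.
expected_count = 2 * 574
```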

## High-level function
We also provide a higher-level function, `extract_subgraph`, which wraps all of the aforementioned functions. Every selection described previously can be performed with it, and several selections can be combined in a single call:

```python
extract_subgraph(
    g: nx.Graph,
    node_list: Optional[List[str]] = None,
    sequence_positions: Optional[List[str]] = None,
    chains: Optional[List[str]] = None,
    residue_types: Optional[List[str]] = None,
    atom_types: Optional[List[str]] = None,
    bond_types: Optional[List[str]] = None,
    centre_point: Optional[
        Union[np.ndarray, Tuple[float, float, float]]
    ] = None,
    radius: Optional[float] = None,
    k_hop_central_node: Optional[str] = None,
    k_hops: Optional[int] = None,
    k_only: Optional[bool] = None,
    filter_dataframe: bool = True,
    inverse: bool = False,
    return_node_list: bool = False,
) -> Union[nx.Graph, List[str]]:
```


In [25]:
from graphein.protein.subgraphs import extract_subgraph
# Node list selection
s_g = extract_subgraph(g, node_list=NODE_LIST, inverse=False)

# Sequence position selection
s_g = extract_subgraph(g, sequence_positions=SEQUENCE_POSITIONS, inverse=False)

# Chain selection
s_g = extract_subgraph(g, chains=["A", "B"], inverse=False)

# Performing selections with multiple methods
s_g = extract_subgraph(g, node_list=NODE_LIST, chains=["A"], inverse=False)

plotly_protein_structure_graph(s_g, node_size_min=10, node_size_multiplier=1)


DEBUG:graphein.protein.subgraphs:Creating subgraph from nodes: ['B:THR:123', 'B:ALA:138', 'B:HIS:143', 'B:LEU:114', 'B:VAL:98', 'B:ALA:142', 'B:ALA:86', 'C:LEU:2', 'B:ASN:102', 'C:LYS:40', 'C:ASP:64', 'B:CYS:93', 'C:ALA:28', 'B:HIS:146', 'C:LYS:11', 'C:LYS:61', 'B:PRO:100', 'C:HIS:45', 'C:ASP:6', 'B:VAL:133', 'C:PHE:33', 'C:LEU:66', 'C:THR:38', 'C:VAL:70', 'C:SER:3', 'B:GLN:131', 'B:HIS:116', 'C:HIS:50', 'B:ASN:139', 'C:GLY:15', 'C:ASN:9', 'B:LEU:96', 'C:GLY:22', 'C:GLU:23', 'C:ALA:69', 'B:THR:87', 'C:LYS:60', 'C:ASN:68', 'C:GLY:57', 'C:GLU:30', 'B:ALA:129', 'B:LYS:82', 'C:LEU:29', 'C:ALA:65', 'B:PRO:124', 'B:GLU:90', 'C:LYS:56', 'B:LYS:144', 'C:GLY:59', 'B:LYS:120', 'B:LEU:91', 'B:LYS:95', 'B:GLY:107', 'C:SER:49', 'B:ALA:135', 'C:ALA:19', 'B:ALA:128', 'C:LEU:34', 'B:VAL:109', 'C:GLN:54', 'C:VAL:1', 'B:PHE:122', 'C:VAL:17', 'C:HIS:58', 'B:VAL:126', 'B:ARG:104', 'B:TYR:130', 'B:GLN:127', 'C:VAL:62', 'B:VAL:137', 'C:THR:67', 'C:LYS:7', 'B:LEU:105', 'C:LYS:16', 'C:ALA:26', 'B:LYS:132', 'C
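
This tutorial does not spell out how `extract_subgraph` merges several selections, so if you need explicit semantics one option is to collect node lists separately (the signature exposes `return_node_list`) and combine them yourself before passing the result back as `node_list`. The sketch below shows only the combining step on made-up node IDs. `combine_node_lists` is a hypothetical helper, not a Graphein function.

```python
def combine_node_lists(selections, mode="union"):
    """Combine several node lists into one sorted list.

    mode="union" keeps nodes present in any selection;
    mode="intersection" keeps nodes present in every selection.
    """
    sets = [set(s) for s in selections]
    if not sets:
        return []
    if mode == "union":
        combined = set().union(*sets)
    elif mode == "intersection":
        combined = set.intersection(*sets)
    else:
        raise ValueError(f"Unknown mode: {mode!r}")
    return sorted(combined)


# Made-up node lists standing in for two separate selections.
by_node_list = ["A:ALA:5", "B:GLY:25"]
by_chain = ["A:ALA:5", "A:LYS:7"]

either = combine_node_lists([by_node_list, by_chain], mode="union")
both = combine_node_lists([by_node_list, by_chain], mode="intersection")
```

Making the mode explicit avoids relying on whichever behaviour the wrapper happens to implement.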