## Getting Protein-Protein Interaction Data from BioGRID DB

### 0. Required libraries
This example shows how to retrieve protein-protein interaction (PPI) data from BioGRID DB through the REST API that the database provides. It uses a few libraries:
1. `requests` performs the web request and retrieves the response.
2. `pandas` transforms the retrieved response (in JSON) into a data frame.
3. `networkx` constructs the network graph structure from the protein-protein interaction data.
4. `matplotlib` supports the drawing functions in `networkx`.

In [None]:
import requests
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx

### 0. Register to get data from BioGRID DB
Go to https://webservice.thebiogrid.org, register, and copy the access key you receive. Every request to the API must include it.
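
One way to keep the key out of the notebook itself is to read it from an environment variable. The sketch below assumes a variable named `BIOGRID_ACCESS_KEY` and a helper called `build_params`; both names are chosen for illustration and are not defined by BioGRID. The parameter names mirror those used in the request later in this notebook.

```python
import os

def build_params(gene_list, organism=9606):
    """Build BioGRID query parameters, reading the access key from the environment."""
    # BIOGRID_ACCESS_KEY is a hypothetical variable name used only in this sketch
    access_key = os.environ.get("BIOGRID_ACCESS_KEY")
    if not access_key:
        raise RuntimeError("Set BIOGRID_ACCESS_KEY to your BioGRID access key first")
    return {
        "accessKey": access_key,
        "format": "json",
        "geneList": gene_list,
        "organism": organism,
    }
```

With the variable set, `build_params("MB")` returns a dictionary that can be passed as `params` to `requests.get()`.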

### 1. Accessing the Web API of BioGRID DB
The following code shows the steps for accessing the Web API of BioGRID DB. First, set the API endpoint that serves the data, which is https://webservice.thebiogrid.org/interactions . Then set the required parameters, such as `accessKey`, `organism`, `geneList` and `format`, as in the example. For details of the available parameters, refer to the BioGRID API documentation: https://wiki.thebiogrid.org/doku.php/biogridrest<br><br>
Next, send the request to the web server using the endpoint URL and the parameters. The returned response is parsed with the `json()` function. <br><br>
<span style='color:red'>Note: Always check whether the response contains the expected result or an error message.</span>

In [None]:
biogrid_url = "https://webservice.thebiogrid.org/interactions"
params = {
    "accessKey":"585b23750ef5fd7f8c3b0a3170e41e3d",
    "format":"json",
    "searchNames": True,
    "geneList":"MB",
    "organism":9606,
    "searchbiogridids":True,
    "includeInteractors":True
}
response = requests.get(biogrid_url, params=params)
response.raise_for_status()  # stop early if the server returned an HTTP error status
network = response.json()

In [None]:
network
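
The note above about checking the response can be sketched as a small validation step. This is a minimal sketch, not BioGRID's own error handling: it assumes that a successful JSON reply is a dictionary keyed by interaction ID whose values are dictionaries of attributes. The helper name `interactions_from_payload` and the toy payload are hypothetical.

```python
def interactions_from_payload(payload):
    """Return the interaction records, or raise if the payload has an unexpected shape."""
    # Assumption: a successful reply is a non-empty dict of {interaction_id: record}
    if not isinstance(payload, dict) or not payload:
        raise ValueError(f"Unexpected BioGRID response: {payload!r}")
    if not all(isinstance(record, dict) for record in payload.values()):
        raise ValueError("Response is not a dictionary of interaction records")
    return payload

# Toy payload with a made-up ID and partner symbol, only to show the expected shape
example = {"1": {"OFFICIAL_SYMBOL_A": "MB", "OFFICIAL_SYMBOL_B": "PROT_X"}}
records = interactions_from_payload(example)
```

In the notebook, the same check could be applied to `network` before building the dataframe.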

Then, use the `DataFrame.from_dict()` function to transform the data into a `pandas` dataframe. Note that you can also export the dataframe to external files.

In [None]:
network_df = pd.DataFrame.from_dict(network, orient='index')
network_df.head()  # show the first five rows of the dataframe

In [None]:
network_df.shape
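
Exporting the dataframe, as mentioned above, might look like the following sketch. It uses a toy dataframe with made-up interaction IDs and partner symbols instead of the real response, and the file name and the `INTERACTION_ID` index label are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy records mimicking the shape of the parsed response (values are made up)
sample = {
    "101": {"OFFICIAL_SYMBOL_A": "MB", "OFFICIAL_SYMBOL_B": "PROT_X"},
    "102": {"OFFICIAL_SYMBOL_A": "MB", "OFFICIAL_SYMBOL_B": "PROT_Y"},
}
sample_df = pd.DataFrame.from_dict(sample, orient="index")

# Write a tab-separated file, keeping the interaction IDs as a named index column
out_path = os.path.join(tempfile.gettempdir(), "biogrid_interactions.tsv")
sample_df.to_csv(out_path, sep="\t", index_label="INTERACTION_ID")

# Read the file back to confirm the round trip
reloaded = pd.read_csv(out_path, sep="\t", index_col="INTERACTION_ID")
```

For the real data, replacing `sample_df` with `network_df` and choosing a path of your own would be enough.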

In [None]:
# Normalize gene symbols to upper case so the same gene is not split across spellings
network_df["OFFICIAL_SYMBOL_A"] = network_df["OFFICIAL_SYMBOL_A"].str.upper()
network_df["OFFICIAL_SYMBOL_B"] = network_df["OFFICIAL_SYMBOL_B"].str.upper()

In [None]:
network_df

Observe the dataframe above: it holds the information available for each protein-protein interaction returned by the web server. Of the available attributes, this example uses only two columns, `OFFICIAL_SYMBOL_A` and `OFFICIAL_SYMBOL_B`. Each row indicates an interaction between these two proteins.

### 2. Generate network graph for the protein-protein interaction
From here on, we use the dataframe to generate the network graph. We call the `from_pandas_edgelist()` function from `networkx`, passing in the dataframe generated in STEP 1 followed by the names of the two columns.

In [None]:
network_graph = nx.from_pandas_edgelist(network_df, "OFFICIAL_SYMBOL_A", "OFFICIAL_SYMBOL_B")
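
By default `from_pandas_edgelist()` builds an undirected `nx.Graph`, which stores at most one edge per pair of nodes. Repeated rows, or rows that list the same pair in reverse order, therefore collapse into a single edge, so the edge count can be smaller than the number of dataframe rows. The toy dataframe below uses made-up partner symbols to illustrate this.

```python
import networkx as nx
import pandas as pd

# Four rows, but only two distinct undirected pairs (partner symbols are made up)
toy_df = pd.DataFrame(
    {
        "OFFICIAL_SYMBOL_A": ["MB", "PROT_X", "MB", "MB"],
        "OFFICIAL_SYMBOL_B": ["PROT_X", "MB", "PROT_Y", "PROT_X"],
    }
)
toy_graph = nx.from_pandas_edgelist(toy_df, "OFFICIAL_SYMBOL_A", "OFFICIAL_SYMBOL_B")

print("Rows:", len(toy_df))
print("Edges:", toy_graph.number_of_edges())
print("Nodes:", toy_graph.number_of_nodes())
```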

Once the network graph structure is generated, we can access many properties of the network. For example, `number_of_edges()` and `number_of_nodes()` return the number of edges and the number of nodes, respectively.

In [None]:
print('Number of edges:', network_graph.number_of_edges())
print('Number of nodes:', network_graph.number_of_nodes())

We can also get the number of interactions for each node using the `degree()` function of the network graph object.

In [None]:
network_graph.degree()

We can also measure the centrality of the nodes by calling the `degree_centrality()` function from `networkx` and passing in the network graph object. It returns the degree centrality of each node, which is the node's degree divided by the maximum possible degree (the number of nodes minus one), so a value close to 1.0 means the node is connected to almost every other node. The values are therefore directly proportional to the output of `degree()` above.

In [None]:
degree_centrality = nx.degree_centrality(network_graph)
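
To make the link between `degree()` and `degree_centrality()` concrete, the sketch below recomputes the centrality by hand on a small toy graph (node names are made up) as degree divided by the number of nodes minus one, and compares it with the `networkx` result.

```python
import networkx as nx

# Toy graph: HUB touches every other node, so its centrality is the maximum
toy = nx.Graph([("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")])

n = toy.number_of_nodes()
# Manual degree centrality: degree / (n - 1)
manual = {node: degree / (n - 1) for node, degree in toy.degree()}
computed = nx.degree_centrality(toy)

for node in toy:
    print(node, toy.degree(node), round(manual[node], 3), round(computed[node], 3))
```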

Nodes with higher centrality have more interactions than nodes with lower centrality. We can therefore rank proteins by centrality and keep only some of them. The following example extracts the top five.

In [None]:
top_5_proteins = sorted(degree_centrality.items(), key=lambda x:-x[1])[:5]
top_5_proteins

### 3. Visualizing the network graph
Next, we can visualize the network. The `spring_layout()` function from `networkx` takes the network graph object and generates a layout containing the coordinates of each node. The layout starts from random positions, so by default it differs on every run. Setting the `seed` parameter of `spring_layout()` to a specific number makes it generate the same layout every time.

In [None]:
slayout = nx.spring_layout(network_graph, seed=123)
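
The effect of `seed` can be checked directly. This small sketch runs `spring_layout()` twice with the same seed on a toy path graph and confirms that both runs return the same coordinates; the graph and variable names are illustrative only.

```python
import networkx as nx
import numpy as np

toy = nx.path_graph(5)

# Same seed, two independent runs
first = nx.spring_layout(toy, seed=123)
second = nx.spring_layout(toy, seed=123)

# The layout is a dict mapping each node to an (x, y) coordinate array
same_layout = all(np.allclose(first[node], second[node]) for node in toy)
print("Identical layouts:", same_layout)
```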

Then, to render the graph, use the `draw()` function from `networkx`, passing in the network graph object and the layout from above. Several settings control the display, such as:
- `with_labels`: set to True to display a text label on each node.
- `node_size`: the size of the nodes.
- `node_color`: the color of the nodes.
- `edge_color`: the color of the edges.
- `font_size`: the size of the label font.
- `nodelist`: the specific nodes to draw.

In [None]:
nx.draw(network_graph, slayout, with_labels=False, node_size=20, node_color='lightblue', font_size=8)

We can overlay drawings by calling two drawing functions in the same cell, which is useful for highlighting additional information. For example, let's obtain the protein names of the top five proteins.

In [None]:
high_centrality_nodes = [node for node, centrality in top_5_proteins]
high_centrality_nodes

First, we use `draw()` to draw the first layer of the network graph. <br>
Then, we call the `draw_networkx_nodes()` function to draw a second layer on top of the first one. Here we set the `nodelist` parameter to the list of top five proteins above and give those nodes a different color.

In [None]:
nx.draw(network_graph, slayout, with_labels=False, node_size=20, node_color='lightblue')
nx.draw_networkx_nodes(network_graph, slayout, nodelist=high_centrality_nodes, node_size=20, node_color='orange')
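
A third layer can add text labels for only the highlighted nodes, which keeps a dense graph readable. The sketch below is one possible approach on a toy graph with made-up node names; it uses `draw_networkx_labels()` with a `labels` dictionary restricted to the highlighted nodes.

```python
import matplotlib.pyplot as plt
import networkx as nx

toy = nx.Graph([("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")])
pos = nx.spring_layout(toy, seed=123)
highlight = ["HUB"]  # nodes to emphasize; in the notebook this would be the top five

fig, ax = plt.subplots(figsize=(4, 4))
# Layer 1: the whole network without labels
nx.draw(toy, pos, ax=ax, with_labels=False, node_size=20, node_color="lightblue")
# Layer 2: the highlighted nodes in a different color
nx.draw_networkx_nodes(toy, pos, ax=ax, nodelist=highlight, node_size=20, node_color="orange")
# Layer 3: labels only for the highlighted nodes
label_artists = nx.draw_networkx_labels(
    toy, pos, ax=ax, labels={node: node for node in highlight}, font_size=8
)
```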

Note, however, that visualizing the entire dataset might not be meaningful. This is only a visualization; the PPI data can still be useful when integrated with other analyses.
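
One possible way to get a more readable picture is to draw only part of the network. The sketch below keeps the nodes whose degree reaches a threshold and extracts the induced subgraph with `subgraph()`. The toy graph, the node names and the threshold of 2 are arbitrary choices for illustration, not a recommendation for the real data.

```python
import networkx as nx

toy = nx.Graph(
    [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B"), ("C", "D")]
)

min_degree = 2  # arbitrary threshold chosen for this sketch
keep = [node for node, degree in toy.degree() if degree >= min_degree]

# Induced subgraph: the kept nodes plus every edge between two kept nodes
core = toy.subgraph(keep)
print("Kept nodes:", sorted(core.nodes))
print("Kept edges:", core.number_of_edges())
```

The resulting `core` graph can be passed to `spring_layout()` and `draw()` in the same way as the full network.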