# Group K: Type 1 Diabetes in S. cerevisiae

<a id='backtobeg'></a>
<font size="5">Table of Contents</font>

1.[Hypothesis](#hypothesis)
<br>
2.[Background: BCMB](#bcmb1)
<br>
3.[Background: MATH](#math1)
<br>
4.[Integration of Interdisciplinary Components](#integration)
<br>
5.[Experiments: Network Algorithms](#math2)
<br>
6.[Experiments: Biological Significance of the Proteins](#bcmb2)
<br>
7.[Conclusion](#final)
<br>
8.[Reference List](#reference)



<a id='hypothesis'></a>
## 1. Hypothesis

**Project Aims (Madeline)**

- To use Saccharomyces cerevisiae as a model to investigate and discover proteins associated with the pathogenesis of type 1 diabetes.
- To investigate the protein network of PTPN22, a human protein known to have a role in type 1 diabetes, via its yeast homologs PTP1, 2 and 3 to understand its pathway of function and find novel proteins that may also contribute to type 1 diabetes susceptibility.

**Project Hypotheses (Madeline and Elaine)**

Through the collaborative work of mathematics and biochemistry students, two yeast proteins, Hog1 and PTC1, were identified to be biologically relevant proteins that are linked to PTP2. It was hypothesised that knockout of PTP2 in yeast would be able to model the type 1 diabetes associated PTPN22 variant effect on p38 via cell death through Hog1, and that knocking out PTC1 would achieve a similar effect, thus implicating Wip1 dysfunction in type 1 diabetes susceptibility.



<a id='bcmb1'></a>
## 2. Background : BCMB (David)


Insulin-dependent diabetes mellitus, also known as type 1 diabetes (T1D), is an autoimmune disease driven by the selective destruction of insulin-producing pancreatic β cells, but the exact cause of the disease is yet to be fully understood. There are, however, properties and common aspects of the disease that are known (Paschou et al. 2018).
For example, we know that mechanistically T1D is typically driven by some form of inflammatory trigger within the pancreas, usually arising from a combination of genetic, epigenetic, immunological and environmental risk factors. This insulitis (inflammation of the islets of Langerhans) leads to the activation of antigen-presenting cells, triggering adaptive immunity and activating CD4+ helper T-cells. These helper T-cells continue to drive the autoimmune reaction through the release of chemokines and cytokines, resulting in the activation and recruitment of CD8+ cytotoxic T-cells and the generation of autoimmune antibodies, and ultimately in the targeted destruction and apoptosis of pancreatic β cells (Størling et al. 2017). These driving factors can be further aggravated by genetic risk factors that support autoimmunity. In short, type 1 diabetes is typically the result of some form of environmental trigger or stress, whose effects are exacerbated by genetic factors. Such factors can include polymorphisms of the CTLA-4 gene that affect the suppression of T-cells, or mutations in the HLA coding MHC regions that can have a broader effect on the disease: they diminish central immune tolerance, generate autoreactive B and T cells and drive beta cell apoptosis via MHC-1 autoantigen expression (Paschou et al. 2018, Richardson et al. 2016).


<img src="images/Fig1.png" style= "width:600px height:50px">

*Figure 1: Key intercellular interactions in the pathogenesis of type 1 diabetes (note: it does not depict all known factors). Made with BioRender.*

**PTPN22 (David)**

One such genetic risk factor is PTPN22, primarily through its mutated variant PTPN22 R620W. PTPN22 normally plays a significant role in the regulation of the immune response through its suppression of autoreactive T-cells (Cao et al. 2012). The mutated variant, however, is a gain-of-function variant: in addition to acting as an inhibitor of T-cell activation, it is also capable of acting as an inducer of p38, a MAPK responsible for the regulation of stress responses as well as T and B cell activation and innate cell maturation (Menard et al. 2011). Under stressful conditions, the PTPN22 variant acts upon p38 to drive its MAPK signalling, resulting in mitochondrial apoptosis of beta cells (Tomita 2017). R620W also reduces the gene expression of anti-inflammatory IL-10, a cytokine responsible for regulating excessive immune response. Overall, this mutant variant of PTPN22 causes overactivation of the immune response and of apoptosis signalling in beta cells. The pathways of PTPN22 and its associated proteins thus showed potential for investigation in understanding the mechanism of type 1 diabetes and identifying other risk factors of the disease.

**Yeast homologs (David)**

PTP1, 2 and 3 are the three known homologs of PTPN22 in Saccharomyces cerevisiae. All three phosphatases have inhibitory roles in MAPK pathways similar to that of the gain-of-function variant of PTPN22, all of which are involved in stress regulation such as heat, osmolarity and oxidative stress. PTP2 and PTP3 were particularly relevant to our project due to their function in inhibiting Hog1, the yeast homolog of p38, with PTP2 being the more potent inhibitor (Mattison et al. 2000, Rodriguez-Pena et al. 2010). The similarities in function and pathway between the human proteins and their homologs justified the use of yeast as a model for our investigation.


**Conservation of apoptotic mechanisms (Madeline)**

Apoptosis is a highly regulated process of programmed cell death used to selectively remove damaged, superfluous or dangerous cells in order to maintain organismal homeostasis. Research over the years has demonstrated that the basic molecular machinery for programmed cell death is conserved in yeast, and potentially confers an evolutionary advantage on yeast (Madeo et al. 1997; Madeo et al. 2002; Herker et al. 2004; Büttner et al. 2006). Yeast apoptosis appears to occur during the selective removal of older or damaged individuals from the population to enhance the survival and growth of younger cells (Herker et al. 2004; Büttner et al. 2006). Intracellular accumulation of reactive oxygen species (ROS) is a reliable apoptotic marker exhibited by chronologically aged yeast cells (Madeo et al. 1999; Herker et al. 2004). Exogenous agents and stresses can also induce yeast apoptosis, such as hydrogen peroxide, acetic acid, heat, UV radiation and hyperosmotic stress (Madeo et al. 1997). Like mammalian cells, S. cerevisiae exhibits the characteristic apoptotic markers of DNA fragmentation, chromatin condensation, phosphatidylserine externalisation, and cytochrome c release from mitochondria (Madeo et al. 1997; Ludovico et al. 2002). This conservation of apoptotic mechanisms allows yeast to be used as a model to study apoptosis events in type 1 diabetes.

**PTPN22 / PTP2 BIOINFORMATICS (Madeline)**

The human PTPN22 is a class I non-receptor protein-tyrosine phosphatase (PTP) consisting of a catalytic PTP domain, an interdomain and a C-terminal domain. The catalytic PTP domain carries out the dephosphorylation activity involved in the negative regulation of T-cell receptor signalling. The active site of the human PTP catalytic domain consists of multiple loop regions: the phosphate binding loop, which contains the catalytic cysteine residue; the Trp-Pro-Asp “WPD” loop, which acts as a gate to the active site; the Q-loop, involved in the hydrolytic activity; the tyrosine phosphate (Tyr-P) recognition loop, which specifically recognises tyrosine phosphate on target substrates; and the LYP-specific insert region, which allows PTPN22 to make extensive interactions with specific substrates (Tautz et al. 2013; Yu et al. 2011) (Figure 3). The yeast homolog PTP2 also contains a conserved PTP domain that has the same dephosphorylation activity as the PTP domain in human PTPN22. Sequence alignments have confirmed the conservation of the cysteine residue at the active site of the PTP domains in human PTPN22 and yeast PTP2 (Figure 2).

<img src="images/Fig2.png" style="width: 1000px">

*Figure 2. Protein sequence alignment of human PTPN22 and S. cerevisiae PTP2 using UniProt webserver Clustal-Omega program. Highlighted in grey are sequence similarities, with 95 similar positions and 7.3% identity. Cysteine residue at the active site is conserved between human and yeast (red).*

<img src="images/Fig3.png">

*Figure 3. Crystal structure of the human PTPN22 PTP domain. LYP-specific insert (cyan) provides substrate specificity of PTPN22. Conserved regions important in the catalytic activity are highlighted; tyrosine phosphate (Tyr-P) recognition loop (blue), Q-loop (orange), WPD loop (green) and phosphate binding loop (red). The catalytic Cys-227 shown in red spheres is conserved between human PTPN22 and yeast PTP2. PDB ID: 2P6X.*

**WIP1 / PTC1 BIOINFORMATICS (Elaine)**

Wip1 or PPM1D is a human phosphatase enzyme which acts as a negative regulator involved in cell death suppression via MAPK pathways (Takekawa et al, 2000). One protein Wip1 negatively regulates is p38 MAP kinase, known to be a positive regulator of factors associated with inflammation and apoptosis. In yeast, a homolog of Wip1 exists as PTC1, which is also a phosphatase. PTC1 negatively regulates Hog1, the yeast homolog of human p38, making it a suitable model for human Wip1. 

Both human Wip1 and yeast PTC1 are phosphatases with three conserved manganese ion binding sites (Figure 4). In yeast PTC1 these sites are conserved at D58-G59, D233 and D272. Binding of a manganese ion by all three of these sites causes conformational changes in the protein that result in activation of its phosphatase domain (Figure 5). A past study mutated the binding site at D314 to alanine in Wip1, eliminating protein function (Lu et al, 2005). Targeting these conserved sites in yeast will thus allow for PTC1 knockout via disruption of protein function.

<img src="images/Fig4.png" style = "width:1000px ">

*Figure 4. Protein sequence alignment of Saccharomyces cerevisiae protein PTC1 (PP2C1) and human protein WIP1 (PPM1D). Shaded areas representative of sequence similarity. Metal-ion binding sites conserved between yeast and human protein (purple). Image adapted from Uniprot.*


<img src="images/Fig5.png" style= "width:300px height:30px">

*Figure 5. Wip1 structure showing manganese ion-binding sites (orange) D105-G106, D314 and D366.*

[Back to the Table of Contents](#backtobeg)


<a id='math1'></a>
## 3. Background: MATH (Adam)

Because the biology students identified the homologs PTP1, PTP2 and PTP3 as potentially relevant to type 1 diabetes, we set out to find important nodes on the paths of these proteins using network theory. As the PPI network has 6574 nodes and 922983 links, it would be impractical to explore every node connected to PTP1, PTP2 and PTP3. Thus, our goal is to identify nodes that are of high importance relative to PTP1, PTP2 and PTP3 but whose removal would not be lethal to the whole network (such nodes could not be usefully investigated from a biology point of view). Our overall aim is to prune our network by removing high degree nodes in clusters, and then to find meaningful nodes that correspond to our target proteins, making use of the following community detection, centrality and shortest path methods.

###### 3.1 Louvain Community Detection

This algorithm seeks to maximise modularity iteratively, alternating between moving individual nodes into neighbouring communities whenever doing so increases modularity and treating each resulting community as a single node (details to follow). The Louvain community detection method is very fast relative to other similar detection methods (Fast Greedy, Spinglass), with a time complexity of $O(n \log n)$. Furthermore, Rahiminejad, Maurya and Subramaniam (2019) found that the Louvain method yielded the most consistent results, having the highest fraction of nodes landing in identical communities relative to other algorithms. As with all modularity-based community detection methods, the Louvain community detection algorithm suffers from a resolution limit (Fortunato and Barthelemy, 2006, pp.36-41); that is, it struggles to find small communities in large networks. Though this is worth noting, it is not a great issue here: as will become apparent later in our method, the purpose of community detection is to identify high degree nodes in each cluster.

###### 3.2 Random Walks

Once we have identified clusters with the Louvain algorithm, we seek important nodes by extracting each cluster as a subgraph and performing a simple random walk on each subgraph. Intuitively, such a process finds the nodes on which a random walker spends proportionally more time, and we can think of these nodes as having high connectivity. In Python, the result is a dictionary of values corresponding to each node. Such a method of ranking nodes in biological networks is extremely common and has been used to explore cell survival, essential proteins and to prioritise disease genes (Peng, Wang, Zhang and Wu, 2016).

###### 3.3 Shortest Paths

The final stage of our method uses the `networkx.shortest_path` function. When both a source and a target node are given on an unweighted graph, this function computes a shortest path using a bidirectional breadth-first search: a breadth-first search is run from each of the two nodes until the two searches reach a common vertex. If the source, the target or both are left unspecified, Python instead returns a dictionary with the nodes in the shortest paths between all the corresponding pairs of nodes.
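As a minimal sketch of how the shortest path call behaves, the example below runs `networkx.shortest_path` on a small toy graph. The node names are hypothetical placeholders and not proteins from our network.

```python
import networkx as nx

# Toy undirected graph standing in for the PPI network (hypothetical nodes).
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("B", "C"), ("C", "D"),
    ("A", "E"), ("E", "D"),
])

# Source and target both given: one shortest path, returned as a list of nodes.
path = nx.shortest_path(G, source="A", target="D")

# Only the source given: a dict mapping each reachable node to one shortest path.
paths_from_a = nx.shortest_path(G, source="A")

print(path)               # route from A to D via E
print(paths_from_a["C"])  # route from A to C via B
```

Only one shortest path per pair is returned, so when several paths tie in length the intermediate nodes reported depend on the search order.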



[Back to the Table of Contents](#backtobeg)

<a id='integration'></a>
## 4. Integration of Interdisciplinary Components (Adam and Elaine)

Once we had established PTPN22 to be a starting protein of interest for our investigation into type 1 diabetes, the biochemistry team members conducted research into three identified yeast homologs of PTPN22: PTP1, PTP2 and PTP3. After gaining a general understanding of their pathways and confirming their suitability, these yeast homologs were presented to the maths peers for network analysis in order to map their protein interactions and find new proteins of interest. Additionally, yeast homologs of PTPRC and FOXP3, proteins investigated in the biochemistry students’ bioinformatics workshops, were also given to the maths team members to investigate for a broader scope. These additional yeast proteins included FKH1, FKH2, HCM1 and FHL1.

Initially, the mathematics students decided that the best way to find related proteins of interest was to look at the immediate neighbours of the seven aforementioned proteins (PTP1, PTP2, PTP3, FKH1, FKH2, HCM1, FHL1) and, on such a simple network, perform a random walk Markov chain which yields a probability distribution. This method seemed promising (it would provide us with nodes that had a high probability of being related to our target homologs), but when we consulted Georg, it appeared to be biased too heavily towards the immediate surroundings of PTP1, PTP2 and PTP3 instead of using the entire network as the basis of our search for potentially related nodes. However, in the construction of an alternative Markov chain, we needed to check shortest paths, and it appeared that 4932.YBR160W (CDC28) was frequently in the shortest paths between pairs of our seven interested proteins.

Through the protein interaction network analysis done by the maths team members, it was found that the additional proteins FKH1, FKH2, HCM1 and FHL1 were quite distant interactors to our PTP1, 2 and 3 proteins. They were consequently excluded due to their low relevance to our pathways of interest.  
When the biochemistry students researched CDC28, they confirmed it to be involved most closely in the pathways of PTP2 and PTP3. Further research revealed that CDC28 is a cyclin-dependent kinase with human homolog CDK1, and that both proteins play a role in the cell cycle, acting as a switch between cell cycle progression when expressed and apoptosis when inhibited. This was an important finding that allowed the biochemistry students to narrow down on cell death pathways in yeast as an approach to better understand type 1 diabetes, and in particular, to decide to focus on PTP2 and PTP3 as more suitable experimental proteins than PTP1.

The next approach taken by the mathematics students was to start from the whole network, instead of starting from our interested nodes, to find related proteins. We used the Louvain community detection algorithm to identify the clusters that PTP1, PTP2 and PTP3 were a part of and made subgraphs of these clusters, repeating the procedure three times. We then ran a random walk process to compute stationary probabilities in each cluster subgraph and found that four nodes appeared to have high stationary probabilities (YLR113W, YJL128C, YDL006W and YNL053W). We then presented these nodes to our biochemistry peers for further investigation of their roles within the PTP2 and PTP3 pathways.

Starting from these new proteins provided by the maths teammates, the biochemistry members conducted a literature and database search to understand how these proteins interact with the yeast homologs, as well as their relevant pathways. It was concluded that two proteins, Hog1 (YLR113W) and PTC1 (YDL006W), were of particular interest due to their relevance within the PTP2 and PTP3 pathways and their connection with the previously identified CDC28. Linking these proteins finally allowed the biochemistry students to identify the Hog1 MAPK signalling pathway as the appropriate pathway to investigate (Figure 6).

<img src="images/Fig6.png" style="width: 1000px height: 500px">

*Figure 6. Identified secondary proteins in relation to their roles and interactions in the Hog1 PTP2/3 MAPK pathway. Image adapted from KEGG (kegg.jp).*

Hog1 was discovered to be involved in cell death mechanisms for yeast via activation of factors which inhibited CDC28, and its human homolog p38 similarly functioned in cell death and inflammation signalling via the MAPK pathway. Like PTP2/3, PTC1 was discovered to be a negative regulator for Hog1, with its human homolog Wip1 also being a negative regulator for p38. As the biochemistry team was interested in using knockouts to discover protein dysfunctions that would cause increased inflammation and cell death, PTC1, PTP2/3 were concluded to be highly suitable experimental targets.

However, after the biochemistry students had found experimental targets and the project neared its end, it became apparent to the mathematics students that this method was equivalent to simply searching the STRING database, so we wanted to alter it slightly for the sake of originality. Thus, we decided to delete the highest degree nodes (according to stationary probabilities in each cluster subgraph) after the community detection step. We could then find the nodes in the shortest paths between PTP1 and PTP2, PTP3. As a result, we saw that PTC1 (YDL006W) was still in such a path and was thus a strong candidate for our biochemistry peers to proceed with as an important node.

The result of this joint collaboration between the maths and biochemistry members of the team was the identification of two new proteins Hog1 and PTC1 whose human homologs could potentially be associated with the pathogenesis of type 1 diabetes. Following this, the biochemistry team members were able to effectively design an experimental plan. 


<a id='math2'></a>
## 5. Experiments: Network Algorithms

Our biologists discovered that the yeast proteins PTP1, PTP2, and PTP3 were homologs of PTPN22, the human protein in which we are interested. Our efforts therefore focused on investigating the network around these three proteins, looking for other nearby proteins which might have important relationships to them.

### 5.1 Identifying Meaningful Proteins (Hoon)

###### 5.1.1. Louvain Community detection 


Our primary mathematical method utilised Louvain community detection and Random Walk centrality, and was carried out by Hoon.
The purpose of utilising Louvain Community detection is to identify the significant clusters in the network.
Louvain Community detection is based on maximizing the modularity of clusters in the network. The first step is to assign each node to its own community. Afterwards the algorithm goes through every node and calculates the modularity score that would result from moving the node into each of its neighbours' clusters.

The modularity score is given by:
$$ Q = \frac{1}{2m}\sum_{i,j}\left[A_{i,j}-\frac{k_{i}k_{j}}{2m}\right]\delta(c_{i},c_{j}) $$

where:
- $m$ is the sum of all of the edge weights in the graph;
- $k_{i}$ and $k_{j}$ are the sums of the weights of the edges attached to nodes $i$ and $j$, respectively;
- $A_{i,j}$ is the $(i,j)$ element of the adjacency matrix;
- $c_{i}$ and $c_{j}$ are the communities of nodes $i$ and $j$;
- $\delta$ is the Kronecker delta, equal to 1 when $c_{i} = c_{j}$ and 0 otherwise.

Each node is then grouped with the neighbour that produced the highest modularity score.
After going through every node, the second step is to merge each community into a super node, creating a new network, and then to repeat the above steps on the new network.
Figure 7 illustrates the second step of Louvain Community Detection.

<img src="images/Fig7.png" style= "width:600px height:60px">

*Figure 7 (Blondel et al, 2008)*

Louvain Community Detection was utilised because it is a simple algorithm which outperforms its modularity maximization algorithm counterparts with a computational complexity of $O(n\log n)$. Furthermore, it can find clusters of different sizes.
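The modularity formula and the Louvain step can be illustrated on a toy graph. The sketch below is not the code used in the project: the graph is hypothetical, and it assumes a networkx version that provides `louvain_communities` (2.8 or later). It evaluates $Q$ directly from the definition and compares it with the value networkx reports for the same partition.

```python
import networkx as nx
import numpy as np
from networkx.algorithms import community as nx_comm

# Toy graph (hypothetical): two 4-cliques joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from((u, v) for u in range(4) for v in range(u + 1, 4))
G.add_edges_from((u, v) for u in range(4, 8) for v in range(u + 1, 8))
G.add_edge(3, 4)

# Louvain community detection; the seed fixes the random node ordering.
communities = nx_comm.louvain_communities(G, seed=0)

def modularity_from_definition(G, communities):
    """Evaluate Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes)
    k = A.sum(axis=1)                      # (weighted) degree of each node
    two_m = k.sum()                        # 2m: every edge is counted twice
    label = {n: c for c, members in enumerate(communities) for n in members}
    labels = np.array([label[n] for n in nodes])
    same = labels[:, None] == labels[None, :]   # Kronecker delta as a mask
    return float(((A - np.outer(k, k) / two_m) * same).sum() / two_m)

q_manual = modularity_from_definition(G, communities)
q_networkx = nx_comm.modularity(G, communities)
print(communities, q_manual, q_networkx)
```

The two modularity values should agree up to floating-point error, which is a quick check that the formula above is the quantity being maximised.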




###### 5.1.2. Random Walk Stationary Centrality Algorithms


A Markov process is a stochastic process which satisfies the Markov property. That is, the probability of the next state is determined only by the current state. Mathematically it can be expressed as:
$$ P_{i,j} = P(x_{n} = j | x_{n-1} = i) $$

- $P_{i,j}$, in the context of a graph, is the probability of moving to node $j$, given that one is at node $i$
- $x_{n} = k$ means that at time $n$, one is at position $k$

In a random walk on a graph (a Markov process in which a walker moves from node to node), the distribution of the walker's position converges to a fixed pattern, which is described by the stationary probability. The stationary probability of node $i$ is denoted $\pi_{i}$. Figure 8 shows the movement of a random walker.


![SegmentLocal](rw_anim.gif "segment")

*Figure 8 : Random Walk Animation. (Dikkerboom, 2019)*

To find high centrality nodes, Hoon took a novel approach, calculating the random walk stationary distribution.
In theory, a stationary distribution $\pi$ is calculated by repeatedly multiplying an initial distribution by the transition probability matrix $P$, given by:

$$ P = D^{-1}A $$

- $A$ represents the adjacency matrix
- $D$ represents the diagonal matrix with elements equal to the row sums of $A$: $$ D_{i,i} = \sum_{j}A_{i,j}$$

This expression shows that, from any node, the walker chooses uniformly among the neighbours of that node.

$\pi$ has the property $$\pi=\pi P$$ and the iteration converges to it as long as $P$ is aperiodic and irreducible.

Consider again the expression above.

This shows that $\pi$ is a (left) eigenvector of $P$ with eigenvalue $1$.
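A minimal sketch of the stationary distribution calculation follows, using power iteration on a small hypothetical graph. It assumes an unweighted, undirected network and takes $P$ to be the row-normalised adjacency matrix $D^{-1}A$, so that row $i$ holds the probabilities of leaving node $i$, matching the definition of $P_{i,j}$ above.

```python
import networkx as nx
import numpy as np

# Small connected graph containing a triangle (hypothetical), so the walk is
# irreducible and aperiodic and the power iteration converges.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
nodes = list(G.nodes())

A = nx.to_numpy_array(G, nodelist=nodes)
degrees = A.sum(axis=1)
P = A / degrees[:, None]              # P = D^{-1} A, each row sums to 1

pi = np.full(len(nodes), 1.0 / len(nodes))   # start from the uniform distribution
for _ in range(10_000):
    new_pi = pi @ P                   # one step of the walk: pi <- pi P
    converged = np.abs(new_pi - pi).sum() < 1e-12
    pi = new_pi
    if converged:
        break

stationary = dict(zip(nodes, pi))
print(stationary)
```

For an undirected graph the stationary probability of node $i$ is known to equal $k_{i}/2m$, its degree divided by twice the number of edges, so on such a network this ranking coincides with a ranking by degree.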

Although the Random Walk Stationary Centrality algorithm has a relatively expensive computational complexity of $O(n^{2})$, the main reason for utilising Random Walk Centrality scores is that they model protein-protein interactions more realistically.

In real world protein-protein interactions, signalling pathways may not always be optimal (geodesic). To account for this phenomenon, the betweenness centrality score based on random walks ensures that the score accounts for paths that are not always optimal (Newman, 2005).

###### 5.1.3. Deletion of High Centrality Nodes in each cluster and Application of the Shortest Paths Algorithm


Nodes that have a high centrality in networks tend to be hubs linking different clusters. Removing these hubs eliminates noise and leads to sparser graphs, so that meaningful pathways between nodes can be explored (Eslachi, Maddi, 2018).
Deleting the high centrality nodes in each cluster therefore removes more of the noise in the network. Deletion of the top 10%, 20% and 30% of nodes yielded the same result.
The shortest path algorithm used is the bidirectional shortest paths algorithm, which runs two simultaneous breadth-first searches (BFS), one from the start node and one from the goal node, towards each other. The search terminates once the two subgraphs generated by the searches intersect at a vertex. Not only does a bidirectional shortest path algorithm give a path of the same length as a single BFS, it is also much more efficient, running at a computational complexity of $O(n^{\frac{k}{2}})$ compared to $O(n^k)$, where $n$ here denotes the typical number of neighbours per node and $k$ the length of the shortest path.


Figure 9 below displays the intuition behind this methodology. Although every node here is connected to the center node, its removal can lead to a meaningful pathway between two target nodes.

<img src="images/Fig9.jpeg" style="width: 1000px;height: 500px">

*Figure 9 : Paths between nodes connected by a high degree node (Gottwald, 2020)*
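The intuition in Figure 9 can be reproduced with a short sketch. Everything here is illustrative: the graph and node names are hypothetical, the helper `prune_hubs` is not from the project code, nodes are ranked by degree (which for an undirected network gives the same ordering as stationary probability), and the target nodes are protected from deletion as an added assumption.

```python
import math
import networkx as nx

def prune_hubs(G, communities, fraction, protected=()):
    """Return a copy of G with the top `fraction` of nodes (by degree) removed
    from each community; nodes in `protected` are never removed."""
    pruned = G.copy()
    for members in communities:
        candidates = [n for n in members if n not in protected]
        ranked = sorted(candidates, key=lambda n: G.degree[n], reverse=True)
        n_remove = math.ceil(fraction * len(members))
        pruned.remove_nodes_from(ranked[:n_remove])
    return pruned

# Toy network (hypothetical): hub "H" touches every node, while
# S - a - b - T is the only other route between the targets S and T.
G = nx.Graph()
G.add_edges_from([("S", "a"), ("a", "b"), ("b", "T")])
G.add_edges_from(("H", n) for n in ["S", "a", "b", "T", "x", "y"])

communities = [set(G.nodes())]   # treated as a single cluster for brevity

before = nx.shortest_path(G, "S", "T")
pruned = prune_hubs(G, communities, fraction=0.1, protected={"S", "T"})
after = nx.shortest_path(pruned, "S", "T")
print(before, after)
```

Before pruning, the shortest path runs through the hub and says little about the targets; after pruning, the path through `a` and `b` is exposed. If pruning disconnects the targets, `nx.shortest_path` raises `NetworkXNoPath`, which would need to be handled on a real network.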

### 5.2 Earlier Approaches (Christopher)

###### 5.2.1 Random Walks in Neighbourhoods

In the early stages of the project, our biologists had also discovered that FKH1, FKH2, HCM1, FHL1 are yeast proteins which are homologs of human proteins known to interact with PTPN22, and so we included these four in our early experiments.
Thus we had seven 'interested' proteins in the early stages: PTP1, PTP2, PTP3, FKH1, FKH2, HCM1, FHL1.
The first approach followed the efforts of Wang and Peng (2013) in research on cancer-related proteins, and the follow up work by Peng et al. (2017), both of which dealt with random walks on small protein networks.
Our random walks approach was to construct a simple network of the immediate surroundings of an interested protein and perform a random walk Markov chain on those nodes, resulting in a probability distribution for the immediate neighbours of each of our seven proteins.

Once subgraph construction is complete, an artificial 'S' node is added to the network, with a link to every other node. This is to represent the influence of the outside network; the potential for a random walker to leave the local system and come back. Plainly, S always registered a higher probability than any other node, since it was linked to every other node, but this is to be expected: the outside network will obviously outweigh all the nodes in our small local network. Consequently, we ignore S when looking for the highest probability nodes after Random Walking.
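The construction described above can be sketched as follows. This is an illustration under stated assumptions and not the original program: the walk is unweighted, the helper `local_walk_with_sink` and the toy node names are hypothetical, and the label `"S"` is assumed not to clash with any existing node.

```python
import networkx as nx
import numpy as np

def local_walk_with_sink(G, protein, sink="S"):
    """Stationary probabilities of a random walk on `protein`, its immediate
    neighbours, and an artificial node `sink` linked to every one of them."""
    local = G.subgraph([protein, *G.neighbors(protein)]).copy()
    local.add_edges_from((sink, n) for n in list(local.nodes()))
    nodes = list(local.nodes())
    A = nx.to_numpy_array(local, nodelist=nodes)
    P = A / A.sum(axis=1, keepdims=True)      # row-normalised transition matrix
    pi = np.full(len(nodes), 1.0 / len(nodes))
    for _ in range(5_000):                    # fixed number of power-iteration steps
        pi = pi @ P
    return dict(zip(nodes, pi))

# Toy network (hypothetical names): "far" is two steps away from "P1"
# and therefore falls outside the local neighbourhood.
G = nx.Graph([("P1", "n1"), ("P1", "n2"), ("P1", "n3"), ("n1", "n2"), ("n3", "far")])

probs = local_walk_with_sink(G, "P1")
ranked = sorted((n for n in probs if n != "S"), key=probs.get, reverse=True)
print(probs, ranked)
```

The artificial node is dropped before ranking, mirroring the decision to ignore S when looking for the highest probability nodes.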

This method by nature focuses on the immediate vicinity of the interested protein, but since it has been used in a similar context of investigating the relationship of a particular protein to disease (albeit a different organism) in the past, and is computationally easy since we are only operating on a relatively very small network, we decided to proceed with it.

Four different variants of Random Walks on the neighbours of our interested proteins were tried, but the three latter variants produced no meaningful results; in fact, they concluded with the same probability value assigned to every node in the small network. Only the first variant, which considered the immediate neighbourhood of only one interested protein at a time, actually produced a meaningful distribution of probabilities. The graphs of these are reproduced below.

![The probabilities resulting from a Random Walk in the immediate neighbourhood of PTP1](ImmediateLocalMarkovGraphs/PTP1.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of PTP2](ImmediateLocalMarkovGraphs/PTP2.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of PTP3](ImmediateLocalMarkovGraphs/PTP3.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of FHL1](ImmediateLocalMarkovGraphs/FHL1.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of FKH1](ImmediateLocalMarkovGraphs/FKH1.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of FKH2](ImmediateLocalMarkovGraphs/FKH2.png)
![The probabilities resulting from a Random Walk in the immediate neighbourhood of HCM1](ImmediateLocalMarkovGraphs/HCM1.png)

*Figure 10 : Probability distributions by Random Walk for neighbours of Proteins in S. cerevisiae*

###### 5.2.2 Proteins in Shortest Paths

During preparation for the Markov procedure above, our group noticed that some proteins appeared in several of the shortest paths between our interested proteins. This suggested an investigation of how shortest paths would change if the most frequently occurring proteins in the shortest paths were deleted.
We therefore experimented with the effect of various nodes on shortest paths between our interested proteins. To this end, we wrote a program which prints out a shortest path between every possible pair of interested nodes. It gathered, for every node that appears in at least one path, the number of times that node appears. It then deleted the most common node, and investigated
* i) which nodes took the place of the deleted one
* ii) whether the shortest paths became longer due to the deletion

The experiment operated on 21 shortest paths; all possible pairs of the seven interested proteins. It proceeded by deleting the most frequently occurring node currently appearing in the set of shortest paths, checking if any path lengths had increased, and deleting the next most frequently occurring node.
The approach was tried several times, deleting up to three proteins from the network before restarting. Upon each restart a different protein was deleted first.
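The counting-and-deletion loop can be sketched on a toy graph. The helper `intermediate_counts` and the node names are hypothetical, and the sketch records one shortest path per pair, as the program described above did.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

def intermediate_counts(G, targets):
    """Count how often each non-endpoint node appears on one shortest path
    per pair of targets; also return the path found for each pair."""
    counts, paths = Counter(), {}
    for u, v in combinations(targets, 2):
        path = nx.shortest_path(G, u, v)
        paths[(u, v)] = path
        counts.update(path[1:-1])
    return counts, paths

# Toy network (hypothetical): "hub" is adjacent to every target, and each
# pair of targets also has a longer private route.
G = nx.Graph()
G.add_edges_from(("hub", t) for t in ["T1", "T2", "T3"])
G.add_edges_from([
    ("T1", "a"), ("a", "b"), ("b", "T2"),
    ("T2", "c"), ("c", "d"), ("d", "T3"),
    ("T1", "e"), ("e", "f"), ("f", "T3"),
])
targets = ["T1", "T2", "T3"]

counts, paths = intermediate_counts(G, targets)
most_common, hits = counts.most_common(1)[0]

# Delete the most frequent intermediate node and see which paths got longer.
H = G.copy()
H.remove_node(most_common)
new_counts, new_paths = intermediate_counts(H, targets)
lengthened = [pair for pair in paths if len(new_paths[pair]) > len(paths[pair])]
print(most_common, hits, lengthened)
```

With seven interested proteins, `combinations` yields the 21 pairs mentioned above. On a real network, a deletion that disconnects a pair makes `nx.shortest_path` raise `NetworkXNoPath`, which this sketch does not handle.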

A protein with Stringdb designation 4932.YBR160W (CDC28, also known as Cdk1) appeared extremely important in shortest paths. Initially it appeared in 15 out of the 21 paths, and deleting it increased the length of 9 of those paths from 3 nodes to 4 nodes (these lengths include endpoints). In other words, CDC28 was the only node adjacent to both proteins in 9 out of the 21 pairs. Although this indicated that CDC28 is extremely significant, it did not lead our group to new information. A protein being critical does not guarantee it is meaningful. Some proteins and protein communities are critically necessary to most ongoing processes because without them the cell breaks down; the fact that a particular protein appears often in relation to our interested proteins does not, on its own, tell us anything about the function of the interested proteins. We concluded that CDC28 is one of these proteins that is generally critical across the cell, and therefore the fact that it is very important in relation to our interested proteins does not tell us anything useful about their function.

After the initial work on Local Random Walks and Shortest Paths, we mostly chose to ignore the 4 additional interested proteins FKH1, FKH2, HCM1, and FHL1, the better to focus on the PTP proteins that we knew to be central to our investigation. The later experimental work, outlined in the above section, therefore did not use these four.

[Back to the Table of Contents](#backtobeg)

<a id='bcmb2'></a>
## 6. Experiments: Biological Significance of the Proteins

*Experimental Plan Outline*

<img src="images/Fig11.png" style= "width:600px height:50px">

*Figure 11. Wip1 structure showing manganese ion-binding sites (orange) D105-G106, D314 and D366.*

**Rationale (Elaine and Madeline)**

We aimed to model the elevated activity of p38 caused by the type 1 diabetes associated PTPN22 variant by knocking out its yeast homolog PTP2, thus removing a key inhibitor of Hog1, the yeast homolog of p38, and leaving Hog1 activity unregulated. The success of this would be measured by yeast cell death. Following this, a PTC1 knockout would be tested to see whether a similar result is achieved. If so, it could be inferred that Wip1 dysfunction has a similar effect on p38 to that of the PTPN22 variant, making it a possible risk factor for type 1 diabetes susceptibility.

<img src="images/Fig12.png" style = "width: 1000px">


*Figure 12. Schematic diagram of human and yeast protein networks to model the pathogenesis of type 1 diabetes. Stress stimulates Hog1/p38 MAPK (yeast/human) signalling pathways and dysfunction of PTP2/PTPN22 R620W or PTC1/Wip1 leads to apoptosis in humans and yeast. Same-coloured boxes indicate homology, arrows represent activation and “T” represents inhibition.*

**Generation of PTC1 and PTP2 Knockout Strains: CRISPR-Cas9 (Elaine)**

Using CRISPR-Cas9 genome editing, PTP2 and PTC1 knockout strains will be generated by transforming Saccharomyces cerevisiae with plasmid vectors expressing S. pyogenes-derived Cas9 and the designed guide RNAs (Figure 15). Gene edits will be introduced at the conserved metal ion binding sites of PTC1 and at the conserved active site of PTP2 to disrupt protein function. Co-transformation with a designed repair template containing the gene-disrupting edits will use homology-directed repair to introduce the desired mutations into the yeast’s DNA (Figures 13 and 14). Successful transformants will be selected using the plasmids’ URA3 marker on YC-Ura media and cultured for use in analysis.

<img src= "images/Fig13.png" style="width:800px height:70px">


*Figure 13. Designed guide RNA primer and repair template oligonucleotides for CRISPR-Cas9 mediated gene editing of PTC1 gene in Saccharomyces cerevisiae. Intended gene edits and mutations in red, targeting metal-ion binding site D58, G59 and protospacer adjacent motif (PAM).*

<img src="images/Fig14.png" style= "width:800px height:70px">

*Figure 14. Designed guide RNA primer and repair template oligonucleotides for CRISPR-Cas9 mediated gene editing of PTP2 in Saccharomyces cerevisiae. Intended gene edits and mutations in red targeting active site C666, C670 and protospacer adjacent motif (PAM).*

<img src="images/Fig15.png" style =  "height: 1000px; width: 1000px;">

*Figure 15. Cas9 and guide RNA expressing plasmid vector for yeast cell co-transformation with repair template to generate yeast cells expressing PTC1 and PTP2 gene edits. The Cas9 nuclease is derived from S. pyogenes and recognises a protospacer adjacent motif (PAM) of 5’ NGG 3’. (Laughery & Wyrick 2019)*
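To illustrate what the 5’ NGG 3’ PAM requirement means when choosing a guide RNA target, the sketch below scans a DNA string for 20-nucleotide protospacers that are immediately followed by an NGG motif. It is a simplified illustration under stated assumptions: the function name `find_protospacers` and the example sequence are hypothetical, only the given strand is searched, and no off-target or efficiency scoring is attempted. It is not the design procedure used for Figures 13 and 14.

```python
import re


def find_protospacers(seq, guide_len=20):
    """Return (start, protospacer, pam) for each NGG PAM site on the given strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Hypothetical 25-nt sequence, not taken from PTC1 or PTP2.
example = "ATGCATGCATGCATGCATGCAGGTT"
for start, protospacer, pam in find_protospacers(example):
    print(start, protospacer, pam)
```

A fuller design would also scan the reverse complement and check that the chosen site lies close to the codons being edited.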


**Analysis Protocols:**
**2-DE gels and MALDI-TOF mass spectrometry (David)**

Two-dimensional electrophoresis (2-DE) will be used to compare the proteomes of the knockout strains with that of the reference strain, with the aim of identifying differences in protein composition and, from these, proteins of interest. The yeast cells will be exposed to hyperosmotic stress (0.55 M to 0.9 M NaCl) and sampled over a time course, and the gels will be visualised using Coomassie Brilliant Blue. Differences in protein abundance and changes in post-translational modifications (PTMs) over time will help identify both experimentally relevant and novel proteins associated with apoptosis, along with their expression over time. To accomplish this, the 2-DE gels will be imaged and compared against one another for notable proteins before identification via MALDI-TOF mass spectrometry. Using online resources such as UniProt and Mascot, a protein can be identified from a series of matched peptides. 2-DE gels and MALDI-TOF were ultimately chosen over shotgun proteomics in the hope of identifying potentially interesting changes in PTMs between protein species as the yeast cells react to the hyperosmotic environment. Shotgun proteomics can still be used for large-scale protein identification and abundance quantification as an alternative to MALDI-TOF if the need for a high-throughput method arises.

**Flow Cytometry (Madeline)**

Flow cytometry will be used to analyse the apoptotic markers caused by prolonged activation of the Hog1 pathway in PTP2 and PTC1 knockout strains upon exposure to hyperosmotic stress. To induce yeast apoptosis, cells will initially be exposed to either 0.55 M or 0.9 M NaCl. S. cerevisiae is known to tolerate a maximum osmotic stress of 2 M NaCl (Lages et al. 1999). Cells will then be co-stained with Annexin-V and propidium iodide (PI) to characterise yeast cells that are in: early apoptosis, exhibiting phosphatidylserine externalization (AnnV+, PI-); early necrosis, showing membrane permeabilization (AnnV-, PI+); and late apoptosis/necrosis, showing both phosphatidylserine externalization and membrane permeabilization (AnnV+, PI+). Dihydroethidium (DHE) will also be used to assess ROS accumulation, which is another marker for yeast apoptosis. ROS will oxidise DHE to ethidium, which will be detected by the flow cytometer.
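The Annexin-V/PI quadrant logic above can be expressed as a simple classification rule. The sketch below is only an illustration: the gate values, the fluorescence numbers and the function names are hypothetical, and the fourth quadrant (AnnV-, PI-) is labelled "viable" as an assumption, since real gates would be set from unstained and single-stained controls.

```python
from collections import Counter

LABELS = ("viable", "early apoptotic", "early necrotic", "late apoptotic/necrotic")


def classify_event(annexin, pi, annexin_gate=1000.0, pi_gate=1000.0):
    """Assign one cell to a quadrant from its Annexin-V and PI fluorescence."""
    ann_pos = annexin >= annexin_gate
    pi_pos = pi >= pi_gate
    if ann_pos and pi_pos:
        return "late apoptotic/necrotic"  # AnnV+, PI+
    if ann_pos:
        return "early apoptotic"          # AnnV+, PI-
    if pi_pos:
        return "early necrotic"           # AnnV-, PI+
    return "viable"                       # AnnV-, PI- (assumed label)


def quadrant_fractions(events, **gates):
    """Fraction of events in each quadrant; events is a list of (annexin, pi)."""
    counts = Counter(classify_event(a, p, **gates) for a, p in events)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Hypothetical fluorescence values in arbitrary units.
events = [(50.0, 40.0), (5000.0, 30.0), (20.0, 8000.0), (6000.0, 9000.0)]
print(quadrant_fractions(events))
```

Comparing these fractions between the reference strain and each knockout strain at 0.55 M and 0.9 M NaCl would give a direct readout of the shift towards apoptosis.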

**Metabolomics (Madeline)**

Metabolomic studies will be conducted using proton nuclear magnetic resonance (1H NMR) spectroscopy to assess the effects of PTP2 and PTC1 knockout on the Hog1 signalling pathway in response to hyperosmotic stress. Cell cultures exposed to 0.9 M NaCl from the flow cytometry experiment will be used for analysis. The data obtained from the 1H NMR spectra will first be analysed using Chenomx NMR Suite, followed by data normalization, scaling and multivariate analysis using MetaboAnalyst 4.0 software. Principal Component Analysis (PCA) will be used to select metabolites of interest, and one-way Analysis of Variance (ANOVA) will then be performed to identify significant metabolites. The metabolite profiles of the PTP2 and PTC1 knockout strains will then be compared to identify common metabolites that may indicate yeast apoptosis.
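MetaboAnalyst performs the statistics itself, but the one-way ANOVA step can be made concrete with a small worked sketch. The function `one_way_anova_f` and the concentration values below are hypothetical and only show how the F statistic compares between-group and within-group variation for a single metabolite across strains.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)                      # number of groups (strains)
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical normalised concentrations of one metabolite (arbitrary units).
reference = [1.0, 2.0, 3.0]
ptp2_knockout = [2.0, 3.0, 4.0]
ptc1_knockout = [5.0, 6.0, 7.0]
print(one_way_anova_f([reference, ptp2_knockout, ptc1_knockout]))
```

For these made-up values the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13. A large F relative to the F distribution with (k - 1, n - k) degrees of freedom marks the metabolite as differing between strains.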

[Back to the Table of Contents](#backtobeg)

<a id='final'></a>
## 7. Conclusions

**BCMB3888 Methods (Elaine and David)**

We initially set out to investigate the practicality of using Saccharomyces cerevisiae as a model organism for type 1 diabetes and to identify novel proteins potentially involved in type 1 diabetes susceptibility. Collaboration between the mathematical and biochemical teams led to the identification of Hog1 and PTC1 as significant proteins potentially associated with the pathogenesis of type 1 diabetes. Further bioinformatics research on these two yeast genes then identified Wip1 as the human homologue of PTC1 and p38 as the human homologue of Hog1. Given the conservation of core apoptotic machinery between species, if the PTP2 and PTC1 knockout strains were to result in cell death as hypothesised, PTC1 dysfunction, and thus human Wip1 dysfunction, would be identified as a risk factor for uncontrolled cell death. Such a result could then be linked to a susceptibility to autoimmunity, as the crux of many autoimmune diseases is the unregulated destruction of cells. If PTC1 dysfunction leads to cell death through uninhibited Hog1 activity, this may translate to Wip1 dysfunction leading to uninhibited p38 activity. As the PTPN22 variant implicated in type 1 diabetes is known to cause autoimmunity through activation of p38, the effect of Wip1 dysfunction would be comparable and could thus also be linked to type 1 diabetes. Therefore, if PTC1 knockout causes cell death as PTP2 knockout does, Wip1 dysfunction could be implicated in type 1 diabetes susceptibility. Such findings would warrant further investigation into this phenomenon, followed by an expansion in experimental design and scope towards a more qualitative approach. Further exploratory research could also be conducted on Wip1 dysfunction and/or p38 inhibition in other autoimmune diseases, given the significant overlap and importance of cell death within autoimmunity.



**Math3888 Methods (Hoon)**

A novel approach was undertaken to find meaningful proteins lying on the pathways between PTP1, 2 and 3, the yeast homologues of PTPN22. The procedure was based on finding significant clusters in the network using Louvain Community Detection, identifying high centrality nodes, implementing the Random Walk Stationary Probability Algorithm and, finally, deleting these high centrality nodes. A bi-directional breadth-first search was then used to identify the shortest paths between the target nodes (PTP1 to PTP2 and PTP1 to PTP3), which led to the discovery of the important protein YDL005W (PTC1).
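As a brief reminder of what a random walk stationary probability measures, consider the simplest case. Assuming a simple random walk on a connected, undirected, unweighted graph with adjacency matrix $A$, node degrees $k_i$ and $m$ edges (an assumption made here for illustration, which may differ from the weighting actually used on the STRING network), the transition probabilities and stationary distribution are:

$$
P_{ij} = \frac{A_{ij}}{k_i}, \qquad \pi = \pi P, \qquad \pi_i = \frac{k_i}{2m}.
$$

This can be verified directly, since $\sum_i \pi_i P_{ij} = \frac{1}{2m}\sum_i A_{ij} = \frac{k_j}{2m} = \pi_j$. Under these assumptions the stationary probability is proportional to degree, which is one reason that general-purpose hubs score highly whatever their relevance to the proteins of interest.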

However, more computationally efficient algorithms could be implemented at all stages of the procedure. For example, our approach to identifying high centrality nodes is extremely bare bones. Recent innovations improve on random walk centrality, such as the random walk betweenness centrality score (Newman, 2005), which models the graph as an electrical system obeying Kirchhoff's first law of current conservation. Furthermore, although Louvain Community Detection is an established and reliable community detection method, it is relatively old, having been devised in 2008. Many enhanced variants of the Louvain algorithm continue to be introduced. These innovations could be brought to bear on our methods.

[Back to the Table of Contents](#backtobeg)

<a id='reference'></a>
## 8. Reference List

Blondel, V. Guillaume, J. Lambiotte, R. & Lefebvre, E. 2008, 'Fast unfolding of communities in large networks.' *Journal of Statistical Mechanics: Theory and Experiment*, 2008(10), p.P10008.

Büttner, S. Eisenberg, T. Herker, E. Carmona-Gutierrez, D. Kroemer, G. & Madeo, F. 2006, ‘Why yeast cells can undergo apoptosis: death in times of peace, love, and war’, *Journal of Cell Biology*, vol. 175, no. 4, pp. 521-5.

Cao, Y. Yang, J. Colby, K. Hogan, S. Hu, Y. Jennette, C. Berg, E. Zhang, Y. Jennette, J. Falk, R. & Preston, G. 2012,
<br>
‘High basal activity of the PTPN22 gain-of-function variant blunt leukocyte responsiveness negatively affecting IL-10 production in ANCA vasculitis’, *PLoS One*, vol. 7, no. 8.

Dijkstra, E. 1959, ‘A note on two problems in connexion with graphs’, *Numerische Mathematik*, vol.1, no.1, pp.269-271.

Dikkerboom, S. 2019, 'Random Walking', animation, *Bigger Tree*,
<br>
[online] Available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2746-0 [Accessed 14 November 2020]

Fortunato, S. & Barthelemy, M. 2006, ‘Resolution limit in community detection’, *Proceedings of the National Academy of Sciences*,
<br>
[online] vol. 104, no.1, pp.36-41. Available at: https://www.pnas.org/content/104/1/36 [Accessed 16 November 2020]

Gottwald, G. 2020, 'Two proteins connected via some high degree node.' *Ed MATH3888 - Discussion*,
<br>
[online] Available at: https://edstem.org/courses/4735/discussion/332004 [Accessed 17 Nov. 2020]

Herker, E. Jungwirth, H. Lehmann, K. Maldener, C. Fröhlich, K. Wissing, S. Büttner, S. Fehr, M. Sigrist, S. & Madeo, F. 2004,
<br>
‘Chronological aging leads to apoptosis in yeast’, *Journal of Cell Biology*, vol. 164, no. 4, pp. 501-7.

Lages, F. Silva-Graça, M. & Lucas, C. 1999, ‘Active glycerol uptake is a mechanism underlying halotolerance in yeasts: a study of 42 species’, *Microbiology (Reading, England)*, vol. 145 (Pt 9), no. 9, pp. 2577-2585.

Laughery, M. & Wyrick, J. 2019, 'Simple CRISPR‐Cas9 Genome Editing in Saccharomyces cerevisiae.' *Current protocols in molecular biology*, vol 129, no. 1, p.e110.

Lu, X. Nannenga, B. & Donehower, L.A. 2005,
<br>
'PPM1D dephosphorylates Chk1 and p53 and abrogates cell cycle checkpoints.', *Genes & development*, vol 19, no. 10, pp.1162-1174.

Ludovico, P. Rodrigues, F. Almeida, A. Silva, M. Barrientos, A. & Côrte-Real, M. 2002,
<br>
‘Cytochrome c release and mitochondria involvement in programmed cell death induced by acetic acid in Saccharomyces cerevisiae’, *Molecular Biology of the Cell*, vol. 13, no. 8, pp. 2598-606.

Maddi, A. & Eslahchi, C. 2017, 'Discovering overlapped protein complexes from weighted PPI networks by removing inter-module hubs.' *Scientific Reports*, 7(1).

Madeo F. Fröhlich E. & Fröhlich K. 1997, ‘A yeast mutant showing diagnostic markers of early and late apoptosis’, *The Journal of Cell Biology*, vol. 139, no. 3, pp. 729-734.

Madeo, F. Fröhlich, E. Ligr, M. Grey, M. Sigrist, S. Wolf, D. & Fröhlich, K. 1999, ‘Oxygen stress: a regulator of apoptosis in yeast’, *Journal of Cell Biology*, vol. 145, no. 4, pp.757-67.

Madeo, F. Herker, E. Maldener, C. Wissing, S. Lächelt, S. Herlan, M. Fehr, M. Lauber, K. Sigrist, S. Wesselborg, S. & Fröhlich, K. 2002,
<br>
‘A caspase-related protease regulates apoptosis in yeast’, *Molecular Cell,* vol. 9, no. 4, pp. 911-7.

Mattison, C. & Ota, I. 2000, ‘Two protein tyrosine phosphatases, Ptp2 and Ptp3, modulate the subcellular localization of Hog1 MAP kinase in yeast’, *Genes Dev*, vol. 14, no. 10, pp. 1229-1235.

Menard, L. Saadoun, D. Isnard, I. Ng, Y. Meyers, G. Massad, C. Price, C. Abraham, C. Motaghedi, R. Buckner, J. Gregersen, P. & Meffre, E. 2011,
‘The PTPN22 allele encoding an R620W variant interferes with the removal of developing autoreactive B cells in humans’, *J Clin Invest*, vol. 121, no. 9, pp. 3635-3644.

Newman, M. 2005, 'A measure of betweenness centrality based on random walks.' *Social Networks*, vol. 27, pp. 39-54.

Peng, W., Wang, J., Zhang, Z. & Wu, F., 2016. ‘Applications of Random Walk Model on Biological Networks’, *Current Bioinformatics*, vol.11, no.2, pp.211-220.

Peng, Y. Scott, P. Tao, R. Wang, H. Wu, Y. & Peng, G. 2017. 'Dissect the Dynamic Molecular Circuits of Cell Cycle Control through Network Evolution Model.' *BioMed Research International*,
<br>
[online] Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5390606/ [Accessed 24 Sept. 2020].

Rahiminejad, S. Maurya, M. & Subramaniam, S. 2019. ‘Topological and functional comparison of community detection algorithms in biological networks’ *BMC Bioinformatics*,
<br>
[online] vol. 20, no. 1. Available at: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2746-0 [Accessed 14 November 2020]

Rodriguez-Pena, J.M. Garcia, R. Nombela, C. & Arroyo, J. 2010,
<br>
‘The high-osmolarity glycerol (HOG) and cell wall integrity (CWI) signalling pathways interplay: a yeast dialogue between MAPK routes’, *Yeast* vol. 27, no. 8, pp. 495-502.

Takekawa, M. Adachi, M. Nakahata, A. Nakayama, I. Itoh, F. Tsukuda, H. Taya, Y. & Imai, K. 2000. 'p53‐inducible Wip1 phosphatase mediates a negative feedback regulation of p38 MAPK‐p53 signaling in response to UV radiation.'
<br>
*The EMBO journal*, vol 19, no. 23, pp.6517-6526.

Tautz, L. Critton, D. & Grotegut, S. 2013, ‘Protein tyrosine phosphatases: structure, function, and implication in human disease’, *Methods in Molecular Biology*, vol. 1053, pp. 179-221.

Tomita, T. 2017, ‘Apoptosis of pancreatic β-cells in Type 1 diabetes’, *Bosn J Basic Med Sci*, vol. 17, no. 3, pp. 183-193.

Wang, H. & Peng, G. 2013, 'Mathematical model of dynamic protein interactions regulating p53 protein stability for tumor suppression.' *Computational and Mathematical Methods in Medicine*,
<br>
[online] Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3888710/ [Accessed 24 Sept. 2020]

Yu, X. Chen, M. Zhang, S. Yu, ZH. Sun, JP. Wang, L. Liu, S. Imasaki, T. Takagi, Y. & Zhang, Z. 2011,
<br>
‘Substrate specificity of lymphoid-specific tyrosine phosphatase (Lyp) and identification of Src kinase-associated protein of 55 kDa homolog (SKAP-HOM) as a Lyp substrate’,
<br>
*The Journal of Biological Chemistry*, vol. 286, no. 35, pp. 30526-34.

[Back to the Table of Contents](#backtobeg)