## **Assignment 2 - Systems Biology**
### **Hugo Manuel Alves Henriques e Silva, hugoalv@student.chalmers.se**

#### **How to run:**
`Run the cells below in order. Make sure to have the datasets in the same folder as the notebook.`

## **Question 1**

### **Data & Network Building**

- The data is a dictionary in which each key is the name of a metabolite and each value is a list of tuples: the first element is the name of the enzyme that catalyzes the reaction and the second is the resulting metabolite. It represents a directed graph whose nodes are the metabolites and whose edges are the enzyme-catalyzed reactions.
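
To make the layout concrete, here is a minimal sketch that flattens a dictionary of this shape into explicit edges. The names `toy_network` and `edges` and the metabolites `A`, `B`, `C` are illustrative assumptions and are not part of the assignment data.

```python
# Toy network in the same shape as the notebook's `network` dictionary:
# metabolite -> list of (enzyme, product) tuples. All names are hypothetical.
toy_network = {
    'A': [('E1', 'B'), ('E1', 'C')],
    'C': [('E2', 'B')],
}

# Flatten the adjacency list into (substrate, enzyme, product) edges.
edges = [
    (substrate, enzyme, product)
    for substrate, reactions in toy_network.items()
    for enzyme, product in reactions
]

for substrate, enzyme, product in edges:
    print(f"{substrate} --[{enzyme}]--> {product}")
```

One enzyme can label several edges (here `E1`), which is how Aldolase appears in the glycolysis network.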

In [19]:
network = {
    'Glucose': [('Hexokinase', 'Glucose-6-phosphate')],
    'Glucose-6-phosphate': [('Phosphoglucose isomerase', 'Fructose-6-phosphate')],
    'Fructose-6-phosphate': [('Phosphofructokinase 1', 'Fructose-1,6-bisphosphate')],
    'Fructose-1,6-bisphosphate': [
        ('Aldolase', 'Glyceraldehyde 3-phosphate'),
        ('Aldolase', 'Dihydroxyacetone phosphate')
    ],
    'Dihydroxyacetone phosphate': [('Triosephosphate isomerase', 'Glyceraldehyde 3-phosphate')],
    'Glyceraldehyde 3-phosphate': [('Glyceraldehyde 3-phosphate dehydrogenase', '1,3-Bisphosphoglycerate')],
    '1,3-Bisphosphoglycerate': [('Phosphoglycerate kinase', '3-Phosphoglycerate')],
    '3-Phosphoglycerate': [('Phosphoglyceromutase', '2-Phosphoglycerate')],
    '2-Phosphoglycerate': [('Enolase', 'Phosphoenolpyruvate')],
    'Phosphoenolpyruvate': [('Pyruvate kinase', 'Pyruvate')]
}

enzymes = [
    'Hexokinase',
    'Phosphoglucose isomerase',
    'Phosphofructokinase 1',
    'Aldolase',
    'Triosephosphate isomerase',
    'Glyceraldehyde 3-phosphate dehydrogenase',
    'Phosphoglycerate kinase',
    'Phosphoglyceromutase',
    'Enolase',
    'Pyruvate kinase'
]

endpoints = ["Pyruvate"]

### **Search & Filtering Algorithms**

*Note: This example has only one endpoint, but the algorithms are written for the general case of several endpoints.*

- Search algorithm: explores the network depth-first from the starting metabolite and records every path that reaches an endpoint.

- Per-endpoint filter: finds the non-essential enzymes for each endpoint, i.e. the enzymes that are not used in every path to that endpoint.

- Global filter: takes the intersection of the per-endpoint results, keeping only the enzymes that are non-essential for every endpoint.
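
Because the glycolysis example has a single endpoint, the multi-endpoint case is never exercised. The sketch below illustrates the two filtering steps as plain set operations on a hypothetical two-endpoint example. The names `toy_paths`, `toy_enzymes` and `non_essential_per_endpoint`, and the enzymes `E1` to `E5`, are assumptions made for illustration only.

```python
# Hypothetical paths for two endpoints, each path a list of (enzyme, product) steps.
toy_enzymes = {'E1', 'E2', 'E3', 'E4', 'E5'}
toy_paths = {
    'X': [
        [('E1', 'B'), ('E2', 'X')],
        [('E1', 'B'), ('E3', 'C'), ('E4', 'X')],
    ],
    'Y': [
        [('E1', 'B'), ('E3', 'C'), ('E5', 'Y')],
    ],
}

def non_essential_per_endpoint(paths, enzymes):
    """Map each endpoint to the enzymes that are missing from at least one of its paths."""
    result = {}
    for endpoint, endpoint_paths in paths.items():
        enzyme_sets = [{enzyme for enzyme, _ in path} for path in endpoint_paths]
        essential = set.intersection(*enzyme_sets)  # used in every path
        result[endpoint] = set(enzymes) - essential
    return result

per_endpoint = non_essential_per_endpoint(toy_paths, toy_enzymes)

# An enzyme is non-essential overall only if it is non-essential for every endpoint.
overall = set.intersection(*per_endpoint.values())
print(sorted(overall))  # ['E2', 'E4']
```

Here `E3` and `E5` are dispensable for `X` but required for `Y`, so the intersection drops them.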

In [20]:
def find_paths(metabolite, path, paths, visited):
    """Depth-first search that records every path from `metabolite` to an endpoint."""
    # Stop if this metabolite is already on the current path (avoids cycles).
    if visited.get(metabolite, False):
        return
    if metabolite in endpoints:
        paths.setdefault(metabolite, []).append(path)
        return
    if metabolite not in network:
        # Dead end: no reaction consumes this metabolite.
        return

    # Copy so that sibling branches do not share their visited state.
    visited = visited.copy()
    visited[metabolite] = True
    for enzyme, product in network[metabolite]:
        find_paths(product, path + [(enzyme, product)], paths, visited)

def find_non_essential_enzymes(paths):
    """For each endpoint, return the enzymes that are not used in every path to it."""
    non_used_enzymes_for_endpoint = {}
    for endpoint in paths:
        # Essential enzymes are those common to all paths to the endpoint.
        enzyme_sets = [{element[0] for element in path} for path in paths[endpoint]]
        essential_enzymes = set.intersection(*enzyme_sets)
        non_used_enzymes_for_endpoint[endpoint] = list(set(enzymes) - essential_enzymes)
    return non_used_enzymes_for_endpoint

def find_all_non_essential_enzymes(non_used_enzymes_for_endpoint):
    """Return the enzymes that are non-essential for every endpoint."""
    if not non_used_enzymes_for_endpoint:
        return []
    # Set intersection avoids mutating a list while iterating over it.
    enzyme_sets = [set(e) for e in non_used_enzymes_for_endpoint.values()]
    return sorted(set.intersection(*enzyme_sets))



### **Results**

Use the previously defined algorithms to find all the paths to our endpoint "*Pyruvate*" and filter the non-essential enzymes.

In [21]:
paths = {}

visited = {}
for metabolite in network:
    visited[metabolite] = False

find_paths('Glucose', [], paths, visited)

non_essential_enzymes_for_endpoint = find_non_essential_enzymes(paths)

non_essential_enzymes = find_all_non_essential_enzymes(non_essential_enzymes_for_endpoint)

print(non_essential_enzymes)

['Triosephosphate isomerase']
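
As a cross-check, the same result can be reached with a different technique: knock out one enzyme at a time and test whether "*Pyruvate*" is still reachable from "*Glucose*". This is a minimal sketch and not the method used above. The helper `reachable` and the variable `knockout_non_essential` are hypothetical names, and the network is repeated so that the block runs on its own.

```python
from collections import deque

# Same glycolysis network as above, repeated so the sketch is self-contained.
network = {
    'Glucose': [('Hexokinase', 'Glucose-6-phosphate')],
    'Glucose-6-phosphate': [('Phosphoglucose isomerase', 'Fructose-6-phosphate')],
    'Fructose-6-phosphate': [('Phosphofructokinase 1', 'Fructose-1,6-bisphosphate')],
    'Fructose-1,6-bisphosphate': [
        ('Aldolase', 'Glyceraldehyde 3-phosphate'),
        ('Aldolase', 'Dihydroxyacetone phosphate'),
    ],
    'Dihydroxyacetone phosphate': [('Triosephosphate isomerase', 'Glyceraldehyde 3-phosphate')],
    'Glyceraldehyde 3-phosphate': [('Glyceraldehyde 3-phosphate dehydrogenase', '1,3-Bisphosphoglycerate')],
    '1,3-Bisphosphoglycerate': [('Phosphoglycerate kinase', '3-Phosphoglycerate')],
    '3-Phosphoglycerate': [('Phosphoglyceromutase', '2-Phosphoglycerate')],
    '2-Phosphoglycerate': [('Enolase', 'Phosphoenolpyruvate')],
    'Phosphoenolpyruvate': [('Pyruvate kinase', 'Pyruvate')],
}

def reachable(network, start, target, knocked_out=None):
    """Breadth-first search that ignores every reaction catalyzed by `knocked_out`."""
    seen = {start}
    queue = deque([start])
    while queue:
        metabolite = queue.popleft()
        if metabolite == target:
            return True
        for enzyme, product in network.get(metabolite, []):
            if enzyme != knocked_out and product not in seen:
                seen.add(product)
                queue.append(product)
    return False

all_enzymes = sorted({enzyme for reactions in network.values() for enzyme, _ in reactions})

# An enzyme is non-essential if the endpoint is still reachable without it.
knockout_non_essential = [
    enzyme for enzyme in all_enzymes
    if reachable(network, 'Glucose', 'Pyruvate', knocked_out=enzyme)
]
print(knockout_non_essential)  # ['Triosephosphate isomerase']
```

Both approaches consider graph connectivity only. They do not account for stoichiometry or flux, so "non-essential" here means only that some route to the endpoint remains.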


### The non-essential enzymes are:

**Triosephosphate isomerase**

