## **Assignment 2 - Systems Biology**
### **Hugo Manuel Alves Henriques e Silva, hugoalv@student.chalmers.se**

#### **How to run:**
Run the cells below in order. Make sure the datasets are in the same folder as the notebook.

## **Question 2**

### **Procedure**

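- Find every path from the starting metabolite (Glucose [c]) to each product, never revisiting a metabolite within a path
- For each product, find its non-essential enzymes, i.e. the enzymes that do not appear in every path to that product
- Pool the non-essential enzymes of all products together
- Keep only the enzymes that are non-essential for every product, i.e. the intersection over products (an enzyme may be essential for one product but not for another)
- Remove these non-essential enzymes from the list of all enzymes to get the **ESSENTIAL** ones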
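The five steps above can be sketched end to end with Python sets. This is a minimal illustration on a hypothetical six-enzyme toy network (`A`, `B`, `P1`, `E1`, ... and the helper `paths_to_endpoints` are all made up for the example), not the notebook's own implementation, which follows below.

```python
# Hypothetical toy network: metabolite -> list of (enzyme, product) reactions.
network = {
    "A": [("E1", "B")],
    "B": [("E2", "C"), ("E3", "D")],
    "C": [("E4", "P1"), ("E5", "P2")],
    "D": [("E6", "P1")],
}
endpoints = {"P1", "P2"}
all_enzymes = {enzyme for edges in network.values() for enzyme, _ in edges}


def paths_to_endpoints(node, used=(), seen=frozenset()):
    """Yield (endpoint, set of enzymes used) for every path that never revisits a metabolite."""
    if node in endpoints:
        yield node, set(used)
        return
    seen = seen | {node}
    for enzyme, product in network.get(node, []):
        if product not in seen:
            yield from paths_to_endpoints(product, used + (enzyme,), seen)


# Step 1: group the enzyme sets of all paths by the product they reach.
paths = {}
for endpoint, used in paths_to_endpoints("A"):
    paths.setdefault(endpoint, []).append(used)

# Step 2: per product, enzymes missing from at least one path are non-essential.
non_essential = {p: all_enzymes - set.intersection(*sets) for p, sets in paths.items()}

# Steps 3-4: non-essential overall only if non-essential for every product.
overall_non_essential = set.intersection(*non_essential.values())

# Step 5: everything else is essential.
essential = sorted(all_enzymes - overall_non_essential)
print(essential)  # ['E1', 'E2', 'E5']
```

Here `E2` and `E5` are essential only because `P2` has a single route, which is exactly the case the "essential for one product but not for another" remark is about.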

#### **Import dataset**

```python
import pandas as pd

df = pd.read_csv('ccm.csv')
df.head()
```

| Unnamed: 0 | From | To | Enzyme |
|---|---|---|---|
| 0 | Glucose [c] | Glucose-6-phosphate [c] | Enzyme1 |
| 1 | Glucose-6-phosphate [c] | Fructose-6-phosphate [c] | Enzyme2 |
| 2 | Fructose-6-phosphate [c] | Glucose-6-phosphate [c] | Enzyme2 |
| 3 | Fructose-6-phosphate [c] | Fructose-1-6-phosphate [c] | Enzyme3 |
| 4 | Fructose-1-6-phosphate [c] | Dihydroxyacetone phosphate [c] | Enzyme4 |
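The `Unnamed: 0` column in the preview is what pandas produces when the first header cell of a CSV is empty, which most likely means the row index was saved into `ccm.csv`. A small sketch, assuming that layout and using a hypothetical two-row stand-in for the file, shows how `index_col=0` turns that column back into the index:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for ccm.csv with an unnamed leading index column.
csv_text = """,From,To,Enzyme
0,Glucose [c],Glucose-6-phosphate [c],Enzyme1
1,Glucose-6-phosphate [c],Fructose-6-phosphate [c],Enzyme2
"""

default = pd.read_csv(io.StringIO(csv_text))
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default.columns))  # ['Unnamed: 0', 'From', 'To', 'Enzyme']
print(list(indexed.columns))  # ['From', 'To', 'Enzyme']
```

The analysis below only reads the `From`, `To` and `Enzyme` columns, so the extra column is harmless either way.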


#### **Build network and other helpful sets**

```python
# Adjacency list: metabolite -> list of (enzyme, product) reactions
network = {}
for i in range(len(df)):
    from_metabolite = df.iloc[i]['From']
    to_metabolite = df.iloc[i]['To']
    enzyme = df.iloc[i]['Enzyme']
    network.setdefault(from_metabolite, []).append((enzyme, to_metabolite))

# Every distinct enzyme that catalyses at least one reaction
enzymes = list({enzyme for reactions in network.values() for enzyme, _ in reactions})
print(enzymes)

# Endpoints are "To" metabolites whose names do not end with a compartment tag such as "[c]"
endpoints = list({to for to in df['To'] if not to.endswith(']')})
print(endpoints)
```

```
['Enzyme33', 'TKT', 'Enzyme16', 'Enzyme3', 'Enzyme26', 'Enzyme34', 'Enzyme35', 'PseudoEnzymes', 'Enzyme9', 'Enzyme24', 'Enzyme30', 'Enzyme17', 'Enzyme31', 'Enzyme14', 'Enzyme19', 'Enzyme6', 'Enzyme2', 'Enzyme4', 'Enzyme25', 'Enzyme7', 'Enzyme11', 'Enzyme10', 'Enzyme32', 'Enzyme18', 'Enzyme5', 'Enzyme13', 'Enzyme23', 'Enzyme12', 'Enzyme28', 'Enzyme8', 'Enzyme27', 'Transporter2', 'Enzyme20', 'TALDO', 'Enzyme21', 'Enzyme1', 'Enzyme22', 'Enzyme29', 'Transporter1', 'Enzyme15']
['Nucleotides', 'Glutamate', 'Glycine', 'Aspartate', 'Serine', 'Alanine', 'Asparagine', 'Glutamine', 'Cysteine', 'Fatty acids']
```
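The same three structures can be built in a single pass over the rows. The sketch below assumes rows shaped as `(From, To, Enzyme)`; the first three rows are copied from the preview above, while the fourth (`EnzymeX` producing `Product X`) is hypothetical and only there to create an endpoint.

```python
from collections import defaultdict

# Rows in (From, To, Enzyme) order; the last row is a hypothetical endpoint reaction.
rows = [
    ("Glucose [c]", "Glucose-6-phosphate [c]", "Enzyme1"),
    ("Glucose-6-phosphate [c]", "Fructose-6-phosphate [c]", "Enzyme2"),
    ("Fructose-6-phosphate [c]", "Glucose-6-phosphate [c]", "Enzyme2"),
    ("Fructose-6-phosphate [c]", "Product X", "EnzymeX"),
]

network = defaultdict(list)  # metabolite -> list of (enzyme, product)
enzymes = set()
endpoints = set()
for source, target, enzyme in rows:
    network[source].append((enzyme, target))
    enzymes.add(enzyme)
    # Names without a trailing compartment tag like "[c]" are treated as end products.
    if not target.endswith("]"):
        endpoints.add(target)

print(sorted(enzymes))    # ['Enzyme1', 'Enzyme2', 'EnzymeX']
print(sorted(endpoints))  # ['Product X']
```

With the real data, the rows could come from `df[['From', 'To', 'Enzyme']].itertuples(index=False)`, which avoids the per-row `df.iloc` lookups.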


#### **Search Algorithm**

```python
def find_paths(metabolite, path, paths, visited):
    """Recursively record in `paths` every path from `metabolite` to an endpoint.

    Each path is a list of (enzyme, product) steps. `visited` marks metabolites
    already on the current path, so reversible reactions cannot loop forever.
    """
    # Copy so that marking a metabolite only affects the current branch of the search
    visited = visited.copy()
    if visited.get(metabolite, False):
        return
    if metabolite in endpoints:
        paths.setdefault(metabolite, []).append(path)
    elif metabolite in network:
        visited[metabolite] = True
        for enzyme, product in network[metabolite]:
            find_paths(product, path + [(enzyme, product)], paths, visited)
```
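The visited check matters because the network contains reversible steps such as Glucose-6-phosphate ↔ Fructose-6-phosphate, i.e. cycles. The sketch below is a hypothetical, self-contained variant (`find_paths_from`, with the network and endpoints passed in rather than read from globals) run on a toy network with that kind of cycle, to show that each path is enumerated once and the recursion terminates.

```python
def find_paths_from(start, network, endpoints):
    """Return {endpoint: [path, ...]} where each path is a list of (enzyme, product) steps.

    A metabolite is never visited twice on the same path, so cycles cannot recurse forever.
    """
    paths = {}

    def walk(metabolite, path, visited):
        if metabolite in visited:
            return
        if metabolite in endpoints:
            paths.setdefault(metabolite, []).append(path)
            return
        for enzyme, product in network.get(metabolite, []):
            walk(product, path + [(enzyme, product)], visited | {metabolite})

    walk(start, [], frozenset())
    return paths


# Hypothetical toy network with a reversible G6P <-> F6P step (a cycle).
network = {
    "G [c]": [("E1", "G6P [c]")],
    "G6P [c]": [("E2", "F6P [c]"), ("E4", "Product")],
    "F6P [c]": [("E2", "G6P [c]"), ("E3", "Product")],
}
paths = find_paths_from("G [c]", network, {"Product"})
for path in paths["Product"]:
    print([enzyme for enzyme, _ in path])
# ['E1', 'E2', 'E3']
# ['E1', 'E4']
```

Using an immutable `frozenset` per call plays the same role as `visited.copy()` in the notebook: each branch of the search gets its own record of visited metabolites.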

#### **Filtering Algorithm for non-essential enzymes for each endpoint**

```python
def find_non_essential_enzymes(paths):
    """For each endpoint, return the enzymes that are not used by every path to it."""
    non_used_enzymes_for_endpoint = {}
    for endpoint in paths:
        # An enzyme is essential for this endpoint if it appears in every path to it,
        # so intersect the enzyme sets of all paths
        essential_enzymes = {enzyme for enzyme, _ in paths[endpoint][0]}
        for path in paths[endpoint][1:]:
            essential_enzymes &= {enzyme for enzyme, _ in path}

        non_used_enzymes_for_endpoint[endpoint] = list(set(enzymes) - essential_enzymes)
    return non_used_enzymes_for_endpoint
```
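For a single product the rule is: essential enzymes are the intersection of the enzyme sets of all its paths, and the non-essential ones are the complement with respect to all enzymes. A compact sketch of that rule, assuming paths shaped as lists of `(enzyme, product)` tuples; `non_essential_per_endpoint` and the toy paths are hypothetical.

```python
def non_essential_per_endpoint(paths, all_enzymes):
    """Map each endpoint to the enzymes that are NOT used by every path reaching it."""
    result = {}
    for endpoint, endpoint_paths in paths.items():
        # Set of enzymes used along each individual path to this endpoint.
        enzyme_sets = [{enzyme for enzyme, _ in path} for path in endpoint_paths]
        # Essential for this endpoint = present on every path.
        essential = set.intersection(*enzyme_sets)
        result[endpoint] = set(all_enzymes) - essential
    return result


# Hypothetical paths: two alternative routes to the same product.
paths = {
    "Product": [
        [("E1", "G6P [c]"), ("E2", "F6P [c]"), ("E3", "Product")],
        [("E1", "G6P [c]"), ("E4", "Product")],
    ]
}
result = non_essential_per_endpoint(paths, ["E1", "E2", "E3", "E4"])
print(sorted(result["Product"]))  # ['E2', 'E3', 'E4']
```

Only `E1` is shared by both routes, so knocking out any other single enzyme still leaves one route to the product.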

#### **Filtering Algorithm for non-essential enzymes for all endpoints**

```python
def find_all_non_essential_enzymes(non_used_enzymes_for_endpoint):
    """Return the enzymes that are non-essential for every endpoint."""
    if not non_used_enzymes_for_endpoint:
        return []
    # An enzyme is non-essential overall only if it is non-essential for all endpoints.
    # A set intersection is used instead of removing items from a list while looping
    # over that same list, which would skip elements.
    per_endpoint = [set(e) for e in non_used_enzymes_for_endpoint.values()]
    return list(set.intersection(*per_endpoint))
```
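Removing items from a Python list while iterating over that same list skips elements: after a removal, the next item shifts into the slot the loop has already passed. The sketch below (the helper `remove_while_iterating` and all values are hypothetical) demonstrates the pitfall and then computes the same "non-essential for every endpoint" filter with a set intersection, which has no such issue.

```python
def remove_while_iterating(items, keep):
    """Drop items not in `keep` by mutating the list during the loop (buggy pattern)."""
    for item in items:
        if item not in keep:
            items.remove(item)
    return items


# "B" and "C" should both be dropped, but removing "B" shifts "C" into the
# position the loop has already visited, so "C" is never checked.
print(remove_while_iterating(["A", "B", "C", "D"], keep={"A", "D"}))  # ['A', 'C', 'D']

# Set-based version of the filter: keep enzymes that are non-essential everywhere.
non_essential_for_endpoint = {
    "P1": {"E2", "E3", "E4", "E5", "E6"},
    "P2": {"E3", "E4", "E6"},
}
overall = set.intersection(*non_essential_for_endpoint.values())
print(sorted(overall))  # ['E3', 'E4', 'E6']
```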

#### **Application of all algorithms to our dataset**

```python
paths = {}
visited = {metabolite: False for metabolite in network}

find_paths('Glucose [c]', [], paths, visited)

non_essential_enzymes_for_endpoint = find_non_essential_enzymes(paths)
non_essential_enzymes = find_all_non_essential_enzymes(non_essential_enzymes_for_endpoint)

print("all non essential enzymes: ", non_essential_enzymes)
print("all essential enzymes are: ", sorted(set(enzymes) - set(non_essential_enzymes)))
```

```
all non essential enzymes:  ['Enzyme33', 'TKT', 'Enzyme3', 'Enzyme26', 'Enzyme34', 'Enzyme35', 'Enzyme24', 'Enzyme30', 'Enzyme17', 'Enzyme31', 'Enzyme14', 'Enzyme19', 'Enzyme25', 'Enzyme2', 'Enzyme4', 'Enzyme7', 'Enzyme11', 'Enzyme32', 'Enzyme18', 'Enzyme5', 'Enzyme13', 'Enzyme23', 'Enzyme12', 'Enzyme28', 'Enzyme27', 'Transporter2', 'Enzyme20', 'TALDO', 'Enzyme21', 'Enzyme22', 'Enzyme29', 'Transporter1', 'Enzyme15']
all essential enzymes are:  ['Enzyme1', 'Enzyme10', 'Enzyme16', 'Enzyme6', 'Enzyme8', 'Enzyme9', 'PseudoEnzymes']
```


**The essential enzymes are Enzyme1, Enzyme10, Enzyme16, Enzyme6, Enzyme8, Enzyme9 and PseudoEnzymes.**
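A result like this can be cross-checked with a different technique: a knockout test. An enzyme is essential if deleting every reaction it catalyses makes at least one previously reachable product unreachable from the start metabolite. When the search does not expand end products, this matches the path-based definition used above, but it needs only one breadth-first search per enzyme instead of enumerating all paths. The sketch below is a hypothetical illustration of that alternative (`reachable` and `essential_by_knockout` are made-up names, run on the same toy network as the procedure sketch), not the method used in this notebook.

```python
from collections import deque


def reachable(network, start, endpoints, knocked_out=None):
    """Breadth-first search from start, ignoring reactions catalysed by `knocked_out`.

    Endpoints are recorded but not expanded, mirroring a search that stops at products.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        metabolite = queue.popleft()
        if metabolite in endpoints:
            continue
        for enzyme, product in network.get(metabolite, []):
            if enzyme != knocked_out and product not in seen:
                seen.add(product)
                queue.append(product)
    return seen


def essential_by_knockout(network, start, endpoints):
    """Return, sorted, the enzymes whose knockout removes at least one reachable product."""
    all_enzymes = {e for edges in network.values() for e, _ in edges}
    produced = reachable(network, start, endpoints) & set(endpoints)
    essential = set()
    for enzyme in all_enzymes:
        still_produced = reachable(network, start, endpoints, knocked_out=enzyme) & set(endpoints)
        if still_produced != produced:
            essential.add(enzyme)
    return sorted(essential)


# Hypothetical toy network (same as in the procedure sketch).
network = {
    "A": [("E1", "B")],
    "B": [("E2", "C"), ("E3", "D")],
    "C": [("E4", "P1"), ("E5", "P2")],
    "D": [("E6", "P1")],
}
print(essential_by_knockout(network, "A", {"P1", "P2"}))  # ['E1', 'E2', 'E5']
```

Applied to the real `network` with start `'Glucose [c]'` and the notebook's `endpoints`, both approaches should agree on which enzymes are essential.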