## Script for graph pruning

Import the required libraries and modules.

In [415]:
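import importlib
import os

import pandas as pd

import create_graph as pruning

# reload so that edits to create_graph.py are picked up without restarting the kernel
importlib.reload(pruning)

# show the working directory; relative paths below are resolved against it
print(os.getcwd())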


/Users/martinpycha/Desktop/Job_AV/metabolomic_optimization


The input file must be in the *.graphml* format for the script to work properly.
The path to the file is the input of the algorithm.
* As the first step, the .graphml file is parsed, and the molecules and reactions are converted into objects.
* The function prepare_mols_reacs returns all the molecules and reactions, together with the initial reactants and the final products.
    * The initial reactants are the molecules that are not products of any reaction and therefore must have been added to the system.
    * The final products are the molecules that are not reactants of any reaction and therefore must leave the system.
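
The two definitions above can be sketched in a few lines. This is only an illustration, not the code in `create_graph`: the `(source, target)` tuple representation and the function name `find_boundary_molecules` are assumptions made for this example.

```python
def find_boundary_molecules(reactions):
    """Return (initial_reactants, final_products) for (source, target) pairs.

    Hypothetical helper: the real prepare_mols_reacs works on parsed objects.
    """
    sources = {src for src, _ in reactions}
    targets = {tgt for _, tgt in reactions}
    # never produced by any reaction -> must have been added to the system
    initial_reactants = sources - targets
    # never consumed by any reaction -> must leave the system
    final_products = targets - sources
    return initial_reactants, final_products


# hypothetical toy network: A -> B -> C and A -> C
toy_reactions = [("A", "B"), ("B", "C"), ("A", "C")]
initial, final = find_boundary_molecules(toy_reactions)
print(sorted(initial), sorted(final))  # prints ['A'] ['C']
```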

In [416]:
# "./threepath.graphml" is an address relative to the current directory (directory of this script)
# path to the source graphml file - the graph to be pruned

#INPUT_PATH_GRAPHML = "./assets/input/PalPaoSteOle_regular.graphml"
#INPUT_PATH_GRAPHML = "./assets/input/PPSO_v2.graphml"
INPUT_PATH_GRAPHML = "PPSO_v2_threepath.graphml"
# THIS CHANGES WITH EVERY SUBSEQUENT PRUNING
#INPUT_PATH_GRAPHML = "./assets/input/threepath_PPSO_T40.graphml"
molecules, reactions, first_reactants, final_products, _ = pruning.prepare_mols_reacs(INPUT_PATH_GRAPHML)


Number of parsed reactions: 1218


#### Running the algorithm


Now the algorithm class can be instantiated.

**1. Keeping the bare minimum** (basic_pruning)
* For every initial reactant, the 'outgoing' reaction with the highest value is added.
* For every final product, the 'incoming' reaction with the highest value is added.
* For every molecule that is neither an initial reactant nor a final product, both the 'incoming' and the 'outgoing' reaction with the highest value are added.
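
The three rules above amount to keeping, for every molecule, its highest-valued incoming and outgoing reaction wherever one exists. A minimal sketch, assuming reactions are plain `(source, target, value)` tuples; the function name `keep_best_per_molecule` is hypothetical, and the `Pruning` class may implement this step differently.

```python
def keep_best_per_molecule(reactions):
    """Keep the highest-valued incoming and outgoing reaction of each molecule."""
    best_out = {}  # molecule -> best reaction leaving it
    best_in = {}   # molecule -> best reaction entering it
    for reac in reactions:
        src, tgt, val = reac
        if src not in best_out or val > best_out[src][2]:
            best_out[src] = reac
        if tgt not in best_in or val > best_in[tgt][2]:
            best_in[tgt] = reac
    # initial reactants have no incoming reaction and final products have no
    # outgoing one, so they contribute a single reaction each
    return set(best_out.values()) | set(best_in.values())


# hypothetical toy network with reaction values
toy_reactions = [
    ("A", "B", 5.0),
    ("A", "C", 1.0),
    ("B", "C", 3.0),
    ("B", "D", 0.5),
    ("C", "D", 2.0),
]
kept = keep_best_per_molecule(toy_reactions)
print(sorted(kept))
```

In the toy network the weak reactions `A -> C` and `B -> D` are dropped, while every molecule stays attached to at least one reaction.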

**2. Ensuring connectivity** (connecting)
* This step ensures that the resulting graph is connected, since the previous step can leave several 'subgraphs' (metabolic 'subnetworks') that are not connected by any reaction.
* During this step, the reaction with the highest value between two 'subnetworks' is added iteratively until the map/graph is connected.
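
One way to picture the connecting step is a union-find structure over the molecules: the remaining reactions are visited from the highest value downwards, and a reaction is added only if it joins two subnetworks that are still separate. The sketch below is an assumption about how such a step could look, not the code in `create_graph`; it ignores reaction direction when testing connectivity, and the name `connect_subnetworks` and the tuple representation are hypothetical.

```python
def connect_subnetworks(kept, candidates):
    """Add highest-valued candidate reactions until no two subnetworks can be joined."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # molecules linked by an already kept reaction share a subnetwork
    for src, tgt, _ in kept:
        union(src, tgt)
    known = set(parent)

    result = set(kept)
    for reac in sorted(candidates, key=lambda r: r[2], reverse=True):
        src, tgt, _ = reac
        if src not in known or tgt not in known:
            continue  # only join subnetworks that already exist
        if reac not in result and find(src) != find(tgt):
            result.add(reac)
            union(src, tgt)
    return result


# hypothetical example: two subnetworks {A, B} and {C, D}
kept_toy = {("A", "B", 5.0), ("C", "D", 2.0)}
candidates_toy = [("B", "C", 3.0), ("A", "D", 1.0), ("A", "C", 4.0)]
connected = connect_subnetworks(kept_toy, candidates_toy)
print(sorted(connected))
```

Here only `A -> C` is added: it has the highest value among the candidates, and once it is in place the other two candidates no longer join separate subnetworks.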

**3. Adding all the reactions beyond the threshold.** (adding_beyond_threshold)
* The reactions selected so far preserve the basic logic of the metabolomic path: from every initial reactant you can reach a final product, and the metabolomic map is one connected graph. However, some reactions with very high values may not have been included in the previous stages because of the logic of the algorithm.
* In this step, all reactions beyond a maximum of x % of reactions are removed (for example, 60 %).

In [417]:

pruning_algorithm = pruning.Pruning(
    reactions=reactions,
    first_reactants=first_reactants,
    final_products=final_products,
    molecules=molecules,
    remove_irrelevant_pre=True,
    basic_pruning=True,
    connecting=True,
    adding_beyond_treshold=True,
    threshold=0.4,
    proportion=False,
)

# running the algorithm
pruning_algorithm.run()


Number of sorted reactions START: 0
Number of pruned reactions START: 0
Number of sorted reactions AFTER SORTING: 1218
Number of pruned reactions AFTER SORTING: 0
----------------------------------------------------
WE ARE NOT REMOVING -- MOLECULE IS TOO IMPORTANT!!!!
Molecule in question: TG123SteOlePal
----------------------------------------------------
----------------------------------------------------
WE ARE NOT REMOVING -- MOLECULE IS TOO IMPORTANT!!!!
Molecule in question: TG123PaoSteSte
----------------------------------------------------
----------------------------------------------------
WE ARE NOT REMOVING -- MOLECULE IS TOO IMPORTANT!!!!
Molecule in question: TG123StePaoPao
----------------------------------------------------
REMOVING: TG123StePalPao.pre -> TG123StePalPao val: 100.000342925506
REMOVING: TG123StePalPao -> Sink val: 100.000000000001
the reac.target is source 
REMOVING related eq: TG123StePalPao -> DG23PalPao + Ste val: 0.000335351121236726
the reac.target 

#### Saving the outputs


*save_result_graphml* takes the original reaction .graphml file, modifies it according to the pruning result, and saves the resulting .graphml file in the desired location. Therefore, both the path of the original file and the output path of the new file must be provided. Part of the output file name can be set with the *name* argument.

*save_result_txt* does the same for the original reaction .txt file and saves the resulting .txt file in the desired location. Its arguments are as described above.
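
To make the idea concrete, the sketch below filters a GraphML document down to a set of kept edges using only the standard library. It is an illustration under stated assumptions, not the implementation of `save_result_graphml`: the function `filter_graphml` and the `kept_edges` set of `(source, target)` id pairs are hypothetical, and `remove_nodes` merely borrows the name of the real argument, whose exact behaviour is assumed here to be "drop nodes left without any edge".

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
# serialise with GraphML as the default namespace instead of an ns0: prefix
ET.register_namespace("", GRAPHML_NS)


def filter_graphml(xml_text, kept_edges, remove_nodes=True):
    """Return GraphML text that contains only the edges listed in kept_edges."""
    root = ET.fromstring(xml_text)
    graph = root.find(f"{{{GRAPHML_NS}}}graph")
    used_nodes = set()
    for edge in list(graph.findall(f"{{{GRAPHML_NS}}}edge")):
        pair = (edge.get("source"), edge.get("target"))
        if pair in kept_edges:
            used_nodes.update(pair)
        else:
            graph.remove(edge)
    if remove_nodes:
        # drop nodes that no longer take part in any edge
        for node in list(graph.findall(f"{{{GRAPHML_NS}}}node")):
            if node.get("id") not in used_nodes:
                graph.remove(node)
    return ET.tostring(root, encoding="unicode")


# hypothetical three-node graph: A -> B -> C
toy_graphml = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph id="G" edgedefault="directed">'
    '<node id="A"/><node id="B"/><node id="C"/>'
    '<edge source="A" target="B"/><edge source="B" target="C"/>'
    "</graph></graphml>"
)
pruned = filter_graphml(toy_graphml, kept_edges={("A", "B")})
print(pruned)
```

Working on a copy of the parsed tree and returning new text leaves the original file untouched, which matches the description above of writing the result to a separate output path.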

In [None]:
# path to the original reactions - reactions to be selected
#INPUT_PATH_TXT = "./assets/input/PalPaoSteOle_regular_new.txt"

INPUT_PATH_TXT = "PalPaoSteOle_regular_new.txt"
# path to the output, where both the resulting .graphml file and the .txt file are stored
#OUTPUT_PATH = "./assets/output"

OUTPUT_PATH_TXT = "PPSO_v2"
OUTPUT_PATH_GRAPHML = "PPSO_v2"

# saving the result into .graphml
pruning.save_result_graphml(INPUT_PATH_GRAPHML, OUTPUT_PATH_GRAPHML, pruning_algorithm, remove_nodes=True, name="PPSO_reducedPre_(conditions)")
# saving the result into .txt 
pruning.save_result_txt(INPUT_PATH_TXT, OUTPUT_PATH_TXT, pruning_algorithm, name="PPSO_reducedPre_(conditions)")

Removed 419 edges and 6 nodes.
Number of reactions parsed from .txt file: 770
Number of reactions in reactions dict: 770
Length of pruned reactions: 799
Length of equation to write: 542
Total lines: 542
Succesfully written the result into the file!
