#### Script for graph pruning

Import the required libraries and modules.

In [216]:
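import importlib
import os

import pandas as pd

import create_graph as pruning

# Reload so that edits to create_graph.py are picked up without restarting the kernel
importlib.reload(pruning)

# Show the working directory; the relative paths below are resolved against it
print(os.getcwd())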

/Users/martinpycha/Desktop/Job_AV/metabolomic_optimization


The input file must be in the *.graphml* format for the script to work properly.
The path to this file is the input of the algorithm.
* As the first step, the .graphml file is parsed and its molecules and reactions are converted into objects.
* The function prepare_mols_reacs returns all the molecules, all the reactions, the initial reactants and the final products.
    * The initial reactants are the molecules that are not products of any reaction and therefore must have been added to the system.
    * The final products are the molecules that are not reactants of any reaction and therefore must leave the system.
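
The two definitions above can be illustrated with plain sets. This is only a minimal sketch: the `(reactants, products)` tuple layout and the helper name `find_boundary_molecules` are assumptions made for the example, not the data model used by `prepare_mols_reacs`.

```python
# Hypothetical toy data: each reaction is a (reactants, products) pair of sets.
reactions = [
    ({"A"}, {"B"}),
    ({"B"}, {"C"}),
    ({"B", "D"}, {"E"}),
]

def find_boundary_molecules(reactions):
    """Return (initial_reactants, final_products) for a list of reactions."""
    consumed = set()  # molecules that appear as a reactant somewhere
    produced = set()  # molecules that appear as a product somewhere
    for reactants, products in reactions:
        consumed |= reactants
        produced |= products
    # Never produced -> must have been added to the system.
    initial_reactants = consumed - produced
    # Never consumed -> must leave the system.
    final_products = produced - consumed
    return initial_reactants, final_products

initial, final = find_boundary_molecules(reactions)
print(sorted(initial))  # ['A', 'D']
print(sorted(final))    # ['C', 'E']
```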

In [217]:
# Path to the source .graphml file (the graph to be pruned),
# relative to the current working directory printed above
INPUT_PATH_GRAPHML = "./assets/input/PalPaoSteOle_regular.graphml"
molecules, reactions, first_reactants, final_products = pruning.prepare_mols_reacs(INPUT_PATH_GRAPHML)


Number of parsed reactions: 1234


#### Running the algorithm


The algorithm class can now be instantiated. It consists of three steps:

**1. Keeping the bare minimum** (basic_pruning)
* For every initial reactant, the outgoing reaction with the highest value is kept.
* For every final product, the incoming reaction with the highest value is kept.
* For every molecule that is neither an initial reactant nor a final product, both the incoming and the outgoing reaction with the highest value are kept.
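
The rule above can be sketched in a few lines. The `Reaction` tuple and the function `basic_pruning_sketch` are hypothetical names introduced for illustration; the actual `Pruning` class may store and select reactions differently.

```python
from collections import namedtuple

# Hypothetical reaction record, assumed only for this sketch.
Reaction = namedtuple("Reaction", ["name", "reactants", "products", "value"])

def basic_pruning_sketch(reactions):
    """Keep, per molecule, its best outgoing and best incoming reaction."""
    best_out = {}  # molecule -> highest-valued reaction that consumes it
    best_in = {}   # molecule -> highest-valued reaction that produces it
    for r in reactions:
        for m in r.reactants:
            if m not in best_out or r.value > best_out[m].value:
                best_out[m] = r
        for m in r.products:
            if m not in best_in or r.value > best_in[m].value:
                best_in[m] = r
    # Initial reactants only appear in best_out, final products only in
    # best_in, and intermediate molecules appear in both.
    return {r.name for r in best_out.values()} | {r.name for r in best_in.values()}

toy = [
    Reaction("r1", {"A"}, {"B"}, 5.0),
    Reaction("r2", {"A"}, {"B"}, 1.0),
    Reaction("r3", {"B"}, {"C"}, 2.0),
    Reaction("r4", {"B"}, {"C"}, 7.0),
]
print(sorted(basic_pruning_sketch(toy)))  # ['r1', 'r4']
```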

**2. Ensuring connectivity** (connecting)
* This step ensures that the resulting graph is connected, since the previous step can leave several 'subgraphs' (metabolic 'subnetworks') that are not connected by any reaction.
* During this step, the highest-valued reaction joining two 'subnetworks' is added iteratively until the whole map/graph is connected.
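
One way to picture this step is a Kruskal-style pass with a union-find structure over molecules. The sketch below is an assumption about how such a step could work, not the implementation in `create_graph`; `Reaction` and `connect_sketch` are hypothetical names.

```python
from collections import namedtuple

# Hypothetical reaction record, assumed only for this sketch.
Reaction = namedtuple("Reaction", ["name", "reactants", "products", "value"])

def connect_sketch(all_reactions, kept_names):
    """Add highest-valued reactions that join separate components."""
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    def union(molecules):
        molecules = list(molecules)
        root = find(molecules[0])
        for m in molecules[1:]:
            parent[find(m)] = root

    kept = set(kept_names)
    # Molecules that share an already kept reaction belong to one component.
    for r in all_reactions:
        if r.name in kept:
            union(r.reactants | r.products)

    # Try the remaining reactions from the highest value downwards.
    candidates = sorted(
        (r for r in all_reactions if r.name not in kept),
        key=lambda r: r.value,
        reverse=True,
    )
    for r in candidates:
        roots = {find(m) for m in r.reactants | r.products}
        if len(roots) > 1:  # the reaction bridges two or more components
            kept.add(r.name)
            union(r.reactants | r.products)
    return kept

toy = [
    Reaction("r1", {"A"}, {"B"}, 5.0),
    Reaction("r2", {"C"}, {"D"}, 4.0),
    Reaction("r3", {"B"}, {"C"}, 1.0),
    Reaction("r4", {"B"}, {"D"}, 3.0),
]
print(sorted(connect_sketch(toy, {"r1", "r2"})))  # ['r1', 'r2', 'r4']
```

In the toy data, `r4` is preferred over `r3` because it has the higher value, and `r3` is then skipped because the two subnetworks are already joined.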

**3. Adding all the reactions beyond threshold.** (adding_beyond_threshold)
* The reactions kept so far preserve the basic logic of the metabolic map: every initial reactant leads to a final product, and the map is one connected graph. However, some reactions with very high values may have been left out in the previous steps because of the logic of the algorithm.
* In this step, all reactions beyond the threshold (given as a fraction, e.g. 0.60 for 60 %) are added to the map; reactions below it stay removed unless an earlier step kept them.
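
The exact meaning of the threshold is not spelled out here, so the following sketch rests on one possible reading: the threshold is a rank fraction, and a threshold of `t` adds the top `(1 - t)` share of reactions by value. Both this interpretation and the function `add_beyond_threshold_sketch` are assumptions for illustration only.

```python
import math

def add_beyond_threshold_sketch(values, kept_names, threshold):
    """Add the top (1 - threshold) share of reactions, ranked by value.

    `values` maps a reaction name to its value (hypothetical layout).
    """
    ranked = sorted(values, key=values.get, reverse=True)
    # Number of reactions lying beyond the threshold under this reading.
    n_top = len(ranked) - math.floor(threshold * len(ranked))
    return set(kept_names) | set(ranked[:n_top])

values = {"r1": 5.0, "r2": 1.0, "r3": 9.0, "r4": 3.0}
print(sorted(add_beyond_threshold_sketch(values, {"r2"}, threshold=0.5)))
# ['r1', 'r2', 'r3']
```

Reactions kept by earlier steps (here `r2`) are never dropped in this sketch; the step only adds to the existing selection.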

In [218]:

pruning_algorithm = pruning.Pruning(
                        reactions=reactions, 
                        first_reactants=first_reactants, 
                        final_products=final_products, 
                        molecules=molecules,
                        basic_pruning=True,
                        connecting=True,
                        adding_beyond_treshold=True,
                        threshold=0.90
                    )

# running the algorithm
pruning_algorithm.run()


Basic pruning has been conducted.
Number of reactions kept: 271, which is 21.96% of all reactions.
Connectivity has been ensured.
Number of reactions kept: 299, which is 24.23% of all reactions.
All reactions beyond threshold 0.9 have been added.
Number of reactions kept: 342, which is 27.71% of all reactions.
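
The percentages in this log can be checked against the 1234 parsed reactions with a short calculation. The helper `share_percent` is a hypothetical name used only here.

```python
TOTAL_REACTIONS = 1234  # "Number of parsed reactions" reported above

def share_percent(kept, total=TOTAL_REACTIONS):
    """Share of kept reactions, in percent, rounded to two decimals."""
    return round(100 * kept / total, 2)

for step, kept in [("basic pruning", 271), ("connecting", 299), ("beyond threshold", 342)]:
    print(f"{step}: {kept} reactions, {share_percent(kept)}%")
```

Under these counts, the connecting step contributes 28 reactions and the threshold step a further 43.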


#### Saving the outputs


The *save_result_graphml* function takes the original reaction .graphml file, applies the pruning result to it and saves the resulting .graphml file to the desired location. Therefore, both the path of the original file and the desired output path must be provided. Part of the output file name is determined by the *name* argument.

The *save_result_txt* function does the same for the original reaction .txt file and saves the resulting .txt file to the desired location. It takes the same arguments as described above.
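
To give a rough idea of what writing a pruned .graphml file can involve, the sketch below filters a GraphML document with the standard library. It assumes that each reaction is an `<edge>` element with an `id` attribute, which may not match the layout of the real input files; `filter_graphml_edges` is a hypothetical helper, not part of `create_graph`.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def filter_graphml_edges(xml_text, kept_edge_ids):
    """Return the GraphML text with only the edges listed in kept_edge_ids."""
    ET.register_namespace("", GRAPHML_NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    for graph in root.iter(f"{{{GRAPHML_NS}}}graph"):
        # Copy the list first, because edges are removed while iterating.
        for edge in list(graph.findall(f"{{{GRAPHML_NS}}}edge")):
            if edge.get("id") not in kept_edge_ids:
                graph.remove(edge)
    return ET.tostring(root, encoding="unicode")

example = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="A"/><node id="B"/><node id="C"/>
    <edge id="r1" source="A" target="B"/>
    <edge id="r2" source="B" target="C"/>
  </graph>
</graphml>"""

pruned = filter_graphml_edges(example, {"r1"})
print(pruned)
```

Nodes are left untouched in this sketch; only the edges that were not selected are dropped.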

In [219]:
# path to the original reactions - reactions to be selected
INPUT_PATH_TXT = "./assets/input/PalPaoSteOle_regular.txt"
# path to the output directory, where both the resulting .graphml file and the .txt file are stored
OUTPUT_PATH = "./assets/output"

# saving the result into .graphml
pruning.save_result_graphml(INPUT_PATH_GRAPHML, OUTPUT_PATH, pruning_algorithm, name="PalPaoSteOle_regular")
# saving the result into .txt 
pruning.save_result_txt(INPUT_PATH_TXT, OUTPUT_PATH, pruning_algorithm, name="PalPaoSteOle_regular")

283
Succesfully written the result into the file!
