# Tutorial

## Set Up & Data Extraction

We start by importing the necessary classes from the PyMetaTree project. We then initialize the `EawagDataHandler` class with the path to the desired data storage folder.


In [None]:
from pymetatree.data_handling.data_handler import EawagDataHandler

DATA_STORAGE_PATH = 'your_data_storage_path'

data_handler = EawagDataHandler(DATA_STORAGE_PATH)

The data handler plays several roles; in particular, it can extract data from different sources. Here we show how to extract ten reactions from EAWAG's soil database.

In [None]:
PACKAGE_NAME = "eawag_bbd"
data_handler.download_data(PACKAGE_NAME)
eawag_data = data_handler.get_data()
eawag_data[0].model_dump()

As can be seen above, a chemical reaction object contains several fields:
* `dataset`: The dataset of origin, e.g., EAWAG_SOIL, ELSEVIER, etc.
* `description`: A description of the reaction, if provided.
* `enzyme_classes`: A list of enzyme classes, if provided.
* `mapped_smiles`: The reaction SMILES, atom-to-atom mapped. This field is filled in after running external mapping software.
* `multistep_flag`: `True` if the reaction contains several steps that could not be determined.
* `name`: The name of the reaction.
* `namerxn_reaction_class` and `namerxn_reaction_numbers`: Fields that classify the reaction, filled in by external software.
* `pathways`: 
* `reactants`: List of `Molecule` objects, which represent the individual reactant molecules and contain as fields `name`, `smiles` (unmapped), and `uid`, which is an automatically-computed unique identifier for this molecule.
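* `products`: List of `Molecule` objects representing the individual product molecules, with the same fields as `reactants`.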
* `scenarios`: List of experimental conditions, if provided.
* `template`: The associated, extracted template (more explanation on this field later in the tutorial).
* `uid`: Unique identifier of the chemical reaction.
* `unmapped_smiles`: The string provided by the dataset in a raw (unmapped) format.
* `unmapped_smiles_cannonicalized`: An automatically computed field representing the same reaction in a canonical format.
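
To make the shape of such a record more concrete, here is a minimal sketch that uses standard-library dataclasses as a stand-in. The names `MoleculeSketch`, `ReactionSketch`, and `make_uid`, the reduced set of fields, and the hash-based identifier are all illustrative assumptions; they are not PyMetaTree's actual classes, and the library may compute `uid` differently.

```python
from __future__ import annotations

import hashlib
from dataclasses import asdict, dataclass, field


def make_uid(text: str) -> str:
    """Hypothetical uid: first 12 hex characters of a SHA-256 digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


@dataclass
class MoleculeSketch:
    """Stand-in for a molecule record with name, smiles, and a derived uid."""

    name: str
    smiles: str
    uid: str = field(init=False)

    def __post_init__(self) -> None:
        # The uid is derived from the SMILES string (an assumption).
        self.uid = make_uid(self.smiles)


@dataclass
class ReactionSketch:
    """Stand-in for a reaction record holding reactants and products."""

    name: str
    reactants: list[MoleculeSketch]
    products: list[MoleculeSketch]
    unmapped_smiles: str = field(init=False)
    uid: str = field(init=False)

    def __post_init__(self) -> None:
        # Build "reactants>>products", joining molecules on each side with ".".
        lhs = ".".join(m.smiles for m in self.reactants)
        rhs = ".".join(m.smiles for m in self.products)
        self.unmapped_smiles = f"{lhs}>>{rhs}"
        self.uid = make_uid(self.unmapped_smiles)


reaction = ReactionSketch(
    name="ethanol oxidation",
    reactants=[MoleculeSketch("ethanol", "CCO")],
    products=[MoleculeSketch("acetaldehyde", "CC=O")],
)

# asdict() plays the role that model_dump() plays in the tutorial cells:
# it turns the nested objects into plain dictionaries and lists.
record = asdict(reaction)
```

The resulting `record` is a nested dictionary whose `reactants` and `products` entries are lists of molecule dictionaries, which is the general layout the field list above describes.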

We can then save the data by choosing a file name and running the following command:

In [None]:
file_name = "eawag_data_bbd.json"
data_handler.save_data(file_name)

To load the data back, run:

In [None]:
from pymetatree.data_handling.data_handler import EawagDataHandler

data_handler = EawagDataHandler(DATA_STORAGE_PATH)

data_handler.load_data(["eawag_data_soil.json"])
data_handler.eawag_data[0].model_dump()

Note that the argument of the data loader is a list: if more than one file is provided, the loader automatically merges the datasets.
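
The idea of merging several files into one dataset can be sketched with the standard library alone. The helper `load_and_merge` and the file names below are hypothetical, and the sketch assumes that each file holds a JSON list of records and that merging means plain concatenation; the tutorial does not say whether PyMetaTree also removes duplicates.

```python
import json
import tempfile
from pathlib import Path


def load_and_merge(folder: Path, file_names: list[str]) -> list[dict]:
    """Read each JSON file (a list of records) and concatenate them in order."""
    merged: list[dict] = []
    for file_name in file_names:
        with open(folder / file_name, encoding="utf-8") as handle:
            merged.extend(json.load(handle))
    return merged


# Write two small files into a temporary folder, then load them as one list.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "part_a.json").write_text(
        json.dumps([{"uid": "r1"}]), encoding="utf-8"
    )
    (folder / "part_b.json").write_text(
        json.dumps([{"uid": "r2"}, {"uid": "r3"}]), encoding="utf-8"
    )
    merged_records = load_and_merge(folder, ["part_a.json", "part_b.json"])

merged_uids = [record["uid"] for record in merged_records]
```

Taking a list even for a single file keeps the call signature the same whether one or many files are loaded.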

## Dataset Mapping

As we can see, the data is stored as a list of `ChemicalReaction` objects. To map the SMILES in the dataset, we first generate a list of reactions that can be processed by an external mapping tool and then incorporated back into our system. To do so, we can run the following:

In [None]:
list_to_map = data_handler.get_list_to_map()
list_to_map[0]

The list to map can then be saved to a file:

In [None]:
data_handler.save_list_to_map("soil_to_map.json")

Once the mapping is done, we can incorporate it back into our original dataset as follows:

In [None]:
data_handler.append_mapped_list("mapped_list.json")
mapped_data = data_handler.get_data()
mapped_data[0].model_dump()
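
Conceptually, this step joins the output of the external mapper back onto the original records. The sketch below illustrates that idea with plain dictionaries; the record layout, the `attach_mapped` and `atom_map_numbers` helpers, and the choice of `uid` as the join key are assumptions made for illustration, not a description of what `append_mapped_list` does internally.

```python
import re

# Original records, not yet mapped (hypothetical layout).
reactions = [
    {"uid": "r1", "unmapped_smiles": "CCO>>CC=O", "mapped_smiles": None},
    {"uid": "r2", "unmapped_smiles": "CC(=O)C>>CC(O)C", "mapped_smiles": None},
]

# Output of an external mapper for a subset of the records (hypothetical).
mapper_output = [
    {"uid": "r1", "mapped_smiles": "[CH3:1][CH2:2][OH:3]>>[CH3:1][CH:2]=[O:3]"},
]


def attach_mapped(records: list[dict], mapped_rows: list[dict]) -> list[dict]:
    """Copy mapped SMILES onto the records that share the same uid."""
    by_uid = {row["uid"]: row["mapped_smiles"] for row in mapped_rows}
    for record in records:
        if record["uid"] in by_uid:
            record["mapped_smiles"] = by_uid[record["uid"]]
    return records


def atom_map_numbers(smiles_side: str) -> set[int]:
    """Collect the atom-map numbers (the ':n' inside brackets) on one side."""
    return {int(n) for n in re.findall(r":(\d+)\]", smiles_side)}


attach_mapped(reactions, mapper_output)

# In a mapped reaction, each atom keeps its number from reactants to products.
reactant_side, product_side = reactions[0]["mapped_smiles"].split(">>")
shared_numbers = atom_map_numbers(reactant_side) & atom_map_numbers(product_side)
```

Records without a matching entry in the mapper output simply keep `mapped_smiles` unset, so a partially mapped list can still be merged.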

## Template Extraction

In [None]:
from pymetatree.data_handling.data_handler import EawagDataHandler


data_handler_2 = EawagDataHandler(DATA_STORAGE_PATH)
data_handler_2.load_data(["eawag_data_for_mapping.json"])
data_handler_2.eawag_data[0].model_dump()

In [None]:
data_handler_2.append_mapped_list("eawag_mapped.json", "json")

To extract the template associated with each reaction, run the following:

In [None]:
data_handler_2.extract_templates()
data_with_templates = data_handler_2.get_data()
data_with_templates[0].model_dump()

In [None]:
from pymetatree.chemoinformatics.functions import rdrxn_from_string

test_template = data_with_templates[5].template.template_fwd_smarts
rdrxn_from_string(test_template, 'smarts')

## Blueprints

In [None]:
from pymetatree.blueprint.blueprint_handler import BlueprintHandler

reaction_for_bp = data_with_templates[0]
bp = BlueprintHandler(chemical_reaction=reaction_for_bp)

In [None]:
bp.blueprint.model_dump()

In [None]:
from rdkit import Chem

Chem.MolFromSmiles(reaction_for_bp.reactants[0].smiles)

In [None]:
bp.activate_template(0, 'forward')
bp._rdrxn

In [None]:
bp.run_reaction(0, 'backward', [reaction_for_bp.products[0].smiles])

## Substructure Search

In [None]:
from pymetatree.data_handling.data_handler import EawagDataHandler

data_handler_bp = EawagDataHandler(DATA_STORAGE_PATH)
data_handler_bp.load_data(['data_with_templates.json'])
data_handler_bp.eawag_data[0].model_dump()

In [None]:
data_handler_bp.generate_blueprints()
data_handler_bp.blueprints[0]

In [None]:
from pymetatree.blueprint.substructure_search import BlueprintSubstructureSearch

sub_search = BlueprintSubstructureSearch(data_handler_bp.blueprints)

In [None]:
# Search for substructures in common with acetone
sub_search.search('CC(=O)C')