# Environment setup
[Conda](https://www.anaconda.com/docs/getting-started/miniconda/main) is recommended for managing packages in a virtual environment. The required packages are listed in `_env/requirements.yml`.

# Build models
Run one of the commands below to build the QSAR models.

The results are saved in the `SVR-PK/outputs/prediction_level1_augmented` folder (for product-based splitting with data augmentation), with one subfolder per target ID.

In each subfolder, you can find the following files, where `test(train)` stands for the test or training set:
- `mod.pickle`                                : Pickle file containing the constructed model
- `prediction_results_prd_test(train).tsv`    : Predicted value for each sample from the SVR-baseline models (i.e., predictions from the product)
- `prediction_results_rct_test(train).tsv`    : Predicted value for each sample from the SVR-PK, -SK, and -concatECFP models (i.e., predictions from the reactant pair)
- `prediction_score_prd_test(train).tsv`      : Prediction accuracy of the SVR-baseline models
- `prediction_score_rct_test(train).tsv`      : Prediction accuracy of the SVR-PK, -SK, and -concatECFP models
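To inspect these outputs programmatically, a minimal standard-library sketch could look like the following. It assumes that `test(train)` expands to two files such as `prediction_score_rct_test.tsv` and `prediction_score_rct_train.tsv`, and that each TSV has a header row; the column names are not documented here, so rows are returned as plain dicts. `collect_scores`, `load_tsv`, and `load_model` are hypothetical helper names, not part of the repository.

```python
import csv
import pickle
from pathlib import Path


def load_tsv(path):
    """Read a tab-separated file into a list of dicts (assumes a header row)."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def collect_scores(output_dir, kind="rct", split="test"):
    """Map each target ID (subfolder name) to the rows of its score file.

    kind:  "prd" (SVR-baseline) or "rct" (SVR-PK, -SK, -concatECFP)
    split: "test" or "train" (assumed expansion of "test(train)")
    """
    scores = {}
    for target_dir in sorted(Path(output_dir).iterdir()):
        score_file = target_dir / f"prediction_score_{kind}_{split}.tsv"
        # Skip stray files and targets without this score file.
        if target_dir.is_dir() and score_file.exists():
            scores[target_dir.name] = load_tsv(score_file)
    return scores


def load_model(target_dir):
    """Unpickle mod.pickle. The project's own modules must be importable,
    and only trusted pickle files should ever be loaded."""
    with open(Path(target_dir) / "mod.pickle", "rb") as fh:
        return pickle.load(fh)
```

For example, `collect_scores("SVR-PK/outputs/prediction_level1_augmented", kind="prd")` would gather the SVR-baseline test scores for every target.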

In [None]:
# Product-based splitting
# ! python build_model.py -c config/chembl_config_lv1.json

# Product-based splitting with data augmentation
! python build_model.py -c config/chembl_config_lv1_augment.json

# Reactant-based splitting
# ! python build_model.py -c config/chembl_config_lv2.json

# Reactant-based splitting with data augmentation
# ! python build_model.py -c config/chembl_config_lv2_augment.json
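If you prefer to run all four configurations in one go instead of uncommenting lines, a minimal sketch could look like this. It assumes it is executed from the same directory as this notebook (where the cell above finds `build_model.py`) and only reuses the `-c` flag shown above; `build_all` and `CONFIGS` are hypothetical names.

```python
import subprocess
import sys

# Config files from the cell above, one per splitting strategy.
CONFIGS = [
    "config/chembl_config_lv1.json",          # product-based splitting
    "config/chembl_config_lv1_augment.json",  # product-based + augmentation
    "config/chembl_config_lv2.json",          # reactant-based splitting
    "config/chembl_config_lv2_augment.json",  # reactant-based + augmentation
]


def build_all(configs=CONFIGS, dry_run=False):
    """Run build_model.py once per config, stopping at the first failure.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [[sys.executable, "build_model.py", "-c", cfg] for cfg in configs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Using `sys.executable` rather than a bare `python` ensures the scripts run with the same interpreter (and hence the same Conda environment) as the notebook kernel.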

# Screen and combine reactants using the built models
Reactants are screened using the built SVR-PK (and SVR-baseline) models. Before screening, decide on a virtual reaction and specify it in your configuration file (see "0. Configuration" in README.md).

The screening results are stored in `SVR-PK/outputs/reactant_combination_level1_augmented_10000_rc1000` (when 1000 candidates are sampled for each reactant).

In each subfolder, you can find these files:
- `{chembl_id}_{reaction_id}_rct(1,2)_candidates_selected_whole.tsv`: Reactant candidates (randomly sampled)
- `{chembl_id}_{reaction_id}_rct(1,2)_candidates_selected_kernel_whole.tsv`: Kernel matrix of the reactant candidates (randomly sampled)
- `ok_combinations.tsv`: Indices of reactant pairs whose predicted values exceed the threshold determined by `ext_ratio`
- `{chembl_id}_{reaction_id}_rct_candidates_pairs_whole_sparse_split_highscored.tsv`: Top `n_samples * 100` reactant pairs ranked by SVR-PK predictions
- `{chembl_id}_{reaction_id}_rct_candidates_pairs_whole_sparse_split_retrieved.tsv`: Top `n_samples` reactant pairs ranked by SVR-baseline predictions, with invalid molecules removed (see the Synthesizability of virtual molecules section of the paper)
- `{chembl_id}_{reaction_id}_rct_candidates_pairs_whole_sparse_split_retrieved_route.tsv`: Samples whose reactant pairs match the retrosynthesis output
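The output files above suggest a ranking funnel: keep pairs above a threshold, take the top `n_samples * 100` by SVR-PK, then keep the top `n_samples` after re-scoring with SVR-baseline. The sketch below illustrates only that ranking logic on toy data, under stated assumptions; it is not the code in `reactant_screening.py`. How `ext_ratio` is turned into a threshold is not described here, so the threshold is passed in directly, and the validity and retrosynthesis filters are omitted. `screen_pairs` and its arguments are hypothetical.

```python
import heapq


def screen_pairs(pk_scores, baseline_score, threshold, n_samples):
    """Hypothetical three-stage funnel mirroring the output files.

    pk_scores:      {(rct1_idx, rct2_idx): SVR-PK prediction}
    baseline_score: callable (rct1_idx, rct2_idx) -> SVR-baseline prediction
                    for the corresponding product (assumed interface)
    """
    # Stage 1 (cf. ok_combinations.tsv): keep pairs above the threshold.
    ok = {pair: s for pair, s in pk_scores.items() if s > threshold}
    # Stage 2 (cf. *_highscored.tsv): top n_samples * 100 pairs by SVR-PK.
    high = heapq.nlargest(n_samples * 100, ok, key=ok.get)
    # Stage 3 (cf. *_retrieved.tsv): re-rank by SVR-baseline, keep top n_samples.
    rescored = {pair: baseline_score(*pair) for pair in high}
    return heapq.nlargest(n_samples, rescored, key=rescored.get)
```

Scoring with the cheaper pairwise model first and applying the product-based model only to the shortlist keeps the number of product evaluations bounded by `n_samples * 100`.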

In [None]:
# Sample 1k candidates for each reactant
! python reactant_screening.py -c config/chembl_config_for_screening_1k.json

# Sample 10k candidates for each reactant
# ! python reactant_screening.py -c config/chembl_config_for_screening_10k.json

# Sample 100k candidates for each reactant
# ! python reactant_screening.py -c config/chembl_config_for_screening_100k.json
# ! python reactant_screening.py -c config/chembl_config_for_screening_100k.json

## Thompson sampling for comparison
To screen reactants with Thompson sampling instead, run one of the commands below.

These results are stored in the same folder, `SVR-PK/outputs/reactant_combination_level1_augmented_10000_rc1000` (when 1000 candidates are sampled for each reactant).

In each subfolder, you can find these files:
- `ts_results.csv`: The `n_samples` results of Thompson sampling
- `ts_results_valid.tsv`: `ts_results.csv` with invalid molecules removed (see the Synthesizability of virtual molecules section of the paper)
- `ts_results_valid_route.tsv`: Samples whose reactant pairs match the retrosynthesis output
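For readers unfamiliar with the baseline, the sketch below shows generic Thompson sampling over a two-reactant library. It is a textbook Gaussian variant written for illustration, not the implementation in `reactant_screening_by_TS.py`: each reactant keeps an independent Gaussian belief about how good pairs containing it are, the best-looking reactant from each list is combined, and both beliefs are updated with the observed score. The function name, the prior and noise parameters, and the `score` callable are all hypothetical.

```python
import random


def thompson_sampling(n_a, n_b, score, n_iter, n_samples,
                      prior_var=1.0, noise_var=1.0, seed=0):
    """Generic (hypothetical) Thompson sampling over reactant pairs (a, b).

    score(a, b) returns the value of the product of reactants a and b.
    Returns up to n_samples evaluated pairs, best first, as (pair, score).
    """
    rng = random.Random(seed)
    # One [mean, variance] belief per reactant, starting from the prior.
    beliefs = {
        "a": [[0.0, prior_var] for _ in range(n_a)],
        "b": [[0.0, prior_var] for _ in range(n_b)],
    }
    seen = {}  # pair -> observed score (each pair is scored only once)
    for _ in range(n_iter):
        pick = {}
        for side, params in beliefs.items():
            # Draw one sample per belief and pick the reactant with the best draw.
            draws = [rng.gauss(mu, var ** 0.5) for mu, var in params]
            pick[side] = max(range(len(draws)), key=draws.__getitem__)
        pair = (pick["a"], pick["b"])
        if pair not in seen:
            seen[pair] = score(*pair)
        y = seen[pair]
        for side in ("a", "b"):
            # Conjugate Gaussian update of the mean with known noise variance.
            mu, var = beliefs[side][pick[side]]
            new_var = 1.0 / (1.0 / var + 1.0 / noise_var)
            beliefs[side][pick[side]] = [new_var * (mu / var + y / noise_var), new_var]
    ranked = sorted(seen, key=seen.get, reverse=True)
    return [(pair, seen[pair]) for pair in ranked[:n_samples]]
```

In a real screen, `score` would wrap a trained model's prediction for the product of the two reactants. Because the prior mean is 0, scores should be roughly centred (e.g., standardised) for the prior to be meaningful.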

In [None]:
# Sample 1k candidates for each reactant
! python reactant_screening_by_TS.py -c config/chembl_config_for_screening_1k.json

# Sample 10k candidates for each reactant
# ! python reactant_screening_by_TS.py -c config/chembl_config_for_screening_10k.json

# Sample 100k candidates for each reactant
# ! python reactant_screening_by_TS.py -c config/chembl_config_for_screening_100k.json
# ! python reactant_screening_by_TS.py -c config/chembl_config_for_screening_100k.json

# Analyze results
Please refer to `SVR-PK/analysis.ipynb`.