# Using our CLI

In addition to our [API](PythonAPI.ipynb), we provide an easy-to-use command-line interface (CLI) that can be used to train your own site-of-metabolism (SOM) prediction models and retrieve predictions from them.

In this interactive Jupyter notebook, we will walk you through using this tool by creating a new SOM prediction model based on a synthetic dataset. The resulting model is not expected to be useful for real metabolism prediction, but serves as an example of what can be done with our tool.

```{tip}
To get additional information you can also invoke any subcommand with the `--help` flag. This will show a summary of all supported arguments.
```

## Building the model

### Hyperparameter search

The `hyperparameters` command lets you run a cross-validated hyperparameter search on your data, using the same setup as in the FAME3R paper. The resulting hyperparameters are exported as JSON.
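
For readers unfamiliar with the general idea, the following is a minimal sketch of a cross-validated grid search in plain Python. It is not the FAME3R implementation: the parameter grid, the `toy_evaluate` scoring function, and all function names are hypothetical and only illustrate the structure (enumerate parameter combinations, score each one on every fold, keep the best mean score, export the result as JSON).

```python
import itertools
import json
import statistics


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, near-equal folds."""
    indices = list(range(n_samples))
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def grid_search(param_grid, n_samples, n_folds, evaluate):
    """Return (best_params, best_mean_score) over all grid combinations."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        # One score per fold, averaged into a single figure per combination.
        scores = [
            evaluate(params, train, test)
            for train, test in k_fold_indices(n_samples, n_folds)
        ]
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical grid, for illustration only (not the grid used by fame3r).
toy_grid = {"n_estimators": [100, 250, 500], "max_depth": [10, 20, 30]}


def toy_evaluate(params, train, test):
    # Stand-in for "fit a model on train, score it on test".
    # This toy score simply peaks at n_estimators=250, max_depth=20.
    return (
        -abs(params["n_estimators"] - 250) / 250
        - abs(params["max_depth"] - 20) / 20
    )


best_params, best_score = grid_search(
    toy_grid, n_samples=10, n_folds=5, evaluate=toy_evaluate
)
best_params_json = json.dumps(best_params, sort_keys=True)
print(best_params_json)
```

In a real search, `evaluate` would train a model on the training indices and return a validation score for the held-out fold.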

```{note}
Hyperparameter optimization is disabled by default in this notebook, as it takes a long time.
If you want to perform hyperparameter optimization and use the generated hyperparameters, you can uncomment the next code cell and add `--hyperparameters hyperparameters.json` as an option to the `train` command in the next section.
```

In [None]:
%%bash
#fame3r hyperparameters -i data/metatrans_autoannotated_cleaned/train.sdf -o hyperparameters.json

### Training

The `train` command trains a random forest model for predicting SOMs, as well as an auxiliary model for predicting the FAME score. Without additional parameters, the model is trained with exactly the same parameters as the models reported in the FAME3R paper.

In [None]:
%%bash
fame3r train -i data/metatrans_autoannotated_cleaned/train.sdf -o models

### Threshold post-tuning

The `threshold` command can be used for threshold post-tuning, i.e., finding the classification threshold that results in the most balanced predictions.
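
As a rough illustration of the idea, the sketch below scans a set of candidate thresholds and keeps the one with the highest balanced accuracy (the mean of sensitivity and specificity). This is an assumption made for the example: the criterion that `fame3r threshold` actually optimizes is not described here, and the data and function names are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def tune_threshold(y_true, probabilities, candidates):
    """Return (threshold, score) for the candidate with the best balanced accuracy."""
    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        # An atom is predicted as a SOM if its probability reaches the threshold.
        y_pred = [1 if p >= threshold else 0 for p in probabilities]
        score = balanced_accuracy(y_true, y_pred)
        if score > best_score:  # ties keep the lowest threshold
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy, imbalanced data: six non-SOM atoms and two SOM atoms.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
probabilities = [0.04, 0.12, 0.17, 0.22, 0.31, 0.46, 0.37, 0.62]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best_threshold, best_score = tune_threshold(y_true, probabilities, candidates)
print(best_threshold, round(best_score, 3))
```

On this toy data, a threshold of 0.5 misses one of the two SOM atoms, while a lower threshold recovers both at the cost of one false positive. This trade-off is typical for imbalanced labels.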

```{note}
Threshold post-tuning is disabled by default in this notebook, as it takes a long time.
If you want to perform threshold post-tuning and use the generated threshold, you can uncomment the next code cell and pass the resulting threshold to the `predict` command in the next section via the `--threshold` option.
```

In [None]:
%%bash
#fame3r threshold -i data/metatrans_autoannotated_cleaned/train.sdf -m models/random_forest_classifier.joblib

## Applying the model

### Generating predictions

Now that we have some trained models, the `predict` command can be used to generate predictions, including predicted probabilities and binary predictions based on either the default or a provided threshold. The `--fame-score` flag ensures that FAME scores are also generated for each input atom.

In [None]:
%%bash
fame3r predict -i data/metatrans_autoannotated_cleaned/test.sdf -m models -o predictions.csv --fame-score

### Calculating metrics

Given a prediction CSV file generated as shown above, the `metrics` command calculates various classification metrics, including the Top-_k_ metric (_k_=2), which is commonly used in metabolism prediction. Metrics are exported as JSON.
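
To make the Top-_k_ metric concrete, here is a minimal sketch under its common definition: a molecule counts as a hit if at least one of its annotated SOMs is among its _k_ highest-scoring atoms, and the metric is the fraction of molecules that are hits. The record layout and names below are hypothetical and do not reflect the column layout of `predictions.csv`. The way `fame3r metrics` handles ties or molecules without annotated SOMs may differ.

```python
from collections import defaultdict


def top_k_success_rate(records, k=2):
    """Fraction of molecules with at least one true SOM among their top-k atoms.

    records: iterable of (molecule_id, probability, is_som) tuples, one per atom.
    """
    atoms_by_molecule = defaultdict(list)
    for molecule_id, probability, is_som in records:
        atoms_by_molecule[molecule_id].append((probability, is_som))

    hits = 0
    for atoms in atoms_by_molecule.values():
        # Rank atoms by predicted probability, highest first, and keep the top k.
        top_atoms = sorted(atoms, key=lambda atom: atom[0], reverse=True)[:k]
        if any(is_som for _, is_som in top_atoms):
            hits += 1
    return hits / len(atoms_by_molecule)


# Toy predictions for three molecules with three atoms each.
toy_records = [
    ("mol_a", 0.90, 1), ("mol_a", 0.40, 0), ("mol_a", 0.10, 0),  # SOM ranked 1st
    ("mol_b", 0.70, 0), ("mol_b", 0.60, 1), ("mol_b", 0.20, 0),  # SOM ranked 2nd
    ("mol_c", 0.80, 0), ("mol_c", 0.50, 0), ("mol_c", 0.30, 1),  # SOM ranked 3rd
]

top_2 = top_k_success_rate(toy_records, k=2)
print(round(top_2, 3))
```

With _k_=2, `mol_a` and `mol_b` are hits and `mol_c` is a miss, so two of the three toy molecules count as correctly predicted.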

In [None]:
%%bash
fame3r metrics -i predictions.csv -o metrics.json

## Using descriptors externally

While our Python API can be used to seamlessly integrate our work into your Python-based chemoinformatics workflows, we recognize that other programming and modelling environments exist. To that end, you can use the `descriptors` command to generate FAME3R descriptors in various configurations. The generated descriptors are exported as CSV.

In [None]:
%%bash
fame3r descriptors -i data/metatrans_autoannotated_cleaned/train.sdf -o descriptors.csv