# Running MATCH on PeTaL

Wednesday, 18 August 2021.

In this notebook I demonstrate how to run MATCH with PeTaL.

If you are reproducing these results, change your runtime's hardware accelerator to a GPU if you haven't already.
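A quick way to confirm that the runtime actually has a GPU is to ask `nvidia-smi`. The cell below is a minimal sketch using only the standard library; the helper name `gpu_visible` is my own and not part of the PeTaL repository.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L prints one line per detected GPU
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` on Colab, the hardware accelerator is probably still set to "None".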


# Setup

To set up, we clone the PeTaL labeller repository and install its requirements.

In [None]:
!git clone https://github.com/nasa-petal/PeTaL-labeller.git

In [None]:
%cd PeTaL-labeller/auto-labeler/MATCH/

We check out the `match-with-petal` branch for now; it will soon be merged into `main`.

In [None]:
!git checkout match-with-petal

In [17]:
# !git pull origin match-with-petal

If you're running on another machine, you should set up an environment first.
Currently I use:
```
conda create --name match-env python=3.6.8
conda activate match-env
```
NOTES:

- I understand that mixing conda and pip as package managers in the same environment can cause problems.
- The install step below may take a while (about 2 minutes on Colab).
- There is no need to restart the runtime afterwards, even though Colab suggests it.

In [None]:
!pip install -r requirements.txt

We run `setup.py` to download the PeTaL dataset. Right now this includes:

- golden.json - the latest version of golden.json on David's branch, which adheres to the golden dataset schema
- filtered.json - the result of running the following script, which keeps only the biomimicry papers that have labels:
```
python3 filter.py -i MATCH/PeTaL/golden.json -o MATCH/PeTaL/filtered.json
```
- taxonomy.txt - the taxonomy file
- PeTaL.joint.emb - the file of embeddings, which you can obtain by running *embedding pre-training* using MATCH/joint/run.sh
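To illustrate the filtering step that produces filtered.json, here is a minimal sketch of the idea. It is not the code in `filter.py`: the field names `is_biomimicry` and `labels` are placeholders I made up for the example and are not taken from the golden dataset schema.

```python
import json
from typing import Dict, List


def keep_paper(paper: Dict) -> bool:
    """Keep a paper only if it is marked as biomimicry and has at least one label.

    NOTE: "is_biomimicry" and "labels" are hypothetical field names.
    """
    return bool(paper.get("is_biomimicry")) and len(paper.get("labels", [])) > 0


def filter_papers(papers: List[Dict]) -> List[Dict]:
    """Return only the papers that pass keep_paper, preserving order."""
    return [paper for paper in papers if keep_paper(paper)]


# Toy records, invented for this example only.
sample = [
    {"title": "A", "is_biomimicry": True, "labels": ["x"]},
    {"title": "B", "is_biomimicry": True, "labels": []},
    {"title": "C", "is_biomimicry": False, "labels": ["y"]},
]
filtered = filter_papers(sample)
print(json.dumps(filtered, indent=2))
```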

In [None]:
!python3 setup.py

The `PeTaL/` directory is now at `src/MATCH/PeTaL`. Let's move to the `src` directory.

In [None]:
%cd src
!ls

# Run through the entire training/testing/evaluation pipeline

`run_MATCH_with_PeTaL_data.py --cnf config.yaml --verbose` runs `Split.py`, `augment.py`, `transform_golden.py`, `preprocess.py`, `train.py`, and `eval.py` in sequence. Each of these can also be run separately (see the comments at the top of each file for instructions).

You'll be asked for a wandb API key. You can either continue without one or use your own (the prompt should give you a link where you can get the key).
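If you would rather not be prompted at all, one option is to put wandb into offline mode before launching the pipeline. This is a sketch that assumes the pipeline logs through the standard wandb client, which reads the `WANDB_MODE` environment variable; I have not checked how the PeTaL scripts initialise wandb. Commands started with `!` should inherit the environment of the notebook kernel.

```python
import os

# Assumption: the training scripts use the standard wandb client.
# "offline" keeps run data on local disk and should skip the API key prompt.
os.environ["WANDB_MODE"] = "offline"

print("WANDB_MODE =", os.environ["WANDB_MODE"])
```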

In [None]:
!python3 run_MATCH_with_PeTaL_data.py --cnf config.yaml --verbose

Produce precision-recall plots to assess MATCH's performance as we vary the threshold.

In [None]:
!python3 ../analysis/precision_and_recall.py -m MATCH -p ../plots --verbose
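
To make "varying the threshold" concrete, here is a minimal sketch of how precision and recall can be computed at one threshold for multilabel predictions. It uses micro-averaging over toy data and is not the code in `precision_and_recall.py`; the function name, the data layout, and the choice of averaging are all my assumptions.

```python
from typing import Dict, Sequence, Set, Tuple


def precision_recall_at(
    scores: Sequence[Dict[str, float]],
    truths: Sequence[Set[str]],
    threshold: float,
) -> Tuple[float, float]:
    """Micro-averaged precision and recall when labels scoring >= threshold are predicted."""
    tp = fp = fn = 0
    for label_scores, true_labels in zip(scores, truths):
        predicted = {label for label, s in label_scores.items() if s >= threshold}
        tp += len(predicted & true_labels)
        fp += len(predicted - true_labels)
        fn += len(true_labels - predicted)
    # Convention chosen here: with no predictions (or no true labels), report 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Toy scores and labels, invented for this example only.
scores = [{"a": 0.9, "b": 0.4, "c": 0.1}, {"a": 0.2, "b": 0.8, "c": 0.6}]
truths = [{"a"}, {"b", "c"}]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(scores, truths, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold tends to trade recall for precision, which is the trade-off the plots show.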

Produce a multilabel confusion matrix to assess MATCH's predictions.

In [None]:
!python3 ../analysis/multilabel_confusion_matrix.py -m MATCH/ -p ../plots/ --verbose
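
For readers unfamiliar with the term, one common form of a multilabel confusion matrix counts true/false positives and negatives separately for each label. The sketch below shows that form on toy data. I am not claiming that `multilabel_confusion_matrix.py` uses this exact layout; the function name and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def per_label_confusion(
    predicted: Sequence[Set[str]],
    truths: Sequence[Set[str]],
    labels: Sequence[str],
) -> Dict[str, Counter]:
    """Count tp/fp/fn/tn for every label across all papers."""
    counts = {label: Counter() for label in labels}
    for pred, true in zip(predicted, truths):
        for label in labels:
            if label in pred:
                key = "tp" if label in true else "fp"
            else:
                key = "fn" if label in true else "tn"
            counts[label][key] += 1
    return counts


# Toy predictions and ground truth, invented for this example only.
predicted = [{"a", "b"}, {"b"}]
truths = [{"a"}, {"b", "c"}]
counts = per_label_confusion(predicted, truths, ["a", "b", "c"])

for label, c in counts.items():
    print(label, {k: c[k] for k in ("tp", "fp", "fn", "tn")})
```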

# Cross-validation and analysis

This performs cross-validation, generating multiple trials on different folds of the dataset, and appends the output to the log file `../experiment_data/xval_test/20210818_new_test.txt`.

In [None]:
!python3 xval_test.py --cnf config.yaml -k 10 --study NEW_TEST --verbose | tee -a ../experiment_data/xval_test/20210818_new_test.txt
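
As a reminder of what k-fold cross-validation does, the sketch below splits item indices into `k` folds and uses each fold once as the test set. It is a generic illustration, not the splitting logic in `xval_test.py`; the function name, the shuffling, and the seed are my assumptions.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_items: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the folds reproducible
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_indices(25, 10)):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Every item lands in exactly one test fold, so each trial is evaluated on data the model did not see during that trial's training.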

This computes statistics on that log file.

In [None]:
!python3 ../analysis/analyse_MATCH_output.py -f ../experiment_data/xval_test/20210818_new_test.txt
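
The kind of summary one usually wants from cross-validation is a mean and a spread across folds. The sketch below assumes each fold yields a single scalar score; it does not parse the actual log format, and the numbers are placeholders rather than results from this notebook.

```python
import statistics
from typing import Sequence, Tuple


def summarize(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


# Placeholder values for illustration only, NOT measured results.
example_scores = [0.5, 0.75, 1.0]
mean, spread = summarize(example_scores)
print(f"mean={mean:.3f} stdev={spread:.3f}")
```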

# Inference

In [None]:
!./run_inference.sh MATCH/PeTaL/filtered.json