# Complete dtw evaluation workflow: simulate paired sequences and their groundtruth alignment, predict alignment with sm-dtw and assess sm-dtw prediction

This notebook shows how to chain a complete workflow to assess the alignment predictions made by `sm-dtw`. The pipeline consists of three steps:

1. simulate several pairs of "phyllotaxis" sequences with groundtruth alignments
2. provide the paired sequences to `sm-dtw` to predict alignments
3. assess the predicted alignments by comparing them with the groundtruth alignments

The paired sequences are generated in batches using a configuration (config) table.

### Definitions / reminders

- a *phyllotaxis sequence* is an ordered sequence of values representing the divergence angles and internode lengths between successive 'organs' (e.g. leaves, fruits, flowers, branches) along the stem of a plant
- a *pair of sequences* (or *paired sequences*) consists of two related sequences: one "reference" and one "test", derived from the "reference" through several modifications.

### Requirements

**/!\ Software requirements**: `sm-dtw` must be installed in a conda environment that you will activate in this notebook. Please refer to the `sm-dtw` documentation to set up the conda environment.

**Input requirements**:

- The **config table** is a .csv file that details the modifications that turn the "reference" sequence into the "test" sequence. A template file with full explanations is available in **Phyllotaxis-sim-eval/example_data/simulation_plants_README.ods**
- A config table is provided to run tests and this notebook: Phyllotaxis-sim-eval/example_data/Notebook_tests/**simulation_plants_nb.csv**
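
Before starting Step 1, it can be useful to check the inputs. The sketch below is a hypothetical helper (`check_inputs` is not part of Phyllotaxis-sim-eval): it verifies that the config table is readable and that the destination is an absolute path, which the simulation script requires for its `--destination` option.

```shell
# Hypothetical helper (not part of Phyllotaxis-sim-eval): check the Step 1 inputs.
# $1 = path to the config table (.csv), $2 = destination folder
check_inputs() {
    config=$1
    destination=$2
    # The config table must be an existing, readable file
    if [ ! -r "$config" ]; then
        echo "config table not found or not readable: $config" >&2
        return 1
    fi
    # The destination must be an absolute path, i.e. start with "/"
    case "$destination" in
        /*) ;;
        *)  echo "destination is not an absolute path: $destination" >&2
            return 1 ;;
    esac
    echo "inputs look fine"
}
# Usage (paths are placeholders to adapt):
# check_inputs ../example_data/Notebook_tests/simulation_plants_nb.csv "$dest"
```

Note that an unquoted `~` at the start of a variable assignment is expanded by the shell, so an assignment such as `dest=~/some/folder` already yields an absolute path.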


## Step1: Simulate (multiple) paired sequences with a configuration file

In [1]:
#Edit this variable to indicate the path to your local "Phyllotaxis-sim-eval" folder
localrepo=~/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/Phyllotaxis-sim-eval/

#Edit this variable with the system path of the folder you choose to store the results:
dest=~/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/tests
#suggestion: you can use Phyllotaxis-sim-eval/example_data/Notebook_tests/ that already contains input data for this notebook

#Choose a prefix id for the generated dataset
data="data1"

Rscript ../bin/simul_data.R \
--repository $localrepo \
--file ../example_data/Notebook_tests/simulation_plants_nb.csv \
--destination $dest \
--output_prefix $data \
--plots \
--verbose
#only --file/-f is a compulsory argument
#absolute path is required for --destination option
#--plots option is deactivated by default
#--verbose option is also deactivated by default

[1] "processing data for plant Plant#1"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = TRUE"
[1] "measure = TRUE"
[1] "noise = FALSE"
Number of short error-free internodes that can be permuted = 3 .
Permuting organs of interval n° 16 in the $modified sequence.
Permuting organs of interval n° 17 in the $modified sequence.
Permuting organs of interval n° 18 in the $modified sequence.
Number of permutation performed = 3.
[1] "processing data for plant Plant#2"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = FALSE"
[1] "measure = FALSE"
[1] "noise = TRUE"
[1] "processing data for plant Plant#3"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = FALSE"
[1] "measure = FALSE"
[1] "noise = TRUE"
[1] "processing data for plant Plant#4"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = TRUE"
[1] "measure = FALSE"
[1] "noise = FALSE"
Number of short error-free internodes

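The above script has generated 5 new files in the destination folder. The names below are fixed; the commands in the next steps expect the sequence and interval files to carry the chosen prefix (e.g. `data1_reference_sequences.csv`):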

- `reference_sequences.csv`: can be used as input for `sm-dtw`
- `test_sequences.csv`: can be used as input for `sm-dtw`
- `align_intervals.csv`: groundtruth alignment of intervals between the two paired phyllotaxis sequences
- `align_organs.csv`: groundtruth alignment of organs
- `Rplots.pdf`: compilation of aligned plots for each sequence pair 
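
To confirm that the simulation ran to completion, you can check that the CSV files are present. This is a minimal sketch with a hypothetical helper (`missing_outputs`); it assumes that every CSV file is written to the destination folder as `<prefix>_<name>`, which is how the next steps refer to the sequence and interval files. Whether `Rplots.pdf` also carries the prefix is not stated here, so it is left out.

```shell
# Hypothetical helper: report the Step 1 CSV outputs that are missing or empty.
# Assumption: files are named "<prefix>_<fixed name>.csv" in the destination folder.
# $1 = destination folder, $2 = output prefix
missing_outputs() {
    dir=$1
    prefix=$2
    rc=0
    for name in reference_sequences test_sequences align_intervals align_organs; do
        f="$dir/${prefix}_${name}.csv"
        # -s is true only if the file exists and is not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return $rc
}
# Usage, with the variables defined in the Step 1 cell:
# missing_outputs "$dest" "$data" && echo "all CSV outputs found"
```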

## Step2: Predict an alignment for the paired sequences using sm-dtw


In [2]:
#Activate the conda environment where `sm-dtw` has been installed
#for example:
source ~/softwares/miniconda3/bin/activate
conda activate romi

(base) (romi) 

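
Before launching the prediction, you may want to confirm that the active environment exposes the script used in the next cell. A minimal sketch, where `require_cmd` is a hypothetical helper built on the standard `command -v` shell builtin:

```shell
# Hypothetical helper: succeed only if the given command can be found on the PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found on PATH: $1" >&2
        return 1
    fi
}
# Usage, after activating the conda environment:
# require_cmd align_csv_database.py
```

If the command is not found, the conda environment is probably not activated in the shell that runs this notebook.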

In [4]:
#choose a prefix name to identify the result output from dtw
myprediction="dtw1"
align_csv_database.py $dest/${data}_reference_sequences.csv $dest/${data}_test_sequences.csv \
$myprediction --free_ends 0.4

2021-11-08 14:36:41 - INFO: Loading CSV files...
2021-11-08 14:36:41 - INFO: Found 5 PlantID in the reference CSV file.
2021-11-08 14:36:41 - INFO: Found 5 PlantID in the test CSV file.
2021-11-08 14:36:41 - INFO: Found 5 common PlantID in the reference & test CSV file.
2021-11-08 14:36:41 - INFO: Performing sequence comparison for 'Plant#4'...
2021-11-08 14:36:41 - INFO: Starting brute force search for 49 pairs of free-ends...
2021-11-08 14:36:56 - INFO: Found free-ends (3, 4) at a cost of 0.03412553435988868.
2021-11-08 14:36:57 - INFO: Performing sequence comparison for 'Plant#1'...
2021-11-08 14:36:57 - INFO: Starting brute force search for 64 pairs of free-ends...
2021-11-08 14:37:24 - INFO: Found free-ends (2, 6) at a cost of 0.18646557120078544.
2021-11-08 14:37:25 - INFO: Performing sequence comparison for 'Plant#2'...
2021-11-08 14:37:25 - INFO: Starting brute force search for 25 pairs of free-ends...
2021-11-08 14:37:29 - INFO: Found free-ends (0, 2) at a cost o

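
The log above reports how many `PlantID` values are shared by the two input files. If that count is lower than expected, a quick way to list the shared identifiers is sketched below. `common_plant_ids` is a hypothetical helper; it assumes that both files are comma-separated, start with a header line containing a column named `PlantID` (name taken from the log messages), and contain no commas inside quoted fields.

```shell
# Hypothetical helper: print the PlantID values found in both CSV files.
# Assumptions: comma-separated files, header line with a "PlantID" column,
# no commas inside quoted fields.
# $1 = reference CSV, $2 = test CSV
common_plant_ids() {
    awk -F',' '
        # Header line of each file: locate the PlantID column
        FNR == 1 {
            nfile++
            col = 0
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)
                if (name == "PlantID") col = i
            }
            next
        }
        # No PlantID column in this file: skip its rows
        col == 0 { next }
        # Read the identifier and drop surrounding quotes, if any
        { id = $col; gsub(/"/, "", id) }
        # First file: remember every identifier
        nfile == 1 { ref[id] = 1; next }
        # Second file: print each shared identifier once
        (id in ref) && !(id in seen) { seen[id] = 1; print id }
    ' "$1" "$2"
}
# Usage, with the variables defined in the previous cells:
# common_plant_ids "$dest/${data}_reference_sequences.csv" "$dest/${data}_test_sequences.csv"
```

Piping the result to `wc -l` gives a count that can be compared with the number reported in the log, provided the assumptions on the file layout hold.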

## Step3: Assess the alignment prediction made by `sm-dtw`

In [5]:
Rscript ../bin/eval_dtw.R \
--repository $localrepo \
--alignment_dtw $dest/${myprediction}_result.csv \
--reference_seq $dest/${data}_reference_sequences.csv \
--test_seq $dest/${data}_test_sequences.csv \
--intervals_truealign $dest/${data}_align_intervals.csv \
--output_prefix $data$myprediction \
--detail --plots --verbose \
--destination $dest

Starting script to evaluate dtw alignment prediction 
Loading required package: optparse
Loading required package: gridExtra
Converting dtw results for PlantID = Plant#1 
Converting dtw results for PlantID = Plant#2 
Converting dtw results for PlantID = Plant#3 
Converting dtw results for PlantID = Plant#4 
Converting dtw results for PlantID = Plant#5 
## Starting analysis for PlantID = Plant#1 .
[1] "both reference sequence have 23 intervals"
[1] "both test sequences (before/after dtw) have 22 intervals"
## Starting analysis for PlantID = Plant#2 .
[1] "both reference sequence have 14 intervals"
[1] "both test sequences (before/after dtw) have 13 intervals"
## Starting analysis for PlantID = Plant#3 .
[1] "both reference sequence have 14 intervals"
[1] "both test sequences (before/after dtw) have 14 intervals"
## Starting analysis for PlantID = Plant#4 .
[1] "both reference sequence have 19 intervals"
[1] "both test sequences (before/after dtw) have 21 interv
