# Complete dtw evaluation workflow: simulate paired sequences and their groundtruth alignment, predict alignment with sm-dtw and assess sm-dtw prediction

This notebook shows how to chain a complete workflow that assesses the alignment predictions made by `sm-dtw`. The pipeline consists of three steps:

1. Simulate several pairs of "phyllotaxis" sequences with groundtruth alignments.
2. Provide the paired sequences to `sm-dtw` to predict alignments.
3. Assess the predicted alignments by comparing them with the groundtruth alignments.

The paired sequences are generated in batches using a configuration (config) table.

### Definitions / reminders

- A *phyllotaxis sequence* is an ordered sequence of values representing the divergence angles and internode lengths between successive 'organs' (e.g. leaves, fruits, flowers, branches) along the stem of a plant.
- A *pair of sequences* (or *paired sequences*) consists of two related sequences: a "reference" sequence and a "test" sequence derived from the reference by applying several modifications.

### Requirements

**/!\ Software requirements**: 
- `sm-dtw` must be installed in a conda environment that you will activate in this notebook. Please refer to the `sm-dtw` documentation to set up the conda environment.
- R (v4.0+) must be installed with at least the following packages: optparse, ggplot2, reshape2, gridExtra
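
A quick way to confirm these requirements before running the cells is sketched below. It assumes a POSIX shell; the helper names `check_commands` and `check_r_packages` are hypothetical and are not part of `sm-dtw` or Phyllotaxis-sim-eval.

```shell
# Report whether each command passed as argument is available on the PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_commands() {
    cc_missing=0
    for cc_cmd in "$@"; do
        if command -v "$cc_cmd" >/dev/null 2>&1; then
            echo "found: $cc_cmd"
        else
            echo "MISSING: $cc_cmd"
            cc_missing=1
        fi
    done
    return "$cc_missing"
}

# Ask R which of the packages passed as arguments are not installed.
# Exits with a non-zero status if at least one package is absent.
check_r_packages() {
    Rscript \
        -e 'pkgs <- commandArgs(trailingOnly = TRUE)' \
        -e 'absent <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]' \
        -e 'if (length(absent) > 0) { cat("Missing R packages:", paste(absent, collapse = ", "), "\n"); quit(status = 1) }' \
        -e 'cat("All required R packages are installed\n")' \
        "$@"
}

# Only query R packages when Rscript itself is available.
if check_commands Rscript; then
    check_r_packages optparse ggplot2 reshape2 gridExtra \
        || echo "Install the missing packages before running the notebook."
fi
```

Once the conda environment has been activated, the same helper can confirm that the `sm-dtw` command-line script used in this notebook is reachable: `check_commands align_csv_database.py`.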

**Input requirements**:

- The **config table** is a .csv file that describes the modifications that turn the "reference" sequence into the "test" sequence. A template file with full explanations is available in **Phyllotaxis-sim-eval/example_data/simulation_plants_README.ods**.
- A ready-made config table is provided to run tests and this notebook: Phyllotaxis-sim-eval/example_data/Notebook_tests/**simulation_plants_nb.csv**


## Step1: Simulate (multiple) paired sequences with a configuration file

In [6]:
# Edit this variable to point to your local "Phyllotaxis-sim-eval" folder
localrepo=~/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/Phyllotaxis-sim-eval/

# Edit this variable to set the folder where the results will be stored (absolute path only)
dest=~/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/tests
# Suggestion: use Phyllotaxis-sim-eval/example_data/Notebook_tests/, which already contains the input data for this notebook

# Choose a prefix to identify the generated dataset
data="data1"

Rscript ../bin/simul_data.R \
--repository "$localrepo" \
--file ../example_data/Notebook_tests/simulation_plants_nb.csv \
--destination "$dest" \
--output_prefix "$data" \
--verbose

# Only --file/-f is a compulsory argument
# --destination requires an absolute path
# --verbose is also deactivated by default
# Add --plot to write plots to the dest folder

[1] "default parameters for simulated sequences"
Starting script to simulate paired sequences of phyllotaxis 
[1] "processing data for plant Plant#1"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = TRUE"
[1] "Noise or measures = measures"
[1] "Natural permutations can be added to the divergence angle sequence"
[1] "there are isolated permutations"
[1] "No consecutive permutations have been drawn in the sequence"
[1] "Computing new divergence angles after natural permutations"
[1] "the sd of the gaussian noise applied to input values will be scaled to absolute"
[1] "the sd of the gaussian noise applied to input values will be scaled to absolute"
Number of short error-free internodes that can be permuted = 4 .
Permuting organs of interval n° 13 in the $modified sequence.
Permuting organs of interval n° 16 in the $modified seque


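The above script has generated 5 new files. The base names are fixed; the later cells read the CSV files with the `--output_prefix` value prepended (e.g. `data1_reference_sequences.csv`):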

- `reference_sequences.csv`: can be used as input for `sm-dtw`
- `test_sequences.csv`: can be used as input for `sm-dtw`
- `align_intervals.csv`: groundtruth alignment of intervals between the two paired phyllotaxis sequences
- `align_organs.csv`: groundtruth alignment of organs
- `Rplots.pdf` or `SimulatedPairedSequences.pdf` (if the option `--plot` has been given): compilation of aligned plots for each sequence pair
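
Before moving on to the next step, it can help to confirm that the CSV files listed above were actually written. The sketch below is a hypothetical helper, not part of the repository. It assumes the files are named `<prefix>_<name>.csv`, which matches the paths used for the reference, test and interval files in the following cells; the same naming for `align_organs.csv` is an assumption.

```shell
# Check that the four CSV outputs of Step 1 exist and are not empty.
# $1 = destination folder, $2 = output prefix
# Returns 0 if all files are present, 1 otherwise.
check_simulation_outputs() {
    cso_dir=$1
    cso_prefix=$2
    cso_status=0
    for cso_name in reference_sequences test_sequences align_intervals align_organs; do
        cso_file="$cso_dir/${cso_prefix}_${cso_name}.csv"
        if [ -s "$cso_file" ]; then
            echo "ok: $cso_file"
        else
            echo "missing or empty: $cso_file"
            cso_status=1
        fi
    done
    return "$cso_status"
}
```

With the variables defined in the cell above, the call would be `check_simulation_outputs "$dest" "$data"`.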

## Step2: Predict an alignment for the paired sequences using sm-dtw


In [7]:
#Activate the conda environment where `sm-dtw` has been installed
#for example:
source ~/softwares/miniconda3/bin/activate
conda activate romi


In [8]:
# Choose a prefix to identify the results written by sm-dtw
myprediction="dtw1"
align_csv_database.py "$dest/${data}_reference_sequences.csv" "$dest/${data}_test_sequences.csv" \
"$myprediction" #--free_ends 0.4

2022-06-14 22:31:39 - INFO: Loading CSV files...
2022-06-14 22:31:39 - INFO: Found 6 PlantID in the reference CSV file.
2022-06-14 22:31:39 - INFO: Found 6 PlantID in the test CSV file.
2022-06-14 22:31:39 - INFO: Found 6 common PlantID in the reference & test CSV file.
2022-06-14 22:31:39 - INFO: Performing sequence comparison for 'Plant#5'...
2022-06-14 22:31:39 - INFO: Starting brute force search for 49 pairs of free-ends...
2022-06-14 22:31:52 - INFO: Found free-ends (3, 4) at a cost of 0.03297957061664237.
Traceback (most recent call last):
  File "/home/fabfab/softwares/miniconda3/envs/romi/bin/align_csv_database.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/fabfab/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/dtw/src/dtw/bin/align_csv_database.py", line 173, in <module>
    main(args)
  File "/home/fabfab/Documents/RDP/MyProjects/ROMI/Data/Eval_AnglesAndInternodes/dtw/src/dtw/bin/align_csv_database.py", line 147,

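
Because the alignment can stop with an error, as in the traceback above, it is worth checking that the prediction finished and produced a result file before evaluating it. The sketch below shows one way to do so with a hypothetical wrapper `run_and_check`. It assumes the result file ends up at the path that Step 3 reads, `$dest/${myprediction}_result.csv`.

```shell
# Run a command, then confirm that it exited successfully AND that the
# expected result file exists and is not empty.
# $1 = expected result file, remaining arguments = the command to run
run_and_check() {
    rc_expected=$1
    shift
    if ! "$@"; then
        echo "command failed: $*" >&2
        return 1
    fi
    if [ ! -s "$rc_expected" ]; then
        echo "command succeeded but $rc_expected is missing or empty" >&2
        return 1
    fi
    echo "result available: $rc_expected"
}
```

In this notebook the call would look like `run_and_check "$dest/${myprediction}_result.csv" align_csv_database.py "$dest/${data}_reference_sequences.csv" "$dest/${data}_test_sequences.csv" "$myprediction"`.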

## Step3: Assess the alignment prediction made by `sm-dtw`

In [4]:
Rscript ../bin/eval_dtw.R \
--repository "$localrepo" \
--alignment_dtw "$dest/${myprediction}_result.csv" \
--reference_seq "$dest/${data}_reference_sequences.csv" \
--test_seq "$dest/${data}_test_sequences.csv" \
--intervals_truealign "$dest/${data}_align_intervals.csv" \
--output_prefix "${data}${myprediction}" \
--detail --plots --verbose \
--destination "$dest"

Starting script to evaluate dtw alignment prediction 
Loading required package: optparse
Loading required package: gridExtra
Converting dtw results for PlantID = Plant#1 
Converting dtw results for PlantID = Plant#2 
Converting dtw results for PlantID = Plant#3 
Converting dtw results for PlantID = Plant#4 
Converting dtw results for PlantID = Plant#5 
Converting dtw results for PlantID = Plant#6 
## Starting analysis for PlantID = Plant#1 .
[1] "both reference sequence have 23 intervals"
[1] "both test sequences (before/after dtw) have 22 intervals"
## Starting analysis for PlantID = Plant#2 .
[1] "both reference sequence have 14 intervals"
[1] "both test sequences (before/after dtw) have 13 intervals"
## Starting analysis for PlantID = Plant#3 .
[1] "both reference sequence have 14 intervals"
[1] "both test sequences (before/after dtw) have 14 intervals"
## Starting analysis for PlantID = Plant#4 .
[1] "both reference sequence have 15 intervals"
[1] "both te

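
The idea behind this assessment, comparing predicted with groundtruth alignments, can be illustrated with a deliberately simplified sketch. It is not what `eval_dtw.R` does. It assumes, hypothetically, that both alignments have already been exported to CSV files with one header line and the same row layout (for example plant ID, reference interval, test interval), and it only counts the groundtruth rows that also appear in the prediction. The function name `alignment_agreement` is made up for illustration.

```shell
# Count the groundtruth alignment rows that are also present in a prediction.
# Both files must share the same row layout; the first line of each file is
# treated as a header and skipped. Rows are compared as whole lines.
# $1 = groundtruth CSV, $2 = predicted CSV
alignment_agreement() {
    aa_truth=$(mktemp)
    aa_pred=$(mktemp)
    # comm needs sorted input; sort -u also drops duplicated rows
    tail -n +2 "$1" | sort -u > "$aa_truth"
    tail -n +2 "$2" | sort -u > "$aa_pred"
    aa_shared=$(comm -12 "$aa_truth" "$aa_pred" | wc -l)
    aa_total=$(wc -l < "$aa_truth")
    rm -f "$aa_truth" "$aa_pred"
    echo "$((aa_shared)) of $((aa_total)) groundtruth rows recovered"
}
```

A call such as `alignment_agreement truth.csv predicted.csv` (both file names are placeholders) prints a single summary line. The evaluation script reports much more detail per plant, so this sketch only conveys the principle.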