# Simulate data from a config table

This notebook shows you how to simulate several pairs of "phyllotaxis" sequences at the same time using a configuration (config) table.

- a *phyllotaxis sequence* is an ordered sequence of values representing the divergence angles and internode lengths between successive 'organs' (e.g. leaves, fruits, flowers, branches) along the stem of a plant;
- a *pair of sequences* consists of two related sequences: a "reference" sequence and a "test" sequence derived from the reference by applying several modifications.

The **config table** is a .csv file that details the modifications applied to the "reference" sequence to produce the "test" sequence. A template file with detailed explanations is available in **R_simul-eval/example_data/simulation_plants_README.ods**.

The output "reference" and "test" sequences can be given to an "alignment predictor" program such as `sm-dtw`. The simulation also outputs a "ground truth" alignment (`align_intervals.csv`), which lets you quantify how correct the predicted alignment is.
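
As a rough illustration of such a comparison, the sketch below computes a simple exact-match rate: the fraction of rows in a predicted alignment that are identical to the corresponding rows of the ground truth. It assumes (hypothetically) that the predicted alignment file has the same header, columns and row order as `align_intervals.csv`; the function name `alignment_accuracy` is illustrative and is not part of `simul_data.R` or `sm-dtw`.

```shell
# Hypothetical helper: exact-match rate between a ground-truth alignment and a
# predicted one. ASSUMES both CSV files share the same header and row order.
alignment_accuracy() {
  local truth=$1 pred=$2
  awk '
    NR == FNR { if (FNR > 1) truth[FNR] = $0 ""; next }  # load ground truth, skip header
    FNR > 1   { n++; if (($0 "") == truth[FNR]) ok++ }   # string-compare row by row
    END {
      if (n == 0) { print "0/0 rows compared"; exit 1 }  # nothing to compare
      printf "%d/%d matching rows (%.2f)\n", ok, n, ok / n
    }
  ' "$truth" "$pred"
}
```

For example, `alignment_accuracy "$dest/align_intervals.csv" predicted_intervals.csv`, where `predicted_intervals.csv` stands for a prediction already converted to the same layout. An exact-match rate is deliberately crude; it only shows where the ground truth plugs into an evaluation.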

A PDF containing plots for each sequence pair can also be generated (`--plots` option).


In [7]:
dest=/home/fabfab/Dropbox/Arabidopsis-eval/Phyllotaxis-sim-eval/example_data/Notebook_tests

Rscript ../bin/simul_data.R \
--file ../example_data/Notebook_tests/simulation_plants_nb.csv \
--destination $dest \
--plots \
--verbose
# only --file/-f is a mandatory option
# absolute paths are required for the --destination option
# the --plots option is disabled by default

[1] "processing data for plant Plant#1"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = TRUE"
[1] "measure = TRUE"
[1] "noise = FALSE"
Number of short error-free internodes that can be permuted = 3 .
Permuting organs of interval n° 12 in the $modified sequence.
Permuting organs of interval n° 17 in the $modified sequence.
Permuting organs of interval n° 21 in the $modified sequence.
Number of permutation performed = 3.
[1] "processing data for plant Plant#2"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = FALSE"
[1] "measure = FALSE"
[1] "noise = TRUE"
[1] "processing data for plant Plant#3"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = FALSE"
[1] "measure = FALSE"
[1] "noise = TRUE"
[1] "processing data for plant Plant#4"
reminder of main scenario parameters 
[1] "seg_errors = TRUE"
[1] "permutation = TRUE"
[1] "measure = FALSE"
[1] "noise = FALSE"
Number of short error-f

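Because `--destination` requires an absolute path, a relative directory can be resolved before calling the script. The function below is a minimal sketch (the name `to_abs_dir` is hypothetical): it creates the directory if needed, in case the script does not, and prints its absolute path using `cd` and `pwd -P`, which avoids depending on `realpath` being installed.

```shell
# Hypothetical helper (not part of simul_data.R): create a directory if needed
# and print its absolute, symlink-resolved path without using `realpath`.
to_abs_dir() {
  mkdir -p -- "$1" && (CDPATH= cd -- "$1" && pwd -P)
}

# Example use (paths taken from the cell above):
# dest=$(to_abs_dir ../example_data/Notebook_tests)
# Rscript ../bin/simul_data.R \
#   --file ../example_data/Notebook_tests/simulation_plants_nb.csv \
#   --destination "$dest"
```

The `cd` runs in a subshell, so the notebook's working directory is left unchanged.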

The script above generated 5 new files (their names are fixed):

- `reference_sequences.csv`: can be used as input for `sm-dtw`
- `test_sequences.csv`: can be used as input for `sm-dtw`
- `align_intervals.csv`: ground-truth alignment of intervals between the two paired phyllotaxis sequences
- `align_organs.csv`: ground-truth alignment of organs
- `Rplots.pdf`: compilation of aligned plots for each sequence pair
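
Before moving on to the alignment step, it can be worth checking that the fixed-name CSV outputs exist and are not empty. The sketch below (the function name `check_sim_outputs` is hypothetical) leaves out `Rplots.pdf`, presumably produced only when `--plots` is given, and returns the number of missing files as its exit status.

```shell
# Hypothetical helper: report which fixed-name CSV outputs are missing or empty
# in a destination directory; the exit status is the number of such files.
check_sim_outputs() {
  local dir=$1 missing=0 f
  for f in reference_sequences.csv test_sequences.csv \
           align_intervals.csv align_organs.csv; do
    if [ ! -s "$dir/$f" ]; then             # -s: file exists and is non-empty
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Usage: `check_sim_outputs "$dest" && echo "all simulation outputs present"`.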

In [8]:
source ~/softwares/miniconda3/bin/activate
conda activate romi


In [9]:
align_csv_database.py $dest/reference_sequences.csv $dest/test_sequences.csv ploufplouf --free_ends 0.4

2021-11-03 17:52:20 - INFO: Loading CSV files...
2021-11-03 17:52:20 - INFO: Found 5 PlantID in the reference CSV file.
2021-11-03 17:52:20 - INFO: Found 5 PlantID in the test CSV file.
2021-11-03 17:52:20 - INFO: Found 5 common PlantID in the reference & test CSV file.
2021-11-03 17:52:20 - INFO: Performing sequence comparison for 'Plant#3'...
2021-11-03 17:52:20 - INFO: Starting brute force search for 25 pairs of free-ends...
2021-11-03 17:52:23 - INFO: Found free-ends (0, 4) at a cost of 0.08855695212494209.
2021-11-03 17:52:24 - INFO: Performing sequence comparison for 'Plant#1'...
2021-11-03 17:52:24 - INFO: Starting brute force search for 64 pairs of free-ends...
2021-11-03 17:52:46 - INFO: Found free-ends (0, 8) at a cost of 0.17814975897239918.
2021-11-03 17:52:47 - INFO: Performing sequence comparison for 'Plant#4'...
2021-11-03 17:52:47 - INFO: Starting brute force search for 49 pairs of free-ends...
2021-11-03 17:53:00 - INFO: Found free-ends (3, 4) at a cost of 0.0495478242

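When many plants are processed, the log above can be condensed into a table. The sketch below assumes the output of `align_csv_database.py` was saved to a file, for example by appending `2>&1 | tee align.log` to the command (`2>&1` captures the log whether it goes to stdout or stderr; the file name is illustrative). It relies only on the `Performing sequence comparison for '...'` and `Found free-ends (a, b) at a cost of x.` lines shown above; the function name `summarise_free_ends` is hypothetical.

```shell
# Hypothetical helper: print "plant<TAB>free-ends<TAB>cost" for each plant, based
# on the INFO lines shown above. Reads the log files given as arguments, or stdin.
summarise_free_ends() {
  awk -v q="'" '
    /Performing sequence comparison for/ {
      plant = $0
      sub(".*comparison for " q, "", plant)   # drop everything up to the opening quote
      sub(q ".*", "", plant)                  # drop the closing quote and the dots
    }
    /Found free-ends/ {
      fe = $0
      sub(/.*Found free-ends /, "", fe)       # keep "(a, b) at a cost of x."
      cost = fe
      sub(/ at a cost of.*/, "", fe)          # fe   -> "(a, b)"
      sub(/.*at a cost of /, "", cost)        # cost -> "x."
      sub(/\.$/, "", cost)                    # strip the sentence-final period
      printf "%s\t%s\t%s\n", plant, fe, cost
    }
  ' "$@"
}
```

Usage: `summarise_free_ends align.log`. Plants are listed in the order the tool processed them, which is not necessarily the order of the config table.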