# Creating new workflows and application bindings

This recipe shows how to create new application bindings and use them to build new workflows.

`ete-build` comes with a number of pre-configured applications, which are meant to cover default options of the supported software and some common approaches. Here you will learn how to create custom configurations that you can use in workflows. 


## Requirements
- ete3
- ete3_external_apps
- [basic concepts about ete-build](ete_build_basics.ipynb) 
- [composing custom workflows](ete_build_workflows.ipynb)


## Recipe

### 1. Configuring external software calls

In [composing custom workflows](ete_build_workflows.ipynb), you learnt how to refer to the different preconfigured options in `ete-build`.

You can explore the details of any application binding (or even workflow) with the `ete3 build show [blockname]` command.

In [15]:
%%bash
ete3 build show raxml_default
ete3 build show phyml_default
ete3 build show mafft_linsi

[raxml_default]
                              _desc = RAxML with default parameters, GAMMA JTT/GTR and aLRT branch supports.
                               _app = raxml
                         _bootstrap = alrt
                            _method = GAMMA
                          _aa_model = JTT
                      _model_suffix = 
                                 -f = d
                                 -p = 31416

[phyml_default]
                              _desc = Phyml tree using +G+I+F, 4 classes and aLRT branch supports. Default models JTT/GTR
                               _app = phyml
                          _aa_model = JTT
                          _nt_model = GTR
                  --no_memory_check = 
                            --quiet = 
                             --pinv = e

As you have probably noticed, each application binding referred to in a workflow name is defined as a configuration block whose name is enclosed in square brackets. To create new configuration blocks, keep the following concepts in mind: 

- The block name may contain only letters, numbers and underscores. Never use hyphens, since they separate the components of a workflow name.
- Each block is composed of several internal `ete-build` options and several native software options. 
  - _Internal options_ are always prefixed with an underscore. They define the software to execute, the description of the config block and some default parameters that should not be hardcoded as native program arguments.
  - Any other option in the format "argument = value" is considered a _native option_ and is passed as is to the corresponding software.  
  - To pass flags as arguments, you can use the syntax "`argument = ''`" or simply "`argument = `" 
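
The rules above can be illustrated with a small Python sketch. It is not part of `ete-build`: the helpers `is_valid_block_name`, `split_options` and `native_args` are hypothetical names introduced here, and the way native options are flattened into an argument list is only an assumption about how such options could be handed to a program.

```python
import re

# Assumption: a valid block name uses only letters, digits and underscores.
BLOCK_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_block_name(name):
    """Return True if the name contains no hyphens or other symbols."""
    return BLOCK_NAME.fullmatch(name) is not None


def split_options(options):
    """Separate internal (underscore-prefixed) options from native ones."""
    internal = {k: v for k, v in options.items() if k.startswith("_")}
    native = {k: v for k, v in options.items() if not k.startswith("_")}
    return internal, native


def native_args(native):
    """Flatten native options into a list; empty values become bare flags."""
    args = []
    for key, value in native.items():
        args.append(key)
        if value not in ("", "''"):
            args.append(value)
    return args


# A reduced version of the phyml_default block shown earlier.
phyml_like = {"_app": "phyml", "_aa_model": "JTT", "--quiet": "", "--pinv": "e"}
internal, native = split_options(phyml_like)

print(is_valid_block_name("raxml_CAT_10"))  # True
print(is_valid_block_name("raxml-CAT-10"))  # False
print(native_args(native))                  # ['--quiet', '--pinv', 'e']
```

Checking names this way before writing a config file catches hyphenated block names early, before they end up inside a workflow name.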



### 2. Creating new configuration blocks

Let's imagine that we want to create a new workflow where RAxML needs to be configured in a different manner. 

All we need to do is dump one of the provided RAxML configuration blocks and modify it. You can dump config blocks as regular text with the `ete3 build dump` command: 


In [18]:
%%bash
ete3 build dump raxml_default > myconfig.cfg
cat myconfig.cfg

[raxml_default]
                                   _desc = RAxML with default parameters, GAMMA JTT/GTR and aLRT branch supports.
                                    _app = raxml
                              _bootstrap = alrt
                                 _method = GAMMA
                               _aa_model = JTT
                           _model_suffix = 
                                      -f = d
                                      -p = 31416



Let's now modify the options, so the configuration block looks like this (10 bootstraps and CAT model):
```
[raxml_CAT_10]
                         _desc = RAxML with default parameters, CAT with 10 bootstraps.
                          _app = raxml
                    _bootstrap = 10        # we will use 10 replicates to keep the example fast 
                       _method = CAT
                     _aa_model = JTT
                 _model_suffix = 
                            -f = d
                            -p = 31416
```

In [22]:
%%bash 
echo '
[raxml_CAT_10]
                         _desc = RAxML with default parameters, CAT with 10 bootstraps.
                          _app = raxml
                    _bootstrap = 10
                       _method = CAT
                     _aa_model = JTT
                 _model_suffix = 
                            -f = d
                            -p = 31416
' > myconfig.cfg
cat myconfig.cfg


[raxml_CAT_10]
                         _desc = RAxML with default parameters, CAT with 10 bootstraps.
                          _app = raxml
                    _bootstrap = 10
                       _method = CAT
                     _aa_model = JTT
                 _model_suffix = 
                            -f = d
                            -p = 31416
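
The same edit can also be scripted instead of typed by hand. The sketch below is a hypothetical helper, not an `ete-build` feature: `parse_block` and `render_block` are names invented for this example, and the parser only handles a single block of plain `key = value` lines like the dump shown above. The dumped text is inlined so that the example is self-contained.

```python
def parse_block(text):
    """Parse one '[name]' block of 'key = value' lines into (name, options)."""
    name, options = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
        elif "=" in line:
            key, _, value = line.partition("=")
            options[key.strip()] = value.strip()
    return name, options


def render_block(name, options):
    """Write the block back as text, right-aligning the option names."""
    width = max(len(key) for key in options)
    lines = ["[%s]" % name]
    lines += ["%s = %s" % (key.rjust(width), value) for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Text as produced by `ete3 build dump raxml_default` in the cell above.
dumped = """
[raxml_default]
    _desc = RAxML with default parameters, GAMMA JTT/GTR and aLRT branch supports.
    _app = raxml
    _bootstrap = alrt
    _method = GAMMA
    _aa_model = JTT
    _model_suffix =
    -f = d
    -p = 31416
"""

name, options = parse_block(dumped)

# Override only the options that differ from the dumped defaults.
options.update({
    "_desc": "RAxML with default parameters, CAT with 10 bootstraps.",
    "_bootstrap": "10",
    "_method": "CAT",
})

new_block = render_block("raxml_CAT_10", options)
print(new_block)
```

Writing `new_block` to `myconfig.cfg` would give a file equivalent to the one created with `echo` above.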



### 3. Using our custom configuration blocks 

Once you have created all your personal configuration blocks, you can use them to run custom workflows by passing your configuration file with the `-c` option of `ete-build`.

Note that this example also enables 5 CPUs; adjust this parameter to suit your system.
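
The workflow name passed to `-w` in the command that follows joins four block names with hyphens, which is why block names themselves must not contain hyphens. As a rough sketch, the command line can be assembled in Python as shown here. The function names are hypothetical, and reading the four slots as aligner, trimmer, model tester and tree builder is an assumption based on the usual `ete-build` gene tree workflow naming.

```python
import shlex


def workflow_name(aligner, trimmer, tester, builder):
    """Join four block names into a workflow name, rejecting hyphens."""
    parts = [aligner, trimmer, tester, builder]
    for part in parts:
        if "-" in part:
            raise ValueError("hyphen in block name: %r" % part)
    return "-".join(parts)


def build_command(fasta, outdir, workflow, config, cpus):
    """Return the ete3 build call as an argument list (nothing is executed)."""
    return [
        "ete3", "build",
        "-a", fasta,
        "--clearall",
        "-o", outdir,
        "-w", workflow,
        "-c", config,
        "--cpu", str(cpus),
    ]


# Our custom block takes the last slot; the two middle steps are disabled.
wf = workflow_name("mafft_linsi", "none", "none", "raxml_CAT_10")
cmd = build_command("data/NUP62.aa.fa", "raxml_cat/", wf, "myconfig.cfg", 5)

# Print a shell-ready version of the command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list in `cmd` could be handed to `subprocess.run` to launch the build from a script.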

In [24]:
%%bash
ete3 build -a data/NUP62.aa.fa --clearall -o raxml_cat/ -w mafft_linsi-none-none-raxml_CAT_10 -c myconfig.cfg --cpu 5

Toolchain path: /Users/jhc/anaconda/bin/ete3_apps 
Toolchain version: 2.0.3


      --------------------------------------------------------------------------------
                  ETE build - reproducible phylogenetic workflows 
                                    unknown, unknown.

      If you use ETE in a published work, please cite:

        Jaime Huerta-Cepas, Joaquín Dopazo and Toni Gabaldón. ETE: a python
        Environment for Tree Exploration. BMC Bioinformatics 2010,
        11:24. doi:10.1186/1471-2105-11-24

      (Note that a list of the external programs used to complete all necessary
      computations will be also shown after execution. Those programs should
      also be cited.)
      --------------------------------------------------------------------------------

    
[32mINFO[0m -  Testing x86-64  portable applications...
       clustalo: [32mOK[0m - 1.2.1
[33mDialign-tx not supported in OS X[0m
       fasttree: [32mOK[0m - FastTree Version 2.1.8 Double 

After a few minutes a tree based on 10 RAxML bootstrap replicates using CAT should be ready. You can now load and analyze the tree.

In [33]:
from ete3 import Tree
cat_tree = Tree("raxml_cat/mafft_linsi-none-none-raxml_CAT_10/NUP62.aa.fa.final_tree.nw")
print(cat_tree.get_ascii(attributes=["support", "name"]))


          /-1.0, Phy004OQ34_STRCA
     /1.0, 
    |    |     /-1.0, Phy004PA1B_ANAPL
    |     \1.0, 
    |         |     /-1.0, Phy0054BO3_MELGA
    |          \1.0, 
    |               \-1.0, Phy003I7ZJ_CHICK
    |
    |               /-1.0, Phy00508FR_NIPNI
-1.0,          /0.3, 
    |         |     \-1.0, Phy0050IUO_OPIHO
    |     /0.1, 
    |    |    |     /-1.0, Phy004Y35P_HALLE
    |    |     \1.0, 
    |    |          \-1.0, Phy004XRVA_HALAL
    |    |
    |    |          /-1.0, Phy004W8WJ_FALPE
    |    |     /0.9, 
     \1.0,    |     \-1.0, Phy004W8WI_FALPE
         |    |
         |    |                         /-1.0, Phy004OLZM_COLLI
         |    |                    /0.9, 
         |    |                   |     \-1.0, Phy004OLZN_COLLI
         |    |               /0.3, 
         |    |              |    |     /-1.0, Phy004V34S_CORBR
         |    |              |     \0.4, 
          \0.0,          /0.1,          \-1.0, Phy004Y9VQ_LEPDC
              |         |    |