##### "Entities are not to be multiplied without necessity" ---- William of Ockham

### Background: Why is this document important

The quote above translates into DRY (Don't Repeat Yourself, see this [blog](https://www.codementor.io/joshuaaroke/dry-code-vs-wet-code-89xjwv11w)) in programming, as opposed to WET (Write Everything Twice). As I complained recklessly to Fu, many scripts on the server have been written more than twice, which makes maintenance very problematic. For the sake of future generations and personal development, I have created this document as a tutorial on managing dependencies from the bash command line.

The central idea is to have a single authoritative copy of the utility/workhorse code that is re-used all the time. This utility should contain ONLY the logic w.r.t. the specific data structure (e.g. transcriptomics matrix, ChIP-Seq tracks), and NOT the variables (e.g. the species-dependent genome index). This way, we can reuse the same script for all species, which avoids loads of debugging and testing and hence speeds up the research.
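
As a minimal sketch of this separation, the utility below holds only the logic, while the caller supplies the species-dependent value. The function name `describe_mapping` and the index paths are hypothetical and are not taken from the existing repositories.

```sh
# Hypothetical utility: logic only, no species-specific values hard-coded.
# Usage: describe_mapping <genome_index> <fastq_file>
describe_mapping() {
    index="${1:-}"
    fastq="${2:-}"
    if [ -z "$index" ] || [ -z "$fastq" ]; then
        echo "usage: describe_mapping <genome_index> <fastq_file>" >&2
        return 1
    fi
    # A real pipeline would call its mapper here; this sketch only prints the plan.
    echo "map ${fastq} against ${index}"
}

# The species-dependent variable lives with the caller, not in the utility.
describe_mapping /path/to/species_A_index sample1.fastq
describe_mapping /path/to/species_B_index sample1.fastq
```

The same function serves both species, so a bug fixed once is fixed everywhere.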

One of the major instruments for writing DRY code is OOP (object-oriented programming, see this [tutorial](https://python.swaroopch.com/oop.html)), which is rather handy for dealing with abstract data structures. But this will be the topic of another document.

### Pipelines are written in the following languages:

  1. bash

    1. utility scripts to be executed with "source" or "."

        * e.g.: `source /home/feng/repos/BrachyPhoton/util.sh`

    1. functional scripts to be executed with bash (the leading `bash` can be omitted)

        * e.g.: `(bash) /home/feng/repos/BrachyPhoton/pipeline_rnaseq/pipeline_mapper.sh --help`

  1. python2

    1. directly executable functional scripts (the leading `python2` can be omitted)

        * e.g.: `(python2) preprocessor.py <path_to_fastq_folder>`

    1. utility modules that can be imported once installed

        * e.g.: `python -c 'import pymisca.util as pyutil'`

  1. others
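
The difference between sourcing a utility script and executing a functional script can be sketched as follows. The throwaway script, the function `greet`, and the variable `DEMO_VAR` are made up for illustration only.

```sh
# Write a tiny throwaway script that defines a function and a variable.
demo_script="$(mktemp)"
cat > "$demo_script" <<'EOF'
greet() { echo "hello from greet"; }
DEMO_VAR="set by script"
EOF

# Executing it runs in a child process: nothing is left behind in this shell.
bash "$demo_script"
echo "after bash:   DEMO_VAR=[${DEMO_VAR:-}]"

# Sourcing it runs in the current shell: the function and variable persist.
. "$demo_script"
echo "after source: DEMO_VAR=[${DEMO_VAR:-}]"
greet
rm -f "$demo_script"
```

This is why utility scripts are sourced (their definitions must stay in your shell), while functional scripts can simply be executed.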



## Dependencies:

### Python

#### install

To install a python package locally, say `matplotlib`, just do 

```sh
pip install --user matplotlib

### or call the Python 2 pip explicitly
pip2 install --user matplotlib
```

#### `pymisca` is Feng's python2 utilities

  * There is a local copy on the cluster at `/home/feng/repos/pymisca`

  * The remote is hosted on http://github/shouldsee/pymisca but may lag behind

  * Install similarly with
  
  ```sh
  pip install --user /home/feng/repos/pymisca
  
  ### Check that it is properly installed
  pip show pymisca
  ```
  

### Bash 

* Bash utilities can be further divided into strong-sense and weak-sense ones

* A weak-sense utility is essentially a collection of bash functions

* A strong-sense utility is better sourced each time you use it, due to its highly dynamic nature (e.g. species genome index, pipeline to be run, etc.)

#### Bash functions

```sh
function foo()
{
    echo Bar bar bar chip chip rnaseq > foo.bar
    return 0
}
```

These functions are called extensively in my functional bash scripts, so bash needs them defined before running. You can check whether a function `foo()` is defined with `type foo`. You will get an error if bash cannot find it:

```sh
bash: type: foo: not found
```
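
The same check can be turned into a guard at the top of a script. This is only a sketch: `require_defined` is a hypothetical helper, not something provided by `util.sh`.

```sh
# Define a stand-in function so the sketch runs on its own.
foo() {
    echo "Bar bar bar"
}

# Hypothetical helper: fail loudly when a name cannot be resolved.
# Note: `type` also succeeds for builtins, aliases and programs on $PATH,
# not only for functions.
require_defined() {
    if ! type "$1" >/dev/null 2>&1; then
        echo "missing: $1 (did you source util.sh?)" >&2
        return 1
    fi
}

require_defined foo && foo
require_defined no_such_function_xyz || echo "no_such_function_xyz is unavailable"
```

Failing early with a readable message is usually easier to debug than a `command not found` halfway through a pipeline.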

#### Weak-sense utility

You can load a script of function definitions with 

```sh
source /home/feng/repos/BrachyPhoton/util.sh
## this variable contains the path of the util.sh
echo $UTIL
```
You can ask the bash shell to load it every time you log in by putting the command in `$HOME/.bash_profile`. I found `nano` to be the simplest command-line editor for me, but feel free to prefer vi/vim/emacs. E.g.

```sh
### edit the file by hand ...
nano $HOME/.bash_profile
nano ~/.bash_profile ### $HOME is synonymous with ~
### ... or append the line directly
echo source /home/feng/repos/BrachyPhoton/util.sh >> ~/.bash_profile
```
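
Running the `echo ... >>` line more than once adds the same `source` command several times. One way to avoid that, sketched here with a hypothetical helper `append_once` and a temporary file standing in for `~/.bash_profile`:

```sh
# Hypothetical helper: append a line to a file only if it is not already there.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$line" "$file" || echo "$line" >> "$file"
}

# Demonstrated on a temporary file; point it at ~/.bash_profile for real use.
demo_profile="$(mktemp)"
append_once 'source /home/feng/repos/BrachyPhoton/util.sh' "$demo_profile"
append_once 'source /home/feng/repos/BrachyPhoton/util.sh' "$demo_profile"
cat "$demo_profile"
```

Although `append_once` is called twice, the file ends up with a single `source` line.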

#### Strong-sense utility, also known as "environments"

Apart from functions, a bash shell carries environment variables with it, which can be listed with `env`. To inspect specific ones, do `echo $IDX_BOWTIE2 $BASH $FOO $BAR` etc. An environment modifies some magic variables like `$PATH  $LD_LIBRARY_PATH  $PYTHONPATH` so that bash knows where to find programs to use.
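
A small sketch of how such variables behave may help. The directory `/opt/mytools/bin` is made up for illustration, and `FOO`/`BAR` are just placeholder names.

```sh
# Start from a clean slate so the demonstration is reproducible.
unset FOO BAR

# A plain shell variable is visible only in the current shell.
FOO="only here"
# An exported variable is inherited by child processes (it shows up in `env`).
export BAR="passed on"

sh -c 'echo "child sees FOO=[$FOO] BAR=[$BAR]"'

# Prepending a directory to PATH makes programs in it take priority.
# /opt/mytools/bin is a made-up directory used for illustration.
PATH="/opt/mytools/bin:$PATH"
export PATH
echo "$PATH" | cut -d: -f1
```

The child process sees `BAR` but not `FOO`, which is why an environment has to `export` whatever it wants pipelines to pick up.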

Environments are currently listed in `/home/feng/envs`. To activate any of them, say "ref", do

```sh
source /home/feng/envs/ref/bin/activate

### Magic variable indicating the currently loaded environment
echo $ENVDIR
```
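
A functional script could refuse to run when no environment is loaded. The sketch below assumes that `$ENVDIR` is empty or unset before activation and non-empty afterwards; `require_env` is a hypothetical helper, and the value assigned to `ENVDIR` only simulates what `activate` might set.

```sh
# Hypothetical guard for the top of a functional script:
# refuse to run unless an environment has been activated.
require_env() {
    if [ -z "${ENVDIR:-}" ]; then
        echo "no environment loaded; source <env>/bin/activate first" >&2
        return 1
    fi
    echo "using environment: $ENVDIR"
}

ENVDIR=""                      # simulate: nothing activated
require_env || echo "guard tripped"

ENVDIR="/home/feng/envs/ref"   # simulate an activated environment (assumed value)
require_env
```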



