# Treponema genome analysis workflow - Grillova et al. 2019

## Setting up the environment

### Requirements

The workflow has been tested on Linux machines (Ubuntu 14.04, Debian 9) with Python 2.7.6, Conda 4.5.11, Jupyter notebook 5.7.2 (installed under Python 3.4.3) and bash_kernel 0.7.1.

In [7]:
uname -a
python --version
python3 --version
conda --version
echo "Jupyter notebook" "$(jupyter notebook --version)"
echo "bash_kernel" "$(pip3 show bash_kernel | grep Version)"

Linux thor 4.4.0-140-generic #166~14.04.1-Ubuntu SMP Sat Nov 17 01:52:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Python 2.7.6
Python 3.4.3
conda 4.5.11
Jupyter notebook 5.7.2
bash_kernel Version: 0.7.1
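To check a machine against these requirements before starting, a small Bash helper can report which of the commands used in the version cell above are on the `PATH`, and whether a reported version meets a minimum. This is a minimal sketch: `check_tools` and `version_ge` are hypothetical helpers, not part of the original workflow, and `version_ge` assumes GNU `sort` with the `-V` (version sort) option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print which of the given commands are available.
# Returns 0 if all are found, 1 if any is missing.
check_tools() {
    local tool status=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical helper: succeed if version $1 is >= version $2.
# GNU sort -V orders dotted version strings numerically (4.5.2 < 4.5.11),
# so $1 >= $2 exactly when $2 sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: the commands queried in the version cell above.
check_tools python python3 conda jupyter pip3 || echo "Install the missing tools first."
```

For example, `version_ge "$(conda --version | cut -d' ' -f2)" 4.5.11` succeeds when the installed Conda is at least the tested version.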


### Jupyter notebook

We use the Jupyter notebook to run the analysis, so the first step is to install Jupyter itself. For compatibility with the Bash kernel, Jupyter has to be installed for Python 3.

In [None]:
python3 -m pip install --upgrade pip
python3 -m pip install jupyter

Once Jupyter is installed, install the [Bash kernel](https://pypi.org/project/bash_kernel/).

In [None]:
sudo pip3 install bash_kernel
sudo python3 -m bash_kernel.install
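To confirm that the kernel was registered, `jupyter kernelspec list` prints an `Available kernels:` header followed by one kernel per line (name, then path). The sketch below assumes that layout and looks for a kernel named `bash`; `has_kernel` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `jupyter kernelspec list` output on stdin and
# succeed if a kernel with the given name (default: bash) is listed.
has_kernel() {
    local name=${1:-bash}
    # Skip the header line; the first column of every other line is a kernel name.
    awk -v n="$name" 'NR > 1 && $1 == n { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example (only runs if Jupyter is installed):
if command -v jupyter >/dev/null 2>&1; then
    if jupyter kernelspec list | has_kernel bash; then
        echo "bash kernel is registered"
    else
        echo "bash kernel not found; re-run: sudo python3 -m bash_kernel.install"
    fi
fi
```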

Finally, launch the Jupyter notebook.

In [None]:
jupyter notebook

### Conda

You can either install all the tools separately or use a Conda environment. If you choose Conda (recommended), you first have to install Conda itself; please follow the instructions [here](https://conda.io/docs/user-guide/install/index.html). For example, on 64-bit Linux:

In [None]:
wget https://repo.continuum.io/archive/Anaconda2-2018.12-Linux-x86_64.sh
bash Anaconda2-2018.12-Linux-x86_64.sh
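On a remote server it can be convenient to run the installer non-interactively. Anaconda installers accept `-b` (batch mode, no prompts, license terms accepted) and `-p PREFIX` (installation directory). The sketch below wraps this in a hypothetical `install_anaconda` function that skips the step if the prefix directory already exists; the default prefix `$HOME/anaconda2` is an assumption, not the location used by the authors.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install Anaconda unattended into a prefix,
# skipping the step if that prefix already exists.
install_anaconda() {
    local installer=$1
    local prefix=${2:-$HOME/anaconda2}   # assumed default location
    if [ -d "$prefix" ]; then
        echo "Anaconda already present in $prefix, skipping."
        return 0
    fi
    if [ ! -f "$installer" ]; then
        echo "Installer not found: $installer" >&2
        return 1
    fi
    # -b: batch mode (no prompts); -p: installation prefix
    bash "$installer" -b -p "$prefix" || return 1
    echo "Add $prefix/bin to your PATH to use conda."
}
```

For example, `install_anaconda Anaconda2-2018.12-Linux-x86_64.sh "$HOME/anaconda2"` would replace the interactive call above. In batch mode the installer does not prompt about editing `~/.bashrc`, so you may need to run `export PATH="$HOME/anaconda2/bin:$PATH"` yourself.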

Once Conda is installed, create a dedicated environment for the tools and their dependencies.
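To make this setup safe to re-run, you can create the environment only when it does not exist yet. `conda env list` prints one environment per line with its name in the first column, after header lines starting with `#`. The hypothetical `env_listed` helper below is a sketch built on that layout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `conda env list` output on stdin and succeed if an
# environment with the given name is listed. The first column of each entry is
# the environment name; header lines start with '#' and never match a name.
env_listed() {
    awk -v n="$1" '$1 == n { found = 1 } END { exit (found ? 0 : 1) }'
}
```

For example, `conda env list | env_listed treponema || conda create --yes --name treponema python=2.7` creates the environment only when it is missing.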

In [None]:
# Create new conda environment
conda create --name treponema python=2.7

In [None]:
# Activate the environment and configure the channels
source activate treponema
conda config --add channels r
conda config --add channels conda-forge
conda config --add channels bioconda
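The order of these commands matters: `conda config --add channels NAME` puts `NAME` at the top of the channel list (moving it there if it is already present), so the channel added last gets the highest priority. By default `conda config` writes to the user's `~/.condarc`, so these channels apply to all environments, not only `treponema`. The sketch below only simulates the prepending in plain Bash, starting (as an assumption) from a list that contains just `defaults`; on a real installation `conda config --show channels` prints the actual list.

```bash
#!/usr/bin/env bash
# Simulation only, no conda involved: mimic `conda config --add channels NAME`,
# which puts NAME at the top of the list (moving it there if already present).
channels=(defaults)   # assumed starting list

add_channel() {
    local c
    local updated=("$1")
    for c in "${channels[@]}"; do
        if [ "$c" != "$1" ]; then
            updated+=("$c")
        fi
    done
    channels=("${updated[@]}")
}

# Same order as the cell above
add_channel r
add_channel conda-forge
add_channel bioconda
echo "priority (highest first): ${channels[*]}"
```

This prints `priority (highest first): bioconda conda-forge r defaults`.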

Most of the tools are available through Conda, so we install them from there.

In [None]:
# Install all the required tools using Conda
conda install -c bioconda fastqc reaper multiqc cutadapt=1.15 bbmap=37.52 samtools=1.4 seqtk=1.2 R=3.4.3 bwa=0.7.15 picard=2.9.2 gatk=3.7 ngsutils=0.5.9 qualimap=2.2.2a bcftools=1.4 vcftools=0.1.15 vcflib freebayes=0.9.21 spades=3.10.1 quast besst=2.2.7 busco=3.0.2 blast=2.2.31 snpeff=4.2
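Because several tools are pinned to specific versions, it can be worth confirming what Conda actually installed. `conda list` prints one package per line with name, version, build and channel columns (header lines start with `#`). The hypothetical `check_pins` helper below reads that output and compares it against `name=version` pairs. The names reported by `conda list` may differ from those on the install line (R, for instance, may be reported under a different name such as `r-base`), so check only packages whose names you have verified.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare installed versions against name=version pins.
# stdin: the output of `conda list`; arguments: pins such as bwa=0.7.15.
# Returns 1 if any pinned package is missing or has a different version.
check_pins() {
    local listing pin name want have status=0
    listing=$(cat)
    for pin in "$@"; do
        name=${pin%%=*}
        want=${pin#*=}
        # Column 1 is the package name, column 2 the installed version.
        have=$(printf '%s\n' "$listing" | awk -v n="$name" '$1 == n { print $2; exit }')
        if [ "$have" = "$want" ]; then
            echo "ok: $name $have"
        else
            echo "MISMATCH: $name wanted $want, installed ${have:-none}"
            status=1
        fi
    done
    return "$status"
}
```

For example: `conda list | check_pins bwa=0.7.15 samtools=1.4 picard=2.9.2 bcftools=1.4`.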

Two tools ([jvarkit](https://github.com/lindenb/jvarkit) and [StrainSeeker](http://bioinfo.ut.ee/strainseeker/)) are not available in Conda, so we have to install them separately.

In [None]:
# Install the tools that are not available through Conda
# (adjust install_dir to your system)
install_dir="/home/jan/Tools"
mkdir -p "$install_dir"

# StrainSeeker and its database
cd "$install_dir"
mkdir -p strainseeker
cd strainseeker/
wget http://bioinfo.ut.ee/strainseeker/downloads/seeker.pl
wget http://bioinfo.ut.ee/strainseeker/downloads/builder.pl
wget http://bioinfo.ut.ee/strainseeker/executables/ss_db_w32_4324.tar.gz
wget http://bioinfo.ut.ee/strainseeker/downloads/ss_helper_scripts.tar.gz
# Unpack the database and the helper scripts
tar xvzf ss_db_w32_4324.tar.gz
tar xvzf ss_helper_scripts.tar.gz

# Jvarkit: sortsamrefname and biostar154220 (BAM downsampling)
cd "$install_dir"
wget https://github.com/lindenb/jvarkit/archive/v2018.04.05.tar.gz
tar xvzf v2018.04.05.tar.gz
cd jvarkit-2018.04.05/
make sortsamrefname
make biostar154220
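When several archives are downloaded into one directory, as for StrainSeeker above, a small helper can extract all of them in one go. The sketch below is a hypothetical `unpack_all` function, not part of the original workflow; it extracts every `*.tar.gz` in the given directory into that same directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract every *.tar.gz archive in a directory into that
# same directory. Returns 1 if any extraction fails.
unpack_all() {
    local dir=${1:-.} archive status=0
    for archive in "$dir"/*.tar.gz; do
        # Without a match the glob stays literal, so skip non-existent paths.
        [ -e "$archive" ] || continue
        echo "extracting $archive"
        tar -xzf "$archive" -C "$dir" || status=1
    done
    return "$status"
}
```

For example, running `unpack_all .` inside the `strainseeker` directory extracts both the database and the helper scripts.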

## Running the workflow

### Inputs

### Outputs