In this notebook we prepare Bowtie2, STAR and SALMON indexes for the X. laevis genome v9.2 using GFF3 annotations from Xenbase, hereafter referred to as Xenla.
Please refer to the notebook called pipeline_prepare-annotations_XenopusLaevis for instructions on how to generate the transcriptome fasta file, as well as how to generate the annotation files used by the ChAR-seq pipelines.

## Make a bowtie index

In [None]:
# Let's say we downloaded the Xenla genome in $GENOMES_ROOT/xenopus_laevis/v9.2

GENOMES_ROOT="<root_folder_of_genomes>"
cd "${GENOMES_ROOT}/xenopus_laevis/v9.2"
mkdir -p bowtie_index
bowtie2-build --threads 8 XL9_2.fa.gz bowtie_index/genome
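
A quick way to confirm that the build finished is to look for the six files Bowtie2 writes for an index prefix. The sketch below is only a convenience check; the helper name `check_bt2_index` is made up for this notebook and is not part of Bowtie2. It accepts either the `.bt2` suffix (small index) or the `.bt2l` suffix (large index, used for genomes above roughly 4 billion nucleotides).

```bash
# Hypothetical helper: verify that a Bowtie2 index prefix has all six
# expected, non-empty files, with either the .bt2 or the .bt2l suffix.
check_bt2_index() {
    local prefix="$1" ext part missing
    for ext in bt2 bt2l; do
        missing=0
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -s "${prefix}.${part}.${ext}" ] || missing=1
        done
        if [ "$missing" -eq 0 ]; then
            echo "OK: complete .${ext} index at ${prefix}"
            return 0
        fi
    done
    echo "Incomplete or missing index at ${prefix}" >&2
    return 1
}

# Usage, with the prefix built above:
# check_bt2_index bowtie_index/genome
```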

## Make a STAR index


We previously (notebook 01) downloaded the gff file.
```bash

cd "${GENOMES_ROOT}/xenopus_laevis/v9.2/annotations_xenbase"

```

Normally, STAR generates the splice junction and transcript annotation database by parsing the annotation file (GTF by default) and looking for the "exon" features, which are then assigned to specific transcripts using the parent-child relationship indicated in the "transcript_id" attribute of each exon.

This STAR behavior is determined by the arguments `--sjdbGTFfeatureExon exon` and `--sjdbGTFtagExonParentTranscript transcript_id`, which are the defaults.

We need to modify this to account for the idiosyncrasy of the Xenbase annotation file, where the parent is given by the attribute "Parent".

Therefore, our STAR index-building command should be:

```bash
mkdir -p STAR_index

STAR --runMode genomeGenerate --runThreadN 12 --genomeDir STAR_index --genomeFastaFiles ../XL9_2.fa --sjdbGTFfile XENLA_9.2_Xenbase.gff3 --sjdbOverhang 150 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFfeatureExon exon --genomeChrBinNbits 12
```
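
To see which attribute links exons to their transcripts in a given annotation file, one can tally the attribute keys found on the exon lines. The sketch below runs on a four-line toy GFF3 whose chromosome name, source column and IDs are made up for illustration; pointing `gff` at `XENLA_9.2_Xenbase.gff3` instead should show whether `Parent` (and not `transcript_id`) is present on exons.

```bash
# Toy GFF3 records (hypothetical names and coordinates), for illustration only.
gff=$(mktemp)
printf 'chr1L\tXenbase\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=toy\n' > "$gff"
printf 'chr1L\tXenbase\tmRNA\t100\t900\t.\t+\t.\tID=rna1;Parent=gene1\n' >> "$gff"
printf 'chr1L\tXenbase\texon\t100\t300\t.\t+\t.\tID=exon1;Parent=rna1\n' >> "$gff"
printf 'chr1L\tXenbase\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=rna1\n' >> "$gff"

# Tally the attribute keys (column 9) on lines whose feature type (column 3) is "exon".
exon_keys=$(awk -F'\t' '$3 == "exon" {
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) { split(attrs[i], kv, "="); count[kv[1]]++ }
} END { for (k in count) print k, count[k] }' "$gff" | sort)

echo "$exon_keys"
```

If the output lists `Parent` but no `transcript_id`, the `--sjdbGTFtagExonParentTranscript Parent` setting used above is the one that matches the file.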

## Make a STAR index for genebodies

In tagtools, exons take priority over introns. When a read doesn't align to any exonic portion of any annotated transcript, tagtools checks whether the alignment overlaps any intron (or intron/exon junction). To do that, we use a trick: we run STAR with an index in which the "transcripts" are the gene bodies rather than true transcripts. This can be done with the same GFF3 file by setting `--sjdbGTFtagExonParentTranscript ID --sjdbGTFfeatureExon gene` when we build the index.

```bash
mkdir -p STAR_index_GENEBODIES

STAR --runMode genomeGenerate --runThreadN 12 --genomeDir STAR_index_GENEBODIES --genomeFastaFiles ../XL9_2.fa --sjdbGTFfile XENLA_9.2_Xenbase.gff3 --sjdbOverhang 150 --sjdbGTFtagExonParentTranscript ID --sjdbGTFfeatureExon gene --genomeChrBinNbits 12
```
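
The priority rule described above (exon first, then gene body, otherwise nothing) can be illustrated with a toy classifier. This is not how tagtools is implemented, since the real pipeline relies on the two STAR indexes; the function name `classify_position` and the intervals are invented purely to show the order in which the checks apply.

```bash
# Toy illustration of the priority rule: a position inside any exon is "exon";
# otherwise, a position inside a gene body is "intron"; otherwise "intergenic".
# Intervals are made-up, 1-based and inclusive.
classify_position() {
    local pos="$1"
    awk -v pos="$pos" '
        $1 == "exon" && pos >= $2 && pos <= $3 { in_exon = 1 }
        $1 == "gene" && pos >= $2 && pos <= $3 { in_gene = 1 }
        END {
            if (in_exon)      print "exon"
            else if (in_gene) print "intron"
            else              print "intergenic"
        }' <<'EOF'
gene 100 900
exon 100 300
exon 600 900
EOF
}

classify_position 200    # falls inside the first exon
classify_position 450    # inside the gene body, between the two exons
classify_position 1000   # outside the gene
```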

## Make SALMON index

This makes a salmon index for the genebodies (includes introns).
```bash

cd "${GENOMES_ROOT}/xenopus_laevis/v9.2"
mkdir -p SALMON_index_v0.14
cd SALMON_index_v0.14

salmon index -t ../XENLA_9.2_transcriptome.fa -i k21_index -k 21 -p 6
```

We also want (mostly, in fact) a salmon index for the genes (exons only). For this we need to use the other transcriptome file:
```bash
salmon index -t ../XENLA_9.2_transcriptome_NOINTRONS_fastcompute.fa -i k21_index_exonsONLY -k 21 -p 12
```

Finally, there are solo exons (whose parents are genes, not RNAs) that we can incorporate in yet another index:
```bash
salmon index -t ../XENLA_9.2_transcriptome_NOINTRONS_fastcompute_ALL.fa -i k21_index_exonsONLY_ALL -k 21 -p 12
```
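
Since three different transcriptome files are indexed here, a quick sanity check on each FASTA before indexing can help catch a mix-up. The sketch below counts records and reports repeated sequence names; it runs on a toy FASTA with made-up names, and `fa` would be replaced by one of the transcriptome files above.

```bash
# Toy transcriptome FASTA (hypothetical names), standing in for a real file.
fa=$(mktemp)
printf '>rna1\nACGTACGTAC\n>rna2\nGGGTTTAAAC\n>rna1\nACGTACGTAC\n' > "$fa"

# Number of records: one header line (starting with ">") per sequence.
n_records=$(grep -c '^>' "$fa")

# Names that occur more than once (first whitespace-delimited token of each header).
dup_names=$(grep '^>' "$fa" | cut -c2- | awk '{print $1}' | sort | uniq -d)

echo "records: ${n_records}"
echo "duplicated names: ${dup_names:-none}"
```

Comparing the record counts of the three files is a cheap way to confirm that the `_ALL` file indeed contains the extra solo exons.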