# Data Preparation: RNA-seq Reads

## Trimming Reads to Remove Sequencing Adapters

For each replicate, define the path to the input files and the path where the output should be saved before running the trimming step. The jobs are submitted with qsub, so all tasks are launched at the same time.

In [1]:
%%bash

pathBunchFiles='.../RNA_Seq_Data/' # Path to find all the RNA seq samples
outputPath='.../RNA_Seq_Data/Trimmed'
qsubOutput='.../qsub_outputs'
logs='.../logs'

sh ../../../Scripts/1_Data_Preparation/RNA_seq/Trimming/1_trimmBunchSeqFiles.sh "$pathBunchFiles" "$outputPath" "$qsubOutput" "$logs"
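
The batch script itself is not shown in this notebook. As a rough sketch of the pattern described above (one qsub job per FASTQ file, all submitted at once), something like the following could be used. The per-sample script `trim_one_sample.sh`, the `*.fastq` glob, the `_trimmed` output suffix, and the `DRY_RUN` switch are assumptions made for illustration. `-N`, `-o`, and `-e` are the usual qsub options for job name, stdout, and stderr, and passing script arguments directly after the script name assumes an SGE-style qsub.

```bash
# Hypothetical sketch: submit one trimming job per FASTQ file.
# trim_one_sample.sh is an assumed per-sample script, not part of this notebook.
submit_trim_jobs() {
    inputDir=$1; outputDir=$2; qsubDir=$3; logDir=$4
    for fastq in "$inputDir"/*.fastq; do
        [ -e "$fastq" ] || continue    # the glob matched nothing
        sample=$(basename "$fastq" .fastq)
        # Hold the command in the positional parameters so paths with spaces stay intact.
        set -- qsub -N "trim_${sample}" -o "$qsubDir" -e "$logDir" \
            trim_one_sample.sh "$fastq" "$outputDir/${sample}_trimmed.fastq"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$*"                  # print the command instead of submitting it
        else
            "$@"
        fi
    done
}
```

With the default `DRY_RUN=1` the function only prints the qsub commands, which makes it easy to review them before submitting with `DRY_RUN=0 submit_trim_jobs "$pathBunchFiles" "$outputPath" "$qsubOutput" "$logs"`.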


## Aligning Reads to the Genome

Reads were aligned end-to-end with the STAR aligner against the human genome (hg38).

Input Files
```bash
"""
genomeDirectory : path
    Path to the index of the Genome
outputFilterMatchInt :  int
    Alignment will be output only if the number of matched bases is higher than or equal to this value
inputFilesR1_1: path
    Path to files that contain input reads : R1 - Replicate 1
inputFilesR1_2 : path
    Path to files that contain input reads : R1 - Replicate 2
inputFilesR2_1: path
    Path to files that contain input reads : R2 - Replicate 1
inputFilesR2_2 : path
    Path to files that contain input reads : R2 - Replicate 2
outputFile : path
    Path (file name prefix) where the output will be stored
nameTask : string
    Name of the qsub task, used to identify the job
saveOutputQsub : path
    Path to save qsub output 
logPath : path
    Path to save log output 
"""
```
Output Files
```bash
"""
    All STAR alignment output will be stored under the prefix $outputFile
    e.g. the BAM file
"""
```
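
Because the alignment script takes ten positional arguments, a mistyped path or a non-numeric threshold is easy to miss until the job fails on the cluster. The following is a minimal sketch of a pre-flight check that could be run before submission. The function name `check_alignment_args` is hypothetical, and the checks (index directory exists, threshold is a non-negative integer, inputs are readable) are assumptions about what is worth validating, not something the original script is known to do.

```bash
# Hypothetical pre-flight check for the alignment arguments documented above.
check_alignment_args() {
    genomeDirectory=$1; outputFilterMatchInt=$2; shift 2
    [ -d "$genomeDirectory" ] || { echo "missing genome index: $genomeDirectory" >&2; return 1; }
    case $outputFilterMatchInt in
        ''|*[!0-9]*) echo "outputFilterMatchInt must be a non-negative integer" >&2; return 1 ;;
    esac
    # The remaining arguments are treated as the FASTQ inputs.
    for fastq in "$@"; do
        [ -r "$fastq" ] || { echo "unreadable input: $fastq" >&2; return 1; }
    done
}
```

A call such as `check_alignment_args "$genomeDirectory" "$outputFilterMatchInt" "$inputFilesR1_1" "$inputFilesR1_2" "$inputFilesR2_1" "$inputFilesR2_2" || exit 1` placed before the `sh` line would stop the cell early when something is wrong.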

In [4]:
%%bash

# For each sample do:

echo "Alignment to the genome -- RNA "
genomeDirectory='../../../Data_Input_Scripts/IndexStarRNA_Seq/'
outputFilterMatchInt=40

inputFilesR1_1='.../RNA_Seq_Data/Trimmed/R1_trimmed_repl1.fastq'
inputFilesR1_2='.../RNA_Seq_Data/Trimmed/R1_trimmed_repl2.fastq'

inputFilesR2_1='.../RNA_Seq_Data/Trimmed/R2_trimmed_repl1.fastq'
inputFilesR2_2='.../RNA_Seq_Data/Trimmed/R2_trimmed_repl2.fastq'

outputFile='.../RNA_Seq_Data/Data_Preparation/AlignmentReadsGenome/RNA/RNA_'
nameTask='Mapping_RNA_Reads'
saveOutputQsub='.../qsub_outputs'
logPath='.../logs'

sh ../../../Scripts/1_Data_Preparation/RNA_seq/Alignment/alignmentPairedEnd_RNA_Seq.sh "$genomeDirectory" "$outputFilterMatchInt" "$inputFilesR1_1" "$inputFilesR1_2" "$inputFilesR2_1" "$inputFilesR2_2" "$outputFile" "$nameTask" "$saveOutputQsub" "$logPath"


Alignment to the genome -- RNA 
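
The wrapper script is not reproduced here, so the exact STAR invocation is unknown. As a sketch of what it might build from the parameters above: `--alignEndsType EndToEnd` matches the end-to-end alignment mentioned earlier, `--outFilterMatchNmin` is the STAR option whose description matches `outputFilterMatchInt`, and comma-separated lists in `--readFilesIn` are how STAR accepts several files per mate. The function name `build_star_command` and the choice of flags are assumptions. The function only prints the command and does not run STAR.

```bash
# Hypothetical sketch of the STAR call that the wrapper script might build.
build_star_command() {
    genomeDirectory=$1; outputFilterMatchInt=$2
    inputFilesR1_1=$3; inputFilesR1_2=$4
    inputFilesR2_1=$5; inputFilesR2_2=$6
    outputFile=$7
    # Replicates are joined with commas; the R1 and R2 lists are separated by a space.
    echo STAR \
        --genomeDir "$genomeDirectory" \
        --readFilesIn "${inputFilesR1_1},${inputFilesR1_2}" "${inputFilesR2_1},${inputFilesR2_2}" \
        --alignEndsType EndToEnd \
        --outFilterMatchNmin "$outputFilterMatchInt" \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix "$outputFile"
}
```

With `--outSAMtype BAM SortedByCoordinate` and a prefix ending in `RNA_`, STAR names the alignment file `RNA_Aligned.sortedByCoord.out.bam`, which is the file name used in the indexing step.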


## Indexing the BAM File

Input Files
```bash
"""
inputFile : path
    Path to the BAM file
outputFile : path
    Path where the output index will be stored
nameTask : string
    Name of the qsub task
saveOutputQsub : path
    Path to save qsub output 
logPath : path
    Path to save log output 
"""
```
Output File
```bash
"""
    The BAI index will be stored in $outputFile
"""
```

In [3]:
%%bash

echo 'Index For Bam File RNA'
inputFile='.../RNA_Seq_Data/Data_Preparation/AlignmentReadsGenome/RNA/RNA_Aligned.sortedByCoord.out.bam'
outputFile='.../RNA_Seq_Data/Data_Preparation/AlignmentReadsGenome/RNA/RNA_Aligned.sortedByCoord.out.bam.bai'
nameTask='IndexForBAMFile_RNA'
saveOutputQsub='.../qsub_outputs/'
logPath='.../logs/'

sh ../../../Scripts/1_Data_Preparation/RNA_seq/Alignment/creationIndexBamFile.sh "$inputFile" "$outputFile" "$nameTask" "$saveOutputQsub" "$logPath"


Index For Bam File RNA
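
The indexing script is likewise not shown. A minimal sketch of the core step, assuming it relies on `samtools index` (which takes the BAM file and an optional output index path, and requires a coordinate-sorted BAM such as the `sortedByCoord` file above), could look like this. The function name `index_bam` and the `DRY_RUN` switch are hypothetical.

```bash
# Hypothetical sketch of the indexing step; samtools must be on PATH to run it for real.
index_bam() {
    inputFile=$1; outputFile=$2
    case $inputFile in
        *.bam) ;;
        *) echo "expected a .bam file: $inputFile" >&2; return 1 ;;
    esac
    set -- samtools index "$inputFile" "$outputFile"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # print the command instead of running it
    else
        "$@"
    fi
}
```

Running `index_bam "$inputFile" "$outputFile"` prints the command for review, and `DRY_RUN=0 index_bam "$inputFile" "$outputFile"` executes it.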
