<a href="https://colab.research.google.com/github/marcexpositg/CRISPRed/blob/master/01.DescriptiveAnalysis/1.1.SequencingDataProcessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.1. Sequencing data processing

This notebook contains the scripts used to prepare the sequencing data (1.1. Trimming) and to align the samples to the reference genome (1.2. Genomic alignment). Neither script can be executed here because both were run on a remote server.

## 1.1. Trimming

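Trimming was performed on all sequencing samples. The script runs Trimmomatic in paired-end mode to trim bases with a Phred quality score below 3 from both ends of each read (`LEADING:3`, `TRAILING:3`) and to discard reads shorter than 36 bases after trimming (`MINLEN:36`).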

```
[mexposit@mr-login Muscle-editing_library]$ cat trimming.sh
#!/bin/bash

#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --time=10:00:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=###
#SBATCH --mem=60000

#SBATCH -e stderr_filt_%j.err
#SBATCH -o stdout_filt_%j.out


### LOAD INITIAL MODULES ###

module load Trimmomatic
module load PEAR
module load BLAT

### SAVE VARIABLES ###
prefix=$1

raw=${prefix}*.fastq
R1=(${prefix}*R1_001.fastq)
R2=(${prefix}*R2_001.fastq)
sample=(`ls ${R1[@]} | rev | cut -d'/' -f 1 | rev | cut -d'_' -f 1`)

### PIPELINE ###

if [ ! -d "Trimmed" ]; then
       mkdir Trimmed
fi
for i in "${!R1[@]}"; do
       java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar PE -phred33 ${R1[$i]} ${R2[$i]} Trimmed/${sample[$i]}_R1_qfilt.fastq Trimmed/${sample[$i]}_R1_qfilt_unpaired.fastq Trimmed/${sample[$i]}_R2_qfilt.fastq Trimmed/${sample[$i]}_R2_qfilt_unpaired.fastq LEADING:3 TRAILING:3 MINLEN:36
done

```
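The `sample` array in the script above is built from the R1 file names by taking the last path component and keeping the text before the first underscore. A minimal sketch of that step, using bash parameter expansion instead of `ls | rev | cut`, might look like the following. The function name `sample_name` and the example file name are hypothetical and not part of the original pipeline.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): derive the sample ID
# from a FASTQ path, mirroring
#   rev | cut -d'/' -f 1 | rev | cut -d'_' -f 1
sample_name() {
    local base="${1##*/}"          # strip the directory part
    printf '%s\n' "${base%%_*}"    # keep the text before the first underscore
}

# Example with a made-up file name
sample_name "raw/12_S3_L001_R1_001.fastq"
```

This avoids parsing the output of `ls`, so it also behaves predictably for paths that contain spaces.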

## 1.2. Genomic alignment

Alignment to the C3H reference genome was performed only for the shotgun sequencing samples. The script uses BWA to index the reference genome and align each sample to it (using both the R1 and R2 reads). It then uses SAMtools to convert each alignment to BAM (binary) format, sort it, and index it, so that it can be easily inspected in a genome browser such as IGV.

```
### LOAD INITIAL MODULES ###
module load BWA/0.7.17-foss-2016b
module load SAMtools/1.9-HTSlib-1.8-foss-2016b

### SAVE VARIABLES ###
prefix=$1

raw=${prefix}*.fastq
R1=(${prefix}*R1_qfilt.fastq)
R2=(${prefix}*R2_qfilt.fastq)
sample=(`ls ${R1[@]} | rev | cut -d'/' -f 1 | rev | cut -d'_' -f 1`)

### PIPELINE ###

# Create the output directory if needed, then index the genome and map the reads
mkdir -p Alignments_C3H
bwa index GCA_001632575.1_C3H_HeJ_v1.fa
for i in "${!R1[@]}"; do
        bwa mem -M -t 16 GCA_001632575.1_C3H_HeJ_v1.fa Trimmed/${sample[$i]}_R1_qfilt.fastq Trimmed/${sample[$i]}_R2_qfilt.fastq > Alignments_C3H/${sample[$i]}_C3H_aln.sam
        echo ${sample[$i]}_R1_qfilt.fastq
        samtools view -Sb Alignments_C3H/${sample[$i]}_C3H_aln.sam > Alignments_C3H/${sample[$i]}_C3H_aln.bam
        samtools sort -o Alignments_C3H/${sample[$i]}_C3H_aln.sorted.bam Alignments_C3H/${sample[$i]}_C3H_aln.bam
        samtools index Alignments_C3H/${sample[$i]}_C3H_aln.sorted.bam
done


#### Command:
#### sbatch mapping_C3H.sh Trimmed/[1-5]
```
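The alignment script pairs R1 and R2 files by array index, so a missing R2 file could shift the pairing without any error. As a hedged sketch, a check run before the mapping loop could confirm that every trimmed R1 file has its R2 mate. The helpers `mate_path` and `check_pairs` are hypothetical names introduced here and are not part of the original script; the file-name suffixes are the ones the trimming step writes.

```shell
#!/bin/bash
# Hypothetical helper: given a trimmed R1 path, print the expected R2 path
# by replacing the trailing _R1_qfilt.fastq suffix.
mate_path() {
    printf '%s\n' "${1/%_R1_qfilt.fastq/_R2_qfilt.fastq}"
}

# Hypothetical check: return 0 only if every R1 file passed as an argument
# has its R2 mate on disk; report each missing mate on stderr.
check_pairs() {
    local r1 status=0
    for r1 in "$@"; do
        if [ ! -f "$(mate_path "$r1")" ]; then
            echo "missing mate for $r1" >&2
            status=1
        fi
    done
    return "$status"
}

# Possible use inside the mapping script, before the loop (assumption):
#   check_pairs "${R1[@]}" || exit 1
```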

