# Mapping reads after quality control

## 1. Mapping to human reference genome
<br>
Download the human reference genome GRCh37 (GRCh37.p5) from https://www.ncbi.nlm.nih.gov/grc/human/data?asm=GRCh37.p5, concatenate the chromosome sequences, and save the result as hs_ref_GRCh37.p5.fa.
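
The notebook does not show how the chromosome files were merged. Below is a minimal sketch. It assumes the download consists of one FASTA file per chromosome, either plain or gzip-compressed. The function name merge_fasta and the file names in the usage comment are hypothetical.

```bash
# Minimal sketch (assumption: one FASTA file per chromosome, plain or .gz).
# Concatenates the inputs, in the order given, into one multi-FASTA file
# and prints how many sequences (">" header lines) the result contains.
merge_fasta() {
  local out=$1; shift
  : > "$out"                                # create or truncate the output
  local f
  for f in "$@"; do
    case $f in
      *.gz) gzip -dc -- "$f" >> "$out" ;;   # decompress on the fly
      *)    cat -- "$f" >> "$out" ;;
    esac
  done
  grep -c '^>' "$out"                       # number of merged sequences
}

# Hypothetical usage (file names are assumptions):
# merge_fasta hs_ref_GRCh37.p5.fa hs_ref_GRCh37.p5_chr*.fa.gz
```

The count that merge_fasta prints gives a quick check that every chromosome file was included.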
<br>
<br>
<strong>Index the human reference genome:</strong>

In [None]:
%%bash
bowtie2-build hs_ref_GRCh37.p5.fa hsGRCh37

bowtie2-build creates six index files with the prefix hsGRCh37:<br>
hsGRCh37.1.bt2<br>
hsGRCh37.2.bt2<br>
hsGRCh37.3.bt2<br>
hsGRCh37.4.bt2<br>
hsGRCh37.rev.1.bt2<br>
hsGRCh37.rev.2.bt2<br>
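
Before starting the long mapping step, you may want to confirm that all six files exist and are non-empty. The following is a minimal sketch. The function name check_bt2_index is hypothetical.

```bash
# Minimal sketch: return 0 only if all six Bowtie 2 index files for a prefix
# exist and are non-empty. Missing or empty files are reported on stderr.
# Note: genomes longer than about 4 billion bases get a "large" index with
# the .bt2l extension; this sketch only checks the small-index .bt2 files.
check_bt2_index() {
  local prefix=$1 missing=0 suffix
  for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [ ! -s "${prefix}.${suffix}" ]; then
      echo "missing or empty: ${prefix}.${suffix}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_bt2_index hsGRCh37 && echo "index complete"
```

GRCh37 is about 3.1 billion bases, so it falls below that limit and gets the .bt2 files listed above.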

<strong>Map reads and sort the BAM file:</strong>

In [None]:
%%bash
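# Align paired-end reads in local mode; --no-contain treats pairs in which
# one mate's alignment contains the other's as non-concordant
bowtie2 --local --no-contain -x hsGRCh37 -1 sample_1_QC2.fq.gz -2 sample_2_QC2.fq.gz -S sample_hs.sam
# Keep reads mapped in proper pairs (-f 2), convert to BAM and sort by coordinate
# (current samtools requires -o; the old "sort - prefix" form was removed)
samtools view -b -f 2 sample_hs.sam | samtools sort -o sample_hs.sorted.bam -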

The output, sample_hs.sorted.bam, contains only reads mapped in proper pairs (-f 2), sorted by coordinate.
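
The -f 2 option keeps records whose SAM FLAG (column 2) has bit 0x2 set, meaning the read is one end of a proper paired-end alignment. The minimal awk sketch below applies the same bit test to uncompressed SAM text so that the filter is explicit. It is for illustration only; samtools view -f 2 remains the tool to use on real data. The function name proper_pairs is hypothetical.

```bash
# Minimal sketch of the "-f 2" filter on SAM text (file argument or stdin).
# Header lines (@...) are passed through; alignment records are kept only if
# bit 0x2 of FLAG is set. int(flag / 2) % 2 extracts that bit without
# needing gawk's and() function.
proper_pairs() {
  awk -F '\t' '/^@/ { print; next }
               int($2 / 2) % 2 == 1 { print }' "${1:--}"
}

# Usage: proper_pairs sample_hs.sam | less
```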
<br>
We then kept only alignments with a mapping quality (MAPQ) of at least 25:

In [None]:
%%bash
# Drop alignments with MAPQ below 25 (-q 25); -b writes BAM, header included
samtools view -h -q 25 -b sample_hs.sorted.bam -o sample_hs.sorted.mapQ25.bam
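
samtools view -q 25 skips alignments whose MAPQ (column 5) is smaller than 25, so alignments with a MAPQ of exactly 25 are kept. The minimal awk sketch below applies the same threshold to SAM text for illustration. The function name min_mapq is hypothetical.

```bash
# Minimal sketch of the "-q INT" threshold on SAM text (file argument or stdin).
# Header lines are passed through; records with MAPQ >= min are kept.
min_mapq() {
  local min=$1
  awk -F '\t' -v min="$min" '/^@/ { print; next }
                             $5 >= min { print }' "${2:--}"
}

# Usage: samtools view -h sample_hs.sorted.bam | min_mapq 25 | head
```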

Next, we extracted the length of each mapped fragment (the template length, TLEN, in column 9 of the SAM record) and plotted its cumulative distribution:

In [None]:
%%bash
# TLEN (column 9) is positive only for the leftmost mate, so each fragment is counted at most once
samtools view sample_hs.sorted.mapQ25.bam | awk '$9 > 0 {print $9}' > sample_hs_fragment_length
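
A short numerical summary of the fragment-length file can complement the plot. The following is a minimal sketch that reads one integer length per line, matching the file written above. The function name fragment_stats is hypothetical.

```bash
# Minimal sketch: print count, min, median, mean and max of a file (or stdin)
# containing one fragment length per line.
fragment_stats() {
  sort -n "${1:--}" | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) { print "no fragments"; exit 1 }
      # median: middle value for odd n, mean of the two middle values for even n
      median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
      printf "n=%d min=%d median=%g mean=%.1f max=%d\n", NR, v[1], median, sum / NR, v[NR]
    }'
}

# Usage: fragment_stats sample_hs_fragment_length
```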

In [16]:
%load_ext rpy2.ipython



In [24]:
%%R
# install.packages(c("ggplot2", "dplyr"))  # run once if the packages are missing
library(ggplot2)
library(dplyr)

# Plot the empirical cumulative distribution of mapped fragment lengths in the
# 90-220 bp window and save it as a 7 x 7 inch, 600 dpi TIFF.
# `filename`: file with one fragment length per line and no header line
# (e.g. the output of the awk step above), read from `in_dir`; the plot is
# written to `out_dir` as <filename>.tiff. Both default to the working directory.
CDF_plot <- function(filename, in_dir = ".", out_dir = ".") {
  data <- read.table(file.path(in_dir, filename), header = FALSE,
                     col.names = "fragment_length")
  data$samples <- filename  # label rows with the sample name
  new_data <- data %>% filter(between(fragment_length, 90, 220))
  img <- ggplot(new_data, aes(x = fragment_length)) +
    stat_ecdf() +
    theme_bw() +
    xlab("mapped fragment length") +
    scale_x_continuous(breaks = seq(90, 220, by = 10))
  # tiff() writes a real TIFF; bitmap() defaults to PNG output and needs Ghostscript
  tiff(file.path(out_dir, paste0(filename, ".tiff")),
       width = 7, height = 7, units = "in", res = 600)
  print(img)
  invisible(dev.off())
}

In [25]:
%%R
CDF_plot("2A_fragment_length")
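
For each length x in the 90-220 bp window, stat_ecdf() plots the fraction of fragments in that window whose length is at most x. The same numbers can be computed without R. The minimal sketch below mirrors the inclusive 90-220 filter in CDF_plot and prints one line per distinct length. The function name ecdf_window is hypothetical.

```bash
# Minimal sketch: empirical CDF of the values in a file (one integer per line),
# restricted to the inclusive window [lo, hi]. Output: value<TAB>fraction <= value.
ecdf_window() {
  local file=$1 lo=$2 hi=$3
  awk -v lo="$lo" -v hi="$hi" '$1 >= lo && $1 <= hi { print $1 }' "$file" |
    sort -n |
    awk '{ v[NR] = $1 }
         END {
           # emit each distinct value once, at its last (highest) rank
           for (i = 1; i <= NR; i++)
             if (i == NR || v[i] != v[i + 1]) printf "%d\t%.4f\n", v[i], i / NR
         }'
}

# Usage: ecdf_window sample_hs_fragment_length 90 220
```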

## 2. Mapping to 601.0N0 
CTGGAGAATCCCGGTGCCGAGGCCGCTCAATTGGTCGTAGACAGCTCTAGCACCGCTTAAACGCACGTACGCGCTGTCCCCCGCGTTTTAACCGCCAAGGGGATTACTCCCTAGTCTCCAGGCACGTGTCAGATATATACATCCTGT
<br>
<br>
<strong>Index 601.0N0:</strong>

In [None]:
%%bash
bowtie2-build 601.0N0.fa 601.0N0
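
The notebook does not show how 601.0N0.fa was created from the sequence listed above. Below is a minimal sketch. It assumes a plain single-record FASTA file is all bowtie2-build needs here. The header name and the 60-column line width are arbitrary choices. The function name seq_to_fasta and the variable in the usage comment are hypothetical. The same approach would also work for 601.2.0W0.fa and mmtvNucA.fa.

```bash
# Minimal sketch: write a raw sequence string as a single-record FASTA file.
# Spaces, carriage returns and newlines in the input are removed, and the
# sequence is wrapped at 60 characters per line.
seq_to_fasta() {
  local name=$1 seq=$2 out=$3
  {
    printf '>%s\n' "$name"
    printf '%s' "$seq" | tr -d ' \r\n' |
      awk -v w=60 '{ for (i = 1; i <= length($0); i += w) print substr($0, i, w) }'
  } > "$out"
}

# Hypothetical usage, with SEQ_601_0N0 holding the sequence shown above:
# seq_to_fasta 601.0N0 "$SEQ_601_0N0" 601.0N0.fa
```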

## 3. Mapping to 601.2.0W0 
CTGCAGAAGCTTGGTCCCGGGGCCGCTCAATTGGTCGTAGCAAGCTCTAGATCCGCTTAATCGAACGTACGCGCTGTCCCCCGCGTTTTAACCGCCAAGGGGATTACTCCCTAGTCTCCAGGCACGTGTCAGATATATACATCCTGT
<br>
<br>
<strong>Index 601.2.0W0:</strong>

In [None]:
%%bash
bowtie2-build 601.2.0W0.fa 601.2.0W0
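
You can find exactly where the 601.0N0 and 601.2.0W0 sequences differ with a position-by-position comparison, provided the two sequences have equal length. The following is a minimal sketch. It performs no alignment, so it does not handle insertions or deletions. The function name seq_diff is hypothetical.

```bash
# Minimal sketch: report the 1-based positions at which two equal-length
# sequences differ (position<TAB>base in first<TAB>base in second), followed
# by the total number of mismatches. Exits with status 1 if lengths differ.
seq_diff() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    if (length(a) != length(b)) {
      print "lengths differ: " length(a) " vs " length(b)
      exit 1
    }
    n = 0
    for (i = 1; i <= length(a); i++) {
      ca = substr(a, i, 1); cb = substr(b, i, 1)
      if (ca != cb) { printf "%d\t%s\t%s\n", i, ca, cb; n++ }
    }
    printf "mismatches\t%d\n", n
  }'
}

# Usage: seq_diff "$SEQ_601_0N0" "$SEQ_601_2_0W0"   (hypothetical variables)
```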

## 4. Mapping to mmtvNucA
ACTTGCAACAGTCCTAACATTCACCTCTTGTGTGTTTGTGTCTGTTCGCCATCCCGTCTCCGCTCGTCACTTATCCTTCACTTTCCAGAGGGTCCCCCCGCAGACCCCGGCGACCCTCAGGTCGGCCGACTGCGGCACAGTTTTTTG 
<br>
<br>
<strong>Index mmtvNucA:</strong>

In [None]:
%%bash
bowtie2-build mmtvNucA.fa mmtvNucA