# Module 3: Genome Alignment

This notebook is based on the Cancer Analysis course by bioinformatics.ca (https://bioinformaticsdotca.github.io/CAN_2021).


In this workshop, we present the main steps commonly used to process and analyze cancer sequencing data. We focus only on whole-genome data and provide command lines for creating high-quality alignment files that can be used for variant detection. The workshop shows how to launch, one by one, the first steps of a complete DNA-Seq SNV pipeline used to analyze cancer data.

We will be working on a CageKid sample pair, patient C0098. The CageKid project is part of ICGC and focuses on renal cancer in many of its forms. The raw data can be found on EGA, and the RNA and DNA calls can be found on the ICGC portal.

For practical reasons, we subsampled the reads from the sample, because running the whole dataset would take too much time and too many resources.





## Mount Google drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Install software

In [None]:
# install conda
!wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
!chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
!bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

!rm Miniconda3-py37_4.8.2-Linux-x86_64.sh

# install BWA
!conda install -y -c bioconda bwa

# install fastQC
!conda install -y -c bioconda fastqc

# install trimmomatic
!conda install -y -c bioconda trimmomatic

# install samtools
!conda install -y -c bioconda samtools

# install GATK 3.8
!conda install -y -c bioconda gatk

# install GATK 4
!conda install -y -c bioconda gatk4

--2021-12-27 19:50:44--  https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85055499 (81M) [application/x-sh]
Saving to: ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’


2021-12-27 19:50:45 (178 MB/s) - ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’ saved [85055499/85055499]

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ | done
Solving environment: - \ | done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - asn1crypto==1.3.0=py37_0
    - ca-certificates==2020.1.1=0
    - certifi==2019.11.28=py37_0
    - cffi==1.14.0=py37h2e261b9_0
    - chardet==3.0.4=py37_1003
    - conda-package-handling==1.6.0=py37h7b6447

## Get data

In [None]:
%%bash
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment
wget --no-check-certificate https://hpc4health.ca/cbw/2021/CAN/Module3.tar
tar -xvf Module3.tar

## Quality control
The first thing to do after receiving sequencing data is to assess its quality.

In [None]:
%%bash
# Generate original QC
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment
mkdir -p originalQC/

cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/raw_reads/normal/run62DVGAAXX_1
fastqc -o /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/originalQC  -t 2 *.fastq.gz

Process is interrupted.


Per-base and per-tile sequence quality deteriorates in the later sequencing cycles.

## Adapter trimming
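Note on the `ILLUMINACLIP` parameters, which are given as `ILLUMINACLIP:<adapter FASTA>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>` (here `2:30:15`):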
* seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed
* palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment.
* simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read.
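
To make the order of these fields concrete, here is a minimal Python sketch that splits the `ILLUMINACLIP` step used in the trimming cell into named values. The helper `parse_illuminaclip` and the `IlluminaClip` tuple are illustrative names, not part of Trimmomatic, and the sketch assumes only the four leading fields are present (Trimmomatic's optional trailing fields are not handled).

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split an ILLUMINACLIP step into its four leading fields (hypothetical helper)."""
    name, _, rest = step.partition(":")
    if name != "ILLUMINACLIP":
        raise ValueError(f"not an ILLUMINACLIP step: {step!r}")
    # Split from the right so that a colon inside the FASTA path would survive.
    fasta, seed, palindrome, simple = rest.rsplit(":", 3)
    return IlluminaClip(fasta, int(seed), int(palindrome), int(simple))


# The step string used in this notebook's trimming command.
clip = parse_illuminaclip("ILLUMINACLIP:../../../adapters.fa:2:30:15")
print(clip)
```

Read this way, the notebook's setting allows 2 seed mismatches, requires a palindrome score of 30 for paired-end adapter detection, and a score of 15 for a simple adapter match.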

In [None]:
%%bash
# Trim and convert data
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for file in raw_reads/*/run*_?/*.pair1.fastq.gz;
do
  FNAME=`basename $file`;
  DIR=`dirname $file`;
  OUTPUT_DIR=`echo $DIR | sed 's/raw_reads/reads/g'`;

  mkdir -p $OUTPUT_DIR;
  trimmomatic PE -threads 2 -phred64 \
  $file \
  ${file%.pair1.fastq.gz}.pair2.fastq.gz \
  ${OUTPUT_DIR}/${FNAME%.64.pair1.fastq.gz}.t30l50.pair1.fastq.gz \
  ${OUTPUT_DIR}/${FNAME%.64.pair1.fastq.gz}.t30l50.single1.fastq.gz \
  ${OUTPUT_DIR}/${FNAME%.64.pair1.fastq.gz}.t30l50.pair2.fastq.gz \
  ${OUTPUT_DIR}/${FNAME%.64.pair1.fastq.gz}.t30l50.single2.fastq.gz \
  TOPHRED33 ILLUMINACLIP:../../../adapters.fa:2:30:15 TRAILING:30 MINLEN:50 \
  2> ${OUTPUT_DIR}/${FNAME%.64.pair1.fastq.gz}.trim.out ;
  
done

cat reads/normal/run62DVGAAXX_1/normal.trim.out
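
The trimming command combines three ideas: re-encoding qualities from Phred+64 to Phred+33 (`-phred64` with `TOPHRED33`), removing low-quality bases from the 3' end (`TRAILING:30`), and dropping reads that end up too short (`MINLEN:50`). The Python sketch below illustrates that logic on a single made-up read. The function names and the example read are assumptions for illustration, and this is not Trimmomatic's implementation.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Phred+33."""
    return "".join(chr(ord(c) - 64 + 33) for c in quality)


def trim_trailing(seq: str, quality33: str, min_quality: int = 30):
    """Drop bases from the 3' end while their quality is below min_quality."""
    end = len(seq)
    while end > 0 and ord(quality33[end - 1]) - 33 < min_quality:
        end -= 1
    return seq[:end], quality33[:end]


def passes_minlen(seq: str, min_length: int = 50) -> bool:
    """Keep a read only if it is at least min_length bases long."""
    return len(seq) >= min_length


# Hypothetical 8-base read: 'h' is Q40 and 'B' is Q2 in Phred+64.
seq, qual64 = "ACGTACGT", "hhhhhhBB"
qual33 = phred64_to_phred33(qual64)
trimmed_seq, trimmed_qual = trim_trailing(seq, qual33, min_quality=30)
print(qual33, trimmed_seq, trimmed_qual, passes_minlen(trimmed_seq))
```

The two low-quality bases at the 3' end are removed, and the remaining 6-base read would then be discarded by a minimum length of 50.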

## Quality control after adapter trimming

In [None]:
%%bash
# Generate trimmed QC
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment
mkdir -p trimmedQC/

cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/reads/normal/run62DVGAAXX_1
fastqc -o /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/trimmedQC  -t 2 *.fastq.gz

## Alignment

In [None]:
%%bash
# Download genome GRCh37
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/genome
wget http://igenomes.illumina.com.s3-website-us-east-1.amazonaws.com/Homo_sapiens/Ensembl/GRCh37/Homo_sapiens_Ensembl_GRCh37.tar.gz

# Unzip genome
tar -xvf Homo_sapiens_Ensembl_GRCh37.tar.gz

# Remove zipped file
rm Homo_sapiens_Ensembl_GRCh37.tar.gz

Note: A read group (`@RG`) header is defined during alignment to group reads by lane and run, which facilitates downstream processing.
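
As a rough illustration of how the read group is derived from the file path, the Python sketch below mirrors the three `sed` expressions of the alignment cell with one regular expression. The function `read_group` is a hypothetical helper, and the example file name is an assumption based on the naming used by the trimming step.

```python
import re


def read_group(fastq_path: str,
               centre: str = "Centre National de Genotypage",
               sep: str = "\t") -> str:
    """Build an @RG line from a path like reads/<sample>/run<runid>_<lane>/..."""
    match = re.search(r"reads/([^/]+)/run([^_]+)_(.)", fastq_path)
    if match is None:
        raise ValueError(f"unexpected path layout: {fastq_path!r}")
    sample, run_id, lane = match.groups()
    fields = [
        "@RG",
        f"ID:{sample}_{run_id}_{lane}",  # one read group per sample, run and lane
        f"SM:{sample}",                  # sample name
        f"LB:{sample}_{run_id}",         # library
        f"PU:{run_id}_{lane}",           # platform unit
        f"CN:{centre}",                  # sequencing centre
        "PL:ILLUMINA",                   # platform
    ]
    return sep.join(fields)


# Assumed example path; real tabs are used as in a BAM header.
rg = read_group("reads/normal/run62DVGAAXX_1/normal.t30l50.pair1.fastq.gz")
print(rg)
```

With the default separator the result looks like a line of the BAM header. Passing `sep="\\t"` instead produces the escaped form that is written on the `bwa mem -R` command line.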

In [None]:
%%bash
# Align data
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for file in reads/*/run*/*.pair1.fastq.gz;
do
  FNAME=`basename $file`;
  DIR=`dirname $file`;
  OUTPUT_DIR=`echo $DIR | sed 's/reads/alignment/g'`;
  SNAME=`echo $file | sed 's/reads\/\([^/]\+\)\/.*/\1/g'`;
  RUNID=`echo $file | sed 's/.*\/run\([^_]\+\)_.*/\1/g'`;
  LANE=`echo $file | sed 's/.*\/run[^_]\+_\(.\).*/\1/g'`;

  mkdir -p $OUTPUT_DIR;

  bwa mem -M -t 2 \
-R "@RG\\tID:${SNAME}_${RUNID}_${LANE}\\tSM:${SNAME}\\t\
LB:${SNAME}_${RUNID}\\tPU:${RUNID}_${LANE}\\tCN:Centre National de Genotypage\\tPL:ILLUMINA" \
    /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/genome/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa \
    $file \
    ${file%.pair1.fastq.gz}.pair2.fastq.gz \
    | samtools sort -O bam -o ${OUTPUT_DIR}/${FNAME}.sorted.bam /dev/stdin
    
    samtools index ${OUTPUT_DIR}/$FNAME.sorted.bam
    
done

In [None]:
%%bash
# merge lanes
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/alignment

samtools merge normal/normal.sorted.bam \
normal/run62DPDAAXX_8/*.bam \
normal/run62DVGAAXX_1/*.bam \
normal/run62MK3AAXX_5/*.bam \
normal/runA81DF6ABXX_1/*.bam \
normal/runA81DF6ABXX_2/*.bam \
normal/runBC04D4ACXX_2/*.bam \
normal/runBC04D4ACXX_3/*.bam \
normal/runBD06UFACXX_4/*.bam \
normal/runBD06UFACXX_5/*.bam

samtools index normal/normal.sorted.bam

samtools merge tumor/tumor.sorted.bam \
tumor/run62DU0AAXX_8/*.bam \
tumor/run62DUUAAXX_8/*.bam \
tumor/run62DVMAAXX_4/*.bam \
tumor/run62DVMAAXX_6/*.bam \
tumor/run62DVMAAXX_8/*.bam \
tumor/run62JREAAXX_4/*.bam \
tumor/run62JREAAXX_6/*.bam \
tumor/run62JREAAXX_8/*.bam \
tumor/runAC0756ACXX_5/*.bam \
tumor/runBD08K8ACXX_1/*.bam \
tumor/run62DU6AAXX_8/*.bam \
tumor/run62DUYAAXX_7/*.bam \
tumor/run62DVMAAXX_5/*.bam \
tumor/run62DVMAAXX_7/*.bam \
tumor/run62JREAAXX_3/*.bam \
tumor/run62JREAAXX_5/*.bam \
tumor/run62JREAAXX_7/*.bam \
tumor/runAC0756ACXX_4/*.bam \
tumor/runAD08C1ACXX_1/*.bam

samtools index tumor/tumor.sorted.bam

In [None]:
%%bash
# check merged bam files
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/alignment/normal/

samtools view -H *bam | grep "^@RG"

## BAM exploration

In [None]:
%%bash
# check first four reads
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/alignment/normal/

samtools view normal.sorted.bam | head -n4

HISEQ7_0068:2:1101:11634:118944#AATATA	145	1	1379054	0	19S62M	9	130421305	0	TTTTTTTTTTTTTGGGTTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGAGACAGG	DD><0-DDDDBA25<8+2&50&00&00&8@7@<5&8<5&9B93BDDB7-BDDDB@DCFJJJJJJJJIJHHHHHFFFDFCCC	NM:i:0	MD:Z:62	MC:Z:101M	AS:i:62	XS:i:62	RG:Z:normal_BC04D4ACXX_2
HISEQ9_0205:4:1207:20646:145063#GGGGAC	419	1	2367820	19	7H40M3D27M21H	9	130483477	0	CCTTCCTCCCTCCTCTCTTCCCCCTTCCTCCCTCCTCCTTTCCTTTCTCCCTCCTCCCTTCCTCCCT	FFDFFHGBEGHIIJIIJJJJJBDE>FGBFCGFICEHFHHGGECFB@CFIIHCHHGHIIIEH@CHC17	NM:i:8	MD:Z:13A1C0T4T13A4^TCC27	MC:Z:101M	AS:i:33	XS:i:23	SA:Z:9,130483111,+,54S41M,39,0;	RG:Z:normal_BD06UFACXX_4
HISEQ9_0205:5:1105:14942:198874#ACTGGG	81	1	2616285	10	100M	9	130408920	0	GAGCATCTGACAGCCTGGAGCAGCACCCACACCCCAGGTGAGCATCTGACAGCCTGGAACAGCACCCTGCACACCCAGGTGAGCATCCGACAGCCTGGAG	@AADC>@4@?A<?DA:CCCACA<;DDD@;B<FFFHHGGHIJJJJIIIJIJJIIFIJJJIIHGD;IIHCGHHF?IJIGIH@JJJJJJJHHHHHFFFFFCCC	NM:i:0	MD:Z:100	MC:Z:94M	AS:i:100	XS:i:95	XA:Z:1,-2621627,100M,1;	RG:Z:normal_BD06U
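
The second column of each record is the bitwise FLAG defined by the SAM specification. A small Python sketch for decoding it is shown here. The short names in `SAM_FLAGS` are informal labels chosen for this example, while the bit values come from the specification.

```python
# Bit values from the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# The three FLAG values from the reads printed above.
for flag in (145, 419, 81):
    print(flag, decode_flag(flag))
```

The `unmapped` bit (0x4) is the one selected by `samtools view -f4`.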

In [None]:
%%bash
# check number of unaligned reads
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/alignment/normal/

samtools view -c -f4 normal.sorted.bam

28


## Cleaning up alignments

In [None]:
%%bash
# find indel- and SNP-dense regions
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

java -jar /usr/local/opt/gatk-3.8/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
  -o alignment/normal/realign.intervals \
  -I alignment/normal/normal.sorted.bam \
  -I alignment/tumor/tumor.sorted.bam \
  -L 9

In [None]:
%%bash
# number of regions that need to be realigned
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

wc -l alignment/normal/realign.intervals # wc -l counts lines; the file holds one interval per line

497 alignment/normal/realign.intervals


In [None]:
%%bash
# realign around identified regions
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

java -jar /usr/local/opt/gatk-3.8/GenomeAnalysisTK.jar \
  -T IndelRealigner \
  -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
  -targetIntervals alignment/normal/realign.intervals \
  --nWayOut .realigned.bam \
  -I alignment/normal/normal.sorted.bam \
  -I alignment/tumor/tumor.sorted.bam

mv normal.sorted.realigned.ba* alignment/normal/
mv tumor.sorted.realigned.ba* alignment/tumor/

## Remove duplicate reads
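Reads that are likely PCR duplicates are flagged. With `--REMOVE_DUPLICATES false`, they stay in the BAM file and are only marked as duplicates.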

In [None]:
%%bash
# mark duplicates
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

gatk MarkDuplicates \
  --REMOVE_DUPLICATES false --CREATE_INDEX true \
  -I alignment/normal/normal.sorted.realigned.bam \
  -O alignment/normal/normal.sorted.dup.bam \
  --METRICS_FILE alignment/normal/normal.sorted.dup.metrics

gatk MarkDuplicates \
  --REMOVE_DUPLICATES false --CREATE_INDEX true \
  -I alignment/tumor/tumor.sorted.realigned.bam \
  -O alignment/tumor/tumor.sorted.dup.bam \
  --METRICS_FILE alignment/tumor/tumor.sorted.dup.metrics

In [None]:
%%bash
# investigate duplicate metrics
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

less -S alignment/normal/normal.sorted.dup.metrics
less -S alignment/tumor/tumor.sorted.dup.metrics

Note that up to 12% of fragments are duplicated in one of the tumor libraries.
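
Instead of paging through the metrics files, the per-library values can be read programmatically. The Python sketch below assumes the usual Picard metrics layout: a `## METRICS CLASS` line, followed by a tab-separated header row and one row per library, ended by a blank line. The function name is hypothetical, and the example text uses made-up library names and numbers, not the values from this dataset.

```python
from typing import Dict, List


def parse_picard_metrics(text: str) -> List[Dict[str, str]]:
    """Return the rows of the METRICS CLASS table as dictionaries."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            break
    else:
        raise ValueError("no '## METRICS CLASS' section found")
    header = lines[i + 1].split("\t")
    rows = []
    for line in lines[i + 2:]:
        if not line.strip():  # a blank line ends the table
            break
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


# Made-up example with only a subset of the columns.
example = "\n".join([
    "## htsjdk.samtools.metrics.StringHeader",
    "# MarkDuplicates (command line elided)",
    "",
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "LIBRARY\tREAD_PAIRS_EXAMINED\tREAD_PAIR_DUPLICATES\tPERCENT_DUPLICATION",
    "lib_A\t1000\t50\t0.05",
    "lib_B\t2000\t240\t0.12",
    "",
    "## HISTOGRAM\tjava.lang.Double",
])
rows = parse_picard_metrics(example)
for row in rows:
    print(row["LIBRARY"], f'{float(row["PERCENT_DUPLICATION"]):.0%}')
```

`PERCENT_DUPLICATION` is stored as a fraction, so a value of 0.12 corresponds to 12% duplicated fragments.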

## Base quality recalibration
Base quality scores are recalibrated to account for systematic biases introduced by different sequencing technologies or vendors.
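
A base quality score is a Phred-scaled error probability, Q = -10 * log10(p). The Python sketch below shows the conversion in both directions and a toy comparison between a reported and an observed quality. The mismatch counts are invented for illustration, and the calculation only conveys the idea behind recalibration; it is not the model GATK uses.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


# Q30 means 1 expected error in 1000 base calls.
print(phred_to_error_probability(30))

# Toy example (made-up counts): bases reported as Q30 that mismatch the
# reference at non-variant sites 20 times out of 10000 observations.
observed_error = 20 / 10000
print(round(error_probability_to_phred(observed_error), 1))
```

In this toy case the observed quality is close to Q27, lower than the reported Q30, which is the kind of discrepancy recalibration is meant to correct.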

In [None]:
%%bash
# download dbSNP annotation
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/genome/Homo_sapiens/Ensembl/GRCh37/Annotation/Variation
wget https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz



In [None]:
%%bash
# index dbSNP annotation file
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/genome/Homo_sapiens/Ensembl/GRCh37/Annotation/Variation

# generate index for feature file
gatk IndexFeatureFile \
  -I 00-All.vcf.gz

In [None]:
%%bash
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for i in normal tumor
do
# calculate covariates for recalibration
  gatk BaseRecalibrator \
    -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    --known-sites genome/Homo_sapiens/Ensembl/GRCh37/Annotation/Variation/00-All.vcf.gz \
    -L 9:130215000-130636000 \
    -O alignment/${i}/${i}.sorted.dup.recalibration_report.grp \
    -I alignment/${i}/${i}.sorted.dup.bam

# apply base quality score recalibration 
  gatk ApplyBQSR \
    -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    -bqsr alignment/${i}/${i}.sorted.dup.recalibration_report.grp \
    -O alignment/${i}/${i}.sorted.dup.recal.bam \
    -I alignment/${i}/${i}.sorted.dup.bam
done

## Extract BAM metrics
We extract a few metrics from the final BAM files to double-check the quality of our data.

### Compute coverage

In [None]:
%%bash
# compute coverage
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for i in normal tumor
do
  java -jar /usr/local/opt/gatk-3.8/GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    --omitDepthOutputAtEachBase \
    --summaryCoverageThreshold 10 \
    --summaryCoverageThreshold 25 \
    --summaryCoverageThreshold 50 \
    --summaryCoverageThreshold 100 \
    --start 1 --stop 500 --nBins 499 -dt NONE \
    -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    -o alignment/${i}/${i}.sorted.dup.recal.coverage \
    -I alignment/${i}/${i}.sorted.dup.recal.bam \
    -L 9:130215000-130636000 
done

In [None]:
%%bash
# check coverage
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

cat alignment/normal/normal.sorted.dup.recal.coverage.sample_interval_summary
cat alignment/tumor/tumor.sorted.dup.recal.coverage.sample_interval_summary

Target	total_coverage	average_coverage	normal_total_cvg	normal_mean_cvg	normal_granular_Q1	normal_granular_median	normal_granular_Q3	normal_%_above_10	normal_%_above_25	normal_%_above_50	normal_%_above_100
9:130215000-130636000	18531042	44.02	18531042	44.02	37	46	54	98.6	92.7	34.2	0.0
Target	total_coverage	average_coverage	tumor_total_cvg	tumor_mean_cvg	tumor_granular_Q1	tumor_granular_median	tumor_granular_Q3	tumor_%_above_10	tumor_%_above_25	tumor_%_above_50	tumor_%_above_100
9:130215000-130636000	23534037	55.90	23534037	55.90	45	57	69	98.8	95.6	63.9	0.9
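
The `average_coverage` column is the total coverage divided by the number of bases in the target interval. The Python sketch below recomputes it from the values printed above. The helper `interval_length` is an illustrative name, and it assumes 1-based, inclusive `contig:start-end` intervals.

```python
def interval_length(interval: str) -> int:
    """Length in bases of a 1-based, inclusive 'contig:start-end' interval."""
    _, _, span = interval.rpartition(":")
    start, end = (int(x) for x in span.split("-"))
    return end - start + 1


# total_coverage values taken from the summary output above.
total_coverage = {"normal": 18531042, "tumor": 23534037}
length = interval_length("9:130215000-130636000")

average = {s: round(t / length, 2) for s, t in total_coverage.items()}
print(length, average)
```

The recomputed averages agree with the 44.02 and 55.90 reported by `DepthOfCoverage`.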


### Check library insert size

In [None]:
%%bash
# calculate insert size
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for i in normal tumor
do
  gatk CollectInsertSizeMetrics \
    -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    -I alignment/${i}/${i}.sorted.dup.recal.bam \
    -O alignment/${i}/${i}.sorted.dup.recal.metric.insertSize.tsv \
    -H alignment/${i}/${i}.sorted.dup.recal.metric.insertSize.histo.pdf \
    --METRIC_ACCUMULATION_LEVEL LIBRARY
done

In [None]:
%%bash
# check insert size
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

head -20 alignment/normal/normal.sorted.dup.recal.metric.insertSize.tsv
head -20 alignment/tumor/tumor.sorted.dup.recal.metric.insertSize.tsv

### Check alignment metrics

In [None]:
%%bash
# compute alignment metrics
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

for i in normal tumor
do
  gatk  CollectAlignmentSummaryMetrics \
    -R genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa \
    -I alignment/${i}/${i}.sorted.dup.recal.bam \
    -O alignment/${i}/${i}.sorted.dup.recal.metric.alignment.tsv \
    --METRIC_ACCUMULATION_LEVEL LIBRARY
done

Tool returned:
0
Tool returned:
0


16:59:50.868 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.2.4.0-0/gatk-package-4.2.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Dec 23 16:59:50 UTC 2021] CollectAlignmentSummaryMetrics --METRIC_ACCUMULATION_LEVEL LIBRARY --METRIC_ACCUMULATION_LEVEL ALL_READS --INPUT alignment/normal/normal.sorted.dup.recal.bam --OUTPUT alignment/normal/normal.sorted.dup.recal.metric.alignment.tsv --REFERENCE_SEQUENCE genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa --MAX_INSERT_SIZE 100000 --EXPECTED_PAIR_ORIENTATIONS FR --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAG

In [None]:
%%bash
# check alignment metrics
cd /content/drive/MyDrive/Work/13_CancerAnalysisCourse/labs/Module3_GenomeAlignment/

less -S alignment/normal/normal.sorted.dup.recal.metric.alignment.tsv
less -S alignment/tumor/tumor.sorted.dup.recal.metric.alignment.tsv

## htsjdk.samtools.metrics.StringHeader
# CollectAlignmentSummaryMetrics --METRIC_ACCUMULATION_LEVEL LIBRARY --METRIC_ACCUMULATION_LEVEL ALL_READS --INPUT alignment/normal/normal.sorted.dup.recal.bam --OUTPUT alignment/normal/normal.sorted.dup.recal.metric.alignment.tsv --REFERENCE_SEQUENCE genome/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa --MAX_INSERT_SIZE 100000 --EXPECTED_PAIR_ORIENTATIONS FR --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG --ADAPTER_SEQUENCE AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT --ADAPTER_SEQUENCE AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG --IS_BISULFITE_SEQUENCED false --COLLECT_ALIGNMENT_INFORMATION true --ASSUME_SORTED true --STOP_AFTER 0 --VERBOSITY INFO --QU