# Read Mapping
Map reads from the evolved line to the ancestral reference genome.

<hr >

## Current Directory Structure

```bash
%%bash
cd ./analysis
ls -1F
```

```
assembly/
data/
fastqc-analysis/
trimmed/
```


- data: raw FASTQ files
- trimmed: FASTQ files after quality trimming with Sickle
- fastqc-analysis: FastQC reports for the raw and trimmed FASTQ files
- assembly: assembly of the ancestral genome, used as the reference

<hr >

## Bowtie2: Mapping sequence reads to a reference genome
- Installed with conda
- Version used: bowtie2 2.3.4.3

### Index with bowtie2
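- Use bowtie2-build to index the reference genome
- bowtie2-build {reference_in} {bt2_index_base}

- This writes six binary index files: {bt2_index_base}.1.bt2 to .4.bt2 for the forward index, plus .rev.1.bt2 and .rev.2.bt2 for the reverse index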
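Judging by the file names in the listing below, the index base for this notebook is `scaffolds` inside ./analysis/assembly/bowtie2, and the input FASTA is assumed to be scaffolds.fasta. A minimal sketch that wraps the call in a hypothetical `build_index` helper, which stops early if the reference file is missing:

```bash
#!/usr/bin/env bash
# Hypothetical helper: wraps bowtie2-build and fails early if the FASTA is missing.
build_index() {
  local ref=$1 base=$2
  if [[ ! -r "$ref" ]]; then
    echo "reference not found: $ref" >&2
    return 1
  fi
  # bowtie2-build {reference_in} {bt2_index_base}
  bowtie2-build "$ref" "$base"
}

# Assumed call for this notebook (scaffolds.fasta in, "scaffolds" as index base):
# build_index ./analysis/assembly/bowtie2/scaffolds.fasta ./analysis/assembly/bowtie2/scaffolds
```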

```bash
%%bash
cd ./analysis/assembly/bowtie2
ls
```

```
contigs.fasta
contigs.paths
scaffolds.1.bt2
scaffolds.2.bt2
scaffolds.3.bt2
scaffolds.4.bt2
scaffolds.fasta
scaffolds.paths
scaffolds.rev.1.bt2
scaffolds.rev.2.bt2
```
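All six index files for the `scaffolds` base are present. For very large genomes bowtie2-build writes `.bt2l` files instead, so the check below assumes the small-index layout seen here. It is a sketch using a hypothetical `index_complete` helper that confirms every expected file exists and is non-empty before mapping:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if all six small-index (.bt2) files exist
# and are non-empty; otherwise names the first missing file on stderr.
index_complete() {
  local base=$1 ext
  for ext in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
    if [[ ! -s "${base}.${ext}" ]]; then
      echo "missing: ${base}.${ext}" >&2
      return 1
    fi
  done
}

# index_complete ./analysis/assembly/bowtie2/scaffolds && echo "index OK"
```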


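### Map paired-end reads using Bowtie2
- Map the filtered and trimmed reads of the evolved line to the reference genome
- bowtie2 -X 1000 -x PATH_TO_INDEX_PREFIX -1 read1.fq.gz -2 read2.fq.gz -S aln-pe.sam
    - -X: maximum fragment length for a valid paired-end alignment, measured from the outer end of one mate to the outer end of the other (both mates plus the gap between them). Set to 1000 bp here; the default of 500 bp can be too short for libraries with longer inserts.
    - -x: base name of the Bowtie2 index; -1/-2: files with mate 1 and mate 2 reads
    - -S: write the alignments to this SAM file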
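A minimal sketch of this step as a reusable function. The name `map_pairs`, the `.log` convention and the trimmed read file names in the example call are hypothetical; the index prefix and SAM path follow the paths used elsewhere in this notebook. Bowtie2 prints its alignment summary on stderr, so the function saves stderr to a log file next to the SAM:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map one paired-end sample with the options listed above.
map_pairs() {
  local index=$1 r1=$2 r2=$3 sam=$4 f
  # Check inputs first; "${index}.1.bt2" assumes a small (.bt2) index.
  for f in "$r1" "$r2" "${index}.1.bt2"; do
    if [[ ! -r "$f" ]]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done
  mkdir -p "$(dirname "$sam")"
  # The alignment summary goes to stderr; keep it in a .log beside the SAM file.
  bowtie2 -X 1000 -x "$index" -1 "$r1" -2 "$r2" -S "$sam" 2> "${sam%.sam}.log"
}

# Example call; the trimmed read file names are placeholders:
# map_pairs ./analysis/assembly/bowtie2/scaffolds \
#   ./analysis/trimmed/evolved-6_R1.trimmed.fastq.gz \
#   ./analysis/trimmed/evolved-6_R2.trimmed.fastq.gz \
#   ./analysis/mappings/bowtie2/evolved-6.sam
```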

#### Output:
```
2445370 reads; of these:
  2445370 (100.00%) were paired; of these:
    26060 (1.07%) aligned concordantly 0 times
    2399778 (98.14%) aligned concordantly exactly 1 time
    19532 (0.80%) aligned concordantly >1 times
    ----
    26060 pairs aligned concordantly 0 times; of these:
      16085 (61.72%) aligned discordantly 1 time
    ----
    9975 pairs aligned 0 times concordantly or discordantly; of these:
      19950 mates make up the pairs; of these:
        8456 (42.39%) aligned 0 times
        7628 (38.24%) aligned exactly 1 time
        3866 (19.38%) aligned >1 times
99.83% overall alignment rate
```
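The summary is internally consistent and the last line can be recomputed from the counts above it. Pairs with neither a concordant nor a discordant alignment number 26060 - 16085 = 9975, which is 19950 mates. Since every read here is paired, the overall rate can be read as aligned mates out of 2 × 2445370 = 4890740. A small awk check, assuming that mate-based definition (the helper name `overall_rate` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: recompute the summary's derived numbers from its raw counts.
# Assumes every read is paired (true here: 100.00% were paired) and that the
# overall rate is aligned mates / total mates.
overall_rate() {
  local pairs=$1 conc0=$2 disc1=$3 mates_unaligned=$4
  awk -v p="$pairs" -v c0="$conc0" -v d1="$disc1" -v u="$mates_unaligned" 'BEGIN {
    left = c0 - d1      # pairs with neither a concordant nor a discordant alignment
    mates = 2 * left    # mates belonging to those pairs
    total = 2 * p       # each pair contributes two mates
    printf "%d %d %.2f\n", left, mates, 100 * (total - u) / total
  }'
}

overall_rate 2445370 26060 16085 8456   # prints: 9975 19950 99.83
```

The output matches the 9975 pairs, 19950 mates and 99.83% overall alignment rate reported by Bowtie2.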


#### Look at the created (unsorted) SAM file:

```bash
%%bash
head ./analysis/mappings/bowtie2/evolved-6.sam
```

```
@HD	VN:1.0	SO:unsorted
@SQ	SN:NODE_1_length_1394677_cov_15.3771	LN:1394677
@SQ	SN:NODE_2_length_1051867_cov_15.4779	LN:1051867
@SQ	SN:NODE_3_length_950567_cov_15.4139	LN:950567
@SQ	SN:NODE_4_length_925223_cov_15.3905	LN:925223
@SQ	SN:NODE_5_length_916389_cov_15.4457	LN:916389
@SQ	SN:NODE_6_length_772252_cov_15.4454	LN:772252
@SQ	SN:NODE_7_length_506590_cov_15.6969	LN:506590
@SQ	SN:NODE_8_length_473386_cov_15.0601	LN:473386
@SQ	SN:NODE_9_length_438517_cov_15.3909	LN:438517
```
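Each @SQ line describes one reference sequence the reads were mapped against: SN is its name (here the assembly's NODE scaffold names) and LN its length in bp, so the LN values should add up to the total length of the indexed scaffolds. A short awk sketch, using a hypothetical `sq_summary` helper, that counts the @SQ entries and sums their lengths:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count @SQ reference sequences and sum their LN lengths
# from a SAM stream on stdin.
sq_summary() {
  awk -F'\t' '
    /^@SQ/ {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^LN:/) { n++; total += substr($i, 4) }
    }
    !/^@/ { exit }   # the header precedes all alignment records, so stop here
    END { printf "%d sequences, %.0f bp\n", n, total }
  '
}

# sq_summary < ./analysis/mappings/bowtie2/evolved-6.sam
```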


#### Look at the SAM file without the header:

```bash
%%bash
grep -v '^@' ./analysis/mappings/bowtie2/evolved-6.sam | head
```

```
M02810:197:000000000-AV55U:1:1101:10000:11540	83	NODE_1_length_1394677_cov_15.3771	582530	42	151M	=	582252	-429	TATGGTATCACTTATGGTATCACTTATGGCTATCACTAATGGCTATCACTTATGGTATCACTTATGACTATCAGACGTTATTACTATCAGACGATAACTATCAGACTTTATTACTATCACTTTCATATTACCCACTATCATCCCTTCTTTA	FHGHHHHHGGGHHHHHHHHHHHHHHHHHHGHHHHHHHHHHHGHHHHHGHHHHHHHHGDHHHHHHHHGHHHHGHHHGHHHHHHFHHHHGHHHHIHHHHHHHHHHHHHHHHHHHGHHHHHGHGHHHHHHHHEGGGGGGGGGFBCFFFFCCCCC	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:151	YS:i:-4	YT:Z:CP
M02810:197:000000000-AV55U:1:1101:10000:11540	163	NODE_1_length_1394677_cov_15.3771	582252	42	151M	=	582530	429	ATTTCAACCAAGATCGTTTCATCAAAGATCGTTTCAAGAGACAAAGATCGTTTCAATACAAGATCATTTCAATAGAATATCGTTTCAGAAGATTGATGAATACTATGAAGAGGTCGTGATCATGGTATGTTTACCATAATGATCCACAGCA	BBBBBFFFFBBFCGGGGCFGG6DDBBFHHHHHHGHHHGHGHHHHHGFHFHFGGHHGHHHHHHHHHHHHHHHHHHHGFGFHHGHHGFHHHHHGHHHHFGGHHHHHHHHEHHGHGHHGEGHHHHHHGFGHHHHHHHHFHHHHHHHBFHHHGFH	AS:i:-4	XN:i:0	XM
```
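The second column of each record is the bitwise FLAG defined by the SAM specification. The two records shown are the mates of one concordantly aligned pair (same read name, RNEXT `=`, TLEN of -429 and 429, and the Bowtie2 tag YT:Z:CP): 83 marks the first mate aligned to the reverse strand, 163 the second mate whose partner is reversed. `samtools flags` decodes these values too; below is a dependency-free bash sketch that uses the same bit names, with a hypothetical helper name `decode_flag`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of the bits set in a SAM FLAG value.
decode_flag() {
  local flag=$1 i
  # Bit i (value 1 << i) in SAM-spec order, named as in `samtools flags`.
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local -a set_bits=()
  for i in "${!names[@]}"; do
    if (( flag & (1 << i) )); then
      set_bits+=("${names[i]}")
    fi
  done
  local IFS=,
  echo "${set_bits[*]}"
}

decode_flag 83    # PAIRED,PROPER_PAIR,REVERSE,READ1
decode_flag 163   # PAIRED,PROPER_PAIR,MREVERSE,READ2

# Decode the flags of the first records in the SAM file:
# grep -v '^@' ./analysis/mappings/bowtie2/evolved-6.sam | head | cut -f 2 |
#   while read -r f; do decode_flag "$f"; done
```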
