# Counting Reads

In this notebook, I'll count the number of reads in both untrimmed and trimmed *C. virginica* gonad sequence data from Illumina.

1. Untrimmed files
2. Trimmed files
3. Reads that mapped to genome

## 0. Prepare for analyses

### 0a. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/paper-gonad-meth/code'

In [2]:
cd ../analyses/

/Users/yaamini/Documents/paper-gonad-meth/analyses


In [5]:
!mkdir 02-Counting-Reads

In [6]:
cd 02-Counting-Reads/

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads


## 1. Untrimmed files

The untrimmed files have FastQC reports I can use to get read counts, so I don't need to download the full sequence files. The link to these reports can be found in the Nightingales spreadsheet.

In [67]:
#Create a new directory for downloading files and saving read counts
!mkdir Untrimmed-Reads

In [68]:
cd Untrimmed-Reads/

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads/2019-03-17-Untrimmed-Reads


### 1a. Download files

In [87]:
#Download files from owl. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A_s1_R1_fastqc.zip -A_s1_R2_fastqc.zip \
http://owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/

--2019-03-18 14:16:10--  http://owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/
Resolving owl.fish.washington.edu... 128.95.149.83
Connecting to owl.fish.washington.edu|128.95.149.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/index.html'

owl.fish.washington     [ <=>                ]  10.27K  --.-KB/s    in 0.02s   

2019-03-18 14:16:10 (627 KB/s) - 'owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/index.html' saved [10512]

Loading robots.txt; please ignore errors.
--2019-03-18 14:16:10--  http://owl.fish.washington.edu/robots.txt
Reusing existing connection to owl.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-18 14:16:23 ERROR 404: Not Found.

Removing owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/index.html since it should be rejected.

--2019-03-18 14:16:23--  

In [88]:
#Move all files from owl folder to the current directory
!mv owl.fish.washington.edu/Athaliana/20180409_fastqc_Cvirginica_MBD/* .

In [89]:
#Confirm all files were moved
!ls

multiqc_data               zr2096_4_s1_R2_fastqc.zip
owl.fish.washington.edu    zr2096_5_s1_R1_fastqc.zip
zr2096_10_s1_R1_fastqc.zip zr2096_5_s1_R2_fastqc.zip
zr2096_10_s1_R2_fastqc.zip zr2096_6_s1_R1_fastqc.zip
zr2096_1_s1_R1_fastqc.zip  zr2096_6_s1_R2_fastqc.zip
zr2096_1_s1_R2_fastqc.zip  zr2096_7_s1_R1_fastqc.zip
zr2096_2_s1_R1_fastqc.zip  zr2096_7_s1_R2_fastqc.zip
zr2096_2_s1_R2_fastqc.zip  zr2096_8_s1_R1_fastqc.zip
zr2096_3_s1_R1_fastqc.zip  zr2096_8_s1_R2_fastqc.zip
zr2096_3_s1_R2_fastqc.zip  zr2096_9_s1_R1_fastqc.zip
zr2096_4_s1_R1_fastqc.zip  zr2096_9_s1_R2_fastqc.zip


In [90]:
#Remove the empty owl directory
!rm -r owl.fish.washington.edu

### 1b. Count reads

First, I'll test a loop and ensure it identifies all of the files I want to use by having it print the name of each file (`f`):

In [91]:
%%bash
for f in *zip
do
    echo ${f}
done

zr2096_10_s1_R1_fastqc.zip
zr2096_10_s1_R2_fastqc.zip
zr2096_1_s1_R1_fastqc.zip
zr2096_1_s1_R2_fastqc.zip
zr2096_2_s1_R1_fastqc.zip
zr2096_2_s1_R2_fastqc.zip
zr2096_3_s1_R1_fastqc.zip
zr2096_3_s1_R2_fastqc.zip
zr2096_4_s1_R1_fastqc.zip
zr2096_4_s1_R2_fastqc.zip
zr2096_5_s1_R1_fastqc.zip
zr2096_5_s1_R2_fastqc.zip
zr2096_6_s1_R1_fastqc.zip
zr2096_6_s1_R2_fastqc.zip
zr2096_7_s1_R1_fastqc.zip
zr2096_7_s1_R2_fastqc.zip
zr2096_8_s1_R1_fastqc.zip
zr2096_8_s1_R2_fastqc.zip
zr2096_9_s1_R1_fastqc.zip
zr2096_9_s1_R2_fastqc.zip


Now that I know it works, I'm going to count the number of reads in each file. I will first unzip each file with `unzip`.

In [92]:
%%bash
for f in *zip
do
    unzip ${f}
done

Archive:  zr2096_10_s1_R1_fastqc.zip
   creating: zr2096_10_s1_R1_fastqc/
   creating: zr2096_10_s1_R1_fastqc/Icons/
   creating: zr2096_10_s1_R1_fastqc/Images/
  inflating: zr2096_10_s1_R1_fastqc/Icons/fastqc_icon.png  
  inflating: zr2096_10_s1_R1_fastqc/Icons/error.png  
  inflating: zr2096_10_s1_R1_fastqc/Icons/tick.png  
  inflating: zr2096_10_s1_R1_fastqc/summary.txt  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_base_quality.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_tile_quality.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_sequence_quality.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_base_sequence_content.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_sequence_gc_content.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/per_base_n_content.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/sequence_length_distribution.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/duplication_levels.png  
  inflating: zr2096_10_s1_R1_fastqc/Images/adapter_content
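
As an alternative to unzipping every archive, the count can be read straight out of each zip file. The block below is a minimal sketch rather than the method used in this notebook. It assumes each archive holds a folder named after the archive (as the `unzip` output above suggests) containing `fastqc_data.txt`, and that the "Total Sequences" line is tab-separated. The function name `total_sequences_from_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def total_sequences_from_zip(zip_path):
    """Return the 'Total Sequences' value stored in a FastQC zip archive."""
    zip_path = Path(zip_path)
    # Assumption: the archive holds one folder named after the archive itself.
    member = f"{zip_path.stem}/fastqc_data.txt"
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as handle:
            for raw_line in handle:
                line = raw_line.decode("utf-8").rstrip()
                if line.startswith("Total Sequences"):
                    # The label and the count are separated by a tab.
                    return int(line.split("\t")[1])
    raise ValueError(f"No 'Total Sequences' line found in {zip_path}")


if __name__ == "__main__":
    # Print one tab-separated line per archive in the current directory.
    for path in sorted(Path(".").glob("*_fastqc.zip")):
        print(path.name, total_sequences_from_zip(path), sep="\t")
```

Reading from the archive keeps the working directory free of unzipped folders and records the file name next to each count.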

In [93]:
#Confirm files were unzipped
!ls

multiqc_data               zr2096_5_s1_R1_fastqc
zr2096_10_s1_R1_fastqc     zr2096_5_s1_R1_fastqc.zip
zr2096_10_s1_R1_fastqc.zip zr2096_5_s1_R2_fastqc
zr2096_10_s1_R2_fastqc     zr2096_5_s1_R2_fastqc.zip
zr2096_10_s1_R2_fastqc.zip zr2096_6_s1_R1_fastqc
zr2096_1_s1_R1_fastqc      zr2096_6_s1_R1_fastqc.zip
zr2096_1_s1_R1_fastqc.zip  zr2096_6_s1_R2_fastqc
zr2096_1_s1_R2_fastqc      zr2096_6_s1_R2_fastqc.zip
zr2096_1_s1_R2_fastqc.zip  zr2096_7_s1_R1_fastqc
zr2096_2_s1_R1_fastqc      zr2096_7_s1_R1_fastqc.zip
zr2096_2_s1_R1_fastqc.zip  zr2096_7_s1_R2_fastqc
zr2096_2_s1_R2_fastqc      zr2096_7_s1_R2_fastqc.zip
zr2096_2_s1_R2_fastqc.zip  zr2096_8_s1_R1_fastqc
zr2096_3_s1_R1_fastqc      zr2096_8_s1_R1_fastqc.zip
zr2096_3_s1_R1_fastqc.zip  zr2096_8_s1_R2_fastqc
zr2096_3_s1_R2_fastqc      zr2096_8

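Now, I'll use `grep` to find the "Total Sequences" line in each sample's `fastqc_data.txt` file. Using `>>`, I can append the result to the same output file each time the loop runs, so all of the counts end up in one new file. Because `>>` appends, rerunning the loop adds duplicate lines unless the output file is deleted first.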

In [94]:
%%bash
for f in *fastqc
do
    grep "Total Sequences *" ${f}/fastqc_data.txt \
    >> Untrimmed-Read-Counts.txt
done

In [95]:
#Confirm total sequences were counted. The first 2 lines correspond to sample 10.
!head Untrimmed-Read-Counts.txt

Total Sequences	17717127
Total Sequences	17717127
Total Sequences	28982766
Total Sequences	28982766
Total Sequences	30798582
Total Sequences	30798582
Total Sequences	29892002
Total Sequences	29892002
Total Sequences	24341968
Total Sequences	24341968
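
Sample 10 comes first because the shell glob sorts file names as text, not by sample number. The sketch below illustrates that ordering and one way to restore numeric order. It assumes the `zr2096_<sample>_s1_R<read>` naming pattern seen in the listings above, and the helper `sample_number` is hypothetical.

```python
import re


def sample_number(name):
    """Extract the sample number from a name such as 'zr2096_10_s1_R1_fastqc'."""
    match = re.match(r"zr2096_(\d+)_s1_R[12]", name)
    if match is None:
        raise ValueError(f"Unexpected file name: {name}")
    return int(match.group(1))


# A few folder names in the order the loop visits them.
names = [
    "zr2096_10_s1_R1_fastqc",
    "zr2096_10_s1_R2_fastqc",
    "zr2096_1_s1_R1_fastqc",
    "zr2096_1_s1_R2_fastqc",
    "zr2096_2_s1_R1_fastqc",
]

# Text sorting compares character by character, and "0" sorts before "_",
# so "zr2096_10_" lands ahead of "zr2096_1_".
# Sorting on the extracted integer restores numeric sample order.
numeric_order = sorted(names, key=lambda n: (sample_number(n), n))
for name in numeric_order:
    print(sample_number(name), name, sep="\t")
```

Keeping the sample number next to each count avoids having to remember which line of the output file belongs to which sample.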


In [96]:
#Halve each value in the second column ($2) and sum the results. Each sample is listed twice (R1 and R2), so this gives the total number of paired-end reads.
!cat Untrimmed-Read-Counts.txt | awk -F"\t" '{ sum+=$2 / 2} END {print sum}'

279681264
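
The `awk` command halves each count because every sample contributes two identical lines, one for R1 and one for R2. Summing first and halving once gives the same result. The sketch below shows this on the ten lines printed by `head` above (the first five samples only). The function name `paired_end_total` is hypothetical.

```python
# The ten lines printed by `head`: each sample appears twice (R1 and R2).
head_lines = """\
Total Sequences\t17717127
Total Sequences\t17717127
Total Sequences\t28982766
Total Sequences\t28982766
Total Sequences\t30798582
Total Sequences\t30798582
Total Sequences\t29892002
Total Sequences\t29892002
Total Sequences\t24341968
Total Sequences\t24341968
"""


def paired_end_total(lines):
    """Sum the tab-separated counts, then halve to count each read pair once."""
    total = sum(int(line.split("\t")[1]) for line in lines if line.strip())
    return total // 2


print(paired_end_total(head_lines.splitlines()))
```

Because this covers only five of the ten samples, its result is smaller than the full total of 279681264.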


## 2. Trimmed files

Since FastQC was run on my files after they were trimmed with TrimGalore, I can use the FastQC reports to get read counts for each trimmed file. In the Basic Statistics module, FastQC reports Total Sequences (i.e. total reads), which here is the number of reads remaining after trimming.

### 2a. Download files

In [66]:
cd ..

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads


In [14]:
!mkdir FastQC-Reports

In [15]:
cd FastQC-Reports/

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads/2019-03-17-FastQC-Reports


In [16]:
#Download files from owl. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A_fastqc.zip \
http://owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/

--2019-03-18 09:39:10--  http://owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/
Resolving owl.fish.washington.edu... 128.95.149.83
Connecting to owl.fish.washington.edu|128.95.149.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/index.html'

owl.fish.washington     [ <=>                ]  10.61K  --.-KB/s    in 0s      

2019-03-18 09:39:10 (61.3 MB/s) - 'owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/index.html' saved [10864]

Loading robots.txt; please ignore errors.
--2019-03-18 09:39:10--  http://owl.fish.washington.edu/robots.txt
Reusing existing connection to owl.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-18 09:39:10 ERROR 404: N

In [17]:
#Move all files from owl folder to the current directory
!mv owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/* .

In [18]:
#Confirm all files were moved
!ls

multiqc_data                     zr2096_4_s1_R2_val_2_fastqc.zip
owl.fish.washington.edu          zr2096_5_s1_R1_val_1_fastqc.zip
zr2096_10_s1_R1_val_1_fastqc.zip zr2096_5_s1_R2_val_2_fastqc.zip
zr2096_10_s1_R2_val_2_fastqc.zip zr2096_6_s1_R1_val_1_fastqc.zip
zr2096_1_s1_R1_val_1_fastqc.zip  zr2096_6_s1_R2_val_2_fastqc.zip
zr2096_1_s1_R2_val_2_fastqc.zip  zr2096_7_s1_R1_val_1_fastqc.zip
zr2096_2_s1_R1_val_1_fastqc.zip  zr2096_7_s1_R2_val_2_fastqc.zip
zr2096_2_s1_R2_val_2_fastqc.zip  zr2096_8_s1_R1_val_1_fastqc.zip
zr2096_3_s1_R1_val_1_fastqc.zip  zr2096_8_s1_R2_val_2_fastqc.zip
zr2096_3_s1_R2_val_2_fastqc.zip  zr2096_9_s1_R1_val_1_fastqc.zip
zr2096_4_s1_R1_val_1_fastqc.zip  zr2096_9_s1_R2_val_2_fastqc.zip


In [19]:
#Remove the empty owl directory
!rm -r owl.fish.washington.edu

### 2b. Count reads

In [30]:
%%bash
for f in *zip
do
    echo ${f}
done

zr2096_10_s1_R1_val_1_fastqc.zip
zr2096_10_s1_R2_val_2_fastqc.zip
zr2096_1_s1_R1_val_1_fastqc.zip
zr2096_1_s1_R2_val_2_fastqc.zip
zr2096_2_s1_R1_val_1_fastqc.zip
zr2096_2_s1_R2_val_2_fastqc.zip
zr2096_3_s1_R1_val_1_fastqc.zip
zr2096_3_s1_R2_val_2_fastqc.zip
zr2096_4_s1_R1_val_1_fastqc.zip
zr2096_4_s1_R2_val_2_fastqc.zip
zr2096_5_s1_R1_val_1_fastqc.zip
zr2096_5_s1_R2_val_2_fastqc.zip
zr2096_6_s1_R1_val_1_fastqc.zip
zr2096_6_s1_R2_val_2_fastqc.zip
zr2096_7_s1_R1_val_1_fastqc.zip
zr2096_7_s1_R2_val_2_fastqc.zip
zr2096_8_s1_R1_val_1_fastqc.zip
zr2096_8_s1_R2_val_2_fastqc.zip
zr2096_9_s1_R1_val_1_fastqc.zip
zr2096_9_s1_R2_val_2_fastqc.zip


In [35]:
%%bash
for f in *zip
do
    unzip ${f}
done

Archive:  zr2096_10_s1_R1_val_1_fastqc.zip
   creating: zr2096_10_s1_R1_val_1_fastqc/
   creating: zr2096_10_s1_R1_val_1_fastqc/Icons/
   creating: zr2096_10_s1_R1_val_1_fastqc/Images/
  inflating: zr2096_10_s1_R1_val_1_fastqc/Icons/fastqc_icon.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Icons/error.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Icons/tick.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/summary.txt  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_base_quality.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_tile_quality.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_sequence_quality.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_base_sequence_content.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_sequence_gc_content.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/per_base_n_content.png  
  inflating: zr2096_10_s1_R1_val_1_fastqc/Images/sequence_length_distribution.png  
  inflating: zr2096_10_s1_R1_val_1_f

In [36]:
#Confirm files were unzipped
!ls

zr2096_10_s1_R1_val_1_fastqc     zr2096_5_s1_R1_val_1_fastqc
zr2096_10_s1_R1_val_1_fastqc.zip zr2096_5_s1_R1_val_1_fastqc.zip
zr2096_10_s1_R2_val_2_fastqc     zr2096_5_s1_R2_val_2_fastqc
zr2096_10_s1_R2_val_2_fastqc.zip zr2096_5_s1_R2_val_2_fastqc.zip
zr2096_1_s1_R1_val_1_fastqc      zr2096_6_s1_R1_val_1_fastqc
zr2096_1_s1_R1_val_1_fastqc.zip  zr2096_6_s1_R1_val_1_fastqc.zip
zr2096_1_s1_R2_val_2_fastqc      zr2096_6_s1_R2_val_2_fastqc
zr2096_1_s1_R2_val_2_fastqc.zip  zr2096_6_s1_R2_val_2_fastqc.zip
zr2096_2_s1_R1_val_1_fastqc      zr2096_7_s1_R1_val_1_fastqc
zr2096_2_s1_R1_val_1_fastqc.zip  zr2096_7_s1_R1_val_1_fastqc.zip
zr2096_2_s1_R2_val_2_fastqc      zr2096_7_s1_R2_val_2_fastqc
zr2096_2_s1_R2_val_2_fastqc.zip  zr2096_7_s1_R2_val_2_fastqc.zip
zr2096_3_s1_R1_val_1_fastqc      zr2096_8_s1_R1_val_1_fastqc
zr2096_3_s1_R1_v

In [60]:
%%bash
for f in *fastqc
do
    grep "Total Sequences *" ${f}/fastqc_data.txt \
    >> Trimmed-Read-Counts.txt
done

In [61]:
#Confirm total sequences were counted. The first 2 lines correspond to sample 10.
!head Trimmed-Read-Counts.txt

Total Sequences	17448883
Total Sequences	17448883
Total Sequences	28603346
Total Sequences	28603346
Total Sequences	30325606
Total Sequences	30325606
Total Sequences	29548753
Total Sequences	29548753
Total Sequences	23970516
Total Sequences	23970516
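
With counts from before and after trimming, the share of read pairs that survived trimming can be estimated per sample. The sketch below uses the values printed by `head` in the untrimmed and trimmed sections. It assumes the lines follow the file order shown in the loops (samples 10, 1, 2, 3, 4). The function name `percent_retained` is hypothetical.

```python
# Read pairs per sample, copied from the two `head` outputs.
# Assumption: lines follow the glob order, i.e. samples 10, 1, 2, 3, 4.
untrimmed = {10: 17717127, 1: 28982766, 2: 30798582, 3: 29892002, 4: 24341968}
trimmed = {10: 17448883, 1: 28603346, 2: 30325606, 3: 29548753, 4: 23970516}


def percent_retained(before, after):
    """Return the percentage of read pairs kept after trimming."""
    return 100 * after / before


for sample in sorted(untrimmed):
    pct = percent_retained(untrimmed[sample], trimmed[sample])
    print(f"Sample {sample}: {pct:.2f}% of read pairs retained")
```

A sample with an unusually low percentage would be worth a closer look in its FastQC report.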


In [150]:
#Sum the contents of the second column ($2), then divide by 2 to obtain the total number of paired-end reads.
!cat Trimmed-Read-Counts.txt | awk -F"\t" '{ sum+=$2 / 2} END {print sum}'

cat: 2019-03-17-Trimmed-Read-Counts.txt: No such file or directory



## 3. Reads that mapped to genome

Finally, I need to count how many trimmed reads mapped back to the genome. I can do this by looking at `bismark` processing reports. Each processing report outlines how many paired-end reads did not map to the genome under any condition. I can extract this number and subtract it from the total trimmed paired-end reads per sample to obtain how many reads mapped back to the genome.

### 3a. Download files

In [97]:
cd ..

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads


In [98]:
!mkdir Mapped-Reads

In [99]:
cd Mapped-Reads/

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads/2019-03-17-Mapped-Reads


In [100]:
#Download files from gannet. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A_s1_R1_val_1_bismark_bt2_PE_report.txt \
http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/

--2019-03-18 14:29:45--  http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'

gannet.fish.washing     [ <=>                ]  61.14K  --.-KB/s    in 0.001s  

2019-03-18 14:29:47 (57.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]

Loading robots.txt; please ignore errors.
--2019-03-18 14:29:47--  http://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-18 14:29:47 ERROR 404: Not Found.

Removing gann

In [101]:
#Move all files from gannet folder to the current directory
!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* .

In [102]:
#Confirm files were moved
!ls

@eaDir
gannet.fish.washington.edu
zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt


In [105]:
#Remove empty folders
!rm -r gannet.fish.washington.edu

### 3b. Count unmapped reads

In [106]:
%%bash
for f in *txt
do
    echo ${f}
done

zr2096_10_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_2_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_3_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_4_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_5_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_6_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_7_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_8_s1_R1_val_1_bismark_bt2_PE_report.txt
zr2096_9_s1_R1_val_1_bismark_bt2_PE_report.txt


In [111]:
#Identify what information is needed from the report
!head zr2096_1_s1_R1_val_1_bismark_bt2_PE_report.txt

Bismark report for: /gscratch/scrubbed/yaamini/data/Virginica-MBD/2018-10-17-Trimmed-Files//zr2096_1_s1_R1_val_1.fq.gz and /gscratch/scrubbed/yaamini/data/Virginica-MBD/2018-10-17-Trimmed-Files//zr2096_1_s1_R2_val_2.fq.gz (version: v0.19.0)
Bismark was run with Bowtie 2 against the bisulfite genome of /gscratch/scrubbed/yaamini/data/Virginica-MBD/2018-04-27-Bismark-Inputs/ with the specified options: -q --score-min L,0,-1.2 -p 28 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500
Option '--non_directional' specified: alignments to all strands were being performed (OT, OB, CTOT, CTOB)

Final Alignment report
Sequence pairs analysed in total:	28603346
Number of paired-end alignments with a unique best hit:	8273829
Mapping efficiency:	28.9% 
Sequence pairs with no alignments under any condition:	17321484
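
The same subtraction can also be done per sample, since each report lists both the pairs analysed and the pairs with no alignments. The sketch below parses the excerpt shown above for sample 1. It assumes labels and values are tab-separated, as in that excerpt. The function name `mapped_pairs` is hypothetical.

```python
def mapped_pairs(report_text):
    """Return (analysed, unaligned, mapped) pair counts from a Bismark PE report."""
    wanted = {
        "Sequence pairs analysed in total:": "analysed",
        "Sequence pairs with no alignments under any condition:": "unaligned",
    }
    counts = {}
    for line in report_text.splitlines():
        # Split each line into its label and value at the first tab.
        label, _, value = line.partition("\t")
        if label in wanted:
            counts[wanted[label]] = int(value)
    # Mapped pairs, as defined in this notebook: analysed minus unaligned.
    mapped = counts["analysed"] - counts["unaligned"]
    return counts["analysed"], counts["unaligned"], mapped


# Excerpt copied from the sample 1 report shown above.
report_excerpt = (
    "Final Alignment report\n"
    "Sequence pairs analysed in total:\t28603346\n"
    "Number of paired-end alignments with a unique best hit:\t8273829\n"
    "Mapping efficiency:\t28.9% \n"
    "Sequence pairs with no alignments under any condition:\t17321484\n"
)
print(mapped_pairs(report_excerpt))
```

For sample 1 the difference is larger than the unique best hit count in the same report, so this definition of "mapped" appears to include pairs that aligned without a unique best hit.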


In [116]:
%%bash
for f in *txt
do
    grep "Sequence pairs with no alignments under any condition *" ${f} \
    >> Unmapped-Read-Counts.txt
done

In [134]:
#Confirm file was created. The first entry corresponds to sample 10.
!head Unmapped-Read-Counts.txt

Sequence pairs with no alignments under any condition:	4524985
Sequence pairs with no alignments under any condition:	17321484
Sequence pairs with no alignments under any condition:	8686774
Sequence pairs with no alignments under any condition:	7362526
Sequence pairs with no alignments under any condition:	5579694
Sequence pairs with no alignments under any condition:	8772590
Sequence pairs with no alignments under any condition:	5434045
Sequence pairs with no alignments under any condition:	8438065
Sequence pairs with no alignments under any condition:	9943053
Sequence pairs with no alignments under any condition:	9175758


In [140]:
#Sum the contents of the second column ($2) to obtain the total number of paired-end reads that did not map to the genome.
!cat Unmapped-Read-Counts.txt | awk -F"\t" '{ sum+=$2} END {print sum}'

85238974


### 3c. Calculate mapped reads

Now I need to subtract the number of paired-end reads that did not align from the total paired-end read count for the trimmed files (275914272) to get the number of paired-end reads that mapped back to the genome.

In [148]:
275914272 - 85238974

190675298

190675298 paired-end reads mapped to the genome.
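
The subtraction can also be expressed as a share of the trimmed read pairs. The sketch below uses only the two totals from the cells above, and the variable names are illustrative.

```python
trimmed_pairs = 275914272   # total trimmed read pairs used in the subtraction above
unmapped_pairs = 85238974   # summed pairs with no alignments under any condition

# Mapped pairs are whatever remains after removing the unaligned pairs.
mapped = trimmed_pairs - unmapped_pairs
percent_mapped = 100 * mapped / trimmed_pairs

print(f"{mapped} of {trimmed_pairs} trimmed read pairs mapped ({percent_mapped:.1f}%)")
```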