# Counting Reads

In this notebook, I'll count the number of reads in both untrimmed and trimmed *C. virginica* gonad sequence data from Illumina.

1. Untrimmed files
2. Trimmed files

## 0. Prepare for analyses

### 0a. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/notebooks'

In [4]:
cd ../data/

/Users/yaamini/Documents/yaamini-virginica/data


In [5]:
!mkdir 2019-03-17-Counting-Reads

In [6]:
cd 2019-03-17-Counting-Reads/

/Users/yaamini/Documents/yaamini-virginica/data/2019-03-17-Counting-Reads


## 1. Untrimmed files

### 1a. Download files

In [8]:
#Download files from owl. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A_s1_R1.fastq.gz -A_s1_R2.fastq.gz \
http://owl.fish.washington.edu/nightingales/C_virginica/

--2019-03-17 15:11:06--  http://owl.fish.washington.edu/nightingales/C_virginica/
Resolving owl.fish.washington.edu... 128.95.149.83
Connecting to owl.fish.washington.edu|128.95.149.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'owl.fish.washington.edu/nightingales/C_virginica/index.html'

owl.fish.washington     [ <=>                ]  21.97K  --.-KB/s    in 0.001s  

2019-03-17 15:11:06 (38.8 MB/s) - 'owl.fish.washington.edu/nightingales/C_virginica/index.html' saved [22501]

Loading robots.txt; please ignore errors.
--2019-03-17 15:11:06--  http://owl.fish.washington.edu/robots.txt
Reusing existing connection to owl.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-17 15:11:06 ERROR 404: Not Found.

Removing owl.fish.washington.edu/nightingales/C_virginica/index.html since it should be rejected.

--2019-03-17 15:11:06--  http://owl.fish.washington.edu/nightingales/C_virginica/?C=N;O=

In [9]:
#Move all files from owl folder to the current directory
!mv owl.fish.washington.edu/nightingales/C_virginica/* .

In [10]:
#Confirm all files were moved
!ls

@eaDir                   zr2096_3_s1_R1.fastq.gz  zr2096_7_s1_R1.fastq.gz
owl.fish.washington.edu  zr2096_3_s1_R2.fastq.gz  zr2096_7_s1_R2.fastq.gz
zr2096_10_s1_R1.fastq.gz zr2096_4_s1_R1.fastq.gz  zr2096_8_s1_R1.fastq.gz
zr2096_10_s1_R2.fastq.gz zr2096_4_s1_R2.fastq.gz  zr2096_8_s1_R2.fastq.gz
zr2096_1_s1_R1.fastq.gz  zr2096_5_s1_R1.fastq.gz  zr2096_9_s1_R1.fastq.gz
zr2096_1_s1_R2.fastq.gz  zr2096_5_s1_R2.fastq.gz  zr2096_9_s1_R2.fastq.gz
zr2096_2_s1_R1.fastq.gz  zr2096_6_s1_R1.fastq.gz
zr2096_2_s1_R2.fastq.gz  zr2096_6_s1_R2.fastq.gz


In [11]:
#Remove the empty owl directory
!rm -r owl.fish.washington.edu

### 1b. Count reads

First, I'll test a loop and ensure it identifies all of the files I want to use by having the loop print the filename of each file (`f`):

In [12]:
%%bash
for f in *.fastq.gz
do
    echo ${f}
done

zr2096_10_s1_R1.fastq.gz
zr2096_10_s1_R2.fastq.gz
zr2096_1_s1_R1.fastq.gz
zr2096_1_s1_R2.fastq.gz
zr2096_2_s1_R1.fastq.gz
zr2096_2_s1_R2.fastq.gz
zr2096_3_s1_R1.fastq.gz
zr2096_3_s1_R2.fastq.gz
zr2096_4_s1_R1.fastq.gz
zr2096_4_s1_R2.fastq.gz
zr2096_5_s1_R1.fastq.gz
zr2096_5_s1_R2.fastq.gz
zr2096_6_s1_R1.fastq.gz
zr2096_6_s1_R2.fastq.gz
zr2096_7_s1_R1.fastq.gz
zr2096_7_s1_R2.fastq.gz
zr2096_8_s1_R1.fastq.gz
zr2096_8_s1_R2.fastq.gz
zr2096_9_s1_R1.fastq.gz
zr2096_9_s1_R2.fastq.gz


Now that I know it works, I'm going to count the number of reads in each file. There are four lines per read in each Illumina FASTQ file, so I can count the number of lines and then divide by four to get the number of reads.

In [13]:
%%bash
for f in *.fastq.gz
do
    # Print the filename, then its line count divided by four (four lines per read)
    echo "${f}" $(( $(gunzip -c "${f}" | wc -l) / 4 ))
done
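The line-count arithmetic can be checked on a small synthetic file before running it on the real data. The sketch below is only an illustration: the `count_reads` helper, the `demo_dir` temporary directory, and the two made-up reads are assumptions and not part of the original analysis.

```bash
# Hypothetical helper: print the number of reads in one gzipped FASTQ file.
# Assumes the standard four-lines-per-read FASTQ layout.
count_reads() {
    local lines
    lines=$(gunzip -c "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Build a tiny synthetic FASTQ with two reads (eight lines) in a temporary directory
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip > "${demo_dir}/demo.fastq.gz"

# Eight lines divided by four gives two reads
count_reads "${demo_dir}/demo.fastq.gz"
```

If the helper reports 2 for the synthetic file, the same function can be called inside the `for f in *.fastq.gz` loop on the downloaded files.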


## 2. Trimmed files

### 2a. Download files

In [None]:
#Download files from owl. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A_s1_R1_val_1.fq.gz -A_s1_R2_val_2.fq.gz \
http://owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/

In [None]:
#Move all files from owl folder to the current directory
!mv owl.fish.washington.edu/Athaliana/20180411_trimgalore_10bp_Cvirginica_MBD/* .

In [None]:
#Confirm all files were moved
!ls

In [None]:
#Remove the empty owl directory
!rm -r owl.fish.washington.edu

### 2b. Count reads
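
A minimal sketch of how this step might look, assuming the trimmed files end up in the current directory with the `.fq.gz` extension used in the download pattern above. The `count_all` helper name is hypothetical, and no counts are shown because the cell has not been run.

```bash
# Hypothetical helper: for every file in the current directory whose name ends
# with the given suffix, print "<filename><TAB><read count>".
# Assumes the standard four-lines-per-read FASTQ layout.
count_all() {
    local suffix="$1" f lines
    for f in *"${suffix}"
    do
        # Skip the literal pattern if the glob matched no files
        [ -e "${f}" ] || continue
        lines=$(gunzip -c "${f}" | wc -l)
        printf '%s\t%d\n' "${f}" $(( lines / 4 ))
    done
}

# Count reads in all trimmed files
count_all .fq.gz
```

Passing the suffix as an argument means the same helper could also be pointed at the untrimmed `.fastq.gz` files.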