# Snow Crab (SC) Notebook 01

Project: Gene expression analysis of juvenile Snow crab exposed to varying pH conditions 

Description of study: All crabs were starved for 24 h. Short-term crabs were then transferred from Ambient to pH 7.8 and pH 7.5 tanks and exposed for 8 h. All crabs were then sacrificed by puncturing the carapace through the cardiac region and preserved in RNAlater overnight at 4 °C before being transferred to -80 °C. Long-duration exposure was ~12 weeks; short-duration exposure was 8 h. All crabs were sacrificed on 7/20/2021.

In this notebook I will record the steps taken to process RNA-Seq files from raw data to gene counts that are ready for analysis in DESeq2. All steps will be conducted on Sedna, the high-performance computing cluster. Locations of files, code, and scripts will be documented in this notebook.

### For various stats, please see the Google Spreadsheet [SC RNASeq Stats](https://docs.google.com/spreadsheets/d/14B6Y0rq71c122Zct9J1EK1H-tT3PP2Ce/edit?usp=sharing&ouid=114796369849302950348&rtpof=true&sd=true)

### For a daily summary of this work, please see the Google Doc [Laura's Red King Crab RNA-Seq Notebook](https://docs.google.com/document/d/1HzMTreqnY2BD-oyjEJRA-JECpFE4CXlEoWKkmiaTYis/edit?usp=sharing)

## Package version list 

- FastQC, version 0.11.9
- MultiQC, version 1.11
- cutadapt, version 3.5
- Bowtie2, version 2.4.2
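The following is a minimal sketch for checking that the versions installed on Sedna match the list above. It assumes each tool, if installed, is on `PATH` under the command names shown. All four tools accept `--version`. The output file name `tool_versions.txt` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: log the version of each pipeline tool to a text file.
# Assumes the tools (if installed) are on PATH under these command names.
outfile="tool_versions.txt"   # hypothetical output file
: > "${outfile}"              # start with an empty file

for tool in fastqc multiqc cutadapt bowtie2; do
    if command -v "${tool}" > /dev/null 2>&1; then
        # Keep only the first line; bowtie2 --version prints several lines.
        version=$("${tool}" --version 2>&1 | head -n 1)
    else
        version="not found on PATH"
    fi
    # One tab-separated line per tool: name, then version string.
    printf '%s\t%s\n' "${tool}" "${version}" >> "${outfile}"
done

cat "${outfile}"
```

Saving this file alongside the results keeps a record of exactly which versions produced them.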

## Step 1: Concatenate sequence data from the same individual that was collected in different lanes

There are 63 libraries (one per individual crab; each crab is identified by its sample number), and each library was sequenced in 2 lanes with 150-nt paired-end reads. As a result, the sequencing data for each crab is split across two sets of files, so I needed to concatenate the data by sample number. To do this, I modified a script written by Giles (located on Sedna at `biodata/ggoetz/nichols/201910-redking_crab-rnaseq/scripts/concat_fastq_files.sh`) into this one: [concat_fastq_files_2022-01-05.sh](https://raw.githubusercontent.com/laurahspencer/snow-crab_RNASeq-2022/main/scripts/concat_fastq_files_2022-01-05.sh).
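Before concatenating, it can help to confirm that every sample really has two lane files. The sketch below runs this check on dummy data and assumes standard Illumina file names (`<sample>_S<number>_L00<lane>_R1_001.fastq.gz`). The real files may be named differently, in which case the `sed` pattern would need adjusting. The directory `demo_raw/` and its files are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: flag any sample that does not have exactly 2 lane files for R1.
# ASSUMES Illumina-style names like 6_S1_L001_R1_001.fastq.gz.
raw_dir="demo_raw"   # hypothetical directory of per-lane files
mkdir -p "${raw_dir}"
# Dummy files: sample 6 has two lanes, sample 7 only one.
touch "${raw_dir}/6_S1_L001_R1_001.fastq.gz" "${raw_dir}/6_S1_L002_R1_001.fastq.gz"
touch "${raw_dir}/7_S2_L001_R1_001.fastq.gz"

# Strip "_S<n>_L<lane>_R1_..." to recover the sample name, then count how
# many lane files each sample has and report any count other than 2.
for f in "${raw_dir}"/*_R1_*.fastq.gz; do
    basename "${f}" | sed -E 's/_S[0-9]+_L[0-9]+_R1_.*$//'
done | sort | uniq -c | while read -r n_lanes sample; do
    if [ "${n_lanes}" -ne 2 ]; then
        echo "WARNING: sample ${sample} has ${n_lanes} lane file(s), expected 2"
    fi
done > lane_check.txt

cat lane_check.txt
```

An empty `lane_check.txt` means every sample had exactly two lanes.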


I'm fairly new to `awk`, so I tested the script in my local Cygwin terminal using dummy data (located in `snow-crab_RNASeq-2022/testing/`). The script below successfully concatenated my dummy data into files named `sample_6_R1.fastq`:

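```
# Concatenate each sample's FASTQ files from all folders whose names start
# with "5" into one R1 file and one R2 file per sample (in concat/).
mkdir -p concat

FOLDERS=$(ls -1d 5* | awk -F "/" '{ print $NF }')

for folder in ${FOLDERS}
do
    echo ${folder}
    SUB_FOLDER=${folder}
    # Sample number is the 2nd "_"-separated field of each R1 file name;
    # sort -u lists each sample once, so its files are not appended twice.
    SAMPLES=$(ls ${SUB_FOLDER}/5*_R1_*.fastq.gz | \
        awk -F "/" '{ print $NF }' | \
        awk -F "_" '{ print $2 }' | \
        sort -u)

    for sample in ${SAMPLES}
    do
        echo ${sample}
        # gunzip -c decompresses, so the outputs really are plain .fastq files.
        # >> appends across folders; empty concat/ before re-running.
        gunzip -c ${SUB_FOLDER}/${folder}_${sample}_*_R1_*.fastq.gz \
            >> concat/sample_${sample}_R1.fastq
        gunzip -c ${SUB_FOLDER}/${folder}_${sample}_*_R2_*.fastq.gz \
            >> concat/sample_${sample}_R2.fastq
    done
done
```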
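A quick sanity check after concatenating is to confirm that no reads were lost or duplicated. A FASTQ record is 4 lines, so the read count is the line count divided by 4. The sketch below demonstrates the check on hypothetical dummy files in `demo_concat/`, not on the real sequencing data.

```bash
#!/usr/bin/env bash
# Sketch: confirm that concatenation neither lost nor duplicated reads.
# reads = lines / 4. demo_concat/ holds hypothetical dummy data.
d="demo_concat"
mkdir -p "${d}"

# Two tiny gzipped "lane" files for sample 6: 2 reads + 1 read.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "${d}/6_L001_R1.fastq.gz"
printf '@r3\nGGCC\n+\nIIII\n' | gzip > "${d}/6_L002_R1.fastq.gz"

# Concatenate both lanes into one uncompressed FASTQ (> overwrites).
gunzip -c "${d}"/6_L00*_R1.fastq.gz > "${d}/sample_6_R1.fastq"

# Count reads in the lane files and in the concatenated file.
lane_reads=$(( $(gunzip -c "${d}"/6_L00*_R1.fastq.gz | wc -l) / 4 ))
concat_reads=$(( $(wc -l < "${d}/sample_6_R1.fastq") / 4 ))

if [ "${lane_reads}" -eq "${concat_reads}" ]; then
    echo "OK: ${concat_reads} reads in both"
else
    echo "MISMATCH: lanes=${lane_reads}, concatenated=${concat_reads}"
fi
```

On real data, the same two counts can be compared per sample. A concatenated file with twice the expected reads usually means files were appended more than once.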

Giles then compressed the concatenated files using this script: [compress_concat_files.sh](https://raw.githubusercontent.com/laurahspencer/red-king_RNASeq-2022/main/scripts/compress_concat_files.sh). Script location on Sedna: `biodata/ggoetz/nichols/201910-redking_crab-rnaseq/scripts/compress_concat_files.sh`
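The contents of the compression script are not reproduced here, so the following is only a sketch of one way to do this step, not a copy of that script. It gzips each concatenated FASTQ file and then uses `gzip -t` to confirm that every archive decompresses cleanly. The directory `demo_compress/` and its files are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: gzip each concatenated FASTQ, then test each archive.
# demo_compress/ and its dummy files are hypothetical.
mkdir -p demo_compress
printf '@r1\nACGT\n+\nIIII\n' > demo_compress/sample_6_R1.fastq
printf '@r1\nTGCA\n+\nIIII\n' > demo_compress/sample_6_R2.fastq

for fq in demo_compress/*.fastq; do
    gzip -f "${fq}"            # replaces X.fastq with X.fastq.gz
done

for gz in demo_compress/*.fastq.gz; do
    if gzip -t "${gz}"; then   # -t tests integrity without writing output
        echo "OK      ${gz}"
    else
        echo "CORRUPT ${gz}"
    fi
done
```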

Giles copied the concatenated, compressed data to a new directory on Sedna, which is where I will retrieve the data for further processing: `share/nwfsc/ggoetz/red_king_crab/illumina/`
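The notebook does not say how the copy was verified. One common approach is to record MD5 checksums in the source directory and re-check them after copying. This is a sketch that assumes GNU `md5sum` is available (as on most Linux clusters). The directory names `demo_src/` and `demo_dest/` are hypothetical stand-ins for the source and destination directories.

```bash
#!/usr/bin/env bash
# Sketch: confirm copied files are byte-identical to the originals.
# demo_src/ and demo_dest/ are hypothetical; requires GNU md5sum.
src="demo_src"; dest="demo_dest"
mkdir -p "${src}" "${dest}"
printf '@r1\nACGT\n+\nIIII\n' | gzip > "${src}/sample_6_R1.fastq.gz"

# Record checksums with paths relative to the source directory...
(cd "${src}" && md5sum *.fastq.gz > checksums.md5)

cp "${src}"/*.fastq.gz "${src}/checksums.md5" "${dest}/"

# ...so the same checksum file can be re-checked inside the destination.
(cd "${dest}" && md5sum -c checksums.md5)
```

`md5sum -c` prints `OK` for each matching file and exits with a non-zero status if any file differs.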

Here is a list of the concatenated sequence files: 

```
Tank_10_Crab_1_R1.fastq.gz  Tank_15_Crab_3_R1.fastq.gz  Tank_20_Crab_2_R1.fastq.gz  Tank_5_Crab_1_R1.fastq.gz
Tank_10_Crab_1_R2.fastq.gz  Tank_15_Crab_3_R2.fastq.gz  Tank_20_Crab_2_R2.fastq.gz  Tank_5_Crab_1_R2.fastq.gz
Tank_10_Crab_2_R1.fastq.gz  Tank_16_Crab_1_R1.fastq.gz  Tank_20_Crab_3_R1.fastq.gz  Tank_5_Crab_2_R1.fastq.gz
Tank_10_Crab_2_R2.fastq.gz  Tank_16_Crab_1_R2.fastq.gz  Tank_20_Crab_3_R2.fastq.gz  Tank_5_Crab_2_R2.fastq.gz
Tank_10_Crab_3_R1.fastq.gz  Tank_16_Crab_2_R1.fastq.gz  Tank_2_Crab_1_R1.fastq.gz   Tank_5_Crab_3_R1.fastq.gz
Tank_10_Crab_3_R2.fastq.gz  Tank_16_Crab_2_R2.fastq.gz  Tank_2_Crab_1_R2.fastq.gz   Tank_5_Crab_3_R2.fastq.gz
Tank_11_Crab_1_R1.fastq.gz  Tank_16_Crab_4_R1.fastq.gz  Tank_2_Crab_2_R1.fastq.gz   Tank_7_Crab_1_R1.fastq.gz
Tank_11_Crab_1_R2.fastq.gz  Tank_16_Crab_4_R2.fastq.gz  Tank_2_Crab_2_R2.fastq.gz   Tank_7_Crab_1_R2.fastq.gz
Tank_11_Crab_2_R1.fastq.gz  Tank_18_Crab_1_R1.fastq.gz  Tank_2_Crab_3_R1.fastq.gz   Tank_7_Crab_3_R1.fastq.gz
Tank_11_Crab_2_R2.fastq.gz  Tank_18_Crab_1_R2.fastq.gz  Tank_2_Crab_3_R2.fastq.gz   Tank_7_Crab_3_R2.fastq.gz
Tank_11_Crab_3_R1.fastq.gz  Tank_18_Crab_2_R1.fastq.gz  Tank_3_Crab_1_R1.fastq.gz   Tank_7_Crab_4_R1.fastq.gz
Tank_11_Crab_3_R2.fastq.gz  Tank_18_Crab_2_R2.fastq.gz  Tank_3_Crab_1_R2.fastq.gz   Tank_7_Crab_4_R2.fastq.gz
Tank_13_Crab_1_R1.fastq.gz  Tank_18_Crab_3_R1.fastq.gz  Tank_3_Crab_2_R1.fastq.gz   Tank_9_Crab_1_R1.fastq.gz
Tank_13_Crab_1_R2.fastq.gz  Tank_18_Crab_3_R2.fastq.gz  Tank_3_Crab_2_R2.fastq.gz   Tank_9_Crab_1_R2.fastq.gz
Tank_13_Crab_2_R1.fastq.gz  Tank_1_Crab_1_R1.fastq.gz   Tank_3_Crab_3_R1.fastq.gz   Tank_9_Crab_2_R1.fastq.gz
Tank_13_Crab_2_R2.fastq.gz  Tank_1_Crab_1_R2.fastq.gz   Tank_3_Crab_3_R2.fastq.gz   Tank_9_Crab_2_R2.fastq.gz
Tank_13_Crab_3_R1.fastq.gz  Tank_1_Crab_2_R1.fastq.gz   Tank_4_Crab_1_R1.fastq.gz   Tank_9_Crab_3_R1.fastq.gz
Tank_13_Crab_3_R2.fastq.gz  Tank_1_Crab_2_R2.fastq.gz   Tank_4_Crab_1_R2.fastq.gz   Tank_9_Crab_3_R2.fastq.gz
Tank_15_Crab_1_R1.fastq.gz  Tank_1_Crab_3_R1.fastq.gz   Tank_4_Crab_2_R1.fastq.gz   Tank_9_Crab_4_R1.fastq.gz
Tank_15_Crab_1_R2.fastq.gz  Tank_1_Crab_3_R2.fastq.gz   Tank_4_Crab_2_R2.fastq.gz   Tank_9_Crab_4_R2.fastq.gz
Tank_15_Crab_2_R1.fastq.gz  Tank_20_Crab_1_R1.fastq.gz  Tank_4_Crab_3_R1.fastq.gz
Tank_15_Crab_2_R2.fastq.gz  Tank_20_Crab_1_R2.fastq.gz  Tank_4_Crab_3_R2.fastq.gz
```
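Because downstream trimming and alignment need matched pairs, a useful final check is that every `_R1.fastq.gz` file in the listing has a matching `_R2.fastq.gz` file. The sketch below uses hypothetical dummy files in `demo_pairs/`. Pointing `dir` at the real data directory would run the same check there.

```bash
#!/usr/bin/env bash
# Sketch: report R1 files without a matching R2 and count complete pairs.
# demo_pairs/ and its dummy files are hypothetical.
dir="demo_pairs"
mkdir -p "${dir}"
touch "${dir}/Tank_1_Crab_1_R1.fastq.gz" "${dir}/Tank_1_Crab_1_R2.fastq.gz" \
      "${dir}/Tank_2_Crab_1_R1.fastq.gz"   # R2 deliberately missing

n_pairs=0
for r1 in "${dir}"/*_R1.fastq.gz; do
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # swap the _R1 suffix for _R2
    if [ -f "${r2}" ]; then
        n_pairs=$((n_pairs + 1))
    else
        echo "Missing mate: ${r2}"
    fi
done > pair_check.txt
echo "Complete pairs: ${n_pairs}" >> pair_check.txt

cat pair_check.txt
```

Redirecting the whole `for` loop keeps it in the current shell, so `n_pairs` is still set after the loop ends.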