fish546-2021/Megan-project

Megan-project

Class project for FISH 546 winter 2021

Project results and methods | Project presentation

Software information
  • OS: macOS Big Sur Version 11.2 (20D64)
  • RStudio Version 1.3.959
  • kallisto 0.46.2
  • FastQC v0.11.9 (Win/Linux)
  • GitHub Desktop Version 2.5.4
  • JupyterLab 3.0.6

RNA-seq data: coho salmon were treated with a steroid, and transcriptional alterations in the gonads were examined.

Data courtesy of Chris Monson (UW) and Giles Goetz (NOAA). A more thorough description can be found in the data subdirectory's readme.

Data Location

All RNA seq raw data files can be found here

  • Only a subset of files was used for this project due to storage limitations, but all code is written so that it can be executed with the full dataset if you have enough space on your computer or an external hard drive.
    • The subset of files is:
      • 17104-02RT-01-10_S18_L002_R1_001.fastq.gz
      • 17104-02RT-01-10_S18_L002_R2_001.fastq.gz
      • 17104-02RT-01-11_S19_L002_R1_001.fastq.gz
      • 17104-02RT-01-11_S19_L002_R2_001.fastq.gz
      • 17104-02RT-01-13_S21_L002_R2_001.fastq.gz
      • 17104-02RT-01-13_S21_L003_R1_001.fastq.gz
      • 17104-02RT-01-13_S21_L003_R2_001.fastq.gz
      • 17104-02RT-01-14_S22_L002_R1_001.fastq.gz
      • 17104-02RT-01-7_S15_L002_R2_001.fastq.gz
      • 17104-02RT-01-7_S15_L003_R1_001.fastq.gz
      • 17104-02RT-01-7_S15_L003_R2_001.fastq.gz
      • 17104-02RT-01-8_S16_L002_R1_001.fastq.gz
      • 17104-02RT-01-8_S16_L002_R2_001.fastq.gz
      • 17104-02RT-01-8_S16_L003_R1_001.fastq.gz
      • 17104-02RT-01-8_S16_L003_R2_001.fastq.gz
    • The R1 or R2 in each file name corresponds to the read end.
    • The sequences of the adapters used in library prep are R1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA and R2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT.
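
The naming scheme above can be used to pair R1 and R2 files programmatically. The following is a minimal sketch, assuming the pattern `<sample>_S<index>_L<lane>_R<read>_001.fastq.gz` inferred from the listed file names; the function names are hypothetical and are not part of the project notebooks.

```python
import re
from collections import defaultdict

# Pattern inferred from the file names listed above (an assumption, not a documented spec):
# <sample>_S<index>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into sample, sample index, lane, and read end."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.groupdict()


def group_read_pairs(names):
    """Group files by (sample, lane) so the R1 and R2 files of a pair sit together."""
    groups = defaultdict(dict)
    for name in names:
        fields = parse_fastq_name(name)
        groups[(fields["sample"], fields["lane"])]["R" + fields["read"]] = name
    return dict(groups)


files = [
    "17104-02RT-01-10_S18_L002_R1_001.fastq.gz",
    "17104-02RT-01-10_S18_L002_R2_001.fastq.gz",
    "17104-02RT-01-14_S22_L002_R1_001.fastq.gz",
]
for key, reads in group_read_pairs(files).items():
    # A group is complete only when both read ends are present.
    status = "paired" if {"R1", "R2"} <= set(reads) else "unpaired"
    print(key, status)
```

Grouping by sample and lane makes it easy to spot files in the subset whose mate was not downloaded.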

General Workflow

As of March 3rd, 2021 (Week 8). Below, the code files for this project are listed and described in order. File names for the final project are formatted step#-description. Old file names / draft code names are formatted MMDD-description, where MMDD is the month and day they were created.

Origin of Data:

Giles transferred the files from a NOAA server to Steven's ostrich server. Steven then transferred them to gannet, which is linked above. The md5sum text file, 0128-Giles-md5sums.txt, generated by Giles during the initial transfer, is located in the data/raw/ subdirectory.

step1-gettingDataFromGannet.ipynb

Retrieves the data from gannet using wget. Recall that not all of the files were used in this project, but all files are available on gannet. Saves the data to the data/raw/ subdirectory.
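
The notebook itself uses wget. As an illustration of the same retrieval step, here is a rough sketch that swaps in Python's standard `urllib` instead; `BASE_URL` is a hypothetical placeholder (the real gannet address is the one linked under Data Location), and the function names are made up for this example.

```python
import urllib.request
from pathlib import Path

# Hypothetical placeholder: replace with the gannet URL linked under "Data Location".
BASE_URL = "https://example.org/path/to/raw-data/"


def download_plan(names, base_url=BASE_URL, dest_dir="data/raw"):
    """Pair each remote URL with the local path it should be saved to."""
    root = base_url.rstrip("/")
    return [(f"{root}/{name}", Path(dest_dir) / name) for name in names]


def download(names, base_url=BASE_URL, dest_dir="data/raw"):
    """Fetch each file unless it is already present locally."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    for url, target in download_plan(names, base_url, dest_dir):
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
```

Calling `download([...])` with only the subset of file names keeps the local copy small, while passing the full list would retrieve the whole dataset.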

step2-md5sums.ipynb

Compares the md5sums that Giles provided in 0128-Giles-md5sums.txt to the md5sums of the downloaded files.
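
The comparison can be sketched as follows. This is an illustrative version, not the notebook's code, and it assumes the checksum file uses the usual md5sum layout of `<md5>  <file name>` per line; files that were not downloaded are skipped rather than reported as failures.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 of a file in chunks so large FASTQ files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_md5_list(path):
    """Parse lines of the form '<md5>  <file name>' into {file name: md5}."""
    expected = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*'; keep only the base name.
        expected[Path(name.strip().lstrip("*")).name] = checksum.lower()
    return expected


def compare_md5s(md5_list, data_dir):
    """Return {file name: True/False} for every listed file present in data_dir."""
    results = {}
    for name, checksum in read_md5_list(md5_list).items():
        candidate = Path(data_dir) / name
        if candidate.exists():  # only a subset of the files was downloaded
            results[name] = md5_of_file(candidate) == checksum
    return results
```

For this project the call would look like `compare_md5s("data/raw/0128-Giles-md5sums.txt", "data/raw")`; any `False` value indicates a corrupted or incomplete download.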

step3-fastqcForMultipleFiles.ipynb

Runs fastqc on all of our raw data files. Output is directed to analyses/step3-fastqc/.
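
One way to run FastQC over every raw file in a single call is sketched below. It assumes `fastqc` is on the PATH and that the raw files end in `.fastq.gz`; the helper names are hypothetical, and the notebook may invoke FastQC differently.

```python
import subprocess
from pathlib import Path


def fastqc_command(raw_dir, out_dir, threads=1):
    """Build one fastqc invocation covering every .fastq.gz file in raw_dir."""
    files = sorted(str(path) for path in Path(raw_dir).glob("*.fastq.gz"))
    return ["fastqc", "--outdir", str(out_dir), "--threads", str(threads), *files]


def run_fastqc(raw_dir="data/raw", out_dir="analyses/step3-fastqc", threads=1):
    """Create the output directory if needed, then run fastqc."""
    command = fastqc_command(raw_dir, out_dir, threads)
    if len(command) == 5:  # no input files were found
        raise FileNotFoundError(f"no .fastq.gz files in {raw_dir}")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)
```

Building the command separately from running it makes the file selection easy to inspect before any long-running job starts.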

step4-multiqcOnFastqc.ipynb

Runs multiqc on all of the fastqc outputs in analyses/step3-fastqc/ in order to visualize the quality of all our sequences. Output is directed to analyses/step4-multiqc/, and the HTML output is multiqc_report.html within this subdirectory. Multiqc showed that the first ~15 bp of all sequences needed to be trimmed.

step5-trimming.ipynb

Skipped in this project for the sake of time.
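
Although trimming was not performed here, the kind of head crop suggested by the multiqc report can be illustrated in a few lines. This is only a sketch under stated assumptions: four-line FASTQ records, gzip-compressed input, and a crop length of 15 taken from the "~15 bp" observation. A dedicated trimming tool would normally be used, and for paired-end data both mates would have to be filtered together.

```python
import gzip


def head_crop_fastq(in_path, out_path, n_bases=15):
    """Remove the first n_bases from each read and its quality string.

    Reads that would be empty after cropping are dropped.
    Returns the number of reads written.
    """
    written = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            sequence = src.readline().rstrip("\n")
            plus = src.readline()
            quality = src.readline().rstrip("\n")
            trimmed = sequence[n_bases:]
            if not trimmed:
                continue
            dst.write(header)
            dst.write(trimmed + "\n")
            dst.write(plus)
            dst.write(quality[n_bases:] + "\n")
            written += 1
    return written
```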

step6-kallisto.ipynb

Gene expression was quantified using kallisto and put into a trinity matrix. The kallisto index was built using the Ensembl reference transcriptome for Oncorhynchus kisutch, located in data/Oncorhynchus_kisutch.Okis_V2.ncrna.fa. Outputs are directed to analyses/step6-kallisto.idx and analyses/step6-output/.
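
The two kallisto stages (build the index once, then quantify each read pair) can be sketched as below. The paths come from the description above; the one-subdirectory-per-sample output layout, the thread count, and the helper names are assumptions for illustration, not necessarily what the notebook does.

```python
import subprocess

TRANSCRIPTOME = "data/Oncorhynchus_kisutch.Okis_V2.ncrna.fa"
INDEX = "analyses/step6-kallisto.idx"
OUT_ROOT = "analyses/step6-output"


def index_command(transcriptome=TRANSCRIPTOME, index=INDEX):
    """kallisto index: build the index from the reference FASTA."""
    return ["kallisto", "index", "-i", index, transcriptome]


def quant_command(sample, r1, r2, index=INDEX, out_root=OUT_ROOT, threads=1):
    """kallisto quant on one paired-end sample.

    Writing to out_root/<sample> is an assumed layout.
    """
    return [
        "kallisto", "quant",
        "-i", index,
        "-o", f"{out_root}/{sample}",
        "-t", str(threads),
        r1, r2,
    ]


def run(command):
    """Execute a command and raise if it exits with a non-zero status."""
    subprocess.run(command, check=True)
```

For example, `run(index_command())` followed by `run(quant_command("17104-02RT-01-10", r1_path, r2_path))` would process one sample, where `r1_path` and `r2_path` are the matching R1 and R2 files in data/raw/.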

step7-deseq2visualization.Rmd

Uses DESeq2 to identify differentially expressed genes (DEGs) and visualizes them with a volcano plot and a heatmap. Images of the volcano plot and heatmap are in images/.
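
The analysis itself is done in R with DESeq2. To illustrate how a volcano plot separates DEGs from the rest, here is a small sketch of the classification logic applied to DESeq2-style `log2FoldChange` and `padj` values. The thresholds (`alpha=0.05`, `min_abs_lfc=1.0`) are hypothetical defaults chosen for the example, not the cutoffs used in this project.

```python
import math


def classify_gene(log2_fold_change, padj, alpha=0.05, min_abs_lfc=1.0):
    """Label a gene 'up', 'down', or 'not significant'.

    padj may be missing (None or NaN), as DESeq2 reports NA for some genes.
    """
    if padj is None or math.isnan(padj) or padj >= alpha:
        return "not significant"
    if log2_fold_change >= min_abs_lfc:
        return "up"
    if log2_fold_change <= -min_abs_lfc:
        return "down"
    return "not significant"


def volcano_point(log2_fold_change, padj, floor=1e-300):
    """Volcano plot coordinates: x is log2 fold change, y is -log10(padj).

    The floor (an arbitrary small value) avoids log10(0).
    """
    return (log2_fold_change, -math.log10(max(padj, floor)))
```

Genes labelled "up" or "down" are the points that fall in the upper left and upper right corners of the volcano plot.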

step8-blast.ipynb

Runs blastx on the reference transcriptome to identify the functions of the DEGs.

Note: only one of the 12 DEGs had a match to the reference transcriptome. The remaining 11 were identified using the web version of blastn with default settings and the FASTA file data/Oncorhynchus_kisutch.Okis_V2.ncrna_11-DEGs.fa.

step9-joining.ipynb

Joins the DEG statistics generated using DESeq2 with the expression levels of the 12 DEGs. Output is analyses/step9-DEGandBlastTable.tab.
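
A join of this kind can be sketched with the standard `csv` module. The key column name (`gene_id` in the usage note) and the assumption that both inputs are tab-delimited with a header row are hypothetical; the notebook may use different tools and column names.

```python
import csv


def read_table(path, key):
    """Read a tab-delimited file with a header row into {key value: row dict}."""
    with open(path, newline="") as handle:
        return {row[key]: row for row in csv.DictReader(handle, delimiter="\t")}


def join_tables(stats, expression, key):
    """Inner join: keep only the genes that appear in both tables."""
    joined = []
    for gene_id, stat_row in stats.items():
        if gene_id in expression:
            merged = dict(stat_row)
            # Add the expression columns, skipping the duplicated key column.
            merged.update({k: v for k, v in expression[gene_id].items() if k != key})
            joined.append(merged)
    return joined


def write_table(rows, path):
    """Write the joined rows as a tab-delimited file with a header row."""
    if not rows:
        raise ValueError("nothing to write")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

With both inputs loaded through `read_table(path, "gene_id")`, `write_table(join_tables(stats, expression, "gene_id"), out_path)` produces one row per DEG carrying both its statistics and its expression levels.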
