Liz Suter edited this page Aug 28, 2020 · 10 revisions

Welcome to the MarineAnimalDisease wiki!

Here is a breakdown of the research steps for this project. As you move along, you can track what is in progress and what is completed by adding items to the Project board (in the "Projects" tab above).

Part 1: Introduction

Discussion

  • Intro to Research & marine microbiology
  • What is BVCN?
  • Discussion of amplicons, microbial ecology of the oceans, disease & climate change
  • What is R? RStudio? RStudio Cloud?
  • What is Cyverse?
  • What is DADA2?
  • What is Markdown?
  • Watch videos on common field and lab approaches (links in doc in Google Drive)

To do


Part 2: Amplicon Pipeline Practice

Discussion

  • What is amplicon science?
  • Potential datasets: what are we looking for?
  • Tips for working in the Discovery Environment
  • Tips for documenting (Markdown)
  • Intro to NCBI's SRA

To do

  • Complete BVCN R lessons 3-5
  • Complete BVCN Amplicons lesson 3b, a tutorial using the analysis from Happy Belly Bioinformatics in the Cyverse app, “rstudio-dada2-decipher”
    • Follow along with Amplicons_Lesson_03b_cyverse.Rmd to do the tutorial in Cyverse
    • See the Cyverse prereq video from lesson 3a for some tips on setting up in the Discovery Environment.
    • Share analysis and data folder with me (Cyverse user name: esuter). Ask questions!
    • Read about some details of the analysis at the Happy Belly website as you move along

Part 3: Getting Set Up

To Do

  • Start looking for papers with amplicon datasets and discuss them with the group in Slack (see some guidelines here)
    • Finalize dataset (with approval from me)
  • Start personalizing your MarineAnimalDisease repo by putting links to your dataset’s Bioproject page (or similar) in the readme file
  • Download and install Cyberduck (instructions)
  • Begin to download your dataset (fastq files) from SRA and import into Cyverse’s Data Store using Cyberduck. Share data folder with me in Cyverse. There are 3 options for this step:
    • If you have Conda, you can do this by cloning this example Jupyter notebook and following it in your local Terminal. (Note: for QIIME2, follow the whole notebook. For DADA2, you can skip the final steps that make a manifest file.)
    • For a small number of fastq files, you can use the fastq-dump app in the Discovery Environment.
    • If you do not have Conda and you have a large number of files, provide me with a list of accession numbers and I will set up the data folder and share it with you in Cyverse.
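
Before sharing a list of accession numbers, it can help to tidy it up first. The sketch below is one possible way to do that in Python. It assumes the list holds SRA run accessions of the usual form (SRR, ERR, or DRR followed by digits); the function name and the example accessions are hypothetical placeholders, not part of any BVCN or Cyverse tool.

```python
import re

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")

def clean_accessions(lines):
    """Split raw lines into (valid, invalid) accessions.

    Blank lines and repeats are dropped; input order is kept.
    """
    valid, invalid, seen = [], [], set()
    for raw in lines:
        acc = raw.strip()
        if not acc or acc in seen:
            continue
        seen.add(acc)
        if RUN_ACCESSION.fullmatch(acc):
            valid.append(acc)
        else:
            invalid.append(acc)
    return valid, invalid

# Hypothetical input: replace with the runs listed for your own dataset.
raw_list = ["SRR0000001", "SRR0000001", " ERR0000002 ", "PRJNA000000", ""]
valid, invalid = clean_accessions(raw_list)
print("runs to download:", valid)
print("check these entries:", invalid)
```

Anything that lands in the second list (for example a BioProject ID pasted in by mistake) is worth checking by hand before the download starts.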

Part 4: Amplicon Pipeline Analysis

To Do

  • Start a full amplicon analysis of your dataset, bringing the fastq files to count tables, taxonomy tables, etc. by following the DADA2 pipeline in R using the Cyverse app. Make a copy of the Amplicons_Lesson_03b_cyverse.Rmd notebook from lesson 3b and make appropriate modifications for your dataset.
    • At various points in this analysis you will likely need input. Every dataset is different, and the pipeline depends a lot on your primers, the sequencing platform, and the overall quality of the sequences! Please discuss each step carefully with me and ask the group for help on Slack.
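
One check that may save time before starting the pipeline is confirming that every sample has both a forward and a reverse read file. The sketch below is written in Python purely for illustration (the pipeline itself runs in R) and assumes paired-end files named `<sample>_1.fastq` and `<sample>_2.fastq`, optionally gzipped. That naming is an assumption, so adjust the pattern to match your own files; `pair_fastq` and the file names are hypothetical.

```python
import re

# Assumption: paired files are named <sample>_1.fastq / <sample>_2.fastq,
# optionally with a .gz suffix. Change the pattern if your files differ.
FASTQ_NAME = re.compile(r"(?P<sample>.+)_(?P<read>[12])\.fastq(\.gz)?")

def pair_fastq(filenames):
    """Group fastq file names by sample.

    Returns (complete, incomplete): a dict of sample -> (forward, reverse)
    for samples with both reads, and a sorted list of samples missing one.
    """
    by_sample = {}
    for name in filenames:
        match = FASTQ_NAME.fullmatch(name)
        if match:
            by_sample.setdefault(match.group("sample"), {})[match.group("read")] = name
    complete = {
        sample: (reads["1"], reads["2"])
        for sample, reads in by_sample.items()
        if "1" in reads and "2" in reads
    }
    incomplete = sorted(sample for sample in by_sample if sample not in complete)
    return complete, incomplete

# Hypothetical file listing from a data folder.
files = ["A_1.fastq.gz", "A_2.fastq.gz", "B_1.fastq.gz", "notes.txt"]
complete, incomplete = pair_fastq(files)
print("paired samples:", complete)
print("samples missing a read file:", incomplete)
```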

Part 5: Post-Analysis of Amplicon Pipeline Output [Ecological Analysis]

To Do

  • Complete BVCN R lessons 6 and 8a (skipping 7)
  • Start applying some of the R analyses to your own dataset (cleaning up count tables, making abundance plots, using ecological statistics to determine patterns)
    • I went through this entire process with an example dataset from Green et al. From this directory you can follow along and modify the scripts. You can also follow along with the full tutorial at this site.
  • Write up a research summary and make sure all data and documents you produced are available in a shared Cyverse folder. Final results files and script should be in a GitHub repo.
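
To make the count-table steps in Part 5 more concrete, here is a minimal sketch of two common calculations: converting one sample's counts to relative abundances and computing the Shannon diversity index, H = -Σ p·ln(p). It is written in plain Python for illustration only, while the lessons do this kind of work in R. The ASV names and counts are made up, and the function names are hypothetical.

```python
import math

def relative_abundance(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p), skipping taxa with zero reads."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Made-up counts for a single sample (one column of a count table).
sample = {"ASV_1": 50, "ASV_2": 30, "ASV_3": 20, "ASV_4": 0}
rel = relative_abundance(sample)
h = shannon_index(sample)
print(rel)
print(round(h, 3))
```

A sample dominated by a single ASV gives H close to 0, and H grows as reads are spread more evenly across more ASVs, which is why it is a handy first summary when comparing samples.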