sr320/eimd-sswd

### Jupyter notebooks and data supplemental to the manuscript: "Up in arms: Immune and nervous system response to sea star wasting disease"


This repository includes Jupyter notebooks (.ipynb files) that can be downloaded and executed interactively. The code in eimd_analysis.ipynb processes the data so that the analysis can be reproduced.


### Description of Files and Directories

  • eimd_analysis.ipynb - Jupyter notebook that can be executed interactively on a local machine or viewed online, designed so that the user can replicate the analysis. Requires several dependencies.
  • eimd_data-only.ipynb - Jupyter notebook that can be executed interactively on a local machine or viewed online, designed so that the user can simply explore the data files. Only requires IPython.
  • data/Phel_transcriptome.fasta - P. hel coelomocyte transcriptome. Contains 29476 contigs from de novo assembly.
  • data/Phel_countdata.txt - Tab-delimited text file with read count data from 6 P. hel RNA-seq libraries: 3 treated and 3 control.
  • scripts/count_fasta.pl - Perl script by Joseph Fass (modified from a script by Brad Sickler), last revised November 2010 (bioinformatics.ucdavis.edu).
  • wd - subdirectory that serves as the output directory (working directory) when the repository is downloaded and the notebook eimd_analysis.ipynb is executed locally.
  • precompiled_wd - subdirectory containing the data that would be produced in wd by running the commands in the notebook. Used primarily for viewing data in eimd_data-only.ipynb.
  • misc/P_miniata-protein-comparison.ipynb - secondary analysis to better understand the relationship between contigs that have similar annotations.
  • misc/Supplemental_File-01.tab - Copy of the supplemental tab-delimited file with annotation information for all contigs, including annotations (protein and GO), differential expression statistics, and enrichment terms. Column headers include: Contig, spID, e-value, log2FoldChange, padj, Protein names, Enriched GO Terms.
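As a minimal sketch of how the supplemental table might be explored outside the notebooks, the following reads a tab-delimited table and keeps the contigs below an adjusted p-value cutoff. The column names `Contig`, `log2FoldChange`, and `padj` come from the header list above; the function name `significant_contigs`, the 0.05 cutoff, the skipping of non-numeric values such as `NA`, and the example rows are assumptions made for illustration and are not taken from the manuscript or the notebooks.

```python
import csv
import io


def significant_contigs(handle, padj_cutoff=0.05):
    """Return (contig, log2FoldChange, padj) tuples with padj below the cutoff.

    `handle` is any open text file or file-like object holding a
    tab-delimited table with a header row. Rows whose padj or
    log2FoldChange cannot be parsed as a number (for example "NA" or a
    missing cell) are skipped. The 0.05 default is an assumption.
    """
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            padj = float(row["padj"])
            lfc = float(row["log2FoldChange"])
        except (TypeError, ValueError):
            continue
        if padj < padj_cutoff:
            hits.append((row["Contig"], lfc, padj))
    return hits


# Tiny made-up table so the sketch runs without the repository files.
example = io.StringIO(
    "Contig\tlog2FoldChange\tpadj\n"
    "contig_A\t2.5\t0.001\n"
    "contig_B\t-0.3\t0.8\n"
    "contig_C\tNA\tNA\n"
)
print(significant_contigs(example))
```

To try this on the real file, open `misc/Supplemental_File-01.tab` with `open(...)` and pass the handle to the function. If the header spelling in the file differs from the list above, the column names in the sketch would need to be adjusted.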

# Instructions for data-only (interactive viewing) notebook

  1. Download the repository zip file to a local directory and uncompress.

  2. Launch IPython from the repository's top-level directory, for example using Terminal on Mac OS X:

$ cd ~/Desktop/eimd-sswd
$ ipython notebook

This will launch IPython in your web browser.

  3. Open the notebook by clicking on eimd_data-only.ipynb. This will open a new tab in your browser.

  4. Execute the code as written or modify it to your liking. To execute a cell, type shift-enter.
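Since the data-only notebook is meant for exploring the data files with nothing beyond IPython, here is a standard-library sketch of the kind of cell one might add. It counts the records and residues in a FASTA file. The function name `count_fasta_records` and the two-sequence example are assumptions for illustration; this is not the logic of scripts/count_fasta.pl.

```python
import io


def count_fasta_records(handle):
    """Count sequences and total residues in a FASTA file-like object."""
    n_records = 0
    n_residues = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_records += 1  # each header line starts a new record
        else:
            n_residues += len(line)
    return n_records, n_residues


# Made-up two-record example so the sketch runs on its own.
example = io.StringIO(">contig_1\nACGT\nAC\n>contig_2\nGGG\n")
print(count_fasta_records(example))  # prints (2, 9)
```

Run against `data/Phel_transcriptome.fasta`, the first number should match the 29476 contigs listed in the file descriptions if the download is intact.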


# Instructions for analysis (interactive execution) notebook

  1. Before you get started

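To execute the eimd_analysis.ipynb IPython notebook in its entirety you will need several dependencies, including NCBI BLAST, the sqlshare-pythonclient tools, and a SQLShare account.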


In addition you will need a local copy of the UniProt/Swiss-Prot BLAST database. If you do not already have this database, you can create it once NCBI BLAST is installed. To create the database, first download the Reviewed (Swiss-Prot) FASTA file from http://www.uniprot.org/downloads, then run makeblastdb. The example below creates the database in a subdirectory named blastdb and produces more than 300 MB of files.

$ mkdir blastdb

$ cd blastdb

$ curl -O ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

$ gunzip uniprot_sprot.fasta.gz

$ makeblastdb -in uniprot_sprot.fasta -dbtype prot -out uniprot_sprot

This will generate a protein database that you can use to blast sequences.
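To illustrate how the resulting database could be queried from Python, the sketch below assembles a `blastx` command (nucleotide contigs against a protein database). It only prints the command; nothing runs unless `run_blast` is called. The helper names, the output path `wd/Phel_blastx_sprot.tab`, the e-value threshold, and the tabular output format are assumptions for illustration and may differ from what eimd_analysis.ipynb actually does.

```python
import shutil
import subprocess


def build_blastx_command(query, db, out, evalue="1e-5", threads=1):
    """Return the argument list for a blastx search (BLAST+ command line)."""
    return [
        "blastx",
        "-query", query,          # nucleotide FASTA to search with
        "-db", db,                # database prefix given to makeblastdb -out
        "-out", out,              # where to write the hits
        "-evalue", str(evalue),   # assumed threshold, adjust as needed
        "-outfmt", "6",           # tab-delimited output
        "-num_threads", str(threads),
    ]


def run_blast(cmd):
    """Run the command, failing early if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " not found; is NCBI BLAST+ installed?")
    subprocess.run(cmd, check=True)


cmd = build_blastx_command(
    "data/Phel_transcriptome.fasta",   # file listed in this repository
    "blastdb/uniprot_sprot",           # database created in the step above
    "wd/Phel_blastx_sprot.tab",        # hypothetical output name
)
print(" ".join(cmd))
```

Calling `run_blast(cmd)` from the repository's top-level directory would start the search. With roughly 29000 contigs this can take a long time, so raising `threads` may help.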


  2. Download the repository zip file to a local directory and uncompress it.

  3. Launch IPython from the repository's top-level directory, for example using Terminal on Mac OS X:

$ cd ~/Desktop/eimd-sswd
$ ipython notebook

This will launch IPython in your web browser.

  4. Open the notebook by clicking on eimd_analysis.ipynb. This will open a new tab in your browser.

  5. Modify the cell near the top of the notebook …

#Variables user needs to modify accordingly
db="~/blastdb/uniprot_sprot"
sqls="~/sqlshare-pythonclient/tools/"
usr="user@gmail.com"

db refers to the location of the BLAST database (instructions on how to create the database are in step 1).
sqls refers to the location of your sqlshare-pythonclient tools subdirectory (instructions to install are in step 1).
usr refers to your SQLShare user name (see step 1 if need be).

  6. Execute the code as written or modify it to your liking. To execute a cell, type shift-enter. Once the variables are set, you should be able to run all cells.
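Before running all cells, it can help to sanity-check the three variables. The sketch below is one possible check and is not part of the notebook. It expands `~` and looks for the `.phr`, `.pin`, and `.psq` files that `makeblastdb` writes for a single-volume protein database. The function name `check_settings` is hypothetical, and a database split into several volumes would need a different check.

```python
import os
import tempfile


def check_settings(db, sqls, usr):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    db_prefix = os.path.expanduser(db)
    # A single-volume protein database has these three core files.
    for ext in (".phr", ".pin", ".psq"):
        if not os.path.isfile(db_prefix + ext):
            problems.append("missing BLAST database file: " + db_prefix + ext)
    if not os.path.isdir(os.path.expanduser(sqls)):
        problems.append("sqlshare tools directory not found: " + sqls)
    if not usr.strip():
        problems.append("usr is empty")
    return problems


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "uniprot_sprot")
    for ext in (".phr", ".pin", ".psq"):
        open(prefix + ext, "w").close()
    print(check_settings(prefix, tmp, "user@gmail.com"))  # prints []
    for problem in check_settings(prefix + "_missing", tmp, ""):
        print(problem)
```

In the notebook this would be called as `check_settings(db, sqls, usr)` right after the variables cell.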

We are actively trying to improve this, realizing that we are likely missing dependencies, etc. Any suggestions and feedback are welcome.
