This is a workflow to assemble RNA reads from 454 into a transcriptome. The procedure is as follows:
- Quality control
  - Base calling with PyroBayes (if `.sff` files are provided)
  - Quality, length and adaptor trimming with SnoWhite
- Assembly
  - With gsAssembler from Roche's Data Analysis tools (newbler)

- Clone the repo:

      git clone https://github.com/jlanga/smsk_454.git
      cd smsk_454
- Activate the environment (`deactivate` to deactivate):

      source bin/activate
- Install software and packages via pip and homebrew (edit whatever is necessary):

      bash bin/install/brew.sh
      bash bin/install/from_brew.sh
      bash bin/install/from_pip3.sh
      bash bin/install/from_tarball.sh
- Additional requirements

  PyroBayes is an accurate base caller for 454 datasets. It used to be available through free registration, as a file called `pyrobayes.unified_release_64bit.tar.gz`. If you are able to get it, do the following:

      mkdir -p src/
      pushd src/
      cp /path/to/pyrobayes.unified_release_64bit.tar.gz .
      tar xvf pyrobayes.unified_release_64bit.tar.gz
      popd
      cp src/UnifiedRelease/bin/PyroBayes bin/

  The same applies to gsAssembler. You should go to Roche and ask for a copy (it is free but requires registration). You should get a file called `DataAnalysis_2.9_All_20130530_1559.tgz`. If you are connecting through ssh, use the `-X` option to allow graphical interfaces (`ssh server -X`). From here:

      mkdir -p src/
      pushd src/
      cp /path/to/DataAnalysis_2.9_All_20130530_1559.tgz .
      tar xvf DataAnalysis_2.9_All_20130530_1559.tgz
      pushd DataAnalysis_2.9_All/
      bash setup.sh

  A window will pop up. Select "local installation" and choose as installation path the `src/` directory of this project (in my case `/home/jlanga/pipelines/smsk_454/src/454/`).
- Download sample data from the European Nucleotide Archive (ENA; two sff files and two fastq files):

      bash bin/download_test_data.sh
- Execute the pipeline (should take up to 10 minutes):

      snakemake -j 24

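Before launching the full run, it can help to check that the external tools are reachable and to let Snakemake do a dry run first. The sketch below is only an illustration: the helper names `missing_tools` and `pick_jobs` are hypothetical and not part of the repository, and the list of tools to check is an assumption (the steps above only name `bin/PyroBayes` explicitly). `snakemake -n` performs a dry run that lists the jobs without executing them.

```shell
# Hypothetical pre-flight helpers; not shipped with the repository.

# Print each given tool name that is neither on PATH nor executable in ./bin.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1 && [ ! -x "bin/$tool" ]; then
            echo "$tool"
        fi
    done
}

# Cap the requested number of parallel jobs at the number of available cores.
pick_jobs() {
    local requested=$1
    local cores
    # Fall back to a single core when nproc is unavailable.
    cores=$(nproc 2>/dev/null || echo 1)
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}

# Example usage (run from the project root; the tool list is an assumption):
#   missing_tools snakemake PyroBayes   # prints nothing if both are found
#   snakemake -n -j "$(pick_jobs 24)"   # dry run: list jobs without executing
#   snakemake -j "$(pick_jobs 24)"      # real run
```

Capping the job count is a convenience on smaller machines; `snakemake -j 24` as given above works as well.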
The folder hierarchy follows the one described in "A Quick Guide to Organizing Computational Biology Projects":
smsk
├── .linuxbrew: brew files
├── bin: scripts, binaries and snakemake-related files.
├── data: raw data, hopefully links to backed-up data.
├── doc: logs.
├── README.md
├── results: processed data, reports, etc.
└── src: additional source code, tarballs, etc.
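
A quick way to confirm that a checkout matches the layout above is sketched here. The function name `missing_dirs` is hypothetical (it is not shipped with the repository), and `.linuxbrew` is left out on the assumption that it only appears after the brew installation step.

```shell
# Hypothetical helper: list the top-level project directories
# (bin, data, doc, results, src) that are absent under the given root.
missing_dirs() {
    local root=$1
    local dir
    for dir in bin data doc results src; do
        [ -d "$root/$dir" ] || echo "$dir"
    done
}

# Example usage from the project root:
#   missing_dirs .    # prints nothing when every directory exists
```

Directories such as `results` may only be created once the pipeline has run, so a non-empty output before the first run is not necessarily a problem.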
- PyroBayes: "Pyrobayes: an improved base caller for SNP discovery in pyrosequences"