smsk: A Snakemake skeleton to jumpstart projects

1. Description

This is a workflow to assemble RNA reads from 454 into a transcriptome. The procedure is as follows:

Quality control:
1. Base calling with PyroBayes (if .sff files are provided)
2. Quality, length and adaptor trimming with SnoWhite

Assembly:
1. With gsAssembler from Roche's Data Analysis tools (Newbler)

2. First steps

Clone the repo:
```
git clone https://github.com/jlanga/smsk_454.git
cd smsk_454
```

Activate the environment (run deactivate to leave it):
```
source bin/activate
```

Install software and packages via pip and Homebrew (edit the scripts as necessary):
```
bash bin/install/brew.sh
bash bin/install/from_brew.sh
bash bin/install/from_pip3.sh
bash bin/install/from_tarball.sh
```

Additional requirements

PyroBayes is an accurate base caller for 454 datasets. It used to be available after free registration, as a file called pyrobayes.unified_release_64bit.tar.gz. If you are able to get it, do the following:
```
mkdir -p src/
pushd src/
# Copy the tarball under its original name so the next command finds it
cp /path/to/pyrobayes.unified_release_64bit.tar.gz .
tar xvf pyrobayes.unified_release_64bit.tar.gz
popd
cp src/UnifiedRelease/bin/PyroBayes bin/
```
The same applies to gsAssembler: ask Roche for a copy (it is free but requires registration). You should get a file called DataAnalysis_2.9_All_20130530_1559.tgz. Its installer opens a graphical window, so if you are connecting through ssh, use the -X option to enable X11 forwarding (ssh -X server).
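
One way to confirm that a graphical session is reachable before launching an installer is to test whether the DISPLAY variable is set. This is a minimal sketch, not part of this repository: has_display is a hypothetical helper name, and a set DISPLAY only suggests, rather than guarantees, that X11 forwarding works.

```shell
# Hypothetical helper: succeeds only if an X display is visible to this shell.
has_display() {
    [ -n "${DISPLAY:-}" ]
}

if has_display; then
    echo "X display available at ${DISPLAY}"
else
    echo "No X display found: reconnect with 'ssh -X' before running a graphical installer" >&2
fi
```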

Then unpack the archive and run the installer:
```
mkdir -p src/
pushd src/
cp /path/to/DataAnalysis_2.9_All_20130530_1559.tgz .
tar xvf DataAnalysis_2.9_All_20130530_1559.tgz
pushd DataAnalysis_2.9_All/
bash setup.sh
# Return to the project root once the installer has finished
popd
popd
```
A window will pop up. Select "local installation" and set the installation path to the src/ directory of this project (in my case, /home/jlanga/pipelines/smsk_454/src/454/).

Download sample data from the European Nucleotide Archive (ENA; two sff files and two fastq files):
```
bash bin/download_test_data.sh
```
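
Before running the pipeline, it may help to confirm that the manually installed binaries are in place. The sketch below is an assumption-laden example rather than part of this repository: check_executable is a hypothetical helper, and only bin/PyroBayes is a path taken from the steps above. The gsAssembler executables live wherever you pointed the Roche installer, so pass that path yourself.

```shell
# Hypothetical helper: report whether a file exists and is executable.
check_executable() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing or not executable: $1" >&2
        return 1
    fi
}

# bin/PyroBayes is where the copy step above places the base caller.
check_executable bin/PyroBayes || echo "PyroBayes is not installed yet" >&2
```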
Execute the pipeline (should take up to 10 minutes):
```
snakemake -j 24
```
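
The -j 24 above fits a 24-core machine. As a rough sketch for other machines, the number of jobs can be derived from the cores the system reports. available_cores is a hypothetical helper name; it assumes nproc or getconf is present and falls back to 1 otherwise.

```shell
# Hypothetical helper: print the number of CPU cores, falling back to 1.
available_cores() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

echo "This machine reports $(available_cores) core(s)"
```

The result can then be passed to Snakemake, for example snakemake -n -j "$(available_cores)" for a dry run that lists the jobs without executing them; dropping -n runs the pipeline.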

3. File organization

The folder hierarchy follows the one described in A Quick Guide to Organizing Computational Biology Projects:

smsk
├── .linuxbrew: brew files
├── bin: scripts, binaries and Snakemake-related files.
├── data: raw data, hopefully links to backed-up data.
├── doc: logs.
├── README.md
├── results: processed data, reports, etc.
└── src: additional source code, tarballs, etc.
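
Cloning the repository already provides this layout. To reuse it in a fresh project, the top-level folders could be created as in the sketch below. make_skeleton is a hypothetical helper, not a script shipped with smsk, and it leaves out .linuxbrew and README.md, which the install scripts and the author produce.

```shell
# Hypothetical helper: create the top-level folders of the layout under a given root
# (defaults to the current directory).
make_skeleton() {
    for dir in bin data doc results src; do
        mkdir -p "${1:-.}/$dir"
    done
}

# Example: build the layout in a scratch directory and list it.
scratch="$(mktemp -d)"
make_skeleton "$scratch"
ls "$scratch"
```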

Bibliography and resources

A Quick Guide to Organizing Computational Biology Projects
Snakemake—a scalable bioinformatics workflow engine
SeqClean
FastQC
SnoWhite
TagDust
Pyrobayes: an improved base caller for SNP discovery in pyrosequences
Linuxbrew
