add building source
maxibor committed Jul 10, 2019
1 parent 4115f40 commit ec6a7da
Showing 5 changed files with 56 additions and 223 deletions.
19 changes: 19 additions & 0 deletions docs/custom_sources.md
@@ -0,0 +1,19 @@
# Custom sources

> Different taxonomic classifiers will give different results, and **the taxonomic classifier used to produce the *source* OTU count table must be the same as the one used to produce the *sink* OTU count table**.
While many taxonomic classifiers can produce the source and sink OTU tables, the Sourcepredict author provides a simple pipeline to generate them.

This pipeline is written in [Nextflow](https://www.nextflow.io/) and handles its dependencies with [conda](https://conda.io/en/latest/).
Briefly, this pipeline first trims and clips the sequencing files with [AdapterRemoval](https://github.com/MikkelSchubert/adapterremoval), then performs the taxonomic classification with [Kraken2](https://ccb.jhu.edu/software/kraken2).
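The pipeline's own parsing code is not shown in this commit. As a minimal sketch of how per-sample Kraken2 reports could be merged into a species-level OTU count table, assuming the default six-column Kraken2 `--report` layout (percentage, clade reads, direct reads, rank code, NCBI taxid, name) without `--report-minimizer-data`, and using the hypothetical helper names `parse_kraken2_report` and `build_count_table`:

```python
def parse_kraken2_report(lines, rank="S"):
    """Return {taxid: clade_read_count} for every report line at `rank`.

    Assumes the default six-column Kraken2 report layout:
    percentage, clade reads, direct reads, rank code, NCBI taxid, name.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank lines, and lines with extra minimizer columns
        clade_reads, rank_code, taxid = int(fields[1]), fields[3], fields[4]
        if rank_code == rank:
            counts[taxid] = clade_reads
    return counts


def build_count_table(reports, rank="S"):
    """Merge {sample: report_lines} into {taxid: {sample: count}}, filling 0s."""
    per_sample = {name: parse_kraken2_report(lines, rank) for name, lines in reports.items()}
    # Union of the taxids seen in any sample, ordered numerically
    taxa = sorted(set().union(*per_sample.values()), key=int)
    return {taxid: {name: per_sample[name].get(taxid, 0) for name in per_sample}
            for taxid in taxa}
```

Using clade reads rather than direct reads rolls reads assigned below species level (e.g. strains) up into their species. The same function must be applied to both source and sink reports so that the two tables stay comparable.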

## Pipeline installation

```
$ conda install -c bioconda nextflow
$ nextflow pull maxibor/kraken-nf
```

## Running the pipeline

See the [README](https://github.com/maxibor/kraken-nf) of [maxibor/kraken-nf](https://github.com/maxibor/kraken-nf) for usage instructions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -20,6 +20,7 @@ __ homepage_
usage
results
methods
custom_sources



45 changes: 36 additions & 9 deletions docs/usage.md
@@ -1,5 +1,16 @@
# Usage



## Running sourcepredict on the test dataset

```
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sample.csv -O dog_test_sample.csv
$ sourcepredict dog_test_sample.csv
```

## Command line interface

```bash
$ sourcepredict -h
usage: SourcePredict v0.33 [-h] [-a ALPHA] [-s SOURCES] [-l LABELS]
@@ -41,14 +52,6 @@ optional arguments:
-k KFOLD Number of fold for K-fold cross validation in parameter
optimization. Default = 5
-t THREADS Number of threads for parallel processing. Default = 2

```
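The `-k` option sets how many folds are used for cross-validation during parameter optimization. As a minimal sketch of plain, unshuffled K-fold splitting over sample indices (Sourcepredict's actual splitting strategy, such as shuffling or stratifying by source, is not documented here, and `kfold_indices` is a hypothetical name):

```python
def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Each sample lands in exactly one test fold, so every parameter setting is scored on all samples once; a larger `k` trains on more data per fold at the cost of more model fits.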
## Running sourcepredict on the test dataset
```bash
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sample.csv -O dog_test_sample.csv
$ sourcepredict -t 6 dog_test_sample.csv
```
## Command line arguments
@@ -119,7 +122,7 @@ Default = `data/modern_gut_microbiomes_labels.csv`
+----------+--------+
```
### -n NORMALIZATION
Normalization method. One of `RLE`, `CLR`, `Subsample`, or `GMPR`. Default = `GMPR`
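For reference, the textbook definitions of two of these methods can be sketched in plain Python. This is an illustration, not Sourcepredict's implementation: the pseudocount of 1 in `clr` is an assumption, `rle_size_factors` follows the usual median-of-ratios definition, and `GMPR` and `Subsample` are not covered.

```python
import math
import statistics


def clr(counts, pseudocount=1.0):
    """Centred log-ratio of one sample: log(x_i) - mean(log(x))."""
    logs = [math.log(c + pseudocount) for c in counts]  # pseudocount avoids log(0)
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def rle_size_factors(samples):
    """Median-of-ratios (RLE) size factor for each sample.

    `samples` is a list of samples, each a list of counts over the same taxa.
    Taxa with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_taxa = len(samples[0])
    usable = [j for j in range(n_taxa) if all(s[j] > 0 for s in samples)]
    if not usable:
        raise ValueError("no taxon is present in every sample")
    # Per-taxon geometric mean, computed in log space for numerical stability
    log_geo_mean = {j: sum(math.log(s[j]) for s in samples) / len(samples) for j in usable}
    return [statistics.median(s[j] / math.exp(log_geo_mean[j]) for j in usable)
            for s in samples]
```

Dividing each sample's counts by its RLE size factor gives the normalized table. The zero-skipping rule is why RLE can fail on sparse microbiome tables where few taxa are present in every sample.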
@@ -186,3 +189,27 @@ Number of threads for parallel processing. Default = `2`
_Example:_
`-t 2`
## Choice of the taxonomic classifier
Different taxonomic classifiers will give different results because they rely on different algorithms and different reference databases.
In order to produce correct results with Sourcepredict, **the taxonomic classifier used to produce the *source* OTU count table must be the same as the one used to produce the *sink* OTU count table**.
Because Sourcepredict relies on machine learning, at least 10 samples per source are required, and more source samples will lead to better predictions.
Therefore, running all these samples through a taxonomic classifier ahead of Sourcepredict requires non-negligible computational time.
Hence, the choice of the taxonomic classifier is a balance between precision and computational time.
While this documentation doesn't intend to be a benchmark of taxonomic classifiers, the author of Sourcepredict has had decent results with [Kraken2](https://ccb.jhu.edu/software/kraken2/) and recommends it for its good compromise between precision and runtime.
The example *source* and *sink* data provided with Sourcepredict were generated with Kraken2.
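Both requirements in this section can be checked before running Sourcepredict. A minimal sketch, assuming the labels have already been loaded into a dict mapping sample name to source label; the function names are hypothetical, and a low taxon overlap is only a hint, not proof, that different classifiers or databases were used:

```python
from collections import Counter


def underpopulated_sources(labels, min_per_source=10):
    """Return {source: n_samples} for sources with fewer than `min_per_source` samples.

    `labels` is a hypothetical in-memory form of a labels file:
    {sample_name: source_label}.
    """
    counts = Counter(labels.values())
    return {src: n for src, n in counts.items() if n < min_per_source}


def taxa_overlap(source_taxa, sink_taxa):
    """Fraction of sink taxa that also appear in the source table."""
    sink = set(sink_taxa)
    if not sink:
        return 0.0
    return len(sink & set(source_taxa)) / len(sink)
```

An empty result from `underpopulated_sources` means every source meets the 10-sample minimum; what overlap fraction counts as suspiciously low is left to the user.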
80 changes: 0 additions & 80 deletions utils/kraken_pipeline/bin/kraken_parse.py

This file was deleted.

134 changes: 0 additions & 134 deletions utils/kraken_pipeline/kraken_pipe.nf

This file was deleted.
