update demo docs to new draft
robinandeer committed Aug 31, 2015
1 parent ef85c69 commit e9a58de
Showing 1 changed file with 24 additions and 48 deletions.
72 changes: 24 additions & 48 deletions docs/introduction.rst
@@ -24,87 +24,63 @@ First we need some files to work with. Chanjo comes with some pre-installed demo

.. code-block:: console
$ chanjo demo chanjo-demo && cd chanjo-demo
$ chanjo demo chanjo-demo && cd chanjo-demo
This will create a new folder (``chanjo-demo``) in your current directory and fill it with the example files you need.

.. note::
You can name the new folder anything you like but it *must not already exist*!
You can name the new folder anything you like but it *must not already exist*!


Setup and configuration
~~~~~~~~~~~~~~~~~~~~~~~~
Your first task will be to create a config file (``chanjo.toml``). It can be used to store commonly used options to avoid having to type everything on the command line. Chanjo will walk you through setting it up by running:
The first task is to create a config file (``chanjo.yaml``) and prepare the database. Chanjo will walk you through setting it up by running:

.. code-block:: console
$ chanjo init
$ chanjo init
$ chanjo db setup
.. note::
Chanjo uses project-level config files by default. This means that it will look for a possible ``chanjo.toml`` file in the **current directory** where you execute your commands. You can also point to a different config file using the ``chanjo -c /path/to/chanjo.toml`` option.

If you accepted all defaults, Chanjo will be set up so that it knows e.g. that you want to store your SQL database in the current directory with the name "coverage.sqlite3".
Chanjo uses project-level config files by default. This means that it will look for a ``chanjo.yaml`` file in the **current directory** where you execute your command. You can also point to a different config file using the ``chanjo -c /path/to/chanjo.yaml`` option.


Defining interesting regions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One important thing to note is that Chanjo doesn't consider coverage across the entire genome or exome. Instead you need to define some intervals you are interested in checking the coverage across.

For whole exome sequencing, this could be your targeted regions. Or for clinical sequencing it might be exons from the manually curated CCDS database. In fact the default adapter already converts CCDS transcripts into the BED\* interval file that Chanjo expects.
One important thing to note is that Chanjo considers coverage across exonic regions of the genome. It's perfectly possible to compose your own list of intervals. Just make sure to follow the BED conventions (http://genome.ucsc.edu/FAQ/FAQformat.html#format1). You then add a couple of additional columns that define the relationships between exons and transcripts, and between transcripts and genes:

.. code-block:: console
.. code-block::
$ sort -k1,1 -k2,2n CCDS.mini.txt | chanjo convert > CCDS.mini.bed
.. note::
The input to ``chanjo convert`` needs to be sorted according to contig/chromosome and start value. You can ensure this easily using the sort command above.
#chrom chromStart chromEnd name score strand transcripts genes
chr1 120032 120162 exon1 0 + transcript1,transcript2 gene1,gene1
.. note::
It's perfectly possible to compose your own list of intervals. Just make sure to follow the BED conventions (http://genome.ucsc.edu/FAQ/FAQformat.html#format1).
If an exon belongs to multiple transcripts you define a list of ids and an equal number of gene identifiers to match.
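
The extended BED layout above can be read with a few lines of Python. This is only a minimal sketch, not Chanjo's own parser: the ``ExonRecord`` type and ``parse_exon_line`` function are hypothetical names introduced for illustration, and the code assumes whitespace-separated fields in the column order shown in the header line.

```python
from typing import List, NamedTuple, Tuple


class ExonRecord(NamedTuple):
    """One interval from the extended BED layout (hypothetical helper type)."""
    chrom: str
    start: int
    end: int
    name: str
    score: str
    strand: str
    links: List[Tuple[str, str]]  # (transcript id, gene id) pairs


def parse_exon_line(line: str) -> ExonRecord:
    """Parse one data line; assumes whitespace-separated columns."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError("expected at least 8 columns, got %d" % len(fields))
    chrom, start, end, name, score, strand = fields[:6]
    transcripts = fields[6].split(",")
    genes = fields[7].split(",")
    # Each transcript id must have a matching gene id in the same position.
    if len(transcripts) != len(genes):
        raise ValueError("transcript and gene lists differ in length")
    return ExonRecord(chrom, int(start), int(end), name, score, strand,
                      list(zip(transcripts, genes)))


record = parse_exon_line(
    "chr1 120032 120162 exon1 0 + transcript1,transcript2 gene1,gene1")
print(record.links)
```

Pairing the two lists by position is what makes the "equal number of gene identifiers" rule matter: a mismatch in length would leave a transcript without a gene, so the sketch rejects such lines.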


Initializing a SQL database
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the Chanjo formatted BED-file we are ready to build our SQL database that will hold the coverage data for long-term storage. By default Chanjo will set up a SQLite 3 database. Make sure you are using version 3 and not version 2 of SQLite.
Linking exons/transcripts/genes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's tell Chanjo which exons belong to which transcripts and which transcripts belong to which genes. It's fine to use the output from Sambamba as long as the two columns after "strand" are present in the file.

.. code-block:: console
$ chanjo build CCDS.mini.bed
If you prefer to use a MySQL database, the build pipeline would look something like this:
$ chanjo link sample1.coverage.bed
.. code-block:: console
$ chanjo convert resources/ccds/CCDS.txt | \
> chanjo --db username:password@localhost/chanjo_test --dialect "mysql+pymysql" build
.. note::
The `dialect syntax`_ is taken from SQLAlchemy and is defined as ``<dialect or database>+<Python connector>``.


Annotating coverage
~~~~~~~~~~~~~~~~~~~~
If you happen to have misplaced your BED-file from the previous step, it's possible to re-generate it as a BED-stream from an existing Chanjo database. Let's use this stream as the input to the *annotate* subcommand.
Loading annotations
~~~~~~~~~~~~~~~~~~~~~
After running ``sambamba depth region`` you can take the output and load it into the database. Let's also add a group identifier to indicate that the sample is related to some other samples.

.. code-block:: console
$ chanjo export | chanjo annotate --prefix=chr alignment.bam | tee annotations.bed
During this step Chanjo reads the BED stream and annotates each interval with coverage and completeness. We use the ``--prefix`` option to synchronize how contigs are named in the BED stream and the BAM alignment file.

.. note::
So what is this "completeness"? Well, it's pretty simple. You start by setting a level of "sufficient" coverage (``--cutoff``). Chanjo will then, for each interval, determine the percentage of bases with at least sufficient levels of coverage.
$ chanjo load sample1.coverage.bed --group group1
Importing annotations for storage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To close the circle, we can pass the output from *annotate* to the last command: *import*. It will take the annotations and store them in your SQLite database.
Extracting information
~~~~~~~~~~~~~~~~~~~~~~
We now have some information loaded for a few samples and can start exploring what coverage looks like!

.. code-block:: console
$ chanjo import annotations.bed
This is the complete Chanjo coverage analysis pipeline. Extracting basic coverage metrics like "average coverage", "overall completeness", etc. is as easy as a couple of SQL statements.
.. note::
So what is this "completeness"? Well, it's pretty simple: the percentage of bases with at least "sufficient" (say, 10x) coverage.
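
To make the definition concrete, here is a small sketch of how completeness could be computed for one interval from per-base read depths. It is an illustration only, not Chanjo's implementation: the ``completeness`` function and the example depths are made up, and the 10x cutoff is just the example value mentioned above.

```python
from typing import Sequence


def completeness(depths: Sequence[int], cutoff: int = 10) -> float:
    """Percentage of bases whose read depth is at least ``cutoff``."""
    if not depths:
        raise ValueError("interval contains no bases")
    covered = sum(1 for depth in depths if depth >= cutoff)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths for an 8 bp interval.
depths = [12, 15, 9, 10, 3, 20, 11, 8]
print(completeness(depths))  # 5 of 8 bases reach 10x -> 62.5
```

Note that completeness and average coverage answer different questions: an interval can have a high mean depth and still be incomplete if a few bases fall below the cutoff.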


What's next?