Updated command lines for the three main programs in the documentation (
lucventurini committed Oct 26, 2018
1 parent 392e2af commit 959a9e9
Showing 3 changed files with 49 additions and 34 deletions.
34 changes: 24 additions & 10 deletions docs/Usage/Pick.rst
@@ -319,18 +319,21 @@ Usage::
usage: Mikado pick [-h] [--start-method {fork,spawn,forkserver}] [-p PROCS]
--json-conf JSON_CONF [--scoring-file SCORING_FILE]
[-i INTRON_RANGE INTRON_RANGE] [--pad]
[--pad-max-splices PAD_MAX_SPLICES]
[--pad-max-distance PAD_MAX_DISTANCE]
[--subloci_out SUBLOCI_OUT] [--monoloci_out MONOLOCI_OUT]
[--loci_out LOCI_OUT] [--prefix PREFIX] [--no_cds]
[--source SOURCE] [--flank FLANK] [--purge]
[--subloci-from-cds-only] [--monoloci-from-simple-overlap]
[-db SQLITE_DB] [-od OUTPUT_DIR] [--single] [-l LOG]
[-v | -nv] [-lv {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--source SOURCE] [--flank FLANK] [--purge] [--cds-only]
[--monoloci-from-simple-overlap]
[--consider-truncated-for-retained] [-db SQLITE_DB]
[-od OUTPUT_DIR] [--single] [-l LOG] [-v | -nv]
[-lv {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
[--mode {nosplit,stringent,lenient,permissive,split}]
[gff]

positional arguments:
gff

optional arguments:
-h, --help show this help message and exit
--start-method {fork,spawn,forkserver}
@@ -350,14 +353,20 @@ Usage::
outside of this range will be penalised. Default: (60,
900) (default: None)
--pad Whether to pad transcripts in loci. (default: False)
--pad-max-splices PAD_MAX_SPLICES
Maximum splice sites that can be crossed during
transcript padding. (default: None)
--pad-max-distance PAD_MAX_DISTANCE
Maximum amount of bps that transcripts can be padded
with (per side). (default: None)
--subloci_out SUBLOCI_OUT
--monoloci_out MONOLOCI_OUT
--loci_out LOCI_OUT This output file is mandatory. If it is not specified
in the configuration file, it must be provided here.
(default: None)
--prefix PREFIX Prefix for the genes. Default: Mikado (default: None)
--no_cds Flag. If set, no CDS information will be printed out
in the GFF output files. (default: None)
in the GFF output files. (default: False)
--source SOURCE Source field to use for the output files. (default:
None)
--flank FLANK Flanking distance (in bps) to group non-overlapping
@@ -366,8 +375,7 @@ Usage::
--purge Flag. If set, the pipeline will suppress any loci
whose transcripts do not pass the requirements set in
the JSON file. (default: False)
--subloci-from-cds-only
"Flag. If set, Mikado will only look for overlap in
--cds-only "Flag. If set, Mikado will only look for overlap in
the coding features when clustering transcripts
(unless one transcript is non-coding, in which case
the whole transcript will be considered). Default:
@@ -378,6 +386,11 @@ Usage::
transcripts by simple overlap, not by looking at the
presence of shared introns. Default: False. (default:
False)
--consider-truncated-for-retained
Flag. If set, Mikado will consider as retained intron
events also transcripts which lack UTR but whose CDS
ends within a CDS intron of another model. (default:
False)
-db SQLITE_DB, --sqlite-db SQLITE_DB
Location of an SQLite database to overwrite what is
specified in the configuration file. (default: None)
@@ -399,7 +412,7 @@ Usage::
but also split when both ORFs lack BLAST hits - split:
split multi-orf transcripts regardless of what BLAST
data is available. (default: None)

Log options:
-l LOG, --log LOG File to write the log to. Default: decided by the
configuration file. (default: None)
@@ -412,6 +425,7 @@ Usage::
file. (default: None)



.. block end
16 changes: 7 additions & 9 deletions docs/Usage/Prepare.rst
@@ -48,16 +48,12 @@ Command line usage:
[-s | -sa STRAND_SPECIFIC_ASSEMBLIES] [--list LIST]
[-l LOG] [--lenient] [-m MINIMUM_LENGTH] [-p PROCS]
[-scds] [--labels LABELS] [--single] [-od OUTPUT_DIR]
[-o OUT] [-of OUT_FASTA] [--json-conf JSON_CONF]
[-o OUT] [-of OUT_FASTA] [--json-conf JSON_CONF] [-k]
[gff [gff ...]]
Mikado prepare analyses an input GTF file and prepares it for the picking
analysis by sorting its transcripts and performing some simple consistency
checks.
positional arguments:
gff Input GFF/GTF file(s).
optional arguments:
-h, --help show this help message and exit
--fasta FASTA Genome FASTA file. Required.
@@ -79,7 +75,7 @@ Command line usage:
-m MINIMUM_LENGTH, --minimum_length MINIMUM_LENGTH
Minimum length for transcripts. Default: 200 bps.
-p PROCS, --procs PROCS
Number of processors to use (default 1)
Number of processors to use (default None)
-scds, --strip_cds Boolean flag. If set, ignores any CDS/UTR segment.
--labels LABELS Labels to attach to the IDs of the transcripts of the
input files, separated by comma.
@@ -91,6 +87,9 @@ Command line usage:
Output file. Default: mikado_prepared.fasta.
--json-conf JSON_CONF
Configuration file.
-k, --keep-redundant Boolean flag. If invoked, Mikado prepare will retain
redundant models.
Collection of transcripts from the annotation files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -124,4 +123,3 @@ Mikado prepare will produce two files:
* a FASTA file of the transcripts, in the proper cDNA orientation.
.. warning:: Contrary to other tools such as gffread from Cufflinks [Cufflinks]_, Mikado prepare will **not** try to calculate the loci for the transcripts; this task is performed later in the pipeline. As such, the GTF file is formally incorrect: multiple transcripts in the same locus but coming from different assemblies will *not* share a gene_id, but will instead keep their original ones. Moreover, if two gene_ids were identical but referred to distinct regions in the input files (i.e. located on different sections of the genome), this error will not be corrected. If you want to use this GTF file for any other purpose, please use a tool like gffread to calculate the loci appropriately.
33 changes: 18 additions & 15 deletions docs/Usage/Serialise.rst
@@ -85,19 +85,16 @@ Usage::
$ mikado serialise --help
usage: Mikado serialise [-h] [--start-method {fork,spawn,forkserver}]
[--orfs ORFS] [--transcripts TRANSCRIPTS]
[-mr MAX_REGRESSION]
[-mr MAX_REGRESSION] [--codon-table CODON_TABLE]
[--max_target_seqs MAX_TARGET_SEQS]
[--blast_targets BLAST_TARGETS] [--discard-definition]
[--xml XML] [-p PROCS] [--single-thread]
[--genome_fai GENOME_FAI] [--junctions JUNCTIONS]
[-mo MAX_OBJECTS] [-f] --json-conf JSON_CONF
[-l [LOG]] [-od OUTPUT_DIR]
[--blast_targets BLAST_TARGETS] [--xml XML] [-p PROCS]
[--single-thread] [--genome_fai GENOME_FAI]
[--junctions JUNCTIONS]
[--external-scores EXTERNAL_SCORES] [-mo MAX_OBJECTS]
[-f] --json-conf JSON_CONF [-l [LOG]] [-od OUTPUT_DIR]
[-lv {DEBUG,INFO,WARN,ERROR}]
[db]

Mikado serialise creates the database used by the pick program. It handles
Junction and ORF BED12 files as well as BLAST XML results.

optional arguments:
-h, --help show this help message and exit
--start-method {fork,spawn,forkserver}
@@ -123,13 +120,14 @@ Usage::
"Amount of sequence in the ORF (in %) to backtrack in
order to find a valid START codon, if one is absent.
Default: None
--codon-table CODON_TABLE
Codon table to use. Default: 0 (i.e. Standard, NCBI #1,
but only ATG is considered a valid start codon).

--max_target_seqs MAX_TARGET_SEQS
Maximum number of target sequences.
--blast_targets BLAST_TARGETS
Target sequences
--discard-definition Flag. If set, the sequences IDs instead of their
definition will be used for serialisation.
--xml XML XML file(s) to parse. They can be provided in three
ways: - a comma-separated list - as a base folder -
using bash-like name expansion (*,?, etc.). In this
@@ -146,6 +144,11 @@ Usage::
--genome_fai GENOME_FAI
--junctions JUNCTIONS

--external-scores EXTERNAL_SCORES
Tabular file containing external scores for the
transcripts. Each column should have a distinct name,
and transcripts have to be listed on the first column.

-mo MAX_OBJECTS, --max-objects MAX_OBJECTS
Maximum number of objects to cache in memory before
committing to the database. Default: 100,000 i.e.
@@ -157,17 +160,17 @@ Usage::
-l [LOG], --log [LOG]
Optional log file. Default: stderr
-lv {DEBUG,INFO,WARN,ERROR}, --log_level {DEBUG,INFO,WARN,ERROR}
Log level. Default: INFO
Log level. Default: derived from the configuration; if
absent, INFO
db Optional output database. Default: derived from
json_conf
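
The ``--external-scores`` description above asks for a tabular file with distinctly named columns and the transcript IDs in the first column. The following is a minimal sketch of a check for that layout, not Mikado's own parser. The tab separator, the numeric values, the ``tid`` header name and the function ``read_external_scores`` are all assumptions made for illustration.

```python
import csv
import io


def read_external_scores(handle):
    """Parse a score table into {transcript_id: {column_name: value}}.

    Hypothetical helper: assumes tab-separated text with numeric scores.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    score_names = header[1:]  # the first column holds the transcript IDs
    # Each score column is expected to have a distinct name.
    if len(set(score_names)) != len(score_names):
        raise ValueError("duplicate column names in the header")
    scores = {}
    for row in reader:
        if not row:
            continue  # skip empty lines
        transcript_id, values = row[0], row[1:]
        if len(values) != len(score_names):
            raise ValueError("wrong number of values for " + transcript_id)
        scores[transcript_id] = dict(zip(score_names, map(float, values)))
    return scores


# Made-up example data, used only to exercise the function.
example = "tid\tcoverage\tsupport\nt1\t0.5\t3\nt2\t0.9\t1\n"
table = read_external_scores(io.StringIO(example))
```

Running a check of this kind before serialisation would catch duplicated column names or ragged rows early, rather than during database loading.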



Technical details
~~~~~~~~~~~~~~~~~

The schema of the database is quite simple, as it is composed only of 7 discrete tables in two groups. The first group, *chrom* and *junctions*, serialises the information pertaining to the reliable junctions, i.e. information which is not relative to the transcripts but rather to their genomic locations.
The second group serialises the data regarding ORFs and BLAST files. The need for a database is mainly driven by the BLAST data, as querying a relational database is faster than retrieving the information from the XML files themselves at runtime.
The schema of the database is quite simple, as it is composed only of 9 discrete tables in two groups. The first group, *chrom* and *junctions*, serialises the information pertaining to the reliable junctions, i.e. information which is not relative to the transcripts but rather to their genomic locations.
The second group serialises the data regarding ORFs, BLAST files and external arbitrary data. The need for a database is mainly driven by the BLAST data, as querying a relational database is faster than retrieving the information from the XML files themselves at runtime.
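
Since the serialised database can be an SQLite file, its tables can be listed with Python's standard ``sqlite3`` module. The sketch below is illustrative only: it builds an in-memory stand-in containing the *chrom* and *junctions* tables named above, and the column definitions are assumptions rather than Mikado's actual schema.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an SQLite database, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


# In-memory stand-in for a serialised database; the columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chrom (chrom_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE junctions ("
    "id INTEGER PRIMARY KEY, "
    "chrom_id INTEGER REFERENCES chrom(chrom_id), "
    "junction_start INTEGER, "
    "junction_end INTEGER)"
)
tables = list_tables(conn)
```

For a real database, the path of the file produced by the serialisation step would be passed to ``sqlite3.connect`` in place of ``":memory:"``.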

.. database figure generated with `SchemaCrawler <http://sualeh.github.io/SchemaCrawler/>`_, using the following command line:
schemacrawler -c graph -url=jdbc:sqlite:sample_data/mikado.db -o docs/Usage/database_schema.png --outputformat=png -infolevel=maximum