Use cDNA alignments with or without genetic map information to evaluate the completeness and correctness of a de novo genome or transcriptome assembly.
sahammond/gnavigator
# gnavigator
Gnavigator uses GMAP to align cDNAs to an assembly and evaluates the results.
Alignments are categorized based on their percent identity, percent coverage,
and whether or not the cDNA aligned to a single scaffold in the assembly.
## cDNA Alignment Categories

- Complete: at least 95% of the cDNA sequence is aligned to a single scaffold with at least 95% identity
- Duplicated: as Complete, but at multiple locations
- Partial: less than 95% of the sequence is aligned to a single scaffold, and the remainder is unaligned
- Fragmented: at least 95% of the sequence is aligned, but across multiple scaffolds
- Poorly mapped: as Complete, Partial, or Fragmented, but with less than 95% identity
- Missing: not aligned by GMAP

The 95% values are the defaults of the `-i`/`--identity` and `-c`/`--coverage` options.
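
The categories above can be read as a small decision procedure. The sketch below is one possible reading and is not taken from gnavigator's source: the `Alignment` record and the `classify` function are hypothetical names, and summing per-scaffold coverage assumes the aligned pieces of a cDNA do not overlap.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One alignment of a cDNA to a scaffold (hypothetical record type)."""
    scaffold: str
    identity: float  # fraction of aligned bases that match, 0 to 1
    coverage: float  # fraction of the cDNA covered by this alignment, 0 to 1


def classify(alignments, min_identity=0.95, min_coverage=0.95):
    """Assign one of the README's categories to a single cDNA."""
    if not alignments:
        return "Missing"
    # Alignments that cover enough of the cDNA on their own
    full = [a for a in alignments if a.coverage >= min_coverage]
    if full:
        good = [a for a in full if a.identity >= min_identity]
        if len(good) > 1:
            return "Duplicated"
        if len(good) == 1:
            return "Complete"
        return "Poorly mapped"
    # No single alignment covers enough of the cDNA
    if any(a.identity < min_identity for a in alignments):
        return "Poorly mapped"
    scaffolds = {a.scaffold for a in alignments}
    # Assumption: pieces do not overlap, so their coverage can be summed
    total = sum(a.coverage for a in alignments)
    if len(scaffolds) > 1 and total >= min_coverage:
        return "Fragmented"
    return "Partial"
```

For example, a cDNA split into two high-identity pieces on different scaffolds that together cover 98% of its length comes out as "Fragmented" under these assumptions, while a single 60% piece comes out as "Partial".
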
If a genetic map is provided, scaffolds with two or more aligned cDNAs that
are classified as "Complete" are further assessed. The genetic map must be a tab-separated
file with three columns, like so:

| linkage group | position in cM | cDNA ID |
|---|---|---|
| LG01 | 0.000 | cDNA1 |
| LG01 | 0.010 | cDNA2 |
| ... | ... | ... |
| LG14 | 153.10 | cDNA144 |
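
As an illustration of the three-column format, a file like this could be loaded as follows. This is a minimal sketch, not gnavigator's own parser: `read_genetic_map` is a hypothetical helper, and skipping lines whose second column is not numeric is an assumption made so that an optional header line does no harm.

```python
import csv
import io


def read_genetic_map(handle):
    """Return {cDNA ID: (linkage group, position in cM)} from a TSV handle."""
    markers = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        lg, pos, cdna = (field.strip() for field in row)
        try:
            markers[cdna] = (lg, float(pos))
        except ValueError:
            continue  # e.g. a header line whose second column is not a number
    return markers


# In-memory stand-in for a map file, using rows from the example above
example = io.StringIO("LG01\t0.000\tcDNA1\nLG01\t0.010\tcDNA2\nLG14\t153.10\tcDNA144\n")
genetic_map = read_genetic_map(example)
```

With a real file, the same function can be called as `read_genetic_map(open(path, newline=""))`.
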
## Genetic Map Categories

- Same LG, right order: all Complete cDNAs are from the same linkage group and their positions
  on the scaffold are concordant with the genetic map.
- Same LG, wrong order: all Complete cDNAs are from the same linkage group but their positions
  on the scaffold do not agree with the genetic map. This may indicate a misassembly or a
  lineage-specific genomic rearrangement, depending on context.
- Different LG: the Complete cDNAs that aligned to this scaffold are from different linkage groups.
  This may indicate a misassembly or a lineage-specific genomic rearrangement, depending on context.
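
The three genetic map categories can be sketched as a check on one scaffold at a time. The function below is an illustration under stated assumptions, not gnavigator's implementation: `assess_scaffold` is a hypothetical name, a scaffold is treated as concordant in either orientation relative to the map, and cDNAs with equal cM positions are allowed in any order.

```python
def assess_scaffold(hits, genetic_map):
    """Classify one scaffold from its Complete cDNAs.

    hits: list of (cDNA ID, start position on the scaffold)
    genetic_map: {cDNA ID: (linkage group, position in cM)}
    """
    # Keep only cDNAs that appear in the genetic map
    mapped = [(start, *genetic_map[cdna]) for cdna, start in hits if cdna in genetic_map]
    if len(mapped) < 2:
        return None  # fewer than two mapped cDNAs: nothing to compare
    if len({lg for _, lg, _ in mapped}) > 1:
        return "Different LG"
    # cM positions, listed in scaffold order
    cm = [pos for _, _, pos in sorted(mapped)]
    ascending = all(a <= b for a, b in zip(cm, cm[1:]))
    descending = all(a >= b for a, b in zip(cm, cm[1:]))
    # Assumption: a scaffold may be in either orientation relative to the map
    if ascending or descending:
        return "Same LG, right order"
    return "Same LG, wrong order"
```

Returning `None` for scaffolds with fewer than two mapped cDNAs mirrors the rule that only scaffolds with two or more Complete cDNAs are assessed.
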
## Usage

    usage: gnavigator.py [-h] [-r] [-p PREFIX] [-d DB_DIR] [-n DB_NAME]
                         [-t THREADS] [-m GENETIC_MAP] [-i IDENTITY] [-c COVERAGE]
                         cDNA genome

    Assess assembly quality & completeness using cDNA sequences

    positional arguments:
      cDNA                  FASTA file of cDNA sequences to align to assembly
      genome                FASTA file of genome assembly to assess

    optional arguments:
      -h, --help            show this help message and exit
      -r, --transcriptome   Transcriptome assessment mode. See manual for details.
                            [off]
      -p PREFIX, --prefix PREFIX
                            Prefix to use for intermediate and output files
                            [gnavigator]
      -d DB_DIR, --db_dir DB_DIR
                            Path to directory containing prebuilt GMAP index
                            [optional]
      -n DB_NAME, --db_name DB_NAME
                            Name of prebuilt GMAP index [optional]
      -t THREADS, --threads THREADS
                            Number of threads for GMAP alignment [1]
      -m GENETIC_MAP, --genetic_map GENETIC_MAP
                            Genetic map file as tsv with LG:cDNA pairs [optional]
      -i IDENTITY, --identity IDENTITY
                            Minimum identity threshold [0.95]
      -c COVERAGE, --coverage COVERAGE
                            Minimum coverage threshold [0.95]
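
When gnavigator is driven from another script, the options listed above can be assembled programmatically. The sketch below only builds the argument list and does not run anything. `build_command` is a hypothetical helper, the file names are placeholders, and calling `gnavigator.py` directly assumes the script is executable and on your PATH.

```python
import shlex


def build_command(cdna, genome, prefix=None, threads=None, genetic_map=None,
                  identity=None, coverage=None, transcriptome=False):
    """Assemble a gnavigator.py argument list from the documented options."""
    cmd = ["gnavigator.py"]  # assumes the script is executable and on PATH
    if transcriptome:
        cmd.append("-r")
    # Only pass options that were set, so gnavigator's own defaults apply otherwise
    for flag, value in (("-p", prefix), ("-t", threads), ("-m", genetic_map),
                        ("-i", identity), ("-c", coverage)):
        if value is not None:
            cmd += [flag, str(value)]
    cmd += [cdna, genome]  # positional arguments: cDNA first, then genome
    return cmd


# Hypothetical file names, for illustration only
cmd = build_command("cdnas.fa", "assembly.fa", threads=8, genetic_map="map.tsv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run.
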
## Other notes

Gnavigator uses the first GMAP executable found on your PATH. To use a different
executable, specify it in `gmap_config.txt`.
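
To see which GMAP executable a PATH lookup would select on your system, a check like the one below can help. It is a sketch using Python's standard `shutil.which`, not gnavigator's own lookup code, and it does not read `gmap_config.txt`, whose format is not described here. `find_gmap` is a hypothetical helper name.

```python
import shutil


def find_gmap(name="gmap"):
    """Return the path of the first matching executable on PATH, or None."""
    return shutil.which(name)


path = find_gmap()
print(path if path else "gmap not found on PATH")
```
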