Merge pull request #9 from luispedro/better_output_docs
Better output docs
Showing 5 changed files with 63 additions and 95 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,59 +1,55 @@ | ||
# Output | ||
|
||
Explaination of the files in the output | ||
Explanation of the files in the output | ||
|
||
|
||
## prodigal\_out.faa , prodigal\_out.fna , gene.coords.gbk | ||
|
||
## prodigal_out.faa , prodigal_out.fna , gene.coords.gbk | ||
These three files are the output of the prodigal. | ||
These three files are the output of Prodigal. | |
|
||
prodigal_out.faa is the protein sequence. | ||
- `prodigal_out.faa` protein sequence | ||
- `prodigal_out.fna` DNA sequence | ||
- `gene.coords.gbk` gene information in GenBank format | |
|
||
prodigal_out.fna is the dna sequence. | ||
|
||
gene.coords.gbk is the gene information | ||
|
||
|
||
|
||
|
||
|
||
## hit_table.tsv : | ||
## hit\_table.tsv : | ||
|
||
The results of the queries to the GMGC. | ||
|
||
There are five columns in the file. | ||
|
||
- query_name: the name/id of the input genome contig | ||
- gene_id: the gene_id with the best hit_score in GMGC | ||
- align_category: there are four different classes of alignment | ||
- gene_dna : the dna sequence of the hitted gene in GMGC | ||
- gene_protein : the protein sequence of the hitted gene in GMGC | ||
|
||
Align_category | ||
- `query_name`: the name/id of the input genome contig | ||
- `gene_id`: the gene\_id with the best score in GMGC | |
- `align_category`: there are four different classes of alignment (see below) | |
- `gene\_dna`: the DNA sequence of the best hit in GMGC | ||
- `gene\_protein`: the protein sequence of the best hit in GMGC | ||
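As an illustration of how the five columns might be consumed, here is a minimal Python sketch that tallies the `align_category` values. It assumes the file is tab-separated with the columns in the order listed above; whether a header row is present is also an assumption, so the sketch tolerates both cases. The gene and contig names in the example are hypothetical.

```python
import csv
import io
from collections import Counter

# Column order as listed in the documentation above (assumed, not verified).
COLUMNS = ["query_name", "gene_id", "align_category", "gene_dna", "gene_protein"]


def count_categories(handle):
    """Count how many rows fall into each alignment category."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Skip a header row if the file has one.
        if row["query_name"] == "query_name":
            continue
        counts[row["align_category"]] += 1
    return counts


# Hypothetical two-row table standing in for a real hit_table.tsv.
example = io.StringIO(
    "query_name\tgene_id\talign_category\tgene_dna\tgene_protein\n"
    "contig_1_1\tGMGC.hypothetical.1\tEXACT\tATGAAA\tMK\n"
    "contig_1_2\tNA\tNO MATCH\t\t\n"
)
category_counts = count_categories(example)
print(category_counts)
```

With a real file, the same function could be called as `count_categories(open("hit_table.tsv", newline=""))`.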
|
||
- EXACT : above 95% nucleotide identity with at least 95% coverage | ||
- SIMILAR : above 80% nucleotide identity with at least 80% coverage | ||
- MATCH : above 50% nucleotide identity with at least 50% coverage | ||
- NO MATCH : no match in GMGC | ||
### Alignment category | ||
|
||
- `EXACT`: at least 95% nucleotide identity with at least 95% coverage. As | ||
unigenes in the GMGC represent 95% nucleotide clusterings (species-level | ||
threshold), this would mean that the query gene would have clustered with | ||
the GMGC unigene. | ||
- `SIMILAR`: at least 80% amino acid identity with at least 80% coverage. | ||
- `MATCH`: at least 50% amino acid identity with at least 50% coverage. | ||
- `NO MATCH`: no match in GMGC. | ||
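The thresholds above can be read as a cascade, checked from strictest to loosest. The following Python sketch only restates the documented cut-offs; it is not the tool's actual implementation. The function name, its parameters, and the use of fractions (0 to 1) for identity and coverage are assumptions made for illustration.

```python
def align_category(nt_identity, nt_coverage, aa_identity, aa_coverage):
    """Map identity/coverage fractions (0-1) to the documented category.

    Hypothetical helper: EXACT is judged on nucleotide identity, while
    SIMILAR and MATCH are judged on amino acid identity, as described above.
    """
    if nt_identity >= 0.95 and nt_coverage >= 0.95:
        return "EXACT"
    if aa_identity >= 0.80 and aa_coverage >= 0.80:
        return "SIMILAR"
    if aa_identity >= 0.50 and aa_coverage >= 0.50:
        return "MATCH"
    return "NO MATCH"


print(align_category(0.97, 0.99, 0.98, 0.99))
print(align_category(0.85, 0.90, 0.82, 0.90))
```

Checking the strictest category first means a hit that satisfies several thresholds is always reported under the most specific label.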
|
||
|
||
## `genome\_bin.tsv` | ||
|
||
|
||
## genome_bin.tsv | ||
|
||
Times of a genome bin that input genes hitting it | ||
Genome bins (MAGs) found in the results (and a count of how many genes | |
are contained in each). | |
|
||
There are two columns in the file. | ||
|
||
* genome_bin : the name of genome bins in GMGC | ||
* times_gene_hit : the times of input genes hitting it | ||
|
||
|
||
|
||
- `genome\_bin`: the name of the genome bin in GMGC | |
- `times\_gene\_hit`: the number of input genes that hit it | |
|
||
Note that while not all GMGC unigenes are contained in a genome bin, some are | |
contained in many. Thus, the total counts will not (except by coincidence) | |
correspond to the number of genes queried. | |
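To see why the totals need not match the number of queried genes, consider this small Python sketch. The mapping from unigenes to genome bins and all names in it are hypothetical; the sketch only illustrates the counting behaviour described in the note, not how the tool computes the file.

```python
from collections import Counter


def count_bin_hits(hit_gene_ids, gene_to_bins):
    """Count, for each genome bin, how many hit unigenes it contains."""
    counts = Counter()
    for gene_id in hit_gene_ids:
        # A unigene may belong to zero, one, or many genome bins.
        for genome_bin in gene_to_bins.get(gene_id, ()):
            counts[genome_bin] += 1
    return counts


# Hypothetical data: unigene_A is in three bins, unigene_B in none.
gene_to_bins = {
    "unigene_A": ["bin_1", "bin_2", "bin_3"],
    "unigene_B": [],
    "unigene_C": ["bin_1"],
}
hits = ["unigene_A", "unigene_B", "unigene_C"]

bin_counts = count_bin_hits(hits, gene_to_bins)
print(bin_counts)
print(sum(bin_counts.values()), "bin hits from", len(hits), "queried genes")
```

Here three queried genes yield four bin hits in total, because one unigene is shared by three bins and another belongs to none.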
|
||
## summary.txt | ||
|
||
Summary of the query | ||
Human-readable summary of the results. | ||
|