Commit

update docs
psj1997 committed Jun 7, 2020
1 parent f96d173 commit 5149fa9
Showing 4 changed files with 59 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ The output folder contains :

(2) hit_table.tsv : results of the query. There are five columns in the file: query_name,gene_id,align_category,gene_dna,gene_protein.

(3) genome_bin.tsv : results of times of a genome bin that genes hitting it.
(3) genome_bin.tsv : the number of times input genes hit each genome bin.

(4) summary.txt : Summary of the query.

2 changes: 1 addition & 1 deletion docs/index.md
@@ -22,7 +22,7 @@ The output folder contains :

(2) hit_table.tsv : results of the query. There are five columns in the file: query_name,gene_id,align_category,gene_dna,gene_protein.

(3) genome_bin.tsv : results of times of a genome bin that genes hitting it.
(3) genome_bin.tsv : the number of times input genes hit each genome bin.

(4) summary.txt : Summary of the query.

56 changes: 56 additions & 0 deletions docs/output.md
@@ -0,0 +1,56 @@
# Output

Explanation of the files in the output folder.



## prodigal_out.faa, prodigal_out.fna, gene.coords.gbk
These three files are produced by Prodigal.

prodigal_out.faa contains the protein sequences of the predicted genes.

prodigal_out.fna contains the DNA sequences of the predicted genes.

gene.coords.gbk contains the gene information (coordinates of the predicted genes).



## hit_table.tsv

The results of querying the GMGC.

There are five columns in the file.

- query_name: the name/id of the input genome contig
- gene_id: the ID of the GMGC gene with the best hit score
- align_category: the class of the alignment, one of four categories
- gene_dna: the DNA sequence of the matched gene in GMGC
- gene_protein: the protein sequence of the matched gene in GMGC
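
As a rough illustration of how these five columns might be consumed, the sketch below parses tab-separated rows into dictionaries. It assumes the columns appear in the order listed above and that the presence of a header row is unknown (hence the `has_header` switch). `read_hit_table` is a hypothetical helper, not part of GMGC-Finder, and the sample row is made up.

```python
import csv
import io

# Column order as listed in this document; assumed, not verified against real output.
HIT_TABLE_COLUMNS = [
    "query_name",
    "gene_id",
    "align_category",
    "gene_dna",
    "gene_protein",
]


def read_hit_table(handle, has_header=False):
    """Yield one dict per row of a tab-separated hit table (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        yield dict(zip(HIT_TABLE_COLUMNS, row))


# Made-up row for illustration only; real identifiers and sequences will differ.
sample = "contig_1\texample_gene_id\tEXACT\tATGAAATAA\tMK*\n"
rows = list(read_hit_table(io.StringIO(sample)))
print(rows[0]["query_name"], rows[0]["align_category"])
```

For a real run, the same function could be given an open file handle for `hit_table.tsv` instead of the in-memory sample.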

#### align_category

- EXACT : above 95% nucleotide identity with at least 95% coverage

- SIMILAR : above 80% nucleotide identity with at least 80% coverage

- MATCH : above 50% nucleotide identity with at least 50% coverage

- NO MATCH : no match in GMGC
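
The thresholds above can be read as a cascade checked from strictest to loosest. The following is a minimal sketch of that reading, not the tool's actual code: it assumes identity and coverage are percentages, that "above" means strictly greater and "at least" means greater-or-equal, and that anything below the MATCH thresholds is reported as NO MATCH. The function name `classify_alignment` is hypothetical.

```python
def classify_alignment(identity, coverage):
    """Map percent identity and percent coverage to an align_category label.

    Hypothetical helper: thresholds come from the list above; the order of
    checks and the handling of boundary values are assumptions.
    """
    thresholds = [("EXACT", 95.0), ("SIMILAR", 80.0), ("MATCH", 50.0)]
    for name, cutoff in thresholds:
        # "above X% identity" -> strict; "at least X% coverage" -> inclusive
        if identity > cutoff and coverage >= cutoff:
            return name
    return "NO MATCH"


print(classify_alignment(99.0, 97.0))
print(classify_alignment(85.0, 60.0))
```

Under these assumptions, a hit with high identity but low coverage falls through to the first category whose two thresholds are both met.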



## genome_bin.tsv

The number of times input genes hit each genome bin.

There are two columns in the file.

* genome_bin : the name of the genome bin in GMGC
* times_gene_hit : the number of times input genes hit this bin
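
Conceptually, the two columns amount to a tally of hits per bin. The sketch below shows one way such a tally could be built. It assumes a hypothetical `gene_to_bin` mapping from GMGC gene IDs to a single bin name (how GMGC-Finder actually resolves bins is not described here), and the header row and sort order in the formatted table are likewise assumptions. All identifiers are made up.

```python
from collections import Counter


def count_bin_hits(hit_gene_ids, gene_to_bin):
    """Count how many hit genes fall into each genome bin (hypothetical helper)."""
    counts = Counter()
    for gene_id in hit_gene_ids:
        bin_name = gene_to_bin.get(gene_id)
        if bin_name is not None:  # genes without a known bin are skipped
            counts[bin_name] += 1
    return counts


def format_genome_bin_table(counts):
    """Render the counts as tab-separated text, most frequently hit bin first."""
    lines = ["genome_bin\ttimes_gene_hit"]
    for bin_name, n in counts.most_common():
        lines.append(f"{bin_name}\t{n}")
    return "\n".join(lines)


# Made-up identifiers for illustration only.
hits = ["g1", "g2", "g3", "g4"]
gene_to_bin = {"g1": "bin_A", "g2": "bin_A", "g3": "bin_B"}
counts = count_bin_hits(hits, gene_to_bin)
print(format_genome_bin_table(counts))
```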



## summary.txt

Summary of the query

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -5,3 +5,4 @@ nav:
- 'GMGC-Finder': index.md
- 'Install': install.md
- 'Usage': usage.md
- 'Output' : output.md
