Fix the bug in copying output.md

BigDataBiology · Jun 12, 2020 · 6b441c6 · 6b441c6 · luispedro · Jun 12, 2020
1 parent a339c99
commit 6b441c6

Showing 2 changed files with 55 additions and 1 deletion.
diff --git a/gmgc_finder/output.md b/gmgc_finder/output.md
@@ -0,0 +1,54 @@
+# Output
+
+Explanation of the files in the output
+
+## Prodigal output
+
+These three files are the output of Prodigal.
+
+- `prodigal_out.faa`: protein sequences
+- `prodigal_out.fna`: DNA sequences
+- `gene.coords.gbk`: gene information in GenBank format
+
+
+## Hit Table (`hit_table.tsv`)
+
+The results of the queries to the GMGC.
+
+There are five columns in the file.
+
+- `query_name`: the name/id of the input genome contig
+- `gene_id`: the unigene with the best score in GMGC
+- `align_category`: there are four different classes of alignment (see below)
+- `gene_dna`: the DNA sequence of the best hit in GMGC
+- `gene_protein`: the protein sequence of the best hit in GMGC
+
+### Alignment category
+
+- `EXACT`: at least 95% nucleotide identity with at least 95% coverage. As
+   unigenes in the GMGC represent 95% nucleotide clusterings (species-level
+   threshold), this would mean that the query gene would have clustered with
+   the GMGC unigene.
+- `SIMILAR`: at least 80% amino acid identity with at least 80% coverage.
+- `MATCH`: at least 50% amino acid identity with at least 50% coverage.
+- `NO MATCH`: no match in GMGC.
+
+
+## Genome bins (`genome_bin.tsv`)
+
+Genome bins (MAGs) found in the results (and a count of how many genes are
+contained in them).
+
+There are two columns in the file.
+
+- `genome_bin`: the name of genome bins in GMGC
+- `times_gene_hit`: the number of input genes that hit this genome bin
+
+Note that while not all GMGC unigenes are contained in a genome bin, some are
+contained in many. Thus, the total counts will not (except by coincidence)
+correspond to the number of genes queried.
+
+## Summary (`summary.txt`)
+
+Human-readable summary of the results.
+
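The alignment categories documented in `output.md` above amount to a cascade of thresholds. A minimal Python sketch of that cascade follows. The function name `align_category` and its four percentage parameters are hypothetical, chosen for illustration, and this is not taken from GMGC-Finder's code. It also assumes the categories are checked from strictest to loosest.

```python
def align_category(nt_identity, nt_coverage, aa_identity, aa_coverage):
    """Return the category label for one query/best-hit pair.

    Hypothetical helper: the thresholds come from the documentation above;
    the parameter names (percentages, 0-100) and the order of the checks
    are assumptions.
    """
    # EXACT: at least 95% nucleotide identity with at least 95% coverage.
    if nt_identity >= 95 and nt_coverage >= 95:
        return "EXACT"
    # SIMILAR: at least 80% amino acid identity with at least 80% coverage.
    if aa_identity >= 80 and aa_coverage >= 80:
        return "SIMILAR"
    # MATCH: at least 50% amino acid identity with at least 50% coverage.
    if aa_identity >= 50 and aa_coverage >= 50:
        return "MATCH"
    # Nothing passed the loosest threshold.
    return "NO MATCH"
```

Both identity and coverage must pass a threshold for a category to apply, so a hit with 99% amino acid identity but only 40% coverage would fall through to `NO MATCH` in this sketch.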
diff --git a/setup.py b/setup.py
@@ -30,7 +30,7 @@
           'tqdm',
       ],
       package_data={
-             'docs': ['*.md']},
+             'gmgc_finder': ['*.md']},
       zip_safe=False,
       entry_points={
             'console_scripts': ['gmgc-finder=gmgc_finder.main:main'],
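
The `package_data` change matters because setuptools treats each key as a package name and resolves each pattern relative to that package's directory, so a file at `gmgc_finder/output.md` is only shipped when the key is `'gmgc_finder'`. Once packaged that way, the file can be located through the package itself. The sketch below illustrates this with a throwaway package built in a temporary directory. The package name `demo_pkg` and the helper `read_packaged_doc` are hypothetical, and how GMGC-Finder itself reads or copies the file is not shown in this diff.

```python
import os
import pkgutil
import sys
import tempfile


def read_packaged_doc(package, filename):
    """Return the text of a data file stored inside an importable package."""
    data = pkgutil.get_data(package, filename)
    if data is None:
        # The package's loader does not support reading data files.
        raise FileNotFoundError(f"{filename} not readable from package {package}")
    return data.decode("utf-8")


# Build a throwaway package (hypothetical name) that ships a Markdown file,
# mimicking a layout such as gmgc_finder/output.md.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("")
with open(os.path.join(pkg_dir, "output.md"), "w") as fh:
    fh.write("# Output\n")

# Make the temporary directory importable, then read the file via the package.
sys.path.insert(0, root)
doc_text = read_packaged_doc("demo_pkg", "output.md")
print(doc_text)
```

Looking the file up through the package, rather than through a path relative to the current working directory, keeps the lookup working after installation.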