Commit
DOC Better output.md
Fix some grammar mistakes. More explicit information
luispedro committed Jun 12, 2020
1 parent cad842b commit b4106f6
Showing 1 changed file with 15 additions and 16 deletions.
31 changes: 15 additions & 16 deletions docs/output.md
@@ -1,10 +1,11 @@
-# Output
+# Output of GMGC-finder

-Explanation of the files in the output
+Explanation of the files in the output directory

 ## Prodigal output

-These three files are the output prodigal.
+These three files are the output of prodigal (if GMGC-finder was called in
+genome mode)

 - `prodigal_out.faa` protein sequence
 - `prodigal_out.fna` DNA sequence
@@ -17,8 +18,8 @@ The results of the queries to the GMGC.

 There are five columns in the file.

-- `query_name`: the name/id of the input genome contig
-- `gene_id`: the Unigene with the best score in GMGC
+- `query_name`: the name/id of the input gene
+- `gene_id`: the Unigene with the best score in the GMGC
 - `align_category: there are four different classes of alignment (see below)
 - `gene_dna`: the DNA sequence of the best hit in GMGC
 - `gene_protein`: the protein sequence of the best hit in GMGC
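
As a rough illustration of the five-column table described in this hunk, the Python sketch below tallies rows per `align_category`. It assumes the table is tab-separated with a header row carrying the column names listed above, which the diff does not confirm. The sample rows, identifiers, and category labels (`CLASS_X`, `CLASS_Y`) are invented placeholders and are not taken from GMGC-finder output.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: a tab-separated table with a header row is an assumption,
# and every value below (names, categories, sequences) is invented.
SAMPLE_HITS = (
    "query_name\tgene_id\talign_category\tgene_dna\tgene_protein\n"
    "gene_1\tunigene_A\tCLASS_X\tATGAAATAA\tMK\n"
    "gene_2\tunigene_B\tCLASS_Y\tATGCCCTAA\tMP\n"
    "gene_3\tunigene_A\tCLASS_X\tATGAAATAA\tMK\n"
)


def count_align_categories(handle):
    """Count how many rows fall into each align_category value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["align_category"] for row in reader)


category_counts = count_align_categories(io.StringIO(SAMPLE_HITS))
for category, n_rows in sorted(category_counts.items()):
    print(f"{category}\t{n_rows}")
```

Reading by column name rather than by position keeps the sketch independent of column order, at the cost of relying on the assumed header row.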
@@ -36,28 +37,26 @@ There are five columns in the file.

 ## Genome bins (`genome_bin.tsv`)

-Genome bins (MAGs) found in the results (and a count of how often many genes
-are contained in them).
+Genome bins (MAGs) found in the results (and a count of how many genes are
+contained in them).

 There are two columns in the file.

 - `genome_bin`: the name of genome bins in GMGC
 - `times_gene_hit`: the times of input genes hitting it

-Note that GMGC unigenes can while not all GMGC unigenes are contained in a
-genome bin, some are contained in many. Thus, the total counts will not (except
-by coincidence) correspond to the number of genes queried.
+Note while not all GMGC unigenes are contained in a genome bin, some are
+contained in many. Thus, the total counts will not (except by coincidence)
+correspond to the number of genes queried.

 ## Summary (`summary.txt` and `runlog.yaml`)

 The file `summary.txt` provides a human-readable summary of the results, while
-`runlog.yaml` is a summary of run (as a YaML file, it is both machine and
-human-readable).
-
+`runlog.yaml` is a summary of run metadata (as a YaML file, it is both machine
+and human-readable).

 The file `summary.txt` should be reproducible and running GMGC-finder twice on
 the same input should produce the same results. By design, though,
-`runglog.yaml` includes information on timing and, thus, is not reproducible.
-
-
+`runglog.yaml` includes information such as the time when the analysis was run
+which is not reproducible.
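
The note in this hunk about `genome_bin.tsv` (the counts need not add up to the number of genes queried) can be made concrete with a small Python sketch. It shows one plausible way such counts could arise and is not GMGC-finder's actual implementation. The unigene-to-bin mapping, the best hits, and all identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from GMGC unigenes to the genome bins containing them.
# One unigene sits in no bin, one in a single bin, one in two bins.
UNIGENE_TO_BINS = {
    "unigene_A": [],
    "unigene_B": ["bin_1"],
    "unigene_C": ["bin_1", "bin_2"],
}

# Hypothetical best hits: one unigene per queried gene.
BEST_HITS = {
    "gene_1": "unigene_A",
    "gene_2": "unigene_B",
    "gene_3": "unigene_C",
    "gene_4": "unigene_C",
}


def count_bin_hits(best_hits, unigene_to_bins):
    """Count, for each genome bin, how many queried genes hit a unigene in it."""
    times_gene_hit = Counter()
    for unigene in best_hits.values():
        for genome_bin in unigene_to_bins.get(unigene, []):
            times_gene_hit[genome_bin] += 1
    return times_gene_hit


times_gene_hit = count_bin_hits(BEST_HITS, UNIGENE_TO_BINS)
print("genome_bin\ttimes_gene_hit")
for genome_bin, hits in sorted(times_gene_hit.items()):
    print(f"{genome_bin}\t{hits}")
print("genes queried:", len(BEST_HITS))
print("sum of times_gene_hit:", sum(times_gene_hit.values()))
```

In this toy data four genes are queried but the counts sum to five: `gene_1` hits a unigene in no bin and contributes nothing, while `gene_3` and `gene_4` each hit a unigene found in two bins and are counted twice.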
