GffRead usage examples

GffRead can be used to read an annotation file in GFF format and print it in either GFF3 (the default) or GTF2 format (with the -T option), discarding any non-transcript features and optional attributes. It can also report potential issues found in the input GFF records. The command line for such a quick GFF/GTF file cleanup would be:

gffread -E annotation.gff -o ann_simple.gff

This will create a minimalist GFF3 re-formatting of the transcript records found in the input file (annotation.gff in this example). The -E option directs GffRead to "expose" (display warnings about) any potential formatting issues encountered while parsing the input file.

In order to obtain the GTF2 version of the same transcript records, the -T option should be added:

gffread annotation.gff -T -o annotation.gtf
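The two cleanup commands above differ only in their flags, so they are easy to script. The following is a minimal Python sketch, not part of GffRead itself: the helper names `build_gffread_command` and `run_gffread` are hypothetical, and only the -E, -T and -o options shown above are used.

```python
import shutil
import subprocess


def build_gffread_command(gff_in, out_path, gtf=False, expose=False):
    """Return the argument list for a gffread cleanup run.

    Hypothetical helper: it only assembles the options shown in this README
    (-E to report parsing issues, -T for GTF2 output, -o for the output file).
    """
    cmd = ["gffread"]
    if expose:
        cmd.append("-E")
    cmd.append(gff_in)
    if gtf:
        cmd.append("-T")
    cmd += ["-o", out_path]
    return cmd


def run_gffread(cmd):
    """Run a command built above, failing early if gffread is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("gffread was not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, `run_gffread(build_gffread_command("annotation.gff", "annotation.gtf", gtf=True))` would run the GTF2 conversion shown above.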

GffRead can also generate a FASTA file with the DNA sequences of all transcripts in a GFF file. This operation additionally requires a FASTA file with the genomic sequences. It can be accomplished with a command line like this:

gffread -w transcripts.fa -g genome.fa annotation.gff

The file genome.fa in this example would be a multi-FASTA file with the chromosome/contig sequences of the target genome. Every contig or chromosome name found in the first column of the input GFF file (annotation.gff in this example) must have a corresponding sequence entry in genome.fa.
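The naming requirement above can be checked before running gffread. The sketch below is an illustration only and is not part of GffRead: the function names are hypothetical, and it assumes that a FASTA record name is the first whitespace-delimited token after `>` and that GFF comment lines start with `#`.

```python
def fasta_names(fasta_lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gff_seqids(gff_lines):
    """Collect the values of the first (tab-separated) column of a GFF file."""
    seqids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the annotation records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        seqids.add(line.rstrip("\n").split("\t", 1)[0])
    return seqids


def missing_sequences(gff_lines, fasta_lines):
    """Return the sorted GFF sequence names that have no FASTA entry."""
    return sorted(gff_seqids(gff_lines) - fasta_names(fasta_lines))
```

Both arguments can be open file handles, for example `missing_sequences(open("annotation.gff"), open("genome.fa"))`; an empty list means every name in the first GFF column has a matching sequence entry.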

gffread --table @id,@chr,@start,@end,@strand,@exons,Name,gene,product \
  -o annotation.tbl annotation.gff

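This shows how the --table option produces a tab-delimited table from a GFF3 input, with one column per field in the comma-separated list. Fields prefixed with @ (such as @id or @chr) are special fields of the transcript record, while the others (Name, gene, product) are GFF attribute names.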
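A table written this way can be loaded with a few lines of Python. This is a minimal sketch under stated assumptions, not an official reader: it assumes that annotation.tbl has no header line and that its columns follow the order of the field list passed to --table. The names `COLUMNS` and `read_table` are hypothetical.

```python
import csv

# Same field list, in the same order, as in the --table command above.
COLUMNS = ["@id", "@chr", "@start", "@end", "@strand",
           "@exons", "Name", "gene", "product"]


def read_table(lines, columns=COLUMNS):
    """Parse tab-delimited rows into dictionaries keyed by column name.

    Assumptions: no header line, and one value per requested column.
    All values are kept as strings; nothing is converted or validated.
    """
    rows = []
    for values in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not values:
            continue  # skip empty lines
        rows.append(dict(zip(columns, values)))
    return rows
```

With an open file this could be used as `rows = read_table(open("annotation.tbl", newline=""))`, after which `rows[0]["@id"]` holds the first transcript ID.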

The output directory contains all the output files that should be generated by the above examples.