The gtf_extract utility extracts selected data items from a GTF file and outputs them in tab-delimited format.
Note
The program can also operate on GFF files, provided the --gff option is specified.
General usage syntax:
gtf_extract OPTIONS <gtf_file>
Options:
--version
show program's version number and exit
-h, --help
show the help message and exit
-f FEATURE_TYPE, --feature=FEATURE_TYPE
only extract data for lines where feature is FEATURE_TYPE
--fields=FIELD_LIST
comma-separated list of fields to output in tab-delimited format for each line in the GTF, e.g. chrom,start,end.
Fields can either be a GTF field name (i.e. chrom, source, feature, start, end, score, strand and frame), or the name of an attribute (e.g. gene_name, gene_id etc).
Data items are output in the order they appear in FIELD_LIST. If a field doesn't exist for a line then '.' will be output as the value.
-o OUTFILE
write output to OUTFILE (default is to write to stdout)
--gff
specify that the input file is GFF rather than GTF format
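One plausible reason the input format has to be stated explicitly is that the attribute column is written differently in the two formats: GTF uses space-separated keys and quoted values, whereas GFF3 uses key=value pairs. The following is a minimal sketch of that difference, not the utility's own code; the function name parse_attributes and its gff argument are hypothetical, and it assumes that "GFF" here means GFF3-style attributes and that values contain no embedded semicolons.

```python
def parse_attributes(text, gff=False):
    """Parse the attribute column of a GTF or GFF line into a dict.

    GTF style:  gene_id "ENSG1"; gene_name "DDX11L1";
    GFF3 style: ID=gene1;Name=DDX11L1
    (Hypothetical helper; assumes no ';' inside attribute values.)
    """
    attributes = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if not item:
            # Skip the empty piece after a trailing semicolon
            continue
        # GFF3 separates key and value with '=', GTF with a space
        separator = "=" if gff else " "
        key, _, value = item.partition(separator)
        # GTF values are usually quoted; drop the surrounding quotes
        attributes[key.strip()] = value.strip().strip('"')
    return attributes
```

With a dict like this, looking up an attribute name given in FIELD_LIST becomes a plain dictionary access, whichever format the input file uses.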
The program outputs a tab-delimited line of data for each matching line found in the input GTF file; the data items in the line are those specified by the --fields option (or else all data items, if no fields were specified).
For example, for --fields=chrom,start,end,strand, the GTF line:
chr1 HAVANA gene 11869 14412 . + . gene_id "ENSG00000223972.4" ...
will produce the output:
chr1 11869 14412 +
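The behaviour in this example can be approximated in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the program's actual implementation: the names GTF_COLUMNS and extract_fields are hypothetical, the input line is assumed to be tab-separated, and the sample line is cut down to its gene_id attribute only.

```python
# The eight fixed GTF columns, in file order (names as used by --fields)
GTF_COLUMNS = ("chrom", "source", "feature", "start",
               "end", "score", "strand", "frame")


def extract_fields(line, fields, feature=None):
    """Return the tab-delimited output for one GTF line.

    Returns None for comment lines, malformed lines, and lines whose
    feature does not match the requested feature type.
    (Hypothetical helper, not part of gtf_extract itself.)
    """
    columns = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(columns) < 9:
        return None
    record = dict(zip(GTF_COLUMNS, columns[:8]))
    if feature is not None and record["feature"] != feature:
        return None
    # Attributes look like: key "value"; key "value";
    for item in columns[8].split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            # Fixed column names take precedence over attribute names
            record.setdefault(key, value.strip().strip('"'))
    # Missing fields are reported as '.', in the order requested
    return "\t".join(record.get(name, ".") for name in fields)


line = 'chr1\tHAVANA\tgene\t11869\t14412\t.\t+\t.\tgene_id "ENSG00000223972.4";'
print(extract_fields(line, ["chrom", "start", "end", "strand"]))
```

Requesting a field that is absent from the line, such as gene_name in this shortened sample, yields '.' in that position, and passing feature="exon" skips the line entirely because its feature is gene.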
By default the output of the program is written to stdout; use the -o option to direct the output to a named file instead.