Parsing tools for GTF (gene transfer format) files
iskandr Merge pull request #15 from openvax/get-rid-of-unix-specific-memory-u…
…sage-logging

deleted util module, which provided UNIX-specific access to memory usage
Latest commit c79cab0 Oct 10, 2018

README.md

gtfparse

Parsing tools for GTF (gene transfer format) files.

Example usage

Parsing all rows of a GTF file into a Pandas DataFrame

from gtfparse import read_gtf

# returns a DataFrame with the essential GTF columns such as "feature", "seqname",
# "start", "end", plus one column for each optional key found in the attribute column
df = read_gtf("gene_annotations.gtf")

# filter DataFrame to gene entries on chrY
df_genes = df[df["feature"] == "gene"]
df_genes_chrY = df_genes[df_genes["seqname"] == "Y"]
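
To illustrate what "optional keys which appeared in the attribute column" means, here is a minimal standard-library sketch of splitting one GTF attribute field into key/value pairs. The helper name `parse_attributes` and the sample values are made up for illustration; this is not gtfparse's own implementation, and it assumes that values contain no semicolons.

```python
def parse_attributes(attribute_field):
    """Split one GTF attribute field into a dict of key -> value strings.

    Hypothetical helper for illustration only; not part of gtfparse.
    Assumes entries are separated by ';' and values contain no ';'.
    """
    attributes = {}
    for chunk in attribute_field.split(";"):
        chunk = chunk.strip()
        if not chunk:
            # skip the empty piece after the trailing semicolon
            continue
        # the key is everything before the first space, the value is the rest
        key, _, value = chunk.partition(" ")
        # values are usually double-quoted; unquoted values pass through unchanged
        attributes[key] = value.strip().strip('"')
    return attributes


# Made-up attribute field in the usual `key "value";` layout
example = 'gene_id "G1"; gene_name "ABC"; exon_number 2;'
parsed = parse_attributes(example)
print(parsed)
```

Each key found this way (`gene_id`, `gene_name`, `exon_number` in the sample) corresponds to one of the optional columns described in the comment above. Values stay as strings in this sketch.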

Getting gene FPKM values from a StringTie GTF file

from gtfparse import read_gtf

df = read_gtf(
    "stringtie-output.gtf",
    column_converters={"FPKM": float})

gene_fpkms = {
    gene_name: fpkm
    for (gene_name, fpkm, feature)
    in zip(df["gene_name"], df["FPKM"], df["feature"])
    if feature == "gene"
}
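
The same filtering logic can be tried without a GTF file. The sketch below uses plain lists with made-up values as stand-ins for the three DataFrame columns, and a list comprehension that plays the role `column_converters={"FPKM": float}` has in the example above. It only mirrors the idea and does not call gtfparse.

```python
# Made-up stand-ins for df["gene_name"], df["FPKM"] and df["feature"].
# FPKM values start out as strings, as they would in a raw attribute column.
gene_names = ["GENE_A", "GENE_A", "GENE_B"]
fpkm_strings = ["12.5", "7.25", "0.0"]
features = ["gene", "transcript", "gene"]

# Convert the FPKM strings to floats before building the mapping
fpkms = [float(value) for value in fpkm_strings]

# Keep only rows whose feature is "gene"
gene_fpkms = {
    gene_name: fpkm
    for (gene_name, fpkm, feature)
    in zip(gene_names, fpkms, features)
    if feature == "gene"
}
print(gene_fpkms)
```

Because the result is a dictionary keyed by gene name, a name that appears on more than one "gene" row keeps only the value from the last such row.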