fiszbein-lab/emats-genes

emats-genes

The code in this repository was used to identify EMATS genes in Uriostegui et al., 2023.

EMATS (exon-mediated activation of transcription starts) is a phenomenon in which efficient exon splicing stimulates transcription initiation from a proximal, upstream weak promoter (Fiszbein et al., 2019). To select genes that host this architecture, we define the following:

  • a weak promoter has a median alternative first exon percent spliced-in (PSI) value less than the dataset-wide median, with first exon usage serving as a proxy for promoter usage; and
  • a strong skipped exon has a median skipped exon PSI value greater than the dataset-wide median.
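The two thresholds above can be sketched in Python. This is a minimal illustration, not the code used in Uriostegui et al., 2023. It assumes PSI values are already quantified and supplied as a dict mapping a hypothetical exon identifier to that exon's PSI values across samples. It also assumes the "dataset-wide median" is the median of the per-exon medians for the same exon type.

```python
from statistics import median


def _exon_medians(psi_by_exon):
    # Median PSI of each exon across all samples.
    return {exon: median(values) for exon, values in psi_by_exon.items()}


def weak_promoters(first_exon_psi):
    # First exons (promoter proxies) whose median PSI is below the
    # dataset-wide median.
    # ASSUMPTION: "dataset-wide median" = median of the per-exon medians.
    medians = _exon_medians(first_exon_psi)
    cutoff = median(medians.values())
    return {exon for exon, m in medians.items() if m < cutoff}


def strong_skipped_exons(skipped_exon_psi):
    # Skipped exons whose median PSI is above the dataset-wide median
    # (same ASSUMPTION as above).
    medians = _exon_medians(skipped_exon_psi)
    cutoff = median(medians.values())
    return {exon for exon, m in medians.items() if m > cutoff}
```

For example, first exons with median PSI values of 0.2, 0.6, and 0.9 give a cutoff of 0.6, so only the first of them counts as a weak promoter.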

If a weak promoter's transcription start site lies upstream of, and within 5 kilobases of, a strong skipped exon's 3' splice site, the host gene is defined as an EMATS gene.
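The positional rule can be sketched the same way. The helper names are hypothetical, and the coordinate conventions are assumptions rather than details confirmed by this repository. Regions are assumed to use the chr:start-end format described in the Tables section. Because the 3' splice site is the acceptor at an exon's upstream boundary, on the plus strand the transcription start site is taken as the first exon's start and the 3' splice site as the skipped exon's start. On the minus strand, both are taken as the exons' end coordinates.

```python
def parse_region(region):
    # Parse "chr1:100-200" into ("chr1", 100, 200).
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def is_emats_pair(first_exon, skipped_exon, strand, max_kb=5.0):
    """Return True if the first exon's TSS lies upstream of, and within
    max_kb of, the skipped exon's 3' splice site.

    ASSUMPTIONS: strand is "+" or "-"; on "+" the TSS and 3' splice site
    are the exons' start coordinates, on "-" they are the end coordinates.
    """
    fe_chrom, fe_start, fe_end = parse_region(first_exon)
    se_chrom, se_start, se_end = parse_region(skipped_exon)
    if fe_chrom != se_chrom:
        return False
    if strand == "+":
        # Upstream on the plus strand means a smaller coordinate.
        upstream_bp = se_start - fe_start
    elif strand == "-":
        # Upstream on the minus strand means a larger coordinate.
        upstream_bp = fe_end - se_end
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return 0 < upstream_bp <= max_kb * 1000
```

With these conventions, a plus-strand first exon at chr1:1000-1200 and a skipped exon at chr1:4000-4100 are 3 kb apart with the start site upstream, so the pair qualifies.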


Tables

In Uriostegui et al., 2023, we applied the above criteria to 17,350 GTEx samples spanning 54 tissue subtypes, generating an organism-wide gene set, tables/emats-genes.tsv, as well as a gene set for each tissue, tables/tissue-specific-emats-genes.tsv. The former has the following columns:

| Column | Description |
| --- | --- |
| gene-id | The EMATS gene ID. |
| gene-name | The EMATS gene name. |
| first-exon | The first exon in generic genome-browser format, e.g., chr1:100-200. |
| skipped-exon | The skipped exon in generic genome-browser format. |
| kb-distance | The distance, in kilobases, between the first exon's transcription start site and the skipped exon's 3' splice site, computed as described above. |
| strand | The gene's strand: plus (forward) or minus (reverse). |

In the latter, column 1 is gene-id, column 2 is gene-name, and each remaining column is a tissue, where a value of 1 indicates that the gene is an EMATS gene in that column's tissue.
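Both tables can be loaded with Python's standard csv module. This sketch assumes the files are tab-separated (as the .tsv extension suggests), that each has a header row whose names match the columns listed above, and that tissue flags are written as 1. The function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_emats_genes(lines):
    """Read tables/emats-genes.tsv into a list of dicts keyed by column name.

    ASSUMPTION: tab-separated, with a header row naming the columns above.
    `lines` is any iterable of text lines, such as an open file handle.
    """
    rows = list(csv.DictReader(lines, delimiter="\t"))
    for row in rows:
        # Convert the distance column from text to a number.
        row["kb-distance"] = float(row["kb-distance"])
    return rows


def genes_by_tissue(lines):
    """Map each tissue to the set of gene IDs flagged 1 in
    tables/tissue-specific-emats-genes.tsv.

    ASSUMPTION: a header row; column 1 is gene-id, column 2 is gene-name,
    and every later column is a tissue whose cells hold 1 for EMATS genes.
    """
    reader = csv.reader(lines, delimiter="\t")
    tissue_names = next(reader)[2:]
    tissues = defaultdict(set)
    for row in reader:
        gene_id = row[0]
        for tissue, flag in zip(tissue_names, row[2:]):
            if flag.strip() == "1":
                tissues[tissue].add(gene_id)
    return dict(tissues)
```

Either function takes an open file handle, e.g., `with open("tables/tissue-specific-emats-genes.tsv", newline="") as handle: sets = genes_by_tissue(handle)`. Tissues with no flagged genes are simply absent from the returned dict.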