GRIN2

GRIN2 (Genomic Random Interval, version 2) identifies genes that are recurrently hit by genomic lesions — copy-number gains/losses, SNV/indels, fusions, and structural variants — across a cohort, and tests whether that recurrence is greater than expected by chance. It produces a sortable Top Genes table and a genome-wide Manhattan plot.

It is based on the GRIN method: Pounds S, et al. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics 2013;29(17):2088–95. doi:10.1093/bioinformatics/btt372.

Running GRIN2

Open the GRIN2 chart for a dataset (e.g. from the chart menu in the Mass app).
Apply a cohort filter to choose which samples to analyze (e.g. a diagnosis or subtype).
In the controls, check the data types to include (SNV/indel, CNV, Fusion, SV — only those available for the dataset are shown) and set their options (below).
Click Run GRIN2.

The analysis runs server-side and results are cached, so re-running with the same filter and options returns the cached result without recomputing.
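As an illustration of caching on "filter + options", here is a minimal sketch of how a cache key could be derived. It is not the GRIN2 server code: the function name `cache_key`, the option names, and the `DISPLAY_ONLY` set are all hypothetical. Display-only settings are left out of the key so that changing them reuses the cached statistics.

```python
import hashlib
import json

# Hypothetical names for display-only options; they do not affect the statistics.
DISPLAY_ONLY = {"max_genes", "q_threshold"}

def cache_key(cohort_filter: dict, options: dict) -> str:
    """Return a stable hash of the analysis inputs (filter + options)."""
    analysis_options = {k: v for k, v in options.items() if k not in DISPLAY_ONLY}
    payload = json.dumps(
        {"filter": cohort_filter, "options": analysis_options},
        sort_keys=True,          # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this scheme, two requests that differ only in the number of table rows map to the same key, while a changed CNV threshold produces a new one.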

Options

SNV/indel

Consequences — which mutation classes to include (missense, frameshift, nonsense, splice, etc.). Defaults to the protein-changing set (plus start-lost/stop-lost). Use Select All / Clear All / Default to adjust. If none are selected, all classes are included.
MAF filter (if the dataset provides it) — keep only mutations whose variant allele fraction (VAF; labeled MAF in the control) passes a threshold (e.g. VAF > 0.1). The filter term can pool allele counts across assays: for the ASH dataset, the Tumor DNA term sums WGS + WES read counts. You can also filter on a single assay.
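The allele-fraction filter can be sketched as follows. This assumes that pooling means summing alternate-allele reads and total reads across assays before dividing; the function names and the input layout are hypothetical.

```python
def pooled_vaf(assays):
    """Pooled VAF from a list of (alt_reads, total_reads) tuples, one per assay.

    Returns None when there is no coverage at all.
    """
    alt = sum(a for a, _ in assays)
    total = sum(t for _, t in assays)
    return alt / total if total > 0 else None

def passes_vaf(assays, threshold=0.1):
    """Keep a mutation only when its pooled VAF exceeds the threshold."""
    vaf = pooled_vaf(assays)
    return vaf is not None and vaf > threshold
```

Filtering on a single assay corresponds to passing a one-element list.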

CNV

CNV calls are segment log2 ratios. A segment counts as a lesion when its value crosses a threshold:

Loss Threshold — a segment is a loss when its log2 ratio ≤ this value (default −0.4).
Gain Threshold — a segment is a gain when its log2 ratio ≥ this value (default 0.4; a dataset may set its own default, e.g. 0.3).
Max Segment Length — segments longer than this (in bp) are dropped before analysis (default 2,000,000 = 2 Mb; set to 0 to disable). This prevents a single very large passenger CNV from inflating significance for every gene it spans.

Tightening the thresholds (e.g. from −0.1/0.1 to −0.4/0.3 for loss/gain) removes low-amplitude calls, while the Max Segment Length cap filters out broad arm- or chromosome-scale events.
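The three CNV options can be summarized in a short sketch. It assumes segments are given as `(chrom, start, end, log2_ratio)` tuples with half-open coordinates, so that length is `end - start`; the function name and tuple layout are illustrative rather than the actual implementation.

```python
def classify_cnv(segments, loss_threshold=-0.4, gain_threshold=0.4,
                 max_segment_length=2_000_000):
    """Turn CNV segments into gain/loss lesions using the documented defaults."""
    lesions = []
    for chrom, start, end, log2_ratio in segments:
        # A cap of 0 disables the length filter.
        if max_segment_length > 0 and (end - start) > max_segment_length:
            continue
        if log2_ratio <= loss_threshold:
            lesions.append((chrom, start, end, "loss"))
        elif log2_ratio >= gain_threshold:
            lesions.append((chrom, start, end, "gain"))
        # Segments between the two thresholds are not lesions.
    return lesions
```

A 3 Mb amplification is dropped under the default cap but kept when `max_segment_length=0`.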

Fusion / SV

Fusions and SVs are included as lesions when the corresponding data type is checked; they have no per-type cutoffs.

Exclude artifact genes (recommended, on by default)

Removes genes that sit in known artifact regions before the statistics run, so the table is not dominated by non-driver loci that recur for technical or germline reasons (olfactory-receptor clusters, HLA, FAM90, POTE, GOLGA8, APOBEC3, KANSL1, etc.).

Exclude artifact genes (checkbox, default on) — toggles the mask.
Min gene overlap (default 0.5) — a gene is excluded only when at least this fraction of its span lies inside a masked region. The 0.5 default removes genes that sit inside artifact regions while sparing real drivers that merely abut one (e.g. KRAS overlaps a segmental duplication by ~10% and is kept).

The mask is the union of four hg38 region sets:

Layer	Source	What it removes
Blacklist	ENCODE / Kundaje GRCh38 unified blacklist	anomalous-signal / low-mappability regions
Segmental duplications	UCSC `genomicSuperDups`	duplicated, recombination-prone, paralogous regions
Assembly gaps	UCSC `gap` + `centromeres`	centromeres, telomeres, gaps
Common germline CNVs	DGV Gold Standard, frequency ≥ 1%	loci that are copy-number-polymorphic in the normal population

The germline-CNV layer is essential: loci like OR clusters, HLA class II, and KANSL1 are well-mapped but copy-number-variable in healthy people, so mappability/blacklist alone does not catch them.

When the mask runs, the results panel shows a Region Mask (excluded artifact genes) section reporting how many genes were excluded, a few examples, and the genome fraction masked (~14% with all four layers).
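The exclusion rule (union of region sets, then a minimum overlap fraction per gene) can be sketched for a single chromosome as below. Intervals are assumed to be half-open `(start, end)` pairs, and all function names are hypothetical.

```python
def merge_intervals(intervals):
    """Union of half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def masked_fraction(gene_start, gene_end, mask):
    """Fraction of the gene span covered by already-merged mask intervals."""
    covered = sum(max(0, min(gene_end, e) - max(gene_start, s)) for s, e in mask)
    return covered / (gene_end - gene_start)

def is_excluded(gene_start, gene_end, mask, min_overlap=0.5):
    """A gene is excluded when at least min_overlap of its span is masked."""
    return masked_fraction(gene_start, gene_end, mask) >= min_overlap
```

Merging the layers first matters: without it, a stretch covered by two layers would be counted twice and the fraction could exceed 1.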

References: ENCODE blacklist — Amemiya et al., Sci Rep 2019, doi:10.1038/s41598-019-45839-z; segmental duplications — Sharp et al., AJHG 2005, doi:10.1086/431652; DGV — MacDonald et al., NAR 2014, doi:10.1093/nar/gkt958; region-exclusion practice — Ogata et al. (excluderanges), Bioinformatics 2023, doi:10.1093/bioinformatics/btad198.

Display

Max genes to show — number of rows in the Top Genes table (default 500).
Significance (q-value) threshold — q-values below this (default 0.05) are flagged in the table/tooltips and shown as interactive points in the Manhattan plot.

Reading the results

Top Genes table

For each gene and each included lesion type the table reports:

P-value (Gain/Loss/Mutation/…) — probability of seeing at least the observed number of affected subjects under the random-interval null.
Q-value (…) — the p-value adjusted for multiple testing (FDR). Use the q-value to judge significance (e.g. q < 0.05).
Subject Count (…) — number of subjects with that lesion type overlapping the gene.

When more than one lesion type is analyzed, additional N Lesion Types columns appear (1/2/3 …). These are constellation tests that combine evidence across lesion types — e.g. the "2 Lesion Types" q-value flags genes significant when gain+loss (or mutation+CNV) are considered jointly. A gene driven by a single type will be most significant in that type's column; a gene hit by several types will rise in the multi-type columns.
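One way to read the "N Lesion Types" columns is as a test on the N-th smallest per-type p-value. The sketch below assumes the per-type p-values are independent and uniform under the null, in which case the k-th smallest of n follows a Beta(k, n−k+1) distribution whose CDF equals a binomial tail sum. Whether GRIN2 computes its constellation values exactly this way is an assumption; the function name is hypothetical.

```python
from math import comb

def order_statistic_pvalue(pvalues, k):
    """P-value for the k-th smallest of n independent uniform p-values."""
    n = len(pvalues)
    x = sorted(pvalues)[k - 1]
    # P(at least k of n uniforms are <= x), i.e. the Beta(k, n-k+1) CDF at x.
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))
```

For example, with per-type p-values 0.01 and 0.02, the k = 2 value is 0.02² = 0.0004, smaller than either input, which is how a gene hit by several types can rise in the multi-type columns.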

Sort any column by clicking its header. The table is sorted by overall significance by default.

Manhattan plot

Each point is a gene at its genomic position; the y-axis is −log₁₀(q-value). Colors distinguish lesion types (mutation, gain, loss, fusion, SV). Points that pass the significance threshold (q-value below the cutoff, so they plot above the corresponding −log₁₀ line) are interactive: hover for gene, type, subject count, and q-value. The y-axis auto-scales and is capped for very significant peaks.
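The y-axis transform can be sketched as below. The cap of 40 and the floor used to avoid taking the logarithm of zero are hypothetical values chosen for illustration, not the plot's actual settings.

```python
import math

def manhattan_y(q_value, cap=40.0, floor_q=1e-300):
    """Return -log10(q), capped so extreme peaks do not flatten the plot."""
    q = max(q_value, floor_q)  # guard against log10(0)
    return min(-math.log10(q), cap)

def is_interactive(q_value, q_threshold=0.05):
    """Points with q below the threshold sit above the threshold line."""
    return q_value < q_threshold
```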

How the statistics work

GRIN models each lesion as an interval placed at a random genomic location, and asks how often a gene would be overlapped by chance. For each gene it computes the probability that k or more subjects are hit by convolving the per-subject Bernoulli hit distributions (a Poisson-binomial tail), giving a p-value per lesion type. P-values are converted to q-values by Benjamini–Hochberg FDR correction. Multi-type constellation p-values are derived from the per-type p-value order statistics, so a gene recurrently hit by several lesion types can be significant even if no single type is.
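The two per-type steps, the tail probability and the FDR adjustment, can be sketched in plain Python. The per-subject hit probabilities are taken as given here; how GRIN2 derives them from lesion and gene sizes is not shown, and the function names are hypothetical.

```python
def prob_at_least_k(hit_probs, k):
    """P(at least k subjects are hit) for independent per-subject probabilities."""
    # dist[j] = probability that exactly j of the subjects seen so far are hit.
    dist = [1.0]
    for p in hit_probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1 - p)   # this subject is not hit
            new[j + 1] += mass * p     # this subject is hit
        dist = new
    return sum(dist[k:])

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):       # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

The convolution loop costs O(n²) for n subjects per gene, which is usually acceptable at cohort sizes; the running minimum in the second function keeps the q-values monotone in the p-values.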

Because the null assumes lesions land uniformly at random, artifact-dense regions (low-mappability, segmental-duplication, germline-CNV loci) accumulate spurious recurrence — which is why the Exclude artifact genes mask is on by default.

Tips & caveats

Always read q-values, not p-values, for significance.
Keep the artifact mask on unless you have a specific reason to inspect raw results; with it off, expect OR/HLA/segdup/germline-CNV genes near the top.
CNV thresholds and Max Segment Length are the main knobs for controlling how many CNV lesions enter the analysis.
Genuine drivers with at least the Min gene overlap fraction (50% by default) of their span inside an artifact region are excluded by the mask; check the excluded-gene summary if a gene you expect is missing.
Results are cached on the analysis inputs (filter + options); changing a display-only setting reuses the cached statistics.
