# Clinical Variant Explorer 

In this notebook, we will create a custom visualization to view human genetic variants from the [ClinVar database](https://www.ncbi.nlm.nih.gov/clinvar/) genome-wide. 

In [None]:
#! pip install gosling[all]==0.0.8
import gosling as gos

## Load gene annotation track

In `gos`, a _Track_ is a composable element which may be combined with other tracks to create more sophisticated visualizations. 

The `./tracks.py` file relative to this notebook contains a predefined track which displays gene annotation information for the `hg38` assembly. It is fully specified in `gos`, and can be imported just like any Python module.

In [None]:
import sys
import os

if 'google.colab' in sys.modules:
    # make sure `./tracks.py` from GitHub repo is available if in Colab
    os.system('curl -s https://raw.githubusercontent.com/gosling-lang/gos-example/main/notebooks/tracks.py -o tracks.py')

In [None]:
import tracks # import the ./tracks.py module

# print(tracks.gene_annotation_track()) # print the literal definition
tracks.gene_annotation_track()

In [None]:
# Set initial domain to chromosome 17
tracks.gene_annotation_track().properties(
    xDomain=gos.GenomicDomain(chromosome="chr17")
)

We will combine this track with our custom variant visualization to provide context when navigating the viewer.

## Create the variant track

The data sources for this section are derived from the VCF at:

- https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz

The full preprocessing workflow can be run via `snakemake` at the root of this repository, but we provide preprocessed versions of the datasets for your convenience. Comments in the notebook highlight where to replace URLs if you are interested in running these steps on your own. The two derived ClinVar datasets are:

- `clinvar.bed.beddb` - the individual variants and classifications in multires bed-like format

- `density.multires.mv5` - a precomputed, multiresolution aggregation of variant classifications genome-wide
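
To give a rough idea of what a "multiresolution aggregation" means, here is a minimal sketch in plain Python. It is not the repository's Snakemake workflow: the function names `bin_counts` and `coarsen` and the toy records are hypothetical, and the real file format stores far more than this. The idea illustrated is only that per-category counts are computed at a fine bin size and then summed into progressively coarser bins.

```python
from collections import Counter


def bin_counts(variants, bin_size):
    """Count variants per (bin index, significance) pair at one resolution."""
    counts = Counter()
    for position, significance in variants:
        counts[(position // bin_size, significance)] += 1
    return counts


def coarsen(counts, factor=2):
    """Sum `factor` adjacent bins into one to get the next, coarser resolution."""
    coarse = Counter()
    for (bin_index, significance), n in counts.items():
        coarse[(bin_index // factor, significance)] += n
    return coarse


# Hypothetical (position, significance) records, NOT real ClinVar data.
toy_variants = [
    (120, "Benign"),
    (180, "Pathogenic"),
    (310, "Benign"),
    (330, "Benign"),
]

fine = bin_counts(toy_variants, bin_size=100)  # 100 bp bins
coarse = coarsen(fine)                         # 200 bp bins
print(dict(fine))
print(dict(coarse))
```

Because each coarser level is derived by summing the level below it, the total count per category is the same at every resolution.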

Our dataset includes several categories of clinical relevance, and we map each category to a color for our visualization.

In [None]:
categories = [
    "Benign", 
    "Benign/Likely_benign", 
    "Likely_benign", 
    "Uncertain_significance",
    "Likely_pathogenic", 
    "Pathogenic/Likely_pathogenic",
    "Pathogenic",
]

colors = ["#5A9F8C", "#5A9F8C", "#029F73", "gray", "#CB96B3", "#CB71A3", "#CB3B8C"]
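
The two lists above are matched by position, so a quick sanity check is to zip them into a lookup table. This is only an illustrative sketch: the name `color_for` is hypothetical and is not used elsewhere in the notebook, which passes the lists directly as `domain` and `range`.

```python
categories = [
    "Benign",
    "Benign/Likely_benign",
    "Likely_benign",
    "Uncertain_significance",
    "Likely_pathogenic",
    "Pathogenic/Likely_pathogenic",
    "Pathogenic",
]

colors = ["#5A9F8C", "#5A9F8C", "#029F73", "gray", "#CB96B3", "#CB71A3", "#CB3B8C"]

# The pairing is positional, so the lists must have the same length.
assert len(categories) == len(colors)

# Hypothetical lookup table, for inspection only.
color_for = dict(zip(categories, colors))
print(color_for["Pathogenic"])
```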

For the visualization, we will use an advanced feature of Gosling called [**semantic zooming**](http://gosling-lang.org/docs/semantic-zoom/) which allows users to dynamically switch between visual representations upon zooming in and out.


### 1.) Lollipop representation of individual variants

First we encode the individual variants using a lollipop representation, which layers `bar` and `point` marks.

In [None]:
# define our data source 
variants = gos.beddb(
    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=clinvar-beddb",
    # Alternatively, the BEDDB file can be created locally via Snakemake. See README.md for details.
    # url="../data/agg/clinvar.bed.beddb", 
    genomicFields=[{"index": 1, "name": "start"}, {"index": 2, "name": "end"}],
    valueFields=[{"index": 7, "name": "significance", "type": "nominal"}],
)

# some constants
lollipop_height = 200
dy = lollipop_height / len(categories) / 2

strips = gos.Track(variants).mark_bar().encode(
    x="start:G",
    y=gos.Y(
        "significance:N",
        domain=categories,
        range=[lollipop_height + dy, dy],
        baseline="Uncertain_significance",
    ),
    ye=gos.value(lollipop_height/2),
    size=gos.value(1),
    color=gos.value("lightgray"),
    stroke=gos.value("lightgray"),
    strokeWidth=gos.value(1),
    opacity=gos.value(0.3),
).transform_filter('significance', oneOf=categories).properties(height=lollipop_height)

# just the "handles" of the lollipop
strips.view()

In [None]:
# just the "tops" of the lollipop
points = gos.Track(variants).mark_point().encode(
    x="start:G",
    color=gos.Color("significance:N", domain=categories, range=colors),
    row=gos.Row("significance:N", domain=categories),
    size=gos.value(7),
    opacity=gos.value(0.8),
).properties(height=lollipop_height)

points.view()

In [None]:
# combined
gos.overlay(strips, points)
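
The `dy` constant used for the strips is half the height of one category row. The sketch below works through that arithmetic in plain Python, under the assumption that the rows are laid out top to bottom in the order of `categories`; the names `band` and `row_centers` are hypothetical and only serve the illustration. It suggests why the handles end at `lollipop_height / 2`: with seven categories, the middle row (`Uncertain_significance`) is centered at that height.

```python
categories = [
    "Benign",
    "Benign/Likely_benign",
    "Likely_benign",
    "Uncertain_significance",
    "Likely_pathogenic",
    "Pathogenic/Likely_pathogenic",
    "Pathogenic",
]

lollipop_height = 200
band = lollipop_height / len(categories)  # height of one category row
dy = band / 2                             # same value as lollipop_height / len(categories) / 2

# Assumed layout: row i spans [i * band, (i + 1) * band], measured from the top.
row_centers = {name: i * band + dy for i, name in enumerate(categories)}

for name, center in row_centers.items():
    print(f"{name:30s} {center:6.1f}")
```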

### 2.) Density representation of variant classifications

Next we use a `bar` encoding to display the multiresolution aggregate density of each variant classification.

In [None]:
density = gos.multivec(
    url="https://server.gosling-lang.org/api/v1/tileset_info/?d=clinvar-multivec",
    # Alternatively, the multivec file can be created locally via Snakemake. See README.md for details.
    # url="../data/agg/density.multires.mv5",
    row="significance",
    column="position",
    value="count",
    categories=categories,
    binSize=4,
)

bars = gos.Track(density).mark_bar().encode(
    x="start:G",
    xe="end:G",
    y=gos.Y("count:Q", axis="none"),
    color=gos.Color("significance:N", domain=categories, range=colors, legend=True)
)

bars.view().properties(xDomain=gos.GenomicDomain(chromosome="chr17"))

## Combined semantic zoom track

Now that we have defined our visualizations in isolation, we can combine them into a single view that switches the visual encoding when zooming. We set the semantic zoom properties with the `visibility_lt` and `visibility_gt` options.

In [None]:
lolipop = gos.overlay(
    strips.visibility_lt(
        measure="zoomLevel",
        target="mark",
        threshold=100000,
        transitionPadding=100000,
    ),
    points.visibility_lt(
        measure="zoomLevel",
        target="mark",
        threshold=1000000,
        transitionPadding=1000000,
    ),
    bars.visibility_gt(
        measure="zoomLevel",
        target="mark",
        threshold=500000,
        transitionPadding=500000,
    )
)
lolipop.properties(
    xDomain=gos.GenomicDomain(chromosome="17"),
)
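
To reason about which layers appear at a given zoom level, the thresholds above can be modeled in plain Python. This is a simplified sketch and not how Gosling evaluates visibility internally: it assumes that `zoomLevel` is compared against the number of base pairs in view, that the comparisons are strict, and it ignores the fade controlled by `transitionPadding`. The names `LAYERS` and `visible_layers` are hypothetical.

```python
# Thresholds copied from the combined track definition.
# "lt" = shown below the threshold, "gt" = shown above it (assumed strict).
LAYERS = {
    "strips": ("lt", 100_000),
    "points": ("lt", 1_000_000),
    "bars": ("gt", 500_000),
}


def visible_layers(width_bp):
    """Return the layers shown when `width_bp` base pairs are in view."""
    shown = []
    for name, (op, threshold) in LAYERS.items():
        if op == "lt" and width_bp < threshold:
            shown.append(name)
        elif op == "gt" and width_bp > threshold:
            shown.append(name)
    return shown


for width in (50_000, 750_000, 5_000_000):
    print(width, visible_layers(width))
```

Under these assumptions, the points and the density bars overlap between 500 kb and 1 Mb, so the two representations coexist over that range rather than switching at a single zoom level.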

# Final visualization

Finally, we can stack the gene annotation track with our custom lollipop visualization to build the final exploratory viewer.

In [None]:
view = gos.stack(
    tracks.gene_annotation_track().properties(id="view1", height=95, width=725),
    lolipop,
).properties(
    id="view1",
    xDomain=gos.GenomicDomain(chromosome="13", interval=[31500000, 33150000]),
)

view

# Controlling the viewer from Python with `ipywidgets`

A Gosling visualization specifies only the initial view location, and it is rendered automatically in the notebook. This default behavior is useful for experimenting with visual encodings for a dataset, but it offers limited control over the resulting viewer from Python.

In `gos`, rendering a visualization is completely decoupled from the core Python API, allowing alternative renderers to be configured for other use cases. We created a Jupyter Widget ([`gosling-widget`](https://github.com/gosling-lang/gosling-widget)) which allows the viewer itself to be controlled from Python.

An instance of `GoslingWidget` can be created for any Gos visualization by calling the `.widget()` method. This returns a "live" viewer that can be interacted with from Python.

In [None]:
widget = view.widget()
widget

We can now call the `widget.zoom_to()` API to navigate the viewer from Python!

In [None]:
widget.zoom_to(view.id, "chr17") # zoom our view to a particular chromosome

In [None]:
import ipywidgets 

# A dropdown to navigate the viewer to particular genomic regions
dropdown = ipywidgets.Dropdown(
    options=[
        ("TP53", "chr17:7668421-7687490"),
        ("TNF", "chr6:31575565-31578336"),
    ],
    description='Gene:',
)

dropdown.observe(lambda c: widget.zoom_to(view.id, c.new) if c.type == 'change' and c.name == 'value' else None)

ipywidgets.VBox([dropdown, widget])
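
The dropdown values above are region strings of the form `chromosome:start-end`. If the coordinates are needed in Python, for example to check how wide a region is before zooming to it, a small parser along these lines could be used. `parse_region` is a hypothetical helper written for this illustration; it is not part of `gos` or `gosling-widget`.

```python
def parse_region(region):
    """Split 'chr17:7668421-7687490' into ('chr17', 7668421, 7687490).

    A bare chromosome name such as 'chr17' yields ('chr17', None, None).
    """
    chromosome, sep, interval = region.partition(":")
    if not sep:
        return chromosome, None, None
    start, _, end = interval.partition("-")
    # Tolerate thousands separators such as '7,668,421'.
    return chromosome, int(start.replace(",", "")), int(end.replace(",", ""))


chromosome, start, end = parse_region("chr17:7668421-7687490")
print(chromosome, end - start)  # chromosome name and region width in base pairs
```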