# **`gos`** by example

This notebook demonstrates the core features of `gos`:

- Author declarative genomics visualizations that adhere to the [Gosling](http://gosling-lang.org/) JSON specification
- Render visualizations directly in the notebook
- Combine local, remote, and in-memory genomics data sources in visualizations
- Control a Gosling visualization from Python

Start by importing `gosling`. By convention, the package is imported as `gos` and the API is accessed through this namespace.

In [4]:
import gosling as gos

## Creating `gos.Track`s and `gos.View`s

**`gos`** exposes two fundamental building blocks for genomics visualizations provided by the Gosling grammar:

- `gos.Track`
- `gos.View`

A _Track_ is the core component of a genomics visualization. It defines explicit **transformations** and **mappings** of genomics data to **visual properties**. A _Track_ may be composed with other _Tracks_ or **grouped** into a _View_ whose tracks share the same linked genomic domain.

Every _Track_ is bound to a data source. In `gos`, we define a CSV dataset and bind it to a _Track_. We will start by loading a CSV file containing UCSC hg38 cytoband information.

In [9]:
csv_url = "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv"

data = gos.csv(
    url=csv_url,
    chromosomeField="Chromosome",
    genomicFields=["chromStart", "chromEnd"],
)

gos.Track(data)

Track({
  data: {'type': 'csv', 'url': 'https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv', 'chromosomeField': 'Chromosome', 'genomicFields': ['chromStart', 'chromEnd']},
  height: 180,
  mark: 'bar',
  width: 800
})
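The `data` value printed in the repr above is an ordinary JSON-compatible mapping. As a rough sketch that uses only the standard library (not `gos` itself), the same data definition can be written as a plain dictionary and serialized. The name `data_spec` is only illustrative.

```python
import json

csv_url = "https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv"

# Plain-dict equivalent of the `data` field shown in the Track repr above.
# `data_spec` is an illustrative name, not part of the gos API.
data_spec = {
    "type": "csv",
    "url": csv_url,
    "chromosomeField": "Chromosome",
    "genomicFields": ["chromStart", "chromEnd"],
}

# Serialize to JSON text, the form a Gosling specification is written in.
data_json = json.dumps(data_spec, indent=2)
print(data_json)
```

Reading it this way makes clear that `gos.csv(...)` is describing where the data lives and which columns hold genomic coordinates, rather than loading the file into Python.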

The _Track_ above is bound to the genomics data, but we haven't yet declared how to map and transform these data to visual properties. For this, we use the `gos.Track.mark_*()` and `gos.Track.encode()` methods: the first selects the visual mark, and the second declares how fields in the data map to that mark's visual channels.

In [22]:
track = gos.Track(data).mark_rect().encode(
    # defines start and end of rectangle mark
    x=gos.X("chromStart:G", axis="top"),
    xe=gos.Xe("chromEnd:G"),
    # defines how to map values in "Stain" column to colors
    color=gos.Color(
        "Stain:N", 
        domain=["gneg", "gpos25", "gpos50", "gpos75", "gpos100", "gvar"],
        range=["white", "#D9D9D9", "#979797", "#636363", "black", "#A0A0F2"]
    ),
    # customize the style of the visual marks. 
    size=gos.value(20),
    stroke=gos.value("gray"),
    strokeWidth=gos.value(0.5)
)
track

Track({
  color: Color({
    domain: ['gneg', 'gpos25', 'gpos50', 'gpos75', 'gpos100', 'gvar'],
    range: ['white', '#D9D9D9', '#979797', '#636363', 'black', '#A0A0F2'],
    shorthand: 'Stain:N'
  }),
  data: {'type': 'csv', 'url': 'https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.csv', 'chromosomeField': 'Chromosome', 'genomicFields': ['chromStart', 'chromEnd']},
  height: 180,
  mark: 'rect',
  size: SizeValue({
    value: 20
  }),
  stroke: StrokeValue({
    value: 'gray'
  }),
  strokeWidth: StrokeWidthValue({
    value: 0.5
  }),
  width: 800,
  x: X({
    axis: 'top',
    shorthand: 'chromStart:G'
  }),
  xe: Xe({
    shorthand: 'chromEnd:G'
  })
})
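To make the encodings above easier to read, here is a small standard-library sketch of how a shorthand string such as `"chromStart:G"` can be understood (a field name plus a one-letter type code) and how the color `domain` and `range` pair up position by position. The helper `expand_shorthand` and the `TYPE_CODES` table are hypothetical illustrations, not how `gos` implements shorthand parsing internally.

```python
# Hypothetical lookup of the one-letter type codes used in this notebook.
TYPE_CODES = {"G": "genomic", "N": "nominal", "Q": "quantitative"}

def expand_shorthand(shorthand):
    """Split 'field:T' into an explicit field/type mapping (illustrative only)."""
    field, sep, code = shorthand.partition(":")
    if not sep:
        # No type code given; return the field name alone.
        return {"field": field}
    return {"field": field, "type": TYPE_CODES[code]}

# Domain and range copied from the gos.Color(...) call above.
domain = ["gneg", "gpos25", "gpos50", "gpos75", "gpos100", "gvar"]
colors = ["white", "#D9D9D9", "#979797", "#636363", "black", "#A0A0F2"]

# Each stain category is paired with the color at the same index.
stain_colors = dict(zip(domain, colors))

print(expand_shorthand("chromStart:G"))
print(stain_colors)
```

The pairing shows why the two lists must have the same length and order: the darker the Giemsa stain category, the darker the gray assigned to it.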

Our `gos.Track` is now fully specified. However, the Gosling grammar defines the root of every visualization as a _View_, which may contain one or more _Tracks_. To complete a Gosling specification for the track in isolation, we use the `gos.Track.view()` method to wrap the track in a `gos.View`.

In [21]:
track.view()

Additional parameters for the `gos.View` can be passed in as well for convenience. We can easily set a `title` and `xDomain` for our _View_, initializing the visualization to display "chr1". 

> Notice how we reuse the `track` instance to create new, modified views. This is a common pattern in **`gos`**.

In [27]:
track.view(
    title="Gos is awesome!",
    xDomain=gos.GenomicDomain(chromosome="chr1"),
)
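The reuse pattern noted above works because creating a view does not modify the track it is built from. The following is a minimal sketch of that idea in plain Python; `SketchTrack` is a hypothetical stand-in written for illustration and is not the `gos` implementation, and the dictionary layout only approximates a single-view Gosling specification.

```python
import copy

class SketchTrack:
    """Hypothetical stand-in for a track: holds a spec and never mutates it."""

    def __init__(self, spec):
        self.spec = dict(spec)

    def view(self, **view_props):
        # Return a new view-like dict; the track's own spec is copied, not changed.
        return {**view_props, "tracks": [copy.deepcopy(self.spec)]}

track_sketch = SketchTrack({"mark": "rect", "width": 800, "height": 180})

# Two different views derived from the same track instance.
plain_view = track_sketch.view()
titled_view = track_sketch.view(
    title="Gos is awesome!",
    xDomain={"chromosome": "chr1"},
)

print(plain_view)
print(titled_view)
```

Because each call returns a fresh object, one track definition can back several views that differ only in title or genomic domain.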

In [1]:
import gosling as gos
import tracks

lollipop_height = 200

clin_var_beddb = gos.beddb(
    "../data/agg/clinvar.bed.beddb",
    genomicFields=[{"index": 1, "name": "start"}, {"index": 2, "name": "end"}],
    valueFields=[{"index": 7, "name": "significance", "type": "nominal"}],
)

categories = [
    "Benign", "Benign/Likely_benign", "Likely_benign", 
    "Uncertain_significance", "Likely_pathogenic", 
    "Pathogenic/Likely_pathogenic", "Pathogenic",
]

colors = ["#5A9F8C", "#5A9F8C", "#029F73", "gray", "#CB96B3", "#CB71A3", "#CB3B8C"]

clin_var_multivec = gos.multivec(
    "../data/agg/density.multires.mv5",
    row="significance",
    column="position",
    value="count",
    categories=categories,
    binSize=4,
)

dy = lollipop_height / len(categories) / 2 # workaround to offset
strips = gos.Track(clin_var_beddb).mark_bar().encode(
    x="start:G",
    y=gos.Y("significance:N", domain=categories, range=[lollipop_height + dy, dy], baseline="Uncertain_significance", legend=True),
    ye=gos.value(lollipop_height/2),
    size=gos.value(1),
    color=gos.value("lightgray"),
    stroke=gos.value("lightgray"),
    strokeWidth=gos.value(1),
    opacity=gos.value(0.3),
).visibility_lt(
    measure="zoomLevel",
    target="mark",
    threshold=100000,
    transitionPadding=100000,
)

points = gos.Track(clin_var_beddb).mark_point().encode(
    x="start:G",
    color=gos.Color("significance:N", domain=categories, range=colors),
    row=gos.Row("significance:N", domain=categories),
    size=gos.value(7),
    opacity=gos.value(0.8),
).visibility_lt(
    measure="zoomLevel",
    target="mark",
    threshold=1000000,
    transitionPadding=1000000,
)

bars = gos.Track(clin_var_multivec).mark_bar().encode(
    x="start:G",
    xe="end:G",
    y=gos.Y("count:Q", axis="none"),
    color=gos.Color("significance:N", domain=categories, range=colors, legend=True)
).visibility_gt(
    measure="zoomLevel",
    target="mark",
    threshold=500000,
    transitionPadding=500000,
)

lollipop = gos.overlay(strips, points, bars).properties(
    width=725,
    height=lollipop_height,
    xDomain=gos.GenomicDomain(chromosome="13", interval=[31500000, 33150000]),
)

# TODO: Need to specify top-level 
view = gos.stack(
    tracks.gene_annotation.properties(id="view1", height=95, width=725),
    lollipop.properties(height=lollipop_height, width=150),
).properties(
    id="view1",
    xDomain=gos.GenomicDomain(chromosome="13", interval=[31500000, 33150000]),
)

view
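The three overlaid tracks in the cell above switch on and off depending on zoom level. As a simplified sketch of that logic, the function below reports which layers would be shown at a given zoom level, using the thresholds from the `visibility_lt` and `visibility_gt` calls. It assumes the zoom level is compared directly against the threshold as a number of base pairs, and it ignores the gradual fade controlled by `transitionPadding`; `VISIBILITY_RULES` and `visible_layers` are hypothetical names for illustration.

```python
# (operation, threshold) pairs copied from the visibility_* calls above.
VISIBILITY_RULES = {
    "strips": ("lt", 100_000),
    "points": ("lt", 1_000_000),
    "bars": ("gt", 500_000),
}

def visible_layers(zoom_level):
    """Return the layer names whose visibility rule passes (fade-out ignored)."""
    shown = []
    for name, (operation, threshold) in VISIBILITY_RULES.items():
        if operation == "lt" and zoom_level < threshold:
            shown.append(name)
        elif operation == "gt" and zoom_level > threshold:
            shown.append(name)
    return shown

for zoom in (50_000, 750_000, 2_000_000):
    print(zoom, visible_layers(zoom))
```

Under these assumptions, individual variants (strips and points) appear when zoomed in, the aggregated density bars take over when zoomed out, and points and bars overlap in the range between 500,000 and 1,000,000.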

In [2]:
w = view.widget()
w

GoslingWidget()