# Gosling Demo

In this notebook, I walk through an example Gosling demo. It includes:
1. A basic gosling visualization of gene expression data
2. Several tutorials from the 2022 ISMB demo
3. A recreation of [this](https://gosling.js.org/?example=LINKING) visualization using the gos package

In [1]:
# imports
import gosling as gos

### 1. Gene expression data (warmup)
To start, I make a basic visualization of gene expression data using the BigWig ChIP-seq dataset from the gosling documentation. The goal is to try out different configurations and see how they affect the visualization.

Dataset: [BigWig ChIP-seq for H2AZ](https://s3.amazonaws.com/gosling-lang.org/data/4DNFIMPI5A9N.bw), from gosling plain datasets

In [3]:
# create a new track
vis = gos.Track(
    data=gos.BigWigData(
        'bigwig',
        url='https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/a57e91de-47f9-4157-9462-50fc78f357e3/4DNFIMPI5A9N.bw',
        # `value` names the quantitative field, which is then used for y below
        # (`column` would similarly name the genomic position field)
        #column='position'
        value='line'
    ),
).mark_area(
    # mark_area: sets the output to an area graph
    # other option that may apply here --> mark_line
).encode(
    # can use this format for x, or set x to the column setting from the data declaration
    x=gos.X('position:G', domain=gos.DomainChr('chr2')),
    # DomainChr: allows you to zoom on a specific chromosome
    y='line:Q'
).view()

# save the spec as json (a .html path should also work, to view in a browser)
display(vis)
vis.save('visuals/single-track/basic-gene-expr.json')
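
Every `save` call in this notebook writes into a `visuals/` subfolder, and the write will likely fail if that folder does not exist yet. A minimal sketch, assuming the notebook is run from a fresh working directory, that creates the output folders up front with the standard library (the `ensure_dirs` helper is hypothetical and not part of gos):

```python
from pathlib import Path


def ensure_dirs(*folders):
    """Create each output folder (and its parents) if it is missing."""
    created = []
    for folder in folders:
        path = Path(folder)
        # exist_ok=True makes this safe to re-run in the notebook
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# the two output folders used by the save calls in this notebook
out_dirs = ensure_dirs('visuals/single-track', 'visuals/recreation')
```

Running this once before the first `save` call should be enough for the rest of the notebook.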

### 2. Tutorials
In this section, I'm going to learn more about writing in-depth gosling specs via the ISMB tutorials. 

#### Single Track
Data: [BED file](https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.bed) with cytoband information

In [4]:
# import data into csv wrapper
data=gos.csv(
    url='https://raw.githubusercontent.com/sehilyi/gemini-datasets/master/data/UCSC.HG38.Human.CytoBandIdeogram.bed',
    separator="\t",
    # identify headers and genomic fields
    headerNames=['chrom', 'chromStart', 'chromEnd', 'name', 'stain'],
    chromosomeField="chrom",
    genomicFields=["chromStart", "chromEnd"]
)

# add data to track and view
view = gos.Track(data).mark_point().encode(
    # apply visual encoding
    x=gos.X('chromStart', type='genomic'),
    # or: x=gos.X('chromStart:G')
    # including the Y code below will add a y axis with same data
    #y=gos.Y('chromStart', type='genomic',axis='left'),
    color=gos.value('lightblue')
).view()

# save as json (an .html path can be used instead for easy viewing)
view.save('visuals/single-track/tutorial-intro.json')
display(view)
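
The `headerNames` list suggests that the BED file has no header row of its own, so each tab-separated column is named by position. A rough sketch of that mapping with the standard `csv` module, using two illustrative rows in the same five-column layout (this only mimics what the csv wrapper is configured to do; it is not how gos loads the file):

```python
import csv
import io

# two illustrative rows in the tab-separated cytoband layout (no header row)
sample = (
    "chr1\t0\t2300000\tp36.33\tgneg\n"
    "chr1\t2300000\t5300000\tp36.32\tgpos25\n"
)
header_names = ['chrom', 'chromStart', 'chromEnd', 'name', 'stain']

# name each column by its position, as headerNames does
rows = [
    dict(zip(header_names, fields))
    for fields in csv.reader(io.StringIO(sample), delimiter="\t")
]

# the genomic fields hold positions, so convert them to integers
for row in rows:
    for field in ("chromStart", "chromEnd"):
        row[field] = int(row[field])

print(rows[0])
```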

In [6]:
'''
exercise: add following changes to code:
- add a size encoding with the constant value 10
- change the mark type from point to triangleRight
- add a y encoding to use the "chromEnd" field instead of "chromStart"
- change the width and height of the track to be 500
'''
tracky = gos.Track(data).mark_triangleRight().encode(
    x=gos.X("chromStart", type="genomic"),
    y=gos.Y("chromEnd", type="genomic", axis="left"),
    # additional encodings ...
    size=gos.value(10)

).properties(
    # track property overrides ...
    width=500,
    height=500
).view()

# save the spec as json and display it
tracky.save('visuals/single-track/tutorial-exercise.json')
display(tracky)

Next, we expand this single-track visual by giving the color channel an explicit domain and range, which lets us assign a distinct color to each stain category.

In [7]:
color_strat= gos.Track(data).mark_point().encode(
    x=gos.X('chromStart:G'),
    y=gos.Y('stain:N'),
    color=gos.Color(
        # separate by color
        'stain:N',
        domain=["gneg", "gpos25", "gpos50", "gpos75", "gpos100", "gvar"],
        # define color per domain (recall mapping from domain to range!)
        range=['#265653', '#2A9D8F', '#8AB17D', '#E9C46A', '#F4A261', '#E76F51'],
        # add legend
        legend=True
    )
).view()
color_strat.save('visuals/single-track/stratified-color.json')
display(color_strat)
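
The `domain` and `range` lists in the cell above are matched up by position: the first stain category gets the first color, and so on. A small plain-Python sketch of that pairing (the `stain_to_color` name is only for illustration):

```python
# same lists as in the color encoding above
domain = ["gneg", "gpos25", "gpos50", "gpos75", "gpos100", "gvar"]
colors = ['#265653', '#2A9D8F', '#8AB17D', '#E9C46A', '#F4A261', '#E76F51']

# pair each category with the color at the same index
stain_to_color = dict(zip(domain, colors))

for stain, color in stain_to_color.items():
    print(f"{stain:8s} -> {color}")
```

Because the pairing is positional, reordering one list without the other silently changes which stain gets which color.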

#### Combining Tracks
Now that we can make simple tracks, we can move on to composite tracks. This section uses four subsets of a scATAC-seq dataset from Corces et al. (2020).

In [13]:
# access data through url list
urls = [
    f"https://s3.amazonaws.com/gosling-lang.org/data/{file}"
    for file in [
        "ExcitatoryNeurons-insertions_bin100_RIPnorm.bw",
        "InhibitoryNeurons-insertions_bin100_RIPnorm.bw",
        "Microglia-insertions_bin100_RIPnorm.bw",
        "Astrocytes-insertions_bin100_RIPnorm.bw",
    ]
]

# use data from first url
data = gos.bigwig(
    urls[0], 
    column='position', 
    value='peak'
)

# use base track to derive other tracks --> do not have to repeat defs
base = gos.Track(data).encode(
    x=gos.X('position:G')
).properties(height=100)

heatmap = base.mark_rect().encode(
    # color code based on the size of the peaks to create a heatmap
    color=gos.Color('peak:Q')
).view()

line_graph = base.mark_line().encode(
    # create line graph
    y=gos.Y("peak:Q"),
    color=gos.value("black")
).properties(title="")
display(line_graph.view().properties(title="Line view"))

points = line_graph.mark_point()

colored_points = points.encode(
    # add color to points corresponding w/ height
    color=gos.Color('peak:Q'),
).properties(title="")
display(colored_points.view().properties(title="Colored point view"))

# overlay the two tracks (line graph and colored points) in one view
overlaid_view = gos.overlay(line_graph, colored_points)

overlaid_view.save('visuals/single-track/single_track_overlay.json')
display(overlaid_view.properties(title="Overlaid view"))


Now that we can reuse tracks to change a visual and add overlays, we can apply the same track definition to other data sources!

In [17]:
# create function to generate barplot for any scATAC-seq
def barplot(url: str, title: str=None, color: str=None):
    data=gos.bigwig(
        url=url, 
        column='position', 
        value='peak'
    )
    track=gos.Track(data).mark_bar().encode(
        x=gos.X('position:G'),
        y=gos.Y('peak:Q', axis='right')
    )
    if color:
        track=track.encode(color=gos.value(color))
    if title:
        track=track.properties(title=title)
    return track.properties(height=80)

# iterate through and save each json!
for i, url in enumerate(urls):
    plot = barplot(url, f'Iterative view of plot {i}').view()
    plot.save(f'visuals/single-track/iterative-view-{i}.json')
    display(plot)

Suppose we want to align these plots. We can zip the URLs with a list of colors so that
each data source gets its own color, then stack the resulting bar plots into one aligned view.

In [9]:
all_tracks = []

for url, color in zip(urls, ['#2A9D8F', '#8AB17D', '#E9C46A', '#F4A261']):
    title = url.split("/")[-1].split("-")[0]
    track = barplot(url=url, title=title, color=color)
    all_tracks.append(track)

# use gos.stack to stack into a view
gos.stack(*all_tracks).properties(
    xDomain=(gos.GenomicDomain(chromosome='1'))
).save('visuals/single-track/stacked_views.json')

# and, for good measure, a craaaazy mess of a visualization: overlay
gos.overlay(*all_tracks).properties(
    xDomain=(gos.GenomicDomain(chromosome='1')),
).save('visuals/single-track/overlaid_bars.json')
# what a terrible visualization. cool i could make it though :D
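
The track titles in the loop above come from the file name in each URL: everything after the last `/` and before the first `-`. A quick plain-Python check of what that expression yields for the four files (the `pairs` name is only for illustration):

```python
# same file names and colors as in the cells above
files = [
    "ExcitatoryNeurons-insertions_bin100_RIPnorm.bw",
    "InhibitoryNeurons-insertions_bin100_RIPnorm.bw",
    "Microglia-insertions_bin100_RIPnorm.bw",
    "Astrocytes-insertions_bin100_RIPnorm.bw",
]
urls = [f"https://s3.amazonaws.com/gosling-lang.org/data/{file}" for file in files]
colors = ['#2A9D8F', '#8AB17D', '#E9C46A', '#F4A261']

# (title, color) for each track, derived the same way as in the loop
pairs = [
    (url.split("/")[-1].split("-")[0], color)
    for url, color in zip(urls, colors)
]

for title, color in pairs:
    print(title, color)
```

Note that `zip` stops at the shorter input, so a missing color would silently drop a track rather than raise an error.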

### 3. Visual Linking
The goal of this section is to recreate [this](https://gosling.js.org/?example=LINKING) visualization using gos

How is this structured?
- one vertical view containing
    - one horizontal subview (the circular and linear overviews, each with a brush)
    - one detailed track

To do this, we can build from the individual tracks up to the views (recall that alignment happens at the view level!).

We can complete this by writing each track individually, then combining them into a big view. How do you eat an elephant? You chunk it down.
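
As a planning aid, the nesting described above can be written down as a small outline before any gos code. This is only a sketch: the dictionary keys and labels below are hypothetical names for illustration, not a Gosling spec.

```python
# hypothetical outline of the target layout (not a valid Gosling spec)
layout_outline = {
    "arrangement": "vertical",
    "children": [
        {
            "arrangement": "horizontal",
            "children": ["circular overview + brush", "linear overview + brush"],
        },
        "detailed view",
    ],
}


def leaves(node):
    """Return the leaf labels of the outline, left to right, top to bottom."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result


print(leaves(layout_outline))
```

The three leaves are the pieces to write individually; the two nested levels are the horizontal and vertical compositions applied at the end.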

In [25]:
# first, define the data
data = gos.MultivecData(
    type='multivec',
    url='https://server.gosling-lang.org/api/v1/tileset_info/?d=cistrome-multivec',
    row='sample',
    column='position',
    value='peak',
    # define categories for use in individual overlays!
    categories=['sample 1', 'sample 2', 'sample 3', 'sample 4'],
)

In [40]:
# create upper left track
base_chart = gos.Track(data=data).mark_bar().encode(
    row='sample:N',
    # use start and end to define the x extent of each bar
    x=gos.X('start:G', domain=gos.GenomicDomain(gos.DomainChr(chromosome='chr1'))),
    xe='end:G',
    y='peak:Q',
    color=gos.Color('sample:N'),
)

# view chart
#base_chart.view().save('visuals/recreation/base_view.json')

We will use this base chart for both the linear and circular overviews, each with a brush that sweeps over the chart. An advantage of defining a track first is that it can be reused in more than one view!

Now, we are going to make the brush and detailed view. The detailed view will be based on our initial base_chart, but will have a smaller genomic domain.

In [51]:
detailed_track=base_chart.encode(
    x=gos.X(
        'start:G',
        # zoom in on a smaller interval of chr1
        domain=gos.GenomicDomain(gos.DomainChrInterval(chromosome='chr1', interval=[160000000, 200000000])),
        # use the same linking id as the brush
        linkingId='brushId'
    ),
    # use style to set background color + opacity
    style=gos.Style(background='steelblue', backgroundOpacity=0.2),
).properties(
    width=690,
    height=200,
    title="Detailed View"
)

# view chart
#detailed_track.view().save('visuals/recreation/detail_view.json')

Next, we can make the brush track and the two overviews. This entails:
1. Making the circular overview (`circ_overview`) from the base chart and overlaying a brush
2. Making the linear overview (`lin_overview`) from the base chart and overlaying a brush

In [72]:
brush_track=base_chart.mark_brush().encode(
    x=gos.X(
        'start:G',
        linkingId='brushId'
    ),
    color=gos.value('steelblue')
)

# create circular and linear versions of the base track
circle_brush_track=base_chart.properties(
    layout='circular',
    static=True,
)

linear_brush_track=base_chart.properties(
    layout='linear',
    static=True,
)

# overlay brush onto views
circ_overview=gos.overlay(
    circle_brush_track,
    brush_track
).properties(
    layout='circular',
    centerRadius=0.4,
    width=250,
    height=130,
)

lin_overview=gos.overlay(
    linear_brush_track,
    brush_track
).properties(
    height=200,
    width=400,
)
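
As far as I can tell, the brush and the detailed track are tied together only by using the same `linkingId` string (`'brushId'`) on their x encodings. A plain-Python sketch of that matching, using simplified stand-in dictionaries rather than real gos objects (the `linked_targets` helper and the dictionary keys are hypothetical):

```python
# simplified stand-ins for the tracks above: name, mark type, and x linkingId
tracks = [
    {"name": "detailed_track", "mark": "bar", "linkingId": "brushId"},
    {"name": "brush_track", "mark": "brush", "linkingId": "brushId"},
    {"name": "base_chart", "mark": "bar", "linkingId": None},
]


def linked_targets(tracks):
    """Map each brush to the non-brush tracks that share its linkingId."""
    links = {}
    for brush in tracks:
        if brush["mark"] != "brush":
            continue
        links[brush["name"]] = [
            t["name"]
            for t in tracks
            if t["mark"] != "brush" and t["linkingId"] == brush["linkingId"]
        ]
    return links


print(linked_targets(tracks))
```

A typo in either string would leave the brush with no target, so it is worth keeping the id in one variable if the spec grows.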

Finally, compose the two overviews and the detailed track into a single view.


In [74]:
aligned_views=gos.horizontal(
    circ_overview,
    lin_overview,
    spacing=40
)
final_view=gos.vertical(
    aligned_views,
    detailed_track
).properties(
    title="Recreation of Example Visual",
    subtitle="Drag the brush to get a detailed view"
)

final_view.save('visuals/recreation/recreated_view.json')
display(final_view)
