# Connect to Nucleome Browser from a Jupyter notebook

One of the key novel features of Nucleome Browser is a generic [*event-driven communication*](https://nb-docs.readthedocs.io/en/latest/index.html) mechanism. It allows Nucleome Browser not only to send and receive messages across its own panels but also to communicate with external data portals via the Nucleome Bridge extension. This demo shows how to use JavaScript inside a Jupyter notebook to connect your local notebook to [Nucleome Browser](http://vis.nucleome.org) through the Chrome extension **Nucleome Bridge**.

## 1 Install Nucleome Bridge

First, add the [Nucleome Bridge](https://chrome.google.com/webstore/detail/nucleome-bridge/djcdicpaejhpgncicoglfckiappkoeof) extension to Chrome, Firefox, or any [Chromium-based](https://en.wikipedia.org/wiki/Chromium_(web_browser)) web browser.

## 2 Demo: interactive data visualization inside a Jupyter notebook using Nucleome Browser

This section demonstrates how to connect Nucleome Browser and a Jupyter notebook via nb-dispatch and the Nucleome Bridge web browser extension. As a case study, we use the TSA-seq data from Yu et al. 2018 (https://rupress.org/jcb/article/217/11/4025/120670/Mapping-3D-genome-organization-relative-to-nuclear). TSA-seq is a genomic technique that converts the cytological distance of chromatin to nuclear bodies (such as nuclear speckles) into a digital readout that can be displayed on a genome browser. In this paper, the authors found a striking lamina-speckle axis of chromatin by measuring the estimated average distance of chromatin to nuclear speckles or the nuclear lamina (reported as the TSA-seq score in the genome readout) across a population of K562 cells. They further divided the human genome into ten equal-sized groups (TSA-seq deciles). Decile 10 regions are closest to nuclear speckles, while Decile 1 regions are furthest from them.
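To make the decile idea concrete, here is a minimal sketch that ranks bins by score and splits them into ten equal-sized groups, with decile 10 holding the highest scores. The helper `assign_deciles` and the toy scores are hypothetical, and Yu et al. 2018 may have defined the deciles differently (for example, genome-wide rather than per chromosome).

```python
def assign_deciles(scores):
    """Return a decile label (1-10) for each score; 10 = highest scores.

    Bins are ranked by score and split into ten groups of (nearly) equal
    size. Ties keep their original order.
    """
    n = len(scores)
    # Indices of the bins, from lowest to highest score
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Map rank 0..n-1 onto deciles 1..10
        labels[i] = rank * 10 // n + 1
    return labels


# Hypothetical TSA-seq scores for 20 bins (not real data)
toy_scores = [0.1 * k for k in range(20)]
print(assign_deciles(toy_scores))
# [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10]
```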

In the paper, the authors found a strong correlation between TSA-seq deciles and functional genomic data. Here, we re-analyze the data to verify some of the paper's key findings; specifically, we explore how TSA-seq scores relate to laminB DamID and replication timing. To make this demo run quickly on the Binder platform, we have pre-processed the data and restricted the analysis to chromosome 2.

### 2.1 Process laminB DamID and replication-timing data

We first load the bigWig files into Python objects using the pyBigWig package. The K562 laminB DamID (array-based) data were downloaded from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/laminB1.txt.gz). We have already converted them into bigWig format and kept only the signal on chromosome 2.

The K562 replication-timing data were downloaded from the UCSC Genome Browser (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeUwRepliSeq/wgEncodeUwRepliSeqK562WaveSignalRep1.bigWig). As with the laminB data, we have already processed the original file and kept only the signal on chromosome 2.

In [1]:
# Load libraries
import numpy as np
import math
import pandas as pd
import pyBigWig

# Load the laminB DamID data
bw_laminB = pyBigWig.open("data/LaminB_DamID_hg19_signal.chr2.bw")

# Load the replication-timing data
bw_repliseq = pyBigWig.open("data/wgEncodeUwRepliSeqK562WaveSignalRep1.chr2.bw")

We then divide chromosome 2 into non-overlapping 20 kb bins and calculate the average genomic signal over each bin. To save computation time, we only do this for the first 100 Mb of chromosome 2.
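The binning step itself does not depend on pyBigWig. Given per-base values for a region (NaN where there is no data), each non-overlapping bin is averaged while ignoring NaNs, which is what `np.nanmean` does in the cell below. The helper `bin_means` and the toy input are hypothetical; unlike the notebook function, this sketch never substitutes 0.0 for an empty query.

```python
import math


def bin_means(values, bin_size):
    """Average consecutive, non-overlapping bins of `values`, skipping NaNs.

    A trailing partial bin is dropped, mirroring the notebook's use of
    math.floor(end / bin_size). A bin with no finite values yields NaN.
    """
    means = []
    for b in range(len(values) // bin_size):
        chunk = values[b * bin_size:(b + 1) * bin_size]
        finite = [v for v in chunk if not math.isnan(v)]
        means.append(sum(finite) / len(finite) if finite else math.nan)
    return means


nan = math.nan
# 10 hypothetical per-base values binned at size 4; the last 2 values are dropped
print(bin_means([1, 3, nan, 2, nan, nan, nan, nan, 5, 5], 4))
# [2.0, nan]
```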


In [2]:
# For this demo, we only use data on chromosome 2
chrom_name = 'chr2'
chrom_start = 0
#chrom_end = 243199373 # this is the total length of the chromosome 2
chrom_end = 100000000
bin_res = 20000

def extract_ave_signal_over_bin(bw, chrom, start, end, bin_size):
    """Return the mean bigWig signal for each complete bin_size bin in [start, end)."""
    array = []
    for idx in range(math.floor(start/bin_size), math.floor(end/bin_size)):
        query_start = idx*bin_size
        query_end = query_start + bin_size
        # Per-base values in this bin; positions without data are NaN
        values = bw.values(chrom, query_start, min(query_end, end))
        if values is None or len(values) == 0:
            array.append(0.0)
        else:
            # np.nanmean ignores NaNs (an all-NaN bin gives NaN with a warning)
            array.append(np.nanmean(values))
    return array

# Calculate the average signal of chromosome 2 at 20kb resolution
signal_laminB = extract_ave_signal_over_bin(bw_laminB, chrom_name, chrom_start, chrom_end, bin_res)
signal_repliseq = extract_ave_signal_over_bin(bw_repliseq, chrom_name, chrom_start, chrom_end, bin_res)




We then convert the laminB DamID scores and replication-timing scores into z-scores so that the two tracks are on a comparable scale. Finally, we construct a data frame in which each row represents a 20 kb bin.
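A z-score rescales a track to mean 0 and standard deviation 1, z = (x - mean) / sd, so tracks measured in different units can share one axis. The stdlib sketch below mirrors the `np.nanmean`/`np.nanstd` calls in the cell below by ignoring NaNs and using the population standard deviation (ddof=0, NumPy's default); `nan_zscore` is a hypothetical helper name.

```python
import math
import statistics


def nan_zscore(values):
    """Z-score a list of floats, ignoring NaNs when computing mean and SD.

    NaN inputs stay NaN. Uses the population SD (ddof=0), like np.nanstd.
    """
    finite = [v for v in values if not math.isnan(v)]
    mu = statistics.fmean(finite)
    sd = statistics.pstdev(finite)
    return [(v - mu) / sd if not math.isnan(v) else math.nan for v in values]


z = nan_zscore([1.0, 2.0, math.nan, 3.0])
print([round(v, 4) if not math.isnan(v) else v for v in z])
# [-1.2247, 0.0, nan, 1.2247]
```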

In [3]:
# Convert the laminB log2 ratios and replication-timing signal into z-scores
signal_laminB = np.asarray(signal_laminB, dtype=float)
signal_repliseq = np.asarray(signal_repliseq, dtype=float)
signal_laminB_zscore = (signal_laminB - np.nanmean(signal_laminB))/np.nanstd(signal_laminB)
signal_repliseq_zscore = (signal_repliseq - np.nanmean(signal_repliseq))/np.nanstd(signal_repliseq)
# Bin idx covers [first_bin_start + idx*bin_res, first_bin_start + (idx+1)*bin_res)
n_bins = len(signal_laminB)
first_bin_start = math.floor(chrom_start/bin_res)*bin_res
chrom_list = [chrom_name]*n_bins
start_list = [first_bin_start + bin_res*idx for idx in range(n_bins)]
end_list = [s + bin_res for s in start_list]
data = pd.DataFrame(data = {'chrom':chrom_list, 
                            'start':start_list, 
                            'end':end_list, 
                            'laminB':signal_laminB_zscore, 
                            'repliseq':signal_repliseq_zscore})

In [4]:
# check the data
data.head()

|   | chrom | start | end | laminB | repliseq |
|---|---|---|---|---|---|
| 0 | chr2 | 0 | 20000 | -0.504095 | NaN |
| 1 | chr2 | 20000 | 40000 | -0.519381 | -0.793167 |
| 2 | chr2 | 40000 | 60000 | 0.711371 | -0.794941 |
| 3 | chr2 | 60000 | 80000 | 0.435813 | -0.799336 |
| 4 | chr2 | 80000 | 100000 | 1.159452 | -0.806289 |


### 2.2 Link Nucleome Browser to visualize the processed data in the notebook
> **Important: open this Nucleome Browser session (https://vis.nucleome.org/v1/main.html?config=/share/jOrDrXjfqfHZIqAUsOFvmvaGbiQxDEyj) to view the TSA-seq decile data. Nucleome Browser versions later than v0.9.9 let users set a channel ID for communication. This demo uses the default communication channel (N-cnbChan01). To use a different channel (e.g., N-cnbChan02), switch channels with dispatch.chanId("<new channel>").**

In [5]:
from IPython.display import Javascript
import json

In [6]:
%%javascript
require.config({
    paths: {
        'd3': ['https://d3js.org/d3.v5.min'],
        'nb': ['https://vis.nucleome.org/static/lib/nb-dispatch'],
        'Plotly': ['https://cdn.plot.ly/plotly-latest.min']
    }
});

<IPython.core.display.Javascript object>

In [7]:
Javascript("""
(function(element){
    require(['d3', 'nb', 'Plotly'], function(d3, nb, Plotly) {
        // init div for plot
        d3.select(element[0]).append('div')
            .attr('id', 'myDiv')
        var log = d3.select(element.get(0)).append('div')
            .attr("class", "log")
        // prepare data
        var df = %s,
            labelG1 = %s,
            labelG2 = %s;
        var allChrom = d3.map(df['chrom']).entries().map(x=>x.value),
            allStart = d3.map(df['start']).entries().map(x=>x.value),
            allEnd = d3.map(df['end']).entries().map(x=>x.value),
            allG1 = d3.map(df[labelG1]).entries().map(x=>x.value),
            allG2 = d3.map(df[labelG2]).entries().map(x=>x.value),
            currentIdx = [],
            currentG1 = [],
            currentG2 = [];
        // prepare nb
        var out = d3.select(element.get(0)).append('div')
            .attr("id", "out")
        var a = nb.dispatch("update","brush")
        // Use the default channel ID
        a.chanId("cnbChan01")
        a.connect(function(d){
          out.html(d.connection)
        })
        // update figure when highlight in nb
        a.on("brush",function(d){
            getHighlightedData(d)
            updateViolinPlot(currentG1, currentG2, labelG1, labelG2)
        })
        var regionText = function (d) {
            return d.chr + ":" + (d.start+1) + "-" + d.end;
        };
        // filter data
        function getHighlightedData(regions) {
            currentIdx = [];
            currentG1 = [];
            currentG2 = [];
            regions.forEach(function (d) {
                for (var i = 0 ; i < allChrom.length; i++){
                    if ( allChrom[i] === d.chr && allStart[i] < d.end && allEnd[i] > d.start) {
                        currentIdx.push(i);
                        currentG1.push(allG1[i]);
                        currentG2.push(allG2[i]);
                    }
                }
            })
            log.text("log: " + currentIdx.length+" bins selected")
        }
        // plot function
        function updateViolinPlot(data1, data2, label1, label2) {
            var violin1 = {
              type: 'violin',
              y: data1,
              points: 'none',
              box: {
                visible: true
              },
              boxpoints: false,
              line: {
                color: 'black'
              },
              fillcolor: '#fbb4ae',
              opacity: 0.6,
              meanline: {
                visible: true
              },
              x0: label1
            };
        
            var violin2 = {
              type: 'violin',
              y: data2,
              points: 'none',
              box: {
                visible: true
              },
              boxpoints: false,
              line: {
                color: 'black'
              },
              fillcolor: '#b3cde3',
              opacity: 0.6,
              meanline: {
                visible: true
              },
              x0: label2
            };
        
            var layout = {
              title: "",
              yaxis: {
                zeroline: false,
                range: [-3, 3]
              }
            };
    
            Plotly.newPlot('myDiv', [violin1, violin2], layout);
        }
    });
})(element);
""" % (data.to_json(orient = 'columns'), json.dumps('laminB'), json.dumps('repliseq')))

<IPython.core.display.Javascript object>
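The `%s` placeholders in the cell above are filled with JSON text. `data.to_json(orient='columns')` produces an object keyed by column name, each mapping the row index to a value, and the JavaScript then extracts every column as an array. The stdlib sketch below imitates that layout on a hypothetical two-row table to show what reaches the browser. Note that pandas writes missing values as `null`, whereas Python's `json.dumps` writes a float NaN as `NaN` unless told otherwise, which is why this sketch uses `None`.

```python
import json

# Hypothetical two-row table in the column-oriented layout of
# data.to_json(orient='columns'): {column: {row_index: value}}
table = {
    "chrom": {"0": "chr2", "1": "chr2"},
    "start": {"0": 0, "1": 20000},
    "laminB": {"0": -0.504095, "1": None},  # None is serialized as null
}
payload = json.dumps(table)

# Fill a JavaScript template the same way the notebook cell does
template = "var df = %s, labelG1 = %s;"
js = template % (payload, json.dumps("laminB"))
print(js)

# Python equivalent of the JavaScript step that turns one column into an array
column = json.loads(payload)["laminB"]
values = [column[k] for k in sorted(column, key=int)]
print(values)  # [-0.504095, None]
```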

### 2.3 Results

In Nucleome Browser, when you highlight regions by brushing on the tracks or clicking on different TSA-seq deciles, you should see two violin plots showing the distribution of laminB DamID and replication-timing z-scores for the 20 kb bins that overlap the highlighted regions. The violin plots update automatically whenever the highlighted regions change, as shown in the animation below.
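Selecting the bins behind each violin plot is an interval-overlap test. Below is a stdlib sketch, assuming half-open coordinates [start, end) for both bins and highlighted regions, that returns the indices of bins overlapping any highlighted region. The helper `overlapping_bins` and the example regions are hypothetical; the `chr`/`start`/`end` keys follow the fields the JavaScript callback reads. It can be used to reproduce a selection in Python, for example to recompute the summaries behind the plots.

```python
def overlapping_bins(bins, regions):
    """Return sorted indices of bins that overlap at least one region.

    bins:    list of (chrom, start, end) tuples
    regions: list of dicts with 'chr', 'start', 'end' keys
    Intervals are treated as half-open [start, end).
    """
    hits = set()
    for r in regions:
        for i, (chrom, start, end) in enumerate(bins):
            # Two half-open intervals overlap iff each starts before the other ends
            if chrom == r["chr"] and start < r["end"] and end > r["start"]:
                hits.add(i)
    return sorted(hits)


# Five hypothetical 20 kb bins on chr2
bins = [("chr2", k * 20000, (k + 1) * 20000) for k in range(5)]
highlight = [{"chr": "chr2", "start": 30000, "end": 45000}]
print(overlapping_bins(bins, highlight))  # [1, 2]
```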

![Animation](img/demo_1_1.8x_opt.gif "Animation 1")

You should observe a clear pattern. Regions with high TSA-seq scores, such as Decile 10, tend to have the lowest laminB DamID scores and the highest replication-timing signal (i.e., they replicate earlier). Conversely, regions with low TSA-seq scores, such as Decile 1, have high laminB DamID scores and low replication-timing signal. This is consistent with the findings of Yu et al. 2018.
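The same trend can also be checked numerically, without the browser, by grouping bins by decile and comparing the median of each z-scored track. The sketch below assumes each bin has already been assigned a TSA-seq decile (the notebook's data frame does not include one); `median_by_decile` and all input values are hypothetical and only illustrate the computation.

```python
from collections import defaultdict
from statistics import median


def median_by_decile(deciles, values):
    """Group values by decile label and return {decile: median}."""
    groups = defaultdict(list)
    for d, v in zip(deciles, values):
        groups[d].append(v)
    return {d: median(vs) for d, vs in sorted(groups.items())}


# Hypothetical decile labels and laminB z-scores for six bins (made-up values)
deciles = [1, 1, 5, 5, 10, 10]
laminB_z = [1.2, 0.8, 0.1, -0.1, -0.9, -1.1]
print(median_by_decile(deciles, laminB_z))
```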

To summarize, this demo shows that Nucleome Browser and its plugin Nucleome Bridge offer a new way to interactively explore data in a Jupyter notebook. Users can clean, explore, and visualize data inside the notebook while using Nucleome Browser to support interactive exploration and hypothesis generation.

## 3 Demo: using a Jupyter notebook to navigate Nucleome Browser

This demo shows that you can also control Nucleome Browser from inside the notebook, navigating to or highlighting specific region(s). The first code block loads the required JavaScript packages. [Nb-dispatch](https://github.com/nucleome/nb-dispatch) is a cross-domain event dispatcher that broadcasts messages across Nucleome Browser. Currently, it supports sending and receiving *navigation* and *highlight* messages across the domains in the allowlist. You can view a collection of nb-dispatch examples on [JSFiddle](https://jsfiddle.net/user/nucleome/fiddles/) or [CodePen](https://codepen.io/collection/DkGVYL/).

In [8]:
%%javascript
require.config({
    paths: {
        'd3': ['https://d3js.org/d3.v5.min'],
        'nb': ['https://vis.nucleome.org/static/lib/nb-dispatch']
    }
});

<IPython.core.display.Javascript object>

### 3.1 Prepare some region(s)

We create a Python list of genomic regions. Each element is a string containing one or more comma-separated regions in the form `chrom:start-end`, optionally followed by `:color`.

In [12]:
data2 = ['chr2:1-100000000', 'chr2:0-10000000:green,chr2:10000000-20000000:red,chr2:20000000-30000000:green,chr2:30000000-40000000:red,chr2:40000000-50000000:green']
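The buttons in the next cell split these strings in JavaScript. The Python sketch below performs roughly the same parsing into `chr`/`start`/`end`/`color` records, which is handy for checking the list before sending it. `parse_regions` and `example_regions` are hypothetical names (chosen so the notebook's `data2` is not overwritten), and converting start and end to integers is an assumption rather than something the JavaScript cell does.

```python
import re


def parse_regions(text):
    """Parse comma-separated 'chrom:start-end[:color]' regions into dicts."""
    regions = []
    for item in text.split(","):
        # Split on '-' and ':' just like the JavaScript regex /[-:]/
        parts = re.split(r"[-:]", item.strip())
        region = {"chr": parts[0], "start": int(parts[1]), "end": int(parts[2])}
        if len(parts) > 3:
            region["color"] = parts[3]
        regions.append(region)
    return regions


example_regions = ['chr2:1-100000000', 'chr2:0-10000000:green,chr2:10000000-20000000:red']
for entry in example_regions:
    print(parse_regions(entry))
```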

### 3.2 Link notebook to the Nucleome Browser

We then create a drop-down list showing these regions, together with two buttons that control which region(s) Nucleome Browser navigates to or highlights.

> **Important: After you run the following code block, open [Nucleome Browser](http://vis.nucleome.org) in your web browser. Select a region from the drop-down list. After you click either the *Navigate to* or *Highlight* button, the view in Nucleome Browser should update accordingly.**

In [13]:
Javascript("""
(function(element) {
    require(['d3', 'nb'], function(d3, nb) { 
        var out = d3.select(element.get(0)).append('div')
            .attr("id", "out")
        //
        var dropdown = d3.select(element.get(0)).append('select')
            .attr("id","select1")
            .attr("class", "drop")
        var options = dropdown
                .selectAll('option')
                .data(%s)
                .enter()
                .append('option')
                .text(function(d) {return d;})
        var log = d3.select(element.get(0)).append('div')
            .attr("class", "log")
        //
        var a = nb.dispatch("update","brush")
        a.connect(function(d){
          out.html(d.connection)
        })
        var selectRegion = null;
        var message = "";
        var obj = {};
        var button_nav = d3.select(element.get(0)).append('button')
            .text('Navigate to')
            .attr("id", "nav")
            .classed("button", true)
            .on('click', function() {
                selectRegion = d3.select('#select1').property('value');
                selectRegion = selectRegion.split(',');
                if (selectRegion.length > 1) {
                    message = "Found multiple genomic regions. Only the first region will be used. ";
                } 
                selectRegion = selectRegion[0]
                log.text(message + "Navigate to: " + selectRegion)
                selectRegion = selectRegion.split(/[-:]/);
                a.call("update",this,[{chr:selectRegion[0],start:selectRegion[1],end:selectRegion[2]}])
                message = ""
            });
        var button_highlight = d3.select(element.get(0)).append('button')
            .text("Highlight")
            .attr("id", "highlight")
            .classed("button", true)
            .on('click', function() {
                selectRegion = d3.select('#select1').property('value');
                selectRegion = selectRegion.split(',');
                log.text(message + "Highlight: " + selectRegion)
                var regions = selectRegion.map(function(region) {
                    var parts = region.split(/[-:]/);
                    return {chr: parts[0], start: parts[1], end: parts[2], color: parts[3]};
                });
                a.call("brush",this,regions)
                message = ""
            })
    })
})(element);
""" % (json.dumps(data2)))

<IPython.core.display.Javascript object>

### 3.3 Results

As the following animation shows, you can choose region(s) from the drop-down list and click buttons to either navigate to or highlight the region(s).

![Animation](img/demo_2_1.8x_opt.gif "Animation 2")

## 4 Use Nucleome Bridge in your Jupyter notebook

This section shows how to make Nucleome Bridge work with a Jupyter notebook served from a host other than localhost (127.0.0.1), such as a remote server.

For security reasons, Nucleome Bridge only accepts messages sent from websites on an allowlist. Localhost (127.0.0.1) is included by default; the complete allowlist is available in the Nucleome Browser documentation (https://nb-docs.readthedocs.io/en/latest/nb_dispatch_api.html#overview). Therefore, a notebook running on your local machine can send messages to Nucleome Browser via Nucleome Bridge. If you want to run a Jupyter notebook on a remote server, you need to set up SSH port forwarding so that the notebook is reached through 127.0.0.1 on your local machine. The following steps show how to do that.

### 4.1 Start a Jupyter notebook on a remote server
Once you log in to the remote machine, you can start a Jupyter notebook server.
```bash
jupyter notebook --no-browser --port XXXX
# --no-browser: this will start a notebook without opening a browser
# XXXX: this is the port used on the remote server
```

### 4.2 Forward port XXXX on the remote server to port YYYY on 127.0.0.1
The following command forwards port XXXX on the remote server to port YYYY on your local machine.
```bash
ssh -N -f -L 127.0.0.1:YYYY:127.0.0.1:XXXX <user id>@<remote server>
# -N: do not run a remote command; only forward ports
# -f: go to the background after authentication
# -L: required here, in the form local_socket:remote_socket. In this example, 127.0.0.1:YYYY is the local socket and 127.0.0.1:XXXX is the remote socket (as seen from the remote server).
# you need to type in your password (or use an SSH key) to finish this step
```

### 4.3 Open the Jupyter notebook
On your local machine, you can now access the Jupyter notebook at http://127.0.0.1:YYYY. If it asks for a token, you can find it in the startup message printed on the remote server (or by running `jupyter notebook list` there).
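Before opening the browser, it can help to confirm that the tunnel is actually listening on the local port. A minimal stdlib sketch, assuming the forwarded port is YYYY on 127.0.0.1; `port_is_open` is a hypothetical helper, and 8888 (Jupyter's usual default) is only a placeholder for your YYYY value. It checks that a TCP connection succeeds, not that Jupyter itself is responding.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable
        return False


local_port = 8888  # placeholder: replace with your YYYY value
print(f"127.0.0.1:{local_port} reachable: {port_is_open(local_port)}")
```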