<h3>Epiviz File Server: Query and Compute on indexed genomic files</h3>

Epiviz is an interactive and integrative web application for visual analysis and exploration of functional genomic datasets. We currently support two ways of providing data to Epiviz: 1) the epivizr R/Bioconductor package, which lets users interactively visualize and explore genomic data loaded in R, and 2) a MySQL database that stores each genomic dataset as a table. Storing genomic data in a database is challenging, especially for large datasets with millions of rows: importing such a dataset into a table is time-consuming, and optimizing the table for fast queries remains difficult even with table partitioning.

The Epiviz file server is a Python library that queries indexed genomic files in situ, for both visualization and transformation. The library provides modules for five tasks: 1) Parser, to read various genomic file formats; 2) Query, to access only the bytes of a file needed for a given genomic region (chromosome, start and end); 3) Compute, to apply transformations to the data; 4) Server, to instantly expose the datasets as a REST API that developers can use to build apps or visualizations; and 5) Visualization. With the library, users can explore data from publicly hosted indexed genomic files. Supported file formats include BigBed, BigWig, HDF5 and any format that can be indexed with tabix. Once the data files are defined, users can also define summarizations and transformations on them using numpy functions. We use dask to manage, distribute and schedule query and compute requests on files.
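The idea behind the Query module, reading only the bytes a region needs, can be illustrated with a toy index. This is a minimal sketch, not the library's implementation: the `Block` record, the `blocks_for_region` and `byte_ranges` helpers and all offsets are hypothetical, and real formats are more elaborate (BigWig and BigBed use an R-tree index, tabix uses a binning index). It assumes the blocks of each chromosome are sorted by start and do not overlap.

```python
from bisect import bisect_right
from typing import NamedTuple


class Block(NamedTuple):
    """One data block in a hypothetical, flat on-disk index."""
    chrom: str
    start: int   # first base covered (0-based, inclusive)
    end: int     # end of the block (exclusive)
    offset: int  # byte offset of the block in the file
    size: int    # length of the block in bytes


def blocks_for_region(index, chrom, start, end):
    """Return the blocks of `chrom` that overlap the half-open region [start, end).

    Assumes index[chrom] is sorted by start and the blocks do not overlap.
    """
    blocks = index.get(chrom, [])
    starts = [b.start for b in blocks]
    # Begin at the last block starting at or before `start`; earlier blocks
    # end at or before `start` and cannot overlap the region.
    i = max(bisect_right(starts, start) - 1, 0)
    hits = []
    while i < len(blocks) and blocks[i].start < end:
        if blocks[i].end > start:
            hits.append(blocks[i])
        i += 1
    return hits


def byte_ranges(blocks):
    """HTTP Range values (inclusive byte positions) covering only these blocks."""
    return [f"bytes={b.offset}-{b.offset + b.size - 1}" for b in blocks]


index = {
    "chr11": [
        Block("chr11", 0, 10_000_000, 0, 4096),
        Block("chr11", 10_000_000, 30_060_000, 4096, 8192),
        Block("chr11", 30_060_000, 50_000_000, 12288, 4096),
    ]
}

# Region queried later in this notebook: only the last two blocks are read.
hits = blocks_for_region(index, "chr11", 30054187, 30064187)
print(byte_ranges(hits))  # ['bytes=4096-12287', 'bytes=12288-16383']
```

For a remotely hosted file, ranges like these could be sent as HTTP `Range` headers, so a query touches a few kilobytes instead of downloading the whole file.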



<h4>1. Import files: a JSON configuration</h4>

We use data from the NIH Roadmap Epigenomics Project ([File Browser](http://egg2.wustl.edu/roadmap/web_portal/processed_data.html#ChipSeq_DNaseSeq), [FTP Site](http://egg2.wustl.edu/roadmap/data/byFileType/), [metadata](http://egg2.wustl.edu/roadmap/web_portal/meta.html)). An example configuration file looks like this:

``` json
[
  {
    "url": "https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/foldChange/E094-H3K27me3.fc.signal.bigwig",
    "file_type": "bigwig",
    "datatype": "bp",
    "name": "E094-H3K27me3",
    "annotation": {
      "group": "digestive",
      "tissue": "Gastric",
      "marker": "H3K27me3"
    }
  },
  {
    "url": "https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/foldChange/E109-H3K27me3.fc.signal.bigwig",
    "file_type": "bigwig",
    "datatype": "bp",
    "name": "E109-H3K27me3",
    "annotation": {
      "group": "digestive",
      "tissue": "Small Intestine",
      "marker": "H3K27me3"
    }
  }
]
```
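A quick way to catch mistakes in such a file before handing it to the file server is to parse it with Python's `json` module, which rejects unquoted keys, and check each entry. This is a sketch under assumptions: `REQUIRED_KEYS` is simply the set of keys used in the example above, not the library's official schema, and `missing_keys` and `load_config` are hypothetical helpers.

```python
import json

# The configuration from above, written as strict JSON (keys in double quotes).
CONFIG = """
[
  {"url": "https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/foldChange/E094-H3K27me3.fc.signal.bigwig",
   "file_type": "bigwig", "datatype": "bp", "name": "E094-H3K27me3",
   "annotation": {"group": "digestive", "tissue": "Gastric", "marker": "H3K27me3"}},
  {"url": "https://egg2.wustl.edu/roadmap/data/byFileType/signal/consolidated/macs2signal/foldChange/E109-H3K27me3.fc.signal.bigwig",
   "file_type": "bigwig", "datatype": "bp", "name": "E109-H3K27me3",
   "annotation": {"group": "digestive", "tissue": "Small Intestine", "marker": "H3K27me3"}}
]
"""

# Assumption: the keys used in the example above, not the library's official schema.
REQUIRED_KEYS = {"url", "file_type", "datatype", "name"}


def missing_keys(entry):
    """Return the required keys absent from one configuration entry, sorted."""
    return sorted(REQUIRED_KEYS - set(entry))


def load_config(text):
    """Parse the configuration and fail early if an entry is incomplete."""
    entries = json.loads(text)  # raises json.JSONDecodeError on unquoted keys
    for i, entry in enumerate(entries):
        missing = missing_keys(entry)
        if missing:
            raise ValueError(f"entry {i} is missing keys: {missing}")
    return entries


entries = load_config(CONFIG)
# Select entries by annotation, similar to the "digestive" filter used in section 4.
digestive = [e["name"] for e in entries if e.get("annotation", {}).get("group") == "digestive"]
print(digestive)  # ['E094-H3K27me3', 'E109-H3K27me3']
```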

<h4>2. Define Transformations</h4>

We first load the configuration file into the file server. We can then filter the measurements and define a computed measurement using a `numpy` function.

``` python
import os

import numpy
from epivizfileserver import setup_app, create_fileHandler, MeasurementManager

if __name__ == "__main__":
    # create measurements manager
    mMgr = MeasurementManager()

    # create file handler for dask
    mHandler = create_fileHandler()

    # import measurements from json
    roadmap = mMgr.import_files(os.path.join(os.getcwd(), "roadmap.json"), mHandler)

    # filter measurements for "H3K27me3"
    froadmap = [m for m in roadmap if "H3K27me3" in m.name]

    # add a computed measurement that applies numpy.diff to the filtered measurements
    computed_measurement = mMgr.add_computed_measurement("computed", "diff_H3K27me3_binding", "diff H3K27me3 binding (Gastric & Intestine) ",
                                    measurements=froadmap, computeFunc=numpy.diff)

    # run the API server on port 8000, listening on all interfaces
    app = setup_app(mMgr)
    app.run(host="0.0.0.0", port=8000)
```


Note: because the server is built on an asynchronous library (Sanic), the code above will not run inside an IPython notebook. Copy the script into a separate file and run it from there.
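To see what `computeFunc=numpy.diff` does to two measurements, the sketch below applies it to a small array laid out as one row per genomic bin and one column per measurement. That the file server calls the function this way (row-wise, over columns in the order `[E094-H3K27me3, E109-H3K27me3]`) is an assumption; it is consistent with the `result.head()` output in section 4, whose first rows contain the values used here.

```python
import numpy as np

# Hypothetical input: one row per genomic bin, one column per measurement,
# in the order [E094-H3K27me3, E109-H3K27me3].
values = np.array([
    [0.917645, 0.903366],
    [0.917645, 0.903366],
])

# numpy.diff along the last axis subtracts each column from the next,
# so with two measurements each row yields E109 - E094.
diff = np.diff(values, axis=-1)

# The result has one column fewer than the input.
print(diff.shape)  # (2, 1)
print(diff[:, 0])
```

With more than two input measurements, `numpy.diff` returns n-1 differences per row, so this particular computed measurement is most meaningful with exactly two inputs.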

<h4>3. Interactive Visualization with Epiviz</h4>

We can use Epiviz to visualize data directly from the files through the file server.

<img src="epiviz2.gif">

<h4>4. Embedding Visualization in Jupyter notebooks</h4>

In this example, we manually load the configuration file, create a computed measurement and embed the result as a line track in this notebook. We are working on simplifying this process, especially the step that embeds HTML in Jupyter notebooks.

``` python
from epivizfileserver import setup_app, create_fileHandler, MeasurementManager
import os
import numpy

# create measurements manager
mMgr = MeasurementManager()

# import measurements from json
roadmap = mMgr.import_files(os.getcwd() + "/roadmap.json")

froadmap = [m for m in roadmap if m.annotation["group"] == "digestive"]

# add a computed measurement
computed_measurement = mMgr.add_computed_measurement("computed", "diff_H3K27me3_binding", "diff H3K27me3 binding (Gastric & Intestine) ",
                                measurements=froadmap, computeFunc=numpy.diff)
```

``` python
result, _ = await computed_measurement.get_data("chr11", 30054187, 30064187)
result.head()
```

| interval | chr | E094-H3K27me3 | start | end | E109-H3K27me3 | diff_H3K27me3_binding |
|---|---|---|---|---|---|---|
| (30054187, 30054192] | chr11 | 0.917645 | 30054187 | 30054192 | 0.903366 | -0.014279 |
| (30054192, 30054197] | chr11 | 0.917645 | 30054192 | 30054197 | 0.903366 | -0.014279 |
| (30054197, 30054202] | chr11 | 0.917645 | 30054197 | 30054202 | 0.903366 | -0.014279 |
| (30054202, 30054207] | chr11 | 0.917645 | 30054202 | 30054207 | 0.903366 | -0.014279 |
| (30054207, 30054212] | chr11 | 0.917645 | 30054207 | 30054212 | 0.903366 | -0.014279 |
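The interval labels in the first column of this output are right-closed intervals, the notation pandas uses for `Interval` objects. As an illustration only, the sketch rebuilds such labels for the queried region with `pandas.interval_range`; the 5 bp width is read off the rows shown above, and whether the full result uses a fixed bin width is an assumption this excerpt does not confirm.

```python
import pandas as pd

# Assumption: fixed 5 bp bins across the queried region chr11:30054187-30064187.
start, end, width = 30054187, 30064187, 5
bins = pd.interval_range(start=start, end=end, freq=width, closed="right")

print(len(bins))     # 2000
print(str(bins[0]))  # (30054187, 30054192]
```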


```
%%html
<script src="bower_components/jquery/dist/jquery.js"></script>
<script src="bower_components/jquery-ui/jquery-ui.js"></script>
<script src="bower_components/webcomponentsjs/webcomponents-lite.js"></script>
<link rel="import" href="bower_components/epiviz-charts/epiviz-charts.html">
```

``` python
import ujson
from IPython.display import HTML
from epivizfileserver.server.utils import format_result

# format the query result for the epiviz chart component
params = {"measurement": 'diff_H3K27me3_binding', "metadata": None}
resp = format_result(result, params)

# embed the formatted result as an epiviz line track
HTML("<epiviz-line-track dim-s=['diff_H3K27me3_binding'] json-data='" + ujson.dumps(resp) + "'></epiviz-line-track>")
```
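The cell above builds the tag by string concatenation inside a single-quoted attribute, so a `'` anywhere in the JSON would end the `json-data` attribute early. A more robust variant escapes attribute values with `html.escape`. This is a sketch: `line_track_html` is a hypothetical helper, the standard `json` module stands in for `ujson`, and the payload is a placeholder for whatever `format_result` returns.

```python
import html
import json


def line_track_html(measurement, payload):
    """Build an <epiviz-line-track> tag whose attribute values are HTML-escaped.

    `payload` stands in for the dict returned by format_result(); any
    JSON-serializable dict works for this sketch.
    """
    # Same attribute value as the original cell, e.g. ['diff_H3K27me3_binding']
    dims = html.escape(f"['{measurement}']", quote=True)
    # json.dumps is used here in place of ujson.dumps
    data = html.escape(json.dumps(payload), quote=True)
    return f'<epiviz-line-track dim-s="{dims}" json-data="{data}"></epiviz-line-track>'


# Hypothetical payload containing a quote character that would break
# the original single-quoted json-data attribute.
tag = line_track_html("diff_H3K27me3_binding", {"note": "it's"})
print(tag)
```

In the notebook, the tag would then be displayed with `HTML(tag)` from `IPython.display`; the browser unescapes the attribute values, so the component receives the same strings as before.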

