# Huddinge Browser

This is a tool for browsing kmer enrichments in interactive two-dimensional plots. It is a work in progress, so the user interface is a bit involved and the backend is quite fragile, but bear with me. I'm happy to take comments.

Here is a sample and tutorial for use of the system.

In [1]:
import sys
import logging as log
log.basicConfig(level=log.INFO,
                format='%(asctime)s:%(funcName)s:%(levelname)s:%(message)s')
import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh', logo=False)

In [2]:
%cd /tmp/kpalin
## Temporary directory


/tmp/kpalin


In [3]:
import huddinge_tsne_browser.tsne_mapper as htm 
import huddinge_tsne_browser.huddinge_browser as hhb
import huddinge_tsne_browser.datashaderselect

The two main modules of the system are `huddinge_tsne_browser.tsne_mapper` and `huddinge_tsne_browser.huddinge_browser`. The `TsneMapper` class in `tsne_mapper` reads the input files and lays out the input kmers if they have not been laid out before. The `HuddingBrowser` class in `huddinge_browser` provides the interface for the user.


The distribution comes with 8-mers laid out with t-SNE approximating the Huddinge distance. Software for calculating all-pairs Huddinge distances (and producing appropriate output) is in the `huddinge_pairs` branch of the git repository `https://github.com/kpalin/MODER.git`, and the computation can be done with the command line `python huddinge_tsne_browser`.
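
For readers unfamiliar with the Huddinge distance, the sketch below illustrates one common formulation for gapless kmers: the length of the longer kmer minus the largest number of matching bases over all ungapped alignments of the two. This is an assumption about the definition, not code taken from MODER or from this package. The function name `huddinge_distance` is hypothetical, and reverse complements and gapped kmers are ignored.

```python
def huddinge_distance(p, q):
    """Illustrative Huddinge distance between two gapless kmers.

    Assumed definition: max(len(p), len(q)) minus the maximum number of
    matching bases over all ungapped alignments (shifts) of q against p.
    """
    best = 0
    # shift is the position in p that q[0] is aligned to
    for shift in range(-len(q) + 1, len(p)):
        matches = sum(
            1
            for i, base in enumerate(q)
            if 0 <= i + shift < len(p) and p[i + shift] == base
        )
        best = max(best, matches)
    return max(len(p), len(q)) - best


print(huddinge_distance("ACGT", "CGTA"))  # 1: shifting by one base matches CGT
```

Under this definition, kmers that are one-base shifts of each other are as close as kmers differing by a single mismatch, which is the kind of neighbourhood the two-dimensional layout tries to preserve.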

First, download reads from a SELEX [experiment](https://www.ebi.ac.uk/ena/data/view/PRJEB3289) so that 8-mer counts can be computed for them.

In [4]:
f = """ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172922/fastq/HNF4A_TGACAG20NGA_AF_1.fastq.gz
ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172922/fastq/HNF4A_TGACAG20NGA_AF_2.fastq.gz
ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172922/fastq/HNF4A_TGACAG20NGA_AF_3.fastq.gz
ftp.sra.ebi.ac.uk/vol1/ERA172/ERA172922/fastq/HNF4A_TGACAG20NGA_AF_4.fastq.gz
""".split()
for i in f:
    !wget --no-clobber {i}

File ‘HNF4A_TGACAG20NGA_AF_1.fastq.gz’ already there; not retrieving.

File ‘HNF4A_TGACAG20NGA_AF_2.fastq.gz’ already there; not retrieving.

File ‘HNF4A_TGACAG20NGA_AF_3.fastq.gz’ already there; not retrieving.

File ‘HNF4A_TGACAG20NGA_AF_4.fastq.gz’ already there; not retrieving.



### Calculate kmer counts

Then calculate 8-mer counts for your data. Currently only jellyfish text output is supported. (Note also that jellyfish needs the `--disk` option for 7-mers and 8-mers.)

In [5]:
%%bash
K=8
for i in *.fastq.gz
do
    OUT="$(basename "$i" .fastq.gz).${K}mer_counts.jf"
    echo "$OUT"
    # Skip files that have already been counted
    if [ ! -e "${OUT}" ];
    then
        zcat "$i" | /usr/bin/time -v jellyfish count -o "$OUT" --text -m ${K} -s 1M --bf-size 1G -t 16 --disk /dev/stdin
    fi
done

HNF4A_TGACAG20NGA_AF_1.8mer_counts.jf
HNF4A_TGACAG20NGA_AF_2.8mer_counts.jf
HNF4A_TGACAG20NGA_AF_3.8mer_counts.jf
HNF4A_TGACAG20NGA_AF_4.8mer_counts.jf
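
To inspect the count files outside the browser, a minimal reader might look like the sketch below. It assumes that each data line of the text output holds a kmer followed by whitespace and an integer count, and it skips anything else (for example a header line). The function name `read_kmer_counts` is hypothetical and is not part of `huddinge_tsne_browser`.

```python
import io
import re

# A data line is assumed to be "<KMER> <COUNT>"; other lines are ignored.
KMER_LINE = re.compile(r"^([ACGT]+)\s+(\d+)$")


def read_kmer_counts(lines):
    """Return a dict mapping kmer -> count from an iterable of text lines."""
    counts = {}
    for line in lines:
        match = KMER_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


# Small in-memory stand-in for a count file
example = io.StringIO(u"AAAAAAAA 12\nAAAAAAAC 3\nnot a count line\n")
counts = read_kmer_counts(example)
```

With a real file, the same function could be called as `read_kmer_counts(open("HNF4A_TGACAG20NGA_AF_1.8mer_counts.jf"))`.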


## Initialize layout

Initialize a `TsneMapper()` with the default kmer layout and add the 8-mer count data.

In [98]:
import huddinge_tsne_browser.tsne_mapper as htm 
import huddinge_tsne_browser.huddinge_browser as hhb
import huddinge_tsne_browser.datashaderselect
reload(huddinge_tsne_browser.datashaderselect)
reload(hhb)

reload(htm)


<module 'huddinge_tsne_browser.tsne_mapper' from '/home/kpalin/software/huddinge_tsne_browser/huddinge_tsne_browser/tsne_mapper.pyc'>

In [99]:
tsne = htm.TsneMapper()
kmer_size=8
for i in range(4):
    tsne.add_kmercounts("HNF4A_{}".format(i+1),
                        "HNF4A_TGACAG20NGA_AF_{}.{}mer_counts.jf".format(i+1,kmer_size))


2018-01-11 13:24:19,375:read_data:INFO:Read 65536 sequences.
2018-01-11 13:24:19,379:read_data:INFO:Setting embedding from input data


## Browse

Create the browser object and display the browsing window. The browsing tools can be selected at the top right. The main display at the top left shows the laid-out kmers, colored according to the counts loaded above. Clicking the main display gives a table of the kmers in the selected rectangle at the top right, and a more detailed figure at the bottom with a point-wise hover tool for the counts. The coloring criterion of the main plot can be selected from the drop-down menu. These interactive features require a running Jupyter server.


In [100]:
br=hhb.HuddingBrowser(tsne)
p = br.holoview_plot()
p

2018-01-11 13:24:19,794:__init__:INFO:Initialized DataShaderSelect


datashade Value <class 'datashader.reductions.mean'>
Value Range: 0.0 7495.0
Subscribing to selection


In [101]:
%%opts Points [tools=['box_select', 'lasso_select']]
# The selection stream used below lives in holoviews.streams
from holoviews import streams

# Declare some points
points = hv.Points(np.random.randn(1000,2 ))

# Declare points as source of selection stream
selection = streams.Selection1D(source=points)

# Write function that uses the selection indices to slice points and compute stats
def selected_info(index):
    selected = points.iloc[index]
    if index:
        label = 'Mean x, y: %.3f, %.3f' % tuple(selected.array().mean(axis=0))
    else:
        label = 'No selection'
    #print selected.dframe()
    #hv.Histogram(selected,dimension="y")
    return selected.relabel(label).opts(style=dict(color='red'))

def koe(index):
    print(index)
    
selection.add_subscriber(koe)
# Combine points and DynamicMap
del(selection)
points #+ hv.DynamicMap(selected_info, streams=[selection])


In [70]:
h=p.DynamicMap.III["Points"]._selection

AttributeError: 'Points' object has no attribute '_selection'

In [65]:
h._stream

AttributeError: 'Points' object has no attribute '_stream'

In [16]:
import holoviews as hv
#br.tap_zoom.hist().relabel('dimension=Counts')
x,y = np.mgrid[-50:51, -50:51] * 0.1

img = hv.Image(np.sin(x**2+y**2), bounds=(-1,-1,1,1))
hmap = hv.HoloMap({phase: img.clone(np.sin(x**2+y**2+phase))
                   for phase in np.linspace(0, np.pi*2, 6)}, kdims='Phase')
hmap.hist(num_bins=100, dimension=['x', 'y'], weight_dimension='z', mean_weighted=True)

A dataframe of the selected kmers can be obtained from the `br.selected` attribute:

In [9]:
br.selected.head()

TypeError: object of type 'NoneType' has no len()

In [None]:
br.selected.to_csv("selected_kmers.tsv",sep="\t")
!head selected_kmers.tsv

## Custom values in plot

You can replace the kmer counts with values you compute yourself.

In [None]:
# Pseudocount for kmer count estimates
p_cnt = (tsne.embedding[tsne.data_dims]+1.0)

# Scale to mean count 1.0
norm_cnt = p_cnt/(p_cnt*4**-8).sum()

# log fold change per cycle
ln_fold_change = np.log(norm_cnt).diff(axis=1).drop("HNF4A_1",axis=1)

# Mean fold change weighted by number of reads
w_mean_ln_fold_change = (p_cnt*ln_fold_change ).sum(axis=1)/p_cnt.sum(axis=1)

annot = tsne.embedding.join(norm_cnt,rsuffix="_norm").join(ln_fold_change,rsuffix="_lnfold")
annot["MeanFold"] = w_mean_ln_fold_change
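
To make the arithmetic above easier to follow, here is the same pseudocount, normalization, and log fold change calculation on a toy table. The four "kmers" and the column names `cycle_1` and `cycle_2` are made up for illustration. Dividing by the column mean is equivalent to the `(p_cnt*4**-8).sum()` scaling above when all 4^8 kmers are present in the table.

```python
import numpy as np
import pandas as pd

# Toy counts for four hypothetical kmers over two selection cycles
counts = pd.DataFrame(
    {"cycle_1": [9, 9, 9, 9], "cycle_2": [39, 9, 9, 19]},
    index=["A", "C", "G", "T"],
    columns=["cycle_1", "cycle_2"],
)

# Pseudocount, as in the cell above
p_cnt = counts + 1.0

# Scale each cycle to mean count 1.0
norm_cnt = p_cnt / p_cnt.mean(axis=0)

# Natural-log fold change between consecutive cycles; the first column
# has no predecessor, so it is dropped
ln_fold_change = np.log(norm_cnt).diff(axis=1).drop("cycle_1", axis=1)
print(ln_fold_change)
```

In this toy table, kmer `A` doubles its normalized count (log fold change ln 2), `C` and `G` halve theirs, and `T` stays at the mean.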


In [None]:
import huddinge_tsne_browser.tsne_mapper as htm 
import huddinge_tsne_browser.huddinge_browser as hhb
import huddinge_tsne_browser.datashaderselect
reload(huddinge_tsne_browser.datashaderselect)
reload(hhb)

reload(htm)


In [None]:
tsne_fold = htm.TsneMapper()
tsne_fold.set_kmer_values(annot[[x for x in annot.columns if x.endswith("_lnfold")] + ["MeanFold"]])
#tsne_fold.set_kmer_values(annot[["MeanFold"]])

br_fold = hhb.HuddingBrowser(tsne_fold)

br_fold.holoview_plot()



In [None]:
((tsne.embedding[tsne.data_dims]+1.0)*ln_fold_change)/(tsne.embedding[tsne.data_dims]+1.0).sum(axis=1)

In [None]:
br.set_kmer_values

['#313695',
 '#4575b4',
 '#74add1',
 '#abd9e9',
 '#e0f3f8',
 '#ffffbf',
 '#fee090',
 '#fdae61',
 '#f46d43',
 '#d73027',
 '#a50026']