# Mapping

The function [](`~genominterv.remapping.interval_distance`) converts coordinates of one set of
genomic intervals into distances to the closest interval in a second set.

In [1]:
#| echo: false
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%config InlineBackend.figure_format = 'svg'

## Remapping functions

The function `remap` converts the coordinates of a single interval into distances to the closest interval in a second set:

In [2]:
from genominterv.remapping import remap

single_interval = (300, 400)
other_intervals = [(0, 100), (1000, 1100)]

remap(single_interval, other_intervals)

[(200, 300)]

The same call, but also returning the coordinates of the annotation interval that the query is proximal to:

In [3]:
remap((300, 400), [(0, 100), (1000, 1100)], include_prox_coord=True)

[(200, 300, 0, 100)]

The function `interval_distance` converts coordinates of one set of genomic intervals into distances to the closest interval in a second set. Using `relative=True` returns distances relative to the span between the two flanking annotation intervals.

In [4]:
annot = pd.DataFrame(dict(chrom='chrX', start=[1, 20], end=[2, 25]))
annot

Unnamed: 0,chrom,start,end
0,chrX,1,2
1,chrX,20,25


In [5]:
query = pd.DataFrame(dict(chrom='chrX', start=[3, 5], end=[15, 7], some_data=['foo', 'bar'], other_data=['A', 'B']))
query

Unnamed: 0,chrom,start,end,some_data,other_data
0,chrX,3,15,foo,A
1,chrX,5,7,bar,B


In [6]:
from genominterv.remapping import interval_distance

interval_distance(query, annot)

Unnamed: 0,start,end,chrom
0,1,9.0,chrX
1,-5,-9.0,chrX
2,3,5.0,chrX
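
The three rows above can be reproduced by hand. The following is a minimal pure-Python sketch, not the library's implementation: the helper name `sketch_distances` is hypothetical, and it assumes the query lies strictly between two non-overlapping flanking annotations. Judging from the output above, a query that crosses the midpoint between the flanking annotations is split there, and the part nearest the downstream annotation gets negative distances measured from that annotation's start.

```python
def sketch_distances(query, upstream, downstream):
    """Illustrative only: distances for a query lying between two annotations.

    Assumes upstream[1] <= query[0] < query[1] <= downstream[0].
    """
    q_start, q_end = query
    up_end = upstream[1]
    down_start = downstream[0]
    midpoint = (up_end + down_start) / 2

    pieces = []
    if q_start < midpoint:
        # Part closest to the upstream annotation: positive distances from its end.
        piece_end = min(q_end, midpoint)
        pieces.append((q_start - up_end, piece_end - up_end))
    if q_end > midpoint:
        # Part closest to the downstream annotation: negative distances from its
        # start, ordered so that the value nearest zero comes first.
        piece_start = max(q_start, midpoint)
        pieces.append((q_end - down_start, piece_start - down_start))
    return pieces


annot_pairs = [(1, 2), (20, 25)]
rows = sketch_distances((3, 15), *annot_pairs) + sketch_distances((5, 7), *annot_pairs)
print(rows)  # [(1, 9.0), (-5, -9.0), (3, 5)]
```

The query `(3, 15)` crosses the midpoint at 11 and so contributes two rows, while `(5, 7)` lies entirely on the upstream side and contributes one.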


In [8]:
interval_distance(query, annot, relative=True)

Unnamed: 0,start,end,chrom
0,0.055556,0.5,chrX
1,-0.277778,-0.5,chrX
2,0.166667,0.277778,chrX
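
The relative values above are consistent with dividing each absolute distance by the gap between the two flanking annotations (20 - 2 = 18). A quick check, using the absolute distances displayed by the earlier `interval_distance(query, annot)` call; this is plain arithmetic on the displayed numbers, not a call into the library:

```python
# Absolute distances as displayed by interval_distance(query, annot).
absolute = [(1, 9.0), (-5, -9.0), (3, 5.0)]

# Gap between the end of the first annotation (2) and the start of the second (20).
span = 20 - 2

# Scale every distance by the gap and round to the six decimals shown in the table.
relative = [(round(s / span, 6), round(e / span, 6)) for s, e in absolute]
print(relative)
```

A relative distance of 0.5 (or -0.5) therefore marks the midpoint between the two annotations.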


For most applications, the most useful function is `remap_interval_data`, which does the remapping while preserving all the columns of the query data frame. It also reports the start and end coordinates before remapping (columns ending in `_orig`) and the coordinates of the most proximal interval in the annotation set (columns ending in `_prox`).

In [10]:
from genominterv.remapping import remap_interval_data
remap_interval_data(query, annot)

Unnamed: 0,start,end,start_prox,end_prox,chrom,start_orig,end_orig,some_data,other_data
0,1,9.0,1,2,chrX,3,15,foo,A
1,-5,-9.0,20,25,chrX,3,15,foo,A
2,3,5.0,1,2,chrX,5,7,bar,B


In [11]:
remap_interval_data(query, annot, relative=True)

Unnamed: 0,start,end,start_prox,end_prox,chrom,start_orig,end_orig,some_data,other_data
0,0.055556,0.5,1,2,chrX,3,15,foo,A
1,-0.277778,-0.5,20,25,chrX,3,15,foo,A
2,0.166667,0.277778,1,2,chrX,5,7,bar,B
