# GarNet

GarNet works in two steps:
- **mapping** a BED file of peaks to transcription factor (TF) motif/gene complexes;
- **regressing** those motifs' binding affinities against gene expression.

```python
from GarNet import map_peaks, TF_regression

garnet_file = "../example_data/hg19/hg19.garnet.pickle"

# DNase-seq peaks for A549 cells
peaks = "../example_data/example_peaks/wgEncodeUWDukeDnaseA549.fdr01peaks.hg19.bed"

# Alternative example peak files:
# peaks = "../example_data/example_peaks/A549_FOXA1_broadPeak.bed"
# peaks = "../example_data/example_peaks/JUNB_ChIP_A549.bed"
# peaks = "../example_data/example_peaks/CTCF_ChIP_A549.bed"
# peaks = ["../example_data/example_peaks/A549_FOXA1_broadPeak.bed", "../example_data/example_peaks/JUNB_ChIP_A549.bed", "../example_data/example_peaks/CTCF_ChIP_A549.bed"]
```
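Before anything is intersected, the peaks file has to be parsed. Below is a minimal sketch of reading BED peaks in plain Python. It accepts either one path or a list of paths, which is what the commented-out `peaks = [...]` line suggests GarNet allows. `read_bed` is a hypothetical helper, not part of GarNet, and GarNet's own parser may differ.

```python
from collections import defaultdict


def read_bed(paths):
    """Parse one or more BED files into {chrom: [(start, end, name), ...]}.

    BED coordinates are 0-based and half-open. Only the first three columns
    are required; the optional 4th column is the feature name ("." if absent).
    """
    if isinstance(paths, str):
        paths = [paths]
    peaks = defaultdict(list)
    for path in paths:
        with open(path) as handle:
            for line in handle:
                # Skip blank lines and track/browser/comment header lines.
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                name = fields[3] if len(fields) > 3 else "."
                peaks[fields[0]].append((int(fields[1]), int(fields[2]), name))
    # Sort each chromosome's peaks by start coordinate for later interval queries.
    for intervals in peaks.values():
        intervals.sort()
    return dict(peaks)
```

Grouping by chromosome up front is what makes a per-chromosome intersection possible later on.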


## Map Peaks

```python
df = map_peaks(peaks, garnet_file)
```

```
02:10:32 - GarNet: INFO - Mapping peaks against genome from garnet-file ../example_data/hg19/hg19.garnet.pickle
02:10:32 - GarNet: INFO - Unpacking garnet file (this can take a while)... 
02:48:14 - GarNet: INFO - Constructing representation of peaks file ../example_data/example_peaks/wgEncodeUWDukeDnaseA549.fdr01peaks.hg19.bed...
02:48:14 - GarNet: INFO -   - Peaks file does not seem to have been generated by pickle, proceeding to parse...
02:48:15 - GarNet: INFO -   - Parse complete, constructing IntervalTrees...
02:48:23 - GarNet: INFO - Computing intersection of peaks with reference...
02:48:23 - GarNet: INFO - Computing intersection operation of IntervalTrees for each chromosome...
```

Unpacking the garnet file is by far the slowest step: about 38 minutes in this run, compared with seconds for parsing the peaks and building the IntervalTrees.
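The log describes building IntervalTrees and intersecting them with the reference one chromosome at a time. Below is a minimal sketch of that per-chromosome overlap test, under our own assumptions. It swaps the interval tree for merged, sorted intervals plus binary search (Python's `bisect`). `merge` and `motifs_in_peaks` are hypothetical helpers, not GarNet functions.

```python
import bisect


def merge(intervals):
    """Merge overlapping or touching (start, end) intervals; input need not be sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def motifs_in_peaks(peaks, motifs):
    """Keep motif records that overlap a peak on the same chromosome.

    peaks:  {chrom: [(start, end), ...]}
    motifs: {chrom: [(start, end, label), ...]}
    Coordinates are half-open, as in BED.
    """
    hits = []
    for chrom, chrom_motifs in motifs.items():
        # Merged peaks are disjoint and sorted, so their ends are sorted too.
        merged = merge(peaks.get(chrom, []))
        starts = [s for s, _ in merged]
        for m_start, m_end, label in chrom_motifs:
            # Last merged peak that starts before the motif ends.
            i = bisect.bisect_left(starts, m_end) - 1
            if i >= 0 and merged[i][1] > m_start:
                hits.append((chrom, m_start, m_end, label))
    return hits
```

Merging first matters: with disjoint peaks, the last peak starting before a motif ends also has the largest end, so a single comparison decides overlap. A real interval tree avoids that preprocessing and can also report *which* peaks overlap.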


```python
df.head()
```

|   | chrom | motifStart | motifEnd | motifName | motifScore | geneName | geneStart | geneEnd | peakName |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chrX | 10009681 | 10009687 | MZF1_1-4 | 0.395137 | WWC3 | 9983794 | 10112518 | . |
| 1 | chrX | 25039141 | 25039160 | REST | 0.015574 | ARX | 25021812 | 25034065 | . |
| 2 | chrX | 25039207 | 25039218 | EBF1 | 0.029736 | ARX | 25021812 | 25034065 | . |
| 3 | chrX | 25039140 | 25039161 | NRSE | 0.019347 | ARX | 25021812 | 25034065 | . |
| 4 | chrX | 25039140 | 25039161 | REST | 0.029062 | ARX | 25021812 | 25034065 | . |


Save the mapped table so the regression can be run in a fresh session:

```python
df.to_csv("../example_data/temp.tsv", sep="\t", header=True, index=False)
```

## TF Regression

```python
import pandas as pd
from GarNet import TF_regression  # re-import in case the kernel was restarted

expression = "../example_data/expression/Tgfb_exp.tsv"
df = pd.read_csv("../example_data/temp.tsv", sep="\t")
df.head()
```

|   | chrom | motifStart | motifEnd | motifName | motifScore | geneName | geneStart | geneEnd | peakName |
|---|---|---|---|---|---|---|---|---|---|
| 0 | chrX | 10009681 | 10009687 | MZF1_1-4 | 0.395137 | WWC3 | 9983794 | 10112518 | . |
| 1 | chrX | 25039141 | 25039160 | REST | 0.015574 | ARX | 25021812 | 25034065 | . |
| 2 | chrX | 25039207 | 25039218 | EBF1 | 0.029736 | ARX | 25021812 | 25034065 | . |
| 3 | chrX | 25039140 | 25039161 | NRSE | 0.019347 | ARX | 25021812 | 25034065 | . |
| 4 | chrX | 25039140 | 25039161 | REST | 0.029062 | ARX | 25021812 | 25034065 | . |
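`TF_regression` reports a `Targets` list of genes for each transcription factor. One plausible way to get from the mapped rows to per-TF targets is to group rows by `motifName` and keep each gene's highest `motifScore`. This is an assumption for illustration, not GarNet's documented logic. `targets_by_tf` is a hypothetical helper. Rows are plain dicts, the shape that pandas `df.to_dict("records")` produces.

```python
from collections import defaultdict


def targets_by_tf(rows):
    """Group mapped motif rows into {motifName: {geneName: best motifScore}}.

    Assumption: when one TF has several motif hits near the same gene, keep
    the highest motifScore. GarNet may aggregate differently.
    """
    grouped = defaultdict(dict)
    for row in rows:
        tf, gene, score = row["motifName"], row["geneName"], row["motifScore"]
        grouped[tf][gene] = max(score, grouped[tf].get(gene, float("-inf")))
    return dict(grouped)


rows = [  # the first rows of the mapped table above
    {"motifName": "MZF1_1-4", "geneName": "WWC3", "motifScore": 0.395137},
    {"motifName": "REST", "geneName": "ARX", "motifScore": 0.015574},
    {"motifName": "EBF1", "geneName": "ARX", "motifScore": 0.029736},
    {"motifName": "NRSE", "geneName": "ARX", "motifScore": 0.019347},
    {"motifName": "REST", "geneName": "ARX", "motifScore": 0.029062},
]
print(targets_by_tf(rows)["REST"])  # {'ARX': 0.029062}
```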


```python
tfs = TF_regression(df, expression)
```

```
10:36:24 - GarNet: INFO - Performing linear regression for 422 transcription factor expression profiles...
  return self.params / self.bse
  return (self.a < x) & (x < self.b)
  return (self.a < x) & (x < self.b)
  cond2 = cond0 & (x <= self.a)
```


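### The lines above are warnings, not errors. Occasionally a motifScore is 0, which causes a division by zero inside the regression. The affected factor is reported without a p-value (likely the case for ATF3 below, which has a slope of 0 and no p-value).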
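Here is why a zero score can end in a division by zero. If a factor's motif scores are all identical (for example all 0), the regressor has no variance. The fit then has no usable slope, and the standard error used in `t = slope / standard error` can be 0. Below is a minimal sketch of one such regression, assuming one ordinary least squares fit per TF of expression on motif score. We don't know exactly how `TF_regression` builds its regressors. `tf_slope_t` is a hypothetical helper. It returns a t statistic rather than a p-value, because a p-value needs a t-distribution CDF such as `scipy.stats.t.sf`.

```python
import math


def tf_slope_t(scores, expression):
    """Fit expression = a + b * score by ordinary least squares for one TF.

    Returns (slope, t_statistic), where t = slope / standard error. When that
    division would be by zero, NaN is returned instead of emitting a warning.
    """
    n = len(scores)
    if len(set(scores)) <= 1:
        # Constant motif scores (e.g. all 0): no variance, so no usable slope.
        return 0.0, float("nan")
    mean_x = sum(scores) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, expression))
    slope = sxy / sxx
    if n < 3:
        return slope, float("nan")  # no residual degrees of freedom
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    # A perfect fit gives se == 0; report NaN rather than dividing by zero.
    t = slope / se if se > 0 else float("nan")
    return slope, t


print(tf_slope_t([0.0, 0.0, 0.0], [5.1, 2.3, 7.8]))  # (0.0, nan)
```

The all-zero case gives a slope of 0 and a missing statistic, the same pattern as the ATF3 row.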

```python
tfs
```

|   | Transcription Factor | Slope | P-Value | Targets |
|---|---|---|---|---|
| 0 | AHR | 67.850000 | 0.375410 | [SPTBN2, EFEMP2, SYT7, STARD10, MPZL2, NRG1, S... |
| 1 | ALX4 | 0.744295 | 0.717978 | [PRSS23, LSR, PFKFB3, SLC9A3R1, ACLY, ST3GAL5,... |
| 2 | AML | -22.553067 | 0.094280 | [FHL1, LIMS1, PDGFC, GLIS3, EPB41L4B, SPDEF, L... |
| 3 | AP-2alphaA | 6.296716 | 0.313251 | [IRX5, PKP3, ERF, TGFB1, IL11, VIM, NRG1, TYRO... |
| 4 | AR | -2.922142 | 0.414332 | [PARD6A, CTNND1, TLN2, SPINT1, RHOV, PMP22, EP... |
| 5 | ARNT | 2.214483 | 0.366826 | [AP1S2, CPT1A, EFEMP2, STARD10, ERF, CRB3, PPP... |
| 6 | ATF1 | 11.183138 | 0.056749 | [CYLD, FA2H, PARD6A, MPZL2, ERF, GEM, MYC, TYR... |
| 7 | ATF2 | -0.847601 | 0.434626 | [CYLD, AKTIP, RELA, MPZL2, ERF, IL11, CRLF1, T... |
| 8 | ATF2:c-Jun | 0.666818 | 0.603021 | [CYLD, PARD6A, CCND1, MPZL2, ERF, CRLF1, TMEM5... |
| 9 | ATF3 | 0.000000 | NaN | [CDH1, CYLD, FA2H, PARD6A, CLCF1, ERF, ITGB1, ... |
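To pick out factors worth a closer look, one option is to drop rows without a p-value and sort the rest. Below is a plain-Python sketch over the rows shown above. The `alpha=0.1` cutoff and `rank_tfs` are illustrative choices, not GarNet defaults, and no multiple-testing correction is applied. With pandas, `tfs.dropna(subset=["P-Value"]).sort_values("P-Value")` gives the same ordering without the cutoff.

```python
import math

# (Transcription Factor, Slope, P-Value) from the table above;
# None marks the missing ATF3 p-value.
results = [
    ("AHR", 67.850000, 0.375410),
    ("ALX4", 0.744295, 0.717978),
    ("AML", -22.553067, 0.094280),
    ("AP-2alphaA", 6.296716, 0.313251),
    ("AR", -2.922142, 0.414332),
    ("ARNT", 2.214483, 0.366826),
    ("ATF1", 11.183138, 0.056749),
    ("ATF2", -0.847601, 0.434626),
    ("ATF2:c-Jun", 0.666818, 0.603021),
    ("ATF3", 0.000000, None),
]


def rank_tfs(rows, alpha=0.1):
    """Return (tf, slope, p) rows with p < alpha, most significant first.

    Rows without a p-value (None or NaN) are dropped.
    """
    kept = [
        r for r in rows
        if r[2] is not None and not math.isnan(r[2]) and r[2] < alpha
    ]
    return sorted(kept, key=lambda r: r[2])


for tf, slope, p in rank_tfs(results):
    print(f"{tf}\t{slope:+.2f}\t{p:.3f}")  # ATF1 (p=0.057), then AML (p=0.094)
```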


### Pass an `output_dir` to `TF_regression` to get the scatterplot for each transcription factor.