## Accompanying code for KNeMAP

Here we directly use the communities computed on the prior network, as described in the accompanying publication.
Other community detection algorithms can be applied; several are available in our prior publication VOLTA (10.1093/bioinformatics/btab642).

The pre-processed expression data is provided. 

The provided code allows you to compute the KNeMAP fingerprint, based on gene deregulation and a partitioning of the prior network. These fingerprints can be used to compare exposures or to cluster them. Helpful functions can be found in our prior publication VOLTA (10.1093/bioinformatics/btab642).

The knemap_functions can be applied to any dataset; this notebook shows the application to the CMAP dataset (10.1126/science.1132939) as used in the accompanying publication.

In [1]:
import pandas as pd
from src import knemap_functions as knemap
import json

In [2]:
#load data


communities = pd.read_csv("data/communities.csv")
chemicals_hl60 = pd.read_csv('data/HL60_small.csv', index_col=0)
chemicals_mcf7 = pd.read_csv('data/MCF7_small.csv', index_col=0)
chemicals_pc3 = pd.read_csv('data/PC3_small.csv', index_col=0)


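# load the gene identifier mapping (JSON content); the context manager
# closes the file even if parsing fails
with open("data/prior_gene_mapping.txt") as f:
    mapping = json.load(f)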

In [3]:
# convert the communities DataFrame into the dict object needed
# the dict object has the same format as all community outputs from VOLTA:
# node ID (as string) -> list containing its community ID

communities = {
    str(node): [community]
    for node, community in zip(
        communities["Node ID"].to_list(), communities["Community ID"].to_list()
    )
}
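
To make the target format concrete, here is a minimal sketch of the same conversion on a toy table. The CSV content and the name `toy_communities` are made up for illustration; only the column names `Node ID` and `Community ID` and the "string key, one-element list" layout are taken from the cell above.

```python
import csv
import io

# Hypothetical stand-in for data/communities.csv (values are invented).
toy_csv = io.StringIO(
    "Node ID,Community ID\n"
    "0,2\n"
    "1,2\n"
    "2,5\n"
)

# Each node ID (kept as a string) maps to a one-element list
# holding the ID of the community the node belongs to.
toy_communities = {
    row["Node ID"]: [int(row["Community ID"])]
    for row in csv.DictReader(toy_csv)
}

print(toy_communities)  # {'0': [2], '1': [2], '2': [5]}
```

Wrapping the community ID in a list keeps the layout identical to the cell above, where every node has exactly one community.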

First, remove all genes that are not present in all datasets and in the prior network/communities.
Then adjust the gene identifiers to match the prior identifiers (encoded in mapping).

In [4]:
chemicals_hl60 = knemap.remove_unknown_genes(mapping, chemicals_hl60)
chemicals_mcf7 = knemap.remove_unknown_genes(mapping, chemicals_mcf7)
chemicals_pc3 = knemap.remove_unknown_genes(mapping, chemicals_pc3)

hl60_new = knemap.update_dataframe_indexes(chemicals_hl60, mapping)
mcf7_new = knemap.update_dataframe_indexes(chemicals_mcf7, mapping)
pc3_new = knemap.update_dataframe_indexes(chemicals_pc3, mapping)

hl60_new = knemap.remove_unknown_genes(communities, hl60_new)
mcf7_new = knemap.remove_unknown_genes(communities, mcf7_new)
pc3_new = knemap.remove_unknown_genes(communities, pc3_new)

original number of rows  11868
number of known genes  11442
original number of rows  11868
number of known genes  11442
original number of rows  11868
number of known genes  11442
original number of rows  11442
number of known genes  11346
original number of rows  11442
number of known genes  11346
original number of rows  11442
number of known genes  11346


Compute the KNeMAP fingerprint vector for each data instance.

In [5]:
fingerprint_hl60_temp = knemap.create_fingerprints(communities, hl60_new, top=100, bottom=100, add="_hl60")
fingerprint_pc3_temp = knemap.create_fingerprints(communities, pc3_new, top=100, bottom=100, add="_pc3")
fingerprint_mcf7_temp = knemap.create_fingerprints(communities, mcf7_new, top=100, bottom=100, add="_mcf7")
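
The following is a rough sketch of what a fingerprint of this kind might look like, not the implementation of `knemap.create_fingerprints`. It assumes that the `top` and `bottom` most deregulated genes are selected and that each community's value is the fraction of those selected genes that fall into it. This assumption is consistent with the output printed later in this notebook, whose values are multiples of 0.005 = 1/200 for `top=100, bottom=100`, but it is not confirmed here. The function name `sketch_fingerprint` and all toy values are hypothetical.

```python
from collections import Counter


def sketch_fingerprint(communities, scores, top=2, bottom=2):
    """Fraction of the selected genes (top + bottom ranked) per community.

    communities: dict gene ID -> [community ID] (same layout as above)
    scores:      dict gene ID -> deregulation score for one exposure
    Assumes top + bottom does not exceed the number of genes.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Most up-regulated genes plus most down-regulated genes.
    selected = ranked[:top] + ranked[len(ranked) - bottom:]
    counts = Counter(communities[gene][0] for gene in selected)
    all_ids = sorted({ids[0] for ids in communities.values()})
    # Communities without any selected gene get 0.0.
    return {cid: counts.get(cid, 0) / (top + bottom) for cid in all_ids}


# Invented toy data: six genes in three communities, one exposure.
toy_communities = {"g1": [0], "g2": [0], "g3": [1], "g4": [2], "g5": [1], "g6": [2]}
toy_scores = {"g1": 2.5, "g2": 1.8, "g3": 0.1, "g4": -0.2, "g5": -1.9, "g6": -2.2}

toy_fingerprint = sketch_fingerprint(toy_communities, toy_scores, top=2, bottom=2)
print(toy_fingerprint)  # {0: 0.5, 1: 0.25, 2: 0.25}
```

In this toy case, genes g1 and g2 (top) and g5 and g6 (bottom) are selected, so community 0 receives 2/4 and communities 1 and 2 receive 1/4 each.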

Sort the fingerprints by their keys so that all communities are in the same order.
For each exposure, the fingerprint is then the list of its values.

In [6]:
fingerprint_hl60 = knemap.sort_fingerprint_by_key(fingerprint_hl60_temp)
fingerprint_pc3 = knemap.sort_fingerprint_by_key(fingerprint_pc3_temp)
fingerprint_mcf7 = knemap.sort_fingerprint_by_key(fingerprint_mcf7_temp)

In [7]:
fingerprint_hl60

{'methazolamide_hl60': {0: 0.0,
  1: 0.0,
  2: 0.0,
  3: 0.0,
  4: 0.0,
  5: 0.005,
  6: 0.015,
  7: 0.0,
  8: 0.0,
  9: 0.0,
  10: 0.0,
  11: 0.005,
  12: 0.0,
  13: 0.005,
  14: 0.0,
  15: 0.0,
  16: 0.0,
  17: 0.005,
  18: 0.0,
  19: 0.0,
  20: 0.0,
  21: 0.0,
  22: 0.0,
  23: 0.005,
  24: 0.005,
  25: 0.005,
  26: 0.0,
  27: 0.005,
  28: 0.0,
  29: 0.005,
  30: 0.0,
  31: 0.0,
  32: 0.0,
  33: 0.0,
  34: 0.0,
  35: 0.005,
  36: 0.0,
  37: 0.0,
  38: 0.0,
  39: 0.0,
  40: 0.0,
  41: 0.0,
  42: 0.0,
  43: 0.0,
  44: 0.0,
  45: 0.0,
  46: 0.0,
  47: 0.0,
  48: 0.0,
  49: 0.0,
  50: 0.0,
  51: 0.005,
  52: 0.005,
  53: 0.005,
  54: 0.0,
  55: 0.0,
  56: 0.0,
  57: 0.005,
  58: 0.0,
  59: 0.005,
  60: 0.005,
  61: 0.005,
  62: 0.0,
  63: 0.005,
  64: 0.0,
  65: 0.005,
  66: 0.005,
  67: 0.0,
  68: 0.005,
  69: 0.0,
  70: 0.0,
  71: 0.0,
  72: 0.0,
  73: 0.005,
  74: 0.0,
  75: 0.0,
  76: 0.0,
  77: 0.0,
  78: 0.0,
  79: 0.01,
  80: 0.0,
  81: 0.0,
  82: 0.0,
  83: 0.0,
  84: 0.0,
  85: 

The fingerprints can now be used, for example, to compute distances between pairs of exposures or to cluster them.
Helpful functions can be found in our prior publication VOLTA (10.1093/bioinformatics/btab642).
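
As a minimal illustration of such a comparison, the sketch below computes pairwise Euclidean distances between fingerprints using only the standard library. It does not use VOLTA; the choice of Euclidean distance, the helper name `euclidean`, and the three toy fingerprints are assumptions made for this example. The input follows the nested layout shown above (exposure name, then community ID, then value).

```python
import math


def euclidean(fp_a, fp_b):
    """Euclidean distance between two fingerprints (community ID -> value)."""
    # Iterate over a shared, sorted key set so both vectors are aligned.
    keys = sorted(set(fp_a) | set(fp_b))
    return math.sqrt(sum((fp_a.get(k, 0.0) - fp_b.get(k, 0.0)) ** 2 for k in keys))


# Invented fingerprints over three communities.
toy_fingerprints = {
    "chem_a_hl60": {0: 0.0, 1: 0.005, 2: 0.015},
    "chem_b_hl60": {0: 0.0, 1: 0.005, 2: 0.010},
    "chem_c_hl60": {0: 0.020, 1: 0.0, 2: 0.0},
}

# Distance for every unordered pair of exposures.
names = sorted(toy_fingerprints)
distances = {
    (a, b): euclidean(toy_fingerprints[a], toy_fingerprints[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# The most similar pair is the one with the smallest distance.
closest = min(distances, key=distances.get)
print(closest)  # ('chem_a_hl60', 'chem_b_hl60')
```

The resulting pairwise distances could then be passed to any clustering method that accepts a distance matrix.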