Tutorial notebook for the `cmapBQ` package. `cmapBQ` allows for targeted retrieval of relevant gene expression data from the resources provided by The Broad Institute and LINCS Project.

In [1]:
!pip -q install --upgrade cmapBQ

In [1]:
import os
import pandas as pd
import numpy as np
import seaborn as sns
import requests

import matplotlib.pyplot as plt


In [3]:
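# Path where the credentials file will be saved (including the directory)
credentials_filepath = '/Users/koyunkyung/Documents/lincs_workshop/content/BQ-demo-credentials.json'

# URL where the credentials file is hosted
url = 'https://s3.amazonaws.com/data.clue.io/api/bq_creds/BQ-demo-credentials.json'

# Download the credentials file from the URL
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed
os.makedirs(os.path.dirname(credentials_filepath), exist_ok=True)  # make sure the target directory exists
with open(credentials_filepath, 'w') as f:
    f.write(response.text)

print(f"Credentials file saved to {credentials_filepath}")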

Credentials file saved to /Users/koyunkyung/Documents/lincs_workshop/content/BQ-demo-credentials.json


In [5]:
import cmapBQ.query as cmap_query
import cmapBQ.config as cmap_config

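# Use the same path the credentials file was saved to above. The run recorded
# below pointed at a different directory (.../gene_expression_lincs/content/),
# which is a likely cause of the credentials error in its output.
credentials_filepath = '/Users/koyunkyung/Documents/lincs_workshop/content/BQ-demo-credentials.json'

# Set up credentials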
cmap_config.setup_credentials(credentials_filepath)
bq_client = cmap_config.get_bq_client()

print("BigQuery client is successfully set up!")

GOOGLE_APPLICATION_CREDENTIALS not valid, check credentials parameter in ~/.cmapBQ/config.txt


SystemExit: 1
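
A credentials error like the one above is easier to diagnose if the file is checked before it is handed to `cmapBQ`. The sketch below is a minimal pre-flight check, not part of `cmapBQ`: the helper name `check_credentials_file` is hypothetical, and it only verifies that the path exists and parses as a JSON object, on the assumption that the credentials file is JSON (as its `.json` extension suggests).

```python
import json
import os
import tempfile


def check_credentials_file(path):
    """Return (ok, message) for a candidate JSON credentials file (hypothetical helper)."""
    if not os.path.isfile(path):
        return False, f"file not found: {path}"
    try:
        with open(path) as f:
            content = json.load(f)
    except json.JSONDecodeError as err:
        return False, f"not valid JSON: {err}"
    if not isinstance(content, dict):
        return False, "JSON top level is not an object"
    return True, "file exists and parses as a JSON object"


# Demonstration on throwaway files: one valid path, one mismatched path
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "creds.json")
    with open(good_path, "w") as f:
        json.dump({"placeholder": "value"}, f)
    wrong_path = os.path.join(tmp, "other_dir", "creds.json")

    good_result = check_credentials_file(good_path)
    missing_result = check_credentials_file(wrong_path)

print(good_result)
print(missing_result)
```

Running the same check on `credentials_filepath` before `cmap_config.setup_credentials(...)` would surface a mismatched directory immediately.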

#### Table Schema Information

In [5]:
cmap_query.list_tables()

_includes_clustered_tables: <bound method TableDirectory._includes_clustered_tables of TableDirectory(compoundinfo='cmap-big-table.cmap_lincs_public_views.compoundinfo', genetic_pertinfo='cmap-big-table.cmap_lincs_public_views.genetic_pertinfo', geneinfo='cmap-big-table.cmap_lincs_public_views.geneinfo', cellinfo='cmap-big-table.cmap_lincs_public_views.cellinfo', instinfo='cmap-big-table.cmap_lincs_public_views.instinfo', siginfo='cmap-big-table.cmap_lincs_public_views.siginfo', level3='cmap-big-table.cmap_lincs_public_views.L1000_Level3_cid', level3_rid='cmap-big-table.cmap_lincs_public_views.L1000_Level3_rid', level3_landmark='cmap-big-table.cmap_lincs_public_views.L1000_Level3_landmark', level4='cmap-big-table.cmap_lincs_public_views.L1000_Level4_cid', level4_rid='cmap-big-table.cmap_lincs_public_views.L1000_Level4_rid', level4_landmark='cmap-big-table.cmap_lincs_public_views.L1000_Level4_landmark', level5='cmap-big-table.cmap_lincs_public_views.L1000_Level5_cid', level5_rid='cmap-b

In [6]:
cmap_query.get_table_info(bq_client, 'cmap-big-table.cmap_lincs_public_views.compoundinfo') 
     

Unnamed: 0,column_name,data_type
0,pert_id,STRING
1,cmap_name,STRING
2,target,STRING
3,moa,STRING
4,canonical_smiles,STRING
5,inchi_key,STRING
6,compound_aliases,STRING


In [7]:
# MoA (Mechanism of Action): the mechanism by which a drug or biomolecule produces a specific biological effect
#   - e.g. target proteins, biochemical processes, cell- or tissue-level effects, physiological effects

config = cmap_config.get_default_config()
compoundinfo_table = config.tables.compoundinfo

# Count the distinct compounds (pert_id) annotated with each MoA
QUERY = (
    'SELECT moa, '
    'COUNT(DISTINCT(pert_id)) AS count '
    f'FROM `{compoundinfo_table}` '
    'GROUP BY moa'
)

cmap_query.run_query(bq_client, QUERY).result().to_dataframe()

Unnamed: 0,moa,count
0,,31262
1,CAR agonist,2
2,ALK inhibitor,7
3,Akt inhibitor,13
4,BCL inhibitor,11
...,...,...
653,Telomerase reverse transcriptase expression in...,1
654,Gonadotropin releasing factor hormone receptor...,2
655,Gonadotropin releasing factor hormone receptor...,1
656,"Precursor for food preservatives, plasticizers...",1



Level/Types of Data
---

Landmark Genes
: chosen so that together they broadly represent the expression pattern of all genes. (L1000 data measures expression for 978 landmark genes.)

---

Level 1 - LXB
: raw fluorescent intensity (FI) values measured for every bead detected by Luminex scanners
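FI quantifies the strength of the fluorescent signal and is used to measure gene expression levels.
Used to check the quality of the experimental data and as the starting point for normalization and filtering.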


Level 2 - GEX
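: gene expression levels for landmark genes, deconvoluted from the measured FI values
Deconvolution: the step that removes background and nonspecific signal from the fluorescent signal and isolates the pure gene expression values.
The cleaned expression data, with experimental bias removed, is used to compare expression levels of specific genes.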


Level 3 - NORM, INF
: normalized gene expression (NORM), plus values for additional genes inferred (INF) from the normalized values of the 978 landmark genes
Used to analyze relative changes in gene expression after drug treatment, and to compare and visualize gene expression across conditions.


Level 4 - ZS
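: Z-scores for each gene (how many standard deviations a sample's expression value lies from the distribution of that gene's expression across the whole plate)
Indicates the statistical strength and direction of the change in gene expression.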
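
As a rough illustration of the Level 4 idea, the sketch below computes per-gene z-scores across the wells of one plate. The plate values and gene/well names are made up, and the plain mean and standard deviation follow the description above; the production pipeline may use different (for example, more robust) statistics.

```python
import numpy as np
import pandas as pd

# Hypothetical normalized expression for one plate: rows = genes, columns = wells
plate = pd.DataFrame(
    {
        "well_A": [8.0, 5.0],
        "well_B": [10.0, 5.5],
        "well_C": [12.0, 4.5],
        "well_D": [10.0, 5.0],
    },
    index=["gene_1", "gene_2"],
)

# For each gene: z = (value - plate mean) / plate standard deviation
plate_mean = plate.mean(axis=1)
plate_std = plate.std(axis=1, ddof=0)
z_scores = plate.sub(plate_mean, axis=0).div(plate_std, axis=0)

print(z_scores.round(3))
```

A positive z-score marks a well where the gene is expressed above the plate average, a negative one below it.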


Level 5 - MODZ
: replicate-collapsed z-score vectors (replicate collapse generates one differential expression vector, signature)
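A representative gene expression signature built from Level 4 data by combining the replicates of one treatment condition, i.e. a statistical combination of repeated experiments run under the same condition.
Used for drug repurposing, MoA inference, gene network analysis, and similar tasks.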

(Level1-2: only the 978 landmark features / Level3-5: 978 landmark + 11,350 inferred)
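
Replicate collapse can be sketched as a weighted average in which replicates that agree with the others count more. The code below is a simplified illustration and not the exact CMap procedure: the z-scores are invented, each replicate is weighted by the sum of its Spearman correlations with the other replicates, and the 0.01 floor on the weights is an assumption made here so that a discordant replicate is down-weighted without producing a negative weight.

```python
import numpy as np
import pandas as pd

# Hypothetical Level 4 z-scores: rows = genes, columns = replicates of one condition
replicates = pd.DataFrame(
    {
        "rep_1": [2.0, -1.0, 0.5, 1.5],
        "rep_2": [1.8, -1.2, 0.4, 1.1],
        "rep_3": [-0.5, 0.3, 1.0, -0.2],  # a replicate that disagrees with the others
    },
    index=["gene_1", "gene_2", "gene_3", "gene_4"],
)

# Spearman correlation between replicates = Pearson correlation of their ranks
corr = replicates.rank().corr().to_numpy().copy()
np.fill_diagonal(corr, 0.0)  # ignore each replicate's correlation with itself

# Weight = summed correlation with the other replicates, floored at 0.01 (assumed)
raw_weights = np.clip(corr.sum(axis=1), 0.01, None)
weights = raw_weights / raw_weights.sum()

# One collapsed differential expression vector (signature) per condition
signature = pd.Series(
    replicates.to_numpy() @ weights, index=replicates.index, name="collapsed_z"
)

print(weights.round(3))
print(signature.round(3))
```

Here `rep_3` receives almost no weight, so the signature stays close to the two concordant replicates.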

---

compoundinfo
: each row describes one unique compound (mechanism of action, MoA, and target)

instinfo
: sample-level metadata for each experiment. Records per-replicate information (treatment time point, dose)

siginfo
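: Level 5 metadata (information about each signature). Includes biological activity metrics such as Transcriptional Activity Score and Replication Correlation.
- Transcriptional Activity Score
    : measures how strongly a given treatment (perturbagen) affects gene expression. It gauges biological activity: how much transcriptional change a drug, compound, or genetic manipulation induced in the cell.
    A high score means the treatment induced a strong change in gene expression.
    - Transcription
        : the process by which genetic information in DNA is copied into RNA; the first step of gene expression.
- Replication Correlation
    : how consistent the gene expression results are across replicates of the same experimental condition.
    A high value means the replicates agree with each other.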

geneinfo
: gene name, Ensembl ID, gene ID, and gene type (used to identify genes in the data)
- Ensembl ID
    : a unique identifier for a gene. It distinguishes species and also links genes to their transcripts and proteins.

cellinfo
: cell line metadata (cell name, CCLE (Cancer Cell Line Encyclopedia) name, and cell lineage)


genetic_pertinfo
: each row is a genetic perturbation, with its type (overexpression [oe], knockdown [sh], CRISPR [xpr]). Includes the related gene ID and Ensembl ID.


Plate Types
---

The plate is the basic unit of an experiment. L1000 experiments are run on plates with 384 wells.

- perturbagen plate: 
    aliquots of the treatment perturbagens, i.e. small dispensed portions of each treatment compound.

- RNA (Lysate) plate:
    perturbagen-treated cells.
    The plate holding the treated cells (carries information on treatment condition, cell line, time point, and replicate number).

- detection plate:
    amplicon (derived from perturbagen-treated cell lysates) that has been hybridized to Luminex beads.
    An amplicon is an amplified gene fragment; the amplicons bound to Luminex beads are detected to measure gene expression.

The following plates are used for experimental validation of probe pools and for quality control assessments:

- LITMUS plate:
    indicate CMap data quality

- ASGARD plate:
    contain a number of well annotated, bioactive compounds with well defined MOAs



In [9]:
## This query may take up to a minute
query = "SELECT COUNT(DISTINCT(sig_id)) AS num_level5_sigs FROM `cmap-big-table.cmap_lincs_public_views.siginfo`"


# run_query returns a QueryJob, so result() and to_dataframe() are needed to get a DataFrame.
cmap_query.run_query(query=query, client=bq_client).result().to_dataframe()

Unnamed: 0,num_level5_sigs
0,1202656


In [10]:

cmap_query.list_cmap_compounds(bq_client)

Unnamed: 0,cmap_name
0,L-theanine
1,L-citrulline
2,BRD-A18795974
3,BRD-A27924917
4,BRD-A35931254
...,...
33622,TAS-301
33623,goserelin-acetate
33624,triptorelin
33625,T-98475


In [11]:
cmap_query.list_cmap_targets(bq_client)

Unnamed: 0,target,count
0,,31262
1,NR1I3,3
2,ACVR1,3
3,AKT3,7
4,AKT1,10
...,...,...
886,WASL,1
887,EIF2S1,2
888,MTTP,1
889,HSD3B2,1


In [12]:
cmap_query.list_cmap_moas(bq_client)

Unnamed: 0,moa,count
0,,31262
1,CAR agonist,2
2,ALK inhibitor,7
3,Akt inhibitor,13
4,BCL inhibitor,11
...,...,...
653,Telomerase reverse transcriptase expression in...,1
654,Gonadotropin releasing factor hormone receptor...,2
655,Gonadotropin releasing factor hormone receptor...,1
656,"Precursor for food preservatives, plasticizers...",1


**cmap_cell**

In [13]:
cell_lineage = 'skin'  # optional lineage filter; unused while the argument below is commented out
core_cell_lines = ['A375', 'A549', 'HCC515', 'HEPG2', 'MCF7', 'PC3', 'VCAP', 'HT29', 'HA1E']

cell_table = cmap_query.cmap_cell(
    bq_client,
    cell_iname=core_cell_lines,
    primary_disease=None,
    # cell_lineage=cell_lineage,
    verbose=False
)

cell_table.head(10)

Unnamed: 0,cell_iname,cellosaurus_id,donor_age,donor_age_death,donor_disease_age_onset,doubling_time,growth_medium,provider_catalog_id,feature_id,cell_type,donor_ethnicity,donor_sex,donor_tumor_phase,cell_lineage,primary_disease,subtype,provider_name,growth_pattern,ccle_name,cell_alias
0,HCC515,CVCL_5136,,,,,,,,tumor,Unknown,F,Unknown,lung,lung cancer,carcinoma,,adherent,HCC515_LUNG,HCC0515
1,HA1E,,,,,60.0,MEM-ALPHA (Invitrogen A1049001) supplemented w...,,,normal,Unknown,Unknown,Unknown,kidney,normal kidney sample,normal kidney sample,,unknown,HA1E_KIDNEY,
2,A549,CVCL_0023,58.0,,,48.0,F-12K ATCC catalog # 3-24,CCL-185,c-4,tumor,Caucasian,M,Primary,lung,lung cancer,non small cell carcinoma,ATCC,adherent,A549_LUNG,A 549
3,A375,CVCL_0132,54.0,,,36.0,DMEM Invitrogen catalog # 11995-65,CRL-1619,c-127,tumor,Unknown,F,Metastatic,skin,skin cancer,melanoma,ATCC,adherent,A375_SKIN,A 375|A-375
4,HT29,CVCL_0320,44.0,,,36.0,McCoy's 5A Invitrogen catalog # 166-82,HTB-38,c-272,tumor,Caucasian,F,Primary,large_intestine,colon cancer,adenocarcinoma,ATCC,adherent,HT29_LARGE_INTESTINE,HT 29
5,HEPG2,CVCL_0027,15.0,,,84.0,EMEM ATCC catalog # 3-23,HB-8065,,tumor,Caucasian,M,Primary,liver,liver cancer,carcinoma,ATCC,adherent,HEPG2_LIVER,Hep G2|HEP-G2
6,MCF7,CVCL_0031,40.0,,,72.0,EMEM ATCC catalog # 3-23,HTB-22,c-438,tumor,Caucasian,F,Metastatic,breast,breast cancer,adenocarcinoma,ATCC,adherent,MCF7_BREAST,IBMF-7
7,PC3,CVCL_0035,62.0,,,72.0,F-12K ATCC catalog # 3-24,CRL-1435,c-214,tumor,Caucasian,M,Metastatic,prostate,prostate cancer,adenocarcinoma,ATCC,mix,PC3_PROSTATE,PC.3|PC-3
8,VCAP,CVCL_2235,,,,220.0,DMEM ATCC catalog # 3-22,,,tumor,Caucasian,M,Metastatic,prostate,prostate cancer,adenocarcinoma,ATCC,adherent,VCAP_PROSTATE,Vcap


**cmap_genes**

In [14]:
# Small sample of genes, as symbols and as the matching gene IDs
gene_symbol_list = ['EGFR', 'NR3C1', 'MDM2']
gene_id_list = [1956, 2908, 4193]

gene_table = cmap_query.cmap_genes(
    bq_client,
    # gene_id=gene_id_list,  # alternative: look up by gene ID
    gene_symbol=gene_symbol_list,
    # feature_space='landmark',  # alternative: restrict to landmark genes
    feature_space='aig',
    # verbose=True
)

gene_table

Unnamed: 0,gene_id,gene_symbol,ensembl_id,gene_title,gene_type,src,feature_space
0,1956,EGFR,ENSG00000146648,epidermal growth factor receptor,protein-coding,NCBI,landmark
1,2908,NR3C1,ENSG00000113580,nuclear receptor subfamily 3 group C member 1,protein-coding,NCBI,landmark
2,4193,MDM2,ENSG00000135679,MDM2 proto-oncogene,protein-coding,NCBI,best inferred


**cmap_genetic_perts**

In [15]:
#small sample of genes
gene_symbol_list = ['EGFR', 'NR3C1', 'MDM2']
gene_id_list = [1956, 2908, 4193] 

genetic_perts_table = cmap_query.cmap_genetic_perts(bq_client,
    pert_id=None,
    cmap_name=None,
    gene_id=gene_id_list,
    gene_title=None,
    verbose=True
)

genetic_perts_table.sample(10)

Table: 
 cmap-big-table.cmap_lincs_public_views.genetic_pertinfo
Query:
 SELECT * FROM cmap-big-table.cmap_lincs_public_views.genetic_pertinfo WHERE gene_id in UNNEST([1956, 2908, 4193])


Unnamed: 0,pert_id,cmap_name,pert_type,gene_id,gene_title,ensembl_id,gene_type,feature_space
112,BRDN0003789926,MDM2,trt_xpr,4193,MDM2 proto-oncogene,ENSG00000135679,protein-coding,best inferred
1,BRDN0000464999,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
13,BRDN0000554051,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
60,TRCN0000199100,EGFR,trt_sh,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
98,CMAP-HSF-MDM2,MDM2,trt_oe,4193,MDM2 proto-oncogene,ENSG00000135679,protein-coding,best inferred
12,BRDN0000553914,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
19,BRDN0000553731,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
2,BRDN0000465000,EGFR,trt_oe,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark
105,TRCN0000355725,MDM2,trt_sh,4193,MDM2 proto-oncogene,ENSG00000135679,protein-coding,best inferred
70,TRCN0000121067,EGFR,trt_sh,1956,epidermal growth factor receptor,ENSG00000146648,protein-coding,landmark


**cmap_compounds**

In [16]:
# MoA to search for. To filter by target instead, pass e.g. target='EGFR' and moa=None.
moa = 'MDM inhibitor'

compound_table = cmap_query.cmap_compounds(
    bq_client,
    pert_id=None,
    cmap_name=None,
    moa=moa,
    target=None,
    compound_aliases=None,
    limit=None,
    verbose=False
)

compound_table

Unnamed: 0,pert_id,cmap_name,target,moa,canonical_smiles,inchi_key,compound_aliases
0,BRD-K84987553,MDM2-inhibitor,MDM2,MDM inhibitor,OB(O)c1ccc(cc1)C(=O)/C=C/c2ccc(I)cc2,BYMGWCQXSPGCMW-XCVCLJGOSA-N,MDM-2-INHIBITOR
1,BRD-A12230535,nutlin-3,MDM2,MDM inhibitor,COc1ccc(C2=NC(C(N2C(=O)N2CCNC(=O)C2)c2ccc(Cl)c...,BDUHCSBCVGXTJM-UHFFFAOYSA-N,NUTLIN-3A
2,BRD-K00317371,RITA,MDM2,MDM inhibitor,OCc1ccc(s1)-c1ccc(o1)-c1ccc(CO)s1,KZENBFUSKMWCJF-UHFFFAOYSA-N,rita
3,BRD-K64925568,AMG-232,MDM2,MDM inhibitor,CC(C)[C@@H](CS(=O)(=O)C(C)C)N1[C@@H]([C@H](C[C...,DRLCSJFKKILATL-YWCVFVGNSA-N,
4,BRD-K17349619,HLI-373,MDM2,MDM inhibitor,CN(C)CCCNc1c2ccccc2n(C)c2nc(=O)n(C)c(=O)c12,LNRUPMPQQGPSQT-UHFFFAOYSA-N,
5,BRD-K65924316,serdemetan,MDM2,MDM inhibitor,C(Cc1c[nH]c2ccccc12)Nc1ccc(Nc2ccncc2)cc1,CEGSUKYESLWKJP-UHFFFAOYSA-N,
6,BRD-K60219430,serdemetan,MDM2,MDM inhibitor,C(Cc1c[nH]c2ccccc12)Nc1cccc(Nc2ccncc2)c1,JCKLHFMOFAYQHE-UHFFFAOYSA-N,
7,BRD-K93095519,SJ-172550,MDM4,MDM inhibitor,CCOc1cc(cc(Cl)c1OCC(=O)OC)C=C1C(=O)N(N=C/1C)c1...,RKKFQJXGAQWHBZ-YVLHZVERSA-N,
8,BRD-A16035238,SAR405838,MDM2,MDM inhibitor,CC(C)(C)CC1NC(C(c2cccc(Cl)c2F)C11C(=O)Nc2cc(Cl...,IDKAKZRYYDCJDU-UHFFFAOYSA-N,
9,BRD-K73255294,nutlin-3,MDM2,MDM inhibitor,COc1ccc(C2=N[C@@H]([C@@H](N2C(=O)N2CCNC(=O)C2)...,BDUHCSBCVGXTJM-IZLXSDGUSA-N,


In [17]:
compound_table.cmap_name.unique()

array(['MDM2-inhibitor', 'nutlin-3', 'RITA', 'AMG-232', 'HLI-373',
       'serdemetan', 'SJ-172550', 'SAR405838'], dtype=object)
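
The table has ten rows but only eight unique names, because one `cmap_name` can map to several `pert_id` values. The sketch below makes that explicit with a `groupby`; it rebuilds the two relevant columns from the output above so it runs without BigQuery access.

```python
import pandas as pd

# pert_id and cmap_name columns copied from the compound table above
compounds = pd.DataFrame(
    {
        "pert_id": [
            "BRD-K84987553", "BRD-A12230535", "BRD-K00317371", "BRD-K64925568",
            "BRD-K17349619", "BRD-K65924316", "BRD-K60219430", "BRD-K93095519",
            "BRD-A16035238", "BRD-K73255294",
        ],
        "cmap_name": [
            "MDM2-inhibitor", "nutlin-3", "RITA", "AMG-232",
            "HLI-373", "serdemetan", "serdemetan", "SJ-172550",
            "SAR405838", "nutlin-3",
        ],
    }
)

# Number of distinct pert_ids behind each compound name
ids_per_name = (
    compounds.groupby("cmap_name")["pert_id"].nunique().sort_values(ascending=False)
)

# Names that correspond to more than one pert_id
multi_id_names = ids_per_name[ids_per_name > 1]
print(multi_id_names)
```

In this table the names with more than one `pert_id` are `nutlin-3` and `serdemetan`, whose rows also differ in `canonical_smiles` and `inchi_key`.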

**cmap_profiles**

In [18]:
list_of_sample_ids = [
  ''
]

list_of_cmap_names = [
    'afatinib',
    'dacomitinib', 
    'dovitinib',
    'erlotinib',
    'gefitinib'
]

instinfo_table = cmap_query.cmap_profiles(
    bq_client,
    sample_id=None,
    return_fields='all', 
    cmap_name=list_of_cmap_names 
)

instinfo_table.head(10)

Unnamed: 0,bead_batch,nearest_dose,pert_dose,pert_dose_unit,pert_idose,pert_time,pert_itime,pert_time_unit,cell_mfc_name,pert_mfc_id,...,pert_type,cell_iname,id,qc_pass,dyn_range,inv_level_10,build_name,failure_mode,project_code,cmap_name
0,b18,10.0,10.0,uM,10 uM,24.0,24 h,h,MFE319,BRD-K66175015,...,trt_cp,MFE319,,1,5.90798,2889.0,,,ERBB2,afatinib
1,b18,10.0,10.0,uM,10 uM,24.0,24 h,h,HFL1,BRD-K64052750,...,trt_cp,HFL1,,1,15.9461,2663.0,,,LUNG,gefitinib
2,f2b5,2.22,2.0,uM,2.22 uM,24.0,24 h,h,HS578T,BRD-K70401845-001-04-1,...,trt_cp,HS578T,,0,5.56761,1873.5,,qc_iqr,LJP,erlotinib
3,b21,3.33,3.33333,uM,3.33 uM,24.0,24 h,h,HCC515,BRD-K66175015,...,trt_cp,HCC515,,1,16.4222,3695.0,,,PBIOA,afatinib
4,b12,10.0,10.0,uM,10 uM,48.0,48 h,h,HEK293T,BRD-K70401845,...,trt_cp,HEK293T,,1,16.5793,3606.0,,,HSF,erlotinib
5,b18,10.0,10.0,uM,10 uM,24.0,24 h,h,VMCUB1,BRD-K66175015,...,trt_cp,VMCUB1,,0,3.87084,1918.0,,dyn_range,ERBB2,afatinib
6,b21,0.04,0.041152,uM,0.04 uM,24.0,24 h,h,A549,BRD-K64052750,...,trt_cp,A549,,1,20.4613,3171.5,,,PBIOA,gefitinib
7,f2b5,0.08,0.08,uM,0.08 uM,6.0,6 h,h,MCF10A,BRD-A58767537-001-01-2,...,trt_cp,MCF10A,,0,6.65421,1780.0,,qc_iqr,LJP,afatinib
8,b18,0.12,0.1,uM,0.12 uM,24.0,24 h,h,BT474,BRD-K66175015,...,trt_cp,BT474,,1,7.49361,2345.5,,,ERBB2,afatinib
9,b33,3.33,3.33333,uM,3.33 uM,12.0,12 h,h,MCF10A.WT,BRD-K85402309,...,trt_cp,MCF10A,,1,12.3534,4614.0,,,LCP,dovitinib
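
Profiles carry a `qc_pass` flag and, when they fail, a `failure_mode`. A common next step is to keep only passing profiles and see why the others failed. The sketch below does this on the ten rows shown above, with four columns copied by hand so it runs offline; applying it to the full `instinfo_table` would use the same expressions.

```python
import pandas as pd

# Four columns of the ten profile rows shown above
profiles = pd.DataFrame(
    {
        "cmap_name": [
            "afatinib", "gefitinib", "erlotinib", "afatinib", "erlotinib",
            "afatinib", "gefitinib", "afatinib", "afatinib", "dovitinib",
        ],
        "cell_iname": [
            "MFE319", "HFL1", "HS578T", "HCC515", "HEK293T",
            "VMCUB1", "A549", "MCF10A", "BT474", "MCF10A",
        ],
        "qc_pass": [1, 1, 0, 1, 1, 0, 1, 0, 1, 1],
        "failure_mode": [
            None, None, "qc_iqr", None, None,
            "dyn_range", None, "qc_iqr", None, None,
        ],
    }
)

# Keep profiles that passed QC and count them per compound
passed = profiles[profiles["qc_pass"] == 1]
passed_per_compound = passed["cmap_name"].value_counts()

# Tally the recorded reasons for the failed profiles
failure_reasons = profiles.loc[profiles["qc_pass"] == 0, "failure_mode"].value_counts()

print(passed_per_compound)
print(failure_reasons)
```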


**cmap_sig**

In [19]:
list_of_sig_ids = [
  ''
]

list_of_cmap_names = [
    'afatinib',
    'dacomitinib', 
    'dovitinib',
    'erlotinib',
    'gefitinib'
]


siginfo_table = cmap_query.cmap_sig(
    bq_client,
    sig_id = None, 
    cell_iname = core_cell_lines, 
    cmap_name = list_of_cmap_names,
    return_fields='priority'
)

In [20]:
siginfo_table.sample(10)

Unnamed: 0,sig_id,pert_id,cmap_name,pert_type,cell_iname,pert_itime,pert_idose,nsample,det_plates,build_name,project_code,ss_ngene,cc_q75,tas
19,ASG003_A549_24H:B16,BRD-K70401845,erlotinib,trt_cp,A549,24 h,10 uM,5,ASG003_A549_24H_X21_B40|ASG003_A549_24H_X22_B4...,,ASG,205,0.3976,0.288689
598,RAD001_MCF7_6H:BRD-K64052750-001-04-3:0.0137,BRD-K64052750,gefitinib,trt_cp,MCF7,6 h,0.01 uM,2,RAD001_MCF7_6H_X1_F1B5_DUO52HI53LO|RAD001_MCF7...,,RAD,69,0.02,0.037564
336,LJP005_MCF7_24H:M21,BRD-K70401845,erlotinib,trt_cp,MCF7,24 h,1.11 uM,3,LJP005_MCF7_24H_X1_B17|LJP005_MCF7_24H_X2_B17|...,,LJP,85,0.3611,0.177155
219,LJP006_HCC515_24H:N08,BRD-K64052750,gefitinib,trt_cp,HCC515,24 h,3.33 uM,1,LJP006_HCC515_24H_X3_B19,,LJP,0,0.0,0.0
719,DOSVAL001_HEPG2_24H:BRD-K66175015:10,BRD-K66175015,afatinib,trt_cp,HEPG2,24 h,10 uM,3,DOSVAL001_HEPG2_24H_X1_B18|DOSVAL001_HEPG2_24H...,,DOSVAL,505,0.67,0.588185
880,ASG003_MCF7_6H:M19,BRD-K66175015,afatinib,trt_cp,MCF7,6 h,10 uM,5,ASG003_MCF7_6H_X1_B41|ASG003_MCF7_6H_X2_B41|AS...,,ASG,449,0.7086,0.570367
962,ASG002_A375_24H:F18,BRD-K64052750,gefitinib,trt_cp,A375,24 h,0.12 uM,1,ASG002_A375_24H_X1_B35,,ASG,0,0.0,0.0
704,ABY001_A549_XH:BRD-K66175015:2.5:24,BRD-K66175015,afatinib,trt_cp,A549,24 h,2.5 uM,3,ABY001_A549_XH_X1_B15,,ABY,240,0.28,0.262129
679,PBIOA014_HCC515_24H:C02,BRD-K64052750,gefitinib,trt_cp,HCC515,24 h,3.33 uM,3,PBIOA014_HCC515_24H_X1_B21|PBIOA014_HCC515_24H...,,PBIOA,341,0.49,0.413338
685,PBIOA014_HEPG2_24H:C02,BRD-K64052750,gefitinib,trt_cp,HEPG2,24 h,3.33 uM,3,PBIOA014_HEPG2_24H_X1_B21|PBIOA014_HEPG2_24H_X...,,PBIOA,124,0.38,0.219499
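
Several sampled signatures have `tas` of 0 and a single replicate, so it often helps to filter on activity before pulling expression data. The sketch below keeps signatures above a TAS cutoff with at least two replicates; the cutoffs `min_tas = 0.25` and `min_replicates = 2` are illustrative assumptions, not recommended values, and the five rows are copied from the output above.

```python
import pandas as pd

# Five of the sampled signatures shown above
sigs = pd.DataFrame(
    {
        "sig_id": [
            "ASG003_A549_24H:B16",
            "LJP006_HCC515_24H:N08",
            "DOSVAL001_HEPG2_24H:BRD-K66175015:10",
            "ASG003_MCF7_6H:M19",
            "PBIOA014_HEPG2_24H:C02",
        ],
        "cmap_name": ["erlotinib", "gefitinib", "afatinib", "afatinib", "gefitinib"],
        "nsample": [5, 1, 3, 5, 3],
        "cc_q75": [0.3976, 0.0, 0.67, 0.7086, 0.38],
        "tas": [0.288689, 0.0, 0.588185, 0.570367, 0.219499],
    }
)

min_tas = 0.25        # assumed cutoff, for illustration only
min_replicates = 2    # assumed cutoff, for illustration only

# Keep active, replicated signatures, strongest first
strong = (
    sigs[(sigs["tas"] >= min_tas) & (sigs["nsample"] >= min_replicates)]
    .sort_values("tas", ascending=False)
    .reset_index(drop=True)
)

print(strong[["sig_id", "cmap_name", "tas"]])
```

The resulting `sig_id` values can then be passed to `cmap_matrix` in place of a random sample.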


**cmap_matrix**

In [21]:
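# Pick 10 random Level 5 signatures; the selection (and the matrix below) changes on every run
list_of_sig_ids = list(siginfo_table.sample(10)['sig_id'])
list_of_sample_ids = list(instinfo_table.sample(10)['sample_id'])  # not used in this cell

data = cmap_query.cmap_matrix(
    bq_client,
    cid=list_of_sig_ids,
    feature_space='landmark',
    data_level='level5'
)

# The result is a GCT-style object; data_df holds the values (rows: rid, columns: cid)
data.data_df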

Running query ... (1/1)
Total bytes processed: 225.6MiB
Total bytes billed: 225.6MiB
Pivoting Dataframes to GCT objects
Complete


rid,ABY001_HEPG2_XH:BRD-K66175015:10:3,ASG003_A549_6H:M19,CPD002_MCF7_6H:BRD-K64052750-001-07-6:10,LJP002_MCF7_6H:BRD-A58767537-001-01-2:0.08,LJP005_A549_24H:M19,LJP006_HCC515_24H:G23,LJP007_HA1E_24H:G11,PBIOA021_MCF7_24H:B09,REP.A015_PC3_24H:F21,REP.A018_PC3_24H:H17
10007,0.540840,8.310253,-0.757865,1.465134,-0.2489,0.3439,-0.20825,0.083920,0.221447,0.358831
1001,-0.889796,1.863513,-0.552089,0.493927,-1.4091,1.2563,0.07780,-0.860864,-0.011650,-2.675952
10013,1.194222,2.081213,0.771277,0.056501,-0.1043,0.4046,0.42825,-0.513505,-0.521240,-0.402096
10038,-0.177832,1.308670,-0.331695,0.623025,-0.7946,-0.7691,0.08430,-1.112166,0.033910,0.602061
10046,0.387298,0.636002,0.306061,-0.758560,-0.1004,10.0000,-0.36995,0.126641,-0.648677,-0.821133
...,...,...,...,...,...,...,...,...,...,...
994,-4.077174,1.388879,-0.255321,-0.052926,-2.0047,-0.1075,0.57185,-0.312149,0.229657,-1.517709
9943,0.302782,-0.750202,-0.444833,-0.215579,-0.5232,-1.4021,0.04130,-0.677258,0.220788,-0.232304
9961,0.425509,0.561366,-0.244685,-0.559313,-0.7307,1.2469,0.53575,-0.491613,1.453854,0.131459
998,-0.267433,-0.356068,0.754056,-0.450552,-1.4049,-1.5909,-0.14070,0.182923,1.008909,1.914315
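
With genes as rows and signatures as columns, simple per-column summaries show which signatures carry strong changes. The sketch below counts genes beyond a z-score cutoff and finds the most up-regulated gene per signature; it uses the first three rows and three columns of the matrix above, and the cutoff of 2.0 is an arbitrary choice for illustration.

```python
import pandas as pd

# First three genes (rid) and three signatures (cid) of the matrix shown above
matrix = pd.DataFrame(
    {
        "ABY001_HEPG2_XH:BRD-K66175015:10:3": [0.540840, -0.889796, 1.194222],
        "ASG003_A549_6H:M19": [8.310253, 1.863513, 2.081213],
        "CPD002_MCF7_6H:BRD-K64052750-001-07-6:10": [-0.757865, -0.552089, 0.771277],
    },
    index=pd.Index(["10007", "1001", "10013"], name="rid"),
)

z_cutoff = 2.0  # assumed cutoff, for illustration only

# Number of genes per signature whose |z| reaches the cutoff
strong_gene_counts = (matrix.abs() >= z_cutoff).sum()

# Gene with the highest z-score in each signature
top_gene = matrix.idxmax()

print(strong_gene_counts)
print(top_gene)
```

On the full result the same expressions work directly on `data.data_df`.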
