### Exploring the COLOMBOS annotation dataset

In this notebook we're going to extract information from the COLOMBOS annotation dataset. Specifically, we want to know the experimental conditions behind each column of the expression matrix. Let's get started.

In [1]:
import pandas as pd
import numpy as np 
import panel as pn
import tqdm

pn.extension()

There are two annotation datasets for each organism in the COLOMBOS compendia: the reference annotation and the test annotation. We're going to use the more exhaustive of the two, the test annotation. It is a tab-separated file, so we can load it with pandas.

In [105]:
# Load data
df_test = pd.read_csv('../data/colombos_mtube_testannot_20151029.txt', sep = '\t')

In [56]:
# Take a look 
df_test.head()

Unnamed: 0,ContrastName,TestAnnotation
0,GSM28219.ch1-vs-GSM28219.ch2,MEDIUM.7H9:1
1,GSM28219.ch1-vs-GSM28219.ch2,DMSO:13uL
2,GSM28219.ch1-vs-GSM28219.ch2,STRAIN.H37Rv:1
3,GSM28219.ch1-vs-GSM28219.ch2,TEMPERATURE:37°C
4,GSM28219.ch1-vs-GSM28219.ch2,OD650:0.3


In [114]:
%load_ext blackcellmagic

In [119]:
print(
    "Most descriptive contrasts: ",
    "\n",
    df_test.ContrastName.value_counts().head(),
    "\n",
)

print(
    "Least annotated contrasts: ",
    "\n",
    df_test.ContrastName.value_counts().tail(),
    "\n",
)

print(
    "Most abundant experimental conditions",
    "\n",
    df_test.TestAnnotation.value_counts().head(),
    "\n",
)

print(
    "Least abundant exp. conditions: ",
    "\n",
    df_test.TestAnnotation.value_counts().tail(8),
    "\n",
)

Most descriptive contrasts:  
 GSM27905.ch1-vs-GSM27905.ch2    8
GSM27888.ch1-vs-GSM27888.ch2    8
GSM27896.ch1-vs-GSM27896.ch2    8
GSM27891.ch1-vs-GSM27891.ch2    8
GSM27883.ch1-vs-GSM27883.ch2    8
Name: ContrastName, dtype: int64 

Least annotated contrasts:  
 GSM91348.ch2-vs-GSM91342.ch2        3
GSM796812.ch1-vs-GSM796812.ch2      2
GSM1349746.ch1-vs-GSM1349744.ch1    2
GSM1349745.ch1-vs-GSM1349744.ch1    2
GSM796811.ch1-vs-GSM796811.ch2      2
Name: ContrastName, dtype: int64 

Most abundant experimental conditions 
 MEDIUM.7H9:1            851
STRAIN.H37Rv:1          745
TEMPERATURE:37°C        662
GROWTH.EXPONENTIAL:1    378
OD650:0.3               210
Name: TestAnnotation, dtype: int64 

Least abundant exp. conditions:  
 VALINOMYCIN:10uM       1
ARP4:4ug/ml            1
ISONIAZID:0.2ug/ml     1
ANTIMYCIN:12.5ug/ml    1
DINITROPHENOL:0.5mM    1
PA1:50ug/ml            1
AMIKACIN:10ug/ml       1
GSNO:0.02mM            1
Name: TestAnnotation, dtype: int64 



Okay, we can see that the descriptions are very heterogeneous, hence the tidy format (one row per contrast and annotation pair). It would be nice if we could directly retrieve all of the descriptions for a given index mapping to a column in the expression dataset. Let's see how many unique contrasts we have.

In [124]:
n_unique_contrasts = df_test.ContrastName.unique().shape[0]
print(f'Unique contrasts: {n_unique_contrasts}')

Unique contrasts: 1098


Great! The number of unique contrasts exactly matches the number of columns in the expression dataset (see `transcriptome_exploratory_data_analysis.ipynb`). We can therefore use this data to get the descriptions for each experiment. It is not immediately clear how to get all of the descriptions in one go, so let's build them up step by step.
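
One possible way to collect every description in one go is a `groupby` followed by a string join. The snippet below is only a sketch on a toy DataFrame (the `toy` values are made up for illustration, they are not rows from the real file); it assumes the same two column names as `df_test`. Passing `sort=False` keeps the contrasts in order of first appearance, which matters if we want to stay aligned with the expression matrix.

```python
import pandas as pd

# Toy stand-in for df_test (illustrative values only)
toy = pd.DataFrame(
    {
        "ContrastName": ["c2", "c2", "c1", "c1", "c1"],
        "TestAnnotation": [
            "MEDIUM.7H9:1",
            "pH:5.6",
            "MEDIUM.7H9:1",
            "OD650:0.3",
            "DMSO:13uL",
        ],
    }
)

# One row per contrast, annotations joined with "_".
# sort=False keeps contrasts in order of first appearance.
joined = (
    toy.groupby("ContrastName", sort=False)["TestAnnotation"]
    .agg("_".join)
    .reset_index(name="annotation")
)
print(joined)
```

The cells that follow reach the same result with an explicit loop over the unique contrasts, which is easier to inspect step by step.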

In [169]:
df_test.tail(10)

Unnamed: 0,ContrastName,TestAnnotation
5697,GSM812986.ch1-vs-GSM812986.ch2,MEDIUM.7H9:1
5698,GSM812986.ch1-vs-GSM812986.ch2,GROWTH.EXPONENTIAL:1
5699,GSM812986.ch1-vs-GSM812986.ch2,STRAIN.H37Rv:1
5700,GSM812986.ch1-vs-GSM812986.ch2,OD600:0.7
5701,GSM812986.ch1-vs-GSM812986.ch2,LINEZOLID:0.03ug/ml
5702,GSM812985.ch1-vs-GSM812985.ch2,MEDIUM.7H9:1
5703,GSM812985.ch1-vs-GSM812985.ch2,GROWTH.EXPONENTIAL:1
5704,GSM812985.ch1-vs-GSM812985.ch2,STRAIN.H37Rv:1
5705,GSM812985.ch1-vs-GSM812985.ch2,OD600:0.7
5706,GSM812985.ch1-vs-GSM812985.ch2,LINEZOLID:0.03ug/ml


In [125]:
unique_contrasts = df_test.ContrastName.unique()

In [126]:
unique_contrasts[0]

'GSM28219.ch1-vs-GSM28219.ch2'

In [168]:
unique_contrasts[-1]

'GSM812985.ch1-vs-GSM812985.ch2'

In [133]:
# Get annots for first contrast
df_test[df_test['ContrastName'] == unique_contrasts[0]].TestAnnotation.to_list()

['MEDIUM.7H9:1',
 'DMSO:13uL',
 'STRAIN.H37Rv:1',
 'TEMPERATURE:37°C',
 'OD650:0.3',
 'ASCIDIDEMIN:1ug/ml']

In [135]:
# Join as single string 
'_'.join(df_test[df_test['ContrastName'] == unique_contrasts[0]].TestAnnotation.to_list())

'MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERATURE:37°C_OD650:0.3_ASCIDIDEMIN:1ug/ml'

In [140]:
# Get list of annotations (one joined string per contrast, with a progress bar)
annots = [
    "_".join(df_test[df_test["ContrastName"] == con].TestAnnotation.to_list())
    for con in tqdm.tqdm(unique_contrasts)
]

100%|██████████| 1098/1098 [00:00<00:00, 1677.27it/s]


In [142]:
df_annot = pd.DataFrame(
    {
        'contrast_name': unique_contrasts,
         'annotation': annots
    }
)

In [192]:
# Get the first annot according to expression matrix
df_annot[df_annot['contrast_name'] == 'GSM28219.ch1-vs-GSM28219.ch2']

Unnamed: 0,contrast_name,annotation
0,GSM28219.ch1-vs-GSM28219.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...


In [193]:
# Get the last annot according to expression matrix
df_annot[df_annot['contrast_name']== 'GSM812985.ch1-vs-GSM812985.ch2']

Unnamed: 0,contrast_name,annotation
1097,GSM812985.ch1-vs-GSM812985.ch2,MEDIUM.7H9:1_GROWTH.EXPONENTIAL:1_STRAIN.H37Rv...


We can see that the order of the expression data and of this annotation match.

Let's add the header from the COLOMBOS expression set. Note to self: this header dataset was obtained by manually selecting the header of the expression file and then transposing it in pandas.

In [190]:
header = pd.read_csv('../data/header_exp_colombos.csv')

In [191]:
header.head()

Unnamed: 0,contrast_name,Test description,Reference description,Experiment_id,Data source,Platform
0,GSM28219.ch1-vs-GSM28219.ch2,test_GSE1642_ASCI,ref_GSE1642,GSE1642,GEO,GPL1343
1,GSM28217.ch1-vs-GSM28217.ch2,test_GSE1642_121940-1,ref_GSE1642,GSE1642,GEO,GPL1343
2,GSM28218.ch1-vs-GSM28218.ch2,test_GSE1642_111891-1,ref_GSE1642,GSE1642,GEO,GPL1343
3,GSM28220.ch1-vs-GSM28220.ch2,test_GSE1642_CLOFA10,ref_GSE1642,GSE1642,GEO,GPL1343
4,GSM28224.ch1-vs-GSM28224.ch2,test_GSE1642_THIORI50,ref_GSE1642,GSE1642,GEO,GPL1343


In [194]:
df_annot.head()

Unnamed: 0,contrast_name,annotation
0,GSM28219.ch1-vs-GSM28219.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...
1,GSM28217.ch1-vs-GSM28217.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...
2,GSM28218.ch1-vs-GSM28218.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...
3,GSM28220.ch1-vs-GSM28220.ch2,MEDIUM.7H9:1_DMSO:50uL_STRAIN.H37Rv:1_TEMPERAT...
4,GSM28224.ch1-vs-GSM28224.ch2,MEDIUM.7H9:1_DMSO:50uL_STRAIN.H37Rv:1_TEMPERAT...


In [196]:
# Confirm we have the same entries 
np.all(header.contrast_name.values == df_annot.contrast_name.values)

True
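
As a small aside, an elementwise `==` only makes sense when both columns have the same length. A hedged alternative is `np.array_equal`, which returns `False` when the shapes differ instead of relying on elementwise comparison. The arrays below are toy values, not the real contrast names.

```python
import numpy as np

# Toy arrays standing in for the two contrast_name columns
a = np.array(["c1", "c2", "c3"])
b = np.array(["c1", "c2", "c3"])
c = np.array(["c1", "c2"])

# True only if shapes match and every entry is equal, in the same order
same_order = np.array_equal(a, b)

# Shapes differ, so this is False
different_length = np.array_equal(a, c)

print(same_order, different_length)
```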

Let's merge them!

In [197]:
df_annot_ = pd.merge(df_annot, header, on = 'contrast_name', how = 'inner')
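
If we wanted the merge itself to guard against duplicated or missing contrasts, `pd.merge` has `validate` and `indicator` arguments that could help. This is only a sketch on toy frames (`left` and `right` are hypothetical stand-ins for `df_annot` and `header`): `validate="one_to_one"` raises if a key is repeated on either side, and an outer merge with `indicator=True` flags keys present in only one table.

```python
import pandas as pd

# Toy stand-ins for df_annot and header (illustrative values only)
left = pd.DataFrame(
    {"contrast_name": ["c1", "c2", "c3"], "annotation": ["a", "b", "c"]}
)
right = pd.DataFrame(
    {"contrast_name": ["c1", "c2", "c4"], "Platform": ["p1", "p1", "p2"]}
)

# validate raises MergeError if contrast_name is duplicated on either side;
# indicator adds a "_merge" column: "both", "left_only" or "right_only"
checked = pd.merge(
    left,
    right,
    on="contrast_name",
    how="outer",
    validate="one_to_one",
    indicator=True,
)

# Rows that did not find a partner in the other table
unmatched = checked[checked["_merge"] != "both"]
print(unmatched)
```

In our case the check above already showed that both tables have the same entries, so an inner merge loses nothing.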

In [198]:
df_annot_.head()

Unnamed: 0,contrast_name,annotation,Test description,Reference description,Experiment_id,Data source,Platform
0,GSM28219.ch1-vs-GSM28219.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...,test_GSE1642_ASCI,ref_GSE1642,GSE1642,GEO,GPL1343
1,GSM28217.ch1-vs-GSM28217.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...,test_GSE1642_121940-1,ref_GSE1642,GSE1642,GEO,GPL1343
2,GSM28218.ch1-vs-GSM28218.ch2,MEDIUM.7H9:1_DMSO:13uL_STRAIN.H37Rv:1_TEMPERAT...,test_GSE1642_111891-1,ref_GSE1642,GSE1642,GEO,GPL1343
3,GSM28220.ch1-vs-GSM28220.ch2,MEDIUM.7H9:1_DMSO:50uL_STRAIN.H37Rv:1_TEMPERAT...,test_GSE1642_CLOFA10,ref_GSE1642,GSE1642,GEO,GPL1343
4,GSM28224.ch1-vs-GSM28224.ch2,MEDIUM.7H9:1_DMSO:50uL_STRAIN.H37Rv:1_TEMPERAT...,test_GSE1642_THIORI50,ref_GSE1642,GSE1642,GEO,GPL1343


In [200]:
df_annot_.tail()

Unnamed: 0,contrast_name,annotation,Test description,Reference description,Experiment_id,Data source,Platform
1093,GSM787707.ch1-vs-GSM787707.ch2,MEDIUM.7H9:1_STRAIN.H37Rv:1_LUPULONE:10μg/mL,test_GSE31732_LUPULONE,ref_GSE31732,GSE31732,GEO,GPL11396
1094,GSM796812.ch1-vs-GSM796812.ch2,STRAIN.H37Rv:1_5-CHLORO-8-HYDROXYQUINOLINE:1,test_GSE32157_HYDROXYQUINOLINE,ref_GSE32157,GSE32157,GEO,GPL14570
1095,GSM796811.ch1-vs-GSM796811.ch2,STRAIN.H37Rv:1_5-CHLORO-8-HYDROXYQUINOLINE:1,test_GSE32157_HYDROXYQUINOLINE,ref_GSE32157,GSE32157,GEO,GPL14570
1096,GSM812986.ch1-vs-GSM812986.ch2,MEDIUM.7H9:1_GROWTH.EXPONENTIAL:1_STRAIN.H37Rv...,test_GSE32718_LINEZOLID,ref_GSE32718,GSE32718,GEO,GPL10895
1097,GSM812985.ch1-vs-GSM812985.ch2,MEDIUM.7H9:1_GROWTH.EXPONENTIAL:1_STRAIN.H37Rv...,test_GSE32718_LINEZOLID,ref_GSE32718,GSE32718,GEO,GPL10895


In [216]:
df_annot_[df_annot_['annotation'].str.contains('CHOL')]

Unnamed: 0,contrast_name,annotation,Test description,Reference description,Experiment_id,Data source,Platform
565,GSM351166.ch2-vs-GSM351166.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_24h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_24h,GSE13978,GEO,GPL4057
566,GSM351164.ch2-vs-GSM351164.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_3h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_3h,GSE13978,GEO,GPL4057
567,GSM351165.ch2-vs-GSM351165.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_3h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_3h,GSE13978,GEO,GPL4057
568,GSM350580.ch2-vs-GSM350580.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_3h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_3h,GSE13978,GEO,GPL4057
569,GSM351167.ch2-vs-GSM351167.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_24h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_24h,GSE13978,GEO,GPL4057
570,GSM351168.ch2-vs-GSM351168.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_H37Rv_Tween_24h_1mg/ml_Cholesterol,ref_GSE13978_H37Rv_Tween_24h,GSE13978,GEO,GPL4057
571,GSM351208.ch2-vs-GSM351208.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_CDC1551_KstR_mutant_Tween_1mg/ml...,ref_GSE13978_CDC1551_Tween_1mg/ml_Cholesterol,GSE13978,GEO,GPL4057
575,GSM351277.ch2-vs-GSM351277.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_CDC1551_KstR_mutant_Tween_1mg/ml...,ref_GSE13978_CDC1551_Tween_1mg/ml_Cholesterol,GSE13978,GEO,GPL4057
576,GSM351278.ch2-vs-GSM351278.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...,test_GSE13978_CDC1551_KstR_mutant_Tween_1mg/ml...,ref_GSE13978_CDC1551_Tween_1mg/ml_Cholesterol,GSE13978,GEO,GPL4057


In [213]:
df_annot_[df_annot_['annotation'].str.contains('CHOL')]['Test description'].values[-1]

'test_GSE13978_CDC1551_KstR_mutant_Tween_1mg/ml_Cholesterol'

In [215]:
df_annot_[df_annot_['annotation'].str.contains('CHOL')]['Reference description'].values[-1]

'ref_GSE13978_CDC1551_Tween_1mg/ml_Cholesterol'

Super neat! This set of experiments was published [in this paper](https://pubmed.ncbi.nlm.nih.gov/19822655/), where the researchers show the relationship between the thiolase fadA5 and cholesterol consumption in the context of virulence.

Let's export the dataset...

In [209]:
df_annot_.to_csv('../data/experiment_annotation_master_colombos_v0.csv', index = False)

In [159]:
# Showcase selection using string matching
df_annot[df_annot.annotation.str.contains('CHOLESTEROL')]

Unnamed: 0,contrast_name,annotation
565,GSM351166.ch2-vs-GSM351166.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
566,GSM351164.ch2-vs-GSM351164.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
567,GSM351165.ch2-vs-GSM351165.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
568,GSM350580.ch2-vs-GSM350580.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
569,GSM351167.ch2-vs-GSM351167.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
570,GSM351168.ch2-vs-GSM351168.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
571,GSM351208.ch2-vs-GSM351208.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
575,GSM351277.ch2-vs-GSM351277.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...
576,GSM351278.ch2-vs-GSM351278.ch1,POLYSORBATE80:1%_MEDIUM.7H9:1_GROWTH.EXPONENTI...


In [158]:
df_annot.shape

(1098, 2)

In [149]:
# Samples with pH perturbation
df_annot[df_annot.annotation.str.contains('pH')].head()

Unnamed: 0,contrast_name,annotation
166,GSM27883.ch1-vs-GSM27883.ch2,MEDIUM.7H9:1_pH:5.6_STRAIN.H37Rv:1_TEMPERATURE...
167,GSM27884.ch1-vs-GSM27884.ch2,MEDIUM.7H9:1_pH:5.6_STRAIN.H37Rv:1_TEMPERATURE...
168,GSM27885.ch1-vs-GSM27885.ch2,MEDIUM.7H9:1_pH:5.6_STRAIN.H37Rv:1_TEMPERATURE...
169,GSM27887.ch1-vs-GSM27887.ch2,MEDIUM.7H9:1_pH:5.6_STRAIN.H37Rv:1_TEMPERATURE...
170,GSM27886.ch1-vs-GSM27886.ch2,MEDIUM.7H9:1_pH:5.6_STRAIN.H37Rv:1_TEMPERATURE...


In [161]:
# Samples with mutations
df_annot[df_annot.annotation.str.contains('MUT')]

Unnamed: 0,contrast_name,annotation
884,GSM216629.ch2-vs-GSM216629.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
885,GSM216630.ch2-vs-GSM216630.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
886,GSM216631.ch2-vs-GSM216631.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
887,GSM216632.ch2-vs-GSM216632.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
888,GSM216633.ch2-vs-GSM216633.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
889,GSM216634.ch2-vs-GSM216634.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
890,GSM216636.ch2-vs-GSM216636.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
891,GSM216635.ch2-vs-GSM216635.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
893,GSM216638.ch2-vs-GSM216638.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...
894,GSM216639.ch2-vs-GSM216639.ch1,MEDIUM.MM:1_STRAIN.H37Rv:1_FeCl3:50µM_ideR_Rv2...


In [163]:
# Number of samples with gene deletions
df_annot[df_annot.annotation.str.contains('DELETION')].shape

(139, 2)

In [180]:
# Very ugly string manipulation to get some unique experimental sets
np.unique(''.join((list(df_annot.annotation.unique()))).split(':'))[:10]

array(['+1GROWTH.EXPONENTIAL', '+1MACROPHAGE.THP-1', '+1MEDIUM.7H9',
       '+1MEDIUM.MM', '+1OXYGEN', '+1POLYSORBATE80', '+1STRAIN.CDC1551',
       '+1STRAIN.H37Rv', '+1_GROWTH.EXPONENTIAL', '+1_MEDIUM.DMEM'],
      dtype='<U35')
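
A possibly cleaner way to list the distinct condition names is to split each annotation on its last colon, working from the tidy `TestAnnotation` column instead of the joined strings. The sketch below assumes every annotation has the form `name:value`, which holds for the rows shown so far but has not been checked for the whole file; the `toy` Series is illustrative.

```python
import pandas as pd

# Toy stand-in for df_test.TestAnnotation
toy = pd.Series(
    ["MEDIUM.7H9:1", "pH:5.6", "CHOLESTEROL.TIME:1440min", "MEDIUM.7H9:1"],
    name="TestAnnotation",
)

# Split on the last ":" only, giving a condition column and a value column
parts = toy.str.rsplit(":", n=1, expand=True)
parts.columns = ["condition", "value"]

# Distinct condition names, sorted
conditions = sorted(parts["condition"].unique())
print(conditions)
```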

In this way we can directly mine for interesting conditions in our expression set. In what follows we'll showcase another approach, based on interactive widgets built with Panel.

In [57]:
# Add a dummy count for pivoting
df_test['count_']= 1

In [58]:

pivot = pd.pivot_table(
    data =df_test,
    index = 'ContrastName',
    columns = 'TestAnnotation',
    values = 'count_',
    aggfunc='sum',
    fill_value = 0
)

In [59]:
pivot.head()

ContrastName,5-CHLORO-8-HYDROXYQUINOLINE:1,5-CHLOROPYRAZINAMIDE:40ug/ml,5-CHLOROPYRAZINAMIDE:80ug/ml,ALVEOLAR.A549:+1,AMIKACIN:10ug/ml,AMIKACIN:5ug/ml,AMPICILLIN:0.2ug/ml,ANHYDROTETRACYCLINE:100ng/ml,ANTIMYCIN:12.5ug/ml,ANTIMYCIN:25ug/ml,...,pstA1_Rv0930.DELETION:+1,pzaA.OVEREXPRESSION:1,regX3_Rv0491.DELETION:+1,sigE_Rv1221.DELETION:+1,sigE_Rv1221.DELETION:1,sigH_Rv3223c.DELETION:+1,sigH_Rv3223c.DELETION:1,whiB4_Rv3681c.DELETION:+1,whiB5_Rv0022c.OVEREXPRESSION:+1,zur_Rv2359.DELETION:+1
GSM1003225.ch1-vs-GSM1003224.ch1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
GSM1003226.ch1-vs-GSM1003224.ch1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
GSM1004776.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
GSM1004777.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
GSM1004778.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [60]:
# Check that no entry is greater than one (i.e. the table is binary)
(pivot > 1).sum().sum()

0

Cool, we can then select values. 
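
As a hedged illustration of what selecting values could look like, the sketch below builds a small binary table with `pd.crosstab` (an alternative to the dummy-count pivot that gives the same kind of table) and then picks the contrasts that carry two conditions at once. The `toy` frame is made up for the example.

```python
import pandas as pd

# Toy stand-in for df_test (illustrative values only)
toy = pd.DataFrame(
    {
        "ContrastName": ["c1", "c1", "c2", "c2", "c3"],
        "TestAnnotation": [
            "MEDIUM.7H9:1",
            "pH:5.6",
            "MEDIUM.7H9:1",
            "OD650:0.3",
            "pH:5.6",
        ],
    }
)

# Contrast-by-annotation count table; no dummy column needed
table = pd.crosstab(toy["ContrastName"], toy["TestAnnotation"])

# Contrasts annotated with both conditions
mask = (table["MEDIUM.7H9:1"] == 1) & (table["pH:5.6"] == 1)
hits = table.index[mask].tolist()
print(hits)
```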

In [61]:
pivot.shape

(1098, 361)

Now, the rows of this matrix are the 1098 contrasts (experiments), and the columns are the 361 unique test annotations (experimental conditions).


In [90]:
row_slider = pn.widgets.LiteralInput(
    name="exp number", value = 0, type = int
)

In [91]:
@pn.depends(row_slider.param.value)
def get_annot(ix): 
    
    # Get the experimental description of sample i
    return pivot.iloc[ix][pivot.iloc[ix] == 1]

In [92]:
widgets = pn.Column(
    row_slider
)

In [102]:
title = '## Get experimental details per row of pivot table'

In [103]:
pn.Column(title, get_annot, widgets)

In [104]:
pivot.iloc[0][pivot.iloc[0] == 1]

TestAnnotation
GROWTH.EXPONENTIAL:1    1
MEDIUM.7H9:1            1
STRAIN.H37Rv:1          1
Name: GSM1003225.ch1-vs-GSM1003224.ch1, dtype: int64
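
One thing to keep in mind: the pivot table sorts its index, so row 0 here is `GSM1003225.ch1-vs-GSM1003224.ch1`, whereas the first contrast in the annotation file (and in the expression matrix) is `GSM28219.ch1-vs-GSM28219.ch2`. If we wanted positional indices to follow the expression matrix, one option would be to reindex the table by the original contrast order. A minimal sketch on toy data, with shortened contrast names:

```python
import pandas as pd

# Toy stand-in for df_test, keeping the file order of the contrasts
toy = pd.DataFrame(
    {
        "ContrastName": ["GSM28219", "GSM28219", "GSM1003225"],
        "TestAnnotation": ["MEDIUM.7H9:1", "DMSO:13uL", "MEDIUM.7H9:1"],
    }
)

# The index of the count table comes out sorted alphabetically
table = pd.crosstab(toy["ContrastName"], toy["TestAnnotation"])

# Restore the order in which contrasts first appear in the file
original_order = toy["ContrastName"].unique()
ordered = table.reindex(original_order)

# Row 0 now corresponds to the first contrast in the file
first = ordered.iloc[0]
present = first[first == 1].index.tolist()
print(ordered.index.tolist(), present)
```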

In [77]:
pivot.reset_index()[pivot.reset_index()['ContrastName'] == 'GSM350580.ch2-vs-GSM350580.ch1']

TestAnnotation,ContrastName,5-CHLORO-8-HYDROXYQUINOLINE:1,5-CHLOROPYRAZINAMIDE:40ug/ml,5-CHLOROPYRAZINAMIDE:80ug/ml,ALVEOLAR.A549:+1,AMIKACIN:10ug/ml,AMIKACIN:5ug/ml,AMPICILLIN:0.2ug/ml,ANHYDROTETRACYCLINE:100ng/ml,ANTIMYCIN:12.5ug/ml,...,pstA1_Rv0930.DELETION:+1,pzaA.OVEREXPRESSION:1,regX3_Rv0491.DELETION:+1,sigE_Rv1221.DELETION:+1,sigE_Rv1221.DELETION:1,sigH_Rv3223c.DELETION:+1,sigH_Rv3223c.DELETION:1,whiB4_Rv3681c.DELETION:+1,whiB5_Rv0022c.OVEREXPRESSION:+1,zur_Rv2359.DELETION:+1
713,GSM350580.ch2-vs-GSM350580.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [80]:
pivot_ = pivot.reset_index()

In [84]:
text_condition_widget = pn.widgets.TextInput(name='Condition', value='CHOLESTEROL.TIME:1440min')

In [86]:
text_condition_widget

In [87]:
@pn.depends(text_condition_widget.param.value)
def get_indices_for_experiment(experimental_condition): 
    return pivot_[pivot_[experimental_condition]==1].index.to_list()
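
Note that the callback above looks up the typed text as an exact column name, so a condition that is not a column would raise a `KeyError`. Below is a sketch of a more forgiving lookup in plain pandas, without the widget. `find_contrasts` is a hypothetical helper, not part of the notebook: it matches on substrings and returns contrast names rather than row positions.

```python
import pandas as pd


def find_contrasts(table, query):
    """Return contrasts with a 1 in any annotation column containing `query`."""
    matching_cols = [col for col in table.columns if query in col]
    if not matching_cols:
        # Nothing matches: return an empty list instead of raising KeyError
        return []
    mask = (table[matching_cols] == 1).any(axis=1)
    return table.index[mask].tolist()


# Toy binary table (illustrative values only)
toy = pd.DataFrame(
    {
        "ContrastName": ["c1", "c1", "c2", "c3"],
        "TestAnnotation": ["MEDIUM.7H9:1", "pH:5.6", "MEDIUM.7H9:1", "pH:7.0"],
    }
)
table = pd.crosstab(toy["ContrastName"], toy["TestAnnotation"])

ph_hits = find_contrasts(table, "pH")
no_hits = find_contrasts(table, "NOT_A_CONDITION")
print(ph_hits, no_hits)
```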

In [98]:
title_2 = '### Get indices for an experimental condition'

In [99]:
pivot_.head()

TestAnnotation,ContrastName,5-CHLORO-8-HYDROXYQUINOLINE:1,5-CHLOROPYRAZINAMIDE:40ug/ml,5-CHLOROPYRAZINAMIDE:80ug/ml,ALVEOLAR.A549:+1,AMIKACIN:10ug/ml,AMIKACIN:5ug/ml,AMPICILLIN:0.2ug/ml,ANHYDROTETRACYCLINE:100ng/ml,ANTIMYCIN:12.5ug/ml,...,pstA1_Rv0930.DELETION:+1,pzaA.OVEREXPRESSION:1,regX3_Rv0491.DELETION:+1,sigE_Rv1221.DELETION:+1,sigE_Rv1221.DELETION:1,sigH_Rv3223c.DELETION:+1,sigH_Rv3223c.DELETION:1,whiB4_Rv3681c.DELETION:+1,whiB5_Rv0022c.OVEREXPRESSION:+1,zur_Rv2359.DELETION:+1
0,GSM1003225.ch1-vs-GSM1003224.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,GSM1003226.ch1-vs-GSM1003224.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,GSM1004776.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,GSM1004777.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,GSM1004778.ch1-vs-GSM1004775.ch1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [101]:
pn.Column(title_2, text_condition_widget, get_indices_for_experiment)

In [100]:
pn.Column(title, get_annot, widgets)