## Garvan Institute data download

This data is not available directly from GEO. The download links are embedded in HTML tables, and we have a list of the filenames we want.

This notebook finds the links for the filenames we want, and also finds cell type+subtype information surrounding those links.

We download those files into `garvan/` and export `cell_type_in_each_sample.tsv`, which holds cell type information for the Garvan data as well as for the samples in the other subdirectories.

In [1]:
# %load ~/ipyhead
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
import seaborn as sns

In [2]:
import urllib2
from bs4 import BeautifulSoup

# http://stackoverflow.com/questions/15517483/how-to-extract-urls-from-a-html-page-in-python
def get_all_links(url):
    """Fetch a page and return the href of every <a> tag on it."""
    conn = urllib2.urlopen(url)
    html = conn.read()

    soup = BeautifulSoup(html)
    return extract_links_from_soup(soup)


def extract_links_from_soup(soup):
    """Return the href of every <a> tag under `soup`, skipping anchors without one."""
    links = soup.find_all('a')

    links = [tag.get('href', None) for tag in links]
    return [l for l in links if l is not None]
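
The helpers above target Python 2 (`urllib2`). For readers on Python 3, here is a minimal sketch of the same href extraction using only the standard library's `html.parser`. The names `LinkCollector` and `extract_links` are illustrative and not part of the notebook, and the sketch works on an HTML string instead of fetching a URL.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value is not None:
                self.links.append(value)


def extract_links(html):
    """Return all href values found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


# Toy input: one anchor with an href, one without.
sample = ('<table><tr><td><a href="exp1/A.CEL">A</a></td>'
          '<td><a name="x">no href</a></td></tr></table>')
print(extract_links(sample))
```

Like `extract_links_from_soup`, this skips anchors that have no `href`.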

In [3]:
# http://www.cell.com/cms/attachment/2009216772/2031846530/mmc1.pdf

to_scan = [
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/index.html', 'dendritic'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/macrophages/index.html', 'macrophage'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/Tcells/index.html', 'Tcell'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/mastcell/index.html', 'Mastcell'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/Bcells/index.html', 'Bcell'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/eosinophils/index.html', 'eosinophils'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/neutrophils/index.html','neutrophils'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/basophils/index.html','basophils'),
('http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/RA/index.html', 'synoviocytes'),
]



In [4]:
filenames = ['A_TS_RN_CD19+_U133A_130104.CEL',
'A_TS_2CD19+_U133A_180204.CEL',
'A_LW_imDC_U133A_200503.CEL',
'A_SZ_imDC2_U133A_250603.CEL',
'A_LW_DC6hLPS_U133A_200503.CEL',
'A_SZ_6hLPSDC2_U133A_250603.CEL',
'A_SZ_6hLPS_HCDC2_U133A.CEL',
'A_LW_DC6hLPS_HC_U133A.CEL',
'A_LW_DC48hLPS_U133A_200503.CEL',
'A_SZ_48hLPSDC2_U133A_250603.CEL',
'A_LW_DC48hLPS_HC_U133A.CEL',
'A_SZ_48hLPS_HCDC2_U133A.CEL',
'A_MF_ControlEosinophil.CEL',
'A_MF_2hrEosinophils.CEL',
'A_LW_macroctrl_U133A_130503.CEL',
'A_SZ_mac2cont_U133A_240603.CEL',
'A_LW_macro_LPS_U133A_130503.CEL',
'A_SZ_mac2LPS_U133A_240603.CEL',
'A_LW_macro_LPS_HC_U133A.CEL',
'A_SZ_mac2LPS_HC_U133A.CEL',
'A_MF_ControlMASTCELL_U133A.CEL',
'A_LW_mastcellctrl_U133A.CEL',
'A_MF_IgEMASTCELL_U133A.CEL',
'A_LW_mastcellIgE_U133A.CEL',
'A_LW_neutrophil_U133A.CEL',
'A_MF_neutrophils_U133A.CEL',
'A_TS_MSNeutroLPS_U133A.CEL',
'A_TS_RN_gdTcells_U133A.CEL',
'A_TS_RN_gdTcellsREP_A.CEL',
'A_MF_CCR7+_U133A_190202.CEL',
'A_MF_CCR7-_U133A_190202.CEL',
'A_MF_TH1human_U133A_290502.CEL',
'A_TS_TC_Th1_U133A_141003.CEL',
'A_MF_TH2human_U133A_290502.CEL',
'A_TS_TC_Th2_U133A_141003.CEL',
'A_LW_CD57+_U133A_030603.CEL',
'A_LW_CD57+_U133A_121102.CEL',
'A_MF_CD57+_U133A_010502.CEL',]

Per the MSK paper, we need to remove:

* T gamma delta cells: filenames with `gdTcells`
* T follicular helper cells: filenames with `CD57+`

In [5]:
dels = []
for f in filenames:
    if 'gdtcells' in f.lower() or 'cd57+' in f.lower():
        dels.append(f)
        print f
print dels
for d in dels:
    filenames.remove(d)

A_TS_RN_gdTcells_U133A.CEL
A_TS_RN_gdTcellsREP_A.CEL
A_LW_CD57+_U133A_030603.CEL
A_LW_CD57+_U133A_121102.CEL
A_MF_CD57+_U133A_010502.CEL
['A_TS_RN_gdTcells_U133A.CEL', 'A_TS_RN_gdTcellsREP_A.CEL', 'A_LW_CD57+_U133A_030603.CEL', 'A_LW_CD57+_U133A_121102.CEL', 'A_MF_CD57+_U133A_010502.CEL']
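
The same exclusion can be written as a filter that builds new lists instead of removing items from `filenames` in place. This is a Python 3 sketch; `EXCLUDED_MARKERS`, `is_excluded` and the four-file `example_files` list are illustrative stand-ins for the full list above.

```python
# Substrings that mark samples to drop (compared in lowercase):
# T gamma delta cells and T follicular helper cells.
EXCLUDED_MARKERS = ('gdtcells', 'cd57+')


def is_excluded(filename):
    """True if the filename contains any excluded marker, ignoring case."""
    lowered = filename.lower()
    return any(marker in lowered for marker in EXCLUDED_MARKERS)


example_files = [
    'A_TS_RN_CD19+_U133A_130104.CEL',
    'A_TS_RN_gdTcells_U133A.CEL',
    'A_LW_CD57+_U133A_030603.CEL',
    'A_MF_TH1human_U133A_290502.CEL',
]
kept = [f for f in example_files if not is_excluded(f)]
removed = [f for f in example_files if is_excluded(f)]
print(kept)
print(removed)
```

Keeping `removed` around makes it easy to print what was dropped, as the cell above does with `dels`.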


In [6]:
len(to_scan), len(filenames)

(9, 33)

In [7]:
urllib2.unquote('A_TS_2CD19%2B_U133A_180204.CEL')

'A_TS_2CD19+_U133A_180204.CEL'

In [21]:
matches = []
for url,_ in to_scan:
    for l in get_all_links(url):
        for f in filenames:
            if f.lower() in urllib2.unquote(l).lower():
                matches.append(url.replace('index.html', '') + l)
print matches

['http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/6hrLPS/exp1/A_LW_DC6hLPS_U133A_200503.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/6hrLPS/exp2/A_SZ_6hLPSDC2_U133A_250603.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/48hrLPS/exp1/A_LW_DC48hLPS_U133A_200503.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/48hrLPS/exp2/A_SZ_48hLPSDC2_U133A_250603.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/6hrLPS%2BHC/exp1/A_LW_DC6hLPS_HC_U133A.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/6hrLPS%2BHC/exp2/A_SZ_6hLPS_HCDC2_U133A.CEL', 'http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/48hrLPS%2BHC/exp1/A_LW_DC48hLPS_HC_U133A.CEL', 'http://linkage.garvan.unsw.

In [23]:
assert len(matches) == len(filenames), len(matches)
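
The assertion above only reports a count when it fails. A variation that records which filenames have no link can make a mismatch easier to debug. This is a Python 3 sketch (`urllib.parse.unquote` replaces `urllib2.unquote`); `match_links` and the toy `links` list are hypothetical, not real entries from the Garvan index pages.

```python
from urllib.parse import unquote


def match_links(links, wanted):
    """Map each wanted filename to the links whose decoded form contains it."""
    found = {name: [] for name in wanted}
    for link in links:
        # Links are percent-encoded (e.g. %2B for +), so decode before comparing.
        decoded = unquote(link).lower()
        for name in wanted:
            if name.lower() in decoded:
                found[name].append(link)
    return found


# Toy relative links in the style of the index pages.
links = [
    'exp2/A_TS_2CD19%2B_U133A_180204.CEL',
    'exp2/A_TS_2CD19%2B_U133A_180204.DAT',
]
wanted = ['A_TS_2CD19+_U133A_180204.CEL', 'A_LW_imDC_U133A_200503.CEL']
found = match_links(links, wanted)
missing = [name for name, hits in found.items() if not hits]
print(missing)
```

A filename with more than one hit would also show up here, as a list longer than one in `found`.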

In [24]:
# one URL per line, with no index column and no header row, so wget -i can read it
pd.Series(matches).to_csv('urls.txt', index=False, header=False)

Now do: `cd garvan && wget -i ../urls.txt`
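
If `wget` is not available, the download step could be done from Python instead. This is a rough Python 3 sketch under stated assumptions: `destination_for` and `download_all` are hypothetical helpers, and decoding the percent-encoded name (so the local file matches the entries in `filenames`) is a choice made here, not something the notebook specifies. The example call uses a made-up host and downloads nothing.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def destination_for(url, dest_dir):
    """Local path for a URL: dest_dir plus the percent-decoded last path segment."""
    name = unquote(posixpath.basename(urlsplit(url).path))
    return os.path.join(dest_dir, name)


def download_all(urls, dest_dir):
    """Fetch each URL into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = destination_for(url, dest_dir)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)


# Only the path computation runs here; download_all is not called.
print(destination_for('http://example.org/human/A_TS_2CD19%2B_U133A_180204.CEL', 'garvan'))
```

Calling `download_all(matches, 'garvan')` would then play the role of the `wget -i` command.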

## Go back and reparse html to get cell type/subtype info for each sample

In [8]:
# flattening arbitrarily nested lists
# http://stackoverflow.com/a/10824420/130164

nests = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]

def flatten(container):
    for i in container:
        if isinstance(i, (list,tuple)):
            for j in flatten(i):
                yield j
        else:
            yield i

print list(flatten(nests))

[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']


In [9]:
# scrape class names as well
files_in_class = [] # format for dataframe
for url,cell_type in to_scan:
    print '*' * 60
    print url, cell_type
    conn = urllib2.urlopen(url)
    html = conn.read()
    soup = BeautifulSoup(html)
    rows = soup.findAll('tr')
    
    rows_belonging_to_each_class = []
    next_header_row = -1
    
    for ix, tr in enumerate(rows):
        cols = tr.findAll('td')
        if (next_header_row == -1 and 'rowspan' in cols[0].attrs and int(cols[0].attrs['rowspan']) >= 2) \
        or (next_header_row == ix): # find the header rows for classes
            if 'rowspan' not in cols[0].attrs:
                print ix, 'no rowspan, dying'
                break
            print ix, 'new class:', cols[0].text
            next_header_row = ix + int(cols[0].attrs['rowspan'])
            rows_belonging_to_each_class.append([cell_type, cols[0].text, ix, next_header_row])
    
#     for i in range(len(rows_belonging_to_each_class)-1):
#         rows_belonging_to_each_class[i].append(rows_belonging_to_each_class[i+1][1])
        
#     rows_belonging_to_each_class[-1].append(len(rows))
    print rows_belonging_to_each_class
    
    for typ, c, start, end in rows_belonging_to_each_class:
        files_found = list(flatten([extract_links_from_soup(r) for r in rows[start:end]]))
        print len(files_found), files_found
        print
        files_found = [f.rsplit('/', 1)[-1] for f in files_found] # get filenames (via SO)
        files_in_class.extend([(typ, c,f) for f in files_found])

************************************************************
http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/index.html dendritic
3 new class: 6hr LPS
7 new class: 48hr LPS
11 new class: 6hr LPS + HC
15 new class: 48hr LPS + HC
19 new class: Immature
23 new class: -BAFF
27 new class: +BAFF
[['dendritic', u'6hr LPS', 3, 7], ['dendritic', u'48hr LPS', 7, 11], ['dendritic', u'6hr LPS + HC', 11, 15], ['dendritic', u'48hr LPS + HC', 15, 19], ['dendritic', u'Immature', 19, 23], ['dendritic', u'-BAFF', 23, 27], ['dendritic', u'+BAFF', 27, 31]]
8 ['6hrLPS/exp1/A_LW_DC6hLPS_U133A_200503.DAT', '6hrLPS/exp1/A_TS_SZ6hrLPSDC_1_U133B.DAT', '6hrLPS/exp1/A_LW_DC6hLPS_U133A_200503.CEL', '6hrLPS/exp1/A_TS_SZ6hrLPSDC_1_U133B.CEL', '6hrLPS/exp2/A_SZ_6hLPSDC2_U133A_250603.DAT', '6hrLPS/exp2/A_TS_SZ6hrLPSDC_2_U133B.DAT', '6hrLPS/exp2/A_SZ_6hLPSDC2_U133A_250603.CEL', '6hrLPS/exp2/A_TS_SZ6hrLPSDC_2_U133B.CEL']

8 ['48hrLPS/exp1/A_LW_DC48hLPS_U133A_200503.DAT', '48hr
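
The grouping logic above relies on each class label sitting in a first-column cell whose `rowspan` covers all rows of that class. The idea can be sketched on a small inline table using only the Python 3 standard library. `RowParser`, `group_links_by_class` and the toy HTML (including its file paths) are illustrative. Unlike the notebook cell, this sketch skips any row that is not a class header instead of stopping at the first one.

```python
from html.parser import HTMLParser


class RowParser(HTMLParser):
    """Record, per <tr>, the first cell's rowspan and text plus all hrefs."""

    def __init__(self):
        super().__init__()
        self.rows = []               # one dict per <tr>
        self._cell_index = -1        # index of the current <td> in its row
        self._in_first_cell = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'tr':
            self.rows.append({'rowspan': None, 'label': '', 'links': []})
            self._cell_index = -1
        elif tag == 'td' and self.rows:
            self._cell_index += 1
            self._in_first_cell = self._cell_index == 0
            if self._in_first_cell and 'rowspan' in attrs:
                self.rows[-1]['rowspan'] = int(attrs['rowspan'])
        elif tag == 'a' and self.rows and attrs.get('href'):
            self.rows[-1]['links'].append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_first_cell = False

    def handle_data(self, data):
        if self._in_first_cell and self.rows:
            self.rows[-1]['label'] += data


def group_links_by_class(html):
    """Return [(class label, [links in the rows spanned by that label])]."""
    parser = RowParser()
    parser.feed(html)
    rows, groups, ix = parser.rows, [], 0
    while ix < len(rows):
        span = rows[ix]['rowspan']
        if span is None or span < 2:
            ix += 1                  # not a class header row
            continue
        block = rows[ix:ix + span]   # the header row plus the rows it spans
        links = [link for row in block for link in row['links']]
        groups.append((rows[ix]['label'].strip(), links))
        ix += span
    return groups


sample = """
<table>
<tr><td>Class</td><td>File</td></tr>
<tr><td rowspan="2">6hr LPS</td><td><a href="6hrLPS/exp1/a.CEL">a</a></td></tr>
<tr><td><a href="6hrLPS/exp2/b.CEL">b</a></td></tr>
<tr><td rowspan="2">Immature</td><td><a href="imm/exp1/c.CEL">c</a></td></tr>
<tr><td><a href="imm/exp2/d.CEL">d</a></td></tr>
</table>
"""
groups = group_links_by_class(sample)
print(groups)
```

As in the notebook, the row range for a class is `[start, start + rowspan)`, so the header row itself contributes its links too.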

In [10]:
files_in_class

[('dendritic', u'6hr LPS', 'A_LW_DC6hLPS_U133A_200503.DAT'),
 ('dendritic', u'6hr LPS', 'A_TS_SZ6hrLPSDC_1_U133B.DAT'),
 ('dendritic', u'6hr LPS', 'A_LW_DC6hLPS_U133A_200503.CEL'),
 ('dendritic', u'6hr LPS', 'A_TS_SZ6hrLPSDC_1_U133B.CEL'),
 ('dendritic', u'6hr LPS', 'A_SZ_6hLPSDC2_U133A_250603.DAT'),
 ('dendritic', u'6hr LPS', 'A_TS_SZ6hrLPSDC_2_U133B.DAT'),
 ('dendritic', u'6hr LPS', 'A_SZ_6hLPSDC2_U133A_250603.CEL'),
 ('dendritic', u'6hr LPS', 'A_TS_SZ6hrLPSDC_2_U133B.CEL'),
 ('dendritic', u'48hr LPS', 'A_LW_DC48hLPS_U133A_200503.DAT'),
 ('dendritic', u'48hr LPS', 'A_LW_DC48LPS_U133B_270503.DAT'),
 ('dendritic', u'48hr LPS', 'A_LW_DC48hLPS_U133A_200503.CEL'),
 ('dendritic', u'48hr LPS', 'A_LW_DC48LPS_U133B_270503.CEL'),
 ('dendritic', u'48hr LPS', 'A_SZ_48hLPSDC2_U133A_250603.DAT'),
 ('dendritic', u'48hr LPS', 'A_TS_SZDC48LPS_U133B.DAT'),
 ('dendritic', u'48hr LPS', 'A_SZ_48hLPSDC2_U133A_250603.CEL'),
 ('dendritic', u'48hr LPS', 'A_TS_SZDC48LPS_U133B.CEL'),
 ('dendritic', u'6hr LPS +

In [11]:
files_in_class = pd.DataFrame(files_in_class)
files_in_class.columns = ['type', 'subtype', 'filename']
files_in_class.head()

| | type | subtype | filename |
|---|---|---|---|
| 0 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.DAT |
| 1 | dendritic | 6hr LPS | A_TS_SZ6hrLPSDC_1_U133B.DAT |
| 2 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.CEL |
| 3 | dendritic | 6hr LPS | A_TS_SZ6hrLPSDC_1_U133B.CEL |
| 4 | dendritic | 6hr LPS | A_SZ_6hLPSDC2_U133A_250603.DAT |


In [13]:
files_in_class['filename'] = files_in_class['filename'].apply(urllib2.unquote)

In [14]:
files_in_class['filename_low'] = files_in_class.filename.str.lower()
files_in_class.head()

| | type | subtype | filename | filename_low |
|---|---|---|---|---|
| 0 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.DAT | a_lw_dc6hlps_u133a_200503.dat |
| 1 | dendritic | 6hr LPS | A_TS_SZ6hrLPSDC_1_U133B.DAT | a_ts_sz6hrlpsdc_1_u133b.dat |
| 2 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.CEL | a_lw_dc6hlps_u133a_200503.cel |
| 3 | dendritic | 6hr LPS | A_TS_SZ6hrLPSDC_1_U133B.CEL | a_ts_sz6hrlpsdc_1_u133b.cel |
| 4 | dendritic | 6hr LPS | A_SZ_6hLPSDC2_U133A_250603.DAT | a_sz_6hlpsdc2_u133a_250603.dat |


In [15]:
# filter to filenames we use
selected = files_in_class[files_in_class['filename_low'].isin([f.lower() for f in filenames])]
print selected.shape
selected.head()

(33, 4)


| | type | subtype | filename | filename_low |
|---|---|---|---|---|
| 2 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.CEL | a_lw_dc6hlps_u133a_200503.cel |
| 6 | dendritic | 6hr LPS | A_SZ_6hLPSDC2_U133A_250603.CEL | a_sz_6hlpsdc2_u133a_250603.cel |
| 10 | dendritic | 48hr LPS | A_LW_DC48hLPS_U133A_200503.CEL | a_lw_dc48hlps_u133a_200503.cel |
| 14 | dendritic | 48hr LPS | A_SZ_48hLPSDC2_U133A_250603.CEL | a_sz_48hlpsdc2_u133a_250603.cel |
| 17 | dendritic | 6hr LPS + HC | A_LW_DC6hLPS_HC_U133A.CEL | a_lw_dc6hlps_hc_u133a.cel |


In [16]:
selected.groupby(['type', 'subtype']).size()

type         subtype      
Bcell        CD19+            2
Mastcell     Contrl           2
             IgE              2
Tcell        CCR7+            1
             CCR7-            1
             Th1              2
             Th2              2
dendritic    48hr LPS         2
             48hr LPS + HC    2
             6hr LPS          2
             6hr LPS + HC     2
             Immature         2
eosinophils  Control          1
             PMA              1
macrophage   LPS              2
             LPS + HC         2
             contorl          2
neutrophils  Contrl           2
             LPS              1
dtype: int64

In [17]:
assert len(set(selected.filename_low.unique()) - set([f.lower() for f in filenames])) == 0

In [19]:
assert len(set([f.lower() for f in filenames]) - set(selected.filename_low.unique())) == 0
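
The two assertions above check that the selected filenames and the wanted filenames are the same set, one direction at a time. A small helper can return both differences at once, so a failure shows the offending names. This is a Python 3 sketch; `compare_name_sets` and its toy inputs are illustrative.

```python
def compare_name_sets(expected, observed):
    """Return (missing, unexpected) between two name collections, ignoring case."""
    expected_set = {name.lower() for name in expected}
    observed_set = {name.lower() for name in observed}
    missing = sorted(expected_set - observed_set)      # wanted but not found
    unexpected = sorted(observed_set - expected_set)   # found but not wanted
    return missing, unexpected


missing, unexpected = compare_name_sets(['A.CEL', 'B.CEL'], ['a.cel', 'c.cel'])
print(missing, unexpected)
```

In the notebook this would be called with `filenames` and `selected.filename_low`, followed by `assert not missing and not unexpected`.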

### add info for other subdirs

The desired list of categories (from https://docs.google.com/spreadsheets/d/1gUmc6RoTVcpXBsj5Ypc2RKX3-wPrbVhznQjLj6hJkOw/edit#gid=750136295) is:

```
aDC
B cells
Blood vessels
CD8 T cells
Cytotoxic cells
DC
Eosinophils
iDC
Lymph vessels
Macrophages
Mast cells
Neutrophils
NK CD56bright cells
NK CD56dim cells
NK cells
Normal mucosa
pDC
SW480 cancer cells
T cells
T helper cells
Tcm
Tem
TFH
Tgd
Th1 cells
Th17 cells
Th2 cells
TReg
```


In [21]:
import glob

In [58]:
# collect new info here
extra = []
import ntpath

In [59]:
extra.extend([('Tcell', 'CD8', ntpath.basename(x), None) for x in glob.glob('CD8_GSE6740/*.cel*')])

In [60]:
extra.extend([('SW480 cancer cells', None, ntpath.basename(x), None) for x in glob.glob('coloncancer*/*.cel*')])

In [61]:
# http://www.ebi.ac.uk/arrayexpress/experiments/E-MEXP-380/samples/
extra.extend([('NK cells', 'CD56dim', ntpath.basename(x), None) for x in glob.glob('NK_E-MEXP-380/*0210*.CEL*')])
extra.extend([('NK cells', 'CD56dim', ntpath.basename(x), None) for x in glob.glob('NK_E-MEXP-380/*0248*.CEL*')])
extra.extend([('NK cells', 'CD56bright', ntpath.basename(x), None) for x in glob.glob('NK_E-MEXP-380/*0211*.CEL*')])
extra.extend([('NK cells', 'CD56bright', ntpath.basename(x), None) for x in glob.glob('NK_E-MEXP-380/*0249*.CEL*')])

In [62]:
extra.extend([('Normal mucosa', 'Colon', ntpath.basename(x), None) for x in glob.glob('normalcolon*/*.cel*')])

In [63]:
extra

[('Tcell', 'CD8', 'GSM155229.CEL.gz', None),
 ('Tcell', 'CD8', 'GSM155232.CEL.gz', None),
 ('Tcell', 'CD8', 'GSM155234.CEL.gz', None),
 ('Tcell', 'CD8', 'GSM155236.CEL.gz', None),
 ('Tcell', 'CD8', 'GSM155238.CEL.gz', None),
 ('SW480 cancer cells', None, 'GSM21712.CEL.gz', None),
 ('SW480 cancer cells', None, 'GSM21713.CEL.gz', None),
 ('SW480 cancer cells', None, 'GSM21714.CEL.gz', None),
 ('NK cells', 'CD56dim', 'MHH0210HG_U133A.CEL', None),
 ('NK cells', 'CD56dim', 'MHH0248HG-U133A.CEL', None),
 ('NK cells', 'CD56bright', 'MHH0211HG_U133A.CEL', None),
 ('NK cells', 'CD56bright', 'MHH0249HG-U133A.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_1.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_10.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_11.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_12.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_13.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N_14.CEL', None),
 ('Normal mucosa', 'Colon', 'CSS_COLON_N
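
The cells above build `(type, subtype, filename, None)` rows by globbing each subdirectory. As far as I know, Python's `glob` compares names case-sensitively on POSIX systems, so a pattern such as `*.cel*` may or may not match `.CEL.gz` files depending on the platform. The Python 3 sketch below lowercases names before matching to avoid that dependence. `label_files` is a hypothetical helper, and the demonstration uses a throwaway directory with made-up file names.

```python
import fnmatch
import os
import tempfile


def label_files(directory, pattern, cell_type, subtype):
    """Return (type, subtype, filename, None) rows for files matching pattern.

    Names and pattern are lowercased before matching, so '*.cel*' also
    matches 'X.CEL.gz'. The trailing None stands in for filename_low.
    """
    rows = []
    for name in sorted(os.listdir(directory)):
        if fnmatch.fnmatchcase(name.lower(), pattern.lower()):
            rows.append((cell_type, subtype, name, None))
    return rows


# Demonstration on a temporary directory; nothing outside it is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('sample1.CEL.gz', 'sample2.cel', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    demo_rows = label_files(tmp, '*.cel*', 'Tcell', 'CD8')
print(demo_rows)
```

The rows have the same four-field shape as the entries in `extra`.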

In [64]:
all_bindea_samples = pd.concat([selected, pd.DataFrame(extra,columns=selected.columns)])
print all_bindea_samples.shape
all_bindea_samples.head()

(67, 4)


| | type | subtype | filename | filename_low |
|---|---|---|---|---|
| 2 | dendritic | 6hr LPS | A_LW_DC6hLPS_U133A_200503.CEL | a_lw_dc6hlps_u133a_200503.cel |
| 6 | dendritic | 6hr LPS | A_SZ_6hLPSDC2_U133A_250603.CEL | a_sz_6hlpsdc2_u133a_250603.cel |
| 10 | dendritic | 48hr LPS | A_LW_DC48hLPS_U133A_200503.CEL | a_lw_dc48hlps_u133a_200503.cel |
| 14 | dendritic | 48hr LPS | A_SZ_48hLPSDC2_U133A_250603.CEL | a_sz_48hlpsdc2_u133a_250603.cel |
| 17 | dendritic | 6hr LPS + HC | A_LW_DC6hLPS_HC_U133A.CEL | a_lw_dc6hlps_hc_u133a.cel |


### done. export

In [65]:
all_bindea_samples.to_csv('cell_type_in_each_sample.tsv', sep='\t')

In [66]:
all_bindea_samples.fillna(0).groupby(['type', 'subtype']).size() # groupby throws out nulls

type                subtype      
Bcell               CD19+             2
Mastcell            Contrl            2
                    IgE               2
NK cells            CD56bright        2
                    CD56dim           2
Normal mucosa       Colon            22
SW480 cancer cells  0                 3
Tcell               CCR7+             1
                    CCR7-             1
                    CD8               5
                    Th1               2
                    Th2               2
dendritic           48hr LPS          2
                    48hr LPS + HC     2
                    6hr LPS           2
                    6hr LPS + HC      2
                    Immature          2
eosinophils         Control           1
                    PMA               1
macrophage          LPS               2
                    LPS + HC          2
                    contorl           2
neutrophils         Contrl            2
                    LPS               1
dtype:
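
The `fillna(0)` call above is there because `groupby` drops rows whose key is null, which is why the SW480 samples appear under subtype `0`. To show the counting idea without that workaround, here is a standard-library Python 3 sketch using `collections.Counter`, which treats `None` as an ordinary key. The `samples` list is a toy subset of the rows shown earlier. Newer pandas releases (1.1 and later, as far as I know) also accept `groupby(..., dropna=False)` for the same purpose.

```python
from collections import Counter

# (type, subtype) pairs; None marks a sample without a subtype.
samples = [
    ('Tcell', 'CD8'),
    ('Tcell', 'CD8'),
    ('SW480 cancer cells', None),
    ('NK cells', 'CD56dim'),
    ('SW480 cancer cells', None),
]
counts = Counter(samples)

# Sort by type, then by subtype rendered as text so None can be compared.
for (cell_type, subtype), n in sorted(
        counts.items(), key=lambda kv: (kv[0][0], str(kv[0][1]))):
    print(cell_type, subtype, n)
```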

In [68]:
df=all_bindea_samples

## classifying the types further

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Hematopoiesis_simple.svg/2000px-Hematopoiesis_simple.svg.png" width="75%" />

Notes:

* mature DC = activated.
* Tcm: T central memory -- express CCR7
* Tem: T effector memory -- do not express CCR7
* T helper cells encompass Th1, Th2, Th17. They express CD4 -- thus called CD4+.
* cytotoxic cells: CD8+ T cell, NK cells
* blood vessels: same as myeloid cells, which give monocytes, macrophages, dendritic cells, neutrophils, eosinophils (myeloid is innate immune sys)
* lymph vessels: same as lymphoid cells, meaning T,B,NK (lymphoid is adaptive immune sys)
* dendritic cells are either lymphoid or myeloid???

    > CD8alpha+ DCs and CD8alpha- DCs are considered as lymphoid- and myeloid-derived, respectively, because CD8alpha+ but not CD8alpha- splenic DCs were generated from lymphoid CD4low precursors, devoid of myeloid reconstitution potential. Although CD8alpha- DCs were first described as negative for CD4, our results demonstrate that approximately 70% of them are CD4+. (http://www.bloodjournal.org/content/bloodjournal/96/7/2511.full.pdf?sso-checked=true)

    * but http://linkage.garvan.unsw.edu.au/public/microarrays/Arthritis_Inflammation/human/dentritic/index.html says they were generated from monocytes, so these dendritic samples are myeloid

**Where are the following???**

* pDC: dendritic cells CD123+ CD303+ CD304+ CD11c- CD14-
* Th17
* TReg (regulatory T cells, which express CD4, Foxp3, CD25)

-----

Let's produce a list of samples for each type.

In [81]:
Thelper = df[(df['type'] == 'Tcell') & (df['subtype'].isin(['Th1', 'Th2', 'Th17']))]
myeloid = df[df['type'].isin(['Mastcell', 'eosinophils', 'macrophage', 'neutrophils', 'dendritic'])]
lymphoid = df[df['type'].isin(['Tcell', 'Bcell', 'NK cells'])]

In [75]:
Bcell = df[df['type'] == 'Bcell']
Tcell = df[df['type'] == 'Tcell']
Tcm = df[(df['type'] == 'Tcell') & (df['subtype'] == 'CCR7+')]
Tem = df[(df['type'] == 'Tcell') & (df['subtype'] == 'CCR7-')]
Tkiller = df[(df['type'] == 'Tcell') & (df['subtype'] == 'CD8')]


| | type | subtype | filename | filename_low |
|---|---|---|---|---|
| 0 | Tcell | CD8 | GSM155229.CEL.gz | |
| 1 | Tcell | CD8 | GSM155232.CEL.gz | |
| 2 | Tcell | CD8 | GSM155234.CEL.gz | |
| 3 | Tcell | CD8 | GSM155236.CEL.gz | |
| 4 | Tcell | CD8 | GSM155238.CEL.gz | |
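
One way to turn the notes above into category labels is a lookup from `(type, subtype)` to the names in the desired category list. The table below is only a sketch of such a mapping, built from the notes (Tcm = CCR7+, Tem = CCR7-, T helper covers Th1/Th2, cytotoxic covers CD8 T cells and NK cells). `CATEGORY_RULES` and `categories_for` are hypothetical names, and the mapping is an assumption, not a decision recorded in the notebook.

```python
# Hypothetical mapping from (type, subtype) to category labels.
CATEGORY_RULES = {
    ('Bcell', 'CD19+'): ['B cells'],
    ('Tcell', 'CCR7+'): ['T cells', 'Tcm'],
    ('Tcell', 'CCR7-'): ['T cells', 'Tem'],
    ('Tcell', 'Th1'): ['T cells', 'T helper cells', 'Th1 cells'],
    ('Tcell', 'Th2'): ['T cells', 'T helper cells', 'Th2 cells'],
    ('Tcell', 'CD8'): ['T cells', 'CD8 T cells', 'Cytotoxic cells'],
    ('NK cells', 'CD56dim'): ['NK cells', 'NK CD56dim cells', 'Cytotoxic cells'],
    ('NK cells', 'CD56bright'): ['NK cells', 'NK CD56bright cells', 'Cytotoxic cells'],
}


def categories_for(cell_type, subtype):
    """Return the category labels for a sample, or [] if no rule applies."""
    return list(CATEGORY_RULES.get((cell_type, subtype), []))


print(categories_for('Tcell', 'CCR7+'))
print(categories_for('Tcell', 'Th17'))  # no Th17 samples, so no rule
```

An empty result flags combinations that still need a rule or, like Th17 and TReg above, have no samples at all.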
