In [1]:
import sys
sys.path.append("..")

%load_ext autoreload
%autoreload 1
%aimport pyfantom.parse_ontology

import pandas as pd
from itertools import repeat
import re
from orangecontrib.bio.ontology import OBOOntology
import logging

from pyfantom.parse_ontology import *
pd.set_option('display.max_colwidth', None)  # None = no truncation (-1 is deprecated)

In [2]:
import os

def enable_logging(log_file='../data/process_sample_descriptions.log'):
    # start with a fresh log file for every run
    if os.path.exists(log_file):
        os.remove(log_file)
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    # only show warnings and errors in the notebook output
    for handler in logger.handlers:
        if type(handler) is logging.StreamHandler:
            handler.setLevel(logging.WARNING)
    # write the full DEBUG log to the file
    fh = logging.FileHandler(log_file)
    fh.setLevel(logging.DEBUG)
    fh.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(fh)

### Read column headers from FANTOM5 data
Read the column headers and extract the sample information from them.

In [6]:
!ls ../data

column_vars.txt      ff-phase2-140729.obo
delimiter_nodes.tsv  hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt


In [7]:
!grep "^##ColumnVariables" ../data/hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt | cut -d"=" -f2 | head

CAGE peak id
short form of the description below. Common descriptions in the long descriptions has been omited
description of the CAGE peak
transcript which 5end is the nearest to the the CAGE peak
entrezgene (genes) id associated with the transcript
hgnc (gene symbol) id associated with the transcript
uniprot (protein) id associated with the transcript
tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4
tpm of 293SLAM rinderpest infection, 00hr, biol_rep2.CNhs14407.13542-145H5
tpm of 293SLAM rinderpest infection, 00hr, biol_rep3.CNhs14408.13543-145H6


In [8]:
!grep "^##ColumnVariables" ../data/hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt | cut -d"=" -f2 | tail -n+8 | head

tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4
tpm of 293SLAM rinderpest infection, 00hr, biol_rep2.CNhs14407.13542-145H5
tpm of 293SLAM rinderpest infection, 00hr, biol_rep3.CNhs14408.13543-145H6
tpm of 293SLAM rinderpest infection, 06hr, biol_rep1.CNhs14410.13544-145H7
tpm of 293SLAM rinderpest infection, 06hr, biol_rep2.CNhs14411.13545-145H8
tpm of 293SLAM rinderpest infection, 06hr, biol_rep3.CNhs14412.13546-145H9
tpm of 293SLAM rinderpest infection, 12hr, biol_rep1.CNhs14413.13547-145I1
tpm of 293SLAM rinderpest infection, 12hr, biol_rep2.CNhs14414.13548-145I2
tpm of 293SLAM rinderpest infection, 12hr, biol_rep3.CNhs14415.13549-145I3
tpm of 293SLAM rinderpest infection, 24hr, biol_rep1.CNhs14416.13550-145I4


In [9]:
!grep "^##ColumnVariables" ../data/hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt | cut -d"=" -f2 | tail -n+8 > ../data/column_vars.txt

In [10]:
sample_infos = !cat ../data/column_vars.txt
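Outside IPython, the `grep | cut | tail` pipeline above can be approximated in plain Python. This is a minimal sketch, not part of `pyfantom`: `read_sample_columns` is a hypothetical helper, and it assumes that all `##` metadata lines (including every `##ColumnVariables[...]=...` line) precede the data rows, so it stops reading at the first line that does not start with `#`.

```python
from typing import Iterable, List


def read_sample_columns(lines: Iterable[str], n_annotation_cols: int = 7) -> List[str]:
    """Approximate `grep "^##ColumnVariables" | cut -d"=" -f2 | tail -n+8`."""
    descriptions = []
    for line in lines:
        if not line.startswith("#"):
            # assumption: the metadata header ends before the data rows start
            break
        if line.startswith("##ColumnVariables"):
            fields = line.rstrip("\n").split("=")
            # like `cut -f2`: second "="-separated field, or the whole line if there is no "="
            descriptions.append(fields[1] if len(fields) > 1 else fields[0])
    # like `tail -n+8`: skip the 7 annotation columns, keep the sample columns
    return descriptions[n_annotation_cols:]


# toy header; the bracketed column keys are made up for illustration
toy_header = [
    "##ColumnVariables[00Annotation]=CAGE peak id\n",
    "##ColumnVariables[sample1]=tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4\n",
    "00Annotation\tsample1\n",
]
print(read_sample_columns(toy_header, n_annotation_cols=1))

# on the real file (not executed here):
# with open("../data/hg19.cage_peak_phase1and2combined_tpm_ann.osc.txt") as osc_file:
#     sample_infos = read_sample_columns(osc_file)
```

Stopping at the first data row avoids scanning the large expression table, which is what `grep` does.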

In [11]:
sample_infos[:5]

['tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4',
 'tpm of 293SLAM rinderpest infection, 00hr, biol_rep2.CNhs14407.13542-145H5',
 'tpm of 293SLAM rinderpest infection, 00hr, biol_rep3.CNhs14408.13543-145H6',
 'tpm of 293SLAM rinderpest infection, 06hr, biol_rep1.CNhs14410.13544-145H7',
 'tpm of 293SLAM rinderpest infection, 06hr, biol_rep2.CNhs14411.13545-145H8']

### Retrieving information from the ontology

The column headers are difficult to parse (inconsistent use of commas, etc.).
However, the FANTOM5 web page provides an ontology [1].

First, we check whether all the IDs from the column headers appear in the ontology.

[1] http://fantom.gsc.riken.jp/5/datafiles/latest/extra/Ontology/ff-phase2-140729.obo.txt

In [12]:
OBO_ID_REGEX = re.compile(r'CNhs\d+\.(\w+)-(\w+)')  # library id, literal ".", ontology id
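For illustration, here is what a pattern of this shape captures on one of the column headers: the `CNhs…` library ID, followed by a literal dot and the two halves of the ontology ID. The `split_ids` helper below is hypothetical (it is not `get_lib_id`/`get_obo_id` from `pyfantom`); the `FF:` prefix follows the term IDs used with the ontology further down.

```python
import re
from typing import Tuple

# library id, a literal ".", then the two halves of the FF ontology id
ID_REGEX = re.compile(r'(CNhs\d+)\.(\w+)-(\w+)')


def split_ids(sample_info: str) -> Tuple[str, str]:
    """Return (library id, ontology id) parsed from a column header."""
    match = ID_REGEX.search(sample_info)
    if match is None:
        raise ValueError("no CNhs library id found in: " + sample_info)
    lib_id, left, right = match.groups()
    return lib_id, "FF:{}-{}".format(left, right)


print(split_ids("tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4"))
# ('CNhs14406', 'FF:13541-145H4')
```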

In [13]:
for info_line in sample_infos:
    ff_id = "-".join(OBO_ID_REGEX.search(info_line).groups())
    res = !grep {ff_id} ../data/ff-phase2-140729.obo
    assert len(res) > 0, ff_id  # report the offending id on failure

That seems to be the case: the assertion holds for every sample.
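Calling `grep` once per sample re-reads the whole ontology file for every column. A sketch of the same check that reads the file only once, assuming the usual OBO flat-file layout in which each stanza has an `id: ...` line. It is slightly stricter than the `grep` above, which also accepts IDs that only occur in other lines (e.g. `is_a`). The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Set

FF_ID_REGEX = re.compile(r'CNhs\d+\.(\w+)-(\w+)')


def collect_term_ids(obo_lines: Iterable[str]) -> Set[str]:
    """Collect the ids of all stanzas ("id: FF:...") in an OBO flat file."""
    return {line[len("id:"):].strip() for line in obo_lines if line.startswith("id:")}


def ids_missing_from_ontology(sample_infos: Iterable[str], term_ids: Set[str]) -> List[str]:
    """Return the FF ids of all samples that have no stanza in the ontology."""
    missing = []
    for info_line in sample_infos:
        ff_id = "FF:" + "-".join(FF_ID_REGEX.search(info_line).groups())
        if ff_id not in term_ids:
            missing.append(ff_id)
    return missing


# usage (not executed here):
# with open("../data/ff-phase2-140729.obo") as obo_file:
#     term_ids = collect_term_ids(obo_file)
# assert ids_missing_from_ontology(sample_infos, term_ids) == []
```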

#### Try out the Orange Bioinformatics Python library for parsing and manipulating ontologies

In [14]:
obo = OBOOntology()
with open("../data/ff-phase2-140729.obo") as obo_file:
    obo.load(obo_file)

In [15]:
print(obo.term("FF:1394-42H2").tags())

[('id', 'FF:1394-42H2', None, None), ('name', 'lung, neonate N30, rep1', None, None), ('namespace', 'FANTOM5', None, None), ('subset', 'phase1', None, None), ('subset', 'phase2', None, None), ('subset', 'update022', None, None), ('is_a', 'EFO:0002091', None, 'biological replicate'), ('is_a', 'FF:0011489', None, 'mouse lung- neonate N30 sample')]


In [16]:
obo.term("FF:1394-42H2").name

'lung, neonate N30, rep1'

#### All samples should be sub-terms of the generic 'sample' term in the ontology (spot check on the first two samples):

In [17]:
sample = "FF:0000001" # most general sample id 
for info_line in sample_infos[:2]: 
    ff_id = "FF:" + "-".join(OBO_ID_REGEX.search(info_line).groups())
    ids = [term.id for term in obo.super_terms(ff_id)]
    assert sample in ids
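The check above relies on `super_terms` collecting the ancestors of a term. The underlying idea, a breadth-first walk along `is_a` links, can be sketched on a toy hierarchy. The `TOY` IDs are invented and only `FF:0000001` comes from the cell above; this is not how Orange implements `super_terms` internally, and which relations it follows is not assumed here.

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(term_id: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable from term_id via is_a edges, excluding term_id itself."""
    seen: Set[str] = set()
    queue = deque(is_a.get(term_id, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    seen.discard(term_id)  # only reachable again if the graph has a cycle
    return seen


# invented hierarchy: sample -> toy sample class -> generic sample
toy_is_a = {
    "FF:13541-145H4": ["FF:TOY-replicate", "FF:TOY-cell-line-sample"],
    "FF:TOY-cell-line-sample": ["FF:0000001"],
}
print("FF:0000001" in ancestors("FF:13541-145H4", toy_is_a))  # True
```

The `seen` set keeps the walk finite even if the ontology contained a cycle.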

## Preliminary checks passed $\to$ Let's get started!
Using the ontology, we at least avoid running into the comma-parsing trouble of the column headers again.

I wrote the Python module `parse_ontology.py`. It takes care of
* making sure that the ontology and the sample name (aka `sample_infos`) do not contain contradictory information,
* complementing information that is missing in the ontology with information from the sample name,
* writing an entry to `annot_notes` in such a case, so that we can improve the ontology later on.
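The rules listed above can be illustrated with a simplified field-by-field merge of two dicts. This is a hypothetical sketch (`merge_fields`, `InconsistentAnnotation` and the dict inputs are assumptions), not the code of `parse_ontology.py`; the note text mirrors the log messages shown further down.

```python
from typing import Dict, List, Optional


class InconsistentAnnotation(ValueError):
    """Raised when the ontology and the sample name disagree on a field."""


def merge_fields(from_name: Dict[str, Optional[str]],
                 from_ontology: Dict[str, Optional[str]],
                 notes: List[str]) -> Dict[str, Optional[str]]:
    merged: Dict[str, Optional[str]] = {}
    for key in sorted(set(from_name) | set(from_ontology)):
        name_value = from_name.get(key)
        onto_value = from_ontology.get(key)
        if name_value is not None and onto_value is not None and name_value != onto_value:
            # rule 1: the two sources must not contradict each other
            raise InconsistentAnnotation(
                "{}: {!r} (name) vs {!r} (ontology)".format(key, name_value, onto_value))
        if onto_value is None and name_value is not None:
            # rules 2 and 3: fill the gap from the sample name and record it
            notes.append("{}: missing in ontology.".format(key))
        merged[key] = onto_value if onto_value is not None else name_value
    return merged


notes: List[str] = []
print(merge_fields({"biol_rep": "1"}, {"biol_rep": None, "sample_type": "cell line"}, notes))
print(notes)
```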

#### Replicates
* There are technical and biological replicates.
* Technical replicates have the same obo_id (`FF:?????-?????`) but different library ids (`CNhs?????`).
* The library id is unique.
* We can identify biological replicates from the sample name by the keywords `biol_rep`, `rep` and `donor`. Samples from different donors are biological replicates as well.

As this would be redundant information, we would expect no sample to have both 'biol_rep' and 'donor' in the sample name: 

In [18]:
[x for x in sample_infos if get_donor(x) is not None and get_biol_replicate(x) is not None ]

[]

That seems to be the case. 
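The replicate keywords and the uniqueness of the library ID can be sketched with a few regular expressions. The patterns are assumptions derived from sample names that appear in this notebook (`biol_rep1`, `rep1`, `donor3`, `donor090309`); they are not the implementations of `get_biol_replicate` or `get_donor` in `pyfantom`, and all helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Optional

# "biol_rep2" or a bare "rep1"; \b keeps e.g. "prep1" from matching
BIOL_REP_REGEX = re.compile(r'\b(?:biol_)?rep(\d+)')
# "donor3", "donor10 (periodontitis)", "donor090309"
DONOR_REGEX = re.compile(r'\bdonor(\w+)')
LIB_ID_REGEX = re.compile(r'CNhs\d+')


def find_biol_rep(sample_name: str) -> Optional[str]:
    match = BIOL_REP_REGEX.search(sample_name)
    return match.group(1) if match else None


def find_donor(sample_name: str) -> Optional[str]:
    match = DONOR_REGEX.search(sample_name)
    return match.group(1) if match else None


def duplicated_library_ids(sample_infos: Iterable[str]) -> List[str]:
    """Library ids that occur in more than one column header (expected: none)."""
    counts = Counter(LIB_ID_REGEX.search(info).group(0) for info in sample_infos)
    return [lib_id for lib_id, count in counts.items() if count > 1]


name = "tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4"
print(find_biol_rep(name), find_donor(name))  # 1 None
```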

### Do the parsing and store the processed information as `csv` files. 

In [30]:
enable_logging()

The supplementary information table of the FANTOM5 paper contains additional information.
From this data source, we additionally retrieve the 'sample_type' information:

In [22]:
si_table = pd.read_excel("../data/fantom5-S1.xlsx", sheet_name=1)
pd.set_option('display.max_colwidth', 40)
si_table.set_index("Library_id", inplace=True)
si_table.columns

Index(['Sample type', 'species', 'description', 'supplier', 'sample id',
       'Catalog number', 'external URL', 'lot number', 'donor(cell lot)',
       'sex', 'age', 'RIN', 'Q20 mapped tags', 'fraction under robust DPI',
       'Number of peaks called', 'Number of 5' EST/cDNA supported peaks',
       'Fraction peaks corresponding to known 5' end',
       'RIKEN Yokohama ethics application', 'marker check',
       'used for peak calling', 'used for expression analysis',
       'top 3 most correlated samples'],
      dtype='object')

In [23]:
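def get_sample_si(sample_info):
    """Collect library id, ontology id and sample type (from the SI table) for one sample."""
    info_si = {
        "lib_id": get_lib_id(sample_info),
        "obo_id": get_obo_id(sample_info)
    }
    try:
        info_si["sample_type"] = si_table.loc[info_si["lib_id"], "Sample type"]
    except KeyError:
        # library id not listed in the supplementary table
        info_si["sample_type"] = None
    return info_si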

#### Run the python module
`annot_notes` will contain an entry for every piece of information that is missing in the ontology.
`annotations` contains the collated information from the three data sources:
* ontology
* column names in the data table
* supplementary information of the FANTOM5 paper
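Since this part ends in `csv` files, here is a hedged sketch of how a list of annotation records such as `annotations` could be written with the standard library. It assumes each record is a flat dict (the return type of `merge_sample_info` is not shown in this notebook); `write_dicts_csv` and the output path are hypothetical. With pandas already imported, `pd.DataFrame(annotations).to_csv(path, index=False)` would be the shorter alternative.

```python
import csv
from typing import Dict, List, Optional


def write_dicts_csv(rows: List[Dict[str, Optional[str]]], path: str) -> None:
    """Write a list of dicts to csv; missing keys and None values become empty cells."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# write_dicts_csv(annotations, "../data/sample_annotations.csv")  # hypothetical output path
```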

In [24]:
annot_notes = []
annotations = []
for sample_info in sample_infos: 
    logging.info("Processing Sample: '{}'".format(sample_info))
    info_n = process_sample_name(sample_info)
    info_o = process_sample_ontology(obo, sample_info)
    info_si = get_sample_si(sample_info)
    annotations.append(merge_sample_info(info_n, info_o, info_si, annot_notes))

INFO:root:Processing Sample: 'tpm of 293SLAM rinderpest infection, 00hr, biol_rep1.CNhs14406.13541-145H4'
INFO:root:CNhs14406: Parsing information from sample name
INFO:root:CNhs14406: Searching Ontology
INFO:root:CNhs14406: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of 293SLAM rinderpest infection, 00hr, biol_rep2.CNhs14407.13542-145H5'
INFO:root:CNhs14407: Parsing information from sample name
INFO:root:CNhs14407: Searching Ontology
INFO:root:CNhs14407: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of 293SLAM rinderpest infection, 00hr, biol_rep3.CNhs14408.13543-145H6'
INFO:root:CNhs14408: Parsing information from sample name
INFO:root:CNhs14408: Searching Ontology
INFO:root:CNhs14408: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology.

INFO:root:CNhs14476: Parsing information from sample name
INFO:root:CNhs14476: Searching Ontology
INFO:root:CNhs14476: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 01hr00min, biol_rep3.CNhs14477.13639-147A3'
INFO:root:CNhs14477: Parsing information from sample name
INFO:root:CNhs14477: Searching Ontology
INFO:root:CNhs14477: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 01hr20min, biol_rep1.CNhs14478.13640-147A4'
INFO:root:CNhs14478: Parsing information from sample name
INFO:root:CNhs14478: Searching Ontology
INFO:root:CNhs14478: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 01hr20min, biol_rep2.CNhs14479.13641-147A5'
INFO:root:CNhs14479: Parsing information from sample name

INFO:root:CNhs14519: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 06hr00min, biol_rep2.CNhs14520.13665-147D2'
INFO:root:CNhs14520: Parsing information from sample name
INFO:root:CNhs14520: Searching Ontology
INFO:root:CNhs14520: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 06hr00min, biol_rep3.CNhs14522.13666-147D3'
INFO:root:CNhs14522: Parsing information from sample name
INFO:root:CNhs14522: Searching Ontology
INFO:root:CNhs14522: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of ARPE-19 EMT induced with TGF-beta and TNF-alpha, 07hr00min, biol_rep1.CNhs14523.13667-147D4'
INFO:root:CNhs14523: Parsing information from sample name
INFO:root:CNhs14523: Searching Ontology
INFO:root:CNhs14523: Merging Information

INFO:root:CNhs12069: Searching Ontology
INFO:root:CNhs12069: Merging Information
INFO:root:Processing Sample: 'tpm of Adipocyte - subcutaneous, donor1.CNhs12494.11259-116F8'
INFO:root:CNhs12494: Parsing information from sample name
INFO:root:CNhs12494: Searching Ontology
INFO:root:CNhs12494: Merging Information
INFO:root:Processing Sample: 'tpm of Adipocyte - subcutaneous, donor2.CNhs11371.11336-117F4'
INFO:root:CNhs11371: Parsing information from sample name
INFO:root:CNhs11371: Searching Ontology
INFO:root:CNhs11371: Merging Information
INFO:root:Processing Sample: 'tpm of Adipocyte - subcutaneous, donor3.CNhs12017.11408-118E4'
INFO:root:CNhs12017: Parsing information from sample name
INFO:root:CNhs12017: Searching Ontology
INFO:root:CNhs12017: Merging Information
INFO:root:Processing Sample: 'tpm of Adipocyte differentiation, day04, donor1.CNhs12516.13019-139D4'
INFO:root:CNhs12516: Parsing information from sample name
INFO:root:CNhs12516: Searching Ontology

DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to FGF2, 00hr30min, biol_rep2 (LK8).CNhs13360.12742-135I6'
INFO:root:CNhs13360: Parsing information from sample name
INFO:root:CNhs13360: Searching Ontology
INFO:root:CNhs13360: Merging Information
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to FGF2, 00hr30min, biol_rep3 (LK9).CNhs13569.12840-137B5'
INFO:root:CNhs13569: Parsing information from sample name
INFO:root:CNhs13569: Searching Ontology
INFO:root:CNhs13569: Merging Information
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to FGF2, 00hr45min, biol_rep1 (LK10).CNhs13343.12645-134G8'
INFO:root:CNhs13343: Parsing information from sample name
INFO:root:CNhs13343: Searching Ontology
INFO:root:CNhs13343: Merging Information
DEBUG:root:sample_type: missing in ontology. 

INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to IL1b, 00hr15min, biol_rep1 (LK34).CNhs13350.12653-134H7'
INFO:root:CNhs13350: Parsing information from sample name
INFO:root:CNhs13350: Searching Ontology
INFO:root:CNhs13350: Merging Information
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to IL1b, 00hr15min, biol_rep2 (LK35).CNhs13370.12751-136A6'
INFO:root:CNhs13370: Parsing information from sample name
INFO:root:CNhs13370: Searching Ontology
INFO:root:CNhs13370: Merging Information
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Aortic smooth muscle cell response to IL1b, 00hr15min, biol_rep3 (LK36).CNhs13578.12849-137C5'
INFO:root:CNhs13578: Parsing information from sample name
INFO:root:CNhs13578: Searching Ontology
INFO:root:CNhs13578: Merging Information
DEBUG:root:sample_type: missing in ontology. 

INFO:root:CNhs13586: Parsing information from sample name
INFO:root:CNhs13586: Searching Ontology
INFO:root:CNhs13586: Merging Information
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of Astrocyte - cerebellum, donor1.CNhs11321.11500-119F6'
INFO:root:CNhs11321: Parsing information from sample name
INFO:root:CNhs11321: Searching Ontology
INFO:root:CNhs11321: Merging Information
INFO:root:Processing Sample: 'tpm of Astrocyte - cerebellum, donor2.CNhs12081.11580-120F5'
INFO:root:CNhs12081: Parsing information from sample name
INFO:root:CNhs12081: Searching Ontology
INFO:root:CNhs12081: Merging Information
INFO:root:Processing Sample: 'tpm of Astrocyte - cerebellum, donor3.CNhs12117.11661-122F5'
INFO:root:CNhs12117: Parsing information from sample name
INFO:root:CNhs12117: Searching Ontology
INFO:root:CNhs12117: Merging Information
INFO:root:Processing Sample: 'tpm of Astrocyte - cerebral cortex, donor1.CNhs10864.11235-116D2'

INFO:root:CNhs13489: Merging Information
INFO:root:Processing Sample: 'tpm of CD14+ monocytes - treated with B-glucan, donor3.CNhs13495.11889-125D8'
INFO:root:CNhs13495: Parsing information from sample name
INFO:root:CNhs13495: Searching Ontology
INFO:root:CNhs13495: Merging Information
INFO:root:Processing Sample: 'tpm of CD14+ monocytes - treated with BCG, donor1.CNhs13465.11860-125A6'
INFO:root:CNhs13465: Parsing information from sample name
INFO:root:CNhs13465: Searching Ontology
INFO:root:CNhs13465: Merging Information
INFO:root:Processing Sample: 'tpm of CD14+ monocytes - treated with BCG, donor2.CNhs13475.11870-125B7'
INFO:root:CNhs13475: Parsing information from sample name
INFO:root:CNhs13475: Searching Ontology
INFO:root:CNhs13475: Merging Information
INFO:root:Processing Sample: 'tpm of CD14+ monocytes - treated with BCG, donor3.CNhs13543.11880-125C8'
INFO:root:CNhs13543: Parsing information from sample name
INFO:root:CNhs13543: Searching Ontology

INFO:root:Processing Sample: 'tpm of CD14-CD16+ Monocytes, donor3.CNhs13548.11911-125G3'
INFO:root:CNhs13548: Parsing information from sample name
INFO:root:CNhs13548: Searching Ontology
INFO:root:CNhs13548: Merging Information
INFO:root:Processing Sample: 'tpm of CD19+ B Cells (pluriselect), donor090309, donation1.CNhs12177.12189-129B2'
INFO:root:CNhs12177: Parsing information from sample name
INFO:root:CNhs12177: Searching Ontology
INFO:root:CNhs12177: Merging Information
INFO:root:Processing Sample: 'tpm of CD19+ B Cells (pluriselect), donor090309, donation2.CNhs12179.12194-129B7'
INFO:root:CNhs12179: Parsing information from sample name
INFO:root:CNhs12179: Searching Ontology
INFO:root:CNhs12179: Merging Information
INFO:root:Processing Sample: 'tpm of CD19+ B Cells (pluriselect), donor090309, donation3.CNhs12181.12199-129C3'
INFO:root:CNhs12181: Parsing information from sample name
INFO:root:CNhs12181: Searching Ontology
INFO:root:CNhs12181: Merging Information

INFO:root:Processing Sample: 'tpm of CD4+CD25+CD45RA- memory regulatory T cells, donor3.CNhs13538.11908-125F9'
INFO:root:CNhs13538: Parsing information from sample name
INFO:root:CNhs13538: Searching Ontology
INFO:root:CNhs13538: Merging Information
INFO:root:Processing Sample: 'tpm of CD4+CD25-CD45RA+ naive conventional T cells expanded, donor1.CNhs13202.11791-124B9'
INFO:root:CNhs13202: Parsing information from sample name
INFO:root:CNhs13202: Searching Ontology
INFO:root:CNhs13202: Merging Information
INFO:root:Processing Sample: 'tpm of CD4+CD25-CD45RA+ naive conventional T cells expanded, donor2.CNhs13813.11913-125G5'
INFO:root:CNhs13813: Parsing information from sample name
INFO:root:CNhs13813: Searching Ontology
INFO:root:CNhs13813: Merging Information
INFO:root:Processing Sample: 'tpm of CD4+CD25-CD45RA+ naive conventional T cells expanded, donor3.CNhs13814.11917-125G9'
INFO:root:CNhs13814: Parsing information from sample name
INFO:root:CNhs13814: Searching Ontology

INFO:root:CNhs14425: Searching Ontology
INFO:root:CNhs14425: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of COBL-a rinderpest infection, 12hr, biol_rep2.CNhs14426.13560-146A5'
INFO:root:CNhs14426: Parsing information from sample name
INFO:root:CNhs14426: Searching Ontology
INFO:root:CNhs14426: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of COBL-a rinderpest infection, 12hr, biol_rep3.CNhs14427.13561-146A6'
INFO:root:CNhs14427: Parsing information from sample name
INFO:root:CNhs14427: Searching Ontology
INFO:root:CNhs14427: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of COBL-a rinderpest infection, 24hr, biol_rep1.CNhs14428.13562-146A7'
INFO:root:CNhs14428: Parsing information from sample name

INFO:root:CNhs12020: Merging Information
INFO:root:Processing Sample: 'tpm of Chondrocyte - re diff, donor2.CNhs11373.11339-117F7'
INFO:root:CNhs11373: Parsing information from sample name
INFO:root:CNhs11373: Searching Ontology
INFO:root:CNhs11373: Merging Information
INFO:root:Processing Sample: 'tpm of Chondrocyte - re diff, donor3.CNhs12021.11411-118E7'
INFO:root:CNhs12021: Parsing information from sample name
INFO:root:CNhs12021: Searching Ontology
INFO:root:CNhs12021: Merging Information
INFO:root:Processing Sample: 'tpm of Ciliary Epithelial Cells, donor1.CNhs10871.11242-116D9'
INFO:root:CNhs10871: Parsing information from sample name
INFO:root:CNhs10871: Searching Ontology
INFO:root:CNhs10871: Merging Information
INFO:root:Processing Sample: 'tpm of Ciliary Epithelial Cells, donor2.CNhs11966.11323-117D9'
INFO:root:CNhs11966: Parsing information from sample name
INFO:root:CNhs11966: Searching Ontology
INFO:root:CNhs11966: Merging Information

INFO:root:CNhs12010: Merging Information
INFO:root:Processing Sample: 'tpm of Endothelial Cells - Vein, donor1.CNhs12497.11267-116G7'
INFO:root:CNhs12497: Parsing information from sample name
INFO:root:CNhs12497: Searching Ontology
INFO:root:CNhs12497: Merging Information
INFO:root:Processing Sample: 'tpm of Endothelial Cells - Vein, donor2.CNhs11377.11344-117G3'
INFO:root:CNhs11377: Parsing information from sample name
INFO:root:CNhs11377: Searching Ontology
INFO:root:CNhs11377: Merging Information
INFO:root:Processing Sample: 'tpm of Endothelial Cells - Vein, donor3.CNhs12026.11416-118F3'
INFO:root:CNhs12026: Parsing information from sample name
INFO:root:CNhs12026: Searching Ontology
INFO:root:CNhs12026: Merging Information
INFO:root:Processing Sample: 'tpm of Eosinophils, donor1.CNhs12547.12244-129H3'
INFO:root:CNhs12547: Parsing information from sample name
INFO:root:CNhs12547: Searching Ontology
INFO:root:CNhs12547: Merging Information

INFO:root:Processing Sample: 'tpm of Fibroblast - Gingival, donor10 (periodontitis).CNhs14135.11928-125I2'
INFO:root:CNhs14135: Parsing information from sample name
INFO:root:CNhs14135: Searching Ontology
INFO:root:CNhs14135: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - Gingival, donor2.CNhs11961.11318-117D4'
INFO:root:CNhs11961: Parsing information from sample name
INFO:root:CNhs11961: Searching Ontology
INFO:root:CNhs11961: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - Gingival, donor3.CNhs12006.11394-118C8'
INFO:root:CNhs12006: Parsing information from sample name
INFO:root:CNhs12006: Searching Ontology
INFO:root:CNhs12006: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - Gingival, donor4 (GFH2).CNhs10848.11222-116B7'
INFO:root:CNhs10848: Parsing information from sample name
INFO:root:CNhs10848: Searching Ontology
INFO:root:CNhs10848: Merging Information

INFO:root:CNhs12399: Parsing information from sample name
INFO:root:CNhs12399: Searching Ontology
INFO:root:CNhs12399: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - skin dystrophia myotonica, donor3.CNhs11913.11560-120D3'
INFO:root:CNhs11913: Parsing information from sample name
INFO:root:CNhs11913: Searching Ontology
INFO:root:CNhs11913: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - skin normal, donor1 (nuclear fraction).CNhs12403.14323-155E3'
INFO:root:CNhs12403: Parsing information from sample name
INFO:root:CNhs12403: Searching Ontology
INFO:root:CNhs12403: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - skin normal, donor1.CNhs11351.11553-120C5'
INFO:root:CNhs11351: Parsing information from sample name
INFO:root:CNhs11351: Searching Ontology
INFO:root:CNhs11351: Merging Information
INFO:root:Processing Sample: 'tpm of Fibroblast - skin normal, donor2 (nuclear fraction).CNhs12582.14302-155B9'

INFO:root:CNhs12825: Searching Ontology
INFO:root:CNhs12825: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day00, biol_rep3 (H9EB-3 d0).CNhs12908.12823-136I6'
INFO:root:CNhs12908: Parsing information from sample name
INFO:root:CNhs12908: Searching Ontology
INFO:root:CNhs12908: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day01, biol_rep1 (H9EB-1 d1).CNhs12823.12628-134E9'
INFO:root:CNhs12823: Parsing information from sample name
INFO:root:CNhs12823: Searching Ontology
INFO:root:CNhs12823: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day01, biol_rep2 (H9EB-2 d1).CNhs12826.12726-135G8'
INFO:root:CNhs12826: Parsing information from sample name
INFO:root:CNhs12826: Searching Ontology
INFO:root:CNhs12826: Merging Information

INFO:root:CNhs12834: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day24, biol_rep3 (H9EB-3 d24).CNhs12916.12832-137A6'
INFO:root:CNhs12916: Parsing information from sample name
INFO:root:CNhs12916: Searching Ontology
INFO:root:CNhs12916: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day27, biol_rep1 (H9EB-1 d27).CNhs12902.12637-134F9'
INFO:root:CNhs12902: Parsing information from sample name
INFO:root:CNhs12902: Searching Ontology
INFO:root:CNhs12902: Merging Information
INFO:root:Processing Sample: 'tpm of H9 Embryoid body cells, melanocytic induction, day27, biol_rep2 (H9EB-2 d27).CNhs12835.12735-135H8'
INFO:root:CNhs12835: Parsing information from sample name
INFO:root:CNhs12835: Searching Ontology
INFO:root:CNhs12835: Merging Information

INFO:root:CNhs13656: Merging Information
INFO:root:Processing Sample: 'tpm of HES3-GFP Embryonic Stem cells, cardiomyocytic induction, day04, biol_rep2.CNhs13716.13343-143D4'
INFO:root:CNhs13716: Parsing information from sample name
INFO:root:CNhs13716: Searching Ontology
INFO:root:CNhs13716: Merging Information
INFO:root:Processing Sample: 'tpm of HES3-GFP Embryonic Stem cells, cardiomyocytic induction, day04, biol_rep3.CNhs13728.13355-143E7'
INFO:root:CNhs13728: Parsing information from sample name
INFO:root:CNhs13728: Searching Ontology
INFO:root:CNhs13728: Merging Information
INFO:root:Processing Sample: 'tpm of HES3-GFP Embryonic Stem cells, cardiomyocytic induction, day05, biol_rep1.CNhs13657.13332-143C2'
INFO:root:CNhs13657: Parsing information from sample name
INFO:root:CNhs13657: Searching Ontology
INFO:root:CNhs13657: Merging Information
INFO:root:Processing Sample: 'tpm of HES3-GFP Embryonic Stem cells, cardiomyocytic induction, day05, biol_rep2.CNhs13717.13344-143D5'

INFO:root:CNhs12347: Parsing information from sample name
INFO:root:CNhs12347: Searching Ontology
INFO:root:CNhs12347: Merging Information
INFO:root:Processing Sample: 'tpm of Hep-2 cells mock treated, biol_rep1.CNhs13479.11898-125E8'
INFO:root:CNhs13479: Parsing information from sample name
INFO:root:CNhs13479: Searching Ontology
INFO:root:CNhs13479: Merging Information
INFO:root:Processing Sample: 'tpm of Hep-2 cells mock treated, biol_rep2.CNhs13500.11899-125E9'
INFO:root:CNhs13500: Parsing information from sample name
INFO:root:CNhs13500: Searching Ontology
INFO:root:CNhs13500: Merging Information
INFO:root:Processing Sample: 'tpm of Hep-2 cells mock treated, biol_rep3.CNhs13501.11900-125F1'
INFO:root:CNhs13501: Parsing information from sample name
INFO:root:CNhs13501: Searching Ontology
INFO:root:CNhs13501: Merging Information
INFO:root:Processing Sample: 'tpm of Hep-2 cells treated with Streptococci strain 5448, biol_rep1.CNhs13477.11890-125D9'

INFO:root:CNhs12789: Parsing information from sample name
INFO:root:CNhs12789: Searching Ontology
INFO:root:CNhs12789: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 01hr00min, biol_rep1.CNhs12462.13083-140B5'
INFO:root:CNhs12462: Parsing information from sample name
INFO:root:CNhs12462: Searching Ontology
INFO:root:CNhs12462: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 01hr00min, biol_rep2.CNhs12689.13149-140I8'
INFO:root:CNhs12689: Parsing information from sample name
INFO:root:CNhs12689: Searching Ontology
INFO:root:CNhs12689: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 01hr00min, biol_rep3.CNhs12790.13215-141H2'
INFO:root:CNhs12790: Parsing information from sample name
INFO:root:CNhs12790: Searching Ontology
INFO:root:CNhs12790: Merging Information

INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 24hr, biol_rep1.CNhs12471.13093-140C6'
INFO:root:CNhs12471: Parsing information from sample name
INFO:root:CNhs12471: Searching Ontology
INFO:root:CNhs12471: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 24hr, biol_rep2.CNhs12699.13159-141A9'
INFO:root:CNhs12699: Parsing information from sample name
INFO:root:CNhs12699: Searching Ontology
INFO:root:CNhs12699: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, 24hr, biol_rep3.CNhs12801.13225-141I3'
INFO:root:CNhs12801: Parsing information from sample name
INFO:root:CNhs12801: Searching Ontology
INFO:root:CNhs12801: Merging Information
INFO:root:Processing Sample: 'tpm of K562 erythroblastic leukemia response to hemin, day02, biol_rep1.CNhs12472.13094-140C7'
INFO:root:CNhs12472: Parsing information from sample name

INFO:root:CNhs13102: Parsing information from sample name
INFO:root:CNhs13102: Searching Ontology
INFO:root:CNhs13102: Merging Information
INFO:root:Processing Sample: 'tpm of Lymphatic Endothelial cells response to VEGFC, 00hr45min, biol_rep2 (MM XIV - 4).CNhs13160.12385-131E9'
INFO:root:CNhs13160: Parsing information from sample name
INFO:root:CNhs13160: Searching Ontology
INFO:root:CNhs13160: Merging Information
INFO:root:Processing Sample: 'tpm of Lymphatic Endothelial cells response to VEGFC, 00hr45min, biol_rep3 (MM XXII - 4).CNhs13279.12507-133A5'
INFO:root:CNhs13279: Parsing information from sample name
INFO:root:CNhs13279: Searching Ontology
INFO:root:CNhs13279: Merging Information
INFO:root:Processing Sample: 'tpm of Lymphatic Endothelial cells response to VEGFC, 01hr00min, biol_rep1 (MM XIX - 5).CNhs13103.12264-130A5'
INFO:root:CNhs13103: Parsing information from sample name
INFO:root:CNhs13103: Searching Ontology
INFO:root:CNhs13103: Merging Information

INFO:root:CNhs13503: Searching Ontology
INFO:root:CNhs13503: Merging Information
INFO:root:Processing Sample: 'tpm of acute myeloid leukemia (FAB M4) cell line:HNT-34.CNhs13504.10831-111D3'
INFO:root:CNhs13504: Parsing information from sample name
INFO:root:CNhs13504: Searching Ontology
INFO:root:CNhs13504: Merging Information
INFO:root:Processing Sample: 'tpm of acute myeloid leukemia (FAB M4eo) cell line:EoL-1.CNhs13056.10832-111D4'
INFO:root:CNhs13056: Parsing information from sample name
INFO:root:CNhs13056: Searching Ontology
INFO:root:CNhs13056: Merging Information
INFO:root:Processing Sample: 'tpm of acute myeloid leukemia (FAB M4eo) cell line:EoL-3.CNhs13057.10833-111D5'
INFO:root:CNhs13057: Parsing information from sample name
INFO:root:CNhs13057: Searching Ontology
INFO:root:CNhs13057: Merging Information
INFO:root:Processing Sample: 'tpm of acute myeloid leukemia (FAB M5) cell line:NOMO-1.CNhs13050.10764-110E8'
INFO:root:CNhs13050: Parsing information from sample name

INFO:root:CNhs11889: Parsing information from sample name
INFO:root:CNhs11889: Searching Ontology
INFO:root:CNhs11889: Merging Information
INFO:root:Processing Sample: 'tpm of aorta, adult, pool1.CNhs11760.10052-101G7'
INFO:root:CNhs11760: Parsing information from sample name
INFO:root:CNhs11760: Searching Ontology
INFO:root:CNhs11760: Merging Information
INFO:root:Processing Sample: 'tpm of appendix, adult.CNhs12842.10189-103D9'
INFO:root:CNhs12842: Parsing information from sample name
INFO:root:CNhs12842: Searching Ontology
INFO:root:CNhs12842: Merging Information
INFO:root:Processing Sample: 'tpm of argyrophil small cell carcinoma cell line:TC-YIK.CNhs11725.10589-108D4'
INFO:root:CNhs11725: Parsing information from sample name
INFO:root:CNhs11725: Searching Ontology
INFO:root:CNhs11725: Merging Information
INFO:root:Processing Sample: 'tpm of artery, adult.CNhs12843.10190-103E1'
INFO:root:CNhs12843: Parsing information from sample name
INFO:root:CNhs12843: Searching Ontology
...
INFO:root:CNhs12323: Merging Information
INFO:root:Processing Sample: 'tpm of cerebellum, adult, pool1.CNhs11795.10083-102B2'
INFO:root:CNhs11795: Parsing information from sample name
INFO:root:CNhs11795: Searching Ontology
INFO:root:CNhs11795: Merging Information
INFO:root:Processing Sample: 'tpm of cerebellum, newborn, donor10223.CNhs14075.10357-105E6'
INFO:root:CNhs14075: Parsing information from sample name
INFO:root:CNhs14075: Searching Ontology
INFO:root:CNhs14075: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of cerebral meninges, adult.CNhs12840.10188-103D8'
INFO:root:CNhs12840: Parsing information from sample name
INFO:root:CNhs12840: Searching Ontology
INFO:root:CNhs12840: Merging Information
INFO:root:Processing Sample: 'tpm of cerebrospinal fluid, donor2.CNhs13437.10294-104G6'
INFO:root:CNhs13437: Parsing information from sample name
INFO:root:CNhs13437: Searching Ontology
INFO:root:CNhs13437: Merging Information
...
INFO:root:CNhs11050: Merging Information
INFO:root:Processing Sample: 'tpm of cord blood derived cell line:COBL-a untreated.CNhs11045.10449-106F8'
INFO:root:CNhs11045: Parsing information from sample name
INFO:root:CNhs11045: Searching Ontology
INFO:root:CNhs11045: Merging Information
INFO:root:Processing Sample: 'tpm of corpus callosum, adult, pool1.CNhs10649.10042-101F6'
INFO:root:CNhs10649: Parsing information from sample name
INFO:root:CNhs10649: Searching Ontology
INFO:root:CNhs10649: Merging Information
INFO:root:Processing Sample: 'tpm of cruciate ligament, donor2.CNhs13439.10295-104G7'
INFO:root:CNhs13439: Parsing information from sample name
INFO:root:CNhs13439: Searching Ontology
INFO:root:CNhs13439: Merging Information
INFO:root:Processing Sample: 'tpm of diaphragm, fetal, donor1.CNhs11779.10069-101I6'
INFO:root:CNhs11779: Parsing information from sample name
INFO:root:CNhs11779: Searching Ontology
INFO:root:CNhs11779: Merging Information
...
INFO:root:CNhs13441: Parsing information from sample name
INFO:root:CNhs13441: Searching Ontology
INFO:root:CNhs13441: Merging Information
INFO:root:Processing Sample: 'tpm of eye - vitreous humor, donor1.CNhs13440.10268-104D7'
INFO:root:CNhs13440: Parsing information from sample name
INFO:root:CNhs13440: Searching Ontology
INFO:root:CNhs13440: Merging Information
INFO:root:Processing Sample: 'tpm of eye, fetal, donor1.CNhs11762.10054-101G9'
INFO:root:CNhs11762: Parsing information from sample name
INFO:root:CNhs11762: Searching Ontology
INFO:root:CNhs11762: Merging Information
INFO:root:Processing Sample: 'tpm of fibrosarcoma cell line:HT-1080.CNhs11860.10758-110E2'
INFO:root:CNhs11860: Parsing information from sample name
INFO:root:CNhs11860: Searching Ontology
INFO:root:CNhs11860: Merging Information
INFO:root:Processing Sample: 'tpm of fibrous histiocytoma cell line:GCT TIB-223.CNhs11842.10711-109H9'
INFO:root:CNhs11842: Parsing information from sample name
INFO:root:CNhs11842: Searching Ontology
...
INFO:root:CNhs14219: Searching Ontology
INFO:root:CNhs14219: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of hIPS, biol_rep1.CNhs14214.14380-156B6'
INFO:root:CNhs14214: Parsing information from sample name
INFO:root:CNhs14214: Searching Ontology
INFO:root:CNhs14214: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of hIPS, biol_rep2.CNhs14215.14381-156B7'
INFO:root:CNhs14215: Parsing information from sample name
INFO:root:CNhs14215: Searching Ontology
INFO:root:CNhs14215: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of hIPS, biol_rep3.CNhs14216.14382-156B8'
INFO:root:CNhs14216: Parsing information from sample name
INFO:root:CNhs14216: Searching Ontology
INFO:root:CNhs14216: Merging Information
...
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, control donor C11-CRL2429, day12, rep2.CNhs13824.13427-144D7'
INFO:root:CNhs13824: Parsing information from sample name
INFO:root:CNhs13824: Searching Ontology
INFO:root:CNhs13824: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, control donor C11-CRL2429, day12, rep3.CNhs14051.13431-144E2'
INFO:root:CNhs14051: Parsing information from sample name
INFO:root:CNhs14051: Searching Ontology
INFO:root:CNhs14051: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, control donor C11-CRL2429, day18, rep1.CNhs13916.13424-144D4'
INFO:root:CNhs13916: Parsing information from sample name
...
INFO:root:CNhs14057: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, down-syndrome donor C11-CCL54, day12, rep1.CNhs13832.13447-144F9'
INFO:root:CNhs13832: Parsing information from sample name
INFO:root:CNhs13832: Searching Ontology
INFO:root:CNhs13832: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, down-syndrome donor C11-CCL54, day12, rep2.CNhs13845.13451-144G4'
INFO:root:CNhs13845: Parsing information from sample name
INFO:root:CNhs13845: Searching Ontology
INFO:root:CNhs13845: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:sample_type: missing in ontology. 
INFO:root:Processing Sample: 'tpm of iPS differentiation to neuron, down-syndrome donor C11-CCL54, day12, rep3.CNhs14058.13455-144G8'
...
INFO:root:CNhs11277: Merging Information
INFO:root:Processing Sample: 'tpm of large cell lung carcinoma cell line:NCI-H460.CNhs12806.10839-111E2'
INFO:root:CNhs12806: Parsing information from sample name
INFO:root:CNhs12806: Searching Ontology
INFO:root:CNhs12806: Merging Information
INFO:root:Processing Sample: 'tpm of large cell non-keratinizing squamous carcinoma cell line:SKG-II-SF.CNhs11825.10692-109F8'
INFO:root:CNhs11825: Parsing information from sample name
INFO:root:CNhs11825: Searching Ontology
INFO:root:CNhs11825: Merging Information
INFO:root:Processing Sample: 'tpm of left atrium, adult, donor1.CNhs11790.10079-102A7'
INFO:root:CNhs11790: Parsing information from sample name
INFO:root:CNhs11790: Searching Ontology
INFO:root:CNhs11790: Merging Information
INFO:root:Processing Sample: 'tpm of left ventricle, adult, donor1.CNhs11789.10078-102A6'
INFO:root:CNhs11789: Parsing information from sample name
INFO:root:CNhs11789: Searching Ontology
INFO:root:CNhs11789: Merging Information
...
INFO:root:CNhs13796: Parsing information from sample name
INFO:root:CNhs13796: Searching Ontology
INFO:root:CNhs13796: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of medial frontal gyrus, adult, donor10252.CNhs12310.10150-102I6'
INFO:root:CNhs12310: Parsing information from sample name
INFO:root:CNhs12310: Searching Ontology
INFO:root:CNhs12310: Merging Information
INFO:root:Processing Sample: 'tpm of medial frontal gyrus, adult, donor10258.CNhs14221.10368-105F8'
INFO:root:CNhs14221: Parsing information from sample name
INFO:root:CNhs14221: Searching Ontology
INFO:root:CNhs14221: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of medial frontal gyrus, newborn, donor10223.CNhs14069.10352-105E1'
INFO:root:CNhs14069: Parsing information from sample name
INFO:root:CNhs14069: Searching Ontology
INFO:root:CNhs14069: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
...
INFO:root:CNhs12376: Parsing information from sample name
INFO:root:CNhs12376: Searching Ontology
INFO:root:CNhs12376: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal precursor cell - ovarian cancer left ovary, donor4.CNhs13094.11836-124G9'
INFO:root:CNhs13094: Parsing information from sample name
INFO:root:CNhs13094: Searching Ontology
INFO:root:CNhs13094: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal precursor cell - ovarian cancer metastasis, donor1.CNhs12374.11758-123H3'
INFO:root:CNhs12374: Parsing information from sample name
INFO:root:CNhs12374: Searching Ontology
INFO:root:CNhs12374: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal precursor cell - ovarian cancer metastasis, donor2.CNhs13093.11835-124G8'
INFO:root:CNhs13093: Parsing information from sample name
INFO:root:CNhs13093: Searching Ontology
INFO:root:CNhs13093: Merging Information
...
INFO:root:CNhs13600: Parsing information from sample name
INFO:root:CNhs13600: Searching Ontology
INFO:root:CNhs13600: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, 01hr40min, biol_rep2.CNhs13601.13248-142B8'
INFO:root:CNhs13601: Parsing information from sample name
INFO:root:CNhs13601: Searching Ontology
INFO:root:CNhs13601: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, 01hr40min, biol_rep3.CNhs13602.13249-142B9'
INFO:root:CNhs13602: Parsing information from sample name
INFO:root:CNhs13602: Searching Ontology
INFO:root:CNhs13602: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, 02hr00min, biol_rep1.CNhs13603.13250-142C1'
INFO:root:CNhs13603: Parsing information from sample name
INFO:root:CNhs13603: Searching Ontology
INFO:root:CNhs13603: Merging Information
...
INFO:root:CNhs13630: Searching Ontology
INFO:root:CNhs13630: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, day14, biol_rep1.CNhs13338.13277-142F1'
INFO:root:CNhs13338: Parsing information from sample name
INFO:root:CNhs13338: Searching Ontology
INFO:root:CNhs13338: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, day14, biol_rep2.CNhs13631.13278-142F2'
INFO:root:CNhs13631: Parsing information from sample name
INFO:root:CNhs13631: Searching Ontology
INFO:root:CNhs13631: Merging Information
INFO:root:Processing Sample: 'tpm of mesenchymal stem cells (adipose derived), adipogenic induction, day14, biol_rep3.CNhs13632.13279-142F3'
INFO:root:CNhs13632: Parsing information from sample name
INFO:root:CNhs13632: Searching Ontology
INFO:root:CNhs13632: Merging Information
...
INFO:root:CNhs11729: Parsing information from sample name
INFO:root:CNhs11729: Searching Ontology
INFO:root:CNhs11729: Merging Information
INFO:root:Processing Sample: 'tpm of myxofibrosarcoma cell line:NMFH-1.CNhs11821.10684-109E9'
INFO:root:CNhs11821: Parsing information from sample name
INFO:root:CNhs11821: Searching Ontology
INFO:root:CNhs11821: Merging Information
INFO:root:Processing Sample: 'tpm of nasal epithelial cells, donor1, tech_rep1.CNhs12589.12226-129F3'
INFO:root:CNhs12589: Parsing information from sample name
INFO:root:CNhs12589: Searching Ontology
INFO:root:CNhs12589: Merging Information
DEBUG:root:tech_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of nasal epithelial cells, donor1, tech_rep2.CNhs12554.12226-129F3'
INFO:root:CNhs12554: Parsing information from sample name
INFO:root:CNhs12554: Searching Ontology
INFO:root:CNhs12554: Merging Information
DEBUG:root:tech_rep: missing in ontology. 
...
INFO:root:CNhs11835: Parsing information from sample name
INFO:root:CNhs11835: Searching Ontology
INFO:root:CNhs11835: Merging Information
INFO:root:Processing Sample: 'tpm of osteosarcoma cell line:143B/TK^(-)neo^(R).CNhs11279.10510-107D6'
INFO:root:CNhs11279: Parsing information from sample name
INFO:root:CNhs11279: Searching Ontology
INFO:root:CNhs11279: Merging Information
INFO:root:Processing Sample: 'tpm of osteosarcoma cell line:HS-Os-1.CNhs11290.10558-107I9'
INFO:root:CNhs11290: Parsing information from sample name
INFO:root:CNhs11290: Searching Ontology
INFO:root:CNhs11290: Merging Information
INFO:root:Processing Sample: 'tpm of ovary, adult, pool1.CNhs10626.10020-101D2'
INFO:root:CNhs10626: Parsing information from sample name
INFO:root:CNhs10626: Searching Ontology
INFO:root:CNhs10626: Merging Information
INFO:root:Processing Sample: 'tpm of pagetoid sarcoma cell line:Hs 925.T.CNhs11856.10732-110B3'
INFO:root:CNhs11856: Parsing information from sample name
...
INFO:root:CNhs12529: Merging Information
INFO:root:Processing Sample: 'tpm of prostate cancer cell line:DU145.CNhs11260.10490-107B4'
INFO:root:CNhs11260: Parsing information from sample name
INFO:root:CNhs11260: Searching Ontology
INFO:root:CNhs11260: Merging Information
INFO:root:Processing Sample: 'tpm of prostate cancer cell line:PC-3.CNhs11243.10439-106E7'
INFO:root:CNhs11243: Parsing information from sample name
INFO:root:CNhs11243: Searching Ontology
INFO:root:CNhs11243: Merging Information
INFO:root:Processing Sample: 'tpm of prostate, adult, pool1.CNhs10628.10022-101D4'
INFO:root:CNhs10628: Parsing information from sample name
INFO:root:CNhs10628: Searching Ontology
INFO:root:CNhs10628: Merging Information
INFO:root:Processing Sample: 'tpm of putamen, adult, donor10196.CNhs12324.10176-103C5'
INFO:root:CNhs12324: Parsing information from sample name
INFO:root:CNhs12324: Searching Ontology
INFO:root:CNhs12324: Merging Information
...
INFO:root:Processing Sample: 'tpm of skin, fetal, donor1.CNhs11774.10065-101I2'
INFO:root:CNhs11774: Parsing information from sample name
INFO:root:CNhs11774: Searching Ontology
INFO:root:CNhs11774: Merging Information
INFO:root:Processing Sample: 'tpm of small cell cervical cancer cell line:HCSC-1.CNhs11885.10800-110I8'
INFO:root:CNhs11885: Parsing information from sample name
INFO:root:CNhs11885: Searching Ontology
INFO:root:CNhs11885: Merging Information
INFO:root:Processing Sample: 'tpm of small cell gastrointestinal carcinoma cell line:ECC10.CNhs11736.10610-108F7'
INFO:root:CNhs11736: Parsing information from sample name
INFO:root:CNhs11736: Searching Ontology
INFO:root:CNhs11736: Merging Information
INFO:root:Processing Sample: 'tpm of small cell lung carcinoma cell line:DMS 144.CNhs12808.10841-111E4'
INFO:root:CNhs12808: Parsing information from sample name
INFO:root:CNhs12808: Searching Ontology
INFO:root:CNhs12808: Merging Information
...
INFO:root:Processing Sample: 'tpm of synovial sarcoma cell line:HS-SY-II.CNhs11244.10441-106E9'
INFO:root:CNhs11244: Parsing information from sample name
INFO:root:CNhs11244: Searching Ontology
INFO:root:CNhs11244: Merging Information
INFO:root:Processing Sample: 'tpm of temporal lobe, adult, pool1.CNhs10637.10031-101E4'
INFO:root:CNhs10637: Parsing information from sample name
INFO:root:CNhs10637: Searching Ontology
INFO:root:CNhs10637: Merging Information
INFO:root:Processing Sample: 'tpm of temporal lobe, fetal, donor1, tech_rep1.CNhs11772.10063-101H9'
INFO:root:CNhs11772: Parsing information from sample name
INFO:root:CNhs11772: Searching Ontology
INFO:root:CNhs11772: Merging Information
DEBUG:root:biol_rep: missing in ontology. 
DEBUG:root:tech_rep: missing in ontology. 
INFO:root:Processing Sample: 'tpm of temporal lobe, fetal, donor1, tech_rep2.CNhs12996.10063-101H9'
INFO:root:CNhs12996: Parsing information from sample name
INFO:root:CNhs12996: Searching Ontology
...
INFO:root:CNhs10654: Searching Ontology
INFO:root:CNhs10654: Merging Information
INFO:root:Processing Sample: 'tpm of trachea, adult, pool1.CNhs10635.10029-101E2'
INFO:root:CNhs10635: Parsing information from sample name
INFO:root:CNhs10635: Searching Ontology
INFO:root:CNhs10635: Merging Information
INFO:root:Processing Sample: 'tpm of trachea, fetal, donor1.CNhs11766.10058-101H4'
INFO:root:CNhs11766: Parsing information from sample name
INFO:root:CNhs11766: Searching Ontology
INFO:root:CNhs11766: Merging Information
INFO:root:Processing Sample: 'tpm of transitional cell carcinoma cell line:Hs 769.T.CNhs11837.10707-109H5'
INFO:root:CNhs11837: Parsing information from sample name
INFO:root:CNhs11837: Searching Ontology
INFO:root:CNhs11837: Merging Information
INFO:root:Processing Sample: 'tpm of transitional-cell carcinoma cell line:5637.CNhs10735.10418-106C4'
INFO:root:CNhs10735: Parsing information from sample name
INFO:root:CNhs10735: Searching Ontology
INFO:root:CNhs10735: Merging Information
...

The log output indicates that the supplementary information and the ontology do not always agree on the sample type; for example, for the hIPS and iPS differentiation samples the log reports `sample_type: missing in ontology`. Moreover, the ontology specifies more than one sample type for some samples.
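Rather than scrolling through the log, one could count the `missing in ontology` messages per field. The following is a minimal sketch, assuming the message format shown in the output above; `count_missing_fields` is a hypothetical helper and not part of `pyfantom`. The regular expression only looks at the message text, so it should match both the console format (`DEBUG:root:...`) and the file format configured in `enable_logging`.

```python
import re
from collections import Counter

# Matches messages such as "biol_rep: missing in ontology." anywhere in a line,
# independent of the prefix added by the logging formatter.
MISSING_RE = re.compile(r"(\w+): missing in ontology\.")


def count_missing_fields(lines):
    """Count how often each field is reported as missing in the ontology."""
    counts = Counter()
    for line in lines:
        match = MISSING_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log output; in practice one could pass an open
# file instead, e.g. open("../data/process_sample_descriptions.log").
sample_lines = [
    "INFO:root:CNhs14214: Merging Information",
    "DEBUG:root:biol_rep: missing in ontology. ",
    "DEBUG:root:sample_type: missing in ontology. ",
    "DEBUG:root:tech_rep: missing in ontology. ",
    "DEBUG:root:biol_rep: missing in ontology. ",
]
print(count_missing_fields(sample_lines))
```

Since a `Counter` is returned, `most_common()` directly lists the fields that disagree most often with the ontology.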

In [25]:
annotations_df = pd.DataFrame(annotations)
annot_notes_df = pd.DataFrame(annot_notes)

In [26]:
annotations_df.tail()

| | biol_rep | donor | lib_id | name | name_orig | obo_id | sample_type | tech_rep | time |
|---|---|---|---|---|---|---|---|---|---|
| 1824 | True | | CNhs11676 | uterus, adult, pool1 | tpm of uterus, adult, pool1.CNhs1167... | FF:10100-102D1 | tissue | False | |
| 1825 | True | donor1 | CNhs11763 | uterus, fetal, donor1 | tpm of uterus, fetal, donor1.CNhs117... | FF:10055-101H1 | tissue | False | |
| 1826 | True | | CNhs12854 | vagina, adult, rep1 | tpm of vagina, adult.CNhs12854.10204... | FF:10204-103F6 | tissue | False | |
| 1827 | True | | CNhs12844 | vein, adult, rep1 | tpm of vein, adult.CNhs12844.10191-1... | FF:10191-103E2 | tissue | False | |
| 1828 | True | | CNhs11813 | xeroderma pigentosum b cell line:XPL... | tpm of xeroderma pigentosum b cell l... | FF:10563-108A5 | cell line | False | |


In [27]:
annot_notes_df.tail()

| | field_name | lib_id | new_value | obo_id |
|---|---|---|---|---|
| 542 | biol_rep | CNhs14223 | donor10258 | FF:10370-105G1 |
| 543 | tech_rep | CNhs14223 | tech_rep1 | FF:10370-105G1 |
| 544 | biol_rep | CNhs14551 | donor10258 | FF:10370-105G1 |
| 545 | tech_rep | CNhs14551 | tech_rep2 | FF:10370-105G1 |
| 546 | biol_rep | CNhs14084 | donor10223 | FF:10366-105F6 |


The number of `biol_rep` entries we amended relative to the ontology:

In [28]:
len(annot_notes_df[annot_notes_df.field_name == "biol_rep"])

278
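The cell above counts only the `biol_rep` field. A sketch of how one might tabulate all amended fields at once with `value_counts`, run here on a small stand-in DataFrame built from the five rows shown by `annot_notes_df.tail()`; on the full `annot_notes_df` the numbers would of course differ.

```python
import pandas as pd

# Stand-in for annot_notes_df, using only the rows displayed by tail() above
annot_notes_df = pd.DataFrame(
    {
        "field_name": ["biol_rep", "tech_rep", "biol_rep", "tech_rep", "biol_rep"],
        "lib_id": ["CNhs14223", "CNhs14223", "CNhs14551", "CNhs14551", "CNhs14084"],
        "new_value": ["donor10258", "tech_rep1", "donor10258", "tech_rep2", "donor10223"],
        "obo_id": ["FF:10370-105G1", "FF:10370-105G1", "FF:10370-105G1",
                   "FF:10370-105G1", "FF:10366-105F6"],
    },
    index=[542, 543, 544, 545, 546],
)

# Number of amended values per field, instead of filtering one field at a time
counts = annot_notes_df["field_name"].value_counts()
print(counts)

# Number of distinct libraries (samples) that received at least one amendment
n_libraries = annot_notes_df["lib_id"].nunique()
print(n_libraries)
```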

### Write everything to CSV files

In [29]:
annotations_df.to_csv("../data/column_vars.processed.csv")
annot_notes_df.to_csv("../data/annotation_notes.csv")
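Because `to_csv` writes the DataFrame index by default, reading these files back should use `index_col=0`; otherwise the index reappears as an extra `Unnamed: 0` column. A minimal round-trip sketch, using an in-memory buffer and a small stand-in for `annotations_df` with values copied from the `tail()` output above:

```python
import io

import pandas as pd

# Stand-in for annotations_df (a few columns and rows from the tail() output)
annotations_df = pd.DataFrame(
    {
        "lib_id": ["CNhs12854", "CNhs12844"],
        "obo_id": ["FF:10204-103F6", "FF:10191-103E2"],
        "sample_type": ["tissue", "tissue"],
    },
    index=[1826, 1827],
)

buffer = io.StringIO()
annotations_df.to_csv(buffer)  # index is written as an unnamed first column
buffer.seek(0)

# index_col=0 turns that first column back into the index
restored = pd.read_csv(buffer, index_col=0)
print(restored)
```

If the row index carries no meaning downstream, passing `index=False` to `to_csv` avoids the extra column altogether.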