This notebook examines metadata from the SRA. I want to develop a programmatic approach to downloading metadata, but it needs to be consistent with Zhenxia's and the Miegs' analyses. I also want to figure out exactly how Zhenxia and the Miegs built their metadata tables.

There are two programmatic approaches:
* SRAdb
* Biopython

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 999)

# SRAdb

SRAdb is nice because it parses everything into a table, but the counts don't match my expectations: the query below returns 10,356 rows, while Entrez reports 22,344 records for the same organism.

In [2]:
import rpy2
%load_ext rpy2.ipython

## Download SQLite database

In [None]:
%%bash
cd ../../output/
if [ ! -e SRAmetadb.sqlite ]; then 
    wget 'http://dl.dropbox.com/u/51653511/SRAmetadb.sqlite.gz'
    gunzip SRAmetadb.sqlite.gz
fi

## Query database

In [4]:
%%R 
library(SRAdb)
sra_con = dbConnect(SQLite(), '../../output/SRAmetadb.sqlite')
rs = getSRA(search_terms='"Drosophila melanogaster"[Organism]', out_types=c('run', 'sample'), sra_con=sra_con)

## Print number of rows

In [7]:
%R dim(rs)

array([10356,    21], dtype=int32)
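To see where the SRAdb counts come from, it may help to look at the SQLite file directly. The sketch below uses only Python's built-in `sqlite3` module and makes no assumption about the SRAmetadb schema: it lists whatever tables exist and counts their rows. The helper name `table_row_counts` is mine, not part of SRAdb.

```python
import sqlite3


def table_row_counts(con):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Names come straight from sqlite_master, so quoting them is sufficient.
        counts[name] = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts


# Usage against the downloaded file (path taken from the cells above):
# con = sqlite3.connect('../../output/SRAmetadb.sqlite')
# print(table_row_counts(con))
```

Comparing the per-table counts with `dim(rs)` should show whether the shortfall is in the database dump itself or in how `getSRA` filters and joins.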

# Biopython

Biopython's counts match what I expect, but the results come back as raw XML and need a parser.

In [4]:
from Bio import Entrez
Entrez.email = 'justin.fear@nih.gov'

## Get List of Records

In [5]:
handle = Entrez.esearch(db='sra', term='"Drosophila melanogaster"[Orgn]')
records = Entrez.read(handle)
records['Count']

'22344'
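Note that `esearch` returns only the first 20 IDs by default, so `records['IdList']` is far shorter than `records['Count']`. A minimal sketch of paging through the full result with `retstart`/`retmax` follows. The function `collect_ids` and its `search` callback are hypothetical names; the callback keeps the paging logic independent of the network so it can be checked offline.

```python
def collect_ids(search, batch_size=500):
    """Page through an esearch-style result and return every ID.

    `search(retstart, retmax)` must return a mapping with 'Count' and
    'IdList', which is the shape Entrez.read() gives for an esearch handle.
    """
    ids = []
    total = None
    while total is None or len(ids) < total:
        page = search(len(ids), batch_size)
        total = int(page["Count"])
        if not page["IdList"]:
            break  # avoid looping forever if the server returns an empty page
        ids.extend(page["IdList"])
    return ids


# Wiring it to Entrez (assumes `Entrez` is imported and `Entrez.email` is set):
# def search(retstart, retmax):
#     handle = Entrez.esearch(db='sra', term='"Drosophila melanogaster"[Orgn]',
#                             retstart=retstart, retmax=retmax)
#     return Entrez.read(handle)
# all_ids = collect_ids(search)
```

The batch size of 500 is an arbitrary choice; smaller batches mean more requests but smaller responses.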

## Get Full XML of Subset

In [11]:
h = Entrez.efetch(db='sra', id=records['IdList'][:1])
r = h.read()

print(r)

<?xml version="1.0" ?>
<EXPERIMENT_PACKAGE_SET>
<EXPERIMENT_PACKAGE><EXPERIMENT alias="E-MTAB-1976:C1" accession="ERX330955" broker_name="ArrayExpress"><IDENTIFIERS><PRIMARY_ID>ERX330955</PRIMARY_ID><SUBMITTER_ID namespace="UNIVERSITY OF MANCHESTER">E-MTAB-1976:C1</SUBMITTER_ID><SUBMITTER_ID namespace="University of Manchester">E-MTAB-1976:C1</SUBMITTER_ID></IDENTIFIERS><TITLE>AB SOLiD 4 System sequencing; Deignan</TITLE><STUDY_REF refname="E-MTAB-1976" refcenter="University of Manchester" accession="ERP004143"><IDENTIFIERS><PRIMARY_ID>ERP004143</PRIMARY_ID><SUBMITTER_ID namespace="University of Manchester">E-MTAB-1976</SUBMITTER_ID></IDENTIFIERS></STUDY_REF><DESIGN><DESIGN_DESCRIPTION>Deignan</DESIGN_DESCRIPTION><SAMPLE_DESCRIPTOR refname="E-MTAB-1976:C1" accession="ERS362580" refcenter="University of Manchester"><IDENTIFIERS><PRIMARY_ID>ERS362580</PRIMARY_ID><EXTERNAL_ID namespace="BioSample">SAMEA2225913</EXTERNAL_ID><SUBMITTER_ID namespace="University of Manchester">E-MTAB-1976:C1<
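The XML above is regular enough that the standard library can flatten it. The sketch below pulls the experiment, study, and sample accessions out of each `EXPERIMENT` element with `xml.etree.ElementTree`. `SAMPLE_XML` is a hand-trimmed, well-formed stand-in built from the tags visible in the output above (the real record is longer), and `parse_experiments` is a hypothetical helper, not part of Biopython.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed stand-in for the efetch output; only tags seen above are used.
SAMPLE_XML = """<?xml version="1.0" ?>
<EXPERIMENT_PACKAGE_SET>
<EXPERIMENT_PACKAGE><EXPERIMENT alias="E-MTAB-1976:C1" accession="ERX330955">
<TITLE>AB SOLiD 4 System sequencing; Deignan</TITLE>
<STUDY_REF accession="ERP004143"/>
<DESIGN><SAMPLE_DESCRIPTOR accession="ERS362580"/></DESIGN>
</EXPERIMENT></EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>"""


def parse_experiments(xml_text):
    """Return one dict per EXPERIMENT with its key accessions."""
    root = ET.fromstring(xml_text)
    rows = []
    for exp in root.iter("EXPERIMENT"):
        study = exp.find("STUDY_REF")
        sample = exp.find("DESIGN/SAMPLE_DESCRIPTOR")
        rows.append({
            "experiment": exp.get("accession"),
            "title": exp.findtext("TITLE"),
            "study": study.get("accession") if study is not None else None,
            "sample": sample.get("accession") if sample is not None else None,
        })
    return rows


print(parse_experiments(SAMPLE_XML))
```

The resulting list of dicts can be passed straight to `pd.DataFrame(...)`. `ET.fromstring` accepts both `str` and `bytes`, so it should work with whatever `h.read()` returns.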

# SRA Website

The gold standard is downloading results directly from SRA's website. This is what Zhenxia and the Miegs did. I still want to verify how they made their tables.

In [6]:
# see ../../data/SRA/README for info about where this file came from
dfSRA = pd.read_csv('../../data/SRA/SraRunTable.txt', sep='\t')
dfSRA.set_index('Run_s', inplace=True)
print(dfSRA.shape)
dfSRA.head()


(13993, 29)


Run_s,Assay_Type_s,AssemblyName_s,BioProject_s,BioSample_s,Center_Name_s,Experiment_s,InsertSize_l,LibraryLayout_s,LibrarySelection_s,LibrarySource_s,Library_Name_s,LoadDate_s,MBases_l,MBytes_l,Platform_s,ReleaseDate_s,SRA_Sample_s,SRA_Study_s,Sample_Name_s,Sex_s,genotype_s,source_name_s,strain_s,tissue_s,Consent_s,Organism_s,g1k_analysis_group_s,g1k_pop_code_s,source_s
SRR000909,FL-cDNA,<not provided>,PRJNA74619,SAMN00000114,WUGSC,SRX000185,0,SINGLE,cDNA,TRANSCRIPTOMIC,2305754913,2012-01-19,0,0,LS454,2008-04-09,SRS000346,SRP000119,SRA000293_sample,<not provided>,<not provided>,<not provided>,<not provided>,<not provided>,public,Drosophila melanogaster,<not provided>,<not provided>,<not provided>
SRR001337,EST,<not provided>,PRJNA107113,SAMN00000196,GEO,SRX000316,0,SINGLE,RANDOM,TRANSCRIPTOMIC,WT_females_beta-eliminated,2014-05-25,54,362,ILLUMINA,2008-06-27,SRS000430,SRP000181,GSM278573,<not provided>,<not provided>,<not provided>,<not provided>,<not provided>,public,Drosophila melanogaster,<not provided>,<not provided>,<not provided>
SRR001338,EST,<not provided>,PRJNA107113,SAMN00000208,GEO,SRX000328,0,SINGLE,RANDOM,TRANSCRIPTOMIC,IR_non-beta-eliminated,2014-05-25,115,796,ILLUMINA,2008-06-27,SRS000442,SRP000181,GSM278707,<not provided>,<not provided>,<not provided>,<not provided>,<not provided>,public,Drosophila melanogaster,<not provided>,<not provided>,<not provided>
SRR001339,EST,<not provided>,PRJNA107113,SAMN00000207,GEO,SRX000327,0,SINGLE,RANDOM,TRANSCRIPTOMIC,WT_females_non-beta-eliminated,2014-05-25,144,1008,ILLUMINA,2008-06-27,SRS000441,SRP000181,GSM278706,<not provided>,<not provided>,<not provided>,<not provided>,<not provided>,public,Drosophila melanogaster,<not provided>,<not provided>,<not provided>
SRR001340,EST,<not provided>,PRJNA107113,SAMN00000198,GEO,SRX000318,0,SINGLE,RANDOM,TRANSCRIPTOMIC,IR_beta-eliminated,2014-05-25,63,425,ILLUMINA,2008-06-27,SRS000432,SRP000181,GSM278575,<not provided>,<not provided>,<not provided>,<not provided>,<not provided>,public,Drosophila melanogaster,<not provided>,<not provided>,<not provided>


In [14]:
dfSRA.columns

Index(['Assay_Type_s', 'AssemblyName_s', 'BioProject_s', 'BioSample_s',
       'Center_Name_s', 'Experiment_s', 'InsertSize_l', 'LibraryLayout_s',
       'LibrarySelection_s', 'LibrarySource_s', 'Library_Name_s', 'LoadDate_s',
       'MBases_l', 'MBytes_l', 'Platform_s', 'ReleaseDate_s', 'Run_s',
       'SRA_Sample_s', 'SRA_Study_s', 'Sample_Name_s', 'Sex_s', 'genotype_s',
       'source_name_s', 'strain_s', 'tissue_s', 'Consent_s', 'Organism_s',
       'g1k_analysis_group_s', 'g1k_pop_code_s', 'source_s'],
      dtype='object')

In [19]:
# List of tissue types; note that these values still need some cleaning
dfSRA.tissue_s.unique()

array(['<not provided>', 'whole organism', 'heads', 'head', 'testis',
       'ovary', 'whole fly', 'testes', 'ovaries', 'mixed-stage embryos',
       'female heads', 'embryo-derived cell-line', 'Adult ovaries',
       'Male body', 'CNS-derived cell-line', 'dorsal mesothoracic disc',
       'antenna disc-derived cell-line', 'ventral prothoracic disc',
       'Female heads', 'fat body', 'midgut', 'Ovary', 'Whole flies',
       'brain', 'whole flies', 'fly', 'Immortalized cells',
       'Adult fly heads', 'salivary gland', 'male whole body',
       'female whole body', 'central nervous system', 'whole brains',
       'mushroom bodies', 'Head', 'Digestive System', 'Accessory Gland',
       'Carcass', 'Imaginal Disc', 'Central Nervous System', 'Fat',
       'Ovaries', 'Testes', 'Salivary Gland', 'neuron', 'muscle',
       'whole-embryo', 'eggs', 'whole body without testis', 'whole body',
       'Brain', 'dissected testis', 'ovary stem cells',
       'embryonic/larval hemocyte', 'Heads', 'Wi
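The list above mixes case and singular/plural forms (`'heads'`, `'Head'`, `'Heads'`; `'ovary'`, `'Ovaries'`). A first cleaning pass could look like the sketch below. The `SYNONYMS` table is a hypothetical starting point covering only a few values visible above, and treating `'<not provided>'` as missing is my assumption.

```python
import re

# Hypothetical synonym table; extend it after reviewing the full list of values.
SYNONYMS = {
    "heads": "head",
    "testes": "testis",
    "ovaries": "ovary",
    "whole flies": "whole fly",
}


def normalize_tissue(value):
    """Lower-case, collapse whitespace, and map known plurals to one label."""
    if value == "<not provided>":
        return None  # assumption: treat the SRA placeholder as missing
    key = re.sub(r"\s+", " ", value.strip().lower())
    return SYNONYMS.get(key, key)


# Usage on the table loaded above:
# dfSRA['tissue_clean'] = dfSRA.tissue_s.map(normalize_tissue, na_action='ignore')
```

Values without a synonym entry pass through lower-cased, so nothing is silently dropped, and the remaining variants stay visible in `unique()`.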

# Zhen-Xia's List

In [20]:
# Zhenxia gave me this dataset; it is named `.xls` but it is really a TSV.
dfZ = pd.read_table('../../data/zhenxia/metadata_quality_control.xls')
dfZ.set_index('SRR', inplace=True)
print(dfZ.shape)
dfZ.head()

(12719, 57)


SRR,BioSample,ReleaseDate,LoadDate,avgLength,size_MB,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,Platform,Model,SRAStudy,BioProject,Sample,SampleName,CenterName,Submission,sex,dev_stage,tissue,genotype,cell_type,ExperimentTitle,StudyTitle,sample_type,Total_reads,Unmapped_reads,non.unique,unique,Non.splice_reads,Splice_reads,ambiguous,no_feature,reads_mapped2genes,reads_mapped2intergenic,reads_mapped2ERCC,mRNA,ncRNA,pseudogene,rRNA,snoRNA,snRNA,tRNA,exp_genes,layout,failed,F,R,TIN_median,median_coverage_1.20,median_coverage_21.40,median_coverage_41.60,median_coverage_61.80,median_coverage_81.100,correlation
DRR001177,SAMD00016337,2011-12-11,2014-05-31,36,832,DRX000774,SRig10098,OTHER,size fractionation,OTHER,SINGLE,ILLUMINA,Illumina Genome Analyzer IIx,DRP000426,PRJDB2780,DRS000734,SAMD00016337,RIKEN_OSC,DRA000418,male,embryo,whole_organism,AGO2- and negative control-IP short RNA-seq in...,S2,SRig10098,Chromatin associated RNAi components take part...,cell_line,38551396,24102998,11711140,2737258,2737191,67,63300,542516,1687592,443596,254,432466,1182063,54793,5219,9665,165,3221,13691,SINGLE,0.1378,0.4379,0.4243,6.25,0.080266,0.079628,0.216993,0.341371,0.604689,0.693731
DRR001178,SAMD00016336,2011-12-11,2014-05-31,36,791,DRX000775,SRig10099,OTHER,size fractionation,OTHER,SINGLE,ILLUMINA,Illumina Genome Analyzer IIx,DRP000426,PRJDB2780,DRS000735,SAMD00016336,RIKEN_OSC,DRA000418,male,embryo,whole_organism,AGO2- and negative control-IP short RNA-seq in...,S2,SRig10099,Chromatin associated RNAi components take part...,cell_line,36613554,24010803,9923336,2679415,2679391,24,98459,515561,1643139,422006,250,657744,889254,56460,4831,30900,509,3441,13634,SINGLE,0.1688,0.4294,0.4018,7.945317,0.064579,0.08103,0.143947,0.199425,0.726678,0.693217
DRR014222,SAMD00012584,2015-03-23,2015-03-23,101,10621,DRX012753,PCR2085,EST,cDNA,TRANSCRIPTOMIC,SINGLE,ILLUMINA,Illumina HiSeq 2500,DRP001250,PRJDB2557,DRS012553,DRS012553,ASGRC-WM,DRA001188,unknown,adult,whole_organism,Clontech Catalog number: 636222 Lot number: 10...,mix,SMART method,mRNA 5 end sequencing,whole_organism,164160619,66074703,3403887,94682029,73561370,21120659,8827565,279985,85422641,151838,0,64520581,3803807,14974,17058229,9806,13724,1520,15046,SINGLE,0.0103,0.9856,0.0041,33.202892,0.513654,0.034357,0.005381,0.068175,0.212674,0.79812
DRR014223,SAMD00012584,2015-03-23,2015-03-23,101,10314,DRX012754,PCR2072,EST,cDNA,TRANSCRIPTOMIC,SINGLE,ILLUMINA,Illumina HiSeq 2500,DRP001250,PRJDB2557,DRS012553,DRS012553,ASGRC-WM,DRA001188,unknown,adult,whole_organism,Clontech Catalog number: 636222 Lot number: 10...,mix,Ligation method,mRNA 5 end sequencing,whole_organism,151593644,31527578,10224830,109841236,38175966,71665270,6497407,181333,103064996,97500,0,102778950,252968,5790,15696,1402,10024,166,14014,SINGLE,0.1035,0.8922,0.0043,17.325735,0.66361,0.096432,0.011929,0.004769,0.032353,0.657597
DRR014224,SAMD00012584,2015-03-23,2015-03-23,126,12218,DRX012755,PCR2123,EST,cDNA,TRANSCRIPTOMIC,SINGLE,ILLUMINA,Illumina HiSeq 2500,DRP001250,PRJDB2557,DRS012553,DRS012553,ASGRC-WM,DRA001188,unknown,adult,whole_organism,Clontech Catalog number: 636222 Lot number: 10...,mix,CapSMART and Non-CapSMART method,mRNA 5 end sequencing,whole_organism,150847202,63307153,3283707,84256342,61657329,22599013,7208452,163524,76794434,89932,0,50509969,9533770,8403,16719380,6275,16301,336,14664,SINGLE,0.0066,0.9926,0.0009,34.692736,0.64522,0.08864,0.004934,0.0959,0.30974,0.785452


# Jean and Danielle's Table

In [23]:
# I got this table from Jean and Danielle
dfj = pd.read_table('../../data/jean/Mapping_QC/dd_any.Data_Summary.runs.txt', skiprows=2, low_memory=False, index_col=0)
print(dfj.shape)
dfj.head()

(9368, 291)


### Run,Unnamed: 1,Genes significantly expressed,% fragments on strand plus of genes,Unnamed: 4,RunId,Run accession in SRA,Title,Other title,Sorting_title 1,Sorting title 2,Sample (summarized from biosample or manual),Stage and experimental summary,System or Tissue,"Laboratory, Author when available, submission date",Species,Project,Project description,Reference,Unnamed: 19,Sequencing protocol,Design,Sequencing platform and Machine type,Experiment type,Sequencing date,Submission date,Library preparation,RNA Integrity Number,Details,Corresponding microarray,"Spots, bases, average read length (in SRA)",Single or Paired end,Unnamed: 32,Average read length (nt),Fragment multiplicity,Million distinct sequences,Million raw reads,Megabases sequenced,%A,%T,%G,%C,%N,%GC,Unnamed: 44,Exit adaptor of read 1,Exit adaptor of read 2,PolyA index : fragments per million with at least 3 A13 motifs shifted by at least 6 bases,Telomere index : fragments per million with at least 3 TTAGGG/CCCTAA motifs,Telomere index allowing G/C substitution,Noise in telomere index : FPM with at least 3 AATCCC/GGGATT motifs,Unnamed: 51,Million reads aligned on all targets per CPU hour,Maximum RAM used (GB),Number of blocks,Unnamed: 55,Million reads aligned on any target,Average length aligned per read (nt),Mb aligned on any target,%Reads aligned on any target,%Mb aligned on any target,% length aligned on average,Unnamed: 62,Raw reads,Well mapped reads,Reads mapping uniquely to a single locus (genomic site) and maximum 1 gene,"Reads mapping uniquely to a single locus, but to 2 genes in antisense",Reads mapping uniquely to a single locus,Reads mapping to 2 to 9 sites,Reads rejected because they map to 10 sites or more,Reads rejected by alignment quality filter,Reads rejected before alignment because of low entropy (<= 16 bp),Reads with no insert (adaptor only),Reads unaligned from half aligned fragments,"Reads of good quality from unaligned fragments (missing genome, microbiome or mapping difficulty)",% well mapped reads,% reads mapping uniquely to a single locus (genomic site) and maximum 1 gene,"% reads mapping uniquely to a single locus, but to 2 genes in antisense",% reads mapping uniquely to a single locus,% reads mapping to 2 to 9 sites,% reads rejected because they map to 10 sites or more,% reads rejected by alignment quality filter,% reads rejected before alignment because of low entropy (<= 16 bp),% reads with no insert (adaptor only),% reads unaligned from half aligned fragments,"% reads of good quality from unaligned fragments (missing genome, microbiome or mapping difficulty)",Unnamed: 86,Reads aligned in PhiX DNA spike-in,Reads aligned in ERCC RNA spike-in,Reads aligned in ribosomal RNA,Reads aligned in mitochondria,Reads aligned in other small genes,Reads aligned in AceView,Reads aligned in RefSeq,Reads aligned in Encode,Reads aligned in Magic,Reads aligned in genome,Reads aligned in any previously annotated transcripts,"Reads inside introns, new exons and intergenic",Reads aligned in microbiome,Reads aligned in viruses,Reads aligned on the imaginary genome (mapping specificity control),Reads supporting known exons-junctions,Unnamed: 103,% Reads aligned in PhiX DNA spike-in,% Reads aligned in ERCC RNA spike-in,% Reads aligned in ribosomal RNA,% Reads aligned in mitochondria,% Reads aligned in other small genes,% Reads aligned in AceView,% Reads aligned in RefSeq,% Reads aligned in Encode,% Reads aligned in Magic,% Reads aligned in genome,% Reads aligned in any previously annotated transcripts,"% Reads inside introns, new exons and intergenic",% Reads aligned in microbiome,% Reads aligned in viruses,% Reads aligned on the imaginary genome (mapping specificity control),% Reads supporting known exons-junctions,Unnamed: 120,Mb aligned in PhiX DNA spike-in,Mb aligned in ERCC RNA spike-in,Mb aligned in ribosomal RNA,Mb aligned in mitochondria,Mb aligned in other small genes,Mb aligned in AceView,Mb aligned in RefSeq,Mb aligned in Encode,Mb aligned in Magic,Mb aligned in genome,Mb aligned in any previously annotated transcripts,"Mb inside introns, new exons and intergenic",Mb aligned in microbiome,Mb aligned in viruses,Mb aligned on the imaginary genome (mapping specificity control),Mb supporting known exons-junctions,Unnamed: 137,Number of aligned_fragments,Compatible pairs,Non compatible pairs,Paired end fragments with only one read aligned,% Compatible pairs,% Non compatible pairs,% Paired end fragments with only one read aligned,1% fragments in library are shorter than (nt),5% fragments in library are shorter than (nt),mode of fragment lengths,median fragment lengths,average of fragment lengths,5% fragments in library are longer than (nt),1% fragments in library are longer than (nt),Unnamed: 152,Average length after clipping adaptors and barcodes (nt),Average length aligned (nt),Average length aligned in read 1 (nt),Average length aligned in read 2 (paired end),Average length aligned on RefSeq (nt),Average length aligned on EBI (nt),Average length aligned on AceView (nt),Average length aligned on magic (nt),Average length aligned on Other (nt),Average length aligned on Genome (nt),Average length aligned on Mitochondria (nt),Average length aligned on rRNA (nt),Average length aligned on Chloroplast (nt),Average length aligned on small RNAs (nt),Average length aligned on DNA spikeIn (nt),Average length aligned on RNA spikeIn (nt),Average length aligned on Imaginary genome specificity control (nt),Unnamed: 170,Average % length aligned (nt),Average % length aligned in read 1 (nt),Average % length aligned in read 2 (paired end),Average % length aligned on RefSeq,Average % length aligned on EBI,Average % length aligned on AceView,Average % length aligned on magic,Average % length aligned on Other,Average % length aligned on Genome,Average % length aligned on Mitochondria,Average % length aligned on rRNA,Average length aligned on Chloroplast,Average length aligned on small RNAs,Average length aligned on DNA spikeIn,Average length aligned on RNA spikeIn,Average length aligned on Imaginary genome specificity control,Unnamed: 187,Reads mapping on plus strand of rRNA,Reads mapping on minus strand of rRNA,Reads mapping on both strands of rRNA,% reads mapping on plus strand of rRNA,% reads mapping on minus strand of rRNA,% reads mapping on both strand of rRNA,Reads mapping on plus strand of RefSeq,Reads mapping on minus strand of RefSeq,Reads mapping on both strands of RefSeq,% reads mapping on plus strand of RefSeq,% reads mapping on minus strand of RefSeq,% reads mapping on both strand of RefSeq,Reads mapping on plus strand of EBI,Reads mapping on minus strand of EBI,Reads mapping on both strands of EBI,% reads mapping on plus strand of EBI,% reads mapping on minus strand of EBI,% reads mapping on both strand of EBI,Reads mapping on plus strand of av,Reads mapping on minus strand of av,Reads mapping on both strands of av,% reads mapping on plus strand of av,% reads mapping on minus strand of av,% reads mapping on both strand of av,Reads mapping on plus strand of genome,Reads mapping on minus strand of genome,Reads mapping on both strands of genome,% reads mapping on plus strand of genome,% reads mapping on minus strand of genome,% reads mapping on both strand of genome,Reads mapping on plus strand of SpikeIn,Reads mapping on minus strand of SpikeIn,Reads mapping on both strands of SpikeIn,% reads mapping on plus strand of SpikeIn,% reads mapping on minus strand of SpikeIn,% reads mapping on both strand of SpikeIn,Unnamed: 224,Megabases uniquely aligned and used for mismatch counts,Total number of mismatches,Mismatches per kb aligned,Transitions,Transversions,"1, 2 or 3 bp insertions not in polymers","1, 2 or 3 bp deletions not in polymers","1, 2 or 3 bp insertions in polymers","1, 2 or 3 bp deletions in polymers",A>G,T>C,G>A,C>T,A>T,T>A,G>C,C>G,A>C,T>G,G>T,C>A,Single Insertions,Single Deletions,Double insertions,Double deletions,Triple insertions,Triple deletions,Transitions per kb,Transversions per kb,Insertions per kb,Deletions per kb,% transitions,% transversions,% insertions not in polymers,% deletions not in polymers,% insertions in polymers,% deletions in polymers,Unnamed: 262,"Accessible transcript length, estimated on transcripts longer than 8kb, usually limited by the 3' biais in poly-A selected experiments",Number of well expressed transcripts longer than 8kb,Average coverage cumulated over these transcripts,Unnamed: 266,Intergenic kb,Intergenic density,Unnamed: 269,Zero index,Low index,Cross over index,Genes touched,Genes with reliable index,Genes with index >= 10,Genes with index >= 12,Genes with index >= 15,Genes with index >= 18,Genes expressed in at least one run,RefSeq genes significantly expressed in at least one run,RefSeq genes significantly expressed in majority of runs,ENCODE genes significantly expressed in at least one run,ENCODE genes significantly expressed in majority of runs,AceView genes significantly expressed in at least one run,AceView genes significantly expressed in majority of runs,AceView transcripts significantly expressed in at least one run,Mb_aligned,Mb_in_genes,Mb_in_genes_with_GeneId_minus_high_genes,Mb_in_high_genes,Intergenic
SRR332342,,10772.0,51.057,,SRR332342,SRR332342,"Wholeflies w1118 Female R3 Illumina, 5 day mat...",,RF1pw,,"Wholeflies w1118 Female R3 Illumina, 5 day mat...",Adult,Whole_organism,"Malone JH, Cho DY, Mattiuzzo NR, Artieri CG, J...",Drosophila melanogaster (taxid:7227),RNA-Seq on Illumina platform of Drosophila Dro...,RNA-Seq reads of DrosDel deficiency (Df/+) and...,PMID: 22531030 : Mediation of Drosophila autos...,,"RNA, Unspecified_RNA, RNA-Seq, cDNA",,"Illumina, Genome Analyzer II","sraRNA, sraUnspecified_RNA, RNA-Seq, cDNA",,2012-04-05,,,,,"14403905 spots, 518540580 bases, average read ...",Single-end,,f1:36,2.6,5.534,14.404,518.471,25.21,24.26,25.59,24.94,0.01,50.52,,agatcggaagagct,,69.29,0.0,0.0,0.0,,35.51,5,2,,13.201,35.17,464.254,91.65,89.54,97.69,,14403900,13200970,12808340,117307,12925647,275322,24910,107511,1933,0,0,1068577,91.65,88.92,0.814,89.74,1.91,0.173,0.746,0.0134,0.0,0,7.42,,5,42,5723251,86672,0,0,0,7061319,0,6859694,12869513,331452,0,0,25,279034,,3.47e-05,0.000292,39.73,0.602,0,0,0,49.024,0,47.62,89.35,2.3,0,0,0.000174,1.94,,0.0,0.0,203.06,3.08,0.0,0.0,0.0,246.61,0.0,241.21,452.69,11559.0,0.0,0.0,0.0,,,,-,-,-,-,-,-,,,,,,,,,35.72,35.17,35.17,,,34.92,,,,35.16,35.53,35.48,,,,,,,98.46,98.46,,,98.03,,,,98.71,98.83,98.97,,,,,,,2954663.0,2768588.0,0.0,51.63,48.37,0.0,,,,,,,3541738.0,3395040.0,124541.0,50.157,48.08,1.76,,,,,,,3431228.0,3428466.0,0.0,50.0201,49.9799,0.0,,,,,,,,439,587469,1.3382,287857,281897,410,4186,7043,6076,90084,80492,51011,66270,32086,38624,18388,21903,59697,36659,30693,43847,2210,8138,5228,1988,15,136,0.65571,0.64213,0.00093,0.00954,49.0,47.99,0.07,0.71,1.2,1.03,,4800.0,34.0,345.0,,377,0.0107,,9.3,9.4,9.9,12795.0,10772.0,10193.0,8737.0,3973.0,428.0,10772.0,,,10772.0,10772.0,10772.0,10772.0,,464.254,246.611,229.655,0.0,377
SRR1577515,,10038.0,50.92,,SRR1577515,SRR1577515,"Df(2L)ED50001/+ female rep1, whole animal, w[1...",,RF2pw,,"Df(2L)ED50001/+ female rep1, whole animal, w[1...",Adult,Whole_organism,"Lee H, Russell S, Oliver B, NIDDK, NIH, Bethes...",Drosophila melanogaster (taxid:7227),Expression profiling pooled Drosophila melanog...,We performed mRNA transcriptional profiling on...,,,"RNA, Unspecified_RNA, RNA-Seq, cDNA",Flies were homogenized in 500 ul of RNAlater s...,"Illumina, HiSeq 2000","sraRNA, sraUnspecified_RNA, RNA-Seq, cDNA",,,,,,,"6196373 spots, 470924348 bases, average read l...",Single-end,,f1:76,1.78,3.477,6.196,470.92,24.38,24.17,25.62,25.64,0.19,51.26,,,,110.87,0.0,0.0,0.16,,14.32,8,1,,6.162,74.71,460.342,99.44,97.75,98.3,,6196373,6161529,5884460,123491,6007951,153578,5852,460,63,9605,0,18864,99.438,94.97,1.99,96.96,2.48,0.0944,0.00742,0.00102,0.155,0,0.304,,6890,19275,408392,319951,0,0,0,5293878,0,4912715,6040827,113812,0,0,7,582715,,0.111,0.311,6.59,5.16,0,0,0,85.44,0,79.28,97.49,1.84,0,0,0.000113,9.4,,0.52,1.45,30.79,24.06,0.0,0.0,0.0,395.17,0.0,368.0,451.42,8406.0,0.0,0.0,0.0,,,,-,-,-,-,-,-,,,,,,,,,75.82,74.71,74.71,,,74.65,,,,74.91,75.21,75.39,,,74.79,75.35,,,98.54,98.54,,,98.5,,,,98.83,99.026,99.223,,,98.45,99.184,,,206934.0,201458.0,0.0,50.67,49.33,0.0,,,,,,,2631597.0,2536482.0,125799.0,49.71,47.91,2.38,,,,,,,2457799.0,2454916.0,0.0,50.0293,49.9707,0.0,9657.0,9618.0,0.0,50.101,49.899,0.0,,432,1141232,2.64174,625138,485026,1068,5549,10684,13767,170205,175628,139466,139839,44288,55822,46203,55142,46710,84563,74624,77674,8028,15504,2809,2959,915,853,1.44708,1.12275,0.00247,0.01284,54.78,42.5,0.09,0.49,0.94,1.21,,4350.0,42.0,528.0,,339,0.0096,,9.7,9.4,10.1,12244.0,10038.0,9500.0,8213.0,3195.0,447.0,10038.0,,,10038.0,10038.0,10038.0,10038.0,,460.342,395.168,371.811,0.0,339
SRR1577516,,10168.0,50.879,,SRR1577516,SRR1577516,"Df(2L)ED50001/+ female rep2, whole animal, w[1...",,RF2pw,,"Df(2L)ED50001/+ female rep2, whole animal, w[1...",Adult,Whole_organism,"Lee H, Russell S, Oliver B, NIDDK, NIH, Bethes...",Drosophila melanogaster (taxid:7227),Expression profiling pooled Drosophila melanog...,We performed mRNA transcriptional profiling on...,,,"RNA, Unspecified_RNA, RNA-Seq, cDNA",Flies were homogenized in 500 ul of RNAlater s...,"Illumina, HiSeq 2000","sraRNA, sraUnspecified_RNA, RNA-Seq, cDNA",,,,,,,"6560264 spots, 498580064 bases, average read l...",Single-end,,f1:76,1.86,3.525,6.56,498.577,24.72,24.52,25.24,25.33,0.19,50.57,,,,106.25,0.0,0.0,0.46,,13.34,8,1,,6.492,74.72,485.088,98.96,97.29,98.32,,6560264,6492023,6190697,158716,6349413,142610,6527,434,43,37200,0,24037,98.96,94.37,2.42,96.79,2.17,0.0995,0.00662,0.000655,0.567,0,0.366,,5731,23670,465968,392353,0,0,0,5478360,0,5089242,6359524,126768,0,0,10,623010,,0.0874,0.361,7.1,5.98,0,0,0,83.51,0,77.58,96.94,1.93,0,0,0.000152,9.5,,0.43,1.78,35.13,29.51,0.0,0.0,0.0,408.94,0.0,381.22,475.3,9363.0,0.0,0.0,0.0,,,,-,-,-,-,-,-,,,,,,,,,75.81,74.72,74.72,,,74.65,,,,74.91,75.21,75.39,,,74.77,75.33,,,98.56,98.56,,,98.51,,,,98.84,99.026,99.237,,,98.42,99.158,,,233089.0,232879.0,0.0,50.0225,49.9775,0.0,,,,,,,2705334.0,2611876.0,161150.0,49.382,47.68,2.94,,,,,,,2547121.0,2542121.0,0.0,50.0491,49.9509,0.0,11831.0,11839.0,0.0,49.9831,50.0169,0.0,,454,1171627,2.58068,640940,493785,1183,6558,12604,16557,177008,183103,139647,141182,44785,56011,45952,56248,49660,88413,73955,78761,9641,18363,3042,3777,1104,975,1.41176,1.08763,0.00261,0.01444,54.71,42.15,0.1,0.56,1.08,1.41,,4400.0,45.0,608.0,,381,0.0108,,9.7,9.5,10.1,12420.0,10168.0,9643.0,8217.0,3102.0,438.0,10168.0,,,10168.0,10168.0,10168.0,10168.0,,485.088,408.939,375.547,8.19641,381
SRR1577403,,10319.0,50.694,,SRR1577403,SRR1577403,"Df(2L)ED5878/+ female rep1, whole animal, w[11...",,RF3pw,,"Df(2L)ED5878/+ female rep1, whole animal, w[11...",Adult,Whole_organism,"Lee H, Russell S, Oliver B, NIDDK, NIH, Bethes...",Drosophila melanogaster (taxid:7227),Expression profiling pooled Drosophila melanog...,We performed mRNA transcriptional profiling on...,,,"RNA, Unspecified_RNA, RNA-Seq, cDNA",Flies were homogenized in 500 ul of RNAlater s...,"Illumina, HiSeq 2000","sraRNA, sraUnspecified_RNA, RNA-Seq, cDNA",,,,,,,"5748503 spots, 436886228 bases, average read l...",Single-end,,f1:76,1.96,2.938,5.749,436.867,25.39,25.14,24.73,24.73,0.01,49.46,,,,174.31,0.0,0.0,0.17,,14.1,6,1,,5.7,74.95,427.194,99.16,97.79,98.62,,5748503,5700078,5483738,74298,5558036,142042,8362,484,252,20449,0,18878,99.158,95.39,1.29,96.69,2.47,0.145,0.00842,0.00438,0.356,0,0.328,,9455,26809,38325,744335,0,0,0,4756622,0,4445117,5566025,124598,0,0,3,522264,,0.164,0.466,0.667,12.95,0,0,0,82.75,0,77.33,96.83,2.17,0,0,5.22e-05,9.09,,0.71,2.03,2.89,56.2,0.0,0.0,0.0,356.12,0.0,334.04,417.23,9248.0,0.0,0.0,0.0,,,,-,-,-,-,-,-,,,,,,,,,75.82,74.95,74.95,,,74.87,,,,75.15,75.5,75.43,,,75.38,75.58,,,98.85,98.85,,,98.79,,,,99.156,99.408,99.289,,,99.236,99.474,,,19377.0,18948.0,0.0,50.56,49.44,0.0,,,,,,,2372479.0,2307496.0,76647.0,49.877,48.51,1.61,,,,,,,2220549.0,2224568.0,0.0,49.9548,50.0452,0.0,13321.0,13488.0,0.0,49.689,50.311,0.0,,402,1037681,2.5813,588815,406331,1151,6837,10067,24480,152444,184047,124325,127999,39249,44398,43001,49927,37868,66661,63349,61878,8966,26842,1385,3518,867,957,1.46471,1.01077,0.00286,0.01701,56.74,39.16,0.11,0.66,0.97,2.36,,3150.0,60.0,483.0,,329,0.0093,,9.9,10.1,10.4,12433.0,10319.0,10041.0,8690.0,3334.0,481.0,10319.0,,,10319.0,10319.0,10319.0,10319.0,,427.194,356.124,337.417,0.0,329
SRR1577404,,10381.0,50.399,,SRR1577404,SRR1577404,"Df(2L)ED5878/+ female rep2, whole animal, w[11...",,RF3pw,,"Df(2L)ED5878/+ female rep2, whole animal, w[11...",Adult,Whole_organism,"Lee H, Russell S, Oliver B, NIDDK, NIH, Bethes...",Drosophila melanogaster (taxid:7227),Expression profiling pooled Drosophila melanog...,We performed mRNA transcriptional profiling on...,,,"RNA, Unspecified_RNA, RNA-Seq, cDNA",Flies were homogenized in 500 ul of RNAlater s...,"Illumina, HiSeq 2000","sraRNA, sraUnspecified_RNA, RNA-Seq, cDNA",,,,,,,"7623975 spots, 579422100 bases, average read l...",Single-end,,f1:76,2.23,3.423,7.624,579.418,25.63,25.12,24.45,24.79,0.01,49.24,,,,102.83,0.0,0.0,0.13,,15.49,8,1,,6.931,74.96,519.522,90.91,89.66,98.63,,7623975,6930835,6662658,108637,6771295,159540,9583,600,54,622986,0,59917,90.91,87.39,1.42,88.82,2.09,0.126,0.00787,0.000708,8.17,0,0.786,,14382,44969,483772,776271,0,0,0,5448639,0,5110327,6752733,163720,0,0,7,603352,,0.189,0.59,6.35,10.18,0,0,0,71.47,0,67.03,88.57,2.15,0,0,9.18e-05,7.91,,1.08,3.4,36.57,58.59,0.0,0.0,0.0,407.79,0.0,383.86,506.27,12167.0,0.0,0.0,0.0,,,,-,-,-,-,-,-,,,,,,,,,75.82,74.96,74.96,,,74.84,,,,75.11,75.47,75.59,,,75.41,75.55,,,98.87,98.87,,,98.75,,,,99.103,99.368,99.5,,,99.276,99.447,,,243757.0,240015.0,0.0,50.387,49.613,0.0,,,,,,,2690006.0,2647449.0,111184.0,49.37,48.59,2.04,,,,,,,2554700.0,2555627.0,0.0,49.99093,50.00907,0.0,22251.0,22718.0,0.0,49.481,50.519,0.0,,489,1241053,2.53794,672545,510945,1508,11826,15243,28986,179191,207480,140849,145025,49932,57949,51290,61015,46625,86882,78678,78574,12395,33659,3367,5844,989,1309,1.37535,1.04488,0.00308,0.02418,54.19,41.17,0.12,0.95,1.23,2.34,,3200.0,64.0,614.0,,391,0.0111,,9.7,9.8,10.2,12643.0,10381.0,9988.0,8549.0,3210.0,434.0,10381.0,,,10381.0,10381.0,10381.0,10381.0,,519.522,407.787,385.491,0.0,391


In [26]:
for c in dfj.columns:
    print(c)

Unnamed: 1
Genes significantly expressed
% fragments on strand plus of genes
Unnamed: 4
RunId 
Run accession in SRA
Title
Other title
Sorting_title 1
Sorting title 2
Sample (summarized from biosample or manual)
Stage and experimental summary
System or Tissue
Laboratory, Author when available, submission date
Species
Project
Project description
Reference
Unnamed: 19
Sequencing protocol
Design
Sequencing platform and Machine type
Experiment type
Sequencing date
Submission date
Library preparation
RNA Integrity Number
Details
Corresponding microarray
Spots, bases, average read length (in SRA)
Single or Paired end
Unnamed: 32
Average read length (nt)
Fragment multiplicity
Million distinct sequences
Million raw reads
Megabases sequenced
%A
%T
%G
%C
%N
%GC
Unnamed: 44
Exit adaptor of read 1
Exit adaptor of read 2
PolyA index : fragments per million with at least 3 A13 motifs shifted by at least 6 bases
Telomere index : fragments per million with at least 3 TTAGGG/CCCTAA motifs
Telomere index

# Compare Z vs J

In [27]:
# Unique BioSample accessions found in dfZ
zList = dfZ.BioSample.unique()
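
Once both tables expose a shared identifier, the Z vs J comparison could be done with plain set arithmetic. This is only a sketch: `compare_ids` is a hypothetical helper, the accessions in the toy lists are invented placeholders, and which column of each table to compare (BioSample, run accession, ...) is an assumption that still needs checking against the real data.

```python
def compare_ids(left, right):
    """Split two collections of identifiers into shared and unique sets."""
    left, right = set(left), set(right)
    return {
        'both': left & right,         # present in both tables
        'left_only': left - right,    # present only in the first table
        'right_only': right - left,   # present only in the second table
    }

# Toy stand-ins for the identifier columns of dfZ and dfj (made-up values).
toy_z = ['SRR0000001', 'SRR0000002', 'SRR0000003']
toy_j = ['SRR0000002', 'SRR0000003', 'SRR0000004']

result = compare_ids(toy_z, toy_j)
for name, ids in result.items():
    print(name, sorted(ids))
```

A pandas column can be passed in directly, for example `compare_ids(zList, dfj['Run accession in SRA'])`, provided both sides really hold the same kind of accession.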

In [35]:
print(Entrez.einfo().read())

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20130322//EN" "http://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd">
<eInfoResult>
<DbList>

	<DbName>pubmed</DbName>
	<DbName>protein</DbName>
	<DbName>nuccore</DbName>
	<DbName>nucleotide</DbName>
	<DbName>nucgss</DbName>
	<DbName>nucest</DbName>
	<DbName>structure</DbName>
	<DbName>genome</DbName>
	<DbName>annotinfo</DbName>
	<DbName>assembly</DbName>
	<DbName>bioproject</DbName>
	<DbName>biosample</DbName>
	<DbName>blastdbinfo</DbName>
	<DbName>books</DbName>
	<DbName>cdd</DbName>
	<DbName>clinvar</DbName>
	<DbName>clone</DbName>
	<DbName>gap</DbName>
	<DbName>gapplus</DbName>
	<DbName>grasp</DbName>
	<DbName>dbvar</DbName>
	<DbName>gene</DbName>
	<DbName>gds</DbName>
	<DbName>geoprofiles</DbName>
	<DbName>homologene</DbName>
	<DbName>medgen</DbName>
	<DbName>mesh</DbName>
	<DbName>ncbisearch</DbName>
	<DbName>nlmcatalog</DbName>
	<DbName>omim</DbName>
	<DbName>orgtrack</DbName>
	<DbName
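
The raw einfo dump is hard to scan. A minimal sketch for pulling out just the database names with the standard library is shown here. `EINFO_XML` is a trimmed copy of the response above (three of the listed databases), and `list_databases` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the einfo response, kept small for illustration.
EINFO_XML = """<eInfoResult>
<DbList>
    <DbName>pubmed</DbName>
    <DbName>bioproject</DbName>
    <DbName>biosample</DbName>
</DbList>
</eInfoResult>"""

def list_databases(xml_text):
    """Return the text of every <DbName> element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter('DbName')]

dbs = list_databases(EINFO_XML)
print(dbs)
print('biosample' in dbs)
```

Biopython can also parse this response itself: `Entrez.read(Entrez.einfo())['DbList']` gives the same list without any manual XML handling.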

In [54]:
# Fetch the BioSample XML for the first 10 accessions in zList
r = Entrez.efetch(db='biosample', id=zList[:10].tolist(), retmode='xml')

In [55]:
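# Entrez.parse is a generator and only works when the XML represents a list of records;
# validate=False tells the parser not to raise on tags it cannot find in a DTD.
res = Entrez.parse(r, validate=False)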

In [56]:
for b in res:
    pass

ValueError: The XML file does not represent a list. Please use Entrez.read instead of Entrez.parse
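
The error suggests `Entrez.read`, but as far as I can tell the BioSample XML does not come with a DTD that Bio.Entrez can use to structure the result, so a generic XML parser may be the simpler route. This is a minimal sketch using `xml.etree.ElementTree`. `SAMPLE_XML` is a hand-written record that only mimics the general shape of a BioSample response (the accession and values are made up), and `parse_biosamples` is a hypothetical helper; the element and attribute names are assumptions to verify against a real response.

```python
import xml.etree.ElementTree as ET

# Hypothetical record mimicking the shape of a BioSample efetch response.
SAMPLE_XML = """<?xml version="1.0" ?>
<BioSampleSet>
  <BioSample accession="SAMN00000001" id="1">
    <Description>
      <Title>example whole-animal sample</Title>
      <Organism taxonomy_id="7227" taxonomy_name="Drosophila melanogaster"/>
    </Description>
    <Attributes>
      <Attribute attribute_name="sex">female</Attribute>
      <Attribute attribute_name="dev_stage">Adult</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>
"""

def parse_biosamples(xml_text):
    """Return one flat dict per <BioSample> element found in xml_text."""
    root = ET.fromstring(xml_text)
    rows = []
    for bs in root.iter('BioSample'):
        row = {
            'accession': bs.get('accession'),
            'title': bs.findtext('Description/Title'),
        }
        org = bs.find('Description/Organism')
        if org is not None:
            row['organism'] = org.get('taxonomy_name')
        # Each <Attribute> becomes its own key, named by attribute_name.
        for attr in bs.iter('Attribute'):
            row[attr.get('attribute_name')] = attr.text
        rows.append(row)
    return rows

rows = parse_biosamples(SAMPLE_XML)
print(rows)
```

With a live handle this would be called as `parse_biosamples(r.read())`, and the resulting list of dicts can go straight into `pd.DataFrame(rows)` to get one row per sample.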