# General Description

The `Bio.Entrez` module maps closely onto E-utils, the programming utilities for the Entrez system, which is itself an interface to the various NCBI databases. Inspecting this module gives a good understanding of how E-utils work. I will focus on E-utils for the PubMed database.

In [1]:
import sys
print(sys.version)

3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]


In [2]:
from Bio import Entrez
import pandas as pd

## Variables

`Bio.Entrez` requires that you set two variables: `email` and `tool`. The E-utils API doesn't like it when you hit it more than 3 times per second ([see the usage guidelines](http://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen)), and you can get blocked if you exceed that rate. To reduce the risk of being blocked, announce yourself to the API by including your email and tool name in every request. `Bio.Entrez` (the Python module) includes these variables for you in subsequent requests to E-utils. Provide them to `Bio.Entrez` as follows:

In [3]:
Entrez.email = 'firas.wehbe@vanderbilt.edu'
Entrez.tool = 'BioPython Entrez Demo'
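The 3-requests-per-second guideline mentioned above can also be respected on the client side. The following is a minimal sketch of one way to do that, using only the standard library. The names `Throttle` and `throttled` are hypothetical helpers introduced here for illustration and are not part of `Bio.Entrez`; recent Biopython releases may already space out requests for you, so treat this as an illustration of the idea and not as a required step.

```python
import time


class Throttle:
    """Hypothetical helper: keep calls at or below `max_per_second`."""

    def __init__(self, max_per_second=3):
        # Minimum spacing between two consecutive calls, in seconds.
        self.min_interval = 1.0 / max_per_second
        self._last_call = None  # monotonic timestamp of the previous call

    def wait(self):
        """Sleep until `min_interval` seconds have passed since the last call."""
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


# One shared throttle for every E-utils request made by this script.
throttle = Throttle(max_per_second=3)


def throttled(func, *args, **kwargs):
    """Wait for the shared throttle, then call `func` with the given arguments."""
    throttle.wait()
    return func(*args, **kwargs)
```

With this in place, a request such as `Entrez.einfo(db='clinvar')` could be written as `throttled(Entrez.einfo, db='clinvar')`, so that a loop over many requests stays within the guideline.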

## Methods

### EInfo

EInfo is a tool that gives you information about the Entrez databases. First, check out this example of how `Bio.Entrez` handles returned results in general. (Hint: results are returned as a stream that you can read with its `.read()` method and parse yourself, or that `Bio.Entrez` can parse for you with the `Bio.Entrez.read()` function.)

In [4]:
info_handler = Entrez.einfo()
# If you type info_handler.read() you can see the returned XML. Instead I'm parsing it using the built-in parser.
info_records = Entrez.read( info_handler ) 
info_records

{'DbList': ['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'sparcle', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']}
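To make the "parse it yourself" option concrete, here is a small sketch that turns EInfo-style XML into the same kind of dictionary using only the standard library. The XML string is a hand-written, abbreviated stand-in for what `info_handler.read()` would return (it is not a captured response), and `parse_db_list` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated stand-in for an EInfo response (an assumption,
# not a captured reply from the server).
SAMPLE_EINFO_XML = """
<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>clinvar</DbName>
  </DbList>
</eInfoResult>
"""


def parse_db_list(xml_text):
    """Collect every <DbName> under <DbList> into a {'DbList': [...]} dict."""
    root = ET.fromstring(xml_text.strip())
    return {"DbList": [node.text for node in root.findall("./DbList/DbName")]}


records = parse_db_list(SAMPLE_EINFO_XML)
print(records)
```

In practice `Entrez.read()` does this work for every E-utils response type, which is why the rest of this notebook relies on it.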

A richer data structure is returned if you specify the database in your request.

In [5]:
info_handler = Entrez.einfo( db = 'clinvar' )
info_records = Entrez.read( info_handler ) 

In [6]:
info_records.keys()

dict_keys(['DbInfo'])

In [7]:
info_records['DbInfo'].keys()

dict_keys(['Count', 'LastUpdate', 'MenuName', 'DbBuild', 'FieldList', 'DbName', 'Description', 'LinkList'])

In [8]:
info_records['DbInfo']['Count']

'170659'

In [9]:
info_records['DbInfo']['LastUpdate']

'2016/10/12 15:46'
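Note that `Count` and `LastUpdate` come back as strings. If you want to do arithmetic or date comparisons, you would convert them first. The sketch below uses the two values shown in the outputs above and assumes the date always follows the `YYYY/MM/DD HH:MM` pattern seen here, which is an assumption based on this single response.

```python
from datetime import datetime

# Values copied from the outputs above; both are strings as returned.
db_info = {"Count": "170659", "LastUpdate": "2016/10/12 15:46"}

# Convert the record count to an integer.
record_count = int(db_info["Count"])

# Parse the timestamp, assuming the 'YYYY/MM/DD HH:MM' layout seen above.
last_update = datetime.strptime(db_info["LastUpdate"], "%Y/%m/%d %H:%M")

print(record_count)
print(last_update.isoformat())
```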

In [10]:
info_records['DbInfo']['MenuName']

'ClinVar'

In [11]:
info_records['DbInfo']['Description']

'ClinVar Database'

In [12]:
pd.DataFrame(info_records['DbInfo']['LinkList'])

| | DbTo | Description | Menu | Name |
|---|---|---|---|---|
| 0 | dbvar | Related record in dbVar | dbVar | clinvar_dbvar |
| 1 | gene | Related genes | Gene | clinvar_gene |
| 2 | gtr | Testing for clinical variations | GTR (all) | clinvar_gtr |
| 3 | medgen | Related phenotypes in MedGen | MedGen | clinvar_medgen |
| 4 | omim | Gene or disease records in OMIM | OMIM | clinvar_omim |
| 5 | orgtrack | Organizations with information about this vari... | Orgtrack (all) | clinvar_orgtrack |
| 6 | pmc | Full text articles in PMC | PMC | clinvar_pmc |
| 7 | pubmed | Publications associated with clinical variation | PubMed | clinvar_pubmed |
| 8 | pubmed | Publications calculated to be associated with ... | PubMed (calculated) | clinvar_pubmed_calculated |
| 9 | snp | Related record in dbSNP | dbSNP | clinvar_snp |


In [13]:
pd.DataFrame( info_records['DbInfo']['FieldList'] )

| | Description | FullName | Hierarchy | IsDate | IsHidden | IsNumerical | Name | SingleToken | TermCount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | All terms from all searchable fields | All Fields | N | N | N | N | ALL | N | 4833446 |
| 1 | Unique number assigned to variation | UID | N | N | Y | Y | UID | Y | 0 |
| 2 | Limits the records | Filter | N | N | N | N | FILT | Y | 58 |
| 3 | Constructed from variant and phenotype names | Name of the ClinVar record | N | N | N | N | TITL | Y | 390105 |
| 4 | Free text associated with record | Text Word | N | N | N | N | WORD | Y | 2474054 |
| 5 | scientific and common names of organism | Organism | Y | N | N | N | ORGN | Y | 1 |
| 6 | The last date on which the record was updated | Date modified | N | Y | N | N | MDAT | Y | 95 |
| 7 | Chromosome number or numbers; also 'mitochondr... | Chromosome | N | N | N | N | CHR | Y | 26 |
| 8 | Symbol or symbols of the gene | Gene Name | N | N | N | N | GENE | Y | 27286 |
| 9 | MIM number from OMIM | MIM | N | N | N | N | MIM | Y | 19763 |


In [14]:
info_records['DbInfo']['DbName']

'clinvar'

### ESummary

### ESearch

### EPost

### EFetch

### ELink