# General Description

The functionality of the `Bio.Entrez` module maps closely onto E-Utils, the programming utilities for the Entrez system, which is itself an interface to the various NCBI databases. Inspecting this module gives a good understanding of how E-Utils work. I will focus on E-Utils for the PubMed database.

In [1]:
import sys
print(sys.version)

3.4.3 |Continuum Analytics, Inc.| (default, Mar  6 2015, 12:07:41) 
[GCC 4.2.1 (Apple Inc. build 5577)]


In [2]:
from Bio import Entrez
import pandas as pd

## Variables

`Bio.Entrez` expects you to set two variables: `email` and `tool`. The E-Utils API doesn't like it when you hit it more than 3 times per second. ([See here.](http://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen)) You can get blocked if you do it more often. To reduce the risk of being blocked, announce yourself to the API by including your email and tool values in every request. `Bio.Entrez` (the Python module) includes these variables for you in subsequent requests to E-Utils. You provide them to `Bio.Entrez` as follows:

In [3]:
Entrez.email = 'firas.wehbe@vanderbilt.edu'
Entrez.tool = 'BioPython Entrez Demo'
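To make the "no more than 3 requests per second" guideline concrete, here is a minimal sketch of a limiter that spaces calls out. The `RateLimiter` class and its parameters are hypothetical names made up for this illustration; they are not part of `Bio.Entrez`. As far as I know, Biopython already spaces its own requests, so a helper like this is mainly useful to see the idea, or if you also call E-Utils through some other HTTP client.

```python
import time


class RateLimiter:
    """Hypothetical helper: keep calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


limiter = RateLimiter(max_per_second=3)
for _ in range(3):
    limiter.wait()   # a request to E-Utils would go right after this line
```

You would call `limiter.wait()` immediately before each request, so that bursts of calls in a loop get spread out automatically.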

## Methods

### EInfo

EInfo is a tool that gives you information about the Entrez databases. First, check out this example of how `Bio.Entrez` generally handles returned results. (Hint: they are returned as a handle, a stream that you can either read yourself with its `.read()` method or have parsed for you by `Bio.Entrez.read()`.)

In [4]:
info_handler = Entrez.einfo()
# If you type info_handler.read() you can see the returned XML. Instead I'm parsing it using the built-in parser.
info_records = Entrez.read( info_handler ) 
info_records

{'DbList': ['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'genome', 'gpipe', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'epigenomics', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']}
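The cell above never closes the handle it opened. A small sketch of one way to tidy that up: a helper that opens a handle, parses it, and always closes it. `read_and_close` is a hypothetical name for this illustration, and the demonstration uses an in-memory stand-in instead of a real network call.

```python
import io


def read_and_close(open_handle, parse):
    """Open a handle, parse it, and close it even if parsing fails."""
    handle = open_handle()
    try:
        return parse(handle)
    finally:
        handle.close()


# With Biopython this would be called as, for example:
#   read_and_close(lambda: Entrez.einfo(), Entrez.read)
# Stand-in demonstration that needs no network access:
fake_handle = io.StringIO("<eInfoResult/>")
raw = read_and_close(lambda: fake_handle, lambda h: h.read())
print(raw)
print(fake_handle.closed)
```

Passing the opener and the parser as callables keeps the helper independent of any one E-Utils method, so the same pattern works for the other calls in this notebook.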

A richer data structure is returned if you specify the database in your request.

In [5]:
info_handler = Entrez.einfo( db = 'pubmed' )
info_records = Entrez.read( info_handler ) 

In [6]:
info_records.keys()

dict_keys(['DbInfo'])

In [7]:
info_records['DbInfo'].keys()

dict_keys(['DbName', 'Count', 'Description', 'FieldList', 'LinkList', 'LastUpdate', 'DbBuild', 'MenuName'])

In [8]:
info_records['DbInfo']['Count']

'25251165'

In [9]:
info_records['DbInfo']['LastUpdate']

'2015/09/06 07:56'
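Note that both `Count` and `LastUpdate` come back as strings. If you want to do arithmetic or date comparisons, convert them first. A minimal sketch, using the values shown above as sample data (a live query will return newer ones) and assuming the date format stays as it appears in that output:

```python
from datetime import datetime

# Sample values copied from the outputs above.
db_info = {"Count": "25251165", "LastUpdate": "2015/09/06 07:56"}

record_count = int(db_info["Count"])
last_update = datetime.strptime(db_info["LastUpdate"], "%Y/%m/%d %H:%M")

print(record_count)
print(last_update.isoformat())
```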

In [10]:
info_records['DbInfo']['MenuName']

'PubMed'

In [11]:
info_records['DbInfo']['Description']

'PubMed bibliographic record'

In [12]:
pd.DataFrame(info_records['DbInfo']['LinkList'])

| | DbTo | Description | Menu | Name |
|---|---|---|---|---|
| 0 | assembly | Assembly | Assembly | pubmed_assembly |
| 1 | bioproject | Related Projects | Project Links | pubmed_bioproject |
| 2 | biosample | BioSample links | BioSample Links | pubmed_biosample |
| 3 | biosystems | Pathways and biological systems (BioSystems) t... | BioSystem Links | pubmed_biosystems |
| 4 | books | NCBI Bookshelf books that cite the current art... | Cited in Books | pubmed_books_refs |
| 5 | cdd | Conserved Domain Database (CDD) records that c... | Domain Links | pubmed_cdd |
| 6 | clinvar | Clinical variations associated with publication | ClinVar | pubmed_clinvar |
| 7 | clinvar | Clinical variations calculated to be associate... | ClinVar (calculated) | pubmed_clinvar_calculated |
| 8 | dbvar | Link from PubMed to dbVar | dbVar | pubmed_dbvar |
| 9 | epigenomics | Links to experimental data from epigenomic stu... | Epigenomics (experiments) | pubmed_epigenomics_experiment |
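Some target databases appear more than once in the link list (`clinvar` has two entries above). A short sketch of grouping link names by target database, using a few rows copied from the output as plain dicts. With a live result you would loop over `info_records['DbInfo']['LinkList']` instead; I am assuming each entry behaves like a dict with `DbTo` and `Name` keys, as the DataFrame columns suggest.

```python
from collections import defaultdict

# Rows copied from the LinkList output above (only DbTo and Name kept).
link_list = [
    {"DbTo": "assembly", "Name": "pubmed_assembly"},
    {"DbTo": "clinvar", "Name": "pubmed_clinvar"},
    {"DbTo": "clinvar", "Name": "pubmed_clinvar_calculated"},
    {"DbTo": "dbvar", "Name": "pubmed_dbvar"},
]

# Map each target database to the link names that lead to it.
links_by_target = defaultdict(list)
for link in link_list:
    links_by_target[link["DbTo"]].append(link["Name"])

for target in sorted(links_by_target):
    print(target, links_by_target[target])
```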


In [13]:
pd.DataFrame( info_records['DbInfo']['FieldList'] )

| | Description | FullName | Hierarchy | IsDate | IsHidden | IsNumerical | Name | SingleToken | TermCount |
|---|---|---|---|---|---|---|---|---|---|
| 0 | All terms from all searchable fields | All Fields | N | N | N | N | ALL | N | 160598414 |
| 1 | Unique number assigned to publication | UID | N | N | Y | Y | UID | Y | 0 |
| 2 | Limits the records | Filter | N | N | N | N | FILT | Y | 9048 |
| 3 | Words in title of publication | Title | N | N | N | N | TITL | N | 15510539 |
| 4 | Free text associated with publication | Text Word | N | N | N | N | WORD | N | 53709704 |
| 5 | Medical Subject Headings assigned to publication | MeSH Terms | Y | N | N | N | MESH | Y | 603019 |
| 6 | MeSH terms of major importance to publication | MeSH Major Topic | Y | N | N | N | MAJR | Y | 539102 |
| 7 | Author(s) of publication | Author | N | N | N | N | AUTH | Y | 13629428 |
| 8 | Journal abbreviation of publication | Journal | N | N | N | N | JOUR | Y | 174840 |
| 9 | Author's institutional affiliation and address | Affiliation | N | N | N | N | AFFL | N | 32267617 |
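The field list is handy as a lookup from a human-readable field name to its short tag. The sketch below builds that lookup from a few rows copied from the output above, skipping hidden fields. The `tag_term` helper is a hypothetical name, and it assumes the short `Name` can be used as a `term[FIELD]` qualifier in a search query, which is how Entrez field tags are normally written.

```python
# A few rows copied from the FieldList output above, as plain dicts.
field_list = [
    {"Name": "ALL", "FullName": "All Fields", "IsHidden": "N"},
    {"Name": "UID", "FullName": "UID", "IsHidden": "Y"},
    {"Name": "TITL", "FullName": "Title", "IsHidden": "N"},
    {"Name": "MESH", "FullName": "MeSH Terms", "IsHidden": "N"},
    {"Name": "AUTH", "FullName": "Author", "IsHidden": "N"},
]

# Full name -> short tag, for fields that are not hidden.
visible_fields = {
    field["FullName"]: field["Name"]
    for field in field_list
    if field["IsHidden"] == "N"
}


def tag_term(term, full_name):
    """Hypothetical helper: qualify a search term with a field tag."""
    return "{}[{}]".format(term, visible_fields[full_name])


tagged = tag_term("asthma", "Title")
print(tagged)
```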


In [14]:
info_records['DbInfo']['DbName']

'pubmed'

### ESummary

### ESearch

### EPost

### EFetch

### ELink