annoPipeline

annoPipeline uses APIs from mygene.info and Entrez esummary to annotate a user-provided list of gene symbols.

Generates a pandas DataFrame with gene symbol, gene name, EntrezID, and bibliographic info for up to 5 pubmed publications where a functional reference was provided (more about functional references at GeneRIF).

Designed to be useful for tasks such as:

identifying relevant publications for a given function
analyzing publications trends for genes belonging to a common pathway
identifying influential PIs for a given gene network.

Reqirements:

Written for use with Python 3.7, not tested on other versions.
annoPipeline requires:
- numpy >= 1.16.2
- pandas >= 0.24.2
- Biopython >= 1.73
- openpyxl >= 2.6.1
- requests >= 2.21.0

To Install:

Required dependencies will be installed if missing, may take a few seconds.

Recommended Method:

pip install annoPipeline

Alternative:

Download / Clone the github repo.
Then, in the annoPipeline directory, run:

python setup.py install

Example usage:

Execute the full annotation pipeline on a list of gene symbols like this:

import annoPipeline as ap

# define a list of genes you would like annotated
geneList = ['CDK2', 'FGFR1', 'SLC6A4']

# annoPipeline will execute full annotation pipeline (see individual functions below). 
df = ap.annoPipeline(geneList) # returns pandas df with annotations for gene and bibliographic info.

ap.annoPipeline will default save annotation output to Excel file named by geneList symbols separated by '_'.

Warning!

If querying a single gene, still pass as a list. For example:

import annoPipeline as ap

df = ap.annoPipeline(['CDK2']) # for single gene queries still include [] - will be fixed in later version

v0.0.1 Functionality

Task 1:

From the MyGeneInfo API, use the “Gene query service" GET method to return details on a given list of human gene symbols.
From the returned json, parse out the “name", “symbol" and “entrezgene" values and print to screen

Use queryGenes():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']

l1 = ap.queryGenes(geneList) # returns list of dicts where keys are default mygene fields (symbol,name,taxid,entrezgene,ensemblgene)

Task 2:

Using the appropriate identifier from the above result, send a query to the MyGeneInfo “Gene annotation services" method for each gene
From the resulting json, collate up to 5 generif descriptions per gene
Write the results to an Excel spreadsheet with columns: gene_symbol, gene_name, entrez_id, generifs

Use getAnno():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1, saveExcel=True) # saveExcel defaults False

returns pandas df with genes and up to 5 generifs from mygene.info.
default saveExcel=False, to save output to Excel must state True
if True, Excel file will be named by geneList symbols separated by '_'.

Task 3:

Use the Pubmed IDs associated with the above generif content to extract additional bibliographic information.

Use addBibs():

import annoPipeline as ap

geneList = ['CDK2', 'FGFR1', 'SLC6A4']
l1 = ap.queryGenes(geneList)
l2 = ap.getAnno(l1)
l3 = ap.addBibs(l2) # will return df with genes and up to 5 generifs from mygene.info

Currently returns the following bibliographic information when available:
- PubDate
- Source
- Title
- LastAuthor
- DOI
- PmcRefCount

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
annoPipeline.egg-info		annoPipeline.egg-info
annoPipeline		annoPipeline
build/lib		build/lib
dist		dist
test		test
CHANGES.txt		CHANGES.txt
LICENSE.txt		LICENSE.txt
README.md		README.md
REST_test.yml		REST_test.yml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annoPipeline.egg-info

annoPipeline.egg-info

annoPipeline

build/lib

build/lib

dist

dist

test

test

CHANGES.txt

CHANGES.txt

LICENSE.txt

LICENSE.txt

README.md

README.md

REST_test.yml

REST_test.yml

setup.py

setup.py

Repository files navigation

annoPipeline - an API-enabled gene annotation pipeline

Reqirements:

To Install:

Recommended Method:

Alternative:

Example usage:

Warning!

v0.0.1 Functionality

Task 1:

Task 2:

Task 3:

About

Releases

Packages

Languages

License

jimmyjamesarnold/annoPipeline

Folders and files

Latest commit

History

Repository files navigation

annoPipeline - an API-enabled gene annotation pipeline

Reqirements:

To Install:

Recommended Method:

Alternative:

Example usage:

Warning!

v0.0.1 Functionality

Task 1:

Task 2:

Task 3:

About

Resources

License

Stars

Watchers

Forks

Languages