# Graph Relationships Among Researchers

We are going to create graphs describing relationships between researchers based on co-authorship. In this notebook we use [Biopython](http://biopython.org/) to query PubMed and retrieve citation information for articles published by various researchers.

Feel free to create your own list of researchers (including yourself!).



### Uncomment and run the cell below if you need to install biopython

In [2]:
#!conda install biopython -y

In [3]:
import getpass
import gzip
import os
import pickle

import networkx as nx
from Bio import Entrez
from IPython.display import Image

# Use the current working directory for data files and confirm it exists.
DATADIR = os.getcwd()
print(os.path.exists(DATADIR))

True


### An Example List of BMI Faculty

Because names are not unique identifiers, querying PubMed by name can be challenging. For example, I try to publish as "Brian E Chapman", but some of my papers appear under "Brian Chapman". The list below was copied from a spreadsheet, with some tweaking to put each name in the form that researcher most commonly publishes under. The spreadsheet stores each name as tab-separated LASTNAME and FIRSTNAME, so a little manipulation is needed to get the names into FIRSTNAME LASTNAME form.
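
One way to cope with name variants is to search for several forms of a name at once. The sketch below is a minimal illustration, assuming PubMed's `[Author]` field tag and `OR` operator apply to full names written this way; `author_query` is a hypothetical helper, not part of this notebook or of Biopython, and it only builds a string without contacting PubMed.

```python
def author_query(*name_variants):
    """Build a PubMed query string matching any of the given name variants.

    Hypothetical helper for illustration only.
    """
    # Quote each non-empty variant and restrict it to the author field.
    terms = ['"%s"[Author]' % name.strip()
             for name in name_variants if name.strip()]
    return " OR ".join(terms)


query = author_query("Brian E Chapman", "Brian Chapman")
print(query)
```

The resulting string could be passed as the search term in place of a single plain name.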


In [4]:
faculty = [tuple(s.split("\t")) for s in 
"""AbdelRahman	Samir E
Adler	Frederick R
Bray	Bruce E
Camp	Nicola J
Chapman	Brian E
Chapman	Wendy W
Conway	Michael A
Cummins	Mollie R
Del Fiol	Guilherme
Drews	Frank A
Egger	Marlene J
Eilbeck	Karen
Evans	R Scott
Facelli	Julio C
Gibson	Bryan S
Gouripeddi	Ramkiran
Haug	Peter J
Huff	Stanley M
Hurdle	John F
Kawamoto	Kensaku
Lee	Younghee
Narus	Scott P
Nebeker	Jonathan
Parker	Dennis L
Piccolo	Stephen
Quinlan	Aaron
Samore	Matthew H
Sauer	Brian C
Staes	Catherine J
Sward	Katherine A
Weir	Charlene R
Yandell	Mark
Dean	J Michael
Gesteland	Per H
Gundlapalli	Adi V
Jackson	Brian R
Lincoln	Michael J
Morris	Alan H
Xu	Wu""".split("\n")]
# Each tuple is (LASTNAME, FIRSTNAME); swap into "FIRSTNAME LASTNAME" form.
faculty = ["%s %s" % (f[1], f[0]) for f in faculty]


### Here is a shorter, alternative list
#### Edit as needed; comment this cell out to keep the full list above

In [7]:
faculty = ["Brian E Chapman", "David Gur", "Wendy W Chapman", "Peter J Haug", "Dennis L Parker", "Matthew H Samore"]

### Get the PubMed IDs matching a query

In [None]:
email_string = input("Enter your e-mail: ").strip()

In [None]:
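def search(query, email=''):
    """Return the PubMed esearch results (up to 100 records) for `query`."""
    # NCBI asks that Entrez requests identify the caller by e-mail.
    Entrez.email = email
    handle = Entrez.esearch(db='pubmed',
                            sort='relevance',
                            retmax='100',
                            retmode='xml',
                            term=query)
    results = Entrez.read(handle)
    handle.close()
    return results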

### Fetch papers corresponding to IDs

In [None]:
def fetch_details(id_list, email=''):
    """Return the parsed PubMed records for the given list of IDs."""
    ids = ','.join(id_list)
    Entrez.email = email
    handle = Entrez.efetch(db='pubmed',
                           retmode='xml',
                           id=ids)
    results = Entrez.read(handle)
    handle.close()
    return results

### Get Co-authorship

Entrez returns a lot of information, so we pare it down to just the author names. We need exception handling because the returned papers don't always have the fields we want.

In [None]:
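def get_coauthor_lists(papers):
    """Map each article title to a list of "ForeName LastName" author strings."""
    paper_authors = {}
    for p in papers:
        try:
            tmp = p['MedlineCitation']
            alist = []
            for a in tmp['Article']['AuthorList']:
                try:
                    alist.append("%s %s" % (a['ForeName'], a['LastName']))
                except KeyError:
                    # Skip authors without a personal name (e.g. group authors).
                    pass
            paper_authors[tmp['Article']['ArticleTitle']] = alist
        except KeyError:
            # Skip records that lack an article or an author list.
            pass
    return paper_authors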
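
To see why missing fields need handling, here is a small self-contained sketch that applies similar extraction logic to hand-made records. The records are invented stand-ins shaped like the nested dictionaries `Entrez.read` returns, not real PubMed output, and `extract_authors` is a hypothetical helper that uses explicit key checks instead of exceptions.

```python
def extract_authors(paper):
    """Return "ForeName LastName" strings for one parsed record, or None.

    Hypothetical helper: `paper` is assumed to be a nested dict shaped
    like one entry of the "PubmedArticle" list.
    """
    article = paper.get('MedlineCitation', {}).get('Article', {})
    if 'AuthorList' not in article:
        return None  # the record has no author list at all
    return ["%s %s" % (a['ForeName'], a['LastName'])
            for a in article['AuthorList']
            if 'ForeName' in a and 'LastName' in a]


# Invented example records (not real PubMed data).
mock_papers = [
    {'MedlineCitation': {'Article': {
        'ArticleTitle': 'Paper A',
        'AuthorList': [
            {'ForeName': 'Ada', 'LastName': 'Example'},
            {'CollectiveName': 'Example Consortium'},  # no personal name
        ]}}},
    {'MedlineCitation': {'Article': {'ArticleTitle': 'Paper B'}}},  # no AuthorList
]

for paper in mock_papers:
    print(extract_authors(paper))
```

The first record keeps only the author with both name fields, and the second yields `None` because it has no author list.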

In [None]:

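def get_faculty_coauthors(faculty, email=''):
    """Search PubMed for one researcher's name and return their papers' author lists."""
    id_list = search(faculty, email=email)['IdList']
    papers = fetch_details(id_list, email=email)["PubmedArticle"]
    return get_coauthor_lists(papers)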

### Author:Co-author dictionary

In [None]:
# Map each researcher's name to a {title: [author names]} dictionary.
coauthors_with_ext = {f: get_faculty_coauthors(f, email=email_string) for f in faculty}
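
The resulting dictionary maps each researcher to a `{title: [authors]}` dictionary. As a minimal sketch of how this structure could feed a co-authorship graph, the code below counts how often each pair of names shares a paper. The sample data is invented, `coauthor_edge_weights` is a hypothetical helper, and a paper reached through two different researchers is counted once per title.

```python
from collections import Counter
from itertools import combinations


def coauthor_edge_weights(coauthors):
    """Count shared papers for every pair of author names.

    `coauthors` is assumed to look like {researcher: {title: [names]}}.
    """
    # Merge the per-researcher dictionaries so each title is counted once.
    papers = {}
    for titles in coauthors.values():
        papers.update(titles)
    weights = Counter()
    for authors in papers.values():
        # Sort so that (a, b) and (b, a) land on the same key.
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1
    return weights


# Invented sample in the same shape as coauthors_with_ext.
sample = {
    "A One": {"Paper 1": ["A One", "B Two"],
              "Paper 2": ["A One", "B Two", "C Three"]},
    "B Two": {"Paper 1": ["A One", "B Two"]},
}
weights = coauthor_edge_weights(sample)
print(weights[("A One", "B Two")])
```

Each `(a, b)` key and its count could then be handed to networkx, for example through `Graph.add_weighted_edges_from`, to obtain a weighted graph.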

In [None]:
with gzip.open("researchers_pubmed.pickle.gzip", "wb") as f0:
    pickle.dump(coauthors_with_ext, f0)
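
To reuse saved results later without querying PubMed again, the file can be read back with the same two modules. The sketch below is a self-contained round trip on invented data; it writes to a temporary directory rather than to `researchers_pubmed.pickle.gzip`, so it does not overwrite the real file.

```python
import gzip
import os
import pickle
import tempfile

# Invented stand-in for coauthors_with_ext.
sample = {"A One": {"Paper 1": ["A One", "B Two"]}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.pickle.gzip")
    # Write: gzip-compress the pickled dictionary.
    with gzip.open(path, "wb") as f0:
        pickle.dump(sample, f0)
    # Read: decompress and unpickle it again.
    with gzip.open(path, "rb") as f0:
        restored = pickle.load(f0)

print(restored == sample)
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.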

In [None]:
!ls -l