# Query Researcher and Affiliations by ORCID
This notebook shows how to fetch a researcher by ORCID, retrieve their co-authors, and get the co-authors' affiliations using the Augment API. It then gives 2 examples of how to use the data for visualisation.

[Download Notebook](https://github.com/researchgraph/augment-api-beta/blob/main/docs/notebooks/affiliations.ipynb)

Related Notebooks:  
- [orcid notebook](./orcid.ipynb) Query Researcher and Co-author Relationships by ORCID  
- [publications notebook](./publications.ipynb) Publication List for A Researcher in Bibtex Format. Visualise data with bar plot and wordcloud.  
- [doi notebook](./doi.ipynb) Query Data by DOI  

In [1]:
import sys
sys.path.append('../')

# Package for mapping data on world map
# !{sys.executable} -m pip install folium
import folium

# Packages for plotting charts, graphs
import ast
import altair as alt
import networkx as nx
import nx_altair as nxa
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Packages for data manipulation
import pandas as pd
from datetime import datetime, date

# Packages for calling the API and parsing responses
import requests
import json

# packages to read API_KEY
import os
from os.path import join, dirname
from dotenv import load_dotenv
load_dotenv();

## API Errors  
To use the API, we load the API_KEY and the ORCID ID to search for into variables and insert them into the URL string. The Python requests package then sends the request to the API and returns the data. This section shows 2 common errors you might get when using the Augment API: either the ORCID ID passed is invalid, or the API_KEY was not loaded successfully from your environment file.
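
As a minimal sketch of the URL construction described above, the step can be wrapped in a small helper that fails early when the key is missing. The names `build_orcid_url` and `BASE_URL` are illustrative and not part of any Augment API client; the URL pattern is the one used in the cells of this notebook.

```python
# URL pattern copied from the cells in this notebook
BASE_URL = "https://augmentapi.researchgraph.com/v1/orcid"


def build_orcid_url(orcid, api_key):
    """Return the query URL for one ORCID ID (hypothetical helper)."""
    if not api_key:
        # An unset environment variable typically arrives here as None or ''
        raise ValueError("API_KEY is empty; check your environment file")
    return f"{BASE_URL}/{orcid}?subscription-key={api_key}"


# Example with a placeholder key (no request is sent here)
print(build_orcid_url("0000-0002-0068-716X", "my-key"))
```

Raising before the request is a design choice: it separates a local configuration problem from an authentication error returned by the server.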
### ORCID ID Not Found  
Here we assign an invalid value to the ORCID variable. When an error occurs, requests.get() still returns a response object: its status code indicates the type of error, and its body contains an explanatory message.

In [2]:
# ORCID ID not found
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0003-XXXX-XXXX"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)

if r.status_code == 400:
    print(r.json()[0]["error"])

Augment API query complete  400
We have failed to identify this ORCID (0000-0003-XXXX-XXXX). If it is a new identifier, it might take a few days to appear on our server.
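
A malformed ORCID ID such as the one above can also be caught locally before any request is sent. The sketch below checks the 16-character layout and the final check character, which ORCID documents as ISO 7064 MOD 11-2. The function names are illustrative. Passing this check only means the identifier is well formed, not that the Augment API has a record for it.

```python
import re

# Four groups of four characters; the last character may be a digit or X
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the layout and check character are consistent."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


print(is_well_formed_orcid("0000-0003-XXXX-XXXX"))
print(is_well_formed_orcid("0000-0002-0068-716X"))
```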


### Missing API_KEY  
You will receive an authentication error if the API_KEY is not valid.

In [3]:
# Missing API_KEY
API_KEY = ''
ORCID = "0000-0002-0715-6126"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)

if r.status_code == 401:
    print('Authentication error.', r.json()['message'])

Augment API query complete  401
Authentication error. Access denied due to invalid subscription key. Make sure to provide a valid key for an active subscription.
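
The two error cases return differently shaped bodies: the 400 response is a list whose first element has an `error` field, while the 401 response is a dictionary with a `message` field. A small helper can hide that difference. This is a sketch based only on the two responses shown in this notebook; `describe_error` is a hypothetical name, and other status codes may use other shapes.

```python
def describe_error(status_code, payload):
    """Return a readable message for the error shapes seen in this notebook."""
    if status_code == 200:
        return None  # no error to describe
    if status_code == 400:
        # Shape observed above: [{"error": "..."}]
        return payload[0]["error"]
    if status_code == 401:
        # Shape observed above: {"message": "..."}
        return "Authentication error. " + payload["message"]
    # Any other code is not covered by the examples in this notebook
    return f"Unexpected status code {status_code}"


# Hand-built payloads mirroring the two responses above
print(describe_error(400, [{"error": "ORCID not identified"}]))
print(describe_error(401, {"message": "Access denied"}))
```

With a live response, the call would be `describe_error(r.status_code, r.json())`.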


## Data Extraction for Valid ORCID ID  
For a valid ORCID ID, the API returns a nested dictionary containing all the data connected to the requested ORCID. The first level has 3 keys, as shown in the block below.

In [4]:
# ORCID ID does exist
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0002-0068-716X"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)
# Shows data 
print('The data returned has below fields: ',r.json()[0].keys())

Augment API query complete  200
The data returned has below fields:  dict_keys(['nodes', 'relationships', 'stats'])


Under nodes, the data is grouped under 5 labels, following the researchgraph schema:

In [5]:
r.json()[0]["nodes"].keys()

dict_keys(['datasets', 'grants', 'organisations', 'publications', 'researchers'])

Each label above holds a list of dictionaries, one per record. To extract the researcher we need, iterate through the researchers list and check for the ORCID.

In [6]:
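# Default to None so a failed request or a missing record is easy to detect
researcher = None
if r.status_code == 200 and r.json()[0]["nodes"]["researchers"]:
    researchers = r.json()[0]["nodes"]["researchers"]

    # Find the record whose ORCID matches the one we requested
    for candidate in researchers:
        if candidate["orcid"] == ORCID:
            researcher = candidate
            break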

print()
print(f'ORCID: {researcher["orcid"]}')
print(f'First name: {researcher["first_name"]}')
print(f'Last name: {researcher["last_name"]}')
print()
print(f'The researcher {researcher["full_name"]} is connected to {r.json()[0]["stats"]}.')


ORCID: 0000-0002-0068-716X
First name: Cameron
Last name: Neylon

The researcher Cameron Neylon is connected to {'datasets': 18, 'grants': 9, 'organisations': 245, 'publications': 149, 'researchers': 152}.
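
The lookup can also be written as a reusable function that returns `None` when no record matches, so the caller can decide how to handle a missing researcher. This is a sketch: `find_researcher` is a hypothetical name, and the sample records below are hand-built from values shown in this notebook.

```python
def find_researcher(researchers, orcid):
    """Return the first record with a matching ORCID, or None if absent."""
    return next((p for p in researchers if p.get("orcid") == orcid), None)


# Hand-built sample mirroring the fields printed above
sample = [
    {"orcid": "0000-0002-5965-6560", "full_name": "Bianca Kramer"},
    {"orcid": "0000-0002-0068-716X", "full_name": "Cameron Neylon"},
]

match = find_researcher(sample, "0000-0002-0068-716X")
print(match["full_name"] if match else "Researcher not found")
```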


### List of co-authors
Now we get all researchers connected to the requested researcher in the data. Note that this list includes the requested researcher and only includes co-authors who have ORCID IDs.

In [7]:
rf = pd.DataFrame(r.json()[0]["nodes"]["researchers"], columns=['first_name', 'last_name', 'full_name', 'orcid'])
dfStyler = rf.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

| | first_name | last_name | full_name | orcid |
| --- | --- | --- | --- | --- |
| 0 | Bastian | Greshake Tzovaras | Bastian Greshake Tzovaras | 0000-0002-9925-9623 |
| 1 | Chun-Kai | Huang | Chun-Kai Huang | 0000-0002-9656-5932 |
| 2 | Victoria | Garcia Sakai | Victoria Garcia Sakai | 0000-0001-6570-4218 |
| 3 | Olaf | Holderer | Olaf Holderer | 0000-0001-6746-7965 |
| 4 | Nicholas | Dixon | Nicholas Dixon | 0000-0002-5958-6945 |
| 5 | Roberto Arturo | Rossi | Roberto Arturo Rossi | 0000-0001-8659-082X |
| 6 | Bianca | Kramer | Bianca Kramer | 0000-0002-5965-6560 |
| 7 | Francois | Waldner | Francois Waldner | 0000-0002-5599-7456 |
| 8 | Joan | Leach | Joan Leach | 0000-0002-1376-5761 |
| 9 | Nazeefa | Fatima | Nazeefa Fatima | 0000-0001-7791-4984 |
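
Because the list includes the requested researcher, a strict co-author list needs one extra filtering step. The sketch below does this on plain dictionaries; `co_authors_only` is a hypothetical name and the sample records are hand-built from values shown in this notebook.

```python
def co_authors_only(researchers, requested_orcid):
    """Return the records whose ORCID differs from the requested one."""
    return [p for p in researchers if p.get("orcid") != requested_orcid]


# Hand-built sample: the requested researcher plus two co-authors
sample_researchers = [
    {"full_name": "Cameron Neylon", "orcid": "0000-0002-0068-716X"},
    {"full_name": "Bianca Kramer", "orcid": "0000-0002-5965-6560"},
    {"full_name": "Joan Leach", "orcid": "0000-0002-1376-5761"},
]

for person in co_authors_only(sample_researchers, "0000-0002-0068-716X"):
    print(person["full_name"])
```

On the DataFrame built in the cell above, the equivalent filter is `rf[rf["orcid"] != ORCID]`.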


### List of co-author affiliations  
Researcher affiliations can be extracted from the organisation nodes. An example record looks like this:

In [8]:
r.json()[0]["nodes"]["organisations"][0]

{'country': 'US',
 'doi': '10.13039/100010087',
 'grid': 'grid.267480.f',
 'isni': '0000000121879315',
 'key': 'researchgraph.com/wikidata/Q2590529',
 'lang': 'en',
 'latitude': '44.8698',
 'logo': 'https://upload.wikimedia.org/wikipedia/en/thumb/2/23/UWStout_seal.png/200px-UWStout_seal.png',
 'longitude': '-91.9278',
 'name': 'University of Wisconsin–Stout',
 'ror': '01gb8pc70',
 'url': 'https://en.wikipedia.org/wiki/University_of_Wisconsin–Stout'}

Note that the key includes a researchgraph prefix before the Wikidata ID of the organisation. To keep only the Wikidata ID, we strip the prefix using force_wikidata().

In [9]:
# Strip wikidata ID from key
def force_wikidata(n):
    n['key'] = n['key'].split('/')[-1]
    return n

# Use a name other than `json` so the imported json module is not shadowed
orgs = map(force_wikidata, r.json()[0]["nodes"]["organisations"])
of = pd.DataFrame(orgs, columns=['name', 'country', 'key', 'ror', 'lat', 'lon'])
of = of.rename(columns={'key': 'wikidata'})
dfStyler = of.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

| | name | country | wikidata | ror | lat | lon |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | University of Wisconsin–Stout | US | Q2590529 | 01gb8pc70 | | |
| 1 | Lund University | SE | Q218506 | 012a77v79 | | |
| 2 | San Francisco VA Medical Center | US | Q7414132 | 049peqw80 | | |
| 3 | Kew Gardens | GB | Q188617 | | | |
| 4 | University of Adelaide | AU | Q15574 | 00892tw58 | | |
| 5 | Yale University | US | Q49112 | 03v76x132 | | |
| 6 | Argonne National Laboratory | US | Q649120 | 05gvnxz63 | | |
| 7 | Nanyang Technological University | SG | Q721064 | 02e7b5302 | | |
| 8 | Dryad | US | Q5309616 | 00x6h5n95 | | |
| 9 | University of Lethbridge | CA | Q1689439 | 044j76961 | | |
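
The prefix-stripping step modifies each record in place. Where the original `key` should be preserved, a non-mutating variant is one option. This is a sketch under that assumption: `with_wikidata_id` is a hypothetical name, and the sample record reuses values from the example organisation shown earlier.

```python
def with_wikidata_id(org):
    """Return a copy of the record with an added 'wikidata' field."""
    cleaned = dict(org)  # shallow copy, so the input record is left untouched
    # The Wikidata ID is the last path segment of the key
    cleaned["wikidata"] = org["key"].rsplit("/", 1)[-1]
    return cleaned


sample_org = {
    "name": "University of Wisconsin–Stout",
    "key": "researchgraph.com/wikidata/Q2590529",
}

print(with_wikidata_id(sample_org)["wikidata"])
print(sample_org["key"])
```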


Now we can use the ROR API to look up each organisation, keep only the affiliations that have a ROR record, and get the following fields: name, country, wikidata, ror, latitude and longitude.

In [10]:
data = []
for index, row in of.iterrows():
    url = 'https://api.ror.org/organizations?query=' + row['wikidata']
    r2 = requests.get(url)

    # print an error message and skip this organisation if status code != 200
    if r2.status_code != 200:
        print('ROR API query returned an error', r2.status_code)
        continue

    # parse the response once and reuse it
    result = r2.json()
    if result['number_of_results'] == 0:
        # we need to work on better aligning with ROR. Main issue seems to be wikidata identifiers for departments which ROR does not support
        print('No ROR record found for wikidata ' + row['name'] + ' ' + row['wikidata'])
    else:
        item = result['items'][0]
        name = row['name']
        country = row['country']
        wikidata = row['wikidata']
        ror = item['id'][8:]  # drop the leading 'https://'
        lat = item['addresses'][0]['lat']
        lon = item['addresses'][0]['lng']
        data.append([name, country, wikidata, ror, lat, lon])

of2 = pd.DataFrame(data, columns=['name', 'country', 'wikidata', 'ror', 'lat', 'lon'])
dfStyler = of2.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

No ROR record found for wikidata Kew Gardens Q188617
No ROR record found for wikidata Dryad Q5309616
No ROR record found for wikidata Bloomsbury Publishing Q568642
No ROR record found for wikidata Waterford Kamhlaba Q1550406
No ROR record found for wikidata Stanford University School of Medicine Q4115969
No ROR record found for wikidata University of Toronto Faculty of Information Q7896482
No ROR record found for wikidata Harvard Medical School Q49121
No ROR record found for wikidata University of Toronto Faculty of Arts and Science Q7896481
No ROR record found for wikidata SURF (Samenwerkende Universitaire Rekenfaciliteiten) Q2422744
No ROR record found for wikidata Simon Fraser University - Vancouver Q99284970
No ROR record found for wikidata McGill University Faculty of Agriculture and Environment Q101009534
No ROR record found for wikidata New York University in London Q99285271
No ROR record found for wikidata Harvard College Q49123
No ROR record found for wikidata Leibniz Institu

| | name | country | wikidata | ror | lat | lon |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | University of Wisconsin–Stout | US | Q2590529 | ror.org/01gb8pc70 | 44.869722 | -91.927778 |
| 1 | Lund University | SE | Q218506 | ror.org/012a77v79 | 55.70584 | 13.19321 |
| 2 | San Francisco VA Medical Center | US | Q7414132 | ror.org/049peqw80 | 37.78247 | -122.504235 |
| 3 | University of Adelaide | AU | Q15574 | ror.org/00892tw58 | -34.920656 | 138.605756 |
| 4 | Yale University | US | Q49112 | ror.org/03v76x132 | 41.30815 | -72.92816 |
| 5 | Argonne National Laboratory | US | Q649120 | ror.org/05gvnxz63 | 41.709166 | -87.981992 |
| 6 | Nanyang Technological University | SG | Q721064 | ror.org/02e7b5302 | 1.344722 | 103.681389 |
| 7 | University of Lethbridge | CA | Q1689439 | ror.org/044j76961 | 49.67898 | -112.846402 |
| 8 | University of North Carolina at Chapel Hill | US | Q192334 | ror.org/0130frc33 | 35.905164 | -79.046945 |
| 9 | University of Kent | GB | Q1516684 | ror.org/00xkeyj56 | 51.27904 | 1.07992 |
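
The field extraction can be isolated into a function that tolerates records without an address, which would otherwise raise an error inside the loop. This is a sketch: `extract_ror_fields` is a hypothetical name, and the sample item is hand-built to mirror only the fields this notebook reads (`id`, `addresses`, `lat`, `lng`). A real ROR response contains more fields and its schema may differ between API versions.

```python
def extract_ror_fields(item):
    """Return ror, lat and lon from one ROR result item, or None if no address."""
    addresses = item.get("addresses") or []
    if not addresses:
        return None
    ror = item["id"]
    # Drop the URL scheme so the value matches the table above
    if ror.startswith("https://"):
        ror = ror[len("https://"):]
    return {"ror": ror, "lat": addresses[0]["lat"], "lon": addresses[0]["lng"]}


# Hand-built item using values from the first row of the table above
sample_item = {
    "id": "https://ror.org/01gb8pc70",
    "addresses": [{"lat": 44.869722, "lng": -91.927778}],
}

print(extract_ror_fields(sample_item))
```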


We can use the location data to visualise the affiliations on a world map.

In [11]:
# map affiliations on a world map, centred on the home institution (Curtin University, for now done manually)
# use scalar lookups so folium receives plain numbers rather than pandas Series
m = folium.Map(tiles='cartodbpositron', location=[of2.loc[89, 'lat'], of2.loc[89, 'lon']], zoom_start=3)

#Adding markers to the map
for index, row in of2.iterrows():
    folium.CircleMarker(location=[row['lat'], row['lon']],popup=row['name'], fill=True,
    color="#8248C6", radius=2).add_to(m)
m
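
The map centre above is picked by a fixed row index, which breaks if the order of the organisations changes. One alternative is to look up the home institution by name and fall back to the mean coordinate when it is not found. This is a sketch: `map_centre` is a hypothetical name, the sample rows come from the table above, and a plain mean of latitudes and longitudes is only a rough centre.

```python
def map_centre(orgs, home_name):
    """Return [lat, lon] of the named organisation, else the mean coordinate."""
    if not orgs:
        raise ValueError("no organisations to centre the map on")
    for org in orgs:
        if org["name"] == home_name:
            return [org["lat"], org["lon"]]
    # Fallback: arithmetic mean of all coordinates (crude, ignores wrap-around)
    mean_lat = sum(o["lat"] for o in orgs) / len(orgs)
    mean_lon = sum(o["lon"] for o in orgs) / len(orgs)
    return [mean_lat, mean_lon]


# Two rows taken from the table above
sample_orgs = [
    {"name": "Lund University", "lat": 55.70584, "lon": 13.19321},
    {"name": "Yale University", "lat": 41.30815, "lon": -72.92816},
]

print(map_centre(sample_orgs, "Lund University"))
```

With the DataFrame from the earlier cell, the input could be produced with `of2.to_dict("records")`.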

Alternatively, we can visualise the researcher-affiliation relationships as a graph.

In [12]:
# Generate a graph from the co-authors and their affiliations
G = nx.Graph()

# add researchers as graph nodes
for index, row in rf.iterrows():
    G.add_node(row['orcid'], name=row['full_name'], node_color='#54C48C', type='researcher')
# add organisations as graph nodes
for index, row in of2.iterrows():
    G.add_node(row['wikidata'], name=row['name'], node_color='#8248C6', type='organisation')

# Convert from and to for researcher relationships into ORCID IDs (to map the node labels)
def force_pid(n):
    n['from'] = n['from'].split('/')[-1]
    n['to'] = n['to'].split('/')[-1]
    return n

# get co-author relationships with the requested researcher
# (avoid the name `json` so the imported json module is not shadowed)
coauthor_edges = map(force_pid, r.json()[0]['relationships']['researcher-researcher'])
ef = pd.DataFrame(coauthor_edges, columns=['from', 'to'])

# get affiliation relationships for researchers
affiliation_edges = map(force_pid, r.json()[0]['relationships']['researcher-organisation'])
eo = pd.DataFrame(affiliation_edges, columns=['from', 'to'])

# add relationships as graph edges
G.add_edges_from(ef.to_numpy())
G.add_edges_from(eo.to_numpy())
    
# Compute positions for viz.
pos = nx.spring_layout(G)

options = {
    "font_size": 12,
    "node_size": 50,
    "edge_color": "lightgray",
    "linewidths": 0.1,
    "width": 1
}

# Show information about the graph
# (nx.info was removed in NetworkX 3.0; printing the graph gives the same summary)
print(G)
print("Network density:", nx.density(G))

# export graph to a gephi file
nx.write_gexf(G, "affiliationss.gexf")

# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', node_color='node_color', **options).properties(width=800, height=800)
viz.interactive()

Graph with 397 nodes and 507 edges
Network density: 0.0064498893214258455
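
The density reported above can be checked by hand. For an undirected graph with n nodes and m edges, density is the number of edges divided by the number of possible node pairs, 2m / (n(n - 1)). The sketch below applies that formula to the node and edge counts printed above; `undirected_density` is an illustrative name.

```python
def undirected_density(n_nodes, n_edges):
    """Density of an undirected graph: edges present over possible node pairs."""
    if n_nodes < 2:
        return 0.0  # no node pairs exist, so density is taken as zero
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Counts taken from the graph summary above
print(undirected_density(397, 507))
```

A value this low means that fewer than 1% of the possible connections exist, which is expected when most co-authors and organisations link only to a few neighbours.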


[Download Gephi file](https://github.com/researchgraph/augment-api-beta/blob/main/docs/notebooks/affiliationss.gexf)