# Query Researcher and Co-author Relationships by ORCID

This notebook demonstrates how to pass an ORCID ID to the Augment API, query researcher data, and plot the resulting nodes and relationships as a network graph.


[Download Notebook](https://github.com/researchgraph/augment-api-beta/blob/main/docs/notebooks/orcid.ipynb)

Related Notebooks:  
- [publications notebook](./publications.ipynb): Publication list for a researcher in BibTeX format. Visualise data with a bar plot and word cloud.
- [doi notebook](./doi.ipynb): Query data by DOI.
- [affiliations notebook](./affiliations.ipynb): Query researcher and affiliations by ORCID. Map affiliation data on a world map and visualise researcher-organisation relationships.




In [1]:
import sys
sys.path.append('../')

# packages to read API_KEY
import os
from os.path import join, dirname
from dotenv import load_dotenv
load_dotenv();

# Packages to use API
import requests
import json

# Packages for data manipulation
import pandas as pd
from datetime import datetime, date

# Packages for chart plotting
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Packages for graph plotting
import ast
import altair as alt
import networkx as nx
import nx_altair as nxa


## API Errors  
To call the API, we load the API_KEY and the ORCID ID to look up into variables and insert them into the URL string. The Python requests package then sends those values to the API and returns the data. This section shows two common errors you might get when using the Augment API: either the ORCID ID passed is invalid, or the API_KEY was not loaded successfully from your environment file.
### ORCID ID Not Found  
Here we assign an invalid value to the ORCID variable. When an error occurs, requests.get() still returns a response object: its status code indicates the type of error, and its body carries an explanatory message.

In [2]:
# pass an invalid ORCID 
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0003-XXXX-XXXX"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)

if r.status_code == 400:
    print(r.json()[0]["error"])

Augment API query complete  400
We have failed to identify this ORCID (0000-0003-XXXX-XXXX). If it is a new identifier, it might take a few days to appear on our server.


### Missing API_KEY  
You will receive an authentication error if the API_KEY is missing or not valid.

In [3]:
# Missing API_KEY
API_KEY = ''
ORCID = "0000-0002-0068-716X"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)

if r.status_code == 401:
    print('Authentication error.', r.json()['message'])

Augment API query complete  401
Authentication error. Access denied due to invalid subscription key. Make sure to provide a valid key for an active subscription.
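
The two failures above return differently shaped bodies: the 400 response is a list whose first element carries an "error" field, while the 401 response is a dictionary with a "message" field. A minimal sketch of a helper that handles both, assuming only the shapes visible in the outputs above (the name `describe_error` is hypothetical, and the API may return other status codes not covered here):

```python
def describe_error(status_code, payload):
    """Return a readable message for the two error shapes shown above.

    `payload` is the already-parsed JSON body (for example r.json()).
    Hypothetical helper for illustration, not part of the Augment API.
    """
    if status_code == 400 and isinstance(payload, list) and payload:
        # 400: a list whose first element holds an "error" string
        return payload[0].get("error", "Bad request")
    if status_code == 401 and isinstance(payload, dict):
        # 401: a dictionary holding a "message" string
        return "Authentication error. " + payload.get("message", "")
    return f"Unexpected response (status {status_code})"


# Example with bodies shaped like the two outputs above
print(describe_error(400, [{"error": "We have failed to identify this ORCID."}]))
print(describe_error(401, {"message": "Access denied due to invalid subscription key."}))
```

In the notebook this could be called as `describe_error(r.status_code, r.json())` whenever the status code is not 200.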


## Data Extraction for Valid ORCID ID  
For a valid ORCID ID, the response is a list whose first element is a nested dictionary holding all data connected to the requested ORCID. The first level has 3 keys, as shown in the block below.

In [4]:
# ORCID ID does exist
API_KEY = os.environ.get("API_KEY")
ORCID = "0000-0002-0068-716X"

url = f'https://augmentapi.researchgraph.com/v1/orcid/{ORCID}?subscription-key={API_KEY}'
r = requests.get(url)

# print a short confirmation on completion
print('Augment API query complete ', r.status_code)

# Shows data 
print('The data returned has below fields: ',r.json()[0].keys())

Augment API query complete  200
The data returned has below fields:  dict_keys(['nodes', 'relationships', 'stats'])


Within nodes, data is grouped under 5 labels that follow the Research Graph schema:

In [5]:
r.json()[0]["nodes"].keys()

dict_keys(['datasets', 'grants', 'organisations', 'publications', 'researchers'])
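
To get a feel for this structure without printing every record, one option is to count the entries under each label. The sketch below assumes the layout shown above (a dictionary with a "nodes" key that maps each label to a list); the sample data, its "title" fields, and the name `count_nodes` are made up for illustration.

```python
def count_nodes(payload):
    """Count the records stored under each label in payload["nodes"]."""
    return {label: len(records) for label, records in payload.get("nodes", {}).items()}


# Tiny hand-written payload with the same layout as r.json()[0];
# the "title" fields are placeholders, not real API field names
sample = {
    "nodes": {
        "datasets": [],
        "grants": [{"title": "example grant"}],
        "organisations": [],
        "publications": [{"title": "paper A"}, {"title": "paper B"}],
        "researchers": [{"orcid": "0000-0002-0068-716X"}],
    }
}
print(count_nodes(sample))
```

With a real response, `count_nodes(r.json()[0])` can be compared against the "stats" entry, which appears to report the same kind of per-label counts.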

Each label above holds a list of dictionaries, one per record. To extract the researcher we need, iterate through the researchers list and check for the matching ORCID.

In [6]:
# Extract Researcher information
data = r.json()[0] if r.status_code == 200 else {}
researchers = data.get("nodes", {}).get("researchers", [])

# pick the record whose ORCID matches the one requested (None if absent)
researcher = next((x for x in researchers if x["orcid"] == ORCID), None)

if researcher is not None:
    print()
    print(f'ORCID: {researcher["orcid"]}')
    print(f'First name: {researcher["first_name"]}')
    print(f'Last name: {researcher["last_name"]}')
    print()
    print(f'The researcher {researcher["full_name"]} is connected to {data["stats"]}.')


ORCID: 0000-0002-0068-716X
First name: Cameron
Last name: Neylon

The researcher Cameron Neylon is connected to {'datasets': 18, 'grants': 9, 'organisations': 245, 'publications': 149, 'researchers': 152}.


### List of co-authors
The researchers list holds the requested researcher together with the connected co-authors.

In [7]:
rf = pd.DataFrame(r.json()[0]["nodes"]["researchers"], columns=['first_name', 'last_name', 'full_name', 'orcid'])
dfStyler = rf.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])

|   | first_name | last_name | full_name | orcid |
|---|---|---|---|---|
| 0 | Bastian | Greshake Tzovaras | Bastian Greshake Tzovaras | 0000-0002-9925-9623 |
| 1 | Chun-Kai | Huang | Chun-Kai Huang | 0000-0002-9656-5932 |
| 2 | Victoria | Garcia Sakai | Victoria Garcia Sakai | 0000-0001-6570-4218 |
| 3 | Olaf | Holderer | Olaf Holderer | 0000-0001-6746-7965 |
| 4 | Nicholas | Dixon | Nicholas Dixon | 0000-0002-5958-6945 |
| 5 | Roberto Arturo | Rossi | Roberto Arturo Rossi | 0000-0001-8659-082X |
| 6 | Bianca | Kramer | Bianca Kramer | 0000-0002-5965-6560 |
| 7 | Francois | Waldner | Francois Waldner | 0000-0002-5599-7456 |
| 8 | Joan | Leach | Joan Leach | 0000-0002-1376-5761 |
| 9 | Nazeefa | Fatima | Nazeefa Fatima | 0000-0001-7791-4984 |


### Co-author Relationship
Now we can visualise the co-author relationships of our target researcher by extracting data from the relationships list. However, the relationship values need some formatting to obtain the bare ORCID. Some example relationships are shown below.

In [8]:
r.json()[0]['relationships']['researcher-researcher'][:5]

[{'from': 'researchgraph.com/orcid/0000-0002-0068-716X',
  'to': 'researchgraph.com/orcid/0000-0002-7693-4964'},
 {'from': 'researchgraph.com/orcid/0000-0002-0068-716X',
  'to': 'researchgraph.com/orcid/0000-0002-2251-8092'},
 {'from': 'researchgraph.com/orcid/0000-0002-0068-716X',
  'to': 'researchgraph.com/orcid/0000-0002-0026-989X'},
 {'from': 'researchgraph.com/orcid/0000-0002-0068-716X',
  'to': 'researchgraph.com/orcid/0000-0003-0183-6910'},
 {'from': 'researchgraph.com/orcid/0000-0002-0068-716X',
  'to': 'researchgraph.com/orcid/0000-0002-0411-8300'}]

In [11]:
# Format keys from relationship list to ORCID IDs (to map the node labels)
def force_orcid(n):
    n['from'] = n['from'].split('/')[-1]
    n['to'] = n['to'].split('/')[-1]
    return n

Note that the requested researcher is the 'from' node. These are the connections between the requested researcher and other researchers; we now add these connections to the graph.
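
Note also that force_orcid rewrites each relationship dictionary in place. If the original identifiers should be kept, a non-mutating variant is one option. The sketch below (the names `to_orcid_pair` and `sample` are hypothetical) also checks that every 'from' node is the requested researcher, using three of the relationships printed above.

```python
def to_orcid_pair(rel):
    """Return (from, to) as bare ORCID IDs without modifying `rel`."""
    return rel["from"].split("/")[-1], rel["to"].split("/")[-1]


# Three of the relationships shown in the output above
sample = [
    {"from": "researchgraph.com/orcid/0000-0002-0068-716X",
     "to": "researchgraph.com/orcid/0000-0002-7693-4964"},
    {"from": "researchgraph.com/orcid/0000-0002-0068-716X",
     "to": "researchgraph.com/orcid/0000-0002-2251-8092"},
    {"from": "researchgraph.com/orcid/0000-0002-0068-716X",
     "to": "researchgraph.com/orcid/0000-0002-0026-989X"},
]

pairs = [to_orcid_pair(rel) for rel in sample]
sources = {src for src, _ in pairs}

print(pairs[0])   # ('0000-0002-0068-716X', '0000-0002-7693-4964')
print(sources)    # {'0000-0002-0068-716X'}
```

Because the input dictionaries are left untouched, the same response can be processed more than once without the identifiers changing between runs.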

In [12]:
# Generate a graph from the co-authors
G = nx.Graph()

# add co-author researcher nodes to the graph
for index, row in rf.iterrows():
    G.add_node(row['orcid'], name=row['full_name'], color='#54C48C')
# format the relationship data
# (named `edges` so the imported json module is not shadowed)
edges = map(force_orcid, r.json()[0]['relationships']['researcher-researcher'])
ef = pd.DataFrame(edges, columns=['from', 'to'])

# add them into graph as edges
G.add_edges_from(ef.to_numpy())

Show current graph:

In [13]:
# Compute positions
pos = nx.spring_layout(G)

options = {
    "font_size": 12,
    "node_size": 50,
    "edge_color": "lightgray",
    "node_color": "#54C48C",
    "linewidths": 0.1,
    "width": 1
}

# Show information about the graph
# (nx.info was removed in NetworkX 3.0; print(G) gives the same summary)
print(G)
print("Network density:", nx.density(G))

# Disable maximum row check for big dataset
alt.data_transformers.disable_max_rows()

# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', **options).properties(width=800, height=800)
viz.interactive()

Graph with 152 nodes and 151 edges
Network density: 0.013157894736842105
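
The density reported above follows directly from the shape of the graph. For an undirected graph with n nodes and m edges, density is 2m / (n(n-1)), the fraction of possible node pairs that are actually connected. Here every edge touches the requested researcher, so m = n - 1 and the density reduces to 2/n. A small check in plain Python (the function name `graph_density` is ours, not part of NetworkX):

```python
def graph_density(n_nodes, n_edges):
    """Density of an undirected simple graph: 2m / (n * (n - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# 152 nodes and 151 edges, as reported for the graph above
print(graph_density(152, 151))
# For a star-shaped graph the same value is simply 2 / n
print(2 / 152)
```

A density this low is expected for a star-shaped graph: only the links to the central researcher are present, and none of the possible links among co-authors.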


Now we have a connection graph with the requested researcher at the center. But what about the connections between co-authors? To learn more about them, we can repeat the ORCID request for each co-author in the list, get that researcher's relationships, and add them to the graph.

In [None]:
# Fetch relationships between all co-authors
API_KEY = os.environ.get("API_KEY")

# This may take a while depending on the number of requests
for a in rf['orcid']:
    url = f'https://augmentapi.researchgraph.com/v1/orcid/{a}?subscription-key={API_KEY}'
    resp = requests.get(url)

    # print a short confirmation on completion
    print('Augment API query complete ', resp.status_code)

    # skip co-authors whose request failed (e.g. 400 or 401)
    if resp.status_code != 200:
        continue

    edges = map(force_orcid, resp.json()[0]['relationships']['researcher-researcher'])
    ef = pd.DataFrame(edges, columns=['from', 'to'])

    # filter the relationships by start node in co-author list
    ef = ef[ef['from'].isin(rf['orcid'].to_list())]
    # add them into graph as edges
    G.add_edges_from(ef.to_numpy())
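
The loop above keeps every edge whose start node is a known co-author, so researchers outside the original list also enter the graph as end nodes. If only the links among the original co-authors are of interest, one possible refinement is to require both ends of an edge to be in the list. A sketch under that assumption (the helper `edges_within` is hypothetical, and the single letters stand in for ORCID IDs):

```python
def edges_within(pairs, known):
    """Keep only (source, target) pairs whose two ends are both in `known`."""
    known = set(known)
    return [(s, t) for s, t in pairs if s in known and t in known]


known_orcids = ["A", "B", "C"]  # stand-ins for the values in rf['orcid']
pairs = [("A", "B"), ("B", "C"), ("B", "X"), ("X", "A")]

print(edges_within(pairs, known_orcids))  # [('A', 'B'), ('B', 'C')]
```

The filtered list of tuples can be passed to G.add_edges_from in the same way as the array used in the loop, which would keep the node count at the size of the co-author list.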

Finally, we show the graph and store it as a GEXF file.

In [16]:
# Compute positions
pos = nx.spring_layout(G)

options = {
    "font_size": 12,
    "node_size": 50,
    "edge_color": "lightgray",
    "node_color": "#54C48C",
    "linewidths": 0.1,
    "width": 1
}

# Show information about the graph
# (nx.info was removed in NetworkX 3.0; print(G) gives the same summary)
print(G)
print("Network density:", nx.density(G))

# Disable maximum row check for big dataset
alt.data_transformers.disable_max_rows()

# export graph to a GEXF file (can be opened in Gephi)
nx.write_gexf(G, "co-authors.gexf")

# Draw the graph using altair
viz = nxa.draw_networkx(G, pos=pos, node_tooltip='name', **options).properties(width=800, height=800)
viz.interactive()

Graph with 4672 nodes and 5781 edges
Network density: 0.0005298101371622632
