# Extracting researcher data based on country affiliation

In this notebook we show how to:

* a. query for grants data using the Dimensions API
* b. use the researchers and affiliations included in these grants to generate a smaller dataset that includes only researchers with a German affiliation.

### Prerequisites

First we load some Python libraries and log into the Dimensions API, using [Dimcli](https://api-lab.dimensions.ai/cookbooks/1-getting-started/1-Using-the-Dimcli-library-to-query-the-API.html), the official Dimensions client. 


In [3]:
# @markdown # Get the API library and login 
# @markdown Click the 'play' button on the left (or shift+enter) after entering your API credentials

username = "" #@param {type: "string"}
password = "" #@param {type: "string"}
endpoint = "https://app.dimensions.ai" #@param {type: "string"}

!pip install dimcli plotly tqdm -U --quiet

# load common libraries
import pandas as pd
from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated

import time
import json
import sys
from tqdm.notebook import tqdm as progress

import plotly.express as px
from plotly.offline import plot
if 'google.colab' not in sys.modules:
  # make js dependencies local / needed by html exports
  from plotly.offline import init_notebook_mode
  init_notebook_mode(connected=True)

import dimcli
from dimcli.shortcuts import *

dimcli.login(username, password, endpoint)
dsl = dimcli.Dsl()


DimCli v0.6.7 - Succesfully connected to <https://app.dimensions.ai> (method: manual login)


### Query the API

For more info on grants search fields see https://docs.dimensions.ai/dsl/datasource-grants.html

In [16]:
data = dslquery("""
search grants in title_abstract_only for "gene therapy" 
  where start_year=2019 and research_org_countries.id="DE" 
return grants[id+title+abstract+research_orgs+investigator_details+funders+funding_usd+research_org_countries+research_org_cities+start_date+end_date] limit 1000
""")

Returned Grants: 88 (total = 88)
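
The query above hard-codes the search term, start year, and country code. As a minimal sketch, the same query string could be assembled from parameters so that it is easy to rerun for another country or year. `build_grants_query` is a hypothetical helper written for this illustration, not part of Dimcli, and the shortened field list is only an example.

```python
def build_grants_query(term, year, country_code, fields, limit=1000):
    """Assemble a DSL grants query string (hypothetical helper, not part of Dimcli)."""
    return_fields = "+".join(fields)  # the DSL joins returned fields with '+'
    return (
        f'search grants in title_abstract_only for "{term}" '
        f'where start_year={year} and research_org_countries.id="{country_code}" '
        f"return grants[{return_fields}] limit {limit}"
    )

# Example field list; extend it with any other grants fields you need
fields = ["id", "title", "investigator_details", "research_org_countries"]
query = build_grants_query("gene therapy", 2019, "DE", fields)
print(query)
```

The resulting string can then be passed to `dslquery` in the same way as the literal query above.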


### Quick look at the grants data we obtained

In [17]:
grants = data.as_dataframe()
grants.head(5)

| | funding_usd | research_org_countries | funders | end_date | id | investigator_details | start_date | research_orgs | abstract | research_org_cities | title |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 588764.0 | [{'id': 'CH', 'name': 'Switzerland'}, {'id': '... | [{'id': 'grid.425888.b', 'city_name': 'Bern', ... | 2023-11-30 | grant.8712748 | [{'first_name': 'Gabriele', 'id': 'ur.01067661... | 2019-12-01 | [{'id': 'grid.8591.5', 'city_name': 'Geneva', ... | Worldwide, millions of patients are affected b... | [{'id': 2660646, 'name': 'Genève'}, {'id': 266... | Integrative, transposon based vectors in ocula... |
| 1 | 294945.0 | [{'id': 'DE', 'name': 'Germany'}, {'id': 'AU',... | [{'id': 'grid.270680.b', 'city_name': 'Brussel... | 2022-07-31 | grant.7911778 | [{'last_name': 'SCHMIDT', 'first_name': 'Karin... | 2019-08-01 | [{'id': 'grid.6363.0', 'city_name': 'Berlin', ... | Demographic change includes population ageing,... | [{'id': 2950159, 'name': 'Berlin'}, {'id': 215... | Structural basis for the therapeutic efficienc... |
| 2 | 181848.0 | [{'id': 'DE', 'name': 'Germany'}] | [{'id': 'grid.270680.b', 'city_name': 'Brussel... | 2021-07-31 | grant.8584963 | [{'last_name': 'Hormanseder', 'first_name': 'E... | 2019-08-01 | [{'id': 'grid.4567.0', 'city_name': 'Munich', ... | Vertebrate eggs can induce the reprogramming o... | [{'id': 2867714, 'name': 'Munich'}] | The Molecular Mechanisms of Cell Fate Reprogra... |
| 3 | 452410.0 | [{'id': 'CH', 'name': 'Switzerland'}, {'id': '... | [{'id': 'grid.425888.b', 'city_name': 'Bern', ... | 2022-06-30 | grant.8483684 | [{'first_name': 'Michael', 'id': 'ur.074452635... | 2019-07-01 | [{'id': 'grid.7400.3', 'city_name': 'Zurich', ... | The median survival of patients with glioblast... | [{'id': 2657896, 'name': 'Zürich'}, {'id': 293... | The role of interferon type I signalling in th... |
| 4 | 2724509.0 | [{'id': 'IE', 'name': 'Ireland'}, {'id': 'GB',... | [{'id': 'grid.270680.b', 'city_name': 'Brussel... | 2021-05-31 | grant.8586122 | [] | 2019-06-01 | | The detailed analysis of live cells, as shed e... | [{'id': 2964574, 'name': 'Dublin'}, {'id': 263... | Intelligent Live Cell Analysis [iLCA] to trans... |


### Quick look at the investigators mentioned in those grants

In [18]:
investigators = data.as_dataframe_investigators()
investigators.head(5)

| | first_name | id | affiliations | last_name | role | middle_name | grant_id | grant_title | grant_start_date | grant_end_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Gabriele | ur.01067661752.69 | [{'city': None, 'city_id': '2660646', 'country... | Thumann | PI | | grant.8712748 | Integrative, transposon based vectors in ocula... | 2019-12-01 | 2023-11-30 |
| 1 | Zoltán | ur.0665574302.37 | [{'city': None, 'city_id': '2881279', 'country... | Ivics | Co-PI | | grant.8712748 | Integrative, transposon based vectors in ocula... | 2019-12-01 | 2023-11-30 |
| 2 | Thais | ur.016520574317.67 | [{'city': None, 'city_id': '2660646', 'country... | Bascuas-Castillo | Co-PI | | grant.8712748 | Integrative, transposon based vectors in ocula... | 2019-12-01 | 2023-11-30 |
| 3 | Gregg | ur.014444501317.03 | | Sealy | Co-PI | | grant.8712748 | Integrative, transposon based vectors in ocula... | 2019-12-01 | 2023-11-30 |
| 4 | Karin | ur.011513133663.69 | | SCHMIDT | PI | | grant.7911778 | Structural basis for the therapeutic efficienc... | 2019-08-01 | 2022-07-31 |


### Let's filter for German investigators only

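One way to do this is to build a new dataframe that 'explodes' the `affiliations` column of the investigators table, so that each row holds one investigator-affiliation pair. The `record_prefix='aff_'` argument adds the `aff_` prefix to every affiliation field.

Then we use the resulting `aff_country_code` column to keep only the rows with a German affiliation (`DE`).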


In [0]:
# round-trip through JSON to get plain records, then create one row per (investigator, affiliation) pair
affiliations = pd.json_normalize(json.loads(investigators.to_json(orient='records')), record_path=['affiliations'], 
               meta=['id', 'first_name', 'last_name', 'role', 'grant_id', 'grant_title', 'grant_start_date', 'grant_end_date'], record_prefix='aff_')
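
To make the effect of `record_path`, `meta`, and `record_prefix` easier to see, here is a minimal sketch on a few invented records that mimic the shape of the investigators data. All names and values are made up for illustration. In recent pandas versions, a record whose `affiliations` value is null contributes no rows to the result, so investigators without affiliations drop out at this step.

```python
import pandas as pd

# Toy records mimicking the investigators data; all values are invented
records = [
    {"id": "ur.1", "last_name": "Alpha", "affiliations": [
        {"name": "Org A", "country_code": "DE"},
        {"name": "Org B", "country_code": "CH"},
    ]},
    {"id": "ur.2", "last_name": "Beta", "affiliations": [
        {"name": "Org C", "country_code": "DE"},
    ]},
    {"id": "ur.3", "last_name": "Gamma", "affiliations": None},  # no affiliations
]

flat = pd.json_normalize(
    records,
    record_path=["affiliations"],  # one output row per item in this list
    meta=["id", "last_name"],      # parent fields repeated on every row
    record_prefix="aff_",          # prefix applied to the affiliation fields
)
print(flat)

# Keep only rows with a German affiliation
print(flat.query("aff_country_code == 'DE'"))
```

The first investigator produces two rows, one per affiliation, and only the `DE` row survives the final filter.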

In [23]:
german_affiliations = affiliations.query("aff_country_code == 'DE'")
german_affiliations.head(10)

| | aff_city | aff_city_id | aff_country | aff_id | aff_name | aff_state_code | aff_state | aff_country_code | id | first_name | last_name | role | grant_id | grant_title | grant_start_date | grant_end_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | 2881279 | | grid.425396.f | Paul Ehrlich Institut | | | DE | ur.0665574302.37 | Zoltán | Ivics | Co-PI | grant.8712748 | Integrative, transposon based vectors in ocula... | 2019-12-01 | 2023-11-30 |
| 3 | NEUHERBERG | 2867714 | DE | grid.4567.0 | HELMHOLTZ ZENTRUM MUENCHEN DEUTSCHES FORSCHUNG... | | | DE | ur.012511757223.70 | Eva | Hormanseder | PI | grant.8584963 | The Molecular Mechanisms of Cell Fate Reprogra... | 2019-08-01 | 2021-07-31 |
| 5 | | 2934246 | | grid.411327.2 | Institut für Neuropathologie Heinrich Heine Un... | | | DE | ur.07524506162.54 | Guido | Reifenberger | Co-PI | grant.8483684 | The role of interferon type I signalling in th... | 2019-07-01 | 2022-06-30 |
| 6 | HANNOVER | 2910831 | DE | grid.10423.34 | MEDIZINISCHE HOCHSCHULE HANNOVER | | | DE | ur.01125237061.67 | Axel Rainer | Schambach | PI | grant.8104346 | Gene therapy of inherited and acquired hearing... | 2019-05-01 | 2024-04-30 |
| 8 | | 2867714 | | grid.5252.0 | Max-Eder Research Group 4 Pediatric Sarcomas I... | | | DE | ur.013312575257.07 | Thomas G. P. | Grünewald | Co-PI | grant.7922081 | Ewing Sarcoma metastatic invasion dissected th... | 2019-02-01 | 2022-01-31 |
| 10 | | 2867543 | Germany | grid.16149.3b | Universitätsklinikum Münster | | | DE | ur.0714336750.46 | Astrid | Jeibmann | PI | grant.8102456 | Functional role of genes and pathways in histo... | 2019-01-01 | |
| 11 | | 2867543 | Germany | grid.16149.3b | Universitätsklinikum Münster | | | DE | ur.01137070015.52 | Martin | Hasselblatt | PI | grant.8102456 | Functional role of genes and pathways in histo... | 2019-01-01 | |
| 12 | | 2928810 | Germany | grid.410718.b | Universitätsklinikum Essen | | | DE | ur.01372566033.18 | Jens | Siveke | PI | grant.8102625 | Identification of regulators of therapy-induce... | 2019-01-01 | |
| 13 | | 2918632 | Germany | grid.7450.6 | Georg-August-Universität Göttingen | | | DE | ur.01321542450.71 | Antoine | Huet | PI | grant.8577173 | Optical Stimulation of the Auditory Pathway by... | 2019-01-01 | |
| 14 | | 2918632 | Germany | grid.7450.6 | Georg-August-Universität Göttingen | | | DE | ur.01117732564.89 | Tobias | Moser | PI | grant.8577173 | Optical Stimulation of the Auditory Pathway by... | 2019-01-01 | |
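
A filtered table like this has one row per investigator-affiliation pair, so the same researcher can appear more than once, for example when they are listed on several grants. As a minimal sketch on invented toy data (the column names follow the table above, the values do not), such a table could be reduced to one row per researcher `id`:

```python
import pandas as pd

# Toy affiliations table; all values are invented
affiliations = pd.DataFrame({
    "aff_country_code": ["DE", "CH", "DE", "DE"],
    "aff_name": ["Org A", "Org B", "Org A", "Org C"],
    "id": ["ur.1", "ur.1", "ur.1", "ur.2"],
    "grant_id": ["grant.1", "grant.1", "grant.2", "grant.2"],
})

# Boolean-mask equivalent of .query("aff_country_code == 'DE'")
german = affiliations[affiliations["aff_country_code"] == "DE"]

# Keep the first row seen for each researcher id
unique_researchers = german.drop_duplicates(subset="id")[["id", "aff_name"]]
print(unique_researchers)
```

Whether to deduplicate depends on the analysis: keep all rows when the grant-level detail matters, and deduplicate when only a list of people is needed.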


### Downloading the data as CSV


In [0]:
german_affiliations.to_csv("german_affiliations.csv")
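
By default `to_csv` also writes the dataframe index as an unnamed first column, which pandas shows as `Unnamed: 0` when the file is read back. If the index carries no meaning, passing `index=False` leaves it out. A minimal sketch on invented data, using an in-memory buffer in place of a file:

```python
import io
import pandas as pd

# Toy dataframe with a non-contiguous index, as left behind by a filter
df = pd.DataFrame(
    {"id": ["ur.1", "ur.2"], "aff_country_code": ["DE", "DE"]},
    index=[1, 3],
)

buffer = io.StringIO()          # stands in for a file on disk
df.to_csv(buffer, index=False)  # omit the index column
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored)
```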