# Conference Assistant Queries | Dimensions API

Goal: find useful information about the speakers while attending a scientific conference.

Read the blog post for more: [A semi-automated conference assistant](https://www.michelepasin.org/blog/2022/06/30/a-semi-automated-conference-assistant/)

Rough approach:
* Start from a list of researcher names (e.g. conference speakers)
* Map them to Dimensions Researcher IDs
* Analyse who they are: how often they collaborate and which topics they work on


---

Author: [Michele Pasin](https://www.michelepasin.org/)



## Prerequisites

In [None]:
#@markdown # Dimensions DSL API: logging in
#@markdown This cell installs the Python libraries needed to query the Dimensions API and logs you in.
#@markdown Note: on Google Colab you'll be asked to enter your credentials each time, to avoid sharing them accidentally.
#@markdown 
#@markdown For more information, see [Dimcli authentication](https://digital-science.github.io/dimcli/getting-started.html#authentication) and [Jupyter notebooks](https://digital-science.github.io/dimcli/getting-started.html#dimcli-with-jupyter-notebooks) documentation sections.
#@markdown 
#@markdown 
#@markdown Dimensions API endpoint:
ENDPOINT = "https://app.dimensions.ai" #@param {type: "string"}

!pip install dimcli pyvis plotly -U --quiet

import sys
import pandas as pd
import plotly.express as px 
import dimcli
from dimcli.utils import *

if 'google.colab' in sys.modules:
  import getpass
  KEY = getpass.getpass(prompt='API Key: ')
else:
  KEY = ""

print("==\nLogging in..")
dimcli.login(key=KEY, endpoint=ENDPOINT)
dsl = dimcli.Dsl()

API Key: ··········
==
Logging in..
Dimcli - Dimensions API Client (v0.9.9.1)
Connected to: <https://app.dimensions.ai/api/dsl> - DSL v2.1
Method: manual login


## Scientometrics: A first look at the broader research area via its top publications

In [None]:
QUERY1 = "scientometric OR bibliometric or \"science of science\" " #@param {type: "string"}


dsl.query(f'''
search publications
    in title_abstract_only
    for """{QUERY1}"""
    where year >= 2010
return publications[id+title+journal+year+times_cited+recent_citations+category_for]
sort by times_cited desc
limit 50
''').as_dataframe(links=True, nice=True)

Returned Publications: 50 (total = 102)
Time: 0.90s


Unnamed: 0,Title,Source title,PubYear,Times cited,FOR (ANZSRC) Categories,Recent Citations
0,Choosing experiments to accelerate collective discovery,Proceedings of the National Academy of Sciences of the United States of America,2015,100,17 Psychology and Cognitive Sciences; 1701 Psychology,31
1,The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: The hotspot,Science Advances,2017,61,"2005 Literary Studies; 20 Language, Communication and Culture",17
2,"Productivity, prominence, and the effects of academic environment",Proceedings of the National Academy of Sciences of the United States of America,2019,55,17 Psychology and Cognitive Sciences; 1701 Psychology,34
3,Over-optimization of academic publishing metrics: observing Goodhart’s Law in action,GigaScience,2019,53,11 Medical and Health Sciences; 1103 Clinical Sciences,29
4,The chaperone effect in scientific publishing,Proceedings of the National Academy of Sciences of the United States of America,2018,53,06 Biological Sciences; 0608 Zoology,33
5,ISRIA statement: ten-point guidelines for an effective process of research impact assessment,Health Research Policy and Systems,2018,52,1605 Policy and Administration; 1117 Public Health and Health Services; 11 Medical and Health Sciences; 16 Studies in Human Society,22
6,pybliometrics: Scriptable bibliometrics using a Python interface to Scopus,SoftwareX,2019,44,08 Information and Computing Sciences; 0806 Information Systems,37
7,Scientific prize network predicts who pushes the boundaries of science,Proceedings of the National Academy of Sciences of the United States of America,2018,39,2103 Historical Studies; 21 History and Archaeology,21
8,Making Climate-Science Communication Evidence-Based — All the Way Down,SSRN Electronic Journal,2013,36,17 Psychology and Cognitive Sciences; 1701 Psychology,8
9,Quantifying the cognitive extent of science,Journal of Informetrics,2015,35,0102 Applied Mathematics; 01 Mathematical Sciences; 0807 Library and Information Studies; 08 Information and Computing Sciences,15


In [None]:
%%dsldf --nice --links

search publications
    in title_abstract_only
    for """ scientometric OR bibliometric or "science of science"  """
    where year >= 2010
return publications[id+title+journal+year+times_cited+recent_citations+category_for]
sort by times_cited desc
limit 100

Returned Publications: 100 (total = 102)
Time: 1.18s


Unnamed: 0,Title,Source title,PubYear,Times cited,FOR (ANZSRC) Categories,Recent Citations
0,Choosing experiments to accelerate collective discovery,Proceedings of the National Academy of Sciences of the United States of America,2015,100,17 Psychology and Cognitive Sciences; 1701 Psychology,31
1,The nearly universal link between the age of past knowledge and tomorrow’s breakthroughs in science and technology: The hotspot,Science Advances,2017,61,"2005 Literary Studies; 20 Language, Communication and Culture",17
2,"Productivity, prominence, and the effects of academic environment",Proceedings of the National Academy of Sciences of the United States of America,2019,55,17 Psychology and Cognitive Sciences; 1701 Psychology,34
3,Over-optimization of academic publishing metrics: observing Goodhart’s Law in action,GigaScience,2019,53,11 Medical and Health Sciences; 1103 Clinical Sciences,29
4,The chaperone effect in scientific publishing,Proceedings of the National Academy of Sciences of the United States of America,2018,53,06 Biological Sciences; 0608 Zoology,33
5,ISRIA statement: ten-point guidelines for an effective process of research impact assessment,Health Research Policy and Systems,2018,52,1605 Policy and Administration; 1117 Public Health and Health Services; 11 Medical and Health Sciences; 16 Studies in Human Society,22
6,pybliometrics: Scriptable bibliometrics using a Python interface to Scopus,SoftwareX,2019,44,08 Information and Computing Sciences; 0806 Information Systems,37
7,Scientific prize network predicts who pushes the boundaries of science,Proceedings of the National Academy of Sciences of the United States of America,2018,39,2103 Historical Studies; 21 History and Archaeology,21
8,Making Climate-Science Communication Evidence-Based — All the Way Down,SSRN Electronic Journal,2013,36,17 Psychology and Cognitive Sciences; 1701 Psychology,8
9,Quantifying the cognitive extent of science,Journal of Informetrics,2015,35,0102 Applied Mathematics; 01 Mathematical Sciences; 0807 Library and Information Studies; 08 Information and Computing Sciences,15


## Complex Systems: A first look at the broader research area via its top publications

In [None]:
QUERY2 = "\"complex systems\" AND networks" #@param {type: "string"}


dsl.query(f'''
search publications
    in title_abstract_only
    for """{QUERY2}"""
    where year >= 2010
return publications[id+title+journal+year+times_cited+recent_citations+category_for]
sort by times_cited desc
limit 50
''').as_dataframe(links=True, nice=True)

Returned Publications: 50 (total = 11316)
Time: 0.93s


Unnamed: 0,Title,Source title,PubYear,Times cited,FOR (ANZSRC) Categories,Recent Citations
0,Community detection in graphs,Physics Reports,2010,6754,01 Mathematical Sciences; 02 Physical Sciences,868
1,Functional Network Organization of the Human Brain,Neuron,2011,2791,1109 Neurosciences; 11 Medical and Health Sciences,643
2,BrainNet Viewer: A Network Visualization Tool for Human Brain Connectomics,PLOS ONE,2013,2410,08 Information and Computing Sciences; 0806 Information Systems,725
3,A Tractable Approach to Coverage and Rate in Cellular Networks,IEEE Transactions on Communications,2011,2258,08 Information and Computing Sciences; 09 Engineering; 10 Technology; 0906 Electrical and Electronic Engineering; 0804 Data Format; 1005 Communications Technologies,182
4,Controllability of complex networks,Nature,2011,2103,0102 Applied Mathematics; 01 Mathematical Sciences; 0101 Pure Mathematics,306
5,The structure and dynamics of multilayer networks,Physics Reports,2014,2015,01 Mathematical Sciences; 02 Physical Sciences,438
6,Multilayer networks,Journal of Complex Networks,2014,1918,0102 Applied Mathematics; 0103 Numerical and Computational Mathematics; 0101 Pure Mathematics; 01 Mathematical Sciences,421
7,Spatial networks,Physics Reports,2011,1567,01 Mathematical Sciences; 02 Physical Sciences,272
8,Structural and molecular interrogation of intact biological systems,Nature,2013,1474,0601 Biochemistry and Cell Biology; 06 Biological Sciences,282
9,False data injection attacks against state estimation in electric power grids,ACM Transactions on Privacy and Security,2011,1267,08 Information and Computing Sciences; 0801 Artificial Intelligence and Image Processing,302


## Who are the presenters? 

In [None]:
# Add your list of presenters here

people = [
    "Roberta Sinatra",
    "Marcia R. Ferreira",
    "Cassidy Sugimoto",
    "Nicolas Robinson Garcia",
    "Vincent Larivière",
    "Rodrigo Costas",
    "Fariba Karimi",    
]

s = ' OR '.join([f' "{dsl_escape(x)}" ' for x in people]) 


# Disambiguating names using Dimensions IDs

dsl.query(f'''

search researchers for """ {s} """ 
    where obsolete=0 
return researchers[basics+total_grants+total_publications]
sort by total_publications desc

''').as_dataframe(links=True, nice=True)

Returned Researchers: 18 (total = 18)
Time: 1.63s


Unnamed: 0,Researcher ID,First Name,Last Name,Orcid IDs,Research organizations,GRID IDs,Countries,Total Grants,Total Publications
0,ur.013433750473.60,Vincent,Larivière,,Indiana University Bloomington; Simon Fraser University; University of Montreal; Leiden University; Stellenbosch University; University of Quebec; McGill University; University of Quebec at Montreal; Indiana University of Pennsylvania,grid.411377.7; grid.61971.38; grid.14848.31; grid.5132.5; grid.11956.3a; grid.265695.b; grid.14709.3b; grid.38678.32; grid.257427.1,United States; Canada; Canada; Netherlands; South Africa; Canada; Canada; Canada; United States,24,251
1,ur.01340617017.77,Cassidy R,Sugimoto,['0000-0001-8608-3203'],Georgia Institute of Technology; Indiana University; Leiden University; University of North Carolina at Chapel Hill; University of North Carolina System; Stellenbosch University; University of Quebec at Montreal; Indiana University Bloomington,grid.213917.f; grid.257410.5; grid.5132.5; grid.10698.36; grid.410711.2; grid.11956.3a; grid.38678.32; grid.411377.7,United States; United States; Netherlands; United States; United States; South Africa; Canada; United States,5,168
2,ur.013012771111.87,Rodrigo,Costas,['0000-0002-7465-6462'],Stellenbosch University; Spanish National Research Council; Leiden University,grid.11956.3a; grid.4711.3; grid.5132.5,South Africa; Spain; Netherlands,0,140
3,ur.011207156643.42,Fariba,Karimi,['0000-0002-0037-2475'],Umeå University; Leibniz Institute for the Social Sciences; Complexity Science Hub Vienna; University of Koblenz and Landau,grid.12650.30; grid.425053.5; grid.484678.1; grid.5892.6,Sweden; Germany; Austria; Germany,0,42
4,ur.0734240333.93,Roberta,Sinatra,['0000-0002-7558-1028'],Dana-Farber Cancer Institute; University of Copenhagen; Complexity Science Hub Vienna; University of Catania; Institute for Scientific Interchange; Northeastern University; IT University of Copenhagen,grid.65499.37; grid.5254.6; grid.484678.1; grid.8158.4; grid.418750.f; grid.261112.7; grid.32190.39,United States; Denmark; Austria; Italy; Italy; United States; Denmark,1,37
5,ur.014454645231.45,Fariba,Karimi,['0000-0002-8594-3790'],"Islamic Azad University, Isfahan",grid.411757.1,Iran,0,13
6,ur.0726731044.16,Fariba,Karimi,['0000-0002-6107-4826'],Shiraz University of Medical Sciences; Shiraz University,grid.412571.4; grid.412573.6,Iran; Iran,0,10
7,ur.010110137501.98,Márcia R,Ferreira,,Complexity Science Hub Vienna; TU Wien; Leiden University,grid.484678.1; grid.5329.d; grid.5132.5,Austria; Austria; Netherlands,0,10
8,ur.0763666021.02,Fariba,Karimi,,University of Hertfordshire; University of Technology Malaysia,grid.5846.f; grid.410877.d,United Kingdom; Malaysia,0,6
9,ur.015272301103.90,Fariba,Karimi,,Isfahan University of Technology,grid.411751.7,Iran,0,4


### A new list of names, with IDs

> TODO: automate this step - for each name, grab the ID with the highest number of publications.
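
One possible way to automate this step is sketched below. It is not part of the original notebook and rests on a few assumptions: the candidate researchers are available as plain dictionaries with `id`, `first_name`, `last_name` and `total_publications` keys (the key names are assumed here and may differ from what the API returns), and a candidate matches a speaker when the first given name and the full last name agree after accents and punctuation are stripped. The helper names `normalize`, `matches` and `pick_most_published` are made up for illustration. The sample rows reuse a few values from the table above.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def matches(speaker, candidate):
    """True if the first given name and the full last name agree."""
    speaker_tokens = normalize(speaker).split()
    first = normalize(candidate["first_name"]).split()
    last = normalize(candidate["last_name"]).split()
    if not speaker_tokens or not first or not last:
        return False
    return speaker_tokens[0] == first[0] and speaker_tokens[-len(last):] == last


def pick_most_published(speakers, candidates):
    """Map the best candidate ID to each speaker name (same shape as people2)."""
    best = {}
    for speaker in speakers:
        hits = [c for c in candidates if matches(speaker, c)]
        if hits:
            top = max(hits, key=lambda c: c["total_publications"])
            best[top["id"]] = speaker
    return best


# Assumed record layout; values copied from the results table above
candidates = [
    {"id": "ur.01340617017.77", "first_name": "Cassidy R", "last_name": "Sugimoto", "total_publications": 168},
    {"id": "ur.011207156643.42", "first_name": "Fariba", "last_name": "Karimi", "total_publications": 42},
    {"id": "ur.014454645231.45", "first_name": "Fariba", "last_name": "Karimi", "total_publications": 13},
    {"id": "ur.0726731044.16", "first_name": "Fariba", "last_name": "Karimi", "total_publications": 10},
    {"id": "ur.010110137501.98", "first_name": "Márcia R", "last_name": "Ferreira", "total_publications": 10},
]

speakers = ["Fariba Karimi", "Marcia R. Ferreira", "Cassidy Sugimoto"]
selected = pick_most_published(speakers, candidates)
print(selected)
```

Picking the profile with the most publications is only a heuristic: a prolific homonym can win over the actual speaker, so the result is still worth a quick manual check.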

In [None]:
people2 = {
    "ur.0734240333.93": "Roberta Sinatra",
    "ur.010110137501.98" : "Marcia R. Ferreira",
    "ur.01340617017.77" : "Cassidy Sugimoto",
    "ur.015077364510.21" : "Nicolas Robinson Garcia",
    "ur.013433750473.60" : "Vincent Larivière",
    "ur.013012771111.87": "Rodrigo Costas",
    "ur.011207156643.42" : "Fariba Karimi",    
}

## How often are they collaborating?


In [None]:
import json  # used below to serialise the list of IDs for the DSL query

# get all publications for the speakers, using their IDs
pubs = dsl.query_iterative(f'''
search publications 
    where researchers in {json.dumps(list(people2))}
return publications[id+authors] 
''')



#
# GLOBAL VARIABLES
# master list of author names: {"ID" : "name"} / bootstrap from the speakers list to ensure that they are in it even if no publications exist for them
AUTHORS = people2.copy()
# edges, in the form of {ID_from : {ID_to : count}}
COLLABS = {}


#
# fill in the authors and collabs datasets
for pub1 in pubs.publications:
  for author in pub1['authors']:
    authid_from = author.get("researcher_id", None)
    if authid_from:
      if authid_from not in AUTHORS:
        AUTHORS[authid_from] = author.get("first_name", "") + " " + author.get("last_name", "")
      if authid_from not in COLLABS:
        COLLABS[authid_from] = {}
      # do another iteration to generate edges
      for author2 in pub1['authors']:
        authid_to = author2.get("researcher_id", None)
        if authid_to and authid_to != authid_from:
          # increment the edge count, starting from 0 the first time a pair is seen
          COLLABS[authid_from][authid_to] = COLLABS[authid_from].get(authid_to, 0) + 1



#
# Support functions
#

import itertools

def how_often(res1, res2):
  """Count how often two IDs collaborate.
  E.g. how_often('ur.01077072115.46', 'ur.0657111367.32')"""
  try:
    return COLLABS[res1][res2]
  except KeyError:
    if False: # for QA
      if res1 not in COLLABS:
        print(res1, "not in index")
      if res1 in COLLABS and res2 not in COLLABS[res1]:
        print(res2, "not in index for", res1)
    return 0


def how_often_bulk(res_list):
  "Calc all combinations for a list of researchers, and return how often they collaborated"
  res = []
  for pair in itertools.combinations(res_list, 2):
    papers = how_often(pair[0], pair[1])
    # print(pair, papers)
    res += [(pair, papers)]
  return res



#
# Visualize the collaboration network
#

from dimcli.utils.networkviz import NetworkViz

net = NetworkViz(notebook=True, width="100%", height="800px")

SIZE_SPEAKERS = 30
SIZE_OTHERS = 20
MIN_PAPERS = 5 # min papers in common to display an edge

for data in how_often_bulk(AUTHORS):

  res_from, res_to, papers = data[0][0], data[0][1], data[1]

  if res_from in list(people2) and res_to in list(people2):
    # rels between speakers
    net.add_node(n_id=res_from, label=AUTHORS[res_from], size=SIZE_SPEAKERS, color={"background": "yellow"})
    net.add_node(n_id=res_to, label=AUTHORS[res_to], size=SIZE_SPEAKERS, color={"background": "yellow"})
    if papers:
      net.add_edge(res_from, res_to, value=papers, label=f"{papers}")

  elif res_from in list(people2) and papers >= MIN_PAPERS:
    net.add_node(n_id=res_to, label=AUTHORS[res_to], size=SIZE_OTHERS,  color={"background": "lightblue"})
    net.add_edge(res_from, res_to, value=papers, label=f"{papers}")



net.show('collaboration.html')



Starting iteration with limit=1000 skip=0 ...
0-540 / 540 (0.94s)
===
Records extracted: 540
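
The counting logic above can also be expressed more compactly. The following is a minimal sketch, not the notebook's own code: it assumes each publication is a dictionary with an `authors` list whose items may carry a `researcher_id`, as in the loop above, and it stores each pair once (sorted) instead of keeping both directions. The function names and the `ur.A`, `ur.B`, `ur.C` IDs are placeholders invented for the example.

```python
from collections import Counter
from itertools import combinations


def count_coauthorships(publications):
    """Count shared papers for every unordered pair of identified authors."""
    pair_counts = Counter()
    for pub in publications:
        # Keep only authors that have a researcher ID; the set drops repeated IDs
        ids = {a["researcher_id"] for a in pub.get("authors", []) if a.get("researcher_id")}
        for pair in combinations(sorted(ids), 2):
            pair_counts[pair] += 1
    return pair_counts


def shared_papers(pair_counts, res1, res2):
    """Look up a pair regardless of argument order; unknown pairs count as 0."""
    return pair_counts[tuple(sorted((res1, res2)))]


# Toy data with placeholder IDs
sample = [
    {"id": "pub.1", "authors": [{"researcher_id": "ur.A"}, {"researcher_id": "ur.B"}]},
    {"id": "pub.2", "authors": [{"researcher_id": "ur.B"}, {"researcher_id": "ur.A"}, {"researcher_id": "ur.C"}]},
    {"id": "pub.3", "authors": [{"researcher_id": "ur.C"}, {"first_name": "No", "last_name": "Id"}]},
]

counts = count_coauthorships(sample)
print(shared_papers(counts, "ur.B", "ur.A"))  # 2
print(shared_papers(counts, "ur.A", "ur.D"))  # 0
```

Because the IDs of each paper are collected in a set, an ID that appears twice in one author list is counted once for that paper.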


## Topics 

In [None]:
#
# get all publications+concepts for the speakers, using their IDs
pubs = dsl.query_iterative(f'''
search publications 
    where researchers in {json.dumps(list(people2))}
return publications[id+concepts_scores]
''')

print("===\nExtracting Concepts.. ")
concepts = pubs.as_dataframe_concepts()
concepts_unique = concepts.drop_duplicates("concept")[['concept', 'frequency', 'score_avg']]
print("===\nConcepts Found (total):", len(concepts))
print("===\nUnique Concepts Found:", len(concepts_unique))
print("===\nConcepts with frequency greater than 1:", len(concepts_unique.query("frequency > 1")))


#@markdown ## Define the best parameters to isolate 'interesting' concepts
#@markdown Frequency: how many documents include a concept (100 = no upper limit).
#@markdown Tip: concepts with very high frequencies tend to be common words, so it is useful to exclude them.
FREQ_MIN = 2 #@param {type: "slider", min: 1, max: 10, step:1}
FREQ_MAX = 100 #@param {type: "slider", min: 10, max: 100, step:10}
#@markdown ---
#@markdown Score: the average relevancy score of concepts, for the dataset we extracted above.
#@markdown This value tends to be a good indicator of 'interesting' concepts.
SCORE_MIN = 0.6  #@param {type: "slider", min: 0, max: 1, step:0.1}
#@markdown ---
#@markdown Select how many concepts to include in the visualization
MAX_CONCEPTS = 200 #@param {type: "slider", min: 20, max: 1000, step:10}

if FREQ_MAX == 100:
  FREQ_MAX = 100000000
print(f"""You selected:\n->{MAX_CONCEPTS}\n->{FREQ_MIN}-{FREQ_MAX}\n->{SCORE_MIN}""")

filtered_concepts = concepts_unique.query(f"""frequency >= {FREQ_MIN} & frequency <= {FREQ_MAX} & score_avg >= {SCORE_MIN} """)\
                    .sort_values(["score_avg", "frequency"], ascending=False)[:MAX_CONCEPTS]

px.scatter(filtered_concepts,
           x="concept",
           y="frequency",
           height=700,
           color="score_avg",
           size="score_avg")

Starting iteration with limit=1000 skip=0 ...
0-540 / 540 (2.37s)
===
Records extracted: 540


===
Extracting Concepts.. 
===
Concepts Found (total): 25682
===
Unique Concepts Found: 7979
===
Concepts with frequency greater than 1: 3124
You selected:
->200
->2-100000000
->0.6
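
To make the effect of the three thresholds easier to see, here is a plain-Python restatement of the filter on a handful of rows. It is only a sketch: the function name `filter_concepts` is hypothetical, the rows are invented toy values rather than real output, and the dictionary keys simply mirror the `concept`, `frequency` and `score_avg` columns used above.

```python
def filter_concepts(rows, freq_min=2, freq_max=100, score_min=0.6, max_concepts=200):
    """Keep concepts inside the frequency band and above the score floor,
    ranked by average score, then frequency (both descending)."""
    # As in the cell above, a FREQ_MAX of 100 is treated as "no upper limit"
    upper = float("inf") if freq_max == 100 else freq_max
    kept = [
        r for r in rows
        if freq_min <= r["frequency"] <= upper and r["score_avg"] >= score_min
    ]
    kept.sort(key=lambda r: (r["score_avg"], r["frequency"]), reverse=True)
    return kept[:max_concepts]


# Invented toy rows, for illustration only
rows = [
    {"concept": "citation impact", "frequency": 12, "score_avg": 0.71},
    {"concept": "study", "frequency": 300, "score_avg": 0.20},
    {"concept": "altmetrics", "frequency": 1, "score_avg": 0.80},
    {"concept": "gender gap", "frequency": 5, "score_avg": 0.71},
    {"concept": "mobility", "frequency": 3, "score_avg": 0.65},
]

default_names = [r["concept"] for r in filter_concepts(rows)]
capped_names = [r["concept"] for r in filter_concepts(rows, freq_max=10)]
print(default_names)  # ['citation impact', 'gender gap', 'mobility']
print(capped_names)   # ['gender gap', 'mobility']
```

In this toy set the very frequent but low-scoring row ("study") is dropped by the score floor, and the high-scoring row that occurs only once ("altmetrics") is dropped by the minimum frequency.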
