# QA of DSL Release v1.25
This notebook reviews and tests the new features in DSL v1.25.

Docs: https://docs.dimensions.ai/dsl/1.25.0-preview/


## Prerequisites

Install the latest versions of the libraries imported below (pandas, tqdm, plotly, dimcli) to run this notebook.

In [1]:
# Load common libraries
import pandas as pd
from pandas import json_normalize  # top-level import, available since pandas 1.0

import time
import json
from tqdm.notebook import tqdm as progress

import plotly.express as px
from plotly.offline import plot 

import dimcli
from dimcli.shortcuts import *

dimcli.login(instance="test")
dsl = dimcli.Dsl() 


DimCli v0.6.8.1 - Succesfully connected to <https://integration.ds-metrics.com> (method: dsl.ini file)


---

## [DSL-420] Concepts_v2

https://uberresearch.atlassian.net/browse/DSL-420

### Returning concepts

In [6]:
data = dsl.query("""search publications where id="pub.1059874924" return publications[id+concepts]
""") 

Returned Publications: 1 (total = 1)


In [7]:
data.json

{'_stats': {'total_count': 1},
 '_version': {'release': '1.25', 'version': '1.25.0-preview'},
 'publications': [{'id': 'pub.1059874924',
   'concepts': ['Semantic Web',
    'high-level semantic analysis',
    'CIDOC-CRM',
    'knowledge representation principles',
    'World Wide Web',
    'interoperable approach',
    'Wide Web',
    'formal ontology',
    'formal model',
    'computer ontology',
    'representation principles',
    'semantic analysis',
    'related technologies',
    'digital humanities',
    'Web',
    'ontology',
    'identification of people',
    'integrated approach',
    'database',
    'information',
    'extensibility',
    'modularity',
    'factoid',
    'technology',
    'model',
    'principles',
    'model of structure',
    'recent developments',
    'solution',
    'standards',
    'people',
    'notion',
    'development',
    'data',
    'research',
    'source',
    'prosopography',
    'identification',
    'purpose',
    'branches',
    'Departmen

In [13]:
%dsldf search publications where id="pub.1059874924" return publications[id+concepts]

Returned Publications: 1 (total = 1)


| | id | concepts |
|---|---|---|
| 0 | pub.1059874924 | [Semantic Web, high-level semantic analysis, C... |
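
Because `concepts` comes back as one list per publication, the dataframe above is awkward to filter or count. A minimal sketch of flattening it with `DataFrame.explode`, using a shortened copy of the row above as sample data (the variable names are illustrative, not part of dimcli):

```python
import pandas as pd

# Sample row copied from the query output above (concept list shortened).
df = pd.DataFrame(
    [
        {
            "id": "pub.1059874924",
            "concepts": [
                "Semantic Web",
                "high-level semantic analysis",
                "CIDOC-CRM",
            ],
        }
    ]
)

# One row per (publication, concept) pair instead of one list per publication.
concepts_long = df.explode("concepts").reset_index(drop=True)
print(concepts_long)
```

The same call should work on the dataframe returned by `%dsldf`, assuming its `concepts` column holds Python lists.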


### Returning concepts and scores

In [17]:
%dsl search publications where id="pub.1059874924" return publications[id+concepts+concepts_scores]

Returned Publications: 1 (total = 1)


<dimcli.Dataset object #4700797584. Records: 1/1>

In [19]:
dsl_last_results.publications[0]['concepts_scores']

[{'concept': 'Semantic Web', 'relevance': 0.7716195004057621},
 {'concept': 'high-level semantic analysis', 'relevance': 0.749076263104091},
 {'concept': 'CIDOC-CRM', 'relevance': 0.718139143774516},
 {'concept': 'knowledge representation principles',
  'relevance': 0.7166532653448511},
 {'concept': 'World Wide Web', 'relevance': 0.7045857257572841},
 {'concept': 'interoperable approach', 'relevance': 0.659387349107736},
 {'concept': 'Wide Web', 'relevance': 0.653173190881922},
 {'concept': 'formal ontology', 'relevance': 0.6512618799620411},
 {'concept': 'formal model', 'relevance': 0.64159173738255},
 {'concept': 'computer ontology', 'relevance': 0.6401696373229391},
 {'concept': 'representation principles', 'relevance': 0.63724084595062},
 {'concept': 'semantic analysis', 'relevance': 0.6327758452727871},
 {'concept': 'related technologies', 'relevance': 0.6180713045534031},
 {'concept': 'digital humanities', 'relevance': 0.5889037977116951},
 {'concept': 'Web', 'relevance': 0.58522
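
Each `concepts_scores` entry pairs a concept with a relevance value, so a simple QA check is to keep only the concepts above a threshold. The sketch below uses three entries taken from the output above (values rounded); `top_concepts` and the `0.6` cutoff are illustrative choices, not part of the DSL.

```python
# Sample entries copied from the `concepts_scores` output above (values rounded).
concepts_scores = [
    {"concept": "Semantic Web", "relevance": 0.7716},
    {"concept": "formal ontology", "relevance": 0.6513},
    {"concept": "digital humanities", "relevance": 0.5889},
]


def top_concepts(scores, min_relevance=0.6):
    """Return concept names with relevance >= min_relevance, highest first."""
    kept = [s for s in scores if s["relevance"] >= min_relevance]
    kept.sort(key=lambda s: s["relevance"], reverse=True)
    return [s["concept"] for s in kept]


print(top_concepts(concepts_scores))
```

In the notebook, the same function could be applied to `dsl_last_results.publications[0]['concepts_scores']`.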

### Searching in concepts

In [10]:
%dsldf search publications in concepts for "prosopography" return publications[id+title+concepts]

Returned Publications: 20 (total = 237)


| | id | concepts | title |
|---|---|---|---|
| 0 | pub.1127126922 | [personal trajectories, pre-modern era, urban ... | Entre servicio regio y estrategia personal: lo... |
| 1 | pub.1124902386 | [early Islamic empire, imperial rule, Islamic ... | An Empire of Elites: Mobility in the Early Isl... |
| 2 | pub.1124595493 | [Scottish Reformation, clerical career, Church... | Mapping the Scottish Reformation: |
| 3 | pub.1125714817 | [Society of Jesus, religious formation, philos... | Kształcenie jezuitów prowincji litewskiej międ... |
| 4 | pub.1124114342 | [Roman imperial period, cult personnel, imperi... | Per una prosopografia dei sacerdoti e delle sa... |
| 5 | pub.1124795615 | [family origin, Grand Duchy, political influen... | Genealogical Researches of Karpiai Family: the... |
| 6 | pub.1122171735 | [history of scholarship, handful of individual... | Writing History from the Geniza: Issues, Metho... |
| 7 | pub.1112882332 | [sugar trade, institutional choice, reputation... | Institutional choice in the governance of the ... |
| 8 | pub.1124794164 | [Italian welfare state, traditional political ... | Guerra Total y Reforma Social: avances adminis... |
| 9 | pub.1122477591 | [large Australian corporations, prior business... | The Boarding Pass: Pathways to Corporate Netwo... |


### The `extract_concepts` function

With `return_scores=true`

In [20]:
pd.DataFrame.from_dict(
dsl.query("""
extract_concepts("Structured Prosopography provides a formal model for representing prosopography: a branch of historical research that traditionally has focused on the identification of people that appear in historical sources. Since the 1990s, KCL’s Department of Digital Humanities has been involved in the development of structured prosopographical databases using a general ‘factoid-oriented’ model of structure that links people to the information about them via spots in primary sources that assert that information. Recent developments, particularly the World Wide Web, and its related technologies around the Semantic Web, have promoted the possibility to both interconnecting dispersed data, and allowing it to be queried semantically. To the purpose of making available our prosopographical databases on the Semantic Web, in this article we review the principles behind our established factoid-based model and reformulate it using a more interoperable approach, based on knowledge representation principles and formal ontologies. In particular, we are going to focus primarily on a high-level semantic analysis of the factoid notion, on its relation to other cultural heritage standards such as CIDOC-CRM, and on the modularity and extensibility of the proposed solutions.", return_scores=true)
""")['extracted_concepts']
)

| | concept | relevance |
|---|---|---|
| 0 | Semantic Web | 0.771271 |
| 1 | high-level semantic analysis | 0.748738 |
| 2 | CIDOC-CRM | 0.717815 |
| 3 | knowledge representation principles | 0.716329 |
| 4 | World Wide Web | 0.704267 |
| 5 | interoperable approach | 0.659089 |
| 6 | Wide Web | 0.652878 |
| 7 | formal ontology | 0.650968 |
| 8 | formal model | 0.641302 |
| 9 | representation principles | 0.636953 |


With `return_scores=false`

In [21]:
pd.DataFrame.from_dict(
dsl.query("""
extract_concepts("Structured Prosopography provides a formal model for representing prosopography: a branch of historical research that traditionally has focused on the identification of people that appear in historical sources. Since the 1990s, KCL’s Department of Digital Humanities has been involved in the development of structured prosopographical databases using a general ‘factoid-oriented’ model of structure that links people to the information about them via spots in primary sources that assert that information. Recent developments, particularly the World Wide Web, and its related technologies around the Semantic Web, have promoted the possibility to both interconnecting dispersed data, and allowing it to be queried semantically. To the purpose of making available our prosopographical databases on the Semantic Web, in this article we review the principles behind our established factoid-based model and reformulate it using a more interoperable approach, based on knowledge representation principles and formal ontologies. In particular, we are going to focus primarily on a high-level semantic analysis of the factoid notion, on its relation to other cultural heritage standards such as CIDOC-CRM, and on the modularity and extensibility of the proposed solutions.", return_scores=false)
""")['extracted_concepts']
)

| | 0 |
|---|---|
| 0 | Semantic Web |
| 1 | high-level semantic analysis |
| 2 | CIDOC-CRM |
| 3 | knowledge representation principles |
| 4 | World Wide Web |
| 5 | interoperable approach |
| 6 | Wide Web |
| 7 | formal ontology |
| 8 | formal model |
| 9 | representation principles |
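
Pasting a long abstract directly into the query string, as in the two cells above, breaks as soon as the text contains a double quote. A minimal sketch of a helper that builds the query instead; `build_extract_concepts_query` is a hypothetical name, and the backslash-escaping convention is an assumption about DSL string literals that should be checked against the DSL documentation.

```python
def build_extract_concepts_query(text, return_scores=True):
    """Build an extract_concepts query string (hypothetical helper)."""
    # Assumption: backslashes and double quotes inside the text must be
    # escaped so they do not terminate the DSL string literal early.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # The DSL flag is lowercase (true/false), unlike Python's True/False.
    flag = "true" if return_scores else "false"
    return f'extract_concepts("{escaped}", return_scores={flag})'


query = build_extract_concepts_query('A "factoid-oriented" model', return_scores=False)
print(query)
```

The resulting string can then be passed to `dsl.query(query)` in place of the inline literal.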


## [DSL-237] Plan how to stop returning results where Researcher.obsolete = 1

https://uberresearch.atlassian.net/browse/DSL-237

* Keep the current behavior: queries return both obsolete and non-obsolete researchers.
* Add a new payload object (e.g. `_notes`) that acts as a reminder.
* Document this well.
* Deprecate the function `check_researcher_ids`, which made sense only before the researchers core existed.

In [22]:
%dsl search researchers for "penrose" return researchers

Returned _notes: 1
Returned Researchers: 0


<dimcli.Dataset object #4700622928. Records: 1/0>

In [23]:
%dsl search researchers for "penrose" where obsolete=0 return researchers

Returned _notes: 1
Returned Researchers: 0


<dimcli.Dataset object #4700632080. Records: 1/0>
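
Both queries report a `_notes` entry alongside the (empty) researchers list. A minimal sketch of separating such underscore-prefixed metadata from the record lists in a raw response dict; `split_notes` is a hypothetical helper, and the sample payload is a placeholder whose note text is not the real API message.

```python
def split_notes(payload):
    """Separate the `_notes` entries from the record lists in a DSL response dict."""
    notes = payload.get("_notes", [])
    # Keys starting with "_" (e.g. _stats, _version, _notes) are metadata.
    records = {k: v for k, v in payload.items() if not k.startswith("_")}
    return notes, records


# Hypothetical payload: the note text is a placeholder, not the real API message.
payload = {
    "_notes": ["<note text returned by the API>"],
    "_stats": {"total_count": 0},
    "researchers": [],
}
notes, records = split_notes(payload)
print(len(notes), records)
```

In the notebook, the dict to inspect would be the `.json` attribute of the returned dataset, as used earlier with `data.json`.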

In [24]:
dsl.query('check_researcher_ids(["ur.011301404166.06", "ur.07433432213.73"])')

Function check_researcher_ids is considered deprecated. Please consult the documentation for more details: https://docs.dimensions.ai/dsl/releasenotes.html#deprecated-functions


<dimcli.Dataset object #4700819728. Records: 2/2>

## [DSL-438] warning message wrong url

https://uberresearch.atlassian.net/browse/DSL-438


In [25]:
%%dsl
search publications
where funders = "grid.25111.36" and year = 2019
return publications[basics+extras+authors+concepts]
limit 1000

Returned Errors: 1
1 EvaluationError found
The response generated by your query is too large, e.g. because it includes records with lots of data. Please review it by keeping in mind the guidelines on https://docs.dimensions.ai/dsl/faq.html#queries-and-errors [code: 2]


<dimcli.Dataset object #4700822032. Errors: 1>
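
The error above is triggered by asking for 1000 large records in one response. One possible mitigation, sketched here under the assumption that the DSL `limit` and `skip` clauses are used for paging, is to split the request into smaller pages. `paged_queries` is a hypothetical helper that only builds the query strings; it does not call the API.

```python
def paged_queries(base_query, page_size=100, max_records=1000):
    """Yield copies of base_query with `limit`/`skip` clauses appended."""
    for skip in range(0, max_records, page_size):
        yield f"{base_query} limit {page_size} skip {skip}"


# Same query as the cell above, without its `limit 1000` clause.
base = (
    "search publications "
    'where funders = "grid.25111.36" and year = 2019 '
    "return publications[basics+extras+authors+concepts]"
)
queries = list(paged_queries(base, page_size=250, max_records=1000))
for q in queries:
    print(q)
```

Each string can be sent with `dsl.query(q)`. dimcli also appears to offer `dsl.query_iterative` for this kind of batching; check its documentation for the exact parameters before relying on it.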

## [DSL-428] remove publications bogus

https://uberresearch.atlassian.net/browse/DSL-428


In [29]:
dsl.query('search publications where id="pub.1090905599" return publications[all]').as_dataframe_authors()

Returned Publications: 1 (total = 1)
Field 'FOR' is deprecated in favor of category_for. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_HC' is deprecated in favor of category_hrcs_hc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'RCDC' is deprecated in favor of category_rcdc. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'references' is deprecated in favor of reference_ids. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'open_access' is deprecated in favor of open_access_categories. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'category_ua' is deprecated in favor of category_uoa. Please refer to https://docs.dimensions.ai/dsl/releasenotes.html for more details
Field 'HRCS_RAC' is deprecated in favor of category_hrcs_rac. Please refer to https://docs.dimensions.ai/dsl/releas

| | first_name | last_name | raw_affiliation | initials | corresponding | orcid | current_organization_id | researcher_id | affiliations | pub_id |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unnur | Styrkarsdottir | [] | | | ['0000-0001-8146-8278'] | grid.421812.c | ur.01205523707.73 | [] | pub.1090905599 |
| 1 | Hannes | Helgason | [] | | | | | | [] | pub.1090905599 |
| 2 | Asgeir | Sigurdsson | [] | | | | grid.137628.9 | ur.01067302737.32 | [] | pub.1090905599 |
| 3 | Gudmundur L | Norddahl | [] | | | | grid.4514.4 | ur.0771421022.72 | [] | pub.1090905599 |
| 4 | Arna B | Agustsdottir | [] | | | | | ur.01067375637.38 | [] | pub.1090905599 |
| 5 | Louise N | Reynard | [] | | | | grid.1006.7 | ur.0647426411.14 | [] | pub.1090905599 |
| 6 | Amanda | Villalvilla | [] | | | | | | [] | pub.1090905599 |
| 7 | Gisli H | Halldorsson | [] | | | ['0000-0001-7067-9862'] | grid.421812.c | ur.01334675114.72 | [] | pub.1090905599 |
| 8 | Aslaug | Jonasdottir | [] | | | | grid.421812.c | ur.0726475763.17 | [] | pub.1090905599 |
| 9 | Audur | Magnusdottir | [] | | | | grid.8761.8 | ur.0724300310.21 | [] | pub.1090905599 |
